- Get a list of files you want to upload (see `get-files-to-upload/`)
- Download the files in the list (see `curl-all.py`)
- Unzip the downloaded files (if needed):

```bash
cd downloads
gunzip *.gz
```

- Use `data_set.py` to create individual data sets (see `python data_set.py --help`). You will need a Quilt username and password. Or use `batch.py` to create multiple data sets.

```bash
python data_set.py \
    -e https://quiltdata.com \
    -u USERNAME \
    -n "ENCODE data" \
    -d "#A549 #histone peak data #hg19" \
    -f downloads/wgEncodeBroadHistoneNhaH3k36me3StdPk.broadPeak
```

| Action | Endpoint | Details |
|---|---|---|
| New table | `POST /tables/` | See below |
| Delete table | `DELETE /tables/TABLE_ID/` | See below |
| Update table meta-data | `PATCH /tables/TABLE_ID` | See below |
| Add column to table | `POST /tables/TABLE_ID/columns/` | See below |
| Append row to table | `POST /data/TABLE_ID/rows/` | See below |
| Get table rows | `GET /data/TABLE_ID/rows` | See below |
| Get table row | `GET /data/TABLE_ID/rows/ROW_ID` | See below |
| Genome intersect or subtract | `POST /genemath/` | See below |
Notes

- For all REST calls, the content type is `application/json`.
- Description fields automatically linkify URLs and support `<a>`, `<i>`, `<em>`, `<strong>`, and `<b>` tags.
`POST /tables/`

```
{
    'name': string,
    'description': text; `<a>, <i>, <em>, <strong>, <b>` tags supported; automatic linkification of URLs,
    'columns': [
        {
            'name': string,
            'sqlname': optional string,
            'description': optional text,
            'type': one of 'String', 'Number', 'Image', 'Text'
        },
        ...
    ]
}
```

Returns: table data as a JSON object, including an `id` field with the table's identifier.
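As a sketch of how the payload above can be assembled and sent from Python, using only the standard library (the base URL matches the `-e` flag used earlier; the column names, `sqlname`s, and auth details are hypothetical placeholders):

```python
import json
import urllib.request

BASE = "https://quiltdata.com"  # assumed endpoint, same host passed to data_set.py -e

def build_table_payload(name, description, columns):
    """Assemble the JSON body for POST /tables/.
    `columns` is a list of (name, sqlname, type) tuples."""
    return {
        "name": name,
        "description": description,
        "columns": [
            {"name": n, "sqlname": s, "type": t} for (n, s, t) in columns
        ],
    }

payload = build_table_payload(
    "ENCODE data",
    "#A549 #histone peak data #hg19",
    [("chr", "chr_000", "String"), ("start", "start_001", "Number")],
)
body = json.dumps(payload).encode()

# Sending the request (credentials and auth scheme are assumptions, not shown):
# req = urllib.request.Request(BASE + "/tables/", data=body,
#                              headers={"Content-Type": "application/json"})
# table_id = json.load(urllib.request.urlopen(req))["id"]
```

The returned `id` is what the other endpoints refer to as `TABLE_ID`.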
`POST /tables/TABLE_ID/columns/`

```
{
    'name': string,
    'sqlname': optional string,
    'description': text,
    'type': one of 'String', 'Number', 'Image', or 'Text'
}
```

Returns: column data as a JSON object, including an `id` field with the column's identifier.
`DELETE /tables/TABLE_ID`

`PATCH /tables/TABLE_ID`

```
{
    'name': string,
    'description': text
}
```

- Use column `sqlname`s as keys in the input data.
`POST /data/TABLE_ID/rows/`

```
[
    {columnSqlname0: value0, columnSqlname1: value1, ...},
    ...
]
```

`GET /data/TABLE_ID/rows`
- Rows are keyed by the Quilt Row ID field `qrid`.
- NOTE: currently limited to the first 500 rows.

Returns: row data as a JSON object, keyed by `column.sqlname`.
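A minimal sketch of building the row payload, assuming two hypothetical columns with `sqlname`s `name_000` and `time_001`:

```python
import json

# One JSON object per row, keyed by each column's sqlname
# (the sqlnames below are hypothetical examples).
rows = [
    {"name_000": "Sally", "time_001": 12.3},
    {"name_000": "Bob", "time_001": 14.1},
]
body = json.dumps(rows)

# POST `body` to /data/TABLE_ID/rows/ with Content-Type application/json.
# GET /data/TABLE_ID/rows returns rows in the same shape (first 500 only).
```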
`GET /data/TABLE_ID/rows/ROW_ID`

Returns: row data as a JSON object, keyed by `column.sqlname`.
`POST /quilts/`

```
{
    'left_table_id': int,
    'right_table_id': int,
    'left_column_id': int,
    'right_column_id': int,
    'jointype': one of 'inner', 'leftOuter', 'firstMatch'
}
```

Returns: quilt info as a JSON object, including a `sqlname` field with the quilt's identifier.
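For example, a join request body might look like this (all ids below are placeholders; substitute real table and column ids from your account):

```python
import json

# Placeholder table and column ids identifying the join keys.
quilt_spec = {
    "left_table_id": 1234,
    "right_table_id": 5678,
    "left_column_id": 11,
    "right_column_id": 22,
    "jointype": "inner",  # or "leftOuter" or "firstMatch"
}
body = json.dumps(quilt_spec)
# POST `body` to /quilts/; the response includes the quilt's sqlname identifier.
```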
`DELETE /quilts/QUILT_SQLNAME`
- Performs a gene-math operation on two tables.
- Creates a new table with the result.
- Columns are specified by `column.id`.

`POST /genemath/`

```
{
    'operator': one of 'Intersect' or 'Subtract',
    'left_chr': integer (column id),
    'left_start': integer (column id),
    'left_end': integer (column id),
    'right_chr': integer (column id),
    'right_start': integer (column id),
    'right_end': integer (column id)
}
```

Returns: JSON object representing the result table.
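A sketch of the request body: the six integers are column ids locating the chromosome, start, and end columns in each table (the values below are placeholders):

```python
import json

# Column ids for chromosome/start/end in the left and right tables
# (placeholder values; look ids up via GET on your tables).
genemath = {
    "operator": "Intersect",  # or "Subtract"
    "left_chr": 101, "left_start": 102, "left_end": 103,
    "right_chr": 201, "right_start": 202, "right_end": 203,
}
body = json.dumps(genemath)
# POST `body` to /genemath/; the response describes the new result table.
```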
The Quilt Python connector uses the Quilt REST API and SQLAlchemy (http://docs.sqlalchemy.org/), if installed, to access and update data sets in Quilt. Quilt tables are available as dictionaries or as Pandas (http://pandas.pydata.org/) DataFrames.
To use the Quilt Python connector, add this repository to your PYTHONPATH and import `quilt`.
Connect to Quilt by creating a `Connection` object:

```python
import quilt
connection = quilt.Connection(username)
# Password: <enter your password>
```

The connection contains a list of your Quilt tables:

```python
connection.tables
```

You can also find tables by searching your own tables and Quilt's public data sets:

```python
connection.search('term')
```

Get a table by table id using `get_table`:

```python
t = connection.get_table(1234)
```

Using the connection, you can create new tables in Quilt. To create an empty table:

```python
t = connection.create_table(name, description)
```

To create a table from an input file:

```python
t = connection.create_table(name, description, inputfile=path_to_input_file)
```

Or, to create a new table from a DataFrame:

```python
t = connection.save_df(df, name, description="table description")
```

Each `Table` object has a list of columns:

```python
mytable.columns
```

After the columns have been fetched, each column is available as a table attribute:

```python
mytable.column1
```

Tables are iterable. To access table data:

```python
for row in mytable:
    print(row)
```

Search for matching rows in a table by calling `search`:

```python
for row in mytable.search('foo'):
    print(row)
```

Sort the table by any column or set of columns. Set the ordering by passing a string that is the column's field (its name in the database):

```python
mytable.order_by('column1')
```

You can find a column's field name with its `.field` attribute:

```python
mytable.order_by(mytable.column1.field)
```

Sort by multiple columns by passing a list of fields:

```python
mytable.order_by(['column2', 'column1'])
```

To sort in descending order, prefix the column field name with a "-":

```python
mytable.order_by('-column1')
```

Limit the number of rows returned by calling `limit(number_of_rows)`.

`search`, `order_by`, and `limit` can be combined to return just the data you want to see. For example, to return the top two finishers named Sally from a table of race results (race_results: [name_000, time_001]), you could write:

```python
for result in race_results.search('Sally').order_by('-time_001').limit(2):
    print(result)
```

Access a table's data as a Pandas DataFrame by calling `mytable.df()`. You can also combine the querying methods above to access particular rows:

```python
race_results.search('Sally').order_by('-time_001').limit(2).df()
```
Quilt supports intersect and subtract for tables that store genomic regions. These operations assume each table has columns storing chromosome, start, and end. The function `get_bed_cols` tries to infer those columns from the column names.

If the guess fails, or to override it, set the chromosome, start, and end columns explicitly with `set_bed_cols`:

```python
mytable.set_bed_cols(mytable.chr_001, mytable.start_002, mytable.end_003)
```

Once the bed columns are set for both tables, the tables can be intersected and subtracted:

```python
result = tableA.intersect(tableB)
result = tableA.intersect_wao(tableB)
result = tableA.subtract(tableB)
```