- Get a list of files you want to upload (see `get-files-to-upload/`)
- Download the files in the list (see `curl-all.py`)
- Unzip the downloaded files (if needed):

```bash
cd downloads
gunzip *.gz
```

- Use `data_set.py` to create individual data sets (see `python data_set.py --help`). You will need a Quilt username and password. Or use `batch.py` to create multiple data sets.

```bash
python data_set.py \
    -e https://quiltdata.com \
    -u USERNAME \
    -n "ENCODE data" \
    -d "#A549 #histone peak data #hg19" \
    -f downloads/wgEncodeBroadHistoneNhaH3k36me3StdPk.broadPeak
```

| Action | Endpoint | Details |
|---|---|---|
| New table | `POST /tables/` | See below |
| Delete table | `DELETE /tables/TABLE_ID/` | See below |
| Update table meta-data | `PATCH /tables/TABLE_ID` | See below |
| Add column to table | `POST /tables/TABLE_ID/columns/` | See below |
| Append row to table | `POST /data/TABLE_ID/rows/` | See below |
| Get table rows | `GET /data/TABLE_ID/rows` | See below |
| Get table row | `GET /data/TABLE_ID/rows/ROW_ID` | See below |
| Genome intersect or subtract | `POST /genemath/` | See below |
Notes

- For all REST calls, the content type is `application/json`.
- Description fields automatically linkify URLs and support `<a>`, `<i>`, `<em>`, `<strong>`, and `<b>` tags.
`POST /tables/`

```
{
    'name': string,
    'description': text; `<a>, <i>, <em>, <strong>, <b>` tags supported; automatic linkification of URLs,
    'columns': [
        {
            'name': string,
            'sqlname': optional string,
            'description': optional text,
            'type': one of 'String', 'Number', 'Image', 'Text'
        },
        ...
    ]
}
```

Returns: table data as a JSON object, including an `id` field with the table's identifier.
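As a sketch of how the payload above can be assembled and sent from Python, using only the standard library (the base URL matches the `-e` flag used earlier; the column names, `sqlname`s, and auth details are hypothetical placeholders):

```python
import json
import urllib.request

BASE = "https://quiltdata.com"  # assumed endpoint, same host passed to data_set.py -e

def build_table_payload(name, description, columns):
    """Assemble the JSON body for POST /tables/.
    `columns` is a list of (name, sqlname, type) tuples."""
    return {
        "name": name,
        "description": description,
        "columns": [
            {"name": n, "sqlname": s, "type": t} for (n, s, t) in columns
        ],
    }

payload = build_table_payload(
    "ENCODE data",
    "#A549 #histone peak data #hg19",
    [("chr", "chr_000", "String"), ("start", "start_001", "Number")],
)
body = json.dumps(payload).encode()

# Sending the request (credentials and auth scheme are assumptions, not shown):
# req = urllib.request.Request(BASE + "/tables/", data=body,
#                              headers={"Content-Type": "application/json"})
# table_id = json.load(urllib.request.urlopen(req))["id"]
```

The returned `id` is what the other endpoints refer to as `TABLE_ID`.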
`POST /tables/TABLE_ID/columns/`

```
{
    'name': string,
    'sqlname': optional string,
    'description': text,
    'type': one of 'String', 'Number', 'Image', or 'Text'
}
```

Returns: column data as a JSON object, including an `id` field with the column's identifier.
`DELETE /tables/TABLE_ID`

`PATCH /tables/TABLE_ID`

```
{
    'name': string,
    'description': text
}
```

- Use column `sqlname`s as keys in the input data.
`POST /data/TABLE_ID/rows/`

```
[
    {columnSqlname0: value0, columnSqlname1: value1, ...},
    ...
]
```

`GET /data/TABLE_ID/rows`
- Rows are keyed by the Quilt Row ID field `qrid`.
- NOTE: currently limited to the first 500 rows.

Returns: row data as a JSON object, keyed by `column.sqlname`.
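A minimal sketch of building the row payload, assuming two hypothetical columns with `sqlname`s `name_000` and `time_001`:

```python
import json

# One JSON object per row, keyed by each column's sqlname
# (the sqlnames below are hypothetical examples).
rows = [
    {"name_000": "Sally", "time_001": 12.3},
    {"name_000": "Bob", "time_001": 14.1},
]
body = json.dumps(rows)

# POST `body` to /data/TABLE_ID/rows/ with Content-Type application/json.
# GET /data/TABLE_ID/rows returns rows in the same shape (first 500 only).
```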
`GET /data/TABLE_ID/rows/ROW_ID`

Returns: row data as a JSON object, keyed by `column.sqlname`.
`POST /quilts/`

```
{
    'left_table_id': int,
    'right_table_id': int,
    'left_column_id': int,
    'right_column_id': int,
    'jointype': one of 'inner', 'leftOuter', 'firstMatch'
}
```

Returns: quilt info as a JSON object, including a `sqlname` field with the quilt's identifier.
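For example, a join request body might look like this (all ids below are placeholders; substitute real table and column ids from your account):

```python
import json

# Placeholder table and column ids identifying the join keys.
quilt_spec = {
    "left_table_id": 1234,
    "right_table_id": 5678,
    "left_column_id": 11,
    "right_column_id": 22,
    "jointype": "inner",  # or "leftOuter" or "firstMatch"
}
body = json.dumps(quilt_spec)
# POST `body` to /quilts/; the response includes the quilt's sqlname identifier.
```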
`DELETE /quilts/QUILT_SQLNAME`
- Performs a gene-math operation on two tables.
- Creates a new table with the result.
- Columns are specified by `column.id`.

`POST /genemath/`

```
{
    'operator': one of 'Intersect' or 'Subtract',
    'left_chr': integer (column id),
    'left_start': integer (column id),
    'left_end': integer (column id),
    'right_chr': integer (column id),
    'right_start': integer (column id),
    'right_end': integer (column id)
}
```

Returns: JSON object representing the result table.
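A sketch of the request body: the six integers are column ids locating the chromosome, start, and end columns in each table (the values below are placeholders):

```python
import json

# Column ids for chromosome/start/end in the left and right tables
# (placeholder values; look ids up via GET on your tables).
genemath = {
    "operator": "Intersect",  # or "Subtract"
    "left_chr": 101, "left_start": 102, "left_end": 103,
    "right_chr": 201, "right_start": 202, "right_end": 203,
}
body = json.dumps(genemath)
# POST `body` to /genemath/; the response describes the new result table.
```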
The Quilt Python connector uses the Quilt REST API and SQLAlchemy (http://docs.sqlalchemy.org/), if installed, to access and update data sets in Quilt. Quilt tables are available as dictionaries or as Pandas (http://pandas.pydata.org/) DataFrames.
To use the Quilt Python connector, add this repository to your PYTHONPATH and import `quilt`.
Connect to Quilt by creating a `Connection` object:

```python
import quilt
connection = quilt.Connection(username)
# Password: <enter your password>
```

The connection contains a list of your Quilt tables:

```python
connection.tables
```

You can also find tables by searching your own tables and Quilt's public data sets:

```python
connection.search('term')
```

Get a table by table id using `get_table`:

```python
t = connection.get_table(1234)
```

Using the connection, you can create new tables in Quilt. To create an empty table:

```python
t = connection.create_table(name, description)
```

To create a table from an input file:

```python
t = connection.create_table(name, description, inputfile=path_to_input_file)
```

Or, to create a new table from a DataFrame:

```python
t = connection.save_df(df, name, description="table description")
```

Each `Table` object has a list of columns:

```python
mytable.columns
```

After the columns have been fetched, each column is available as a table attribute:

```python
mytable.column1
```

Tables are iterable. To access table data:

```python
for row in mytable:
    print(row)
```

Search for matching rows in a table by calling `search`:

```python
for row in mytable.search('foo'):
    print(row)
```

Sort the table by any column or set of columns. Set the ordering by passing a string that is the column's field (its name in the database):

```python
mytable.order_by('column1')
```

You can find a column's field name with its `.field` attribute:

```python
mytable.order_by(mytable.column1.field)
```

Sort by multiple columns by passing a list of fields:

```python
mytable.order_by(['column2', 'column1'])
```

To sort in descending order, prefix the column field name with a "-":

```python
mytable.order_by('-column1')
```

Limit the number of rows returned by calling `limit(number_of_rows)`.

`search`, `order_by`, and `limit` can be combined to return just the data you want to see. For example, to return the top two finishers named Sally from a table of race results (race_results: [name_000, time_001]), you could write:

```python
for result in race_results.search('Sally').order_by('-time_001').limit(2):
    print(result)
```

Access a table's data as a Pandas DataFrame by calling `mytable.df()`. You can also combine the querying methods above to access particular rows:

```python
race_results.search('Sally').order_by('-time_001').limit(2).df()
```
Quilt supports intersect and subtract for tables that store genomic regions. These operations assume each table has columns storing chromosome, start, and end. The function `get_bed_cols` tries to infer those columns from the column names.

If the guess fails, or to override it, set the chromosome, start, and end columns explicitly with `set_bed_cols`:

```python
mytable.set_bed_cols(mytable.chr_001, mytable.start_002, mytable.end_003)
```

Once the bed columns are set for both tables, the tables can be intersected and subtracted:

```python
result = tableA.intersect(tableB)
result = tableA.intersect_wao(tableB)
result = tableA.subtract(tableB)
```