Commit 0db746b

[RoseTTAFold] Initial release
mmarcinkiewicz authored and shakandrew committed
1 parent 26d8955 commit 0db746b


83 files changed, with 8446 additions and 0 deletions.

73 lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
1+
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
2+
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
3+
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
4+
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
5+
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
6+
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
7+
# SOFTWARE.
8+
9+
ARG FROM_IMAGE_NAME=nvcr.io/nvidia/pytorch:21.09-py3
10+
11+
FROM ${FROM_IMAGE_NAME} AS dgl_builder
12+
ENV DEBIAN_FRONTEND=noninteractive
13+
RUN apt-get update \
14+
&& apt-get install -y git build-essential python3-dev make cmake \
15+
&& rm -rf /var/lib/apt/lists/*
16+
WORKDIR /dgl
17+
RUN git clone --branch v0.7.0 --recurse-submodules --depth 1 https://github.com/dmlc/dgl.git .
18+
RUN sed -i 's/"35 50 60 70"/"60 70 80"/g' cmake/modules/CUDA.cmake
19+
WORKDIR build
20+
RUN cmake -DUSE_CUDA=ON -DUSE_FP16=ON ..
21+
RUN make -j8
22+
23+
24+
FROM ${FROM_IMAGE_NAME}
25+
26+
# VERY IMPORTANT, DO NOT REMOVE:
27+
ENV FORCE_CUDA 1
28+
RUN pip install -v torch-geometric
29+
RUN pip install -v torch-scatter
30+
RUN pip install -v torch-sparse
31+
RUN pip install -v torch-cluster
32+
RUN pip install -v torch-spline-conv
33+
34+
35+
# copy built DGL and install it
36+
COPY --from=dgl_builder /dgl ./dgl
37+
RUN cd dgl/python && python setup.py install && cd ../.. && rm -rf dgl
38+
ENV DGLBACKEND=pytorch
39+
#RUN pip install dgl-cu111 -f https://data.dgl.ai/wheels/repo.html
40+
41+
42+
# HH-Suite
43+
RUN git clone https://github.com/soedinglab/hh-suite.git && \
44+
mkdir -p hh-suite/build
45+
WORKDIR hh-suite/build
46+
RUN cmake .. && \
47+
make && \
48+
make install
49+
50+
51+
# PSIPRED
52+
WORKDIR /workspace
53+
RUN wget http://wwwuser.gwdg.de/~compbiol/data/csblast/releases/csblast-2.2.3_linux64.tar.gz -O csblast-2.2.3.tar.gz && \
54+
mkdir -p csblast-2.2.3 && \
55+
tar xf csblast-2.2.3.tar.gz -C csblast-2.2.3 --strip-components=1 && \
56+
rm csblast-2.2.3.tar.gz
57+
58+
RUN wget https://ftp.ncbi.nlm.nih.gov/blast/executables/legacy.NOTSUPPORTED/2.2.26/blast-2.2.26-x64-linux.tar.gz && \
59+
tar xf blast-2.2.26-x64-linux.tar.gz && \
60+
rm blast-2.2.26-x64-linux.tar.gz
61+
62+
RUN wget http://bioinfadmin.cs.ucl.ac.uk/downloads/psipred/psipred.4.02.tar.gz && \
63+
tar xf psipred.4.02.tar.gz && \
64+
rm psipred.4.02.tar.gz
65+
66+
67+
ADD . /workspace/rf
68+
WORKDIR /workspace/rf
69+
70+
RUN wget https://openstructure.org/static/lddt-linux.zip -O lddt.zip && unzip -d lddt -j lddt.zip
71+
72+
RUN pip install --upgrade pip
73+
RUN pip install -r requirements.txt
21 lines changed: 21 additions & 0 deletions
1+
MIT License
2+
3+
Copyright (c) 2021 RosettaCommons
4+
5+
Permission is hereby granted, free of charge, to any person obtaining a copy
6+
of this software and associated documentation files (the "Software"), to deal
7+
in the Software without restriction, including without limitation the rights
8+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9+
copies of the Software, and to permit persons to whom the Software is
10+
furnished to do so, subject to the following conditions:
11+
12+
The above copyright notice and this permission notice shall be included in all
13+
copies or substantial portions of the Software.
14+
15+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21+
SOFTWARE.
94 lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
1+
# *RoseTTAFold*
2+
This package contains deep learning models and related scripts to run RoseTTAFold.
3+
This repository is the official implementation of RoseTTAFold: Accurate prediction of protein structures and interactions using a 3-track network.
4+
5+
## Installation
6+
7+
1. Clone the package
8+
```
9+
git clone https://github.com/RosettaCommons/RoseTTAFold.git
10+
cd RoseTTAFold
11+
```
12+
13+
2. Create the conda environments using the `RoseTTAFold-linux.yml` and `folding-linux.yml` files. The latter is required only for the PyRosetta version (run_pyrosetta_ver.sh).
14+
```
15+
# create conda environment for RoseTTAFold
16+
# If your NVIDIA driver is compatible with cuda11
17+
conda env create -f RoseTTAFold-linux.yml
18+
# If not (but compatible with cuda10)
19+
conda env create -f RoseTTAFold-linux-cu101.yml
20+
21+
# create conda environment for pyRosetta folding & running DeepAccNet
22+
conda env create -f folding-linux.yml
23+
```
24+
25+
3. Download network weights (under Rosetta-DL Software license -- please see below)
26+
While the code is licensed under the MIT License, the trained weights and data for RoseTTAFold are made available for non-commercial use only under the terms of the Rosetta-DL Software license. You can find details at https://files.ipd.uw.edu/pub/RoseTTAFold/Rosetta-DL_LICENSE.txt
27+
28+
```
29+
wget https://files.ipd.uw.edu/pub/RoseTTAFold/weights.tar.gz
30+
tar xfz weights.tar.gz
31+
```
32+
33+
4. Download and install third-party software.
34+
```
35+
./install_dependencies.sh
36+
```
37+
38+
5. Download sequence and structure databases
39+
```
40+
# uniref30 [46G]
41+
wget http://wwwuser.gwdg.de/~compbiol/uniclust/2020_06/UniRef30_2020_06_hhsuite.tar.gz
42+
mkdir -p UniRef30_2020_06
43+
tar xfz UniRef30_2020_06_hhsuite.tar.gz -C ./UniRef30_2020_06
44+
45+
# BFD [272G]
46+
wget https://bfd.mmseqs.com/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt.tar.gz
47+
mkdir -p bfd
48+
tar xfz bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt.tar.gz -C ./bfd
49+
50+
# structure templates (including *_a3m.ffdata, *_a3m.ffindex) [over 100G]
51+
wget https://files.ipd.uw.edu/pub/RoseTTAFold/pdb100_2021Mar03.tar.gz
52+
tar xfz pdb100_2021Mar03.tar.gz
53+
# for CASP14 benchmarks, we used this one: https://files.ipd.uw.edu/pub/RoseTTAFold/pdb100_2020Mar11.tar.gz
54+
```
55+
56+
6. Obtain a [PyRosetta licence](https://els2.comotion.uw.edu/product/pyrosetta) and install the package in the newly created `folding` conda environment ([link](http://www.pyrosetta.org/downloads)).
57+
58+
## Usage
59+
60+
```
61+
# For monomer structure prediction
62+
cd example
63+
../run_[pyrosetta, e2e]_ver.sh input.fa .
64+
65+
# For complex modeling
66+
# please see README file under example/complex_modeling/README for details.
67+
python network/predict_complex.py -i paired.a3m -o complex -Ls 218 310
68+
```
69+
70+
## Expected outputs
71+
For the PyRosetta version, you will get five final models with the estimated CA RMS error in the B-factor column (model/model_[1-5].crderr.pdb).
72+
For the end-to-end version, there will be a single PDB output with the estimated residue-wise CA-lDDT in the B-factor column (t000_.e2e.pdb).
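
Since both pipelines store their per-residue estimate in the B-factor column, a short script can pull those values back out. The following is a minimal sketch, assuming the outputs follow the standard fixed-width PDB `ATOM` layout; the function name `ca_bfactors` and the sample records are hypothetical and are not part of this repository.

```python
def ca_bfactors(pdb_text):
    """Map residue number -> B-factor value of that residue's CA atom.

    Assumes standard fixed-width PDB ATOM records: atom name in columns
    13-16, residue number in columns 23-26, B-factor in columns 61-66.
    """
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores


# Synthetic two-residue example (not real RoseTTAFold output), built with a
# fixed-width format string so the column slices above line up.
_ATOM = "ATOM  %5d %-4s %3s A%4d    %8.3f%8.3f%8.3f%6.2f%6.2f"
SAMPLE_PDB = "\n".join([
    _ATOM % (1, " N", "MET", 1, 0.0, 0.0, 0.0, 1.0, 0.85),
    _ATOM % (2, " CA", "MET", 1, 1.0, 0.0, 0.0, 1.0, 0.85),
    _ATOM % (3, " CA", "LYS", 2, 2.0, 0.0, 0.0, 1.0, 0.42),
])

if __name__ == "__main__":
    for res_num, score in sorted(ca_bfactors(SAMPLE_PDB).items()):
        print(res_num, score)
```

How to read the values depends on the pipeline: for the PyRosetta version they are estimated CA RMS errors, and for the end-to-end version they are estimated CA-lDDT scores.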
73+
74+
## FAQ
75+
1. Segmentation fault while running hhblits/hhsearch
76+
For easy installation, we used a statically compiled version of hhsuite (installed through conda). We are not yet sure what exactly causes the segmentation fault in some cases, but it may be resolved by compiling hhsuite from source and using that build instead of the conda version. For installation of hhsuite, please see [here](https://github.com/soedinglab/hh-suite).
77+
78+
2. Submitting jobs to computing nodes
79+
The modeling pipeline provided here (run_pyrosetta_ver.sh/run_e2e_ver.sh) is a guideline showing how RoseTTAFold works. For more efficient use of computing resources, you might want to modify the provided bash scripts to submit separate jobs with proper dependencies for each step (more CPUs/memory for hhblits/hhsearch, GPUs only for running the networks, etc.).
80+
81+
## Links:
82+
83+
* [Robetta server](https://robetta.bakerlab.org/) (RoseTTAFold option)
84+
* [RoseTTAFold models for CASP14 targets](https://files.ipd.uw.edu/pub/RoseTTAFold/casp14_models.tar.gz) [input MSA and hhsearch files are included]
85+
86+
## Credit to performer-pytorch and SE(3)-Transformer codes
87+
The code in network/performer_pytorch.py is strongly based on [this repo](https://github.com/lucidrains/performer-pytorch), which is a PyTorch implementation of the [Performer architecture](https://arxiv.org/abs/2009.14794).
88+
The code in network/equivariant_attention is from the original SE(3)-Transformer [repo](https://github.com/FabianFuchsML/se3-transformer-public), which accompanies [the paper](https://arxiv.org/abs/2006.10503) 'SE(3)-Transformers: 3D Roto-Translation Equivariant Attention Networks' by Fuchs et al.
89+
90+
91+
## References
92+
93+
M Baek, et al., Accurate prediction of protein structures and interactions using a 3-track network, bioRxiv (2021). [link](https://www.biorxiv.org/content/10.1101/2021.06.14.448402v1)
94+
138 lines changed: 138 additions & 0 deletions
@@ -0,0 +1,138 @@
1+
# RoseTTAFold for PyTorch
2+
3+
This repository provides a script to run inference using the RoseTTAFold model. The content of this repository is tested and maintained by NVIDIA.
4+
5+
## Table Of Contents
6+
7+
- [Model overview](#model-overview)
8+
* [Model architecture](#model-architecture)
9+
- [Setup](#setup)
10+
* [Requirements](#requirements)
11+
- [Quick Start Guide](#quick-start-guide)
12+
- [Release notes](#release-notes)
13+
* [Changelog](#changelog)
14+
* [Known issues](#known-issues)
15+
16+
17+
18+
## Model overview
19+
20+
RoseTTAFold is a model designed to predict accurate protein structures from amino acid sequences. This model is
21+
based on [Accurate prediction of protein structures and interactions using a 3-track network](https://www.biorxiv.org/content/10.1101/2021.06.14.448402v1) by Minkyung Baek et al.
22+
23+
This implementation is a dockerized version of the official [RoseTTAFold repository](https://github.com/RosettaCommons/RoseTTAFold/).
24+
Here you can find the [original RoseTTAFold guide](README-ROSETTAFOLD.md).
25+
26+
### Model architecture
27+
28+
The RoseTTAFold model is based on a 3-track architecture fusing 1D, 2D, and 3D information about the protein structure.
29+
All information is exchanged between tracks to learn the sequence and coordinate patterns at the same time. The final prediction
30+
is refined using an SE(3)-Transformer.
31+
32+
<img src="images/NetworkArchitecture.jpg" width="900"/>
33+
34+
*Figure 1: The RoseTTAFold architecture. Image comes from the [original paper](https://www.biorxiv.org/content/10.1101/2021.06.14.448402v1).*
35+
36+
## Setup
37+
38+
The following section lists the requirements that you need to meet in order to run inference using the RoseTTAFold model.
39+
40+
### Requirements
41+
42+
This repository contains a Dockerfile that extends the PyTorch NGC container and encapsulates necessary dependencies. Aside from these dependencies, ensure you have the following components:
43+
- [NVIDIA Docker](https://github.com/NVIDIA/nvidia-docker)
44+
- PyTorch 21.09-py3 NGC container
45+
- Supported GPUs:
46+
- [NVIDIA Volta architecture](https://www.nvidia.com/en-us/data-center/volta-gpu-architecture/)
47+
- [NVIDIA Turing architecture](https://www.nvidia.com/en-us/design-visualization/technologies/turing-architecture/)
48+
- [NVIDIA Ampere architecture](https://www.nvidia.com/en-us/data-center/nvidia-ampere-gpu-architecture/)
49+
50+
For more information about how to get started with NGC containers, refer to the following sections from the NVIDIA GPU Cloud Documentation and the Deep Learning Documentation:
51+
- [Getting Started Using NVIDIA GPU Cloud](https://docs.nvidia.com/ngc/ngc-getting-started-guide/index.html)
52+
- [Accessing And Pulling From The NGC Container Registry](https://docs.nvidia.com/deeplearning/frameworks/user-guide/index.html#accessing_registry)
53+
- [Running PyTorch](https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/running.html#running)
54+
55+
For those unable to use the PyTorch NGC container, to set up the required environment or create your own container, refer to the versioned [NVIDIA Container Support Matrix](https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html).
56+
57+
In addition, 1 TB of disk space is required to unpack the required databases.
58+
59+
## Quick Start Guide
60+
61+
To run inference using the RoseTTAFold model, perform the following steps using the default parameters.
62+
63+
1. Clone the repository.
64+
```
65+
git clone https://github.com/NVIDIA/DeepLearningExamples
66+
cd DeepLearningExamples/DGLPyTorch/
67+
```
68+
69+
2. Download the pre-trained weights and databases needed for inference.
70+
The following command downloads the pre-trained weights and the two databases needed to create the derived features that serve as input to the model.
71+
The script will download the `UniRef30` (~50 GB) and `pdb100_2021Mar03` (~115 GB) databases, which might take a considerable amount
72+
of time. Additionally, unpacking those databases requires approximately 1 TB of free disk space.
73+
74+
By default, the data will be downloaded to `./weights` and `./databases` folders in the current directory.
75+
```
76+
bash scripts/download_databases.sh
77+
```
78+
If you would like to specify the download location, you can pass the following parameters:
79+
```
80+
bash scripts/download_databases.sh PATH-TO-WEIGHTS PATH-TO-DATABASES
81+
```
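
Because unpacking needs roughly 1 TB, it may be worth checking free space before starting the download. The following is a minimal sketch using only the Python standard library; the helper `has_enough_space` and the `target` path are hypothetical, and the 1 TB threshold is the approximate figure quoted above rather than an exact requirement.

```python
import shutil

# Approximate figure from the text above; treat it as an estimate.
REQUIRED_BYTES = 10**12  # ~1 TB


def has_enough_space(path=".", required_bytes=REQUIRED_BYTES):
    """Return True if the filesystem holding `path` has enough free space."""
    return shutil.disk_usage(path).free >= required_bytes


if __name__ == "__main__":
    target = "."  # hypothetical: replace with your PATH-TO-DATABASES
    free_gb = shutil.disk_usage(target).free / 1e9
    print(f"free space at {target!r}: {free_gb:.0f} GB")
    if not has_enough_space(target):
        print("warning: less than ~1 TB free; unpacking the databases may fail")
```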
82+
83+
3. Build the RoseTTAFold PyTorch NGC container. This step builds the PyTorch dependencies on your machine and can take between 30 minutes and 1 hour to complete.
84+
```
85+
docker build -t rosettafold .
86+
```
87+
88+
4. Start an interactive session in the NGC container to run inference.
89+
90+
The following command launches the container and mounts the `PATH-TO-WEIGHTS` directory as a volume at `/weights` in the container, the `PATH-TO-DATABASES` directory at `/databases`, and the `./results` directory at `/results`.
91+
```
92+
mkdir data results
93+
docker run --ipc=host -it --rm --runtime=nvidia -p6006:6006 -v PATH-TO-WEIGHTS:/weights -v PATH-TO-DATABASES:/databases -v ${PWD}/results:/results rosettafold:latest /bin/bash
94+
```
95+
96+
5. Start inference/predictions.
97+
98+
To run inference, prepare a FASTA file and pass its path, or pass a sequence directly.
99+
```
100+
python run_inference_pipeline.py [Sequence]
101+
```
102+
There is an example FASTA file at `example/input.fa` for you to try. Running the inference pipeline consists of four steps:
103+
1. Preparing the Multiple Sequence Alignments (MSAs)
104+
2. Preparing the secondary structures
105+
3. Preparing the templates
106+
4. Iteratively refining the prediction
107+
108+
The first three steps can take between a couple of minutes and an hour, depending on the sequence.
109+
The output will be stored in the `/results` directory as an `output.e2e.pdb` file.
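
If you are starting from a raw sequence, the input FASTA file can be produced with a few lines of Python. The following is a minimal sketch; the helper `to_fasta` and the example sequence are hypothetical, and restricting input to the 20 standard one-letter residue codes is an assumption made here, not a documented requirement of the pipeline.

```python
# Assumption: only the 20 standard amino acid codes are accepted.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def to_fasta(name, sequence, width=60):
    """Return a FASTA record for `sequence`, wrapped at `width` characters."""
    seq = "".join(sequence.split()).upper()  # drop whitespace, normalize case
    if not seq:
        raise ValueError("empty sequence")
    unknown = sorted(set(seq) - VALID_RESIDUES)
    if unknown:
        raise ValueError("unexpected residue codes: " + "".join(unknown))
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up sequence for illustration only.
    print(to_fasta("query", "mkvlaagivg ladeqrtpse"), end="")
```

The returned string can be written to a file and its path passed to `run_inference_pipeline.py` in place of `[Sequence]`.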
110+
111+
6. Start Jupyter Notebook to run inference interactively.
112+
113+
To launch the application, copy the Notebook to the root folder.
114+
```
115+
cp notebooks/run_inference.ipynb .
116+
117+
```
118+
To start Jupyter Notebook, run:
119+
```
120+
jupyter notebook run_inference.ipynb
121+
```
122+
123+
For more information about Jupyter Notebook, refer to the Jupyter Notebook documentation.
124+
125+
126+
## Release notes
127+
128+
### Changelog
129+
130+
October 2021
131+
- Initial release
132+
133+
### Known issues
134+
135+
There are no known issues with this model.
136+
137+
138+
