6. Linking AlphaFold setup with CCP4 Cloud
6.1. Structure prediction in CCP4 Cloud
In CCP4 Cloud, working with the AlphaFold Database (AFDB) is fully streamlined and requires no additional configuration. Individual AFDB entries may be fetched by UniProt ID in the same way as PDB entries are obtained by PDB ID. All automatic pipelines that download structural homologs from the PDB also acquire them from AFDB, transparently to the user. If no AFDB entry exists for a particular UniProt ID, or if sequence data cannot be sent to remote servers, the structure may be predicted locally using the Structure Prediction task in CCP4 Cloud.
6.2. Linking AlphaFold setup in outline
For the Structure Prediction task to work, a local AlphaFold (or OpenFold, or ColabFold) setup must be available on the CCP4 Cloud number cruncher(s) and linked with them. The following 4-step procedure links an AlphaFold setup with CCP4 Cloud.
1. Have one of *AlphaFold*, *OpenFold* or *ColabFold* installed
For AlphaFold, see Appendix A.
For ColabFold, see Appendix B.
For OpenFold, see Appendix C.
2. Prepare a JSON-formatted configuration file
See Appendix D.
3. Modify CCP4 Cloud start scripts
In the start script for the Front-End server (usually named start-fe.sh), put the following line somewhere before calling node (e.g., after the last export statement in the script):
export ALPHAFOLD_CFG=1
In the start script(s) for Number-Cruncher servers (usually named start-nc.sh), put the following line after the last source statement in the script:
source /path/to/miniconda3/bin/activate alphafold
where /path/to depends on the details of your AlphaFold / OpenFold / ColabFold setup. In addition, put the following line after the last export statement in the script:
export ALPHAFOLD_CFG=/path/to/af2.conf
where /path/to depends on where the JSON-formatted AlphaFold configuration file was placed.
4. Restart CCP4 Cloud servers.
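Before restarting, it may help to confirm on each Number-Cruncher that ALPHAFOLD_CFG points to a readable file. The following is a minimal sketch and not part of CCP4 Cloud; the function name check_alphafold_cfg is hypothetical. It does not apply to the Front-End server, where the variable is simply set to 1.

```shell
#!/bin/sh
# Hypothetical helper (not part of CCP4 Cloud): verify that ALPHAFOLD_CFG
# names a readable file, as expected on Number-Cruncher servers.
check_alphafold_cfg() {
    if [ -z "${ALPHAFOLD_CFG:-}" ]; then
        echo "ALPHAFOLD_CFG is not set" >&2
        return 1
    fi
    if [ ! -r "$ALPHAFOLD_CFG" ]; then
        echo "Cannot read configuration file: $ALPHAFOLD_CFG" >&2
        return 1
    fi
    echo "Using AlphaFold configuration: $ALPHAFOLD_CFG"
}
```

If the function is defined in start-nc.sh, a line such as `check_alphafold_cfg || exit 1` placed after the export statement would stop the server from starting with a missing or unreadable file.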
The End
6.3. Appendix A. AlphaFold setup
A.1. Install miniconda3
First, install miniconda3. Download the installation script for your platform (Linux or macOS) and run it. For example, on Linux:
sh ./Miniconda3-latest-Linux-x86_64.sh
Follow the instructions that appear on the terminal and install miniconda3 under some directory convenient for you. Close and re-open your current shell. This will activate conda’s base environment.
Now create an environment for AlphaFold:
conda create -n alphafold python=3
Activate the environment:
conda activate alphafold
A.2. Install af2wrapper
Clone the Git repository:
git clone --recursive https://gitlab.com/gchojnowski/af2wrapper.git
Set it up in the ‘alphafold’ conda environment that was activated above.
cd af2wrapper
python setup.py install
Install matplotlib:
conda install -c conda-forge matplotlib
A.3. Install AlphaFold
Download and unpack the AlphaFold GitHub repository. Follow the setup instructions in the README file for GPU support (recommended), databases and Docker (recommended). Install the Python requirements in the same ‘alphafold’ conda environment where af2wrapper is run.
The full databases take about 2.4 TB disk space, whereas the reduced databases (with the small_bfd database) take about 600 GB.
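Since the databases are large, it can be worth checking free disk space before starting the download. The following is a minimal sketch, assuming a POSIX shell with df and awk; the helper name has_free_gb is hypothetical, and the sizes to pass are the approximate figures quoted above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the filesystem holding the existing
# directory $1 has at least $2 GB available.
# Uses POSIX `df -P -k`, which reports sizes in 1024-byte blocks.
has_free_gb() {
    dir=$1
    need_gb=$2
    # Second line of `df -P` output, fourth column: available 1K blocks.
    avail_kb=$(df -P -k "$dir" | awk 'NR == 2 { print $4 }')
    # Return 2 if df produced no usable figure (e.g. directory not found).
    [ -n "$avail_kb" ] || return 2
    # 1 GB is counted as 1024 * 1024 KB, which errs slightly on the safe side.
    need_kb=$((need_gb * 1024 * 1024))
    [ "$avail_kb" -ge "$need_kb" ]
}
```

For example, `has_free_gb /data/alphafold/db 2400` would test for the full databases and `has_free_gb /data/alphafold/db 600` for the reduced set. The directory is only an illustration and must already exist.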
AlphaFold can optionally be run without Docker, using the run_alphafold.py script. To use this script, install the Python dependencies in the requirements.txt file (the one in the parent installation directory).
pip install -r requirements.txt
You will also need to install kalign, the HH-suite and jackhmmer. If you use Docker, these needn’t be installed separately.
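When running without Docker, a quick way to see which of these tools are still missing from PATH is sketched below. The helper name report_missing_tools is hypothetical, and the executable names in the usage note (kalign, hhblits, hhsearch, jackhmmer) are assumed to be the defaults installed by the respective packages.

```shell
#!/bin/sh
# Hypothetical helper: print every command from the argument list that
# cannot be found on PATH; return non-zero if any is missing.
report_missing_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

A typical call, made inside the activated ‘alphafold’ environment, would be `report_missing_tools kalign hhblits hhsearch jackhmmer`.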
6.4. Appendix B. ColabFold setup
First install miniconda3 and af2wrapper as instructed for the AlphaFold setup in sections A.1 and A.2.
The full instructions for running ColabFold on Linux or macOS are available in the LocalColabFold repository. On Linux (or WSL), install the NVIDIA CUDA Toolkit first if using a GPU (recommended). On macOS, the CUDA driver is currently not available, so ColabFold can only be run on the CPU.
Then run the installation script. Set up the PATH variable to include the bin directory under the colabfold_batch directory that was just installed. For example:
export PATH="/home/ccp4/Desktop/colabfold_batch/bin:$PATH"
ColabFold doesn’t require installation of local databases, as it obtains the MSA and template files from the MMseqs2 server.
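If the PATH setting shown above is placed in a script that may be sourced more than once, the directory can be prepended only when it is not already present. This is a minimal sketch; the function name prepend_to_path is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prepend directory $1 to PATH unless it is already
# there, so that sourcing a script repeatedly does not grow PATH.
prepend_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, nothing to do
        *) PATH="$1:$PATH" ;;     # otherwise put it first
    esac
    export PATH
}
```

With the example location used above, the call would be `prepend_to_path "/home/ccp4/Desktop/colabfold_batch/bin"`.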
6.5. Appendix C. OpenFold setup
6.6. Appendix D. JSON-formatted configuration file
Create a file, e.g., af2.conf, with the following content (use copy-paste):
{
"engine" : "alphafold",
"path_to_run_alphafold_py" : "/home/ccp4/alphafold/run_alphafold.py",
"path_to_run_docker" : "/home/ccp4/alphafold/docker/run_docker.py",
"path_to_colabfold_batch" : "colabfold_batch",
"path_to_run_openfold" : "/home/ccp4/openfold/run_openfold.sh",
"max_template_date" : "2020-05-14",
"data_dir" : "/data/alphafold/db",
"db_preset" : "full_dbs",
"pdb70_database_path" : "/data/alphafold/db/pdb70/pdb70",
"uniref90_database_path" : "/data/alphafold/db/uniref90/uniref90.fasta",
"mgnify_database_path" : "/data/alphafold/db/mgnify/mgy_clusters_2018_12.fa",
"template_mmcif_dir" : "/data/alphafold/db/pdb_mmcif/mmcif_files/",
"obsolete_pdbs_path" : "/data/alphafold/db/pdb_mmcif/obsolete.dat",
"bfd_database_path" : "/data/alphafold/db/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt",
"uniclust30_database_path" : "/data/alphafold/db/uniclust30/uniclust30_2018_08/uniclust30_2018_08",
"pdb_seqres_database_path" : "/data/alphafold/db/pdb_seqres/pdb_seqres.txt",
"uniprot_database_path" : "/data/alphafold/db/uniprot/uniprot.fasta",
"use_gpu_relax" : "1",
"colabfold_use_cpu" : "0",
"openfold_device" : "cuda:0",
"hhblits_binary_path" : "hhblits",
"hhsearch_binary_path" : "hhsearch",
"hmmbuild_binary_path" : "hmmbuild",
"hmmsearch_binary_path" : "hmmsearch",
"jackhmmer_binary_path" : "jackhmmer",
"kalign_binary_path" : "kalign",
"run_this_cmd_before_af2" : "",
"run_this_cmd_before_cf" : "",
"run_this_cmd_before_of" : ""
}
Then set each parameter to the value that corresponds to your installation, as described below. Full paths to executables are not required if their directories are included in the PATH variable.
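Once the file has been edited, a quick look at individual values can catch typos before the servers are restarted. The sketch below is not a general JSON parser: it assumes the flat, one-key-per-line layout shown above with no escaped quotes inside values. The function names cfg_value and check_engine are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the string value of key $2 from the flat,
# one-key-per-line JSON file $1. Assumes no escaped quotes in values.
cfg_value() {
    sed -n "s/^[[:space:]]*\"$2\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p" "$1"
}

# Succeed only if "engine" is one of the three supported implementations.
check_engine() {
    case "$(cfg_value "$1" engine)" in
        alphafold|colabfold|openfold) return 0 ;;
        *) echo "unsupported or missing engine in $1" >&2; return 1 ;;
    esac
}
```

For instance, `cfg_value /path/to/af2.conf data_dir` would print the configured database directory, and `check_engine /path/to/af2.conf` would fail if the engine name is misspelled.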
- engine
AlphaFold implementation, one of alphafold, colabfold or openfold. Example:
"engine" : "alphafold"
- path_to_run_alphafold_py
Path to the script to run AlphaFold without Docker. Example:
"path_to_run_alphafold_py" : "/home/ccp4/alphafold/run_alphafold.py"
- path_to_run_docker
Path to the script to run AlphaFold in a Docker container. Example:
"path_to_run_docker" : "/home/ccp4/alphafold/docker/run_docker.py"
- path_to_colabfold_batch
(ColabFold implementation only) Path to the script to run the local ColabFold. Example:
"path_to_colabfold_batch" : "/home/ccp4/colabfold_batch/bin/colabfold_batch"
- path_to_run_openfold
(OpenFold implementation only) Path to the wrapper script to run OpenFold. Example:
"path_to_run_openfold" : "/home/ccp4/openfold/run_openfold.sh"
- max_template_date
Restricts templates to structures that were available in the PDB before this date. This is useful if you are predicting the structure of a protein that is already in the PDB and want to prevent the deposited structure from being used as a template. Example:
"max_template_date" : "2020-05-14"
- data_dir
Directory where the protein databases are stored. Example:
"data_dir" : "/data/alphafold/db"
- db_preset
(AlphaFold implementation only) Whether to use the full or reduced BFD databases; AlphaFold's own db_preset flag accepts full_dbs or reduced_dbs. Example:
"db_preset" : "full_dbs"
- pdb70_database_path
(AlphaFold implementation only) Path to the PDB70 database for use with HHsearch. Example:
"pdb70_database_path" : "/data/alphafold/db/pdb70/pdb70"
- uniref90_database_path
(AlphaFold implementation only) Path to the UniRef90 database for use with JackHMMER. Example:
"uniref90_database_path" : "/data/alphafold/db/uniref90/uniref90.fasta"
- mgnify_database_path
(AlphaFold implementation only) Path to the MGnify database for use with JackHMMER. Example:
"mgnify_database_path" : "/data/alphafold/db/mgnify/mgy_clusters_2018_12.fa"
- template_mmcif_dir
(AlphaFold implementation only) Path to a directory with template mmCIF structures, each named <pdb_id>.cif. Example:
"template_mmcif_dir" : "/data/alphafold/db/pdb_mmcif/mmcif_files/"
- obsolete_pdbs_path
(AlphaFold implementation only) Path to the file containing a mapping from obsolete PDB IDs to the PDB IDs of their replacements. Example:
"obsolete_pdbs_path" : "/data/alphafold/db/pdb_mmcif/obsolete.dat"
- bfd_database_path
(AlphaFold implementation only) Path to the BFD database for use by HHblits. Example:
"bfd_database_path" : "/data/alphafold/db/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt"
- uniclust30_database_path
(AlphaFold implementation only) Path to the Uniclust30 database for use by HHblits. Example:
"uniclust30_database_path" : "/data/alphafold/db/uniclust30/uniclust30_2018_08/uniclust30_2018_08"
- pdb_seqres_database_path
(AlphaFold implementation only) Path to the PDB seqres database for use by hmmsearch. Example:
"pdb_seqres_database_path" : "/data/alphafold/db/pdb_seqres/pdb_seqres.txt"
- uniprot_database_path
(AlphaFold implementation only) Path to the UniProt database for use by JackHMMER. Example:
"uniprot_database_path" : "/data/alphafold/db/uniprot/uniprot.fasta"
- use_gpu_relax
(AlphaFold implementation only) Whether to enable NVIDIA runtime to run with GPUs. NB: This option affects both the inference and the (optional) relaxation step. Example:
"use_gpu_relax" : "false"
- colabfold_use_cpu
(ColabFold implementation only) If equal to 1, run the inference on the CPU instead of the GPU. Example:
"colabfold_use_cpu" : "0"
- openfold_device
(OpenFold implementation only) Device option for PyTorch. If equal to ‘cpu’, use the CPU for the inference and relaxation. To use the GPU, the option should include the device number. Example:
"openfold_device" : "cuda:0"
- run_this_cmd_before_af2
(AlphaFold implementation only) Any shell command to run before AlphaFold2. Example:
"run_this_cmd_before_af2" : ""
- run_this_cmd_before_cf
(ColabFold implementation only) Any shell command to run before ColabFold. Example:
"run_this_cmd_before_cf" : ""
- run_this_cmd_before_of
(OpenFold implementation only) Any shell command to run before OpenFold. Example:
"run_this_cmd_before_of" : ""