caching.md

Local PDB Installations

BioJava can automatically download and install most of the data files that it needs. Those downloads will happen only once. Future requests for the data file will re-use the local copy.

The main class that provides this functionality is the AtomCache.

It is used internally by the StructureIO class, which we already encountered earlier.

	Structure structure = StructureIO.getStructure("4hhb");

is the same as

	AtomCache cache = new AtomCache();
	Structure structure = cache.getStructure("4hhb");
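To make the download-once behaviour described at the start of this chapter concrete, here is a minimal sketch in plain Java. It is not BioJava's AtomCache code. The class DownloadOnceCache and its fetcher function are hypothetical stand-ins, and no network access happens. The sketch only shows the idea of checking for a local file first and downloading only on a cache miss.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.function.Function;

/**
 * Hypothetical sketch of "download once, reuse the local copy".
 * This is NOT BioJava's AtomCache; the fetcher stands in for a real download.
 */
public class DownloadOnceCache {
    private final Path cacheDir;
    private final Function<String, String> fetcher; // hypothetical remote fetch
    private int downloads = 0;

    public DownloadOnceCache(Path cacheDir, Function<String, String> fetcher) {
        this.cacheDir = cacheDir;
        this.fetcher = fetcher;
    }

    /** Returns the file content for an ID, downloading only on a cache miss. */
    public String get(String pdbId) throws IOException {
        // PDB IDs are case-insensitive, so normalise them before building the file name
        Path local = cacheDir.resolve(pdbId.toLowerCase(Locale.ROOT) + ".pdb");
        if (Files.exists(local)) {
            return Files.readString(local, StandardCharsets.UTF_8); // cache hit
        }
        String content = fetcher.apply(pdbId); // cache miss: "download"
        downloads++;
        Files.createDirectories(cacheDir);
        Files.writeString(local, content, StandardCharsets.UTF_8);
        return content;
    }

    public int downloadCount() {
        return downloads;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("pdb-cache-demo");
        DownloadOnceCache cache = new DownloadOnceCache(dir, id -> "HEADER fake entry " + id);
        cache.get("4hhb");
        cache.get("4HHB"); // same entry, served from disk this time
        System.out.println("downloads: " + cache.downloadCount());
    }
}
```

Running main prints "downloads: 1", because the second lookup maps to the same lower-case file name and is read from disk instead of being fetched again.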

Where Are the Files Written to?

By default the AtomCache writes all files to a temporary location (the system temp directory given by the "java.io.tmpdir" property).

If you already have a local PDB installation, or want to store the files in a more permanent location, you can configure the AtomCache by setting the PDB_DIR system property:

    -DPDB_DIR=/wherever/you/want/

BioJava will also check for a PDB_DIR environment variable. If you launch BioJava from the command line, it can be useful to add export PDB_DIR=/wherever/you/want to your .bashrc file.

An alternative is to hard-code the path as follows (although setting it as a property is better style):

	AtomCache cache = new AtomCache();

	cache.setPath("/path/to/pdb/files/");
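The location lookup described in this section can be sketched as follows, assuming the order: system property, then environment variable, then java.io.tmpdir. The class PdbDirResolver is hypothetical and the precedence order is an assumption made for illustration; BioJava's own configuration classes are the authority on the exact rules. An explicit setPath call, as shown above, is not modelled here.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Properties;

/**
 * Hypothetical sketch of resolving a directory setting such as PDB_DIR.
 * Assumed precedence: system property, then environment variable, then temp dir.
 */
public class PdbDirResolver {

    public static Path resolve(String key, Properties sysProps, Map<String, String> env) {
        // 1. -DKEY=... on the java command line
        String fromProperty = sysProps.getProperty(key);
        if (fromProperty != null && !fromProperty.isEmpty()) {
            return Paths.get(fromProperty);
        }
        // 2. export KEY=... in the shell environment
        String fromEnv = env.get(key);
        if (fromEnv != null && !fromEnv.isEmpty()) {
            return Paths.get(fromEnv);
        }
        // 3. Fall back to the JVM temp directory
        return Paths.get(sysProps.getProperty("java.io.tmpdir", System.getProperty("java.io.tmpdir")));
    }

    public static void main(String[] args) {
        Path pdbDir = resolve("PDB_DIR", System.getProperties(), System.getenv());
        System.out.println("PDB files go to: " + pdbDir);
    }
}
```

Passing the properties and the environment in as arguments, rather than reading them inside the method, keeps the lookup easy to test. Because the key is a parameter, the same helper would also fit a similarly configured setting such as PDB_CACHE_DIR.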

File Parsing Parameters

The AtomCache also gives access to various options that control how files are parsed. The FileParsingParameters class is the main place to influence the level of detail, and consequently the speed, with which files are loaded.

This example turns on the loading of chemical component information when loading a Structure (see also the next chapter).

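	import org.biojava.nbio.structure.Structure;
	import org.biojava.nbio.structure.StructureIO;
	import org.biojava.nbio.structure.align.util.AtomCache;
	import org.biojava.nbio.structure.io.FileParsingParameters;

	AtomCache cache = new AtomCache();
	cache.setPath("/tmp/");

	FileParsingParameters params = cache.getFileParsingParams();
	// load chemical component definitions while parsing (BioJava 4.x API)
	params.setLoadChemCompInfo(true);

	// from now on, StructureIO uses this configured cache
	StructureIO.setAtomCache(cache);

	Structure structure = StructureIO.getStructure("4hhb");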

Caching of Other Data: SCOP, CATH

The AtomCache not only provides access to PDB entries; it can also fetch Structure representations of protein domains, as defined by SCOP and CATH or as computed by the Protein Domain Parser (PDP) and Domain Parser (DP) algorithms.

	// uses a SCOP domain definition
	Structure domain1 = StructureIO.getStructure("d4hhba_");
	
	// Get a specific protein chain, note: chain IDs are case sensitive, PDB IDs are not.
	Structure chain1 = StructureIO.getStructure("4HHB.A");
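The identifier styles used above can be told apart with a few simple rules. The sketch below is hypothetical and much simpler than what AtomCache actually accepts. It only illustrates the points made in the example: a SCOP domain ID has the form "d" + PDB ID + chain character + domain character (as in d4hhba_), PDB IDs are normalised to upper case, and chain IDs are kept exactly as written because they are case sensitive.

```java
import java.util.Locale;

/**
 * Hypothetical, simplified classifier for the identifier styles shown above.
 * AtomCache supports many more formats; these rules are illustrative only.
 */
public final class StructureIdSketch {

    public enum Kind { PDB_ENTRY, PDB_CHAIN, SCOP_DOMAIN, UNKNOWN }

    public static final class Parsed {
        public final Kind kind;
        public final String pdbId;   // normalised to upper case, or null
        public final String chainId; // kept exactly as given, or null

        Parsed(Kind kind, String pdbId, String chainId) {
            this.kind = kind;
            this.pdbId = pdbId;
            this.chainId = chainId;
        }
    }

    // Classic PDB IDs: four alphanumeric characters, the first a digit 1-9
    private static boolean looksLikePdbId(String s) {
        return s.length() == 4
                && s.charAt(0) >= '1' && s.charAt(0) <= '9'
                && s.chars().allMatch(Character::isLetterOrDigit);
    }

    public static Parsed parse(String id) {
        // SCOP domain, e.g. d4hhba_ : 'd' + PDB ID + chain char + domain char
        if (id.length() == 7 && id.charAt(0) == 'd' && looksLikePdbId(id.substring(1, 5))) {
            return new Parsed(Kind.SCOP_DOMAIN, id.substring(1, 5).toUpperCase(Locale.ROOT), null);
        }
        // Single chain, e.g. 4HHB.A : the chain part is NOT case-normalised
        if (id.indexOf('.') == 4 && id.length() > 5 && looksLikePdbId(id.substring(0, 4))) {
            return new Parsed(Kind.PDB_CHAIN, id.substring(0, 4).toUpperCase(Locale.ROOT), id.substring(5));
        }
        // Whole entry, e.g. 4hhb
        if (looksLikePdbId(id)) {
            return new Parsed(Kind.PDB_ENTRY, id.toUpperCase(Locale.ROOT), null);
        }
        return new Parsed(Kind.UNKNOWN, null, null);
    }

    public static void main(String[] args) {
        for (String id : new String[] {"4hhb", "4HHB.A", "4hhb.a", "d4hhba_"}) {
            Parsed p = parse(id);
            System.out.println(id + " -> " + p.kind + " " + p.pdbId + " " + p.chainId);
        }
    }
}
```

Note that "4HHB.A" and "4hhb.a" resolve to the same PDB ID but to different chain IDs, which mirrors the case-sensitivity rule in the comment above.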

Many other external identifier types are supported as well. See the AtomCache documentation for details on the supported options.

The non-PDB files can be cached at a different location by setting the PDB_CACHE_DIR property (with java -DPDB_CACHE_DIR=...) or environment variable.

Navigation: Home | Book 3: The Structure Modules | Chapter 4 : Local Installations

Prev: Chapter 3 : Structure Data Model

Next: Chapter 5 : Chemical Component Dictionary
