concavity - predictor of protein ligand binding sites from structure and conservation
concavity [options] PDBFILE OUTPUT_NAME
ConCavity predicts protein ligand binding sites by combining evolutionary sequence conservation and 3D structure.
ConCavity takes as input a protein structure in PDB format (PDBFILE) and, optionally, files that characterize the evolutionary sequence conservation of the chains in the structure file.
The following result files are produced by default:
- Residue ligand binding predictions for each chain (*.scores).
- Residue ligand binding predictions in a PDB format file (residue scores placed in the temperature factor field, *_residue.pdb).
- Pocket prediction locations in a DX format file (*.dx).
- PyMOL script to visualize the predictions (*.pml).
To visualize the predictions in PyMOL (if it is installed on your system), load the script by typing "pymol 1G6C_test1.pml" at the prompt or by loading it through the PyMOL interface.
The PDB and DX files can be input into other molecular viewers if preferred. Several additional output formats are available; see below. Note that the residue numbering in the .scores files may not match that of the PDB file.
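Because the *_residue.pdb file stores each residue's score in the temperature factor field, the scores can be read back with a few lines of code. The following is a minimal Python sketch, assuming the standard fixed-column PDB layout (temperature factor in columns 61-66) and assuming every atom of a residue carries the same score. The function names residue_scores and atom_line are hypothetical, the sample records are made up, and insertion codes are ignored.

```python
def residue_scores(pdb_lines):
    """Map (chain, residue number) -> value found in the temperature factor field."""
    scores = {}
    for line in pdb_lines:
        if not line.startswith("ATOM  "):
            continue  # skip HETATM, TER, REMARK and other records
        chain = line[21]                  # column 22: chain identifier
        res_seq = int(line[22:26])        # columns 23-26: residue sequence number
        temp_factor = float(line[60:66])  # columns 61-66: temperature factor
        # Assumption: all atoms of a residue share one score, so keep the first.
        scores.setdefault((chain, res_seq), temp_factor)
    return scores


def atom_line(serial, atom, res_name, chain, res_seq, score):
    """Build a fixed-width ATOM record for the demo (coordinates set to zero)."""
    return (f"ATOM  {serial:5d} {atom:<4s} {res_name:>3s} {chain}{res_seq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{score:6.2f}")


# Made-up sample records; real input would be the lines of a *_residue.pdb file.
sample = [
    atom_line(1, " N", "MET", "A", 1, 0.12),
    atom_line(2, " CA", "MET", "A", 1, 0.12),
    atom_line(3, " N", "LYS", "A", 2, 0.87),
]
scores = residue_scores(sample)
```

With a real file, the same function could be called as residue_scores(open(path)), where path points at the *_residue.pdb output.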
The ConCavity approach proceeds in three conceptual steps: grid creation, pocket extraction, and residue mapping (see Methods in paper). First, the structural and evolutionary properties of the protein are used to create a regular 3D grid surrounding the protein in which the score associated with each grid point represents an estimated likelihood that it overlaps a bound ligand atom. Second, groups of contiguous, high-scoring grid points are clustered to extract pockets that adhere to given shape and size constraints. Finally, every protein residue is scored with an estimate of how likely it is to bind to a ligand based on its proximity to extracted pockets.
Each of the algorithms described for these steps is implemented in concavity. See the examples.
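As a rough illustration of how the three conceptual steps fit together, here is a toy Python sketch. It is not one of the shipped examples and does not reproduce any of the algorithms that concavity implements: the scoring rule (sum of conservation weights of nearby residues), the flood-fill clustering, the "highest nearby grid score" mapping, all function names, and all input values are assumptions chosen only to keep the example small.

```python
from math import dist


def score_grid(points, residues, conservation, radius):
    """Step 1 (toy): grid-point score = summed conservation of residues within radius."""
    return {
        p: sum(conservation[name] for name, xyz in residues.items()
               if dist(p, xyz) <= radius)
        for p in points
    }


def extract_pockets(grid, threshold, min_size):
    """Step 2 (toy): group contiguous points scoring >= threshold; drop small groups."""
    keep = {p for p, s in grid.items() if s >= threshold}
    pockets = []
    while keep:
        stack = [keep.pop()]
        pocket = set(stack)
        while stack:  # flood fill over the six axis-aligned neighbours
            x, y, z = stack.pop()
            for n in ((x + 1, y, z), (x - 1, y, z), (x, y + 1, z),
                      (x, y - 1, z), (x, y, z + 1), (x, y, z - 1)):
                if n in keep:
                    keep.remove(n)
                    pocket.add(n)
                    stack.append(n)
        if len(pocket) >= min_size:  # crude stand-in for a size constraint
            pockets.append(pocket)
    return pockets


def map_residues(residues, pockets, grid, cutoff):
    """Step 3 (toy): residue score = highest pocket grid score within cutoff."""
    pocket_points = [p for pocket in pockets for p in pocket]
    return {
        name: max((grid[p] for p in pocket_points if dist(p, xyz) <= cutoff),
                  default=0.0)
        for name, xyz in residues.items()
    }


# Made-up input: one representative coordinate and conservation weight per residue.
# Grid points use integer coordinates, i.e. a grid spacing of 1.
residues = {"A1": (1, 1, 0), "A2": (2, 1, 0), "A3": (10, 0, 0)}
conservation = {"A1": 1.0, "A2": 0.5, "A3": 1.0}
points = [(x, 0, 0) for x in range(6)]

grid = score_grid(points, residues, conservation, radius=1.5)
pockets = extract_pockets(grid, threshold=1.0, min_size=2)
binding_scores = map_residues(residues, pockets, grid, cutoff=1.5)
```

In this toy input, residue A3 lies far from every grid point, so it falls outside any extracted pocket and receives the default score of 0.0.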
PDBFILE is a protein structure file in PDB format. OUTPUT_NAME becomes part of the output file names and may not contain "/". Output is written to the current directory.
- -conservation PATH
- If the "-conservation" option is not given, then conservation information is not considered. Note that there are separate conservation files for each protein chain in the structure, and the input to the -conservation option is the prefix of these files. Pre-computed conservation files are available for almost the entire PQS on the ConCavity web site. If you'd like to compute sequence conservation values for your own alignments, we recommend the JSD algorithm: <http://compbio.cs.princeton.edu/conservation/>, available as score_conservation(1) from the conservation-code package.
Each of these algorithms is described in the text, and each has a number of additional parameters that change its behavior. The "custom" option allows you to set the values of all parameters for each step yourself. The presets (e.g. ligsite, search, blur) may override values you set on the command line, so use "custom" to have complete control.
There are also several output format options. Pocket prediction grid values can be output in the following formats:
Note: you may have to copy and uncompress the example data files before running the following examples.
- This will run concavity with default values (equivalent to ConCavity^L in the paper) on the structure 1G6C.pdb and consider the conservation values found in conservation_data/. This set of predictions will be called "test1". This produces the default result files listed above in the current directory:
concavity -conservation /usr/share/doc/concavity/examples/conservation_data/1G6C /usr/share/doc/concavity/examples/1G6C.pdb test1
- For example, to score the structure 1G6C.pdb with ConCavity_Pocketfinder, Search, and Blur, you'd type:
concavity -conservation /usr/share/doc/concavity/examples/conservation_data/1G6C -grid_method pocketfinder -extraction_method search -res_map_method blur /usr/share/doc/concavity/examples/1G6C.pdb cc-pocketfinder_search_blur
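When many structures or method combinations need to be scored, it can help to assemble the command line programmatically. The following Python sketch builds the argument list for the invocation shown above. It uses only the options that appear in this document (-conservation, -grid_method, -extraction_method, -res_map_method); the helper name build_concavity_command is hypothetical, and the code only constructs the command rather than running it.

```python
import shlex


def build_concavity_command(pdb_file, output_name, conservation_prefix=None,
                            grid_method=None, extraction_method=None,
                            res_map_method=None):
    """Return the argv list for: concavity [options] PDBFILE OUTPUT_NAME."""
    if "/" in output_name:
        raise ValueError('OUTPUT_NAME may not contain "/"')
    argv = ["concavity"]
    options = [
        ("-conservation", conservation_prefix),
        ("-grid_method", grid_method),
        ("-extraction_method", extraction_method),
        ("-res_map_method", res_map_method),
    ]
    for flag, value in options:
        if value is not None:  # options left unset are simply omitted
            argv += [flag, value]
    return argv + [pdb_file, output_name]


examples = "/usr/share/doc/concavity/examples"
argv = build_concavity_command(
    f"{examples}/1G6C.pdb",
    "cc-pocketfinder_search_blur",
    conservation_prefix=f"{examples}/conservation_data/1G6C",
    grid_method="pocketfinder",
    extraction_method="search",
    res_map_method="blur",
)
command_line = shlex.join(argv)  # shell-quoted string, handy for logging

# To actually run it (requires concavity to be installed):
#   import subprocess; subprocess.run(argv, check=True)
```

Passing the list form to subprocess.run avoids shell quoting issues if a path contains spaces.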
The authors primarily use PyMOL and Chimera for visualization, but the range of output formats means you should be able to import the data into most structural analysis programs. Let us know if there are other output formats you'd like to see.