SARA is a very fast method for single side-chain replacements in protein structures, based on a coarse-grained approach. It is over five times faster than the leading all-atom approach and generates biologically realistic side-chain angles. The solutions found by SARA typically deviate by less than 1 Å and 12 degrees from native structures or from the best all-atom solution. Run time is highly predictable and can easily be tuned by the user. These characteristics make SARA an excellent choice for high-throughput applications such as structural genomics, evolutionary simulations and structure-based phylogenetics.
SARA was written by Johan Grahnen in object-oriented C++ and is encapsulated in a collection of classes for easy integration with existing software.
SARA is maintained by David Liberles.
We supply SARA 1.0 as a compressed archive containing a pre-compiled 32-bit Linux executable, the C++ source code and some brief documentation. Why not download SARA and try it out right now? Refer to the paper describing the algorithm and this web page for further details.
See Grahnen,J.A., Kubelka,J. and Liberles,D.A. (2010) (submitted) for a description of the algorithm. Please cite the same reference in any published work employing some or all of our code.
Included in the software package is a Debian Linux 32-bit binary 'sara', which can be used immediately if your system is compatible (see Testing). To compile from source, verify that your system supports GNU make and has GCC version 4.3 or better installed. Then simply type
to compile the source code. To remove the object files after finishing, type
Note that support for the C++0X standard is required: in particular, the <tr1/memory> header must exist and shared_ptr must be available in the standard namespace. See Scott Meyers' summary for a list of compatible compiler versions. We cannot support configurations other than Linux with GCC 4.3 or better, but the code should run on any system supporting a Boost-derived shared_ptr template. It has been successfully compiled on Debian 5.0.4, Ubuntu 10.04 and MacOS Darwin 9.8.0 with GCC upgraded to 4.4.
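If you are unsure whether your compiler meets these requirements, a tiny test file can tell you before you attempt the full build. The sketch below is our own check, not part of the SARA distribution: it only exercises std::shared_ptr from <memory>, so compile it with g++ -std=c++0x -c and treat a failure as a strong hint that SARA will not build either. It does not test the <tr1/memory> header itself, and the Residue type is a hypothetical stand-in.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in type, used only to give shared_ptr something to own.
struct Residue {
    std::string name;
    explicit Residue(const std::string& n) : name(n) {}
};

// Shares one object between two owners and returns the resulting reference
// count; a working std::shared_ptr implementation reports 2.
long sharedOwnerCount() {
    std::shared_ptr<Residue> first(new Residue("LEU"));
    std::shared_ptr<Residue> second = first;  // copying shares ownership
    return second.use_count();
}
```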
For problems with compiling the pseudo-random number generator files (randomc.h, stocc.h, mersenne.cpp and stoc1.cpp), refer to the documentation on Agner Fog's web page.
To test your installation, try making the 30L->30Y replacement in chain A of PDB structure 1D4T (necessary files are included):
./sara 1d4tA.pdb 1d4tA-new.seq 1d4tA-new.bead 100 1 1 1
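For this test case, the novel sequence file is simply the original sequence with leucine 30 changed to tyrosine. If you want to prepare such files programmatically, the sketch below applies one replacement written like 30L->30Y, using the 1-based residue numbering SARA expects. It is our own helper with a hypothetical name, not part of the distribution.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of SARA): returns a copy of `sequence` with the
// residue at 1-based `position` changed from `from` to `to`.
// Throws if the position is out of range or the wild-type residue differs.
std::string applyPointMutation(const std::string& sequence, std::size_t position,
                               char from, char to) {
    if (position < 1 || position > sequence.size())
        throw std::out_of_range("mutation position outside sequence");
    std::string mutated = sequence;
    char& residue = mutated[position - 1];  // convert to a 0-based index
    if (residue != from)
        throw std::invalid_argument("wild-type residue does not match");
    residue = to;
    return mutated;
}
```

Writing the returned string on a single line, in upper case, gives a file in the same format as 1d4tA-new.seq.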
Open 1d4tA.bead and 1d4tA-new.bead in your favorite viewing software to examine the replacement. Note that bead sizes in our 2-bead model are typically not the same as atom sizes in your molecular viewer, and they may need to be adjusted.
SARA takes 7 parameters on the command line. Attempting to run the program with fewer inputs will trigger a usage message:
Usage: ./sara [protein PDB file] [protein seq file] [output file] [# steps] [temp] [step length] [step variance]
The protein PDB file should be cleaned up prior to use (see Pre-processing PDB Structures), and the sequence file should contain an all upper-case one-letter version of the novel sequence, all on one line. See files 1d4tA.pdb and 1d4tA-new.seq for examples. The number of steps, temperature, step length and step variance all affect the speed and accuracy of the algorithm; see the paper for a description of their effects. We recommend 100 steps, a temperature of 1.0, a step length of 1.0 and a step variance of 1.0 for a fast single replacement.
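The seven parameters are positional, in the order shown in the usage message. The sketch below illustrates how such a command line can be checked and converted to typed values; it is not SARA's actual main(), and the struct and function names are hypothetical. In keeping with the behaviour described above, fewer than seven arguments produce the usage message.

```cpp
#include <cstdlib>
#include <iostream>
#include <string>

// Hypothetical container for the seven command-line parameters,
// in the order given by the usage message.
struct SaraOptions {
    std::string pdbFile, seqFile, outputFile;
    int steps;
    double temperature, stepLength, stepVariance;
};

// Fills `opts` from argv. argv[0] is the program name, so seven parameters
// means argc >= 8; extra arguments are ignored in this sketch.
// Note: atoi/atof return 0 on malformed input; a stricter parser would use
// strtol/strtod and check for conversion errors.
bool parseArguments(int argc, char* argv[], SaraOptions& opts) {
    if (argc < 8) {
        const char* prog = argc > 0 ? argv[0] : "sara";
        std::cerr << "Usage: " << prog
                  << " [protein PDB file] [protein seq file] [output file]"
                     " [# steps] [temp] [step length] [step variance]\n";
        return false;
    }
    opts.pdbFile = argv[1];
    opts.seqFile = argv[2];
    opts.outputFile = argv[3];
    opts.steps = std::atoi(argv[4]);
    opts.temperature = std::atof(argv[5]);
    opts.stepLength = std::atof(argv[6]);
    opts.stepVariance = std::atof(argv[7]);
    return true;
}
```

With the recommended settings, as in ./sara 1d4tA.pdb 1d4tA-new.seq 1d4tA-new.bead 100 1 1 1, this sketch yields 100 steps and a temperature, step length and step variance of 1.0 each.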
Input PDB structures must contain a single chain, with residues consecutively numbered starting with 1, and conforming to a particular version of the PDB standard (see 1d4tA.pdb for an example). Unfortunately not all structures in the PDB look like this when you first download them. For your convenience, we provide two Perl scripts to help with formatting your file:
perl split_pdb.pl 1ABC.pdb ./
perl renumber_pdb.pl 1ABCA.pdb 1ABCA-num.pdb
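If you want to double-check a file after running these scripts, the sketch below scans ATOM records and verifies that there is a single chain and that residue numbers start at 1 and increase by one. It relies only on the fixed-column PDB layout (chain identifier in column 22, residue sequence number in columns 23-26); the function names are hypothetical, it is not part of SARA, and it ignores insertion codes and HETATM records.

```cpp
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical pre-flight check (not part of SARA): true if all ATOM records
// belong to one chain and residue numbers run 1, 2, 3, ... with no gaps.
bool hasSingleConsecutiveChain(std::istream& pdb) {
    std::string line;
    char chain = 0;        // chain ID of the first ATOM record seen
    int lastResidue = 0;   // residue number of the previous ATOM record
    bool sawAtom = false;
    while (std::getline(pdb, line)) {
        if (line.compare(0, 6, "ATOM  ") != 0 || line.size() < 26)
            continue;                                     // skip non-ATOM lines
        char thisChain = line[21];                        // column 22
        int residue = std::atoi(line.substr(22, 4).c_str());  // columns 23-26
        if (!sawAtom) {
            if (residue != 1) return false;               // must start at 1
            chain = thisChain;
            sawAtom = true;
        } else if (thisChain != chain) {
            return false;                                 // a second chain
        } else if (residue != lastResidue && residue != lastResidue + 1) {
            return false;                                 // gap or out of order
        }
        lastResidue = residue;
    }
    return sawAtom;  // an empty file is not a valid structure
}

// Convenience wrapper for PDB text already held in memory.
bool hasSingleConsecutiveChainText(const std::string& pdbText) {
    std::istringstream in(pdbText);
    return hasSingleConsecutiveChain(in);
}
```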
Problem: When installing, I get an error message to the effect of "error: unrecognized command line option "-std=c++0x"".
Your version of GCC likely does not support the full C++0X standard. Run
to check what version is installed. The -std=c++0x option is not available before GCC 4.3: upgrade your compiler or refer to the information under 'Installation' above for further instructions. If you have already installed a local copy of GCC that is 4.3 or better, but are having trouble using it, see Using A Local Copy of GCC.
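You can also let the preprocessor report which GCC is compiling your code. The sketch below reads GCC's predefined __GNUC__ and __GNUC_MINOR__ macros; the comparison helper and reporting function are our own, hypothetical names. Clang also defines __GNUC__ as a compatibility value, which is why the sketch excludes it explicitly.

```cpp
#include <iostream>

// True if version major.minor is at least reqMajor.reqMinor.
bool versionAtLeast(int major, int minor, int reqMajor, int reqMinor) {
    return major > reqMajor || (major == reqMajor && minor >= reqMinor);
}

// Prints the GCC version this file was compiled with and whether it meets
// the 4.3 minimum stated above.
void reportCompiler() {
#if defined(__GNUC__) && !defined(__clang__)
    std::cout << "GCC " << __GNUC__ << "." << __GNUC_MINOR__
              << (versionAtLeast(__GNUC__, __GNUC_MINOR__, 4, 3)
                      ? " meets" : " does not meet")
              << " the 4.3 minimum\n";
#else
    std::cout << "Not compiled with GCC\n";
#endif
}
```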
Problem: Program crashes saying "Can't convert to TwoBeadStructure: residue 1 has no C-alpha atom!".
This is likely a parsing error. Confirm that your input structure conforms to the format exemplified in 1d4tA.pdb, and be sure to run split_pdb.pl and renumber_pdb.pl on your PDB file before starting, as described above. The problem could also be caused by reading an empty or non-existent input structure, so verify that the input file exists and that you have permission to read it.
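To see which residues would trigger this message, you can list the residues that lack a C-alpha atom. The sketch below is our own diagnostic with hypothetical function names, not SARA's parser: it looks for the atom name " CA " in columns 13-16 of ATOM records, as laid out in the PDB format. An empty or missing file simply yields no residues, so check separately that the file was actually read.

```cpp
#include <cstdlib>
#include <istream>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical diagnostic (not part of SARA): returns, in ascending order, the
// residue numbers that appear in ATOM records but have no atom named " CA ".
std::vector<int> residuesMissingCAlpha(std::istream& pdb) {
    std::set<int> seen, withCA;
    std::string line;
    while (std::getline(pdb, line)) {
        if (line.compare(0, 6, "ATOM  ") != 0 || line.size() < 26)
            continue;
        int residue = std::atoi(line.substr(22, 4).c_str());  // columns 23-26
        seen.insert(residue);
        if (line.substr(12, 4) == " CA ")                      // columns 13-16
            withCA.insert(residue);
    }
    std::vector<int> missing;
    for (std::set<int>::const_iterator it = seen.begin(); it != seen.end(); ++it)
        if (withCA.count(*it) == 0)
            missing.push_back(*it);
    return missing;
}

// Convenience wrapper for PDB text already held in memory.
std::vector<int> residuesMissingCAlphaText(const std::string& pdbText) {
    std::istringstream in(pdbText);
    return residuesMissingCAlpha(in);
}
```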
Problem: Program crashes with the message "FATAL ERROR: Could not parse structure file".
See previous problem.
Problem: Program crashes with message "FATAL ERROR: Mismatch in length of novel sequence and structure!".
The file specifying the novel sequence has some errors. See 1d4tA-new.seq for an example of the correct format. Every residue should be specified by a single upper-case letter from the standard 20-letter amino acid alphabet, and the sequence should occur on a single line.
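These format rules are easy to check before starting a run. The sketch below, with hypothetical helper names and not part of SARA, accepts a sequence only if it is non-empty and made up entirely of the 20 standard upper-case one-letter codes, and compares its length with the number of residues in the structure, which is presumably the comparison behind the length-mismatch error.

```cpp
#include <cstddef>
#include <string>

// The 20 standard amino acids in one-letter code (upper case only).
const std::string kStandardAminoAcids = "ACDEFGHIKLMNPQRSTVWY";

// Hypothetical check (not part of SARA): true if `seq` is non-empty and
// contains only standard upper-case one-letter codes (no spaces or digits).
bool isValidSequence(const std::string& seq) {
    if (seq.empty()) return false;
    for (std::size_t i = 0; i < seq.size(); ++i)
        if (kStandardAminoAcids.find(seq[i]) == std::string::npos)
            return false;
    return true;
}

// True if the sequence is valid and has one letter per structure residue.
bool matchesStructure(const std::string& seq, std::size_t residueCount) {
    return isValidSequence(seq) && seq.size() == residueCount;
}
```

Read the sequence with std::getline so that only the first line is checked; note that a trailing carriage return left by a Windows-edited file makes isValidSequence return false.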
Problem: Program crashes with message "FATAL ERROR: Could not open output file".
You may not have write permissions on the output file. Verify that you can write to it by e.g. editing it in a text editor and saving.
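If you prefer to test this from code, opening the file for appending is a quick check that does not disturb existing contents. The sketch below is a generic C++ example with a hypothetical function name, not SARA's own error handling; note that it creates an empty file if the path did not exist yet.

```cpp
#include <fstream>
#include <string>

// Hypothetical check (not part of SARA): true if `path` can be opened for
// writing. Append mode leaves existing contents untouched, but it creates an
// empty file if none existed.
bool canWriteTo(const std::string& path) {
    std::ofstream out(path.c_str(), std::ios::out | std::ios::app);
    return out.is_open();
}
```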
Problem: Replacements, particularly of more than one residue, take a very long time (or the program seems frozen).
SARA was optimized to make single residue replacements. The program is comparatively very fast for this purpose, but becomes substantially slower when performing multiple replacements. Replacing every residue from just a protein backbone takes a particularly long time. Try using fewer steps, or making replacements sequentially, to speed up the process.
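Making replacements sequentially means splitting a multi-residue change into single-residue steps. The sketch below generates the intermediate sequences for such a series; it is our own helper with a hypothetical name, not part of SARA, and it leaves open how each run's output structure is prepared as the next run's input. It assumes the two sequences have equal length, as SARA requires.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of SARA): given the current and target
// sequences, returns the intermediate sequences obtained by applying the
// differences one position at a time, from N- to C-terminus.
// The last element equals `target`; an empty result means nothing changes.
std::vector<std::string> singleStepPath(const std::string& current,
                                        const std::string& target) {
    if (current.size() != target.size())
        throw std::invalid_argument("sequences must have equal length");
    std::vector<std::string> steps;
    std::string working = current;
    for (std::size_t i = 0; i < working.size(); ++i) {
        if (working[i] != target[i]) {
            working[i] = target[i];   // exactly one replacement per step
            steps.push_back(working);
        }
    }
    return steps;
}
```

For example, singleStepPath("ACDE", "AYDW") returns AYDE and then AYDW, one replacement per run.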
SARA replaces side chains by inserting them into roughly the same position as the previous one, and then optimizing the angle of the new side chain by finding a minimum of an energy function. We use a Lennard-Jones approximation of the vdW energies of the replaced side chain in the default implementation (see the paper for more detail), but also provide an alternative linear repulsion function. If desired, you can supply your own energy function.
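As an illustration of how the two kinds of energy term behave, the sketch below implements a textbook 12-6 Lennard-Jones pair potential and a simple linear repulsion between two beads, plus a sum over the environment of one side-chain bead. The functional forms and parameter names here (epsilon, sigma, contact distance, slope) are generic and hypothetical; they are not SARA's parameterization, which is described in the paper.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Position of a coarse-grained bead (hypothetical type, not SARA's).
struct Bead { double x, y, z; };

double beadDistance(const Bead& a, const Bead& b) {
    double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Textbook 12-6 Lennard-Jones energy: 4*eps*((sigma/r)^12 - (sigma/r)^6).
// Zero at r = sigma, minimum of -eps at r = 2^(1/6)*sigma.
double lennardJones(double r, double epsilon, double sigma) {
    double s6 = std::pow(sigma / r, 6);
    return 4.0 * epsilon * (s6 * s6 - s6);
}

// Linear repulsion: zero beyond the contact distance and rising linearly
// with overlap below it. Purely repulsive, with no attractive well.
double linearRepulsion(double r, double contact, double slope) {
    return r < contact ? slope * (contact - r) : 0.0;
}

// Lennard-Jones energy of one side-chain bead against its environment.
double sideChainEnergy(const Bead& sideChain, const std::vector<Bead>& others,
                       double epsilon, double sigma) {
    double total = 0.0;
    for (std::size_t i = 0; i < others.size(); ++i)
        total += lennardJones(beadDistance(sideChain, others[i]), epsilon, sigma);
    return total;
}
```

The attractive well of the Lennard-Jones term rewards good packing, while the linear repulsion only penalizes clashes, which is one reason a purely repulsive function can be cheaper but less discriminating.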
To switch to the simpler energy function, change the line
#define USE_SIMPLE_ENERGY_FUNCTION false
to
#define USE_SIMPLE_ENERGY_FUNCTION true
and then re-compile from source. This should in theory make SARA faster, but less accurate (not tested, use at your own risk).
To use your own energy function instead, replace the line
energy = GrahnenModel::scwrlRepulsionEnergy(pCurrentStruct, active);
with a call to your own function and re-compile.
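Your own function presumably needs to accept the same inputs as the call it replaces: the current structure and the active (replaced) residue. Because the real types of pCurrentStruct and active are defined in the SARA source, the sketch below uses hypothetical stand-in types and names; it shows one possible custom energy, a quadratic soft-sphere repulsion, and is not taken from SARA.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for SARA's structure classes; the real argument
// types of scwrlRepulsionEnergy are defined in the SARA source.
struct Bead { double x, y, z; int residue; };
struct Structure { std::vector<Bead> beads; };

// Example custom energy: quadratic soft-sphere repulsion between the beads of
// the `active` residue and every bead belonging to other residues.
// A real function would probably also exclude bonded neighbours.
double softSphereEnergy(const Structure* s, int active,
                        double contact = 4.0, double k = 1.0) {
    double energy = 0.0;
    for (std::size_t i = 0; i < s->beads.size(); ++i) {
        const Bead& a = s->beads[i];
        if (a.residue != active) continue;
        for (std::size_t j = 0; j < s->beads.size(); ++j) {
            const Bead& b = s->beads[j];
            if (b.residue == active) continue;   // skip intra-residue pairs
            double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
            double r = std::sqrt(dx * dx + dy * dy + dz * dz);
            if (r < contact) energy += k * (contact - r) * (contact - r);
        }
    }
    return energy;
}
```

In SARA itself you would adapt the body to the actual structure classes and call it in place of GrahnenModel::scwrlRepulsionEnergy.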
If you have installed a local copy of GCC version 4.3 or better and are still having problems compiling, try the following:
In the Makefile, change the line
CC = g++
to
CC = /usr/local/bin/gcc
assuming that you installed GCC to /usr/local/.
Also change the linking line
$(CC) -o sara $(sidechain-replacer-objects)
to
$(CC) -o sara -L /usr/local/lib -lstdc++ $(sidechain-replacer-objects)
again assuming that the new installation is in /usr/local/.
Then re-compile as described under 'Installation'; this should compile and link the code properly.