CASS is a software package for simulating and sampling protein sequences with an explicit genotype-to-phenotype map. The molecular phenotypes under consideration are folding into a specific conformation, and function in the sense of binding one or more ligands. Using these biophysically motivated restrictions, CASS produces sequences with protein-like properties such as core hydrophobicity and a biologically relevant distribution of evolutionary rates.
CASS 1.0 was written by Johan Grahnen in object-oriented C++ and is encapsulated in a collection of classes for ease of integration with existing software. It is open-source and freely available under the GPL v3 license.
CASS 1.1 and later versions are maintained by David Liberles, with contributions from Nadia Bykova.
CASS 1.1s, with site-specific weights, is now available below under Updates, with contributions from Peter Chi, Jason Lai, Dohyup Kim, and Nadia Bykova.
We supply CASS 1.1 as a compressed archive containing the C++ source code, example data files, some helpful scripts and brief documentation. Why not download CASS and try it out right now? Refer to the paper describing the algorithm and this web page for further details.
To compile from source, verify that your system supports GNU make and has GCC version 4.3 or better installed. Then unpack the archive and type
to compile the source code. To remove the object files after finishing, type
Note that support for the C++0x/C++11 standard is required. If you have trouble with compilation, see the "Troubleshooting" section. To date, CASS has been successfully compiled and tested on various 64-bit Linux systems with GCC 4.3-4.6 (see the README for a complete list).
For problems with compiling the pseudo-random number generator files (including stoc1.cpp), refer to the documentation on Agner Fog's web page.
Finally, please consult the included README file for further instructions.
To test the compiled software, run the following command as a single line:
./sh2-evolution-decoys+neo 1d4tA-novel.pheno 10 100 0.0032 -60 -19 data/1d4tA-renum.bead data/1d4tA.dna data/1d4tB.bead data/decoy-ligand-RLPTIYICITG.bead data/novel-ligand-GEPTIYTGVIH.bead > 1d4tA-novel.seq
This may take some time to complete. When finished, the output file '1d4tA-novel.pheno' should contain 6 tab-separated columns on 12 lines.
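The shape of the test output described above (number of tab-separated columns and number of lines) can be checked with a few lines of shell. This is a minimal sketch and not part of CASS: the helper name check_pheno is hypothetical, and it only counts fields and lines without inspecting the values themselves.

```shell
# Hypothetical helper (not part of CASS): check the shape of a phenotype file.
# Usage: check_pheno FILE EXPECTED_COLUMNS EXPECTED_LINES
check_pheno() {
    awk -F'\t' -v cols="$2" -v lines="$3" '
        NF != cols { bad++ }            # count lines with the wrong field count
        END {
            if (NR == lines && bad == 0) { print "OK"; exit 0 }
            printf "MISMATCH: %d lines, %d with wrong column count\n", NR, bad
            exit 1
        }' "$1"
}
```

For the test run above, the call would be check_pheno 1d4tA-novel.pheno 6 12, which prints OK only if both counts match.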
A number of basic and advanced uses of the software package are described in the README file. Here we list a few typical tasks and the supplied example applications that can be used to perform them.
Task: Simulating protein sequence evolution with explicit selection to fold into the native conformation and maintain native function.
The sh2-evolution-decoys program implements the model described
in reference #1 for sequence simulation. See Basic example
#7 in the README for more detail.
Task: Simulation of protein sequence evolution under changing function.
As described in reference #3, the
program implements a model of acquiring novel functionality while maintaining the
original function. See Advanced example #2 in the README for more detail.
Task: Stability-biased sampling of near-native protein sequences.
The sample-seqs program implements a Markov chain Monte Carlo
algorithm for generating protein sequences that are scored as more
thermodynamically stable than the native sequence of a protein under a
particular model. See Advanced example #4 in the README for more detail.
Problem: When installing, I get an error message to the effect of 'error: unrecognized command line option "-std=c++0x"' or errors/warnings about 'shared_ptr'.
Your version of GCC likely does not support the C++0X/C++11 standard. Run
to check what version is installed. There was no support for
<tr1/memory> before version 4.3: please upgrade your
compiler to version 4.3 or newer if possible. In particular, the
<tr1/memory> library must exist and shared_ptr must be
available in the standard namespace. See
Scott Meyers' summary
for a listing of compatible compiler versions. We cannot currently support any
configurations beyond Linux with GCC 4.3 or better, but the code should
theoretically compile on any Unix-based system (such as MacOS) with the correct
For more information about compiling with shared_ptr, Boris Kolpackov's blog has specific information on how to access the template in other compilers and earlier versions of GCC.
Finally, please contact the maintainer if you are unable to resolve installation problems using the above information.
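The GCC 4.3 requirement can also be tested in a script. The following is a minimal sketch, not part of CASS: version_at_least is a hypothetical helper, and using gcc -dumpversion to obtain the version string is our assumption about a convenient way to query the compiler. Recent GCC releases may print only the major version, which the helper treats as minor version 0.

```shell
# Hypothetical helper (not part of CASS): succeed if VERSION >= MIN_MAJOR.MIN_MINOR.
# Usage: version_at_least VERSION MIN_MAJOR MIN_MINOR
version_at_least() {
    major=${1%%.*}                      # text before the first dot
    case $1 in
        *.*) rest=${1#*.}               # drop the "major." prefix
             minor=${rest%%.*} ;;       # keep text up to the next dot
        *)   minor=0 ;;                 # e.g. "11": no minor component given
    esac
    [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]; }
}

# Example (assumes gcc is on the PATH):
#   version_at_least "$(gcc -dumpversion)" 4 3 && echo "GCC is 4.3 or newer"
```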
Problem: Program crashes saying "Can't convert to TwoBeadStructure: residue 1 has no C-alpha atom!".
This is likely a parsing error. Confirm that your input structures
conform to the format exemplified in
data/1d4tB.bead. The problem could also be caused by reading an
empty or non-existent input structure. Verify that the input file exists and
that you have read permissions.
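The file checks suggested above can be scripted before launching a long run. This is a minimal sketch with a hypothetical helper name, check_input; it only tests that the file exists, is readable and is non-empty, and does not validate the bead format itself.

```shell
# Hypothetical helper (not part of CASS): report why an input file may be unusable.
# Usage: check_input FILE
check_input() {
    if [ ! -e "$1" ]; then echo "missing: $1"; return 1; fi
    if [ ! -r "$1" ]; then echo "not readable: $1"; return 1; fi
    if [ ! -s "$1" ]; then echo "empty: $1"; return 1; fi
    echo "ok: $1"
}
```

For example, check_input data/1d4tB.bead should print "ok: data/1d4tB.bead" when run from the unpacked archive, assuming the example data files are in place.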
Problem: Program crashes with the message "FATAL ERROR: Could not parse structure file".
See previous problem.
Problem: Program crashes with message "FATAL ERROR: Could not open output file".
You may not have write permissions on the output file. Verify that you can write to it, for example by opening it in a text editor and saving.
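Write access can also be checked from the shell without opening an editor. This is a minimal sketch with a hypothetical helper name, can_write; it assumes that a missing output file will be created in its parent directory, so in that case it tests the directory instead.

```shell
# Hypothetical helper (not part of CASS): succeed if the output path can be written.
# Usage: can_write FILE
can_write() {
    if [ -e "$1" ]; then
        [ -w "$1" ]                     # existing file: must be writable
    else
        [ -w "$(dirname "$1")" ]        # new file: its directory must be writable
    fi
}

# Example:
#   can_write 1d4tA-novel.pheno || echo "cannot write output file"
```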
Problem: When running the
test-motifs executable, nothing
happens. Alternatively, the terminal hangs or the session crashes.
This program requires the third-party software
SCWRL to be installed, and
some modification of the code. We recommend using v 3.0, but v 4.0 has also
been successfully tested. Note that
test-motifs launches a
large number of child processes, which may have some unintended consequences.
See the README file for more details.
Problem: Simulations, particularly with large population sizes, take a very long time (or a program seems frozen).
The CASS algorithm is a highly complex method for simulating sequence evolution
under biophysical constraints, and as such is very computationally intensive.
Simulations may take hours, days or even weeks to complete, depending on
the properties of the simulated population. To gauge progress of an
ongoing simulation, examine the phenotype output file. It contains one
line per completed generation, and can therefore be easily monitored using
wc or a similar utility program.
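Monitoring with wc can be wrapped in a small helper. This is a minimal sketch, not part of CASS: count_lines is a hypothetical name, and the file name in the example is taken from the test command on this page. Any additional lines in the file, such as a header, would inflate the count relative to the number of completed generations.

```shell
# Hypothetical helper (not part of CASS): print the number of lines in a file.
# Usage: count_lines FILE
count_lines() {
    wc -l < "$1" | tr -d ' '            # strip padding that some wc versions add
}

# Example: report progress once a minute until interrupted with Ctrl-C.
#   while sleep 60; do echo "$(count_lines 1d4tA-novel.pheno) lines written"; done
```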
When calculating the distance between Cα and Cβ beads, v 1.0 measured it in only one direction; v 1.1 measures it in both directions.