CASS (Coarse-grained Artificial Sequence Simulator) 1.1

CASS is a software package for simulating and sampling protein sequences with an explicit genotype-to-phenotype map. Two molecular phenotypes are considered: folding into a specific conformation, and function in the sense of binding one or more ligands. Under these biophysically motivated restrictions, CASS produces sequences with protein-like properties such as core hydrophobicity and a biologically relevant distribution of evolutionary rates.

CASS 1.0 was written by Johan Grahnen in object-oriented C++ and is encapsulated in a collection of classes for ease of integration with existing software. It is open-source and freely available under the GPL v3 license.

CASS 1.1 and later versions are maintained by David Liberles with contributions from Nadia Bykova.

CASS 1.1s, with site-specific weights, is now available below in updates, with contributions from Peter Chi, Jason Lai, Dohyup Kim, and Nadia Bykova.



We supply CASS 1.1 as a compressed archive containing the C++ source code, example data files, some helpful scripts, and brief documentation. Download the archive to try it out. Refer to the paper describing the algorithm and to this web page for further details.


To compile from source, verify that GNU make and GCC version 4.3 or later are installed on your system. Then unpack the archive and type

make all

to compile the source code. To remove the object files after finishing, type

make clean

Note that support for the C++0x/C++11 standard is required. If you have trouble with compilation, see the section on "Troubleshooting". To date, CASS has been successfully compiled and tested on various 64-bit Linux systems with GCC 4.3-4.6 (see the README for the complete list).
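The requirements above can be checked before building. The following is a minimal sketch, not part of the CASS distribution: the function names version_ge and check_toolchain are hypothetical, and it assumes that the compiler is invoked as "gcc" and that GNU make identifies itself in the output of "make --version".

```shell
# Hypothetical pre-flight check; not shipped with CASS.

# Succeeds if dotted version $1 is greater than or equal to dotted version $2.
# Missing components (e.g. "13" vs "4.3") are treated as 0.
version_ge() {
    awk -v have="$1" -v need="$2" 'BEGIN {
        split(have, h, "."); split(need, n, ".")
        for (i = 1; i <= 3; i++) {
            if ((h[i] + 0) > (n[i] + 0)) exit 0
            if ((h[i] + 0) < (n[i] + 0)) exit 1
        }
        exit 0
    }'
}

# Reports whether GNU make and GCC 4.3 or later appear to be available.
check_toolchain() {
    if ! command -v make >/dev/null 2>&1; then
        echo "make not found"; return 1
    fi
    if ! make --version 2>/dev/null | grep -q "GNU Make"; then
        echo "make is present but does not look like GNU make"; return 1
    fi
    if ! command -v gcc >/dev/null 2>&1; then
        echo "gcc not found"; return 1
    fi
    # Assumption: "gcc" really is GCC (on some systems it is an alias for another compiler).
    gcc_version=$(gcc -dumpversion)
    if ! version_ge "$gcc_version" 4.3; then
        echo "GCC $gcc_version is older than 4.3"; return 1
    fi
    echo "toolchain looks OK (GCC $gcc_version)"
}
```

After pasting these definitions into a shell session, run check_toolchain before "make all". A passing check only confirms the minimum versions; it does not guarantee a successful build on compilers outside the tested range.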

For problems with compiling the pseudo-random number generator files (randomc.h, stocc.h, mersenne.cpp and stoc1.cpp), refer to the documentation on Agner Fog's web page.

Finally, please consult the included README file for further instructions.

Testing Your Installation

For a test of the compiled software, run the following command as a single line:

./sh2-evolution-decoys+neo 1d4tA-novel.pheno 10 100 0.0032 -60 -19 data/1d4tA-renum.bead data/1d4tA.dna data/1d4tB.bead data/decoy-ligand-RLPTIYICITG.bead data/novel-ligand-GEPTIYTGVIH.bead > 1d4tA-novel.seq

This may take some time to complete. When the run finishes, the output file '1d4tA-novel.pheno' should contain 6 tab-separated columns on 12 lines, ending with '#Simulation finished.'.
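The shape of the output file can be inspected from the shell rather than by eye. The sketch below is an illustration only: summarize_pheno is a hypothetical helper, not a CASS tool. It prints the line count and the number of lines for each column count, so the result can be compared against the description above, and it fails if the last line is not the completion marker.

```shell
# Hypothetical helper; not shipped with CASS.
# Usage: summarize_pheno 1d4tA-novel.pheno
summarize_pheno() {
    file="$1"

    # Total number of lines (a final line without a newline is still counted).
    lines=$(awk 'END { print NR }' "$file")
    echo "lines: $lines"

    # How many lines have each number of tab-separated fields.
    awk -F '\t' '
        { count[NF]++ }
        END { for (n in count) printf "lines with %d column(s): %d\n", n, count[n] }
    ' "$file"

    # The text above states that the file ends with this marker.
    last=$(tail -n 1 "$file")
    if [ "$last" = "#Simulation finished." ]; then
        echo "final line: OK"
    else
        echo "final line: unexpected"
        return 1
    fi
}
```

A non-zero return status means the completion marker is missing, which most likely indicates that the run was interrupted or has not finished yet.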


A number of basic and advanced uses of the software package are described in the README file. Here we list a few typical tasks and the supplied example applications that can be used to perform them.




  1. Grahnen et al. (2011) "Biophysical and Structural Considerations for Protein Sequence Evolution", BMC Evolutionary Biology 11:361 describes the evolutionary model underlying the simulation software.
  2. Grahnen and Liberles (2012) "CASS: Protein sequence simulation with explicit genotype-phenotype mapping", Trends in Evolutionary Biology 4:1 is an application note describing the CASS algorithm and software package in brief.
  3. Grahnen et al. (2012) "Characterizing the structural and functional underpinnings of covarion models" (submitted) describes the interplay between evolutionary rate changes due to functional shifts and those due to compensatory neutral substitutions for folding stability, using data simulated with CASS. Please contact the author for a pre-print.
Please cite the second reference in any published work employing some or all of our code.