Lattice proteins are highly simplified models of protein-like heteropolymer chains in a lattice conformational space, used to investigate protein folding.[1] The simplification is twofold: each whole residue (amino acid) is modeled as a single "bead" or "point" drawn from a finite set of types (usually only two), and each residue is restricted to the vertices of a (usually cubic) lattice.[1] To guarantee the connectivity of the protein chain, adjacent residues on the backbone must be placed on adjacent vertices of the lattice.[2] Steric constraints are expressed by requiring that no more than one residue be placed on the same lattice vertex.[2]
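The two constraints above (chain connectivity and one residue per vertex) can be checked mechanically. The following is a minimal sketch, assuming a 2D square lattice with integer coordinates; the function name `is_valid_conformation` and the constant `SQUARE_STEPS` are illustrative and do not come from the cited literature.

```python
from typing import Sequence, Tuple

Coord = Tuple[int, int]

# Assumption: a 2D square lattice, where the nearest neighbours of a vertex
# are reached by one unit step along x or y.
SQUARE_STEPS = {(1, 0), (-1, 0), (0, 1), (0, -1)}


def is_valid_conformation(coords: Sequence[Coord]) -> bool:
    """Return True if the chain is connected and self-avoiding."""
    # Steric constraint: no two residues may occupy the same vertex.
    if len(set(coords)) != len(coords):
        return False
    # Connectivity: consecutive residues must sit on adjacent vertices.
    for (x0, y0), (x1, y1) in zip(coords, coords[1:]):
        if (x1 - x0, y1 - y0) not in SQUARE_STEPS:
            return False
    return True
```

For instance, the four-residue chain `[(0, 0), (1, 0), (1, 1), (0, 1)]` passes both checks, whereas a chain that revisits a vertex or skips over one does not.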
Because proteins are such large molecules, there are severe computational limits on the simulated timescales of their behaviour when modeled in all-atom detail. The millisecond regime for all-atom simulations was not reached until 2010,[3] and it is still not possible to fold all real proteins on a computer. Simplification significantly reduces the computational effort in handling the model, although even in this simplified scenario the protein folding problem is NP-complete.[4]
Overview
Different versions of lattice proteins may adopt different types of lattice (typically square and triangular ones), in two or three dimensions, but it has been shown that generic lattices can be used and handled via a uniform approach.[2]
Lattice proteins are made to resemble real proteins by introducing an energy function, a set of conditions which specify the interaction energy between beads occupying adjacent lattice sites.[5] The energy function mimics the interactions between amino acids in real proteins, which include steric, hydrophobic and hydrogen bonding effects.[2] The beads are divided into types, and the energy function specifies the interactions depending on the bead type, just as different types of amino acids interact differently.[5] One of the most popular lattice models, the hydrophobic-polar model (HP model),[6] features just two bead types—hydrophobic (H) and polar (P)—and mimics the hydrophobic effect by specifying a favorable interaction between H beads.[5]
For any sequence in any particular structure, an energy can be rapidly calculated from the energy function. For the simple HP model, this is an enumeration of all the contacts between H residues that are adjacent in the structure but not in the chain.[7] Most researchers consider a lattice protein sequence protein-like only if it possesses a single structure with an energetic state lower than in any other structure, although there are exceptions that consider ensembles of possible folded states.[8] This is the energetic ground state, or native state. The relative positions of the beads in the native state constitute the lattice protein's tertiary structure[citation needed]. Lattice proteins do not have genuine secondary structure; however, some researchers have claimed that they can be extrapolated onto real protein structures which do include secondary structure, by appealing to the same law by which the phase diagrams of different substances can be scaled onto one another (the theorem of corresponding states).[9]
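The HP energy calculation described above amounts to counting H-H pairs that are lattice neighbours but not chain neighbours. The sketch below assumes a 2D square lattice and an energy of -1 per such contact; that scoring is a common convention and is an assumption here, as is the function name `hp_energy`.

```python
from typing import Sequence, Tuple


def hp_energy(sequence: str, coords: Sequence[Tuple[int, int]]) -> int:
    """Energy of an HP chain: -1 for each non-bonded H-H lattice contact.

    `sequence` is a string of 'H' and 'P'; `coords` gives the vertex of each
    residue and is assumed to be a valid self-avoiding conformation.
    """
    position_of = {pos: i for i, pos in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only to the right and upwards so each pair is counted once.
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = position_of.get(neighbour)
            # Skip empty sites, P beads, and residues bonded along the chain.
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return -contacts
```

With the sequence `"HPPH"` folded into a square, `[(0, 0), (1, 0), (1, 1), (0, 1)]`, the two H beads at the chain ends become lattice neighbours, so the energy is -1; the same sequence laid out in a straight line has energy 0.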
By varying the energy function and the bead sequence of the chain (the primary structure), effects on the native state structure and the kinetics of folding can be explored, and this may provide insights into the folding of real proteins.[10] For example, folding processes in lattice proteins have been studied for their resemblance to the two-phase folding kinetics of real proteins: the lattice protein was shown to collapse quickly into a compact state, followed by a slow rearrangement of its structure into the native state.[11] Other work in the field has attempted to resolve the Levinthal paradox in protein folding. For instance, a study by Fiebig and Dill examined a search method that constrains the formation of residue contacts in a lattice protein, offering insight into how a protein finds its native structure without a global exhaustive search.[12] Lattice protein models have also been used to investigate the energy landscapes of proteins, i.e. the variation of their internal free energy as a function of conformation.[citation needed]
Lattices
A lattice is a set of orderly points that are connected by "edges".[2] These points are called vertices, and each is connected by edges to a certain number of other vertices in the lattice. The number of vertices each individual vertex is connected to is called the coordination number of the lattice, and it can be scaled up or down by changing the shape or dimension (2-dimensional to 3-dimensional, for example) of the lattice.[2] This number is important in shaping the characteristics of the lattice protein because it controls the number of other residues allowed to be adjacent to a given residue.[2] It has been shown that for most proteins the coordination number of the lattice used should fall between 3 and 20, although most commonly used lattices have coordination numbers at the lower end of this range.[2]
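One way to make the coordination number concrete is to represent a lattice by the list of offsets from a vertex to its neighbours; the coordination number is then the length of that list. The sketch below is an illustration under stated assumptions: the triangular lattice uses axial (skewed) integer coordinates, the face-centered cubic lattice uses offsets with exactly two nonzero entries of ±1, and the names `offsets`, `LATTICES`, `coordination_number` and `neighbours` are hypothetical.

```python
from itertools import product


def offsets(dim: int, norm1: int):
    """Integer vectors with entries in {-1, 0, 1} whose absolute values sum to norm1."""
    return [v for v in product((-1, 0, 1), repeat=dim)
            if sum(abs(c) for c in v) == norm1]


# Neighbour offsets for a few lattices (coordinate conventions are assumptions).
LATTICES = {
    "square": offsets(2, 1),       # 2D, unit steps along x or y
    "triangular": [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)],  # axial coords
    "cubic": offsets(3, 1),        # 3D, unit steps along x, y or z
    "fcc": offsets(3, 2),          # 3D, face diagonals such as (1, 1, 0)
}


def coordination_number(lattice: str) -> int:
    """Number of vertices adjacent to any given vertex."""
    return len(LATTICES[lattice])


def neighbours(vertex, lattice: str):
    """All vertices adjacent to `vertex` on the named lattice."""
    return [tuple(a + b for a, b in zip(vertex, step))
            for step in LATTICES[lattice]]
```

Under these conventions the square lattice has coordination number 4, the triangular and cubic lattices 6, and the face-centered cubic lattice 12.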
Lattice shape is an important factor in the accuracy of lattice protein models. Changing the lattice shape can dramatically alter the shape of the energetically favorable conformations.[2] It can also add unrealistic constraints to the protein structure, as in the parity problem: in square and cubic lattices, residues of the same parity (both odd-numbered or both even-numbered) cannot make hydrophobic contact.[5] It has also been reported that triangular lattices yield more accurate structures than other lattice shapes when compared to crystallographic data.[2] To combat the parity problem, several researchers have suggested using triangular lattices when possible, as well as a square matrix with diagonals for theoretical applications where the square matrix may be more appropriate.[5] Hexagonal lattices were introduced to alleviate sharp turns of adjacent residues in triangular lattices.[13] Hexagonal lattices with diagonals have also been suggested as a way to combat the parity problem.[2]
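The parity problem can be seen by brute force. On a square lattice every step changes x + y by exactly one, so residues whose chain indices have the same parity always sit on vertices of the same colour of the checkerboard and can never be neighbours. The sketch below enumerates all self-avoiding walks of a short chain and collects the chain separations at which non-bonded contacts occur; the function names are illustrative, and the enumeration is only practical for short chains.

```python
STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # 2D square lattice


def self_avoiding_walks(n):
    """Yield every self-avoiding walk of n residues starting at the origin."""
    def extend(walk, occupied):
        if len(walk) == n:
            yield list(walk)
            return
        x, y = walk[-1]
        for dx, dy in STEPS:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:
                walk.append(nxt)
                occupied.add(nxt)
                yield from extend(walk, occupied)
                occupied.remove(nxt)
                walk.pop()
    yield from extend([(0, 0)], {(0, 0)})


def contact_separations(n):
    """Set of chain separations j - i over all non-bonded contacts (i, j)."""
    separations = set()
    for walk in self_avoiding_walks(n):
        index = {pos: i for i, pos in enumerate(walk)}
        for i, (x, y) in enumerate(walk):
            for dx, dy in STEPS:
                j = index.get((x + dx, y + dy))
                # Keep each pair once (j > i) and skip chain neighbours.
                if j is not None and j > i + 1:
                    separations.add(j - i)
    return separations
```

Every separation returned is odd, so a contact always pairs an odd-numbered residue with an even-numbered one. On a triangular lattice, where steps such as (1, -1) in axial coordinates leave x + y unchanged, this restriction does not arise.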
Hydrophobic-polar model
The hydrophobic-polar protein model is the original lattice protein model. It was first proposed by Dill et al. in 1985 as a way to overcome the significant cost and difficulty of predicting protein structure, using only the hydrophobicity of the amino acids in the protein to predict the protein structure.[5] It is considered to be the paradigmatic lattice protein model.[2] The method was able to quickly give an estimate of protein structure by representing proteins as "short chains on a 2D square lattice" and has since become known as the hydrophobic-polar model. It breaks the protein folding problem into three separate problems: modeling the protein conformation, defining the energetic properties of the amino acids as they interact with one another to find said conformation, and developing an efficient algorithm for the prediction of these conformations. This is done by classifying the amino acids in the protein as either hydrophobic or polar and assuming that the protein is folded in an aqueous environment. The lattice statistical model seeks to recreate protein folding by minimizing the free energy of the contacts between hydrophobic amino acids. Hydrophobic amino acid residues are predicted to group around each other, while hydrophilic residues interact with the surrounding water.[5]
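The classification step can be sketched as a simple mapping from one-letter amino acid codes to H or P. The particular hydrophobic set used below is an assumption made for illustration only; published studies differ on how borderline residues are assigned, and neither the set nor the function name `to_hp` is taken from the cited sources.

```python
# Assumption: an illustrative hydrophobic set; other assignments are in use.
HYDROPHOBIC = frozenset("AVLIMFWC")
# The 20 standard amino acids in one-letter code.
STANDARD = frozenset("ACDEFGHIKLMNPQRSTVWY")


def to_hp(protein: str) -> str:
    """Reduce an amino acid sequence to a string of 'H' and 'P' beads."""
    beads = []
    for residue in protein.upper():
        if residue not in STANDARD:
            raise ValueError(f"unknown amino acid code: {residue!r}")
        beads.append("H" if residue in HYDROPHOBIC else "P")
    return "".join(beads)
```

Under this assumed assignment, `to_hp("MKV")` gives `"HPH"`: methionine and valine are treated as hydrophobic and lysine as polar.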
Different lattice types and algorithms have been used to study protein folding with the HP model. Efforts have been made to obtain higher approximation ratios using approximation algorithms on two- and three-dimensional square and triangular lattices. As an alternative to approximation algorithms, genetic algorithms have also been applied to square, triangular, and face-centered cubic lattices.[14]
Problems and alternative models
The simplicity of the hydrophobic-polar model gives rise to several problems, which alternative lattice protein models attempt to correct.[5] Chief among these is degeneracy: the modeled protein may have more than one minimum energy conformation, leading to uncertainty about which conformation is the native one. Attempts to address this include the HPNX model, which classifies amino acids as hydrophobic (H), positive (P), negative (N), or neutral (X) according to the charge of the amino acid,[15] adding parameters to reduce the number of low energy conformations and allowing for more realistic protein simulations.[5] Another is the Crippen model, which uses protein characteristics taken from crystal structures to inform the choice of native conformation.[16]
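Degeneracy can be measured directly for short chains by enumerating every conformation and counting how many share the lowest energy. The sketch below assumes a 2D square lattice, an energy of -1 per non-bonded H-H contact, and counts conformations up to rotation and reflection; all function names are illustrative, and exhaustive enumeration is feasible only for short sequences.

```python
STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # 2D square lattice


def conformations(n):
    """Yield self-avoiding walks of n >= 2 residues, one per symmetry class.

    Rotations are removed by fixing the first step to +x; reflections are
    removed by requiring the first step off the x-axis to go towards +y.
    """
    def extend(walk, occupied, turned):
        if len(walk) == n:
            yield tuple(walk)
            return
        x, y = walk[-1]
        for dx, dy in STEPS:
            if not turned and (dx, dy) == (0, -1):
                continue  # mirror image of a walk that turns towards +y
            nxt = (x + dx, y + dy)
            if nxt in occupied:
                continue
            walk.append(nxt)
            occupied.add(nxt)
            yield from extend(walk, occupied, turned or dy != 0)
            occupied.remove(nxt)
            walk.pop()
    yield from extend([(0, 0), (1, 0)], {(0, 0), (1, 0)}, False)


def hp_energy(sequence, walk):
    """-1 for each H-H pair adjacent on the lattice but not along the chain."""
    index = {pos: i for i, pos in enumerate(walk)}
    contacts = 0
    for i, (x, y) in enumerate(walk):
        for neighbour in ((x + 1, y), (x, y + 1)):  # count each pair once
            j = index.get(neighbour)
            if j is not None and abs(i - j) > 1 and sequence[i] == sequence[j] == "H":
                contacts += 1
    return -contacts


def ground_states(sequence):
    """Return (minimum energy, list of conformations attaining it)."""
    best, states = 0, []
    for walk in conformations(len(sequence)):
        energy = hp_energy(sequence, walk)
        if energy < best or not states:
            best, states = energy, [walk]
        elif energy == best:
            states.append(walk)
    return best, states
```

For the sequence `"HPPH"` the ground state is unique (energy -1, one conformation), whereas `"PPPP"` has no favorable contacts at all, so all five of its symmetry-distinct conformations are degenerate at energy 0.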
Another issue with lattice models is that they generally do not take into account the space taken up by amino acid side chains, instead considering only the α-carbon.[2] The side chain model addresses this by adding a side chain to the vertex adjacent to the α-carbon.[17]
References
Lau KF, Dill KA (1989). "A lattice statistical mechanics model of the conformational and sequence spaces of proteins". Macromolecules. 22 (10): 3986–97. Bibcode:1989MaMol..22.3986L. doi:10.1021/ma00200a030.
Berger B, Leighton T (1998). "Protein folding in the hydrophobic-hydrophilic (HP) model is NP-complete". Journal of Computational Biology. 5 (1): 27–40. doi:10.1089/cmb.1998.5.27. PMID 9541869.
Dubey SP, Kini NG, Balaji S, Kumar MS (2018). "A Review of Protein Structure Prediction Using Lattice Model". Critical Reviews in Biomedical Engineering. 46 (2): 147–162. doi:10.1615/critrevbiomedeng.2018026093. PMID 30055531.
Dill KA (March 1985). "Theory for the folding and stability of globular proteins". Biochemistry. 24 (6): 1501–9. doi:10.1021/bi00327a032. PMID 3986190.
Su SC, Lin CJ, Ting CK (December 2010). "An efficient hybrid of hill-climbing and genetic algorithm for 2D triangular protein structure prediction". 2010 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW). IEEE. pp. 51–56. doi:10.1109/BIBMW.2010.5703772. ISBN 978-1-4244-8303-7. S2CID 44932436.
Jiang M, Zhu B (February 2005). "Protein folding on the hexagonal lattice in the HP model". Journal of Bioinformatics and Computational Biology. 3 (1): 19–34. doi:10.1142/S0219720005000850. PMID 15751110.