Threading (protein sequence)

In molecular biology, protein threading, also known as fold recognition, is a method of protein modeling used for proteins that have the same fold as proteins of known structure but have no homologous proteins of known structure. It differs from homology modeling in that threading is used for proteins whose homologs have no structures deposited in the Protein Data Bank (PDB), whereas homology modeling is used for proteins whose homologs do. Threading works by using statistical knowledge of the relationship between the structures deposited in the PDB and the sequence of the protein to be modeled.

The prediction is made by "threading" (i.e. placing, aligning) each amino acid in the target sequence to a position in the template structure, and evaluating how well the target fits the template. After the best-fit template is selected, the structural model of the sequence is built based on the alignment with the chosen template. Protein threading is based on two basic observations: that the number of different folds in nature is fairly small (approximately 1300); and that 90% of the new structures submitted to the PDB in the past three years have similar structural folds to ones already in the PDB.

Classification of protein structure

The Structural Classification of Proteins database (SCOP) provides a detailed and comprehensive description of the structural and evolutionary relationships between proteins of known structure. Proteins are classified to reflect both structural and evolutionary relatedness. Many levels exist in the hierarchy, but the principal levels are family, superfamily, and fold:

  • Family (clear evolutionary relationship): Proteins clustered together into families are clearly evolutionarily related. Generally, this means that pairwise residue identities between the proteins are 30% and greater. However, in some cases similar functions and structures provide definitive evidence of common descent in the absence of high sequence identity; for example, many globins form a family though some members have sequence identities of only 15%.
  • Superfamily (probable common evolutionary origin): Proteins that have low sequence identities, but whose structural and functional features suggest that a common evolutionary origin is probable, are placed together in superfamilies. For example, actin, the ATPase domain of the heat shock protein, and hexokinase together form a superfamily.
  • Fold (major structural similarity): Proteins are defined as having a common fold if they have the same major secondary structures in the same arrangement and with the same topological connections. Different proteins with the same fold often have peripheral elements of secondary structure and turn regions that differ in size and conformation. In some cases, these differing peripheral regions may comprise half the structure. Proteins placed together in the same fold category may not have a common evolutionary origin: the structural similarities could arise just from the physics and chemistry of proteins favoring certain packing arrangements and chain topologies.

Method

A general paradigm of protein threading consists of the following four steps:

  1. The construction of a structure template database: Select protein structures from the protein structure databases as structural templates. This generally involves selecting protein structures from databases such as Protein Data Bank (PDB), Families of Structurally Similar Proteins database (FSSP), Structural Classification of Proteins database (SCOP), or CATH database, after removing protein structures with high sequence similarities.
  2. The design of the scoring function: Design a good scoring function to measure the fitness between target sequences and templates, based on knowledge of the known relationships between structures and sequences. A good scoring function should contain mutation potential, environment fitness potential, pairwise potential, secondary structure compatibilities, and gap penalties. The quality of the scoring function is closely related to the prediction accuracy, especially the alignment accuracy.
  3. Threading alignment: Align the target sequence with each of the structure templates by optimizing the designed scoring function. When the scoring function takes the pairwise contact potential into account, this step is one of the major computational tasks of a threading-based structure prediction program; when it does not, a dynamic programming algorithm can solve it.
  4. Threading prediction: Select the threading alignment that is statistically most probable as the threading prediction. Then construct a structure model for the target by placing the backbone atoms of the target sequence at their aligned backbone positions of the selected structural template.
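
The alignment and selection steps above can be illustrated with a minimal sketch. It assumes a deliberately simplified scoring function: only an environment fitness term and a linear gap penalty, with no pairwise potential, so ordinary dynamic programming applies. The names `fitness`, `thread_score`, and `best_template`, the hydrophobic residue set, and the score values are all hypothetical choices made for this example, not part of any threading program. The sketch also picks the template with the highest raw score, whereas the step above calls for the statistically most probable alignment.

```python
from typing import Dict, Tuple

# Hypothetical residue grouping, used only for this illustration.
HYDROPHOBIC = set("AVLIMFWC")


def fitness(residue: str, environment: str) -> float:
    """Toy environment fitness: hydrophobic residues fit buried ('B')
    positions, all other residues fit exposed ('E') positions."""
    buried = environment == "B"
    hydrophobic = residue in HYDROPHOBIC
    return 1.0 if buried == hydrophobic else -1.0


def thread_score(target: str, template_env: str, gap: float = -2.0) -> float:
    """Best global alignment score of a target sequence against a template
    described by one environment label per position (Needleman-Wunsch
    recurrence with a linear gap penalty)."""
    n, m = len(target), len(template_env)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # target residues left unaligned
    for j in range(1, m + 1):
        score[0][j] = j * gap  # template positions left unfilled
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + fitness(target[i - 1], template_env[j - 1]),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
    return score[n][m]


def best_template(target: str, library: Dict[str, str]) -> Tuple[str, float]:
    """Return the (name, score) of the highest-scoring template."""
    scored = ((name, thread_score(target, env)) for name, env in library.items())
    return max(scored, key=lambda item: item[1])
```

For example, `best_template("LKVEL", {"templateA": "BEBEB", "templateB": "EBEBE"})` selects `templateA`, because every residue of the target then lands in its preferred environment.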

Comparison with homology modeling

Homology modeling and protein threading are both template-based methods, and there is no rigorous boundary between them in terms of prediction techniques. They differ in the targets they address. Homology modeling is for targets that have homologous proteins of known structure (usually from the same family), while protein threading is for targets for which only fold-level homology can be found. In other words, homology modeling is for "easier" targets and protein threading is for "harder" targets.

Homology modeling treats the template in an alignment as a sequence, and only sequence homology is used for prediction. Protein threading treats the template in an alignment as a structure, and both sequence and structure information extracted from the alignment are used for prediction. When there is no significant homology found, protein threading can make a prediction based on the structure information. That also explains why protein threading may be more effective than homology modeling in many cases.

In practice, when the sequence identity in a sequence-sequence alignment is low (i.e. <25%), homology modeling may not produce a significant prediction. In this case, if distant homology is found for the target, protein threading can generate a good prediction.
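
As a rough illustration of the 25% rule of thumb above, the following sketch computes percent identity from a pairwise alignment. The function names are hypothetical, and the definition of identity used here (identical pairs divided by the number of aligned, non-gap columns) is one assumption among several conventions in use.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def suggested_method(identity: float, threshold: float = 25.0) -> str:
    """Apply the rule of thumb from the text; the threshold is not a hard limit."""
    return "homology modeling" if identity >= threshold else "protein threading"
```

For the alignment `ACDE-FG` against `ACQEHF-`, five columns are gap-free and four of them match, so the identity is 80% and homology modeling would be the suggested starting point.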

More about threading

Fold recognition methods can be broadly divided into two types: those that derive a 1-D profile for each structure in the fold library and align the target sequence to these profiles; and those that consider the full 3-D structure of the protein template. A simple example of a profile representation would be to take each amino acid in the structure and simply label it according to whether it is buried in the core of the protein or exposed on the surface. More elaborate profiles might take into account the local secondary structure (e.g. whether the amino acid is part of an alpha helix) or even evolutionary information (how conserved the amino acid is). In the 3-D representation, the structure is modeled as a set of inter-atomic distances, i.e. the distances are calculated between some or all of the atom pairs in the structure. This is a much richer and far more flexible description of the structure, but is much harder to use in calculating an alignment.

The profile-based fold recognition approach was first described by Bowie, Lüthy and Eisenberg in 1991.[1] The term threading was first coined by Jones, Taylor and Thornton in 1992,[2] and originally referred specifically to the use of a full 3-D atomic representation of the protein template in fold recognition. Today, the terms threading and fold recognition are frequently (though somewhat incorrectly) used interchangeably.
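
The two template representations described in this section can be sketched from a list of atomic coordinates (for instance, one point per residue). This is a minimal illustration under stated assumptions: the buried/exposed label is derived from a simple neighbour count, and the `radius` and `min_neighbors` cut-offs are arbitrary values chosen for the example rather than established parameters.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def distance_matrix(coords: List[Point]) -> List[List[float]]:
    """3-D representation: all pairwise distances between the given points."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def burial_profile(coords: List[Point], radius: float = 8.0,
                   min_neighbors: int = 3) -> str:
    """1-D representation: label each position 'B' (buried) if at least
    min_neighbors other points lie within radius, otherwise 'E' (exposed)."""
    labels = []
    for i, row in enumerate(distance_matrix(coords)):
        neighbors = sum(1 for j, d in enumerate(row) if j != i and d <= radius)
        labels.append("B" if neighbors >= min_neighbors else "E")
    return "".join(labels)
```

The profile is a string with one character per position, which is why a sequence can be aligned to it with standard sequence alignment machinery; the distance matrix keeps far more information but has no such one-position-at-a-time structure.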

Fold recognition methods are widely used and effective because it is believed that there is a strictly limited number of different protein folds in nature, mostly as a result of evolution but also due to constraints imposed by the basic physics and chemistry of polypeptide chains. There is, therefore, a good chance (currently 70-80%) that a protein which has a similar fold to the target protein has already been studied by X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy and can be found in the PDB. Currently there are nearly 1300 different protein folds known, but new folds are still being discovered every year, due in significant part to the ongoing structural genomics projects.

Many different algorithms have been proposed for finding the correct threading of a sequence onto a structure, though many make use of dynamic programming in some form. For full 3-D threading, the problem of identifying the best alignment is very difficult (it is an NP-hard problem for some models of threading).[citation needed] Researchers have made use of many methods, such as conditional random fields, simulated annealing, branch and bound, and linear programming, to arrive at heuristic solutions. It is interesting to compare threading methods to methods which attempt to align two protein structures (protein structural alignment), and indeed many of the same algorithms have been applied to both problems.
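
The difficulty of full 3-D threading comes from pairwise terms: the score of placing a residue at one template position depends on which residue ends up at every position in contact with it. The sketch below illustrates this with the simplest tractable case, gapless threading, in which the target is slid along the template without gaps and every offset is scored exhaustively. The contact list, the toy pair potential, and all function names are assumptions made for this example; none of the optimization methods named above is implemented here.

```python
from typing import List, Tuple

# Hypothetical residue grouping, used only for this illustration.
HYDROPHOBIC = set("AVLIMFWC")


def pair_potential(a: str, b: str) -> float:
    """Toy contact potential: a hydrophobic-hydrophobic contact is favourable."""
    return -1.0 if a in HYDROPHOBIC and b in HYDROPHOBIC else 0.0


def contact_energy(segment: str, contacts: List[Tuple[int, int]]) -> float:
    """Sum the pair potential over template contacts (i, j), where segment[i]
    is the target residue placed at template position i."""
    return sum(pair_potential(segment[i], segment[j]) for i, j in contacts)


def gapless_threading(target: str, template_length: int,
                      contacts: List[Tuple[int, int]]) -> Tuple[int, float]:
    """Try every gap-free placement of the target on the template and return
    the (offset, energy) with the lowest energy; ties keep the first offset."""
    if len(target) < template_length:
        raise ValueError("target is shorter than the template")
    best_offset, best_energy = 0, float("inf")
    for offset in range(len(target) - template_length + 1):
        segment = target[offset:offset + template_length]
        energy = contact_energy(segment, contacts)
        if energy < best_energy:
            best_offset, best_energy = offset, energy
    return best_offset, best_energy
```

Once gaps are allowed, the number of possible placements grows combinatorially and the energy no longer decomposes position by position, which is why heuristic or exact combinatorial searches are used instead of plain dynamic programming.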

Protein threading software

  • HHpred is a popular threading server which runs HHsearch, a widely used program for remote homology detection based on pairwise comparison of hidden Markov models.
  • RAPTOR is an integer-programming-based protein threading program. It has been replaced by a new protein threading program, RaptorX, which applies probabilistic graphical models and statistical inference to both single-template and multi-template protein threading.[3][4][5][6] RaptorX significantly outperforms RAPTOR and is especially good at aligning proteins with sparse sequence profiles. The RaptorX server is freely available to the public.
  • Phyre is a popular threading server combining HHsearch with ab initio and multiple-template modelling.
  • MUSTER is a standard threading algorithm based on dynamic programming and sequence profile-profile alignment. It also combines multiple structural resources to assist the sequence profile alignment.[7]
  • SPARKS X performs probabilistic sequence-to-structure matching between predicted one-dimensional structural properties of the query and the corresponding native properties of templates.[8]
  • BioShell is a threading algorithm that uses an optimized profile-to-profile dynamic programming algorithm combined with predicted secondary structure.[9]

References

  1. ^ Bowie JU, Lüthy R, Eisenberg D (1991). "A method to identify protein sequences that fold into a known three-dimensional structure". Science. 253 (5016): 164–170. Bibcode:1991Sci...253..164B. doi:10.1126/science.1853201. PMID 1853201.
  2. ^ Jones DT, Taylor WR, Thornton JM (1992). "A new approach to protein fold recognition". Nature. 358 (6381): 86–89. Bibcode:1992Natur.358...86J. doi:10.1038/358086a0. PMID 1614539. S2CID 4266346.
  3. ^ Peng J, Xu J (2011). "RaptorX: exploiting structure information for protein alignment by statistical inference". Proteins. 79 (Suppl 10): 161–171. doi:10.1002/prot.23175. PMC 3226909. PMID 21987485.
  4. ^ Peng J, Xu J (2010). "Low-homology protein threading". Bioinformatics. 26 (12): i294–i300. doi:10.1093/bioinformatics/btq192. PMC 2881377. PMID 20529920.
  5. ^ Peng J, Xu J (2011). "A multiple-template approach to protein threading". Proteins. 79 (6): 1930–1939. doi:10.1002/prot.23016. PMC 3092796. PMID 21465564.
  6. ^ Ma J, Wang S, Xu J (2012). "A conditional neural fields model for protein threading". Bioinformatics. 28 (12): i59–i66. doi:10.1093/bioinformatics/bts213. PMC 3371845. PMID 22689779.
  7. ^ Wu S, Zhang Y (2008). "MUSTER: Improving protein sequence profile–profile alignments by using multiple sources of structure information". Proteins. 72 (2): 547–56. doi:10.1002/prot.21945. PMC 2666101. PMID 18247410.
  8. ^ Yang Y, Faraggi E, Zhao H, Zhou Y (2011). "Improving protein fold recognition and template-based modeling by employing probabilistic-based matching between predicted one-dimensional structural properties of query and corresponding native properties of templates". Bioinformatics. 27 (15): 2076–2082. doi:10.1093/bioinformatics/btr350. PMC 3137224. PMID 21666270.
  9. ^ Gront D, Blaszczyk M, Wojciechowski P, Kolinski A (2012). "BioShell Threader: protein homology detection based on sequence profiles and secondary structure profiles". Nucleic Acids Research. 40 (W1): W257–W262. doi:10.1093/nar/gks555. PMC 3394251. PMID 22693216.
