UniProt

Content
Description: UniProt is the Universal Protein resource, a central repository of protein data created by combining the Swiss-Prot, TrEMBL and PIR-PSD databases.
Data types captured: Protein annotation
Organisms: All
Contact
Research center: EMBL-EBI, UK; SIB, Switzerland; PIR, US.
Primary citation: UniProt Consortium[1]
Access
Data format: Custom flat file, FASTA, GFF, RDF, XML.
Website: www.uniprot.org; www.uniprot.org/news/
Download URL: www.uniprot.org/downloads; complete data sets at ftp.uniprot.org
Web service URL: Yes; Java API and REST
Tools
Web: Advanced search, BLAST, ClustalO, bulk retrieval/download, ID mapping
Miscellaneous
License: Creative Commons Attribution-NoDerivs
Versioning: Yes
Data release frequency: 8 weeks
Curation policy: Yes; manual and automatic. Rules for automatic annotation generated by database curators and computational algorithms.
Bookmarkable entities: Yes; both individual protein entries and searches

UniProt is a freely accessible database of protein sequence and functional information, many entries being derived from genome sequencing projects. It contains a large amount of information about the biological function of proteins derived from the research literature. It is maintained by the UniProt consortium, which consists of several European bioinformatics organisations and a foundation from Washington, DC, USA.

The UniProt consortium

The UniProt consortium comprises the European Bioinformatics Institute (EBI), the Swiss Institute of Bioinformatics (SIB), and the Protein Information Resource (PIR). EBI, located at the Wellcome Trust Genome Campus in Hinxton, UK, hosts a large resource of bioinformatics databases and services. SIB, located in Geneva, Switzerland, maintains the ExPASy (Expert Protein Analysis System) servers that are a central resource for proteomics tools and databases. PIR, hosted by the National Biomedical Research Foundation (NBRF) at the Georgetown University Medical Center in Washington, DC, US, is heir to the oldest protein sequence database, Margaret Dayhoff's Atlas of Protein Sequence and Structure, first published in 1965.[2] In 2002, EBI, SIB, and PIR joined forces as the UniProt consortium.[3]

The roots of the UniProt databases

Each consortium member is heavily involved in protein database maintenance and annotation. Before the consortium was formed, EBI and SIB together produced the Swiss-Prot and TrEMBL databases, while PIR produced the Protein Sequence Database (PIR-PSD).[4][5][6] These databases coexisted with differing protein sequence coverage and annotation priorities.

Swiss-Prot was created in 1986 by Amos Bairoch during his PhD. It was developed by the Swiss Institute of Bioinformatics and subsequently by Rolf Apweiler at the European Bioinformatics Institute.[7][8][9] Swiss-Prot aimed to provide reliable protein sequences with a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications and variants), a minimal level of redundancy and a high level of integration with other databases. Because sequence data were being generated faster than Swiss-Prot could annotate them, TrEMBL (Translated EMBL Nucleotide Sequence Data Library) was created to provide automated annotations for proteins not yet in Swiss-Prot. Meanwhile, PIR maintained the PIR-PSD and related databases, including iProClass, a database of protein sequences and curated families.

The consortium members pooled their overlapping resources and expertise, and launched UniProt in December 2003.[10]

Organization of the UniProt databases

UniProt provides four core databases: UniProtKB (with sub-parts Swiss-Prot and TrEMBL), UniParc, UniRef and Proteome.

UniProtKB

UniProt Knowledgebase (UniProtKB) is a protein database partially curated by experts, consisting of two sections: UniProtKB/Swiss-Prot (containing reviewed, manually annotated entries) and UniProtKB/TrEMBL (containing unreviewed, automatically annotated entries).[11] As of 22 February 2023, release "2023_01" of UniProtKB/Swiss-Prot contains 569,213 sequence entries (comprising 205,728,242 amino acids abstracted from 291,046 references) and release "2023_01" of UniProtKB/TrEMBL contains 245,871,724 sequence entries (comprising 85,739,380,194 amino acids).[12]
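To put the release figures above in proportion, the following minimal sketch derives the mean sequence length in each section and the share of entries that are reviewed. It uses only the numbers quoted for release 2023_01; the variable and function names are illustrative and not part of any UniProt tool.

```python
# Figures quoted above for UniProtKB release 2023_01.
swiss_prot_entries = 569_213
swiss_prot_residues = 205_728_242
trembl_entries = 245_871_724
trembl_residues = 85_739_380_194


def mean_length(residues: int, entries: int) -> float:
    """Average number of amino acids per sequence entry."""
    return residues / entries


swiss_prot_mean = mean_length(swiss_prot_residues, swiss_prot_entries)
trembl_mean = mean_length(trembl_residues, trembl_entries)

# Share of all UniProtKB entries that belong to the reviewed section.
reviewed_fraction = swiss_prot_entries / (swiss_prot_entries + trembl_entries)

print(f"Swiss-Prot mean length: {swiss_prot_mean:.1f} residues")
print(f"TrEMBL mean length:     {trembl_mean:.1f} residues")
print(f"Reviewed share:         {reviewed_fraction:.2%}")
```

Both sections average roughly 350 to 360 residues per entry, while reviewed entries make up well under 1% of UniProtKB in this release.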

UniProtKB/Swiss-Prot

UniProtKB/Swiss-Prot is a manually annotated, non-redundant protein sequence database. It combines information extracted from scientific literature and biocurator-evaluated computational analysis. The aim of UniProtKB/Swiss-Prot is to provide all known relevant information about a particular protein. Annotation is regularly reviewed to keep up with current scientific findings. The manual annotation of an entry involves detailed analysis of the protein sequence and of the scientific literature.[13]

Sequences from the same gene and the same species are merged into the same database entry. Differences between sequences are identified, and their cause documented (for example alternative splicing, natural variation, incorrect initiation sites, incorrect exon boundaries, frameshifts, unidentified conflicts). A range of sequence analysis tools is used in the annotation of UniProtKB/Swiss-Prot entries. Computer-predictions are manually evaluated, and relevant results selected for inclusion in the entry. These predictions include post-translational modifications, transmembrane domains and topology, signal peptides, domain identification, and protein family classification.[13][14]
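The merging rule described above can be illustrated with a small sketch. It groups records by gene and species and keeps any differing sequences alongside the entry so that their cause can be documented. The record layout, the function name and the choice of the first sequence seen as the displayed one are assumptions made for illustration; in UniProtKB/Swiss-Prot these decisions are made by curators.

```python
from collections import defaultdict


def merge_by_gene_and_species(records):
    """Build one entry per (gene, species) from (gene, species, sequence) records."""
    grouped = defaultdict(list)
    for gene, species, sequence in records:
        key = (gene, species)
        # Keep each distinct sequence once, in the order it was first seen.
        if sequence not in grouped[key]:
            grouped[key].append(sequence)

    entries = {}
    for key, sequences in grouped.items():
        entries[key] = {
            # Assumption: the first sequence seen stands in for the displayed one.
            "displayed_sequence": sequences[0],
            # Differences a curator would explain (splicing, variation, conflicts).
            "differing_sequences": sequences[1:],
        }
    return entries


records = [
    ("HBA1", "Homo sapiens", "MVLSPADK"),
    ("HBA1", "Homo sapiens", "MVLSPADK"),  # exact duplicate, absorbed
    ("HBA1", "Homo sapiens", "MVLSAADK"),  # differs at one position
    ("HBA1", "Mus musculus", "MVLSGEDK"),  # other species, separate entry
]
entries = merge_by_gene_and_species(records)
```

The sequences here are short made-up strings, not real database records.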

Relevant publications are identified by searching databases such as PubMed. The full text of each paper is read, and information is extracted and added to the entry. Annotation arising from the scientific literature covers many aspects of a protein, such as its function, domain structure, post-translational modifications and variants.[10][13][14]

Annotated entries undergo quality assurance before inclusion into UniProtKB/Swiss-Prot. When new data becomes available, entries are updated.

UniProtKB/TrEMBL

UniProtKB/TrEMBL contains high-quality computationally analyzed records, which are enriched with automatic annotation. It was introduced in response to increased dataflow resulting from genome projects, as the time- and labour-consuming manual annotation process of UniProtKB/Swiss-Prot could not be broadened to include all available protein sequences.[10] The translations of annotated coding sequences in the EMBL-Bank/GenBank/DDBJ nucleotide sequence database are automatically processed and entered in UniProtKB/TrEMBL. UniProtKB/TrEMBL also contains sequences from PDB, and from gene prediction, including Ensembl, RefSeq and CCDS.[15] Since 22 July 2021 it also includes structures predicted with AlphaFold2.[16]
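When UniProtKB is downloaded in FASTA format, the reviewed and unreviewed sections can be told apart from the header line of each record. The sketch below assumes the conventional UniProtKB header layout, `>db|accession|entry_name description`, where `db` is `sp` for Swiss-Prot and `tr` for TrEMBL; this layout should be checked against the current UniProt documentation. The function name and the returned dictionary keys are illustrative.

```python
import re

# Assumed header layout: >db|accession|entry_name free-text description
HEADER = re.compile(r"^>(sp|tr)\|([^|]+)\|(\S+)\s+(.*)$")

SECTIONS = {"sp": "UniProtKB/Swiss-Prot", "tr": "UniProtKB/TrEMBL"}


def parse_header(line: str) -> dict:
    """Split a UniProtKB-style FASTA header into its main fields."""
    match = HEADER.match(line.strip())
    if match is None:
        raise ValueError(f"not a UniProtKB-style FASTA header: {line!r}")
    db, accession, entry_name, description = match.groups()
    return {
        "section": SECTIONS[db],
        "reviewed": db == "sp",  # Swiss-Prot entries are the reviewed ones
        "accession": accession,
        "entry_name": entry_name,
        "description": description,
    }


reviewed = parse_header(
    ">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens OX=9606 GN=HBA1 PE=1 SV=2"
)
# Made-up TrEMBL-style header, for illustration only.
unreviewed = parse_header(">tr|X0000001|X0000001_TEST Uncharacterized protein OS=Test organism")
```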

UniParc

UniProt Archive (UniParc) is a comprehensive and non-redundant database, which contains all the protein sequences from the main, publicly available protein sequence databases.[17] Proteins may exist in several different source databases, and in multiple copies in the same database. In order to avoid redundancy, UniParc stores each unique sequence only once. Identical sequences are merged, regardless of whether they are from the same or different species. Each sequence is given a stable and unique identifier (UPI), making it possible to identify the same protein from different source databases. UniParc contains only protein sequences, with no annotation. Database cross-references in UniParc entries allow further information about the protein to be retrieved from the source databases. When sequences in the source databases change, these changes are tracked by UniParc and history of all changes is archived.
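The storage rule described above, in which each unique sequence is kept once with a stable identifier and cross-references to its sources, can be sketched as a dictionary keyed on the sequence itself. This is a toy model, not UniParc's implementation: the class names, the identifier format (`UPI` followed by a zero-padded hexadecimal counter) and the source database names are assumptions, and change history is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class ArchiveEntry:
    upi: str
    sequence: str
    # (source database, identifier in that database) pairs
    cross_refs: set = field(default_factory=set)


class ToyArchive:
    """Stores each distinct sequence once, whatever its species or source."""

    def __init__(self):
        self._by_sequence = {}

    def add(self, sequence: str, source_db: str, source_id: str) -> str:
        entry = self._by_sequence.get(sequence)
        if entry is None:
            # Hypothetical identifier scheme: a running counter in hexadecimal.
            upi = f"UPI{len(self._by_sequence) + 1:010X}"
            entry = ArchiveEntry(upi, sequence)
            self._by_sequence[sequence] = entry
        entry.cross_refs.add((source_db, source_id))
        return entry.upi

    def lookup(self, sequence: str) -> ArchiveEntry:
        return self._by_sequence[sequence]

    def __len__(self) -> int:
        return len(self._by_sequence)


archive = ToyArchive()
first = archive.add("MKTAYIAKQR", "DatabaseA", "A001")
second = archive.add("MKTAYIAKQR", "DatabaseB", "B777")  # same sequence, other source
third = archive.add("GGGGSSSSPP", "DatabaseA", "A002")
```

Adding the same sequence from a second source returns the existing identifier and only extends the cross-references, which is what allows the same protein to be traced across source databases.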

Source databases

UniParc currently draws its protein sequences from a number of publicly available databases.

UniRef

The UniProt Reference Clusters (UniRef) consist of three databases of clustered sets of protein sequences from UniProtKB and selected UniParc records.[20] The UniRef100 database combines identical sequences and sequence fragments (from any organism) into a single UniRef entry. The sequence of a representative protein, the accession numbers of all the merged entries and links to the corresponding UniProtKB and UniParc records are displayed. UniRef100 sequences are clustered using the CD-HIT algorithm to build UniRef90 and UniRef50.[20][21] Each cluster is composed of sequences that have at least 90% or 50% sequence identity, respectively, to the longest sequence. Clustering sequences significantly reduces database size, enabling faster sequence searches.
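The clustering step can be illustrated with a greedy, longest-first scheme of the kind CD-HIT uses: sequences are processed in order of decreasing length, and each one joins the first cluster whose representative it matches at or above the identity threshold, or else starts a new cluster. The sketch below is a toy version under stated assumptions. The identity measure is a stand-in built on Python's `difflib`, not the alignment-based identity CD-HIT computes, and the function names and example sequences are made up. Sequences are assumed to be non-empty.

```python
from difflib import SequenceMatcher


def toy_identity(a: str, b: str) -> float:
    """Matched residues divided by the length of the shorter sequence.

    Stand-in measure for illustration only.
    """
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / min(len(a), len(b))


def greedy_cluster(sequences, threshold):
    """Assign each sequence to the first cluster whose representative it matches."""
    clusters = {}  # representative sequence -> list of member sequences
    # Longest first, so each representative is the longest member of its cluster.
    for seq in sorted(sequences, key=len, reverse=True):
        for representative in clusters:
            if toy_identity(representative, seq) >= threshold:
                clusters[representative].append(seq)
                break
        else:
            # No existing representative is similar enough: start a new cluster.
            clusters[seq] = [seq]
    return clusters


sequences = ["MKTAYIAK", "MKTAYIAKQR", "GGGGSSSSPP", "MKTAYIAKQK"]
loose = greedy_cluster(sequences, threshold=0.90)
strict = greedy_cluster(sequences, threshold=0.95)
```

With these four sequences the 0.90 threshold yields two clusters and the 0.95 threshold yields three, which mirrors how a lower identity threshold gives a smaller, coarser set of representatives.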

UniRef is available from the UniProt FTP site.

Funding

UniProt is funded by grants from the National Human Genome Research Institute, the National Institutes of Health (NIH), the European Commission, the Swiss Federal Government through the Federal Office of Education and Science, NCI-caBIG, and the US Department of Defense.[11]

References

  1. ^ UniProt Consortium (January 2015). "UniProt: a hub for protein information". Nucleic Acids Research. 43 (Database issue): D204–12. doi:10.1093/nar/gku989. PMC 4384041. PMID 25348405.
  2. ^ Dayhoff, Margaret O. (1965). Atlas of protein sequence and structure. Silver Spring, Md: National Biomedical Research Foundation.
  3. ^ "2002 Release: NHGRI Funds Global Protein Database". National Human Genome Research Institute (NHGRI). Archived from the original on 24 September 2015. Retrieved 14 April 2018.
  4. ^ O'Donovan, C.; Martin, M. J.; Gattiker, A.; Gasteiger, E.; Bairoch, A.; Apweiler, R. (2002). "High-quality protein knowledge resource: SWISS-PROT and TrEMBL". Briefings in Bioinformatics. 3 (3): 275–284. doi:10.1093/bib/3.3.275. PMID 12230036. Archived from the original on 2024-01-24. Retrieved 2024-01-24.
  5. ^ Wu, C. H.; Yeh, L. S.; Huang, H.; Arminski, L.; Castro-Alvear, J.; Chen, Y.; Hu, Z.; Kourtesis, P.; Ledley, R. S.; Suzek, B. E.; Vinayaka, C. R.; Zhang, J.; Barker, W. C. (2003). "The Protein Information Resource". Nucleic Acids Research. 31 (1): 345–347. doi:10.1093/nar/gkg040. PMC 165487. PMID 12520019.
  6. ^ Boeckmann, B.; Bairoch, A.; Apweiler, R.; Blatter, M. C.; Estreicher, A.; Gasteiger, E.; Martin, M. J.; Michoud, K.; O'Donovan, C.; Phan, I.; Pilbout, S.; Schneider, M. (2003). "The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003". Nucleic Acids Research. 31 (1): 365–370. doi:10.1093/nar/gkg095. PMC 165542. PMID 12520024.
  7. ^ Bairoch, A.; Apweiler, R. (1996). "The SWISS-PROT protein sequence data bank and its new supplement TREMBL". Nucleic Acids Research. 24 (1): 21–25. doi:10.1093/nar/24.1.21. PMC 145613. PMID 8594581.
  8. ^ Bairoch, A. (2000). "Serendipity in bioinformatics, the tribulations of a Swiss bioinformatician through exciting times!". Bioinformatics. 16 (1): 48–64. doi:10.1093/bioinformatics/16.1.48. PMID 10812477. Archived from the original on 2024-02-05. Retrieved 2024-02-05.
  9. ^ Altairac, Séverine (August 2006). "Naissance d’une banque de données: Interview du prof. Amos Bairoch". Protéines à la Une. ISSN 1660-9824. Archived from the original on 2010-07-12.
  10. ^ a b c Apweiler, R.; Bairoch, A.; Wu, C. H. (2004). "Protein sequence databases". Current Opinion in Chemical Biology. 8 (1): 76–80. doi:10.1016/j.cbpa.2003.12.004. PMID 15036160.
  11. ^ a b UniProt Consortium (2009). "The Universal Protein Resource (UniProt) in 2010". Nucleic Acids Research. 38 (Database issue): D142–D148. doi:10.1093/nar/gkp846. PMC 2808944. PMID 19843607.
  12. ^ "UniProtKB/Swiss-Prot Release 2023_01 statistics". web.expasy.org. Archived from the original on 4 April 2023. Retrieved 31 March 2023.
  13. ^ a b c "How do we manually annotate a UniProtKB entry?". UniProt. September 21, 2011. Archived from the original on Dec 13, 2013. Retrieved 14 April 2018.
  14. ^ a b Apweiler, R.; Bairoch, A.; Wu, C. H.; Barker, W. C.; Boeckmann, B.; Ferro, S.; Gasteiger, E.; Huang, H.; Lopez, R.; Magrane, M.; Martin, M. J.; Natale, D. A.; O'Donovan, C.; Redaschi, N.; Yeh, L. S. (2004). "UniProt: The Universal Protein knowledgebase". Nucleic Acids Research. 32 (Database issue): D115–D119. doi:10.1093/nar/gkh131. PMC 308865. PMID 14681372.
  15. ^ "Where do the UniProtKB protein sequences come from?". UniProt. September 21, 2011. Archived from the original on Dec 15, 2013. Retrieved 14 April 2018.
  16. ^ Hassabis, Demis (22 July 2021). "Putting the power of AlphaFold into the world's hands". DeepMind. Archived from the original on 24 July 2021. Retrieved 24 July 2021.
  17. ^ Leinonen, R.; Diez, F. G.; Binns, D.; Fleischmann, W.; Lopez, R.; Apweiler, R. (2004). "UniProt archive". Bioinformatics. 20 (17): 3236–3237. doi:10.1093/bioinformatics/bth191. PMID 15044231. Archived (PDF) from the original on Mar 30, 2024.
  18. ^ "Protein Research Foundation". Archived from the original on 2010-08-30. Retrieved 2010-08-25.
  19. ^ ftp://ftp.isrec.isb-sib.ch/pub/databases/trome
  20. ^ a b Suzek, B. E.; Huang, H.; McGarvey, P.; Mazumder, R.; Wu, C. H. (2007). "UniRef: Comprehensive and non-redundant UniProt reference clusters". Bioinformatics. 23 (10): 1282–1288. doi:10.1093/bioinformatics/btm098. PMID 17379688.
  21. ^ Li, W.; Jaroszewski, L.; Godzik, A. (2001). "Clustering of highly homologous sequences to reduce the size of large protein databases". Bioinformatics. 17 (3): 282–283. doi:10.1093/bioinformatics/17.3.282. PMID 11294794.
