UniProt

Content
Description: UniProt is the Universal Protein resource, a central repository of protein data created by combining the Swiss-Prot, TrEMBL and PIR-PSD databases.
Data types captured: Protein annotation
Organisms: All

Contact
Research center: EMBL-EBI, UK; SIB, Switzerland; PIR, US.
Primary citation: UniProt Consortium[1]

Access
Data format: Custom flat file, FASTA, GFF, RDF, XML.
Website: www.uniprot.org; www.uniprot.org/news/
Download URL: www.uniprot.org/downloads; complete data sets at ftp.uniprot.org
Web service URL: Yes – Java API and REST

Tools
Web: Advanced search, BLAST, ClustalO, bulk retrieval/download, ID mapping

Miscellaneous
License: Creative Commons Attribution-NoDerivs
Versioning: Yes
Data release frequency: 8 weeks
Curation policy: Yes – manual and automatic. Rules for automatic annotation generated by database curators and computational algorithms.
Bookmarkable entities: Yes – both individual protein entries and searches

UniProt is a freely accessible database of protein sequence and functional information, many entries being derived from genome sequencing projects. It contains a large amount of information about the biological function of proteins derived from the research literature. It is maintained by the UniProt consortium, which consists of several European bioinformatics organisations and a foundation from Washington, DC, USA.

The UniProt consortium

The UniProt consortium comprises the European Bioinformatics Institute (EBI), the Swiss Institute of Bioinformatics (SIB), and the Protein Information Resource (PIR). EBI, located at the Wellcome Trust Genome Campus in Hinxton, UK, hosts a large resource of bioinformatics databases and services. SIB, located in Geneva, Switzerland, maintains the ExPASy (Expert Protein Analysis System) servers that are a central resource for proteomics tools and databases. PIR, hosted by the National Biomedical Research Foundation (NBRF) at the Georgetown University Medical Center in Washington, DC, US, is heir to the oldest protein sequence database, Margaret Dayhoff's Atlas of Protein Sequence and Structure, first published in 1965.[2] In 2002, EBI, SIB, and PIR joined forces as the UniProt consortium.[3]

The roots of the UniProt databases

Each consortium member is heavily involved in protein database maintenance and annotation. Before UniProt was formed, EBI and SIB together produced the Swiss-Prot and TrEMBL databases, while PIR produced the Protein Sequence Database (PIR-PSD).[4][5][6] These databases coexisted with differing protein sequence coverage and annotation priorities.

Swiss-Prot was created in 1986 by Amos Bairoch during his PhD. It was developed at the Swiss Institute of Bioinformatics and subsequently by Rolf Apweiler at the European Bioinformatics Institute.[7][8][9] Swiss-Prot aimed to provide reliable protein sequences associated with a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications and variants), a minimal level of redundancy, and a high level of integration with other databases. Because sequence data were being generated faster than Swiss-Prot could annotate them, TrEMBL (Translated EMBL Nucleotide Sequence Data Library) was created to provide automated annotations for proteins not yet in Swiss-Prot. Meanwhile, PIR maintained the PIR-PSD and related databases, including iProClass, a database of protein sequences and curated families.

The consortium members pooled their overlapping resources and expertise, and launched UniProt in December 2003.[10]

Organization of the UniProt databases

UniProt provides four core databases: UniProtKB (with sub-parts Swiss-Prot and TrEMBL), UniParc, UniRef and Proteome.

UniProtKB

UniProt Knowledgebase (UniProtKB) is a protein database partially curated by experts, consisting of two sections: UniProtKB/Swiss-Prot (containing reviewed, manually annotated entries) and UniProtKB/TrEMBL (containing unreviewed, automatically annotated entries).[11] As of 22 February 2023, release "2023_01" of UniProtKB/Swiss-Prot contains 569,213 sequence entries (comprising 205,728,242 amino acids abstracted from 291,046 references) and release "2023_01" of UniProtKB/TrEMBL contains 245,871,724 sequence entries (comprising 85,739,380,194 amino acids).[12]
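As a rough worked example, the release figures quoted above can be turned into an average entry length and the share of entries that are reviewed. The sketch below only divides the totals given in the text; the dictionary and function names are illustrative and are not part of any UniProt tool.

```python
# Release 2023_01 figures, copied from the paragraph above.
SWISS_PROT = {"entries": 569_213, "amino_acids": 205_728_242}
TREMBL = {"entries": 245_871_724, "amino_acids": 85_739_380_194}


def mean_length(section):
    """Average number of amino acids per sequence entry in a section."""
    return section["amino_acids"] / section["entries"]


def reviewed_fraction(reviewed, unreviewed):
    """Share of all UniProtKB entries that sit in the reviewed section."""
    total = reviewed["entries"] + unreviewed["entries"]
    return reviewed["entries"] / total


if __name__ == "__main__":
    print(f"Swiss-Prot mean length: {mean_length(SWISS_PROT):.1f} residues")
    print(f"TrEMBL mean length:     {mean_length(TREMBL):.1f} residues")
    print(f"Reviewed share:         {reviewed_fraction(SWISS_PROT, TREMBL):.2%}")
```

Both sections average a few hundred residues per entry, while reviewed entries make up well under 1% of UniProtKB in this release.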

UniProtKB/Swiss-Prot

UniProtKB/Swiss-Prot is a manually annotated, non-redundant protein sequence database. It combines information extracted from scientific literature and biocurator-evaluated computational analysis. The aim of UniProtKB/Swiss-Prot is to provide all known relevant information about a particular protein. Annotation is regularly reviewed to keep up with current scientific findings. The manual annotation of an entry involves detailed analysis of the protein sequence and of the scientific literature.[13]

Sequences from the same gene and the same species are merged into the same database entry. Differences between sequences are identified, and their cause documented (for example alternative splicing, natural variation, incorrect initiation sites, incorrect exon boundaries, frameshifts, unidentified conflicts). A range of sequence analysis tools is used in the annotation of UniProtKB/Swiss-Prot entries. Computer-predictions are manually evaluated, and relevant results selected for inclusion in the entry. These predictions include post-translational modifications, transmembrane domains and topology, signal peptides, domain identification, and protein family classification.[13][14]

Relevant publications are identified by searching databases such as PubMed. The full text of each paper is read, and information is extracted and added to the entry as annotation.[10][13][14]

Annotated entries undergo quality assurance before inclusion into UniProtKB/Swiss-Prot. When new data becomes available, entries are updated.

UniProtKB/TrEMBL

UniProtKB/TrEMBL contains high-quality computationally analyzed records, which are enriched with automatic annotation. It was introduced in response to increased dataflow resulting from genome projects, as the time- and labour-consuming manual annotation process of UniProtKB/Swiss-Prot could not be broadened to include all available protein sequences.[10] The translations of annotated coding sequences in the EMBL-Bank/GenBank/DDBJ nucleotide sequence database are automatically processed and entered in UniProtKB/TrEMBL. UniProtKB/TrEMBL also contains sequences from PDB, and from gene prediction, including Ensembl, RefSeq and CCDS.[15] Since 22 July 2021 it also includes structures predicted with AlphaFold2.[16]

UniParc

UniProt Archive (UniParc) is a comprehensive and non-redundant database, which contains all the protein sequences from the main, publicly available protein sequence databases.[17] Proteins may exist in several different source databases, and in multiple copies in the same database. In order to avoid redundancy, UniParc stores each unique sequence only once. Identical sequences are merged, regardless of whether they are from the same or different species. Each sequence is given a stable and unique identifier (UPI), making it possible to identify the same protein from different source databases. UniParc contains only protein sequences, with no annotation. Database cross-references in UniParc entries allow further information about the protein to be retrieved from the source databases. When sequences in the source databases change, these changes are tracked by UniParc and history of all changes is archived.
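The store-once behaviour described above can be illustrated with a small in-memory sketch. This is not UniParc's implementation: the class name, the counter-based identifier format and the source database labels are all assumptions made for illustration. It shows only the idea that identical sequences share one identifier while cross-references to every source are kept.

```python
class SequenceArchive:
    """Toy archive: one record per unique sequence, many cross-references."""

    def __init__(self):
        self._id_by_sequence = {}   # normalized sequence -> identifier
        self._cross_refs = {}       # identifier -> set of (source_db, source_id)

    @staticmethod
    def _normalize(sequence):
        # Ignore case and whitespace so formatting differences do not
        # create spurious duplicates.
        return "".join(sequence.split()).upper()

    def add(self, sequence, source_db, source_id):
        """Register a sequence from a source database; return its identifier."""
        key = self._normalize(sequence)
        identifier = self._id_by_sequence.get(key)
        if identifier is None:
            # Hypothetical identifier scheme: a running counter in hexadecimal.
            identifier = f"UPI{len(self._id_by_sequence) + 1:010X}"
            self._id_by_sequence[key] = identifier
            self._cross_refs[identifier] = set()
        self._cross_refs[identifier].add((source_db, source_id))
        return identifier

    def cross_refs(self, identifier):
        """All (source_db, source_id) pairs recorded for one identifier."""
        return set(self._cross_refs[identifier])

    def __len__(self):
        return len(self._id_by_sequence)
```

Adding the same sequence from two different sources returns the same identifier both times, and the archive's length grows only when a sequence is new.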

Source databases

UniParc currently contains protein sequences from a range of publicly available databases.

UniRef

The UniProt Reference Clusters (UniRef) consist of three databases of clustered sets of protein sequences from UniProtKB and selected UniParc records.[20] The UniRef100 database combines identical sequences and sequence fragments (from any organism) into a single UniRef entry. The sequence of a representative protein, the accession numbers of all the merged entries and links to the corresponding UniProtKB and UniParc records are displayed. UniRef100 sequences are clustered using the CD-HIT algorithm to build UniRef90 and UniRef50.[20][21] Each cluster is composed of sequences that have at least 90% or 50% sequence identity, respectively, to the longest sequence. Clustering sequences significantly reduces database size, enabling faster sequence searches.
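The clustering idea described above, in which each cluster is defined by identity to its longest sequence, can be sketched as a greedy pass over sequences sorted by length. This is a simplified illustration and not CD-HIT itself: the identity measure here uses Python's `difflib` matching blocks as a crude stand-in for alignment-based identity, and the function names are hypothetical.

```python
from difflib import SequenceMatcher


def identity(a, b):
    """Rough identity: matched residues divided by the shorter length.

    A stand-in for alignment identity, used only for illustration.
    """
    if not a or not b:
        return 0.0
    blocks = SequenceMatcher(None, a, b, autojunk=False).get_matching_blocks()
    matched = sum(block.size for block in blocks)
    return matched / min(len(a), len(b))


def greedy_cluster(sequences, threshold):
    """Cluster sequences greedily, longest first.

    Each sequence joins the first cluster whose representative (its longest
    member) it matches at or above `threshold`; otherwise it starts a new
    cluster and becomes that cluster's representative.
    Returns a dict mapping representative -> list of member sequences.
    """
    clusters = {}
    for seq in sorted(sequences, key=len, reverse=True):
        for representative, members in clusters.items():
            if identity(seq, representative) >= threshold:
                members.append(seq)
                break
        else:
            clusters[seq] = [seq]
    return clusters
```

Calling `greedy_cluster(seqs, 0.9)` and `greedy_cluster(seqs, 0.5)` on the same input gives progressively fewer, larger clusters, which is why the lower-identity sets are smaller and faster to search.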

UniRef is available from the UniProt FTP site.

Funding

UniProt is funded by grants from the National Human Genome Research Institute, the National Institutes of Health (NIH), the European Commission, the Swiss Federal Government through the Federal Office of Education and Science, NCI-caBIG, and the US Department of Defense.[11]

References

  1. ^ UniProt Consortium (January 2015). "UniProt: a hub for protein information". Nucleic Acids Research. 43 (Database issue): D204–12. doi:10.1093/nar/gku989. PMC 4384041. PMID 25348405.
  2. ^ Dayhoff, Margaret O. (1965). Atlas of protein sequence and structure. Silver Spring, Md: National Biomedical Research Foundation.
  3. ^ "2002 Release: NHGRI Funds Global Protein Database". National Human Genome Research Institute (NHGRI). Archived from the original on 24 September 2015. Retrieved 14 April 2018.
  4. ^ O'Donovan, C.; Martin, M. J.; Gattiker, A.; Gasteiger, E.; Bairoch, A.; Apweiler, R. (2002). "High-quality protein knowledge resource: SWISS-PROT and TrEMBL". Briefings in Bioinformatics. 3 (3): 275–284. doi:10.1093/bib/3.3.275. PMID 12230036. Archived from the original on 2024-01-24. Retrieved 2024-01-24.
  5. ^ Wu, C. H.; Yeh, L. S.; Huang, H.; Arminski, L.; Castro-Alvear, J.; Chen, Y.; Hu, Z.; Kourtesis, P.; Ledley, R. S.; Suzek, B. E.; Vinayaka, C. R.; Zhang, J.; Barker, W. C. (2003). "The Protein Information Resource". Nucleic Acids Research. 31 (1): 345–347. doi:10.1093/nar/gkg040. PMC 165487. PMID 12520019.
  6. ^ Boeckmann, B.; Bairoch, A.; Apweiler, R.; Blatter, M. C.; Estreicher, A.; Gasteiger, E.; Martin, M. J.; Michoud, K.; O'Donovan, C.; Phan, I.; Pilbout, S.; Schneider, M. (2003). "The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003". Nucleic Acids Research. 31 (1): 365–370. doi:10.1093/nar/gkg095. PMC 165542. PMID 12520024.
  7. ^ Bairoch, A.; Apweiler, R. (1996). "The SWISS-PROT protein sequence data bank and its new supplement TREMBL". Nucleic Acids Research. 24 (1): 21–25. doi:10.1093/nar/24.1.21. PMC 145613. PMID 8594581.
  8. ^ Bairoch, A. (2000). "Serendipity in bioinformatics, the tribulations of a Swiss bioinformatician through exciting times!". Bioinformatics. 16 (1): 48–64. doi:10.1093/bioinformatics/16.1.48. PMID 10812477. Archived from the original on 2024-02-05. Retrieved 2024-02-05.
  9. ^ Séverine Altairac, "Naissance d’une banque de données: Interview du prof. Amos Bairoch". Protéines à la Une, August 2006. ISSN 1660-9824. Archived 2010-07-12 at the Wayback Machine.
  10. ^ a b c Apweiler, R.; Bairoch, A.; Wu, C. H. (2004). "Protein sequence databases". Current Opinion in Chemical Biology. 8 (1): 76–80. doi:10.1016/j.cbpa.2003.12.004. PMID 15036160.
  11. ^ a b UniProt Consortium (2009). "The Universal Protein Resource (UniProt) in 2010". Nucleic Acids Research. 38 (Database issue): D142–D148. doi:10.1093/nar/gkp846. PMC 2808944. PMID 19843607.
  12. ^ "UniProtKB/Swiss-Prot Release 2023_01 statistics". web.expasy.org. Archived from the original on 4 April 2023. Retrieved 31 March 2023.
  13. ^ a b c "How do we manually annotate a UniProtKB entry?". UniProt. September 21, 2011. Archived from the original on Dec 13, 2013. Retrieved 14 April 2018.
  14. ^ a b Apweiler, R.; Bairoch, A.; Wu, C. H.; Barker, W. C.; Boeckmann, B.; Ferro, S.; Gasteiger, E.; Huang, H.; Lopez, R.; Magrane, M.; Martin, M. J.; Natale, D. A.; O'Donovan, C.; Redaschi, N.; Yeh, L. S. (2004). "UniProt: The Universal Protein knowledgebase". Nucleic Acids Research. 32 (Database issue): D115–D119. doi:10.1093/nar/gkh131. PMC 308865. PMID 14681372.
  15. ^ "Where do the UniProtKB protein sequences come from?". UniProt. September 21, 2011. Archived from the original on Dec 15, 2013. Retrieved 14 April 2018.
  16. ^ Hassabis, Demis (22 July 2021). "Putting the power of AlphaFold into the world's hands". DeepMind. Archived from the original on 24 July 2021. Retrieved 24 July 2021.
  17. ^ Leinonen, R.; Diez, F. G.; Binns, D.; Fleischmann, W.; Lopez, R.; Apweiler, R. (2004). "UniProt archive". Bioinformatics. 20 (17): 3236–3237. doi:10.1093/bioinformatics/bth191. PMID 15044231. Archived (PDF) from the original on Mar 30, 2024.
  18. ^ "Protein Research Foundation". Archived from the original on 2010-08-30. Retrieved 2010-08-25.
  19. ^ ftp://ftp.isrec.isb-sib.ch/pub/databases/trome (dead link)
  20. ^ a b Suzek, B. E.; Huang, H.; McGarvey, P.; Mazumder, R.; Wu, C. H. (2007). "UniRef: Comprehensive and non-redundant UniProt reference clusters". Bioinformatics. 23 (10): 1282–1288. doi:10.1093/bioinformatics/btm098. PMID 17379688.
  21. ^ Li, W.; Jaroszewski, L.; Godzik, A. (2001). "Clustering of highly homologous sequences to reduce the size of large protein databases". Bioinformatics. 17 (3): 282–283. doi:10.1093/bioinformatics/17.3.282. PMID 11294794.
