Single nucleotide polymorphism annotation (SNP annotation) is the process of predicting the effect or function of an individual SNP using SNP annotation tools. In SNP annotation the biological information is extracted, collected and displayed in a clear form amenable to query. SNP functional annotation is typically performed based on the available information on nucleic acid and protein sequences.[1]
Introduction
Single nucleotide polymorphisms (SNPs) play an important role in genome-wide association studies because they act as primary biomarkers. SNPs are currently the marker of choice because of their large numbers in virtually all populations of individuals. The location of these biomarkers can be tremendously important for predicting functional significance, genetic mapping and population genetics.[3] Each SNP represents a nucleotide change between two individuals at a defined location. SNPs are the most common genetic variant found in all individuals, with one SNP every 100–300 bp in some species.[4] Because there is a massive number of SNPs in the genome, there is a clear need to prioritize SNPs according to their potential effect in order to expedite genotyping and analysis.[5]
Annotating large numbers of SNPs is a difficult and complex process that requires computational methods to handle such large datasets. Many tools have been developed for SNP annotation in different organisms. Some are optimized for organisms densely sampled for SNPs (such as humans), but few are species non-specific or support non-model organism data. The majority of SNP annotation tools provide computationally predicted putative deleterious effects of SNPs. These tools examine whether a SNP resides in functional genomic regions such as exons, splice sites, or transcription regulatory sites, and use a variety of machine-learning approaches to predict the functional effects that the SNP may have. However, the tools and systems that prioritize functionally significant SNPs suffer from a few limitations. First, they examine the putative deleterious effects of SNPs with respect to a single biological function, which provides only partial information about the functional significance of SNPs. Second, current systems classify SNPs only into deleterious or neutral groups.[6]
Many annotation algorithms focus on single nucleotide variants (SNVs), which are considered rarer than SNPs as defined by their minor allele frequency (MAF).[7][8] As a consequence, the training data for the corresponding prediction methods may differ, and one should be careful to select the appropriate tool for a specific purpose. For the purposes of this article, "SNP" will be used to mean both SNP and SNV, but readers should bear in mind the differences.
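The MAF-based distinction between SNPs and SNVs can be illustrated with a minimal sketch. The function names and the 1% cut-off are assumptions made for illustration (1% is a commonly used convention, not a value given in this article), and the allele counts are made-up toy data.

```python
from collections import Counter


def minor_allele_frequency(alleles):
    """Return the frequency of the second most common allele at a site.

    `alleles` is a flat list of observed alleles, one entry per chromosome
    copy (two entries per diploid individual).
    """
    counts = Counter(alleles)
    if len(counts) < 2:
        return 0.0  # monomorphic site: there is no minor allele
    total = sum(counts.values())
    # most_common() sorts by descending count; index 1 is the minor allele
    return counts.most_common(2)[1][1] / total


def label_variant(maf, threshold=0.01):
    """Label a variant using an assumed MAF cut-off (hypothetical default: 1%)."""
    return "SNP" if maf >= threshold else "SNV"


# Toy data: 10 diploid individuals, 18 copies of A and 2 copies of G
toy_alleles = ["A"] * 18 + ["G"] * 2
toy_maf = minor_allele_frequency(toy_alleles)
print(toy_maf, label_variant(toy_maf))
```

In practice the threshold depends on the study and the reference population, so it is exposed as a parameter rather than fixed.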
SNP annotation
For SNP annotation, many kinds of genetic and genomic information are used. Based on the different features used by each annotation tool, SNP annotation methods may be split roughly into the following categories:
Gene based annotation
Genomic information from surrounding genomic elements is among the most useful information for interpreting the biological function of an observed variant. Information from a known gene is used as a reference to indicate whether the observed variant resides in or near a gene and whether it has the potential to disrupt the protein sequence and its function. Gene based annotation relies on the fact that non-synonymous mutations can alter the protein sequence and that splice-site mutations may disrupt the transcript splicing pattern.[9]
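As a rough illustration of the gene based idea, the following sketch classifies a single-base substitution in a coding sequence by translating the reference and altered codons with the standard genetic code. The function name, the consequence labels and the toy sequence are assumptions for illustration only; real annotation tools also handle strand, transcripts, introns and indels.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, amino acids listed in TCAG order (first base varies slowest)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def coding_consequence(cds, pos, alt):
    """Classify a substitution at 0-based position `pos` of an in-frame coding sequence."""
    offset = pos % 3
    start = pos - offset
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop-gained"
    if ref_aa == "*":
        return "stop-lost"
    return "missense"


# Toy coding sequence: ATG GCC TGG TAA (Met, Ala, Trp, stop)
toy_cds = "ATGGCCTGGTAA"
print(coding_consequence(toy_cds, 5, "A"))  # GCC -> GCA
print(coding_consequence(toy_cds, 3, "C"))  # GCC -> CCC
print(coding_consequence(toy_cds, 8, "A"))  # TGG -> TGA
```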
Knowledge based annotation
Knowledge based annotation draws on information about gene attributes, protein function and metabolism. In this type of annotation, more emphasis is given to genetic variation that disrupts protein functional domains, protein-protein interactions and biological pathways. The non-coding region of the genome contains many important regulatory elements, including promoters, enhancers and insulators, and any change in these regulatory regions can change the functionality of the corresponding protein.[10] A mutation in DNA can change the RNA sequence and thereby influence RNA secondary structure, RNA-binding protein recognition and miRNA binding activity.[11][12]
Functional annotation
This method mainly identifies variant function based on whether the variant loci lie in known functional regions that harbor genomic or epigenomic signals. The functions of non-coding variants are extensive in terms of the affected genomic regions, and they are involved in almost all processes of gene regulation, from the transcriptional to the post-translational level.[13]
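The core operation described here, checking whether a variant position falls inside known functional regions, can be sketched as a simple interval lookup. The `Region` type, the function name and the toy coordinates are hypothetical; production tools typically use indexed structures such as interval trees rather than a linear scan.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    label: str


def overlapping_labels(chrom: str, pos: int, regions: List[Region]) -> List[str]:
    """Return the sorted labels of all regions that contain the given position."""
    return sorted(
        {r.label for r in regions if r.chrom == chrom and r.start <= pos < r.end}
    )


# Toy annotation track (made-up coordinates)
TOY_REGIONS = [
    Region("chr1", 100, 200, "promoter"),
    Region("chr1", 150, 400, "enhancer"),
    Region("chr2", 100, 200, "promoter"),
]

print(overlapping_labels("chr1", 160, TOY_REGIONS))
```

A linear scan is adequate for a handful of regions; for genome-scale annotation tracks an indexed lookup would be the more realistic choice.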
Transcriptional gene regulation
Transcriptional gene regulation depends on many spatial and temporal factors in the nucleus, such as global or local chromatin states, nucleosome positioning, transcription factor (TF) binding, and enhancer/promoter activities. Variants that alter any of these biological processes may alter gene regulation and cause phenotypic abnormality.[14] Genetic variants located in distal regulatory regions can affect the binding motifs of TFs, chromatin regulators and other distal transcriptional factors, which disturbs the interaction between an enhancer or silencer and its target gene.[15]
Alternative splicing
Alternative splicing is one of the most important contributors to the functional complexity of the genome. Modified splicing has a significant effect on phenotypes relevant to disease or drug metabolism. A change in splicing can be caused by modifying any of the components of the splicing machinery, such as splice sites, splice enhancers or splice silencers.[16] Modification of an alternative splicing site can lead to a different protein form with a different function. Humans use an estimated 100,000 or more different proteins, so some genes must be capable of coding for far more than one protein. Alternative splicing occurs more frequently than was previously thought and can be hard to control; genes may produce tens of thousands of different transcripts, necessitating a new gene model for each alternative splice.
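One simple way to flag variants that may affect splicing is to test whether they fall in the intronic bases immediately flanking an exon. The sketch below assumes a forward-strand transcript, half-open exon coordinates and a two-base window (the canonical donor and acceptor dinucleotides); the function name and coordinates are illustrative, and splice enhancers or silencers are not covered.

```python
def splice_site_hit(pos, exons, window=2):
    """Return 'donor', 'acceptor' or None for a 0-based position.

    `exons` is a list of (start, end) half-open intervals for one
    forward-strand transcript, sorted by start coordinate.
    """
    last = len(exons) - 1
    for i, (start, end) in enumerate(exons):
        # Donor site: first `window` intronic bases after every exon but the last
        if i < last and end <= pos < end + window:
            return "donor"
        # Acceptor site: last `window` intronic bases before every exon but the first
        if i > 0 and start - window <= pos < start:
            return "acceptor"
    return None


# Toy transcript with three exons (made-up coordinates)
TOY_EXONS = [(100, 200), (300, 400), (500, 600)]

for p in (150, 200, 299, 600):
    print(p, splice_site_hit(p, TOY_EXONS))
```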
RNA processing and post transcriptional regulation
Mutations in the untranslated region (UTR) affect many aspects of post-transcriptional regulation. Many RNA molecules and cis-acting regulatory elements require distinctive structural features to function effectively during gene regulation. SNVs can alter the secondary structure of RNA molecules and thereby disrupt the proper folding of RNAs, such as tRNA, mRNA and lncRNA folding, as well as miRNA binding recognition regions.[17]
Translation and post translational modifications
Single nucleotide variants can also affect cis-acting regulatory elements in mRNAs and thereby inhibit or promote translation initiation. Mutations that change synonymous codons may affect translation efficiency because of codon usage bias. Translation elongation can also be slowed by mutations along the ramp of ribosomal movement. At the post-translational level, genetic variants can contribute to proteostasis and amino acid modifications. However, the mechanisms of variant effects in this field are complicated, and only a few tools are available to predict a variant's effect on translation-related modifications.[18]
Protein function
Non-synonymous variants are variants in exons that change the amino acid sequence encoded by the gene, including single base changes and non-frameshift indels. The effect of non-synonymous variants on proteins has been investigated extensively, and many algorithms have been developed to predict the deleteriousness and pathogenicity of single nucleotide variants (SNVs). Classical bioinformatics tools, such as SIFT, PolyPhen and MutationTaster, successfully predict the functional consequence of non-synonymous substitutions.[19][20][21][22] The PopViz webserver provides a gene-centric approach to visualize mutation damage prediction scores (CADD, SIFT, PolyPhen-2) or population genetics (minor allele frequency) against the amino acid positions of all coding variants of a given human gene.[23] PopViz is also cross-linked with the UniProt database, where protein domain information can be found, so that predicted deleterious variants falling within these protein domains can be identified on the PopViz plot.[23]
Evolutionary conservation and natural selection
Comparative genomics approaches are used to predict function-relevant variants under the assumption that a functional genetic locus should be conserved across different species over an extensive phylogenetic distance. On the other hand, some adaptive traits and population differences are driven by positive selection of advantageous variants, and these genetic mutations are functionally relevant to population-specific phenotypes. Functional prediction of variants' effects in different biological processes is pivotal for pinpointing the molecular mechanisms of diseases and traits and for directing experimental validation.[24]
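The conservation assumption can be made concrete with a toy per-site score. The sketch below uses one minus the normalized Shannon entropy of an alignment column, which is only one of several possible conservation measures and is not claimed to be the method of any tool named in this article; the function name and the toy alignment are assumptions.

```python
import math
from collections import Counter


def column_conservation(column):
    """Return 1 - (Shannon entropy / 2 bits) for a DNA alignment column.

    Gaps and ambiguous characters are ignored. A score of 1.0 means the
    column is perfectly conserved; 0.0 means all four bases are equally frequent.
    """
    residues = [c for c in column.upper() if c in "ACGT"]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum(
        (k / n) * math.log2(k / n) for k in Counter(residues).values()
    )
    return 1.0 - entropy / 2.0  # log2(4) = 2 bits is the maximum for DNA


# Toy multiple alignment: one row per species (made-up sequences)
TOY_ALIGNMENT = ["ACGT", "ACGA", "ACTC", "ACTG"]
toy_scores = [
    column_conservation("".join(row[i] for row in TOY_ALIGNMENT))
    for i in range(len(TOY_ALIGNMENT[0]))
]
print(toy_scores)
```

This score treats all species equally; methods that weight species by phylogenetic distance would need a tree in addition to the alignment.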
List of available SNP annotation tools
To annotate the vast amounts of available NGS data, a large number of SNP annotation tools are currently available. Some of them are specific to particular kinds of SNPs, while others are more general. Available SNP annotation tools include SNPeff, Ensembl Variant Effect Predictor (VEP), ANNOVAR, FATHMM, PhD-SNP, PolyPhen-2, SuSPect, F-SNP, AnnTools, SeattleSeq, SNPit, SCAN, Snap, SNPs&GO, LS-SNP, Snat, TREAT, TRAMS, Maviant, MutationTaster, SNPdat, Snpranker, NGS-SNP, SVA, VARIANT, SIFT, LIST-S2 and FAST-SNP. The functions and approaches used in SNP annotation tools are listed below.
Tool | Description | External resources used
PhyreRisk | Maps genetic variants onto experimental and predicted protein structures. |
Missense3D | Reports the structural impact of a missense variant on PDB and user-supplied protein coordinates. Developed to be applicable to experimental and predicted protein structures. |
PolyPhen-2 | Suitable for predicting damaging effects of missense mutations. Uses sequence conservation, structure to model the position of the amino acid substitution, and SWISS-PROT annotation. |
SuSPect | An SVM-trained predictor of the damaging effects of missense mutations. Uses sequence conservation, structure and network (interactome) information to model the phenotypic effect of an amino acid substitution. Accepts VCF files. | UniProt, PDB, Phyre2 for predicted structures, DOMINE and STRING for interactome
AnnTools | Designed to identify novel SNPs/SNVs, indels and SVs/CNVs. AnnTools searches for overlaps with regulatory elements, disease/trait-associated loci, known segmental duplications and artifact-prone regions. | dbSNP, UCSC, GATK refGene, GAD, published lists of common structural genomic variation, Database of Genomic Variants, lists of conserved TFBSs, miRNA
SCAN | Uses physical and functional annotation to categorize SNPs according to their position relative to genes, linkage disequilibrium (LD) patterns and effects on expression levels. |
SNPdat | Suitable for species non-specific use and supports non-model organism data. SNPdat does not require the creation of any local relational databases or pre-processing of any mandatory input files. |
VARIANT | Extends the scope of information beyond the coding regions by including all the available information on regulation, DNA structure, conservation, evolutionary pressures, etc. Regulatory variants constitute a recognized, but still unexplored, cause of pathologies. | dbSNP, 1000 Genomes, disease-related variants from GWAS, OMIM, COSMIC
SIFT | Uses sequence homology to predict whether an amino acid substitution will affect protein function. |
LIST-S2 | LIST-S2 (Local Identity and Shared Taxa, Species-specific) is based on the assumption that variations observed in closely related species are more significant when assessing conservation than those in distantly related species. |
FAST-SNP | A web server that allows users to efficiently identify and prioritize high-risk SNPs according to their phenotypic risks and putative functional effects. | NCBI dbSNP, Ensembl, TFSearch, PolyPhen, ESEfinder, RescueESE, FAS-ESS, SwissProt, UCSC Golden Path, NCBI Blast and HapMap
PANTHER | Relates protein sequence evolution to the evolution of specific protein functions and biological roles. Protein sequences are used to build protein family trees, and a computer-assisted manual curation step is used to better define the protein family clusters. |
Variant annotation tools use machine learning algorithms to predict variant annotations, and different annotation tools use different algorithms.
A large number of variant annotation tools are available. The annotations produced by different tools do not always agree with each other, because the defined rules for data handling differ between applications. A perfect comparison of the available tools is not possible, because not all tools have the same input, output or functionality. Below is a table of major annotation tools and their functional area.
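Disagreement between tools can at least be quantified on the variants they share. The following sketch computes pairwise agreement between the categorical calls of several tools; the tool names, variant identifiers and labels are made-up placeholders, and the function is an illustration rather than an established benchmarking procedure.

```python
from itertools import combinations


def pairwise_agreement(calls):
    """Return the fraction of shared variants on which each pair of tools agrees.

    `calls` maps a tool name to a dict of {variant_id: label}. Pairs of tools
    with no variants in common are omitted from the result.
    """
    result = {}
    for a, b in combinations(sorted(calls), 2):
        shared = calls[a].keys() & calls[b].keys()
        if shared:
            same = sum(calls[a][v] == calls[b][v] for v in shared)
            result[(a, b)] = same / len(shared)
    return result


# Hypothetical calls from three unnamed tools
TOY_CALLS = {
    "tool_a": {"var1": "deleterious", "var2": "neutral", "var3": "deleterious"},
    "tool_b": {"var1": "deleterious", "var2": "deleterious",
               "var3": "deleterious", "var4": "neutral"},
    "tool_c": {"var1": "neutral", "var4": "neutral"},
}

toy_agreement = pairwise_agreement(TOY_CALLS)
for pair, fraction in toy_agreement.items():
    print(pair, round(fraction, 3))
```

Restricting the comparison to shared variants sidesteps differences in input coverage, but it does not account for tools that use different label sets or score scales.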
Different annotations capture diverse aspects of variant function.[60] Simultaneous use of multiple, varied functional annotations could improve the power of rare-variant association analysis in whole-exome and whole-genome sequencing studies.[61] Some tools have been developed to enable functionally informed phenotype-genotype association analysis for common and rare variants by incorporating functional annotations in biobank-scale cohorts.[62][63][64][65]
Conclusions
The next generation of SNP annotation webservers can take advantage of the growing amount of data in core bioinformatics resources and use intelligent agents to fetch data from different sources as needed. From a user's point of view, it is more efficient to submit a set of SNPs and receive results in a single step, which makes meta-servers the most attractive choice. However, if SNP annotation tools deliver heterogeneous data covering sequence, structure, regulation, pathways, etc., they must also provide frameworks for integrating the data into decision algorithms, together with quantitative confidence measures so that users can assess which data are relevant and which are not.
P. H. Lee, H. Shatkay, “Ranking single nucleotide polymorphisms by potential deleterious effects”, Computational Biology and Machine Learning Lab, School of Computing, Queen’s University, Kingston, ON, Canada
Sauna ZE, Kimchi-Sarfaty C (August 2011). "Understanding the contribution of synonymous mutations to human disease". Nature Reviews. Genetics. 12 (10): 683–691. doi:10.1038/nrg3051. PMID 21878961. S2CID 8358824.
M. J. Li, J. Wang, "Current trend of annotating single nucleotide variation in humans – A case study on SNVrap", Elsevier, 2014, pp. 1–9
J. Wu, R. Jiang, "Prediction of Deleterious Nonsynonymous Single-Nucleotide Polymorphism for Human Diseases", The Scientific World Journal, 2013, 10 pages