Ancestry-informative marker

AIMs can be used to identify five European "clusters"

In population genetics, an ancestry-informative marker (AIM) is a single-nucleotide polymorphism that exhibits substantially different frequencies between different populations. A set of many AIMs can be used to estimate the proportion of ancestry of an individual derived from each population.
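How a set of AIMs yields an ancestry proportion can be illustrated with a minimal sketch. It assumes only two source populations, independent markers, and known allele frequencies in each population; the function names and the grid-search approach are illustrative choices, not a method taken from the cited sources. Each genotype is the count (0, 1, or 2) of a reference allele, and the individual's expected allele frequency at each marker is a mixture of the two population frequencies.

```python
import math

def log_likelihood(m, genotypes, freqs_a, freqs_b):
    """Log-likelihood of the genotypes if a fraction m of ancestry is from population A."""
    total = 0.0
    for g, pa, pb in zip(genotypes, freqs_a, freqs_b):
        # Expected allele frequency for an individual with admixture fraction m.
        q = m * pa + (1.0 - m) * pb
        # Clamp to avoid log(0) when a frequency is exactly 0 or 1.
        q = min(max(q, 1e-9), 1.0 - 1e-9)
        # Binomial probability of observing g reference alleles out of 2.
        total += math.log(math.comb(2, g)) + g * math.log(q) + (2 - g) * math.log(1.0 - q)
    return total

def estimate_ancestry_fraction(genotypes, freqs_a, freqs_b, steps=100):
    """Grid-search maximum-likelihood estimate of the population-A ancestry fraction."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda m: log_likelihood(m, genotypes, freqs_a, freqs_b))

# Hypothetical data: two markers, each common in A (0.9) and rare in B (0.1).
estimate = estimate_ancestry_fraction([2, 0], [0.9, 0.9], [0.1, 0.1])
```

With real data, many more markers are needed for a stable estimate, and dedicated software typically handles more than two source populations.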

A single-nucleotide polymorphism (SNP) is a variation in a single nucleotide base within a DNA sequence.[1] There are an estimated 15 million SNP sites (out of roughly 3 billion base pairs, or about 0.5%) from among which AIMs may potentially be selected.[2] The SNPs that relate to ancestry are often traced to the Y chromosome and mitochondrial DNA because both are inherited from a single parent, which eliminates the complexities that come with recombination of parental genes.[3][page needed] SNP mutations are rare, so sequences with SNPs tend to be passed down through generations rather than altered each generation. However, because any given SNP is relatively common in a population, analysts must examine groups of SNPs (that is, sets of AIMs) to determine someone's ancestry. Using statistical methods such as apparent error rate and Improved Bayesian Estimate, the set of SNPs with the highest accuracy for predicting a specific ancestry can be found.[4]
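The idea of choosing the most informative SNPs can be sketched in a few lines. The example below does not implement the apparent error rate or Improved Bayesian Estimate methods mentioned above; as a simpler stand-in, it ranks markers by the absolute difference in allele frequency between two populations. The SNP identifiers, frequencies, and function names are hypothetical.

```python
def rank_by_delta(freqs_a, freqs_b):
    """Return (snp, delta) pairs sorted by absolute allele-frequency difference, largest first."""
    deltas = {
        snp: abs(freqs_a[snp] - freqs_b[snp])
        for snp in freqs_a
        if snp in freqs_b  # only markers typed in both populations
    }
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)

def select_panel(freqs_a, freqs_b, size):
    """Pick the `size` markers whose frequencies differ most between the two populations."""
    return [snp for snp, _ in rank_by_delta(freqs_a, freqs_b)[:size]]

# Hypothetical allele frequencies for three candidate SNPs.
freqs_a = {"snp1": 0.9, "snp2": 0.5, "snp3": 0.2}
freqs_b = {"snp1": 0.1, "snp2": 0.45, "snp3": 0.6}
panel = select_panel(freqs_a, freqs_b, size=2)
```

A frequency-difference ranking treats each marker on its own; methods that score a whole set by its classification accuracy can choose a different panel.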

Examining a suite of these markers more or less evenly spaced across the genome is also a cost-effective way to discover novel genes underlying complex diseases in a technique called admixture mapping or mapping by admixture linkage disequilibrium.

As one example, the Duffy Null allele (FY*0) has a frequency of almost 100% in Sub-Saharan Africans, but occurs very infrequently in populations outside of this region. A person having this allele is thus more likely to have Sub-Saharan African ancestors. As another example, North and South Han Chinese ancestry can be distinguished unambiguously using a set of 140 AIMs.[5]
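The reasoning in the Duffy Null example, that carrying an allele common in one population and rare elsewhere raises the probability of ancestry from that population, is an application of Bayes' rule. The sketch below uses made-up frequencies (0.99 and 0.01) and equal priors purely for illustration; they are assumptions, not measured values, and the function name is hypothetical.

```python
def posterior_given_allele(freq_in_pop, prior):
    """Posterior probability of each source population given one observed allele copy."""
    # Joint probability: P(allele | population) * P(population).
    joint = {pop: freq_in_pop[pop] * prior[pop] for pop in freq_in_pop}
    total = sum(joint.values())
    return {pop: value / total for pop, value in joint.items()}

# Hypothetical allele frequencies and equal prior probabilities.
freq_in_pop = {"A": 0.99, "B": 0.01}
prior = {"A": 0.5, "B": 0.5}
posterior = posterior_given_allele(freq_in_pop, prior)
```

The result depends as much on the prior as on the allele frequencies, which is one reason a single marker is rarely used alone.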

Collections of AIMs have been developed that can estimate the geographical origins of ancestors from within Europe.[6]

Following the development of ancient DNA databases, ancient ancestry-informative markers (aAIMs) were similarly defined as single-nucleotide polymorphisms that exhibit substantially different frequencies between different ancient populations. A set of aAIMs can be used to identify the ancestry of ancient populations and eventually quantify their genetic similarity to modern-day individuals.[7]

Discovery and development

The discovery of ancestry-informative markers was made possible by the development of next-generation sequencing (NGS). NGS enables the study of genetic markers by isolating specific gene sequences.[8] One method for sequence extraction is the use of restriction enzymes, specifically endonucleases, which cut DNA at specific sites. These enzymes can be used together with DNA ligase, which joins two DNA fragments, to modify DNA by inserting DNA from another organism.[9] Another method, cDNA sequencing (RNA-seq), can help to acquire information on the transcriptomes of a broad range of organisms and to find SNPs within a DNA sequence.

Applications

Ancestry-informative markers have a number of applications in genetic research, forensics, and private industry. AIMs that indicate a predisposition for diseases such as type 2 diabetes mellitus and renal disease have been shown to reduce the effects of genetic admixture in ancestral mapping when using admixture mapping software.[10] The discriminating power of ancestry-informative markers allows scientists and researchers to narrow down geographical populations of concern; for example, illegal organ trafficking can be traced to certain areas by comparing samples taken from organ recipients and identifying the foreign markers in their bodies.[11] An array of private companies, such as 23andMe and AncestryDNA, provide cost-effective direct-to-consumer (DTC) genetic testing by analyzing ancestry-informative markers to determine geographic origins. These private companies collect massive quantities of data, such as biological samples and self-reported information from consumers, a practice known as biobanking, which enables their researchers to gain further insights into AIMs.[12]

Though AIM panels can be useful for disease screening, the Genetic Information Nondiscrimination Act (GINA) in the United States prohibits the use of genetic information for discrimination in insurance and in the workplace.[13]

Medical research

Different ancestral traits and their association with diseases can help scientists determine appropriate treatment approaches for a specific population.[14] Medical researchers have revealed links between ancestry traits and some common diseases; for example, individuals of African descent have been found to be at higher risk of asthma than those of European ancestry.[15]

AIM panels can be used for detecting disease risk factors. One such panel was created for African American ancestry based on subsets of commercially available SNP arrays. These types of arrays can help reduce the cost of identifying risk factors, since they allow researchers to screen for ancestry markers instead of the entire genome: they narrow the scope of the necessary screening from hundreds of thousands of SNP markers to a panel of a few thousand AIMs.[16]

While some believe that structured populations should be used in studies to better ascertain genetic associations with diseases, the social implications of the potential racial stigma that may result from such studies are a major concern. However, the study by Yang et al. (2005) suggests that the technology to conduct deeper research into, and identify, ancestry-associated variations in human disease already exists.[14]

References

  1. ^ "Polymorphism (genetics)". AccessScience. doi:10.1036/1097-8542.535500.
  2. ^ Pennisi, Elizabeth (2007). "Human Genetic Variation". Science. 318 (5858): 1842–1843. doi:10.1126/science.318.5858.1842. PMID 18096770.
  3. ^ Houck, Max M. (2015). Forensic Biology. Oxford, England; San Diego, California: Academic Press. ISBN 9780128007112.
  4. ^ Sampson, Joshua N.; Kidd, Kenneth K.; Kidd, Judith R.; Zhao, Hongyu (2011-06-14). "Selecting SNPs to Identify Ancestry". Annals of Human Genetics. 75 (4): 539–553. doi:10.1111/j.1469-1809.2011.00656.x. ISSN 0003-4800. PMC 3141729. PMID 21668909.
  5. ^ Qu, Hui-Qi; Li, Quan; Xu, Shuhua; McCormick, Joseph B.; Fisher-Hoch, Susan P.; Xiong, Momiao; Qian, Ji; Jin, Li (2012). "Ancestry Informative Marker Set for Han Chinese Population". G3: Genes, Genomes, Genetics. 2 (3): 339–341. doi:10.1534/g3.112.001941. PMC 3291503. PMID 22413087.
  6. ^ Bauchet, Marc; McEvoy, Brian; Pearson, Laurel N.; Quillen, Ellen E.; Sarkisian, Tamara; Hovhannesyan, Kristine; Deka, Ranjan; Bradley, Daniel G.; Shriver, Mark D. (2007). "Measuring European Population Stratification with Microarray Genotype Data". The American Journal of Human Genetics. 80 (5): 948–956. doi:10.1086/513477. PMC 1852743. PMID 17436249.
  7. ^ Elhaik, Eran; Pirooznia, Mehdi; Syed, Syakir; Das, Ranajit; Esposito, Umberto (2018-12-12). "Ancient Ancestry Informative Markers for Identifying Fine-Scale Ancient Population Structure in Eurasians". Genes. 9 (12): 625. doi:10.3390/genes9120625. PMC 6316245. PMID 30545160.
  8. ^ Davey, John W.; Hohenlohe, Paul A.; Etter, Paul D.; Boone, Jason Q.; Catchen, Julian M.; Blaxter, Mark L. (July 2011). "Genome-wide genetic marker discovery and genotyping using next-generation sequencing". Nature Reviews Genetics. 12 (7): 499–510. doi:10.1038/nrg3012. ISSN 1471-0056. PMID 21681211. S2CID 15080731.
  9. ^ Loenen, Wil A. M.; Dryden, David T. F.; Raleigh, Elisabeth A.; Wilson, Geoffrey G.; Murray, Noreen E. (2013-10-18). "Highlights of the DNA cutters: a short history of the restriction enzymes". Nucleic Acids Research. 42 (1): 3–19. doi:10.1093/nar/gkt990. ISSN 1362-4962. PMC 3874209. PMID 24141096.
  10. ^ Keene, Keith L.; Mychaleckyj, Josyf C.; Leak, Tennille S.; Smith, Shelly G.; Perlegas, Peter S.; Divers, Jasmin; Langefeld, Carl D.; Freedman, Barry I.; Bowden, Donald W. (2008-07-25). "Exploration of the utility of ancestry informative markers for genetic association studies of African Americans with type 2 diabetes and end stage renal disease". Human Genetics. 124 (2): 147–154. doi:10.1007/s00439-008-0532-6. ISSN 0340-6717. PMC 2786006. PMID 18654799.
  11. ^ Severini, S.; Carnevali, E.; Margiotta, G.; Garcia-González, M.A.; Carracedo, Á. (2015-12-01). "Use of ancestry-informative markers as a scientific tool to combat the illegal traffic in human kidneys". Forensic Science International: Genetics Supplement Series. 5: e302–e304. doi:10.1016/j.fsigss.2015.09.120. ISSN 1875-1768.
  12. ^ Stoeklé, Henri-Corto; Mamzer-Bruneel, Marie-France; Vogt, Guillaume; Hervé, Christian (2016-03-31). "23andMe: a new two-sided data-banking market model". BMC Medical Ethics. 17 (1): 19. doi:10.1186/s12910-016-0101-9. ISSN 1472-6939. PMC 4826522. PMID 27059184.
  13. ^ Slaughter (April 25, 2007). "Statement of Administration Policy: Genetic Information Nondiscrimination Act (2007)" (PDF).
  14. ^ a b Yang, Nan; Li, Hongzhe; Criswell, Lindsey A.; Gregersen, Peter K.; Alarcon-Riquelme, Marta E.; Kittles, Rick; Shigeta, Russell; Silva, Gabriel; Patel, Pragna I. (2005-09-29). "Examination of ancestry and ethnic affiliation using highly informative diallelic DNA markers: application to diverse and admixed populations and implications for clinical epidemiology and forensic medicine". Human Genetics. 118 (3–4): 382–392. doi:10.1007/s00439-005-0012-1. ISSN 0340-6717. PMID 16193326. S2CID 20152083.
  15. ^ Vergara, Candelaria; Caraballo, Luis; Mercado, Dilia; Jimenez, Silvia; Rojas, Winston; Rafaels, Nicholas; Hand, Tracey; Campbell, Monica; Tsai, Yuhjung J. (2009-03-17). "African ancestry is associated with risk of asthma and high total serum IgE in a population from the Caribbean Coast of Colombia". Human Genetics. 125 (5–6): 565–579. doi:10.1007/s00439-009-0649-2. ISSN 0340-6717. PMID 19290544. S2CID 21141741.
  16. ^ Tandon, Arti; Patterson, Nick; Reich, David (2010-12-22). "Ancestry informative marker panels for African Americans based on subsets of commercially available SNP arrays". Genetic Epidemiology. 35 (1): 80–83. doi:10.1002/gepi.20550. ISSN 0741-0395. PMC 4386999. PMID 21181899.