Genome skimming is a sequencing approach that uses low-pass, shallow sequencing of a genome (up to 5%) to generate fragments of DNA known as genome skims.[1][2] These genome skims contain information about the high-copy fraction of the genome,[2] which consists of the ribosomal DNA, plastid genome (plastome), mitochondrial genome (mitogenome), and nuclear repeats such as microsatellites and transposable elements.[3] The approach employs high-throughput, next-generation sequencing technology to generate these skims.[1] Although these skims are merely 'the tip of the genomic iceberg', phylogenomic analysis of them can still provide insights into evolutionary history and biodiversity at a lower cost and larger scale than traditional methods.[2][3][4] Because genome skimming requires only a small amount of DNA, its methodology can be applied in fields other than genomics. Such applications include determining the traceability of products in the food industry, enforcing international regulations regarding biodiversity and biological resources, and forensics.[5]
Ribosomal DNA
Internal transcribed spacers (ITS) are non-coding regions within the 18S-5.8S-28S rDNA in eukaryotes and are one feature of rDNA that has been used in genome skimming studies.[7] ITS are used to distinguish species within a genus because of their high inter-species variability.[7] Their low variability between individuals, however, prevents the identification of distinct strains or individuals.[7] They are also present in all eukaryotes, evolve rapidly, and have been used in phylogenetic analysis between and across species.[7]
When targeting nuclear rDNA, a minimum final sequencing depth of 100X is suggested, and sequences with less than 5X depth should be masked.[1]
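The masking rule can be illustrated with a minimal Python sketch. It assumes a consensus sequence and a per-base depth list are already available; the function name `mask_low_depth` and the choice of `N` as the mask character are illustrative assumptions, not part of any cited protocol.

```python
def mask_low_depth(sequence: str, depths: list[int],
                   min_depth: int = 5, mask_char: str = "N") -> str:
    """Replace every base whose depth is below min_depth with mask_char."""
    if len(sequence) != len(depths):
        raise ValueError("sequence and depths must have the same length")
    return "".join(
        base if depth >= min_depth else mask_char
        for base, depth in zip(sequence, depths)
    )


# Hypothetical consensus and per-base depths for illustration only.
consensus = "ACGTACGT"
depths = [120, 98, 4, 3, 150, 5, 2, 110]
print(mask_low_depth(consensus, depths))  # prints ACNNACNT
```

Passing `min_depth=20` would apply the stricter threshold suggested for plastome SNPs.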
Plastomes
The plastid genome, or plastome, has been used extensively in identification and evolutionary studies using genome skimming because of its high abundance within plants (~3-5% of cell DNA), small size, simple structure, and greater conservation of gene structure than nuclear or mitochondrial genes.[8][9] Plastid studies were previously limited by the number of regions that could be assessed with traditional approaches.[9] Using genome skimming, the entire plastome can be sequenced at a fraction of the cost and time required for typical sequencing approaches such as Sanger sequencing.[3] Plastomes have been suggested as a replacement for traditional DNA barcodes in plants,[3] such as the rbcL and matK barcode genes. Compared to the typical DNA barcode, genome skimming produces plastomes at a tenth of the cost per base.[5] Recent uses of genome skims of plastomes have allowed greater resolution of phylogenies, higher differentiation of specific groups within taxa, and more accurate estimates of biodiversity.[9] Additionally, the plastome has been used to compare species within a genus to examine evolutionary changes and diversity within a group.[9]
When targeting plastomes, it is suggested that a minimum final sequencing depth of 30X is achieved for single-copy regions to ensure high-quality assemblies. Single nucleotide polymorphisms (SNPs) with less than 20X depth should be masked.[1]
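As a rough planning aid, the expected plastome depth of a skim can be estimated from the number of reads, the read length, the fraction of reads that are plastid-derived, and the plastome size. The sketch below is a back-of-the-envelope calculation, not a published formula from the cited sources; the 150 kb plastome size and the 100 bp read length are assumed example values, and the 3% plastid fraction is the lower end of the range quoted above.

```python
import math


def expected_depth(n_reads: int, read_length: int,
                   target_fraction: float, target_size: int) -> float:
    """Mean depth on a target that attracts target_fraction of all reads."""
    return n_reads * read_length * target_fraction / target_size


def reads_needed(target_depth: float, read_length: int,
                 target_fraction: float, target_size: int) -> int:
    """Total reads required to reach target_depth on the target, on average."""
    return math.ceil(target_depth * target_size / (read_length * target_fraction))


# Assumed example values: 1 million 100 bp reads, 3% plastid reads, 150 kb plastome.
depth = expected_depth(1_000_000, 100, 0.03, 150_000)
print(f"expected plastome depth: about {depth:.0f}X")
print("reads for 30X:", reads_needed(30, 100, 0.03, 150_000))
```

Under these assumptions one million reads give about 20X, so roughly 1.5 million reads would be needed to reach the suggested 30X. Real depth varies along the plastome, so an average like this is only a starting point.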
Mitogenomes
The mitochondrial genome, or mitogenome, is used as a molecular marker in a great variety of studies because of its maternal inheritance, high copy number in the cell, lack of recombination, and high mutation rate. It is often used for phylogenetic studies because it is very uniform across metazoan groups: a circular, double-stranded DNA molecule of about 15 to 20 kilobases, with 2 ribosomal RNA genes, 13 protein-coding genes, and 22 transfer RNA genes (37 genes in total). Mitochondrial barcode sequences, such as COI, NADH2, 16S rRNA, and 12S rRNA, can also be used for taxonomic identification.[10] The increasing publication of complete mitogenomes allows robust phylogenies to be inferred across many taxonomic groups and can capture events such as gene rearrangements and the positioning of mobile genetic elements. By using genome skimming to assemble complete mitogenomes, the phylogenetic history and biodiversity of many organisms can be resolved.[4]
When targeting mitogenomes, there are no specific suggestions for minimum final sequencing depth, because mitogenomes vary more in size and complexity in plant species, which makes repeated sequences harder to assemble. However, highly conserved coding sequences and nonrepetitive flanking regions can be assembled using reference-guided assembly. Sequences should be masked in the same way as for plastomes and nuclear ribosomal DNA.[1]
Nuclear repeats
Nuclear repeats in the genome are an underused source of phylogenetic data. When the nuclear genome is sequenced at 5% of the genome, thousands of copies of the nuclear repeats will be present. Although the sequenced repeats are only a sample of those in the entire genome, it has been shown that these sequenced fractions accurately reflect genomic abundance. These repeats can be clustered de novo and their abundance estimated. The distribution and occurrence of these repeat types can be phylogenetically informative and provide information about the evolutionary history of various species.[1]
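Because the sequenced fraction reflects genomic abundance, the genome proportion of each repeat cluster can be approximated by the share of reads assigned to it. The following Python sketch assumes that clustering has already been done by some other tool; the cluster names and read counts are hypothetical.

```python
def genome_proportions(cluster_read_counts: dict[str, int],
                       total_reads: int) -> dict[str, float]:
    """Approximate each cluster's genome proportion as reads_in_cluster / total_reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return {name: count / total_reads
            for name, count in cluster_read_counts.items()}


# Hypothetical clustering result from 100,000 analysed reads.
counts = {"Ty3_gypsy": 12_000, "Ty1_copia": 5_000, "satellite_A": 800}
for name, proportion in genome_proportions(counts, 100_000).items():
    print(f"{name}: {proportion:.1%} of the genome")
```

Abundance vectors of this kind, computed per species, are the sort of data that can then be compared across taxa.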
Low-copy DNA
Low-copy DNA can prove useful for evolutionary developmental and phylogenetic studies.[11] It can be mined from high-copy fractions in a number of ways, such as developing primers from databases that contain conserved orthologous genes, single-copy conserved orthologous genes, and shared-copy genes.[11] Another method is to look for novel probes that target low-copy genes using transcriptomics via Hyb-Seq.[11] While nuclear genomes assembled from genome skims are extremely fragmented, some low-copy or single-copy nuclear genes can be successfully assembled.[12]
Low-quantity degraded DNA
Previous methods of recovering degraded DNA were based on Sanger sequencing, relied on large intact DNA templates, and were affected by contamination and the method of preservation. Genome skimming, on the other hand, can be used to extract genetic information from preserved specimens in herbaria and museums, where the DNA is often very degraded and very little remains.[4][13] Studies in plants show that genome skimming can infer genomic information from DNA as old as 80 years and from as little as 500 pg of degraded DNA.[13] In herbaria, even with low-yield and low-quality DNA, one study was still able to produce "high-quality complete chloroplast and ribosomal DNA sequences" at a large scale for downstream analyses.[14]
In field studies, invertebrates are stored in ethanol, which is usually discarded during DNA-based studies.[15] Genome skimming has been shown to detect the low quantity of DNA in this ethanol fraction and provide information about the biomass of the specimens in a fraction, the microbiota of outer tissue layers, and the gut contents (such as prey) released by the vomit reflex.[15] Thus, genome skimming can provide an additional method of understanding ecology via low-copy DNA.[15]
Workflow
DNA extraction
DNA extraction protocols will vary depending on the source of the sample (i.e. plants, animals, etc.). The following DNA extraction protocols have been used in genome skimming:
Library preparation
Library preparation protocols will depend on a variety of factors, such as organism and tissue type. In the case of preserved specimens, specific modifications to library preparation protocols may have to be made.[1] The following library preparation protocols have been used in genome skimming:
Illumina TruSeq DNA Sample Preparation kit[5][6][15]
Sequencing
Sequencing with short reads or long reads will depend on the target genome or genes. Microsatellites in nuclear repeats require longer reads.[23] The following sequencing platforms have been used in genome skimming:
The Illumina MiSeq platform has been chosen by certain researchers because its reads are long for a short-read platform.[6]
Assembly
After genome skimming, high-copy organellar DNA can be assembled with a reference guide or assembled de novo. High-copy nuclear repeats can be clustered de novo.[1] Assemblers chosen will depend on the target genome and whether short or long reads are used. The following tools have been used to assemble genomes from genome skims:
Annotation
Annotation is used to identify genes in the genome assemblies. The annotation tool chosen will depend on the target genome and the target features of that genome. The following annotation tools have been used in genome skimming to annotate organellar genomes:
Tools and Pipelines
Various protocols, pipelines, and bioinformatic tools have been developed to help automate the downstream processes of genome skimming.
Hyb-Seq
Hyb-Seq is a new protocol for capturing low-copy nuclear genes that combines target enrichment and genome skimming.[29] Target enrichment of the low-copy loci is achieved through designed enrichment probes for specific single-copy exons, but requires a nuclear draft genome and transcriptome of the targeted organism. The target-enriched libraries are then sequenced, and the resulting reads processed, assembled, and identified. Using off-target reads, rDNA cistrons and complete plastomes can also be assembled. Through this process, Hyb-Seq is able to produce genome-scale datasets for phylogenomics.
GetOrganelle
GetOrganelle is a toolkit that assembles organellar genomes using genome skimming reads.[30] Organelle-associated reads are recruited using a modified “baiting and iterative mapping” approach. The reads that align to the target genome, using Bowtie2,[31] are referred to as “seed reads”. The seed reads are used as “baits” to recruit more organelle-associated reads via multiple iterations of extension. The read extension algorithm uses a hashing approach, in which the reads are cut into substrings of certain lengths, referred to as “words”. At each extension iteration, these “words” are added to a hash table, referred to as a “baits pool”, which dynamically increases in size with each iteration. Because of the low sequencing coverage of genome skims, non-target reads, even those with high sequence similarity to target reads, are largely not recruited. Using the final recruited organelle-associated reads, GetOrganelle conducts a de novo assembly using SPAdes.[32] The assembly graph is filtered and untangled, producing all possible paths of the graph and therefore all configurations of the circular organellar genomes.
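The baiting-and-extension idea can be sketched in a few lines of Python. This is a toy illustration of the general technique described above, not GetOrganelle's actual code: it skips the Bowtie2 seeding step, ignores reverse complements and sequencing errors, and uses a tiny word size; the names `words` and `recruit_reads` are hypothetical.

```python
def words(read: str, k: int) -> set[str]:
    """Cut a read into all of its substrings ("words") of length k."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}


def recruit_reads(seed_reads: list[str], all_reads: list[str],
                  word_size: int = 5, max_rounds: int = 10) -> set[str]:
    """Iteratively recruit reads that share at least one word with the baits pool."""
    baits: set[str] = set()
    for read in seed_reads:
        baits |= words(read, word_size)

    recruited: set[str] = set()
    remaining = list(all_reads)
    for _ in range(max_rounds):
        newly = [r for r in remaining if words(r, word_size) & baits]
        if not newly:
            break  # no read overlaps the pool any more
        for read in newly:
            baits |= words(read, word_size)  # the baits pool grows each round
        recruited.update(newly)
        remaining = [r for r in remaining if r not in recruited]
    return recruited


seeds = ["ACGTACGGTC"]
pool = ["CGGTCAATGC",   # overlaps the seed, recruited in round 1
        "AATGCTTAGA",   # overlaps only the first recruit, recruited in round 2
        "GGGGGCCCCC"]   # shares no word with the pool, never recruited
print(sorted(recruit_reads(seeds, pool)))
```

The second read is reached only because the pool was extended by the first, which is the point of iterating rather than matching against the seeds once.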
Skmer
Skmer is an assembly-free and alignment-free tool that computes genomic distances between query and reference genome skims.[33] Skmer uses a two-stage approach to compute these distances. First, it generates k-mer frequency profiles using a tool called JellyFish,[34] and these k-mers are then converted into hashes.[33] A random subset of these hashes is selected to form a so-called "sketch".[33] In its second stage, Skmer uses Mash[35] to estimate the Jaccard index of two of these sketches.[33] The combination of these two stages is used to estimate the evolutionary distance.[33]
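The sketch-and-compare step can be illustrated with a small, self-contained Python example. It is a simplified stand-in rather than Skmer's or Mash's implementation: the sketch is taken as the smallest hash values (a MinHash-style choice, which is an assumption about how the subset is picked), hashing uses SHA-1 purely for determinism, and the coverage and sequencing-error corrections that a skim-aware tool would need are omitted. The final formula is the Mash distance, D = -(1/k) ln(2J / (1 + J)).

```python
import hashlib
import math


def kmers(seq: str, k: int) -> set[str]:
    """All distinct k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash_kmer(kmer: str) -> int:
    """Deterministic 64-bit hash of a k-mer."""
    return int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")


def sketch(seq: str, k: int, size: int) -> set[int]:
    """Keep the `size` smallest k-mer hashes as the sketch."""
    return set(sorted(hash_kmer(x) for x in kmers(seq, k))[:size])


def jaccard_estimate(a: set[int], b: set[int], size: int) -> float:
    """Estimate the Jaccard index from the smallest hashes of the merged sketches."""
    merged = sorted(a | b)[:size]
    if not merged:
        return 0.0
    shared = sum(1 for h in merged if h in a and h in b)
    return shared / len(merged)


def mash_distance(jaccard: float, k: int) -> float:
    """Mash distance; defined as 1.0 when no hashes are shared."""
    if jaccard == 0:
        return 1.0
    return max(0.0, -math.log(2 * jaccard / (1 + jaccard)) / k)


# Toy sequences; real skims would use much larger k and sketch sizes.
s1 = sketch("ACGTACGTTGCAACGT", k=4, size=8)
s2 = sketch("ACGTACGTTGCATCGT", k=4, size=8)
j = jaccard_estimate(s1, s2, size=8)
print("Jaccard estimate:", j, "distance:", mash_distance(j, k=4))
```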
Geneious
Geneious is an integrative software platform that allows users to perform various steps in bioinformatic analysis, such as assembly, alignment, and phylogenetics, by incorporating other tools within a GUI-based platform.[18][28]
In silico genome skimming
Although genome skimming is usually chosen as a cost-effective method to sequence organellar genomes, it can also be done in silico if (deep) whole-genome sequencing data has already been obtained. Subsampling the reads of such data, that is, in silico genome skimming, has been demonstrated to simplify organellar genome assembly.[37][38] Since the organellar genomes are high-copy in the cell, in silico genome skimming essentially filters out nuclear sequences, leaving a higher organellar-to-nuclear sequence ratio for assembly and reducing the complexity of the assembly. In silico genome skimming was first done as a proof of concept, optimizing the parameters for read type, read length, and sequencing coverage.[1]
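Subsampling itself is simple; a minimal Python sketch is shown below. It assumes the reads are already loaded in memory as two parallel lists of mates, which is unrealistic for deep sequencing data but keeps the example self-contained; the function name `subsample_pairs` is hypothetical, and in practice a dedicated FASTQ subsampling tool would be used.

```python
import random


def subsample_pairs(reads_1: list[str], reads_2: list[str],
                    fraction: float, seed: int = 0) -> tuple[list[str], list[str]]:
    """Randomly keep a fraction of read pairs, keeping mates together."""
    if len(reads_1) != len(reads_2):
        raise ValueError("paired read lists must have the same length")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed makes the subsample reproducible
    n_keep = round(len(reads_1) * fraction)
    keep = sorted(rng.sample(range(len(reads_1)), n_keep))
    return [reads_1[i] for i in keep], [reads_2[i] for i in keep]


# Hypothetical read names standing in for real read records.
r1 = [f"read{i}/1" for i in range(1000)]
r2 = [f"read{i}/2" for i in range(1000)]
sub1, sub2 = subsample_pairs(r1, r2, fraction=0.05, seed=1)
print(len(sub1), "pairs kept")
```

Sampling indices once and applying them to both lists keeps each pair intact, which matters for paired-end assembly.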
Other Applications
Beyond the uses listed above, genome skimming has also been applied to other tasks, such as quantifying pollen mixtures[19] and monitoring and conserving certain populations.[39] Genome skimming can also be used for variant calling, to examine single nucleotide polymorphisms across a species.[22]
Advantages
Genome skimming is a cost-effective, rapid and reliable method to generate large shallow datasets,[5] since several datasets (plastid, mitochondrial, nuclear) are generated per run.[3] It is very simple to implement, requires less lab work and optimization, and does not require a priori knowledge of the organism nor its genome size.[3] This provides a low-risk avenue for biological inquiry and hypothesis generation without a huge commitment of resources.[6]
Genome skimming is especially advantageous in cases where the genomic DNA may be old and degraded from chemical treatments, such as specimens from herbarium and museum collections,[4] a largely untapped genomic resource. Genome skimming allows for the molecular characterization of rare or extinct species.[5] Preservation in ethanol often damages the genomic DNA, which hinders the success of standard PCR protocols[3] and other amplicon-based approaches.[5] Genome skimming therefore presents an opportunity to sequence samples with very low DNA concentrations, without the need for DNA enrichment or amplification. Library preparation specific to genome skimming has been shown to work with as little as 37 ng of DNA (0.2 ng/µL), 135-fold less than recommended by Illumina.[1]
Although genome skimming is mostly used to extract high-copy plastomes and mitogenomes, it can also provide partial sequences of low-copy nuclear sequences. These sequences may not be sufficiently complete for phylogenomic analysis, but can be sufficient for designing PCR primers and probes for hybridization-based approaches.[1]
Genome skimming is not dependent on any specific primers and remains unaffected by gene rearrangements.[4]
Limitations
Genome skimming scratches the surface of the genome, so it will not suffice for biological questions that require gene prediction and annotation.[6] These downstream steps are required for deep and more meaningful analyses.
Although plastid genomic sequences are abundant in genome skims, the presence of mitochondrial and nuclear pseudogenes of plastid origin can potentially pose issues for plastome assemblies.[1]
A combination of sequencing depth and read type, as well as genomic target (plastome, mitogenome, etc.), will influence the success of single-end and paired-end assemblies, so these parameters must be carefully chosen.[1]
Scalability
Both the wet-lab and the bioinformatics parts of genome skimming face certain challenges with scalability. Although sequencing for genome skimming was affordable at $80 per Gb in 2016, library preparation for sequencing was still very expensive, at least about $200 per sample (as of 2016). Additionally, most library preparation protocols have not yet been fully automated with robotics. On the bioinformatics side, large, complex databases and automated workflows need to be designed to handle the large amounts of data resulting from genome skimming. The following processes need to be automated:[40]
Assembly of the standard barcodes
Assembly of organellar DNA (as well as nuclear ribosomal tandem repeats)
Annotation of the different assembled fragments
Removal of potential contaminant sequences
Estimation of sequencing coverage for single-copy genes
Extraction of reads corresponding to single-copy genes
Identification of an unknown specimen from a small shotgun sequencing run or any DNA fragment
Identification of the different organisms from shotgun sequencing of environmental DNA (metagenomics)
Some of these scalability challenges have already been addressed, as shown above in the "Tools and Pipelines" section.
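The 2016 cost figures quoted above can be combined into a simple per-sample estimate. The Python sketch below is only an arithmetic illustration: it assumes costs add linearly and uses $200 as the library preparation cost, although the text gives that figure as a lower bound; the 1 Gb of sequence per sample is an assumed example value.

```python
def cost_per_sample(gb_per_sample: float,
                    seq_cost_per_gb: float = 80.0,
                    library_cost: float = 200.0) -> float:
    """Per-sample cost = library preparation + sequencing (2016 US dollars)."""
    return library_cost + gb_per_sample * seq_cost_per_gb


total = cost_per_sample(1.0)  # assumed 1 Gb of sequence per sample
print(f"total: ${total:.0f}, library share: {200.0 / total:.0%}")
```

With these numbers library preparation accounts for roughly 70% of the per-sample cost, which is why it, rather than sequencing, is named as the wet-lab bottleneck.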
1. Straub, Shannon C. K.; Parks, Matthew; Weitemier, Kevin; Fishbein, Mark; Cronn, Richard C.; Liston, Aaron (February 2012). "Navigating the tip of the genomic iceberg: Next-generation sequencing for plant systematics". American Journal of Botany. 99 (2): 349–364. doi:10.3732/ajb.1100335. PMID 22174336.
2. Dodsworth, Steven Andrew. Genome skimming for phylogenomics. OCLC 1108700470.