Circular consensus sequencing (CCS) is a DNA sequencing method used in conjunction with single-molecule real-time sequencing to yield highly accurate long-read datasets, with read lengths averaging 15–25 kb and median accuracy greater than 99.9%.[1][2] These long reads, each a consensus sequence built from multiple passes over a single DNA molecule, can improve results for complex applications such as single nucleotide and structural variant detection, genome assembly, assembly of difficult polyploid or highly repetitive genomes, and assembly of metagenomes.[3]
CCS allows resolution of large or complex genomes of any species, such as the California redwood genome, which is nine times the size of the human genome, and supports high-precision variant detection from single nucleotide variants (SNVs) to structural variants.[4][5] CCS also enables separation of the different copies of each chromosome (e.g., maternal and paternal in a diploid), known as haplotypes. CCS reads offer accuracy equivalent to that of short-read sequencing data, but with the length necessary for complex genome assemblies and phasing of variants across the genome.[6][7]
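As a rough illustration of what the accuracy figure above means in practice, the following minimal Python sketch converts a read accuracy into a Phred-scaled quality value (Q = −10·log10 of the error rate) and into an expected number of errors per read. The function names and the 20 kb example length are assumptions chosen for illustration and are not part of any sequencing software.

```python
import math


def accuracy_to_phred(accuracy):
    """Convert a read accuracy in [0, 1) to a Phred-scaled quality value."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in the range [0, 1)")
    # Phred quality is -10 * log10(probability of error)
    return -10.0 * math.log10(1.0 - accuracy)


def expected_errors(read_length, accuracy):
    """Expected number of erroneous bases in a read of the given length."""
    return read_length * (1.0 - accuracy)


# Illustrative values: 99.9% accuracy and a hypothetical 20 kb read
print(round(accuracy_to_phred(0.999)))        # 30
print(round(expected_errors(20_000, 0.999)))  # 20
```

Under these assumptions, 99.9% accuracy corresponds to roughly Q30, or about 20 expected errors in a 20 kb read.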
Technology
In this method, circularized fragments of DNA in solution float across the surface of a nanofluidic chip called a SMRT (Single Molecule, Real-Time) Cell. The surface of the chip is covered with millions of wells called zero-mode waveguides (ZMWs), each tens of nanometers wide.[8] To prepare a sample for CCS/HiFi sequencing, primers and DNA polymerase are added to SMRTbell libraries. A circularized DNA molecule becomes trapped in a ZMW, nucleotides are added, and the DNA polymerase begins to copy the molecule base by base. Each incorporation releases a tiny amount of light, which is read by a detector and used by the sequencer's computer to determine the order of bases in the sample. The circularized DNA is sequenced in repeated passes to ensure accuracy (hence the name "circular" consensus sequencing), and the primer and adapter sequences are then removed computationally to deliver a highly accurate consensus DNA read.[9]
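The idea of building a consensus from repeated passes can be sketched with a toy per-position majority vote. This is a simplified illustration only: it assumes the passes are already aligned and of equal length, and it is not the statistical model used by production CCS software. The function name and the example reads are hypothetical.

```python
from collections import Counter


def majority_consensus(passes):
    """Return the per-position majority base across pre-aligned passes.

    Assumes every pass has the same length (no insertions or deletions).
    Ties are resolved in favor of the base seen first in that column.
    """
    if not passes:
        raise ValueError("at least one pass is required")
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("passes must be pre-aligned to the same length")

    consensus = []
    for column in zip(*passes):
        # Pick the base observed most often at this position
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Five hypothetical passes over the same molecule; three contain one error each
passes = [
    "ACGTTAGC",
    "ACGTAAGC",  # error at index 4
    "ACCTTAGC",  # error at index 2
    "ACGTTAGC",
    "ACGTTTGC",  # error at index 5
]
print(majority_consensus(passes))  # ACGTTAGC
```

Because the errors in individual passes fall at different positions, each one is outvoted by the other passes, which is the intuition behind the accuracy gain from repeated passes.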
In CCS, the genomic DNA is prepared without amplification, so individual base modifications such as methylation can be detected during sequencing. This allows both sequence and methylation information to be captured in a single experiment.[10]
History
This sequencing method was first described by Travers et al. in Nucleic Acids Research in 2010.[3] It was commercialized by Pacific Biosciences in 2018 and made available on the Sequel II and Revio long-read sequencing instruments.[11][12]
CCS technology has subsequently been used in numerous studies across several fields, including human telomere-to-telomere whole-genome assembly and pangenome research,[13][14][15] pediatric rare disease genomic analysis,[16][17] DNA methylation in rare disease cohorts,[18] assembly of whole genomes of non-human vertebrates,[19] assembly of whole genomes of other agriculturally significant species,[20] analysis of cancer genomes,[21][22] and metagenomics and microbial research, among others.[23][24]
Recognizing the importance of this technology in future genomic exploration and discovery, the editors of Nature Methods named long-read sequencing technology its method of the year for 2022.[25]
Applications
Human and conservation biology
CCS can be useful to researchers seeking to perform de novo sequence assembly or to study haplotype-phased sequences from each chromosomal copy, regardless of how many chromosomes are present in the species. Many biodiversity-oriented consortia have used the technology in their conservation biology studies, including the African Biogenome Project, California Conservation Genomics Project, Darwin Tree of Life, Desert Agriculture Initiative, Earth Biogenome Project, Global Ant Genomics Alliance, Human Pangenome, Telomere-to-Telomere Consortium, The 10,000 Fish Genomes Project and Vertebrate Genomes Project.[26][27][28]
Human health
Circular consensus sequencing helps researchers identify and characterize rare or structural variants with high confidence, clarifying the genomic basis of a given phenotype. Applications to human health include rare disease research, microbiology and infectious disease, cancer research, and other areas of genetic disease research.[29][30]
Rare diseases
Although each occurs with low frequency in the human population, rare diseases are collectively common, and most have a genetic cause, which presents unique diagnostic challenges. An estimated 50–80% of structural variants are tandem repeats.[31]
Because CCS provides a comprehensive view of variation in the human genome, producing complete, accurate, and phased assemblies for variant calling and for identifying repeat expansions and medically relevant interruption sequences, it enables the identification of causative pathogenic variants and helps researchers discover novel disease-associated genes.[32]
Microbiology and infectious diseases
Circular consensus sequencing can rapidly identify emerging pathogens and detect changes in pathogen genomes as part of regional or global surveillance operations. Where other molecular technologies for public health surveillance may require re-validation or the development of new panels, the unbiased nature of circular consensus sequencing delivers comprehensive genetic information to further characterize global outbreaks, pandemics, and epidemics.[12]
Cancer research
Comprehensive resolution of structural variants enables researchers to better study and detect somatic variants driving cancer. Because of their size (>50 bp), structural variants and tandem repeats account for much genomic variation between individuals.[33]
Long-read RNA sequencing can be useful in cancer research to uncover alternative splicing and fusion events that drive cancer growth.[34][35][36][37] CCS also has an advantage over other sequencing technologies because it can provide phasing information for expressed mutations.[38]