GESTALT

Flowchart for GESTALT workflow.

Genome editing of synthetic target arrays for lineage tracing (GESTALT) is a method used to determine the developmental lineages of cells in multicellular systems.[1] GESTALT involves introducing a small DNA barcode containing regularly spaced CRISPR/Cas9 target sites into the genomes of progenitor cells; Cas9 and single guide RNA (sgRNA) are also introduced into the cells. Mutations accumulate in the barcode over successive cell divisions, and the unique combination of mutations in a cell's barcode, determined by DNA or RNA sequencing, links that cell to a developmental lineage.

Background

Fate mapping is the process of identifying the embryonic origins of adult tissues. Lineage tracing is more specific, encompassing methods that examine the progeny arising from a single cell or a small number of cells.[2] One of the first lineage tracing methods involved injecting dyes into specific cells of an early embryo, thereby labeling them and their progeny at each cell division.[3] Later methods used retroviral labeling, in which retroviruses introduce a marker gene, such as a fluorescent protein or beta-galactosidase, into the genomes of the cells of interest, resulting in constitutive expression of the marker in those cells and their progeny.[4] These methods have the drawbacks of being invasive and of offering limited control over which cells are labeled.[2] Currently, the most widely used approach labels cells via genetic recombination systems. These methods use recombinases, the two main ones being the Cre-loxP and Flp-FRT systems, which delete segments of DNA flanked by loxP and FRT sites, respectively.[5][6] In this method, a transgenic model is created that expresses Cre recombinase and carries a reporter gene with an upstream stop cassette flanked by loxP sites. Cre recombination deletes the stop cassette, allowing the reporter to be expressed.[7] Spatial control over the labeled cells is achieved by using Cre alleles driven by the control elements of a chosen marker gene, and temporal control can be obtained with inducible Cre alleles.[7] For example, CreERT has recombination activity only upon administration of tamoxifen.[8] Although powerful, this approach requires significant optimization to achieve single-cell lineage tracing and is low throughput.[9] Sequencing-based methods of lineage tracing have begun to emerge because they provide significantly higher resolution and higher-throughput tracing of cell fate.[9][10] Early approaches leveraged naturally occurring somatic mutations to identify cell lineage relationships.[11]
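The stop-cassette logic of a Cre reporter can be illustrated with a toy string model. This is a minimal sketch, not a model of the enzymology: it assumes two directly repeated loxP sites and treats excision as removing everything from the first site up to the second, leaving one site behind. The element names `<promoter>`, `<STOP>` and `<reporter>` are hypothetical placeholders rather than real sequences.

```python
# Canonical 34-bp loxP site: 13-bp inverted repeat, 8-bp spacer, 13-bp inverted repeat.
LOXP = "ATAACTTCGTATA" + "GCATACAT" + "TATACGAAGTTAT"


def cre_excise(construct: str, site: str = LOXP) -> str:
    """Toy model of Cre-mediated excision between two directly repeated sites.

    The segment from the first site up to the second site is removed, so one
    copy of the site remains. Constructs with fewer than two sites are
    returned unchanged.
    """
    first = construct.find(site)
    if first == -1:
        return construct
    second = construct.find(site, first + len(site))
    if second == -1:
        return construct  # a single site cannot be excised
    return construct[:first] + construct[second:]


# Hypothetical Cre reporter: the STOP cassette blocks reporter expression.
reporter = "<promoter>" + LOXP + "<STOP>" + LOXP + "<reporter>"
recombined = cre_excise(reporter)
print("<STOP>" in reporter, "<STOP>" in recombined)  # True False
```

The sketch only covers the direct-repeat arrangement used in stop-cassette reporters; loxP sites in inverted orientation lead to inversion rather than excision.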

Principles

GESTALT takes advantage of the CRISPR-Cas9 system, in which the sequence of the sgRNA directs Cas9 to create double-stranded breaks at specific DNA sites adjacent to a protospacer adjacent motif (PAM).[12] These breaks are then repaired by one of two endogenous cellular DNA repair pathways: non-homologous end joining (NHEJ) or homology-directed repair (HDR).[12] NHEJ is the more active of the two pathways, and its error-prone repair produces insertions or deletions (indels) at the targeted site.[13] The GESTALT system uses an array of ten CRISPR/Cas9 targets; the first site perfectly matches the designed sgRNA, while the other nine carry mismatches with the sgRNA and therefore undergo less Cas9 activity.[1] Introducing the CRISPR-Cas9 reagents into cells carrying this array causes indels to accumulate at potentially every target of the array, marking each cell with a unique barcode sequence that identifies it and its progeny and can be read out by DNA or RNA sequencing.[1]
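The relationship between an sgRNA and a single target in the array can be made concrete with a short helper. This is a minimal sketch, assuming SpCas9 with a 20-nt protospacer followed by an NGG PAM and a blunt cut 3 bp upstream of the PAM. It counts mismatches and reports where they sit relative to the PAM, but it does not attempt to predict cutting efficiency. The guide sequence is hypothetical.

```python
import re


def describe_target(guide: str, target: str) -> dict:
    """Compare a 20-nt guide with a 23-bp target (protospacer + PAM).

    Returns whether the PAM matches NGG, the number of guide mismatches,
    their distances from the PAM (1 = PAM-adjacent), and the index in
    `target` where SpCas9 is expected to cut (3 bp upstream of the PAM).
    """
    if len(guide) != 20 or len(target) != 23:
        raise ValueError("expected a 20-nt guide and a 23-bp target")
    protospacer, pam = target[:20], target[20:]
    mismatch_idx = [i for i, (g, t) in enumerate(zip(guide, protospacer)) if g != t]
    return {
        "pam_ok": re.fullmatch(r"[ACGT]GG", pam) is not None,
        "mismatches": len(mismatch_idx),
        "distance_from_pam": sorted(20 - i for i in mismatch_idx),
        "cut_index": 20 - 3,  # blunt cut between protospacer positions 17 and 18
    }


guide = "GATCGTACGTTAGCCTAGCA"  # hypothetical 20-nt spacer
perfect = guide + "TGG"
one_mismatch = "C" + guide[1:] + "AGG"  # mismatch at the PAM-distal end
print(describe_target(guide, perfect)["mismatches"])              # 0
print(describe_target(guide, one_mismatch)["distance_from_pam"])  # [20]
```

Reporting mismatch positions, not just their count, reflects the general observation that mismatches near the PAM tend to matter more for Cas9 activity than distal ones.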

Procedure

Design of the barcode array

Each target sequence is 23 bp long, comprising a 20-bp protospacer and a 3-bp PAM. The targets are placed in a contiguous array, separated by 3 to 5 bp linker sequences. Each target sequence must be screened against the genome of the host organism to ensure its specificity. Cas9 activity at each target site can be assessed using the GUIDE-seq assay.[14]
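The design rules above (23-bp targets joined by 3 to 5 bp linkers, each screened against the host genome) can be sketched as follows. This is a simplified illustration, assuming the genome is available as a plain uppercase string. It only counts exact protospacer matches on both strands, whereas practical off-target screening also has to consider near-matches. All sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def build_array(targets: list[str], linkers: list[str]) -> str:
    """Concatenate 23-bp targets, separated by 3-5 bp linkers, into one array."""
    if len(linkers) != len(targets) - 1:
        raise ValueError("need exactly one linker between consecutive targets")
    if any(len(t) != 23 for t in targets):
        raise ValueError("each target must be 23 bp (protospacer + PAM)")
    if any(not 3 <= len(linker) <= 5 for linker in linkers):
        raise ValueError("linkers must be 3-5 bp long")
    parts = [targets[0]]
    for linker, target in zip(linkers, targets[1:]):
        parts.extend([linker, target])
    return "".join(parts)


def count_overlapping(pattern: str, text: str) -> int:
    """Count occurrences of pattern in text, allowing overlaps."""
    count, i = 0, text.find(pattern)
    while i != -1:
        count += 1
        i = text.find(pattern, i + 1)
    return count


def genome_hits(protospacer: str, genome: str) -> int:
    """Exact matches of a protospacer on either strand of the genome."""
    hits = count_overlapping(protospacer, genome)
    rc = revcomp(protospacer)
    if rc != protospacer:  # avoid double-counting palindromic sequences
        hits += count_overlapping(rc, genome)
    return hits


# Hypothetical targets (20-bp protospacer + NGG PAM) and a stand-in genome.
targets = ["GATCGTACGTTAGCCTAGCA" + "TGG", "CCATGACTGACTTGCAGTCA" + "AGG"]
array = build_array(targets, ["TTCA"])
genome = "ACGT" * 50
print(len(array))                                 # 50
print([genome_hits(t[:20], genome) for t in targets])  # [0, 0]
```

A target would be kept only if its protospacer has no hits in the host genome, since any hit is a potential off-target site outside the array.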

Introducing the array into the target cell/organism

Two methods are used to introduce barcode arrays into the genomes of cells. The first transduces progenitor cells with a lentiviral construct in which the barcode array is inserted into the 3' UTR of EGFP. This integrates the barcode array into the genome and marks barcoded cells through stable expression of EGFP. The second involves creating transgenic animal lines; such a line has previously been generated using a Tol2 transgenesis vector containing a barcode array cloned into the 3' UTR of DsRed under the control of the ubiquitin promoter.[15][additional citation(s) needed]

Induction of the CRISPR-Cas9-mediated editing of cellular barcodes

Barcode editing, and thus cell labeling, is initiated by introducing Cas9 protein and sgRNAs into progenitor cells. The CRISPR-Cas9 complex stochastically produces double-stranded breaks at the barcode target sites, and subsequent NHEJ repair introduces random indels, giving each cell a unique barcode sequence at the time of labeling. Several methods exist for delivering CRISPR-Cas9 reagents into cells, and delivery remains an active field of research.[16] The reagents can be introduced into cells via transfection using lipid nanoparticles.[16] Alternatively, they can be microinjected into one-cell embryos.[17] Delivering the reagents at different developmental times changes which populations are labeled. Barcode editing may persist for several hours after delivery.[1]
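The stochastic, heritable nature of barcode editing can be illustrated with a toy simulation. The sketch makes several simplifying assumptions that do not come from the method description: cells divide synchronously, every unedited site is edited with a fixed per-generation probability only while the reagents are present (the hypothetical `editing_generations` parameter), an edited site is never cut again, overlapping deletions are ignored, and indel labels are random placeholders.

```python
import random


def edit(barcode: tuple, rate: float, rng: random.Random) -> tuple:
    """Return a copy of the barcode in which each unedited site may gain an indel."""
    new = list(barcode)
    for i, state in enumerate(new):
        if state is None and rng.random() < rate:
            kind = rng.choice(["del", "ins"])
            new[i] = f"{i}:{kind}{rng.randint(1, 30)}"  # placeholder indel label
    return tuple(new)


def simulate(n_targets=10, generations=5, editing_generations=3, rate=0.2, seed=0):
    """Simulate synchronous divisions; each cell is named by its path from the root.

    Returns a dict mapping cell names ("r", "r0", "r01", ...) to barcodes,
    where each barcode entry is None (unedited) or an indel label.
    """
    rng = random.Random(seed)
    barcodes = {"r": (None,) * n_targets}
    frontier = ["r"]
    for gen in range(generations):
        editing = gen < editing_generations  # Cas9/sgRNA assumed still present
        next_frontier = []
        for name in frontier:
            for suffix in "01":
                child = name + suffix
                parent_bc = barcodes[name]
                barcodes[child] = edit(parent_bc, rate, rng) if editing else parent_bc
                next_frontier.append(child)
        frontier = next_frontier
    return barcodes


cells = simulate()
leaves = [name for name in cells if len(name) == 6]
print(len(leaves), "leaf cells,", len({cells[n] for n in leaves}), "distinct barcodes")
```

Every daughter inherits its parent's edits and can only add new ones, which is what allows shared edits to reveal common ancestry. Once editing stops, cells descending from the same edited ancestor carry identical barcodes, so this toy model cannot resolve their later relationships.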

Sequencing of barcodes and reconstruction of cell lineage tree

After the CRISPR-Cas9 reagents are delivered, time is allowed for barcode editing and further development, during which the labeled populations expand and pass their unique barcodes on to their progeny. Genomic DNA or RNA can then be extracted from the progeny cells or tissues of interest and the barcodes PCR-amplified. Unique molecular identifiers (UMIs) are used to correct for PCR bias: each UMI tags a single starting barcode molecule, so each UMI-barcode combination originates from a single cell. The barcode alleles are then sequenced by next-generation sequencing (NGS), and the full set of identified alleles is subjected to phylogenetic analysis, which infers lineage relationships from barcode similarity. To control for sequencing error, only indels may be considered, since most errors inherent to NGS are base substitutions.[18][1]
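The read-processing steps above (indel-only calling followed by UMI-based collapsing) can be sketched as follows. This is an illustrative simplification rather than the published pipeline. It assumes reads have already been aligned and summarized as one event label per target site, in a hypothetical format where labels starting with `del` or `ins` are indels and `sub` marks a substitution. Reads sharing a UMI are resolved by a simple majority vote.

```python
from collections import Counter, defaultdict


def indel_only(events: tuple) -> tuple:
    """Keep indel calls; treat substitutions (likely sequencing errors) as unedited."""
    return tuple(e if e.startswith(("del", "ins")) else "" for e in events)


def collapse_umis(reads, min_reads=2):
    """Collapse reads that share a UMI into one consensus barcode allele.

    reads: iterable of (umi, events) pairs, one per sequencing read.
    UMIs supported by fewer than `min_reads` reads are discarded.
    """
    by_umi = defaultdict(list)
    for umi, events in reads:
        by_umi[umi].append(indel_only(events))
    consensus = {}
    for umi, alleles in by_umi.items():
        if len(alleles) >= min_reads:
            consensus[umi] = Counter(alleles).most_common(1)[0][0]
    return consensus


reads = [
    ("AACGT", ("del:12", "", "sub:A>G")),
    ("AACGT", ("del:12", "", "")),
    ("AACGT", ("", "", "")),          # minority read, outvoted
    ("GGTCA", ("del:12", "ins:3", "")),
    ("GGTCA", ("del:12", "ins:3", "")),
    ("TTTTT", ("ins:7", "", "")),     # single read, dropped
]
alleles = collapse_umis(reads)
print(Counter(alleles.values()))  # each UMI contributes one molecule
```

Counting each UMI once, rather than each read, is what removes PCR amplification bias from the allele frequencies passed on to phylogenetic analysis.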

scGESTALT

Diagram describing the transgenic zebrafish engineered for scGESTALT.

Single-cell GESTALT (scGESTALT) builds on GESTALT by capturing barcode and transcriptome information simultaneously using scRNA-seq.[19] In scGESTALT, the barcode is introduced into the progenitor cells of interest downstream of an inducible promoter. At the end of the developmental period of interest, barcode expression is induced and the barcode mRNA is sequenced alongside the rest of the transcriptome by scRNA-seq.[19] The transcriptomic data are used to identify cell types and follow their differentiation, while the barcodes establish each cell's developmental relationships with other cells. A further improvement is the ability to induce labeling at two different time points. This is enabled by placing Cas9/sgRNA expression under the control of a heat shock promoter: the first labeling event is induced by microinjection, as in the original GESTALT, and a second labeling period is later initiated by heat shock-induced expression of Cas9 and sgRNAs.[19] This enables lineage tracing during later stages of development, beyond what is possible with GESTALT.
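Once a captured cell has both a transcriptome-derived cell type and a barcode, the two can be joined by cell identifier. The sketch below is a hypothetical illustration: the cell IDs, cell-type names and edit labels are invented, and it assumes cell types have already been assigned by clustering the scRNA-seq data. It reports the fraction of cells whose barcode was recovered and, for each edit, the cell types that carry it. An edit shared by several cell types suggests a common progenitor, subject to the chance-recurrence caveat discussed under Limitations.

```python
from collections import defaultdict


def link_lineage_to_type(cell_types: dict, barcodes: dict):
    """Join scRNA-seq cell types with lineage barcodes by cell ID.

    cell_types: cell ID -> cell type (every profiled cell)
    barcodes:   cell ID -> tuple of edit labels ("" = unedited), only for
                cells whose barcode transcript was captured
    Returns (capture fraction, {edit label: set of cell types carrying it}).
    """
    captured = [cell for cell in cell_types if cell in barcodes]
    capture_fraction = len(captured) / len(cell_types) if cell_types else 0.0
    types_by_edit = defaultdict(set)
    for cell in captured:
        for edit in barcodes[cell]:
            if edit:
                types_by_edit[edit].add(cell_types[cell])
    return capture_fraction, dict(types_by_edit)


cell_types = {"c1": "neuron", "c2": "oligodendrocyte", "c3": "neuron", "c4": "radial glia"}
barcodes = {"c1": ("del:12", ""), "c2": ("del:12", "ins:3"), "c3": ("", "ins:9")}
fraction, shared = link_lineage_to_type(cell_types, barcodes)
print(fraction)                   # 0.75
print(sorted(shared["del:12"]))   # ['neuron', 'oligodendrocyte']
```

Cell `c4` has a type but no barcode, which is how barcode drop-out shows up in this representation.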

Limitations

  • GESTALT is restricted to early embryogenesis because microinjection of Cas9 and sgRNA is only practical while the embryo consists of a small number of progenitor cells.[1] Because barcode editing occurs only during early development, later lineage relationships cannot be deciphered. This limitation was partially addressed by scGESTALT and related methods, whose inducible Cas9 and sgRNA expression systems enable labeling at later developmental time points.[19]
  • GESTALT barcode sequences alone provide no information about the type of cell in which they were identified. scGESTALT addresses this by linking barcodes to each cell's transcriptome, allowing cell identity to be determined.[19]
  • In some cells, overlapping deletions can remove previously accrued marks in the barcode region, erasing lineage information.[1]
  • It is possible that a similar or identical edit can emerge by chance in cells belonging to two separate lineages, resulting in the erroneous association of those lineages.[1]
  • scGESTALT suffers from the drop-out issues common to single-cell methods.[20] Barcode sequences were captured in only 30% of cells.[19] Additionally, some cell types may silence expression of the barcode construct, resulting in loss of lineage information for those cell types.[9]
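The overlapping-deletion problem noted above can be illustrated with a toy interval model. The sketch assumes a hypothetical layout of ten 23-bp targets separated by 4-bp linkers (within the 3 to 5 bp range used in barcode design), treats a site as affected whenever its expected cut position (17 bp into the target) falls inside a deletion, and uses invented deletion coordinates.

```python
TARGET_LEN, LINKER_LEN, CUT_OFFSET = 23, 4, 17  # assumed array layout


def cut_positions(n_sites: int) -> list[int]:
    """Expected Cas9 cut coordinate of each target within the array."""
    return [k * (TARGET_LEN + LINKER_LEN) + CUT_OFFSET for k in range(n_sites)]


def apply_deletion(
    states: list[str], cuts: list[int], start: int, end: int, label: str
) -> list[str]:
    """Record a deletion spanning array coordinates [start, end).

    Every site whose cut position lies inside the deletion takes the new
    label, overwriting any mark it carried before.
    """
    return [label if start <= pos < end else state for state, pos in zip(states, cuts)]


cuts = cut_positions(10)
states = [""] * 10
states = apply_deletion(states, cuts, 68, 75, "del_A")    # small edit at site 2
states = apply_deletion(states, cuts, 120, 130, "del_B")  # small edit at site 4
before = {s for s in states if s}
states = apply_deletion(states, cuts, 40, 130, "del_C")   # large deletion, sites 1-4
after = {s for s in states if s}
print(sorted(before), "->", sorted(after))  # ['del_A', 'del_B'] -> ['del_C']
```

The two earlier, independent marks are replaced by one large event, so any lineage split that those marks recorded can no longer be recovered from this barcode.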

Applications

GESTALT was initially developed to examine the contributions of embryonic progenitors to the adult organ systems of zebrafish.[1] Sequencing barcodes from bulk extractions of organs showed that each organ was dominated by a small number of barcode alleles, indicating that organs arise from the clonal expansion of a small number of early progenitors.[1] The experiment captured lineage information from thousands of differentiated cells, demonstrating the high-throughput lineage tracing capabilities of GESTALT.[1]
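One simple way to put a number on "a small number of barcode alleles" is to ask how many of the most abundant alleles are needed to account for a given fraction of an organ's barcodes. The sketch below is a hypothetical summary statistic, not necessarily the analysis used in the original study, and the allele counts are invented.

```python
from collections import Counter


def alleles_to_cover(allele_counts: Counter, fraction: float = 0.9) -> int:
    """Smallest number of top alleles whose counts reach `fraction` of the total."""
    total = sum(allele_counts.values())
    if total == 0:
        return 0
    covered = 0
    for n, (_, count) in enumerate(allele_counts.most_common(), start=1):
        covered += count
        if covered / total >= fraction:
            return n
    return len(allele_counts)


# Hypothetical allele counts for one organ.
organ = Counter({"allele_1": 70, "allele_2": 20, "allele_3": 5, "allele_4": 5})
print(alleles_to_cover(organ, 0.85))  # 2
```

A small result relative to the total number of distinct alleles indicates that most of the organ derives from a few barcoded progenitors.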

scGESTALT has been used to refine the lineage tree of the zebrafish brain.[19] In one scGESTALT experiment, some barcode sequences were found in cell populations of the forebrain, midbrain, and hindbrain, revealing multipotent progenitors whose descendants migrate across the brain.[19] Pseudotime trajectories generated from the scRNA-seq data, from oligodendrocyte progenitors to oligodendrocytes and from atoh1c+ progenitors to pax6b+ neurons, were consistent with the distribution of barcodes across those cell types.[19]

Related methods

  • Memory by Engineered Mutagenesis with Optical In Situ Readout (MEMOIR) is a related lineage tracing method that also relies on Cas9/gRNA modification of a barcode.[21] It differs from GESTALT in two major ways. First, rather than introducing indels into the barcode, Cas9-sgRNA in MEMOIR deletes regions of the barcode. Second, rather than conventional sequencing, MEMOIR uses sequential single-molecule RNA fluorescence in situ hybridization (seqFISH) to read the barcode in situ within single cells.
  • Lineage tracing by nuclease-activated editing of ubiquitous sequences (LINNAEUS) is an attempt to improve upon scGESTALT.[22] In LINNAEUS, the barcode is replaced with multiple transgenic reporter genes that are targeted by Cas9/sgRNA. Because the reporter genes are dispersed throughout the genome, subsequent Cas9 editing does not overwrite earlier edits.
  • ScarTrace is based on the same principles as scGESTALT. It tracks cell lineages through Cas9/sgRNA editing of a barcode composed of eight tandem copies of a histone–green fluorescent protein (GFP) transgene.[23] Like scGESTALT, ScarTrace integrates scRNA-seq data for cell type analysis, but instead of sequencing the barcode only from mRNA, it also amplifies the barcode from genomic DNA using nested PCR. This is purported to be more reliable, as the mRNA barcode could be unstable or situationally silenced.[23]

References

  1. ^ a b c d e f g h i j k McKenna, Aaron; Findlay, Gregory M.; Gagnon, James A.; Horwitz, Marshall S.; Schier, Alexander F.; Shendure, Jay (2016-07-29). "Whole-organism lineage tracing by combinatorial and cumulative genome editing". Science. 353 (6298): aaf7907. doi:10.1126/science.aaf7907. ISSN 0036-8075. PMC 4967023. PMID 27229144.
  2. ^ a b Kretzschmar, Kai; Watt, Fiona M. (2012). "Lineage Tracing". Cell. 148 (1–2): 33–45. doi:10.1016/j.cell.2012.01.002. ISSN 0092-8674. PMID 22265400.
  3. ^ Vogt, Walther (1929-06-01). "Gestaltungsanalyse am Amphibienkeim mit Örtlicher Vitalfärbung". Wilhelm Roux' Archiv für Entwicklungsmechanik der Organismen (in German). 120 (1): 384–706. doi:10.1007/BF02109667. ISSN 1432-041X. PMID 28354436. S2CID 31738009.
  4. ^ Turner, David L.; Cepko, Constance L. (1987). "A common progenitor for neurons and glia persists in rat retina late in development". Nature. 328 (6126): 131–136. Bibcode:1987Natur.328..131T. doi:10.1038/328131a0. ISSN 1476-4687. PMID 3600789. S2CID 4263087.
  5. ^ Sauer, B; Henderson, N (1988). "Site-specific DNA recombination in mammalian cells by the Cre recombinase of bacteriophage P1". Proceedings of the National Academy of Sciences. 85 (14): 5166–5170. Bibcode:1988PNAS...85.5166S. doi:10.1073/pnas.85.14.5166. ISSN 0027-8424. PMC 281709. PMID 2839833.
  6. ^ Golic, Kent G.; Lindquist, Susan (1989). "The FLP recombinase of yeast catalyzes site-specific recombination in the drosophila genome". Cell. 59 (3): 499–509. doi:10.1016/0092-8674(89)90033-0. ISSN 0092-8674. PMID 2509077. S2CID 44880098.
  7. ^ a b Branda, Catherine S.; Dymecki, Susan M. (2004). "Talking about a Revolution". Developmental Cell. 6 (1): 7–28. doi:10.1016/s1534-5807(03)00399-x. ISSN 1534-5807. PMID 14723844.
  8. ^ Feil, R; Brocard, J; Mascrez, B; LeMeur, M; Metzger, D; Chambon, P (1996). "Ligand-activated site-specific recombination in mice". Proceedings of the National Academy of Sciences. 93 (20): 10887–10890. Bibcode:1996PNAS...9310887F. doi:10.1073/pnas.93.20.10887. ISSN 0027-8424. PMC 38252. PMID 8855277.
  9. ^ a b c VanHorn, Sadie; Morris, Samantha A. (2021). "Next-Generation Lineage Tracing and Fate Mapping to Interrogate Development". Developmental Cell. 56 (1): 7–21. doi:10.1016/j.devcel.2020.10.021. ISSN 1534-5807. PMID 33217333.
  10. ^ Masuyama, Nanami; Konno, Naoki; Yachie, Nozomu (2022-07-29). "Molecular recorders to track cellular events". Science. 377 (6605): 469–470. Bibcode:2022Sci...377..469M. doi:10.1126/science.abo3471. ISSN 0036-8075. PMID 35901151. S2CID 251159144.
  11. ^ Lodato, Michael A.; Woodworth, Mollie B.; Lee, Semin; Evrony, Gilad D.; Mehta, Bhaven K.; Karger, Amir; Lee, Soohyun; Chittenden, Thomas W.; D’Gama, Alissa M.; Cai, Xuyu; Luquette, Lovelace J.; Lee, Eunjung; Park, Peter J.; Walsh, Christopher A. (2015-10-02). "Somatic mutation in single human neurons tracks developmental and transcriptional history". Science. 350 (6256): 94–98. Bibcode:2015Sci...350...94L. doi:10.1126/science.aab1785. ISSN 0036-8075. PMC 4664477. PMID 26430121.
  12. ^ a b Doudna, Jennifer A.; Charpentier, Emmanuelle (2014-11-28). "The new frontier of genome engineering with CRISPR-Cas9". Science. 346 (6213). doi:10.1126/science.1258096. ISSN 0036-8075. PMID 25430774. S2CID 6299381.
  13. ^ Xue, Chaoyou; Greene, Eric C. (2021). "DNA Repair Pathway Choices in CRISPR-Cas9-Mediated Genome Editing". Trends in Genetics. 37 (7): 639–656. doi:10.1016/j.tig.2021.02.008. ISSN 0168-9525. PMC 8187289. PMID 33896583.
  14. ^ Malinin, Nikolay L.; Lee, GaHyun; Lazzarotto, Cicera R.; Li, Yichao; Zheng, Zongli; Nguyen, Nhu T.; Liebers, Matthew; Topkar, Ved V.; Iafrate, A. John; Le, Long P.; Aryee, Martin J.; Joung, J. Keith; Tsai, Shengdar Q. (2021). "Defining genome-wide CRISPR–Cas genome-editing nuclease activity with GUIDE-seq". Nature Protocols. 16 (12): 5592–5615. doi:10.1038/s41596-021-00626-x. ISSN 1750-2799. PMC 9331158. PMID 34773119.
  15. ^ Kawakami, Koichi (2007-10-31). "Tol2: a versatile gene transfer vector in vertebrates". Genome Biology. 8 (1): S7. doi:10.1186/gb-2007-8-s1-s7. ISSN 1474-760X. PMC 2106836. PMID 18047699.
  16. ^ a b Yang, Wu; Yan, Jiaqi; Zhuang, Pengzhen; Ding, Tao; Chen, Yu; Zhang, Yu; Zhang, Hongbo; Cui, Wenguo (2022-08-03). "Progress of delivery methods for CRISPR-Cas9". Expert Opinion on Drug Delivery. 19 (8): 913–926. doi:10.1080/17425247.2022.2100342. ISSN 1742-5247. PMID 35818792. S2CID 250455556.
  17. ^ Hruscha, Alexander; Schmid, Bettina (2015), Lossi, Laura; Merighi, Adalberto (eds.), "Generation of Zebrafish Models by CRISPR/Cas9 Genome Editing", Neuronal Cell Death: Methods and Protocols, Methods in Molecular Biology, vol. 1254, New York, NY: Springer, pp. 341–350, doi:10.1007/978-1-4939-2152-2_24, ISBN 978-1-4939-2152-2, PMID 25431076, retrieved 2024-02-29
  18. ^ Stoler, Nicholas; Nekrutenko, Anton (2021-01-06). "Sequencing error profiles of Illumina sequencing instruments". NAR Genomics and Bioinformatics. 3 (1): lqab019. doi:10.1093/nargab/lqab019. ISSN 2631-9268. PMC 8002175. PMID 33817639.
  19. ^ a b c d e f g h i Raj, Bushra; Wagner, Daniel E.; McKenna, Aaron; Pandey, Shristi; Klein, Allon M.; Shendure, Jay; Gagnon, James A.; Schier, Alexander F. (2018). "Simultaneous single-cell profiling of lineages and cell types in the vertebrate brain". Nature Biotechnology. 36 (5): 442–450. doi:10.1038/nbt.4103. ISSN 1546-1696. PMC 5938111. PMID 29608178.
  20. ^ AlJanahi, Aisha A.; Danielsen, Mark; Dunbar, Cynthia E. (2018). "An Introduction to the Analysis of Single-Cell RNA-Sequencing Data". Molecular Therapy - Methods & Clinical Development. 10: 189–196. doi:10.1016/j.omtm.2018.07.003. ISSN 2329-0501. PMC 6072887. PMID 30094294.
  21. ^ Wang, Zhifu; Zhu, Jianhong (2017-12-01). "MEMOIR: A Novel System for Neural Lineage Tracing". Neuroscience Bulletin. 33 (6): 763–765. doi:10.1007/s12264-017-0161-y. ISSN 1995-8218. PMC 5725379. PMID 28780643.
  22. ^ Spanjaard, Bastiaan; Hu, Bo; Mitic, Nina; Olivares-Chauvet, Pedro; Janjuha, Sharan; Ninov, Nikolay; Junker, Jan Philipp (May 2018). "Simultaneous lineage tracing and cell-type identification using CRISPR–Cas9-induced genetic scars". Nature Biotechnology. 36 (5): 469–473. doi:10.1038/nbt.4124. ISSN 1546-1696. PMC 5942543. PMID 29644996.
  23. ^ a b Alemany, Anna; Florescu, Maria; Baron, Chloé S.; Peterson-Maduro, Josi; van Oudenaarden, Alexander (April 2018). "Whole-organism clone tracing using single-cell sequencing". Nature. 556 (7699): 108–112. doi:10.1038/nature25969. ISSN 1476-4687.