Pharmacogenomics annotation refers to the use of genomic data as input to generate clinical recommendations tailored to an individual's genotype. Examples of pharmacogenomics annotation tools are PharmCAT,[1] PAnno,[2] and PharmVIP.[3] All three tools take genomic data in the form of a Variant Call Format (VCF) file and output the corresponding prescribing recommendations.
Background
Pharmacogenomics
Pharmacogenomics is a field of study combining pharmacology with genomics. It seeks to tailor drug prescribing to individual patients using, in part, genotyping data from those patients.[4] Individuals may possess genetic variants (differences in their DNA) that alter the pharmacokinetics (PK; the way that drugs are absorbed, distributed to tissues, metabolized, and excreted) or pharmacodynamics (PD; how the drug interacts with its receptor) of specific drugs. Alterations in pharmacokinetics or pharmacodynamics can impact the effectiveness of pharmacological interventions,[5] even in non-pharmacogenomic contexts.[6] Pharmacogenomics research has benefitted from the adoption of next-generation sequencing (NGS) technologies, which enable higher-throughput methods of identifying pharmacologically relevant variants.[7]
Through research into these variants, both clinicians and researchers have found that modifying the doses of drugs can be effective for treating patients carrying variants affecting their PK and PD.[5] Prescribing recommendations already exist for certain variants that have been identified through prior research into genetic predictors of drug effectiveness and side effects.[8]
Star alleles
Clinically actionable haplotypes are referred to in the literature as star (*) alleles.[9] Haplotypes are groups of variants that are inherited together because they lie physically close on the same chromosome, which reduces the chance that crossover during meiosis separates them. Star alleles are of particular interest in pharmacogenomics due to their clinical utility. A pharmacogene diplotype is the combination of the two parental haplotypes, that is, the maternal and paternal star alleles.[10]
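As a small illustration of how two star alleles combine into a diplotype, the sketch below joins a maternal and a paternal allele into a single label. The function names and the ordering rule (lower-numbered allele first) are assumptions made for this example, not the convention of any particular tool.

```python
import re


def star_sort_key(allele: str):
    """Order star alleles by their number, so that '*2' sorts before '*17'."""
    match = re.match(r"\*(\d+)(.*)", allele)
    if match is None:
        # Names that do not follow the '*<number>' pattern are sorted last.
        return (float("inf"), allele)
    return (int(match.group(1)), match.group(2))


def diplotype(maternal: str, paternal: str) -> str:
    """Combine two star alleles into one diplotype label, e.g. '*1/*2'."""
    first, second = sorted([maternal, paternal], key=star_sort_key)
    return f"{first}/{second}"


print(diplotype("*17", "*2"))  # *2/*17
```

Sorting the two alleles makes the label independent of which parent contributed which haplotype, so "*2/*17" and "*17/*2" are treated as the same diplotype.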
Variant call format
Genomic data is supplied to these tools as a VCF file.[1][2] VCF is one of several file formats used in bioinformatics, and VCF files can be generated from genomic data using separate bioinformatics software.
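To give a rough sense of what such a file contains, the sketch below parses the fixed columns and the first sample's genotype from a single VCF data line. The example record and the name parse_vcf_line are hypothetical, and real pipelines generally rely on dedicated VCF libraries rather than hand-written parsing.

```python
from typing import NamedTuple


class VariantCall(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alts: list[str]
    alleles: list[str]  # alleles called for the first sample
    phased: bool        # True when the genotype uses "|" rather than "/"


def parse_vcf_line(line: str) -> VariantCall:
    """Parse one tab-separated VCF data line (first sample only)."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt = fields[:5]
    alts = alt.split(",")
    # FORMAT is the 9th column and the first sample is the 10th.
    format_keys = fields[8].split(":")
    sample_values = fields[9].split(":")
    gt = dict(zip(format_keys, sample_values))["GT"]
    phased = "|" in gt
    indices = gt.replace("|", "/").split("/")
    # Index 0 is the reference allele, 1..n are ALT alleles, "." is missing.
    choices = [ref] + alts
    alleles = ["." if i == "." else choices[int(i)] for i in indices]
    return VariantCall(chrom, int(pos), ref, alts, alleles, phased)


# Hypothetical record, for illustration only.
example = "chr10\t94781859\t.\tG\tA\t.\tPASS\t.\tGT\t0|1"
call = parse_vcf_line(example)
print(call.alleles, call.phased)  # ['G', 'A'] True
```

The phased flag matters later in the workflow, because phased genotypes can be assigned to haplotypes directly while unphased genotypes cannot.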
Annotation
Pharmacogenomics annotation also relies on genes and variants being annotated. Annotation in the genetics context means matching DNA sequences to a corresponding gene, protein, or variant. Pharmacogenomics annotation tools do exactly that, but with an extra step: they match a specific sequence (or inferred haplotype) with a phenotype, which in turn is matched to a set of corresponding dosing or prescribing recommendations.[2][11][12]
Implementation of pharmacogenomics in clinical settings
Initiatives have sought to formulate plans, guidelines, and recommendations for implementing pharmacogenomic testing and personalized drug prescribing in clinical settings.[8][13] Pharmacogenomics faces a number of obstacles to successful implementation in clinical practice, notably the cost of sequencing, a lack of knowledge and education on the subject, and a lack of approachability.[5] Pharmacogenomics annotation tools seek to address, in part, this lack of approachability through software platforms that take in genomic data and output relevant and actionable clinical recommendations.
General workflow used for pharmacogenomics annotation tools.
Use
The general workflow for pharmacogenomics annotation consists of two phases:[14]
Processing and allele determination
Matching pharmacogenomic phenotypes to diplotypes
The first phase consists of a preprocessing step and an allele determination step. The preprocessing step removes extraneous information, downloads the corresponding human reference genome sequence, and then converts the file to a standardized format.[15] The allele determination step matches the input genotypes to named alleles. If the input data is phased, this step can match diplotypes without extra computation. If the data is unphased, the software goes through additional steps to attempt to match alleles correctly.[16][2]
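The difference between phased and unphased input can be sketched as follows. This is a minimal illustration under stated assumptions: the allele definitions describe an imaginary gene, the variant IDs "v1" and "v2" are placeholders, and the matching logic is not the algorithm used by any of the tools named above.

```python
from collections import Counter
from itertools import combinations_with_replacement

# Hypothetical allele definitions for an imaginary gene: each named allele is
# the set of variant IDs it carries ("*1" is the reference haplotype).
ALLELE_DEFINITIONS = {
    "*1": frozenset(),
    "*2": frozenset({"v1"}),
    "*3": frozenset({"v2"}),
    "*4": frozenset({"v1", "v2"}),
}


def match_phased(hap_a, hap_b):
    """Phased input: each haplotype's variant set is looked up directly."""
    by_variants = {v: name for name, v in ALLELE_DEFINITIONS.items()}
    return by_variants.get(frozenset(hap_a)), by_variants.get(frozenset(hap_b))


def match_unphased(alt_counts):
    """Unphased input: list every allele pair consistent with the ALT counts.

    alt_counts maps a variant ID to the number of ALT copies observed (1 or 2).
    """
    observed = Counter({v: n for v, n in alt_counts.items() if n > 0})
    candidates = []
    for a, b in combinations_with_replacement(ALLELE_DEFINITIONS, 2):
        combined = Counter(ALLELE_DEFINITIONS[a]) + Counter(ALLELE_DEFINITIONS[b])
        if combined == observed:
            candidates.append((a, b))
    return candidates


print(match_phased({"v1", "v2"}, set()))   # ('*4', '*1')
print(match_unphased({"v1": 1, "v2": 1}))  # [('*1', '*4'), ('*2', '*3')]
```

In this toy case the phased call resolves to a single diplotype, whereas the same two heterozygous variants without phase information are consistent with two different diplotypes, which is why unphased data requires additional steps.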
The second phase also has two steps: phenotype matching and report generation. The phenotype matching step matches the diplotypes generated in the first phase to known pharmacogenomic phenotypes (such as metabolizer status, discussed below). The report generation step compiles the information generated in the previous steps into a comprehensive report.[14] Tools differ in how that report is generated and presented, as shown in the table below.
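The second phase can be pictured as a chain of lookups from diplotype to phenotype to recommendation. The sketch below assumes small illustrative tables; the function assignments, phenotype labels, recommendation texts, and the gene name GENE1 are placeholders and are not taken from CPIC or any other guideline.

```python
# Illustrative lookup tables only; not taken from any real guideline.
ALLELE_FUNCTION = {"*1": "normal", "*2": "no function"}

PHENOTYPES = {
    ("normal", "normal"): "normal metabolizer",
    ("no function", "normal"): "intermediate metabolizer",
    ("no function", "no function"): "poor metabolizer",
}

RECOMMENDATIONS = {
    "normal metabolizer": "placeholder: standard prescribing",
    "intermediate metabolizer": "placeholder: consider adjustment",
    "poor metabolizer": "placeholder: consider alternative",
}


def phenotype_for(diplotype: str) -> str:
    """Map a diplotype such as '*1/*2' to a phenotype label."""
    functions = tuple(sorted(ALLELE_FUNCTION[a] for a in diplotype.split("/")))
    return PHENOTYPES[functions]


def build_report(gene: str, diplotype: str) -> dict:
    """Compile the diplotype, phenotype, and recommendation into one record."""
    phenotype = phenotype_for(diplotype)
    return {
        "gene": gene,
        "diplotype": diplotype,
        "phenotype": phenotype,
        "recommendation": RECOMMENDATIONS[phenotype],
    }


print(build_report("GENE1", "*1/*2"))
```

Sorting the two function labels before the lookup means the table needs only one entry per unordered pair of alleles.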
Limitations
As with any bioinformatics tool or method, the quality of the output depends largely on the quality and type of the input data. Low-quality data can result from repetitive sequences, which NGS technologies still struggle to resolve. This in turn leads to low accuracy for genes located in those regions, such as UGT1A1,[17] and therefore to lower accuracy of the output report.[2][11]
Beyond the input data, output quality depends on genes and variants being annotated correctly and comprehensively, which in turn depends on prior research into those variants and their effects on drug response.[2][11] Generalizability to different populations also depends on the amount and quality of pharmacogenomics research conducted on those populations. Historical underrepresentation has resulted, and continues to result, in a lack of data in genomics.[18] As a result of this lack of data, historically underrepresented populations are the least likely to see benefit from personalized treatments.[19]
In addition, while tools may be able to parse non-SNP variants, VCF files generally do not incorporate copy number or structural variants.[2] As such, that information will not be translated into drug response phenotypes or prescribing recommendations. Structural variation has been estimated to account for approximately 22% of pharmacogenomic variability,[20] so excluding it could result in inaccurate recommendations.
Comparison
PharmCAT, PAnno, and PharmVIP are pharmacogenomics annotation tools designed to analyze sequenced or genotyped genomic data (in VCF or BAM format) and predict individual drug responses based on genetic profiles. The key difference between these tools lies in their respective use cases:
PharmCAT is primarily used for detecting clinically relevant pharmacogenomic (PGx) variants based on CPIC recommendations,[8] among other sources. It relies on pre-annotated clinically significant variants and cannot analyze de novo variants.[1][11]
PAnno functions as a general annotation tool for research purposes. It can predict drug-response phenotypes for certain drugs in terms of toxicity, dosage, efficacy, and metabolism.[2] It also relies on sources such as CPIC.
PharmVIP provides diplotype classification and drug-response recommendations similar to the above tools. It is unique in its HLA gene prediction module, which aims to predict alleles in HLA genes, outputting relevant information on adverse drug reactions.[21][12] It also has a separate module that aims to predict the effects of novel variants in known PGx genes.[12]
Functional & methodological comparison between different tools
Predicts HLA alleles, predicts effects of novel variants[3]
No
Type of report
PAnno: HTML annotation report
PharmVIP: Annotation report and/or predictive report
PharmCAT: HTML annotation report
Focus area
PAnno: Research & general PGx annotation
PharmVIP: Predicting variant impact & PGx annotation
PharmCAT: Clinical PGx annotation, drug dosing
Use case
PAnno: Researchers analyzing PGx variants
PharmVIP: Clinicians & researchers predicting drug response (particularly regarding HLA)
PharmCAT: Clinicians & researchers analyzing PGx variants
Applications
Pharmacogenomics annotation is used to analyze and annotate pharmacogenes in order to help predict drug metabolism, efficacy, and potential adverse effects. Research and clinical applications include:
Variant identification by detecting changes in genome sequence such as SNPs and indels.[11]
Converting raw genomic data into a functional phenotype by stratifying genotypes into low, medium, or high function or, for metabolism-related genes, into poor, intermediate, or rapid metabolizers.[2]
Analyzing the interaction between drugs and genes.
Assisting in clinical interpretation by recommending drug dosages and assessing the impact of genetic variation on drug metabolism for medications such as warfarin or codeine.[22][2]
Pharmacogene annotations have several advantages. They allow physicians to select the right drug and the right dosage to reduce adverse reactions. This can reduce toxicity, improve drug efficacy, and eliminate or reduce trial-and-error prescribing.[25]
Future Directions
Multiomics
Multiomics (or multi-omics) seeks to analyze the genome, proteome, transcriptome, metabolome, microbiome, and others in a concerted effort. Applying multi-omics to medicine has the potential to enhance personalized care.[26] As a concept (and a discipline) within genomics, pharmacogenomics annotation would play a role in a multi-omics approach to personalized medicine, alongside transcriptomics, proteomics, microbiomics, and other -omics annotation. As with pharmacogenomics, multi-omics presents challenges, both technological and social, to clinicians and investigators.[27]
Clinical implementation through pre-emptive testing
Pharmacogenomics remains uncommon in clinical practice.[5][28] Implementation in the clinic has been discussed, yet it faces significant hurdles before it can become widespread.[29] Pre-emptive testing for pharmacologically relevant variants has been proposed as a means to achieve more widespread adoption of pharmacogenomics in practice.[30] Pre-emptive testing is available at specific hospital sites participating in trials evaluating its clinical utility.[31] As with any other genomic data, pre-emptively collected clinical data would require annotation in order to be clinically useful.