Research Letter | Neurology

Accuracy of Imputation for Apolipoprotein E ε Alleles in Genome-Wide Genotyping Data

Eero Vuoksimaa, PhD; Teemu Palviainen, MSc; Noora Lindgren, MSc; Juha O. Rinne, MD, PhD; Jaakko Kaprio, MD, PhD

Introduction

Given the importance of the apolipoprotein E (APOE) gene for risk of Alzheimer disease, determining this genotype is important in cognitive aging studies. Before the genome-wide genotyping era, the APOE gene (alleles ε2, ε3, ε4) was directly genotyped and defined by 2 single-nucleotide polymorphisms (SNPs), rs429358 and rs7412, on chromosome 19. Owing to rapid development of genotyping technology, the price of genome-wide arrays covering approximately 500 000 SNPs has decreased to less than $100, making them more cost-effective than direct genotyping of a single gene, such as APOE. Beyond the genotyped SNPs themselves, array data can be used to impute common unmeasured variants, such as rs429358 and rs7412, which are not directly genotyped on many chips. However, imputation accuracy depends on the genome-wide arrays, quality control, and reference samples.1,2 In this diagnostic study, we evaluated the association of reference panels with imputation quality of rs429358 and rs7412 by comparing imputation based on 3 different reference panels: 1000 Genomes (1000G),3 Haplotype Reference Consortium (HRC),4 and the Finnish-specific Sequencing Initiative Suomi (SISu).5

Methods

We used the population-based older Finnish Twin Cohort study6 to examine the correspondence between rs429358 and rs7412 directly genotyped using a Sequenom (Taqman) and the same 2 SNPs imputed using the 1000G Phase III version 5, HRC release 1.1, and SISu reference panels. Raw genotype data from a larger sample (5343 participants), generated with 5 array versions of HumanCoreExome (Illumina) (12 v1.0 A, 12 v1.1 A, 24 v1.0 A, 24 v1.1 A, and 24 v1.2 A), were merged before the quality control phase.
We removed variants with call rate less than 97.5%, samples with call rate less than 95%, variants with minor allele frequency less than 1%, and variants with Hardy-Weinberg equilibrium P < 1.0 × 10^-6. We also removed samples with heterozygosity test method-of-moments F coefficient estimates less than −0.03 or greater than 0.05, samples that were outliers in multidimensional scaling principal component analysis, and samples that failed the sex check. The number of genotyped autosomal variants after quality control was 239 894 (5328 participants). We then performed prephasing using Eagle software version 2.3 (Broad Institute) and imputation with Minimac3 software version 2.0.1 (University of Michigan) (Table 1). The study sample (1704 participants) with directly genotyped rs429358 and rs7412 was extracted from each imputed data set.

Ethical approval was obtained from the ethical committee of the Hospital District of Southwest Finland, and participants gave written informed consent.

Results

Participants were of European ancestry (mean [SD] age, 74.2 [4.9] years; 775 [45%] women). Among the directly genotyped individuals, 984 (57.7%) had the ε3/ε3 genotype, 521 (30.6%) had ε3/ε4 or ε4/ε4, 171 (10.0%) had ε2/ε2 or ε2/ε3, and 28 (1.6%) had ε2/ε4. Allele frequencies were 0.060 for ε2, 0.765 for ε3, and 0.175 for ε4.

Author affiliations and article information are listed at the end of this article. Open Access. This is an open access article distributed under the terms of the CC-BY License. JAMA Network Open. 2020;3(1):e1919960. doi:10.1001/jamanetworkopen.2019.19960 (January 24, 2020)

Based on 1000G, 1701 individuals (99.82%) had correctly classified alleles of both rs429358 and rs7412 (Table 1). Results were similar for HRC and SISu: 1702 individuals (99.88%) and 1703 individuals (99.94%), respectively, had correctly classified rs429358 and rs7412 (Table 1).
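Because APOE status is derived from the genotypes at these 2 SNPs, the mapping can be sketched in a few lines of Python. This is a minimal illustration, not the study pipeline: it assumes the conventional haplotype definitions (ε2: rs429358 T with rs7412 T; ε3: T with C; ε4: C with C), which the text does not spell out, it resolves the double heterozygote as ε2/ε4, and the names `apoe_genotype` and `is_e4_carrier` are hypothetical.

```python
# Assumed (conventional) haplotype definitions, not stated in the article:
#   ε2 = rs429358 T + rs7412 T, ε3 = T + C, ε4 = C + C.
# Key: (number of rs429358 C alleles, number of rs7412 T alleles).
APOE_BY_ALLELE_COUNTS = {
    (0, 2): "ε2/ε2",
    (0, 1): "ε2/ε3",
    (1, 1): "ε2/ε4",  # assumes the rare C+T haplotype is absent
    (0, 0): "ε3/ε3",
    (1, 0): "ε3/ε4",
    (2, 0): "ε4/ε4",
}


def apoe_genotype(rs429358: str, rs7412: str) -> str:
    """Map unphased SNP genotypes such as 'CT' to an APOE genotype."""
    calls = (rs429358.upper(), rs7412.upper())
    for call in calls:
        # Both SNPs are C/T polymorphisms; reject anything else.
        if len(call) != 2 or set(call) - {"C", "T"}:
            raise ValueError(f"unexpected genotype call: {call!r}")
    key = (calls[0].count("C"), calls[1].count("T"))
    if key not in APOE_BY_ALLELE_COUNTS:
        # Combinations that would require the rare C+T haplotype.
        raise ValueError(f"genotype combination not covered: {calls}")
    return APOE_BY_ALLELE_COUNTS[key]


def is_e4_carrier(genotype: str) -> bool:
    """True when the APOE genotype contains at least one ε4 allele."""
    return "ε4" in genotype


print(apoe_genotype("TT", "CC"))  # ε3/ε3
print(apoe_genotype("CT", "CT"))  # ε2/ε4
```

Under this mapping, a single miscalled SNP genotype changes the inferred APOE genotype, which is why per-SNP accuracy and APOE-level accuracy are reported side by side.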
Using the HRC reference panel, 1702 individuals (99.88%) had correctly classified APOE genotype and ε4 carrier status (Table 2). Two (0.12%) of the ε4 noncarriers based on directly genotyped SNPs were incorrectly classified as ε4 carriers.

Table 1. Cross Tabulation of Directly Genotyped and Imputed Single-Nucleotide Polymorphisms rs429358 and rs7412 for 1704 Individuals^a

Imputation Based  Genotyped, No. (%)                              Total, No.

1000 Genomes
  rs429358        TT              CT              CC
    TT            1152 (99.74)    3 (0.26)        0               1155
    CT            0               503 (100)       0               503
    CC            0               0               46 (100)        46
  rs7412          CC              CT              TT
    CC            1505 (100)      0               0               1505
    CT            2 (1.03)        192 (98.97)     0               194
    TT            0               1 (20.0)        4 (80.0)        5

Haplotype Reference Consortium
  rs429358        TT              CT              CC
    TT            1153 (99.83)    2 (0.17)        0               1155
    CT            0               503 (100)       0               503
    CC            0               0               46 (100)        46
  rs7412          CC              CT              TT
    CC            1505 (100)      0               0               1505
    CT            0               194 (100)       0               194
    TT            0               1 (20.0)        4 (80.0)        5

Sequencing Initiative Suomi
  rs429358        TT              CT              CC
    TT            1153 (99.83)    2 (0.17)        0               1155
    CT            0               503 (100)       0               503
    CC            0               0               46 (100)        46
  rs7412          CC              CT              TT
    CC            1505 (100)      0               0               1505
    CT            0               194 (100)       0               194
    TT            0               1 (20.0)        4 (80.0)        5

^a Genotypes were imputed to 1000 Genomes Phase III version 5,3 Haplotype Reference Consortium release 1.1,4 and Sequencing Initiative Suomi Finnish-only reference panels.5 Imputation to 1000 Genomes and Haplotype Reference Consortium reference panels was done using the University of Michigan Imputation Server. The Sequencing Initiative Suomi reference panel consists of 16 962 023 variants from 3775 high-pass whole-genome (depth up to 30×) sequences.

Table 2. Cross Tabulation of Directly Genotyped and Haplotype Reference Consortium Reference Panel Imputation-Based APOE Status in 1704 Individuals

APOE Status       APOE Status Based on Imputed Single-Nucleotide Polymorphisms, No.
(% Individuals)   ε2/ε2   ε2/ε3   ε2/ε4   ε3/ε3   ε3/ε4   ε4/ε4   Total
ε2/ε2 (0.29%)     4       0       1       0       0       0       5
ε2/ε3 (9.74%)     0       166     0       0       0       0       166
ε2/ε4 (1.64%)     0       0       28      0       0       0       28
ε3/ε3 (57.75%)    0       0       0       983     1       0       984
ε3/ε4 (27.88%)    0       0       0       0       475     0       475
ε4/ε4 (2.70%)     0       0       0       0       0       46      46

Abbreviation: APOE, apolipoprotein E.
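The summary figures quoted for Table 2 can be re-derived from the cross tabulation itself. The following Python sketch does that arithmetic; it reads the rows as directly genotyped APOE status and the columns as HRC-imputed status (our reading of the table layout), and the helper names `overall_concordance` and `e4_carrier_table` are hypothetical.

```python
GENOTYPES = ["ε2/ε2", "ε2/ε3", "ε2/ε4", "ε3/ε3", "ε3/ε4", "ε4/ε4"]

# Counts copied from Table 2. Rows: directly genotyped status;
# columns: imputed (HRC) status; both follow the GENOTYPES order.
TABLE_2 = [
    [4, 0, 1, 0, 0, 0],
    [0, 166, 0, 0, 0, 0],
    [0, 0, 28, 0, 0, 0],
    [0, 0, 0, 983, 1, 0],
    [0, 0, 0, 0, 475, 0],
    [0, 0, 0, 0, 0, 46],
]


def overall_concordance(matrix):
    """Return (number on the diagonal, total number of individuals)."""
    total = sum(sum(row) for row in matrix)
    agree = sum(matrix[i][i] for i in range(len(matrix)))
    return agree, total


def e4_carrier_table(matrix, labels):
    """Collapse the 6x6 table to ε4 carrier status.

    Keys are (carrier by direct genotyping, carrier by imputation).
    """
    counts = {(d, i): 0 for d in (False, True) for i in (False, True)}
    for r, direct in enumerate(labels):
        for c, imputed in enumerate(labels):
            counts[("ε4" in direct, "ε4" in imputed)] += matrix[r][c]
    return counts


agree, total = overall_concordance(TABLE_2)
carrier = e4_carrier_table(TABLE_2, GENOTYPES)

print(f"{agree}/{total} = {100 * agree / total:.2f}%")  # 1702/1704 = 99.88%
print(carrier[(False, True)])  # 2 noncarriers imputed as carriers
print(carrier[(True, False)])  # 0 carriers imputed as noncarriers
```

Both discordant individuals fall in the noncarrier-to-carrier cell (ε2/ε2 imputed as ε2/ε4 and ε3/ε3 imputed as ε3/ε4), which matches the 2 (0.12%) misclassified noncarriers reported in the text.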
Discussion

This study found that, with the arrays described in the Methods section, imputation to all 3 reference panels (1000G, HRC, and SISu) yielded high imputation accuracy for rs429358 and rs7412, the 2 SNPs needed to determine the polymorphic APOE ε alleles. The number of Finnish samples varies across reference panels: 99 in 1000G Phase III, approximately 1900 in the HRC, and 3800 in the Finnish-only SISu. For 1000G, Phase III yielded improved accuracy compared with Phase I.1,2 Our results also suggest that APOE can be determined equally well with the most recent freely available cosmopolitan reference panels as with a population-specific reference panel. Still, the all-Finnish sample and the inclusion of only 1 brand of arrays are limitations, and these results should be confirmed in people of different ancestries.

ARTICLE INFORMATION

Accepted for Publication: December 2, 2019.

Published: January 24, 2020. doi:10.1001/jamanetworkopen.2019.19960

Open Access: This is an open access article distributed under the terms of the CC-BY License. © 2020 Vuoksimaa E et al. JAMA Network Open.

Corresponding Author: Eero Vuoksimaa, PhD, Institute for Molecular Medicine Finland (FIMM), PO Box 20 (Tukholmankatu 8), 00014 University of Helsinki, Helsinki, Finland (eero.vuoksimaa@helsinki.fi).
Author Affiliations: Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland (Vuoksimaa, Palviainen, Kaprio); Turku PET Centre, University of Turku, Turku, Finland (Lindgren, Rinne); Division of Clinical Neurosciences, Turku University Hospital, Turku, Finland (Rinne); Department of Public Health, Clinicum, University of Helsinki, Helsinki, Finland (Kaprio).

Author Contributions: Dr Vuoksimaa and Mr Palviainen had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: Vuoksimaa, Palviainen, Kaprio.
Acquisition, analysis, or interpretation of data: All authors.
Drafting of the manuscript: Vuoksimaa, Palviainen.
Critical revision of the manuscript for important intellectual content: All authors.
Statistical analysis: Palviainen.
Obtained funding: Vuoksimaa, Rinne, Kaprio.
Administrative, technical, or material support: Kaprio.

Conflict of Interest Disclosures: Dr Rinne reported serving as a neurology consultant for CRST Oy (Clinical Research Services Turku). No other disclosures were reported.

Funding/Support: This study was supported by the Academy of Finland (grants 314639 and 320109 to Dr Vuoksimaa, grant 312073 to Dr Kaprio, and grant 310962 to Dr Rinne) and a Sigrid Juselius Foundation grant to Dr Rinne. Ms Lindgren was supported by the Finnish Cultural Foundation, Päivikki and Sakari Sohlberg Foundation, Yrjö Jahnsson Foundation, Turku University Foundation, Finnish Brain Foundation, and Finnish State Research Funding.

Role of the Funder/Sponsor: The funders had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.

Additional Contributions: We thank the participants of the older Finnish Twin Cohort study.

REFERENCES

1. Oldmeadow C, Holliday EG, McEvoy M, et al.
Concordance between direct and imputed APOE genotypes using 1000 Genomes data. J Alzheimers Dis. 2014;42(2):391-393. doi:10.3233/JAD-140846
2. Lupton MK, Medland SE, Gordon SD, et al. Accuracy of inferred APOE genotypes for a range of genotyping arrays and imputation reference panels. J Alzheimers Dis. 2018;64(1):49-54. doi:10.3233/JAD-171104
3. Auton A, Brooks LD, Durbin RM, et al; 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature. 2015;526(7571):68-74. doi:10.1038/nature15393
4. McCarthy S, Das S, Kretzschmar W, et al; Haplotype Reference Consortium. A reference panel of 64,976 haplotypes for genotype imputation. Nat Genet. 2016;48(10):1279-1283. doi:10.1038/ng.3643
5. Lim ET, Würtz P, Havulinna AS, et al; Sequencing Initiative Suomi (SISu) Project. Distribution and medical impact of loss-of-function variants in the Finnish founder population. PLoS Genet. 2014;10(7):e1004494. doi:10.1371/journal.pgen.1004494
6. Vuoksimaa E, Rinne JO, Lindgren N, Heikkilä K, Koskenvuo M, Kaprio J. Middle age self-report risk score predicts cognitive functioning and dementia in 20-40 years. Alzheimers Dement (Amst). 2016;4:118-125. doi:10.1016/j.dadm.2016.08.003