All published articles of this journal are available on ScienceDirect.
FastImpute: Development and Validation of a Workflow for Open-source, Reference-Free Genotype Imputation Methods - An Example in Breast Cancer (PRS313_BC)
Abstract
Background
Genotype imputation is crucial for enhancing genetic data from genotyping arrays by predicting missing single nucleotide polymorphisms (SNPs). Traditional imputation methods often compromise data privacy or are computationally demanding, limiting their accessibility. While newer deep learning methods offer a privacy-preserving alternative, their large model sizes make them difficult to deploy on client-side devices like personal computers or smartphones.
Methods
We developed FastImpute, a workflow for creating lightweight, reference-free imputation models designed for client-side deployment. As a case study, we trained linear and logistic regression models to impute SNPs for the breast cancer polygenic risk score, PRS313_BC. We used whole-genome sequencing data from 2,504 individuals in the 1000 Genomes Project for training and testing. The models were trained to predict target PRS SNPs from SNPs present on commercial genotyping arrays. Performance was evaluated against true sequencing data and benchmarked against Beagle.
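As a minimal sketch of the regression idea described above, and not the FastImpute implementation itself, the following fits a least-squares model that predicts one target SNP dosage from array SNP dosages. The function names, the small ridge penalty, the 400/100 split, and the synthetic genotypes are all assumptions made for illustration.

```python
import numpy as np

def fit_linear_imputer(X_array, y_target, ridge=1e-3):
    """Fit least-squares weights mapping array-SNP dosages to one target-SNP dosage."""
    # Prepend an intercept column so the model can learn a baseline dosage.
    X = np.column_stack([np.ones(len(X_array)), X_array])
    # Small ridge penalty (an assumption) keeps the solve stable for correlated SNPs.
    penalty = ridge * np.eye(X.shape[1])
    penalty[0, 0] = 0.0  # leave the intercept unpenalised
    return np.linalg.solve(X.T @ X + penalty, X.T @ y_target)

def impute_dosage(weights, X_array):
    """Predict target-SNP dosages and clip them to the valid range [0, 2]."""
    X = np.column_stack([np.ones(len(X_array)), X_array])
    return np.clip(X @ weights, 0.0, 2.0)

# Synthetic stand-in data: 500 individuals, 10 array SNPs coded 0/1/2.
rng = np.random.default_rng(0)
n_samples, n_array_snps = 500, 10
X_all = rng.integers(0, 3, size=(n_samples, n_array_snps)).astype(float)

# The hypothetical target SNP tracks array SNP 0, with 10% of genotypes
# redrawn at random, loosely mimicking imperfect linkage disequilibrium.
redraw = rng.random(n_samples) < 0.1
y_all = np.where(redraw, rng.integers(0, 3, size=n_samples), X_all[:, 0]).astype(float)

# Hold out the last 100 individuals for testing.
train, test = slice(0, 400), slice(400, None)
weights = fit_linear_imputer(X_all[train], y_all[train])
predicted = impute_dosage(weights, X_all[test])

# Squared Pearson correlation between imputed and true dosages.
r2 = np.corrcoef(predicted, y_all[test])[0, 1] ** 2
print(f"held-out R^2 for the synthetic target SNP: {r2:.2f}")
```

Once fitted, such a model is just a weight vector per target SNP, which is what makes inference possible without a reference panel on the client.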
Results
The correlation (R²) between a PRS calculated using our simple linear regression model and a PRS calculated using true sequencing data was 0.86. This significantly outperformed both no imputation and simple minor allele frequency imputation (R² = 0.38). Our lightweight models performed comparably to Beagle in identifying high-risk individuals, correctly classifying 3 (linear) and 4 (logistic) out of 6 individuals in the top 1% of risk, similar to Beagle (4 out of 6).
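The two evaluation metrics reported above can be illustrated with a short sketch: the squared correlation between PRS values computed from true and imputed dosages, and the number of top 1% individuals recovered. The helper names, the cohort size of 600, the effect sizes, and the noise model are hypothetical and are not taken from the study data.

```python
import numpy as np

def polygenic_risk_score(dosages, effect_sizes):
    """Weighted sum of allele dosages, one score per individual."""
    return dosages @ effect_sizes

def r_squared(a, b):
    """Squared Pearson correlation between two score vectors."""
    return float(np.corrcoef(a, b)[0, 1] ** 2)

def top_fraction_recovered(true_scores, est_scores, fraction=0.01):
    """Count individuals in the true top fraction who are also in the estimated top fraction."""
    # Size of the high-risk group, rounded to the nearest whole person.
    n_top = max(1, round(fraction * len(true_scores)))
    true_top = set(np.argsort(true_scores)[-n_top:].tolist())
    est_top = set(np.argsort(est_scores)[-n_top:].tolist())
    return len(true_top & est_top), n_top

# Synthetic cohort: 600 individuals and 313 SNPs (hypothetical values).
rng = np.random.default_rng(1)
n_people, n_snps = 600, 313
true_dosages = rng.integers(0, 3, size=(n_people, n_snps)).astype(float)

# Imputed dosages are modelled here as the truth plus Gaussian noise (an assumption).
imputed_dosages = np.clip(true_dosages + rng.normal(0.0, 0.3, size=true_dosages.shape), 0.0, 2.0)

# Hypothetical per-SNP effect sizes (log odds ratios).
effect_sizes = rng.normal(0.0, 0.05, size=n_snps)

prs_true = polygenic_risk_score(true_dosages, effect_sizes)
prs_imputed = polygenic_risk_score(imputed_dosages, effect_sizes)

r2 = r_squared(prs_true, prs_imputed)
hits, n_top = top_fraction_recovered(prs_true, prs_imputed)
print(f"PRS R^2 = {r2:.2f}; recovered {hits} of {n_top} top-1% individuals")
```

With a high-risk group this small, each individual gained or lost shifts the recovery count by one, so counts from different methods are best read alongside the R² rather than in isolation.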
Conclusion
The FastImpute pipeline demonstrates that simple, lightweight models can provide effective, privacy-preserving, and accessible genotype imputation, enabling real-time genetic risk assessment on edge devices.
Availability
Web application: https://aaronge-2020.github.io/FastImpute/
Code: https://github.com/aaronge-2020/FastImpute
