RESEARCH ARTICLE

FastImpute: Development and Validation of a Workflow for Open-source, Reference-Free Genotype Imputation Methods - An Example in Breast Cancer (PRS313_BC)

The Open Bioinformatics Journal, 27 Nov 2025. DOI: 10.2174/0118750362421210250929110508

Abstract

Background

Genotype imputation is crucial for enhancing genetic data from genotyping arrays by predicting missing single nucleotide polymorphisms (SNPs). Traditional imputation methods often compromise data privacy or are computationally demanding, limiting their accessibility. While newer deep learning methods offer a privacy-preserving alternative, their large model sizes make them difficult to deploy on client-side devices like personal computers or smartphones.

Methods

We developed FastImpute, a workflow for creating lightweight, reference-free imputation models designed for client-side deployment. As a case study, we trained linear and logistic regression models to impute SNPs for the breast cancer polygenic risk score, PRS313_BC. We used whole-genome sequencing data from 2,504 individuals in the 1000 Genomes Project as a training and testing set. The models were trained to predict target PRS SNPs using input from SNPs on commercial genotyping arrays. Performance was evaluated against true sequencing data and benchmarked against Beagle.
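
As a rough illustration of the idea of regression-based, reference-free imputation, the sketch below fits a one-predictor least-squares model that predicts a target SNP dosage from a single array SNP. It is not the FastImpute implementation: the synthetic genotypes, the single-predictor simplification, the 2,000/504 split, and all function names (`fit_simple_linear`, `impute_dosage`, `simulate_genotypes`) are assumptions made for this example.

```python
import random

def fit_simple_linear(x, y):
    """Ordinary least squares for y ~ intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx > 0 else 0.0  # monomorphic predictor: fall back to the mean
    intercept = mean_y - slope * mean_x
    return intercept, slope

def impute_dosage(model, x):
    """Predict target dosages and clip them to the valid range [0, 2]."""
    intercept, slope = model
    return [min(2.0, max(0.0, intercept + slope * xi)) for xi in x]

def simulate_genotypes(n, seed=0):
    """Hypothetical data: an array SNP and a target SNP in strong linkage."""
    rng = random.Random(seed)
    array_snp, target_snp = [], []
    for _ in range(n):
        a = sum(rng.random() < 0.3 for _ in range(2))  # genotype coded 0/1/2
        if rng.random() < 0.9:
            t = a  # target matches the array SNP most of the time
        else:
            t = sum(rng.random() < 0.3 for _ in range(2))
        array_snp.append(a)
        target_snp.append(t)
    return array_snp, target_snp

array_snp, target_snp = simulate_genotypes(2504)
split = 2000  # arbitrary train/test split chosen for this sketch
model = fit_simple_linear(array_snp[:split], target_snp[:split])
imputed = impute_dosage(model, array_snp[split:])
```

A fitted model of this kind is just two numbers per target SNP, which is what makes such models small enough to ship to a browser or phone. A realistic model would use several nearby array SNPs as predictors.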

Results

The correlation (R²) between a PRS calculated using our simple linear regression model and a PRS calculated using true sequencing data was 0.86. This significantly outperformed both no imputation and simple minor allele frequency imputation (R² = 0.38). Our lightweight models performed comparably to Beagle in identifying high-risk individuals: of the 6 individuals in the top 1% of risk, the linear model correctly classified 3 and the logistic model 4, compared with 4 for Beagle.
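
The two evaluation quantities reported here can be sketched as follows, assuming the PRS is a weighted sum of per-SNP dosages and that R² denotes the squared Pearson correlation. The toy dosages, weights, and function names are hypothetical and are not taken from the study.

```python
def polygenic_score(dosages, weights):
    """PRS per individual: weighted sum of that individual's SNP dosages."""
    return [sum(w * d for w, d in zip(weights, row)) for row in dosages]

def r_squared(a, b):
    """Squared Pearson correlation between two score vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov * cov / (var_a * var_b)

def top_fraction(scores, fraction=0.01):
    """Indices of the individuals in the top `fraction` of scores."""
    k = max(1, round(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])

# Toy example: 4 individuals, 3 SNPs (all values made up for illustration)
weights = [0.2, 0.1, 0.3]
true_dosages = [[0, 1, 2], [2, 2, 1], [1, 0, 0], [0, 0, 1]]
imputed_dosages = [[0.1, 0.9, 1.8], [1.7, 1.9, 1.2], [0.8, 0.2, 0.1], [0.2, 0.1, 0.9]]

prs_true = polygenic_score(true_dosages, weights)
prs_imputed = polygenic_score(imputed_dosages, weights)

r2 = r_squared(prs_true, prs_imputed)
# With only 4 individuals, use the top 25% so that one person is selected
high_true = top_fraction(prs_true, fraction=0.25)
high_imputed = top_fraction(prs_imputed, fraction=0.25)
correctly_classified = len(high_true & high_imputed)
```

With the default `fraction=0.01`, a test set of 600 people would yield a high-risk group of 6, and `correctly_classified` would count how many of them the imputed PRS also places in its top 1%.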

Conclusion

The FastImpute pipeline demonstrates that simple, lightweight models can provide effective, privacy-preserving, and accessible genotype imputation, enabling real-time genetic risk assessment on edge devices.

Keywords: Genotype imputation, Reference-free methods, FastImpute, Breast cancer, PRS313, Client-side imputation, Privacy, Accessibility, Web technologies, Polygenic risk score, Direct-to-consumer test.