Benchmarking Somatic Copy Number Variation Detection Tools in Cancer Genomes
Abstract
Introduction
Accurate detection of somatic copy number variations (CNVs) remains challenging in cancer genomics. This study addresses the substantial variability in performance among computational tools for somatic CNV calling and the lack of consensus between their calls.
Methods
We benchmarked four widely used tools: CNVkit, Sequenza, Facets, and ASCAT. We assessed their recall, precision, reproducibility, and inter-tool concordance on an orthogonally validated real-world dataset derived from the HCC1395 cell line.
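As an illustration of the metrics named above, the following is a minimal sketch that scores a call set against a truth set by base-pair overlap and computes a pairwise concordance between two call sets. The abstract does not state how the metrics were defined, so the overlap-based definitions, the use of the Jaccard index for concordance, and all names (`Segment`, `precision_recall`, `jaccard`) are assumptions made for this example, not the study's method.

```python
from typing import List, Tuple

# Hypothetical segment representation: (chromosome, start, end), half-open.
Segment = Tuple[str, int, int]


def merge(segments: List[Segment]) -> List[Segment]:
    """Merge overlapping or adjacent segments on the same chromosome."""
    merged: List[Segment] = []
    for chrom, start, end in sorted(segments):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def total_bp(segments: List[Segment]) -> int:
    """Number of base pairs covered by a segment set, counted once."""
    return sum(end - start for _, start, end in merge(segments))


def overlap_bp(a: List[Segment], b: List[Segment]) -> int:
    """Number of base pairs covered by both segment sets."""
    total = 0
    for chrom_a, start_a, end_a in merge(a):
        for chrom_b, start_b, end_b in merge(b):
            if chrom_a == chrom_b:
                total += max(0, min(end_a, end_b) - max(start_a, start_b))
    return total


def precision_recall(calls: List[Segment], truth: List[Segment]) -> Tuple[float, float]:
    """Assumed definitions: precision = shared / called bp, recall = shared / true bp."""
    shared = overlap_bp(calls, truth)
    called = total_bp(calls)
    true = total_bp(truth)
    precision = shared / called if called else 0.0
    recall = shared / true if true else 0.0
    return precision, recall


def jaccard(a: List[Segment], b: List[Segment]) -> float:
    """Pairwise concordance as shared bp divided by bp covered by either set."""
    shared = overlap_bp(a, b)
    union = total_bp(a) + total_bp(b) - shared
    return shared / union if union else 0.0
```

In practice each event type (amplification, deletion) would be scored separately, since the two can behave very differently for the same tool.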
Results
Tool performance differed considerably. Facets and Sequenza showed the most balanced accuracy and the highest reproducibility. Consensus among tools was poor, particularly for amplifications, where pairwise concordance values were frequently below 0.6. CNVkit showed high sensitivity for deletions but critically low and unstable performance for amplifications.
Discussion
The results show that tool selection is a primary source of variability in CNV studies, which can significantly impact downstream biological interpretation. The high discordance rates, especially for amplifications, highlight the inherent limitations and the risk of false negatives when relying on a single algorithm.
Conclusion
We conclude that, of the tools evaluated, Facets and Sequenza are the preferred choices for reliable somatic CNV detection. We also recommend a consensus-based approach, which reduces the errors that arise from relying on any single algorithm.
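One simple way to realize a consensus-based approach is a per-bin majority vote across tools. The sketch below is an assumption about how such a scheme could look, not the procedure used in this study: it presumes that each tool's calls have already been projected onto the same genomic bins and reduced to the labels "amp", "del", or "neutral". The function name `consensus_calls`, the `min_support` parameter, and the tool names in the example are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def consensus_calls(calls_by_tool: Dict[str, List[str]], min_support: int = 2) -> List[str]:
    """Per-bin vote: keep an event only if at least `min_support` tools agree
    on it and it outvotes the opposite event; otherwise report "neutral"."""
    lengths = {len(calls) for calls in calls_by_tool.values()}
    if len(lengths) > 1:
        raise ValueError("all tools must report the same number of bins")

    consensus: List[str] = []
    for states in zip(*calls_by_tool.values()):
        votes = Counter(states)
        amp, dele = votes["amp"], votes["del"]
        if amp >= min_support and amp > dele:
            consensus.append("amp")
        elif dele >= min_support and dele > amp:
            consensus.append("del")
        else:
            consensus.append("neutral")
    return consensus


# Toy input with made-up tool names and labels, for illustration only.
example = {
    "tool_a": ["amp", "del", "neutral", "amp"],
    "tool_b": ["amp", "neutral", "neutral", "del"],
    "tool_c": ["neutral", "del", "amp", "amp"],
}
result = consensus_calls(example)
```

Raising `min_support` trades recall for precision: a stricter threshold discards more single-tool calls, which suppresses tool-specific false positives but can also drop true events that only one tool detects.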
