Haplotype Classification Using Copy Number Variation and Principal Components Analysis



Kevin Blighe*
Sheffield Children's NHS Foundation Trust, Western Bank, Sheffield, S10 2TH, United Kingdom





© 2013 Kevin Blighe

open-access license: This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International Public License (CC-BY 4.0), a copy of which is available at: https://creativecommons.org/licenses/by/4.0/legalcode. This license permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

* Address correspondence to this author at the Sheffield Children's NHS Foundation Trust, Western Bank, Sheffield, S10 2TH, United Kingdom; Tel: +44 7500 190333; E-mail: kevinblighe@gmail.com


Abstract

Large microarray datasets require elaborate downstream methods of analysis. Where the end goal is to look for relationships between, or patterns within, different subgroups or even individual samples, a large dataset must first be filtered using statistical thresholds in order to reduce its overall volume. In anthropological microarray studies, for example, such ‘dimension reduction’ techniques are essential to elucidate any links between polymorphisms and phenotypes in given populations. In such large datasets, a subset can first be taken to represent the whole, much as polling results taken during elections are used to infer the opinions of the population at large. What, however, is the best and easiest method of capturing a subset of the variation in a dataset that can represent the overall portrait of variation?

In this article, principal components analysis (PCA) is discussed in detail, including its history, the mathematics behind the process, and the ways in which it can be applied to modern large-scale biological datasets. New methods of analysis using PCA are also suggested, and tentative results are outlined.

Keywords: Principal components analysis, multivariate data analysis, haplotype-tagging, copy number variation.