Bridging Data Management and Knowledge Discovery in the Life Sciences
Abstract
In this work we present an application for integrating and analyzing life science data that combines a biomedical data warehouse system with tools developed in-house for knowledge discovery tasks. Knowledge discovery is a process in which several steps must be coupled in order to answer a specific question. To create such a combination of steps, a data miner using our in-house knowledge discovery tool KD3 can assemble functional objects into a data mining workflow. The generated workflows can easily be reused for other purposes simply by adding new data and parameterizing the functional objects in the process. Workflows guide the execution of data integration and aggregation tasks, which were defined and implemented using a publicly available open source tool. As a proof of concept, intelligent query models were designed and tested for the identification of genotype-phenotype correlations in Marfan Syndrome. We show that, with our application, a data miner can readily develop new knowledge discovery algorithms that clinical researchers may later use to retrieve medically relevant information.