A data integration methodology for systems biology


Open Access

21 November 2005

Research article

Proceedings of the National Academy of Sciences, Vol. 102 (48), 17296-17301
https://doi.org/10.1073/pnas.0508647102

Abstract

Different experimental technologies measure different aspects of a system, to differing depth and breadth. High-throughput assays have inherently high false-positive and false-negative rates. Moreover, each technology is subject to systematic biases of a different nature. These differences make network reconstruction from multiple data sets difficult and error-prone. Additionally, because biotechnology advances so rapidly, there is usually no curated exemplar data set from which to estimate data integration parameters. To address these concerns, we have developed data integration methods that can handle multiple data sets differing in statistical power, type, size, and network coverage without requiring a curated training data set. Our methodology is general-purpose and may be applied to integrate data from any existing or future technology. Here we outline our methods and then demonstrate their performance on simulated data sets. The results show that these methods select true-positive data elements much more accurately than classical approaches. In a companion paper, we demonstrate the applicability of our approach to biological data. We have implemented our methodology in POINTILLIST, a free, open-source software package.
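
The abstract does not give the integration statistics themselves. As a point of reference only, the sketch below shows two classical ways to combine independent evidence (one p-value per data set) for a single data element, such as a candidate interaction: Fisher's method, and a weighted Z (Stouffer) method in which per-data-set weights let sources of differing statistical power contribute unequally. This is an illustrative assumption, not the POINTILLIST implementation. The function names are hypothetical, the p-values are assumed independent and strictly between 0 and 1, and the weights are supplied by hand rather than estimated without training data as the paper describes.

```python
import math
from statistics import NormalDist


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method (hypothetical helper).

    X = -2 * sum(ln p_i) is chi-square with 2k degrees of freedom under the
    null. For even degrees of freedom the survival function has the closed
    form exp(-x/2) * sum_{j=0}^{k-1} (x/2)**j / j!, so no SciPy is needed.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # this is X / 2
    term = 1.0  # (x/2)**0 / 0!
    total = 1.0
    for j in range(1, k):
        term *= half_x / j  # builds (x/2)**j / j! incrementally
        total += term
    return math.exp(-half_x) * total


def weighted_z_pvalue(pvalues, weights):
    """One-sided weighted Z (Stouffer) combination (hypothetical helper).

    Each p-value is mapped to z_i = Phi^{-1}(1 - p_i); the combined score is
    sum(w_i * z_i) / sqrt(sum(w_i**2)), converted back to a p-value. A larger
    weight lets a more reliable (higher-power) data set count for more.
    """
    if not pvalues or len(pvalues) != len(weights):
        raise ValueError("need one weight per p-value")
    if any(not 0.0 < p < 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1)")
    norm = math.sqrt(sum(w * w for w in weights))
    if norm == 0.0:
        raise ValueError("weights must not all be zero")
    std_normal = NormalDist()
    z_scores = [std_normal.inv_cdf(1.0 - p) for p in pvalues]
    combined = sum(w * z for w, z in zip(weights, z_scores)) / norm
    return 1.0 - std_normal.cdf(combined)


if __name__ == "__main__":
    # Two data sets each give weak evidence (p = 0.05) for the same element.
    print(fisher_combined_pvalue([0.05, 0.05]))  # about 0.0175
    print(weighted_z_pvalue([0.05, 0.05], [1.0, 1.0]))  # about 0.01
```

Agreement between sources strengthens the combined evidence in both cases. With the weighted Z method, setting a data set's weight to zero removes its influence entirely, which is one simple way to express that some technologies are less trustworthy than others.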
