High-throughput protein analysis integrating bioinformatics and experimental assays
- 21 January 2004
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 32 (2) , 742-748
- https://doi.org/10.1093/nar/gkh257
Abstract
The wealth of transcript information that has been made publicly available in recent years requires the development of high-throughput functional genomics and proteomics approaches for its analysis. Such approaches need suitable data integration procedures and a high level of automation in order to gain maximum benefit from the results generated. We have designed an automatic pipeline to analyse annotated open reading frames (ORFs) stemming from full-length cDNAs produced mainly by the German cDNA Consortium. The ORFs are cloned into expression vectors for use in large-scale assays such as the determination of subcellular protein localization or kinase reaction specificity. Additionally, all identified ORFs undergo exhaustive bioinformatic analysis such as similarity searches, protein domain architecture determination and prediction of physicochemical characteristics and secondary structure, using a wide variety of bioinformatic methods in combination with the most up-to-date public databases (e.g. PRINTS, BLOCKS, INTERPRO, PROSITE, SWISSPROT). Data from experimental results and from the bioinformatic analysis are integrated and stored in a relational database (MS SQL-Server), which makes it possible for researchers to find answers to biological questions easily, thereby speeding up the selection of targets for further analysis. The designed pipeline constitutes a new automatic approach to obtaining and administering relevant biological data from high-throughput investigations of cDNAs in order to systematically identify and characterize novel genes, as well as to comprehensively describe the function of the encoded proteins.
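The integration step described in the abstract, where bioinformatic predictions and experimental results share one relational store keyed by ORF, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract names MS SQL-Server but gives no schema, so the table names (`orf`, `domain_hit`, `localization`), column names, the helper `find_targets`, and all rows are hypothetical placeholders, and Python's built-in `sqlite3` stands in for the actual database server.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical three-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orf (orf_id TEXT PRIMARY KEY, length_aa INTEGER);
        -- bioinformatic annotation: one row per predicted domain
        CREATE TABLE domain_hit (
            orf_id TEXT REFERENCES orf(orf_id),
            source_db TEXT,
            domain TEXT
        );
        -- experimental result: observed subcellular compartment
        CREATE TABLE localization (
            orf_id TEXT REFERENCES orf(orf_id),
            compartment TEXT
        );
        """
    )
    # Placeholder rows invented for this sketch; not data from the study.
    conn.executemany(
        "INSERT INTO orf VALUES (?, ?)",
        [("ORF_A", 412), ("ORF_B", 230), ("ORF_C", 655)],
    )
    conn.executemany(
        "INSERT INTO domain_hit VALUES (?, ?, ?)",
        [
            ("ORF_A", "PROSITE", "protein kinase"),
            ("ORF_B", "INTERPRO", "zinc finger"),
            ("ORF_C", "PROSITE", "protein kinase"),
        ],
    )
    conn.executemany(
        "INSERT INTO localization VALUES (?, ?)",
        [("ORF_A", "nucleus"), ("ORF_B", "nucleus"), ("ORF_C", "cytoplasm")],
    )
    conn.commit()
    return conn


def find_targets(conn, domain, compartment):
    """Return ORF ids that have the given predicted domain AND observed compartment."""
    rows = conn.execute(
        """
        SELECT DISTINCT o.orf_id
        FROM orf AS o
        JOIN domain_hit AS d ON d.orf_id = o.orf_id
        JOIN localization AS l ON l.orf_id = o.orf_id
        WHERE d.domain = ? AND l.compartment = ?
        ORDER BY o.orf_id
        """,
        (domain, compartment),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_demo_db()
    print(find_targets(conn, "protein kinase", "nucleus"))
```

The point of the sketch is the join: because both kinds of result reference the same ORF identifier, a question such as "which ORFs with a predicted kinase domain localize to the nucleus" becomes a single query, which is the kind of target selection the abstract describes.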