Automatic Classification and Pattern Discovery in High-throughput Protein Crystallization Trials
- 1 September 2005
- journal article
- research article
- Published by Springer Nature in Journal of Structural and Functional Genomics
- Vol. 6 (2), 195-202
- https://doi.org/10.1007/s10969-005-5243-9
Abstract
Conceptually, protein crystallization can be divided into two phases: search and optimization. Robotic protein crystallization screening can speed up the search phase, and has the potential to increase process quality. Automated image classification helps to increase throughput and to generate objective results consistently. Although the classification accuracy can always be improved, our image analysis system can classify images from 1536-well plates with high classification accuracy (85%) and ROC score (0.87), as evaluated on 127 human-classified protein screens containing 5600 crystal images and 189472 non-crystal images. Data mining can integrate results from high-throughput screens with information about crystallizing conditions, intrinsic protein properties, and results from crystallization optimization. We apply association mining, a data mining approach that identifies frequently occurring patterns among variables and their values. This approach segregates proteins into groups based on how they react in a broad range of conditions, and clusters cocktails to reflect their potential to achieve crystallization. These results may lead to crystallization screen optimization, and reveal associations between protein properties and crystallization conditions. We also postulate that past experience may lead us to the identification of initial conditions favorable to crystallization for novel proteins.
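The association mining idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it enumerates itemsets by brute force rather than using an optimized algorithm, and the protein names, cocktail IDs, and thresholds are all hypothetical toy values chosen for illustration. Each "transaction" is the set of cocktails in which one protein crystallized; a rule such as `c1 -> c2` is kept when both its support and its confidence clear the chosen thresholds.

```python
from itertools import combinations

# Hypothetical toy data (not from the paper): for each protein, the set of
# cocktail IDs in which a crystal was observed.
outcomes = {
    "protein_A": {"c1", "c2", "c3"},
    "protein_B": {"c1", "c2"},
    "protein_C": {"c2", "c3"},
    "protein_D": {"c1", "c2", "c4"},
}
transactions = list(outcomes.values())


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force search for itemsets whose support is >= min_support."""
    items = sorted(set().union(*transactions))
    found = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            s = support(combo, transactions)
            if s >= min_support:
                found[frozenset(combo)] = s
    return found


def association_rules(itemsets, transactions, min_confidence):
    """Derive single-consequent rules (antecedent -> item) from itemsets."""
    rules = []
    for itemset, s in itemsets.items():
        if len(itemset) < 2:
            continue
        for item in itemset:
            antecedent = itemset - {item}
            # confidence = support(antecedent and item) / support(antecedent)
            confidence = s / support(antecedent, transactions)
            if confidence >= min_confidence:
                rules.append((tuple(sorted(antecedent)), item, s, confidence))
    return sorted(rules)


itemsets = frequent_itemsets(transactions, min_support=0.5)
rules = association_rules(itemsets, transactions, min_confidence=0.9)
for antecedent, consequent, s, confidence in rules:
    print(f"{antecedent} -> {consequent}: support={s:.2f}, confidence={confidence:.2f}")
```

In this toy data, every protein that crystallized in `c1` or `c3` also crystallized in `c2`, so those two rules survive the confidence threshold while the reverse rules do not. Brute-force enumeration is only practical for a handful of items; a screen with over a thousand cocktails would need a pruned search such as Apriori.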