Coupling of Functional Gene Diversity and Geochemical Data from Environmental Samples
Open Access
- 1 November 2004
- journal article
- research article
- Published by American Society for Microbiology in Applied and Environmental Microbiology
- Vol. 70 (11) , 6525-6534
- https://doi.org/10.1128/aem.70.11.6525-6534.2004
Abstract
Genomic techniques commonly used for assessing distributions of microorganisms in the environment often produce small sample sizes. We investigated artificial neural networks for analyzing the distributions of nitrite reductase genes (nirS and nirK) and two sets of dissimilatory sulfite reductase genes (dsrAB1 and dsrAB2) in small sample sets. Data reduction (to reduce the number of input parameters), cross-validation (to measure the generalization error), weight decay (to adjust model parameters to reduce generalization error), and importance analysis (to determine which variables had the most influence) were useful in developing and interpreting neural network models that could be used to infer relationships between geochemistry and gene distributions. A robust relationship was observed between geochemistry and the frequencies of genes that were not closely related to known dissimilatory sulfite reductase genes (dsrAB2). Uranium and sulfate appeared to be the most related to distribution of two groups of these unusual dsrAB-related genes. For the other three groups, the distributions appeared to be related to pH, nickel, nonpurgeable organic carbon, and total organic carbon. The models relating the geochemical parameters to the distributions of the nirS, nirK, and dsrAB1 genes did not generalize as well as the models for dsrAB2. The data also illustrate the danger (generating a model that has a high generalization error) of not using a validation approach in evaluating the meaningfulness of the fit of linear or nonlinear models to such small sample sizes.
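The workflow described in the abstract (weight decay to limit model complexity, cross-validation to estimate generalization error, and an importance measure for each input) can be illustrated without a neural network. The sketch below is a minimal stand-in, not the authors' method. It swaps in a linear model with an L2 penalty (ridge regression, which is weight decay applied to a linear model) and uses leave-one-out cross-validation. It also substitutes a simple drop-column importance measure for the paper's importance analysis. The function names, the importance measure, and the synthetic "geochemical" data are hypothetical. It assumes the inputs and target are centered, so no intercept term is fitted.

```python
import random


def solve(A, b):
    """Solve the square system A w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def ridge_fit(X, y, lam):
    """Weight decay on a linear model: minimize ||y - Xw||^2 + lam * ||w||^2 (no intercept)."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    for i in range(p):
        XtX[i][i] += lam  # the penalty shrinks weights toward zero
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_mse(X, y, lam):
    """Error on the same samples used for fitting (optimistic for small n)."""
    w = ridge_fit(X, y, lam)
    return sum((yi - predict(w, xi)) ** 2 for xi, yi in zip(X, y)) / len(y)


def loo_mse(X, y, lam):
    """Leave-one-out cross-validation: refit without sample i, then score sample i."""
    errs = []
    for i in range(len(y)):
        w = ridge_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], lam)
        errs.append((y[i] - predict(w, X[i])) ** 2)
    return sum(errs) / len(errs)


def drop_column_importance(X, y, lam, j):
    """Hypothetical importance measure: increase in LOO error when input j is removed."""
    X_minus_j = [row[:j] + row[j + 1:] for row in X]
    return loo_mse(X_minus_j, y, lam) - loo_mse(X, y, lam)


if __name__ == "__main__":
    rng = random.Random(0)
    n, p = 10, 6  # hypothetical: 10 samples, 6 centered geochemical inputs
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [1.5 * row[0] + rng.gauss(0, 0.5) for row in X]  # only input 0 matters
    for lam in (0.0, 0.1, 1.0, 10.0):
        print(f"lam={lam:5.1f}  train MSE={train_mse(X, y, lam):.3f}  "
              f"LOO MSE={loo_mse(X, y, lam):.3f}")
    for j in range(p):
        print(f"input {j}: importance={drop_column_importance(X, y, 1.0, j):.3f}")
```

For any penalty lam >= 0, a linear smoother like this one always has a leave-one-out error at least as large as its training error. Each leave-one-out residual equals the training residual divided by 1 - H_ii, where the leverage H_ii lies in [0, 1). The gap between the two columns printed for lam = 0.0 is the kind of hidden generalization error the abstract warns about when no validation step is used.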