The Clostridium thermocellum celI gene, coding for endoglucanase I (CelI), consists of an open reading frame (ORF) of 2640 nucleotides and codes for a protein of M(r) 98531. The ORF was confirmed as celI by comparing the N-terminal sequence of purified recombinant CelI with that deduced from the nucleotide sequence. CelI hydrolysed lichenan and carboxymethylcellulose, but was principally active against barley beta-glucan. It exhibited significant sequence identity with subfamily E2 endoglucanases and, by analogy with other members of this group, contains a catalytic domain of around 500 residues located in the N-terminal half of the protein. The C-terminal region of CelI was highly homologous with the cellulose-binding domain of the non-catalytic cellulosome subunit, S1. A repeated segment, previously shown to be highly conserved in xylanase Z and in other endoglucanases from C. thermocellum, was absent from CelI. Antiserum raised against purified recombinant CelI cross-reacted with proteins contained in the cellulosomes of two strains of C. thermocellum, suggesting that CelI is either a component of the cellulosome or is homologous to other cellulosome proteins. A second gene, located upstream of celI, consisted of an ORF of 1671 nucleotides, coding for a protein of M(r) 61042. Based on its homology with the Escherichia coli tar gene product, the polypeptide encoded by the second gene is tentatively identified as a sensory transducer.
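The ORF lengths and M(r) values quoted above can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of the original work: the function name `orf_summary` is hypothetical, and it assumes that each quoted ORF length includes the stop codon and that the M(r) refers to the full-length unprocessed polypeptide. Neither assumption is stated in the text.

```python
def orf_summary(orf_nt: int, mr: float, includes_stop: bool = True) -> dict:
    """Derive codon count, residue count and mean residue mass from an ORF.

    Assumption (not stated in the source): the ORF length counts the stop
    codon when includes_stop is True.
    """
    if orf_nt % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    codons = orf_nt // 3
    # The stop codon does not encode an amino acid.
    residues = codons - 1 if includes_stop else codons
    return {
        "codons": codons,
        "residues": residues,
        "mean_residue_mass": mr / residues,
    }


# Figures taken from the text: (ORF length in nucleotides, M(r)).
GENES = {
    "celI": (2640, 98531),
    "upstream ORF": (1671, 61042),
}

if __name__ == "__main__":
    for name, (orf_nt, mr) in GENES.items():
        s = orf_summary(orf_nt, mr)
        print(f"{name}: {s['residues']} residues, "
              f"{s['mean_residue_mass']:.1f} Da per residue")
```

Under these assumptions celI would encode 879 residues and the upstream ORF 556 residues, giving mean residue masses of roughly 112 and 110 Da respectively, which is in the range typical for proteins, so the quoted lengths and masses are mutually consistent.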