The complete nucleotide sequence of the coffee (Coffea arabica L.) chloroplast genome: organization and implications for biotechnology and phylogenetic relationships amongst angiosperms

Abstract
Summary: The chloroplast genome sequence of Coffea arabica L., the first sequenced member of the fourth largest family of angiosperms, Rubiaceae, is reported. The genome is 155 189 bp in length, including a pair of inverted repeats of 25 943 bp. Of the 130 genes present, 112 are distinct and 18 are duplicated in the inverted repeat. The coding region comprises 79 protein genes, 29 transfer RNA genes, four ribosomal RNA genes and 18 genes containing introns (three with three exons). Repeat analysis revealed five direct and three inverted repeats of 30 bp or longer with a sequence identity of 90% or more. Comparisons of the coffee chloroplast genome with sequenced genomes of the closely related family Solanaceae indicated that coffee has a portion of rps19 duplicated in the inverted repeat and an intact copy of infA. Furthermore, whole‐genome comparisons identified large indels (> 500 bp) in several intergenic spacer regions and introns in the Solanaceae, including the trnE(UUC)–trnT(GGU) spacer, ycf4–cemA spacer, trnI(GAU) intron and rrn5–trnR(ACG) spacer. Phylogenetic analyses based on the DNA sequences of 61 protein‐coding genes for 35 taxa, performed using both maximum parsimony and maximum likelihood methods, strongly supported the monophyly of several major clades of angiosperms, including monocots, eudicots, rosids, asterids, eurosids II, and euasterids I and II. Coffea (Rubiaceae, Gentianales) is only the second order sampled from the euasterid I clade. The availability of the complete chloroplast genome of coffee provides regulatory and intergenic spacer sequences for utilization in chloroplast genetic engineering to improve this important crop.
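
The counts reported in the summary can be cross-checked with a little arithmetic. The sketch below is illustrative only and is not part of the original analysis. It assumes that the 25 943 bp figure is the length of each inverted repeat copy and that the rest of the genome consists of single-copy sequence; the function and variable names are hypothetical.

```python
# Figures taken from the summary; names are hypothetical.
GENOME_BP = 155_189      # total genome length
IR_COPY_BP = 25_943      # assumed: length of EACH inverted repeat copy
PROTEIN_GENES = 79
TRNA_GENES = 29
RRNA_GENES = 4
DUPLICATED_IN_IR = 18    # genes present in both inverted repeat copies


def summarize_genome():
    """Derive totals implied by the reported figures."""
    # Distinct genes are the sum of the three gene classes.
    distinct = PROTEIN_GENES + TRNA_GENES + RRNA_GENES
    # Each gene duplicated in the inverted repeat is counted twice in the total.
    total = distinct + DUPLICATED_IN_IR
    # Sequence outside the two repeat copies (assumed single-copy).
    single_copy_bp = GENOME_BP - 2 * IR_COPY_BP
    return {
        "distinct_genes": distinct,
        "total_genes": total,
        "single_copy_bp": single_copy_bp,
    }


if __name__ == "__main__":
    for key, value in summarize_genome().items():
        print(f"{key}: {value}")
```

Under these assumptions the gene classes sum to the 112 distinct genes and, with the 18 duplicated copies, to the 130 genes stated in the summary.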