
The Tea Tree Genome Provides Insights into Tea Flavor and Independent Evolution of Caffeine Biosynthesis

Tea is the world's oldest and most popular caffeine-containing beverage with immense economic, medicinal, and cultural importance. Here, we present the first high-quality nucleotide sequence of the repeat-rich (80.9%), 3.02-Gb genome of the cultivated tea tree Camellia sinensis. We show that the extraordinarily large genome size of the tea tree results from the slow, steady, and long-term amplification of a few LTR retrotransposon families. In addition to a recent whole-genome duplication event, lineage-specific expansions of genes associated with flavonoid metabolic biosynthesis were discovered, which enhance catechin production, terpene enzyme activation, and stress tolerance, important features for tea flavor and adaptation. We demonstrate an independent and rapid evolution of the tea caffeine synthesis pathway relative to cacao and coffee. A comparative study among 25 Camellia species revealed that higher expression levels of most flavonoid- and caffeine- but not theanine-related genes contribute to the increased production of catechins and caffeine and thus enhance tea-processing suitability and tea quality. These novel findings pave the way for further metabolomic and functional genomic refinement of characteristic biosynthesis pathways and will help develop a more diversified set of tea flavors that would eventually satisfy and attract more tea drinkers worldwide.


Socially and habitually consumed by more than 3 billion people across 160 countries, tea is the oldest (since 3000 BC) and most popular nonalcoholic caffeine-containing beverage in the world (Banerjee, 1992; Mondal et al., 2004). Besides its attractive aroma and pleasant taste, the tea beverage has numerous healthful and medicinal benefits for humans due to many of the characteristic secondary metabolites in tea leaves, such as polyphenols, caffeine, theanine, vitamins, polysaccharides, volatile oils, and minerals (Yamamoto et al., 1997; Cabrera et al., 2006; Rogers et al., 2008; Chacko et al., 2010). The tea plant Camellia sinensis is the source of commercially grown tea and a member of the genus Camellia in the tea family Theaceae, which also contains several other economically important species, including well-known camellias with their attractive flowers (e.g., C. japonica, C. reticulata, and C. sasanqua) and the traditional oil tree C. oleifera that produces high-quality edible seed oil (Ming and Bartholomew, 2007). The first credible record of tea as a medicinal drink occurred during the Shang dynasty of China and dates back to the third century AD (Weinberg and Bealer, 2001; Heiss and Heiss, 2007). The global expansion of tea is long and complex, spreading across multiple cultures over the span of thousands of years and expanding worldwide to more than 100 countries (Heiss and Heiss, 2007; Liu et al., 2015). Today, tea is commercially cultivated on more than 3.80 million hectares of land on a continent-wide scale, and 5.56 million metric tons of tea were produced worldwide in 2014.

As one of the most popular beverages worldwide, tea has well-established nutritional and medicinal properties derived from the three major characteristic secondary metabolites: catechins, theanine, and caffeine. These phytochemical compounds, especially catechins, are beneficial for human health (Khan and Mukhtar, 2007), and their contents and relative proportions in large part determine the flavor of tea. The genus Camellia, consisting of ∼119 species (Ming and Bartholomew, 2007) with differential metabolite profiles, provides a uniquely powerful system for dissecting the variation and evolution of flavonoid, theanine, and caffeine biosynthesis pathways that define tea-processing suitability. Thousands of years of continental introduction and conventional selective breeding efforts have resulted in a large number of landraces and elite cultivars that adapt to globally diverse habitats, thus ensuring different tea productivity and quality worldwide. The rich metabolite constituents within the tea tree may play an important role in adaptations to diverse ecological niches on Earth. The genomic basis of these global adaptations remains unresolved. Although it is well recognized that the differential accumulation of the three major characteristic constituents in tea tree leaves largely determines the quality of tea, little genomic information is currently available regarding the complex transcriptional regulation of catechin, theanine, and caffeine metabolic pathways. Sequencing of the tea tree genome would help uncover the molecular mechanisms underlying secondary metabolic biosynthesis, with the promise of improving breeding efficiency and thus developing better tea cultivars with even higher quality.

Here, we report a high-quality genome assembly of Yunkang 10 (2n = 2x = 30 chromosomes), a diploid elite cultivar of C. sinensis var. assamica widely grown in Southwestern China, based on sequence data from whole-genome shotgun sequencing. Together with comparative transcriptomic and phytochemical analyses for the representative Camellia species, we aim to obtain new insights into the molecular basis of the biosynthesis of the three characteristic secondary metabolites with an emphasis on the suitability of tea-processing and the formation of tea flavor.


Genome Sequencing, Assembly, and Annotation

We sequenced the tea tree genome (cultivar Yunkang 10) from Yunnan Province, China. We performed a whole-genome shotgun sequencing analysis with the Illumina next-generation sequencing platform (HiSeq 2000). This generated raw sequence data sets of ∼707.88 Gb, thus yielding approximately 159.43-fold high-quality sequence coverage (Supplemental Table 1). Using two orthogonal methods, we estimated that the genome size of Yunkang 10 is between 2.9 and 3.1 Gb (Supplemental Figures 1 and 2; Supplemental Table 2). The tea tree genome was assembled using Platanus (Kajitani et al., 2014), followed by scaffolding preassembled contig sequences and paired-read next-generation sequencing data using SSPACE (Boetzer et al., 2011). This finally yielded a ∼3.02-Gb genome assembly that spans ∼98% of the estimated genome size and contains 37 618 scaffolds (N50 = 449 kb) and 258 790 contigs (N50 = 20.0 kb) (Table 1 and Supplemental Table 3). To validate the genome assembly quality, we first aligned all available DNA and expressed sequence tags of the tea tree from public databases and obtained mapping rates of 75.56% and 88.30%, respectively (Supplemental Table 5); second, we mapped all high-quality reads (∼339.49 Gb) to the assembled genome sequences, which showed good alignments with a mapping rate of 93.96% (Supplemental Table 5); and third, the transcripts we assembled also showed excellent alignments and sequence identities to the assembled genome: out of 198 175 transcripts, 76.23% were mapped (transcript coverage ≥90% and identity ≥90%; Supplemental Table 5 and Supplemental Section 1.6).
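
For readers unfamiliar with the N50 statistic reported for the scaffolds and contigs, the following is a minimal illustrative sketch of how it is conventionally computed: the N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The function name `n50` and the toy lengths are assumptions made for illustration only. This is not the pipeline used to produce the values in Table 1.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths (hypothetical helper).

    Sequences are sorted from longest to shortest and accumulated until the
    running total reaches at least half of the summed length; the length of
    the sequence that crosses that threshold is the N50.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")

    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length

    # Unreachable for non-empty input, kept for completeness.
    raise RuntimeError("failed to reach half of the total length")


# Toy example with made-up scaffold lengths (not data from the assembly).
toy_scaffolds = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_scaffolds))
```

In this toy example the lengths sum to 32, so the two longest sequences (8 + 8 = 16) already reach half of the total, and the N50 is 8.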