
Differentially Expressed Genes in Hypericin-Containing Hypericum perforatum Leaf Tissues as Revealed by De Novo Assembly of RNA-Seq
Miroslav Soták, Odeta Czeranková, Daniel Klein, Katarína Nigutová, Lothar Altschmied, Ling Li, Adarsch Jose, Eve Syrkin Wurtele, et al.
Plant Molecular Biology Reporter
ISSN 0735-9640
Plant Mol Biol Rep
DOI 10.1007/s11105-016-0982-2


Author's personal copy



ORIGINAL PAPER

Miroslav Soták¹ & Odeta Czeranková¹ & Daniel Klein² & Katarína Nigutová¹ & Lothar Altschmied³ & Ling Li⁴˒⁵ & Adarsch Jose⁶ & Eve Syrkin Wurtele⁴˒⁵ & Eva Čellárová¹

© Springer Science+Business Media New York 2016

Abstract Hypericum perforatum is a traditional medicinal plant used for various purposes since ancient times because of its valuable secondary metabolites. The genomic information for H. perforatum is limited and the regulatory networks of secondary metabolism are still unknown. The naphthodianthrone hypericin is the metabolite of our main interest. It is a red-colored photodynamic pigment localized in the dark glands of some Hypericum species. High-throughput sequencing technology, especially RNA-Seq, followed by de novo assembly and analysis of differential gene expression provides an important tool for functional genomics of non-model organisms. It offers insight into dynamic biological processes, including those of secondary metabolism. Transcriptome analysis followed by the analysis of differential gene expression of H. perforatum leaf tissues containing and lacking dark glands was performed to identify the genes involved in hypericin biosynthesis and to profile expression patterns in greater detail. A total of 18.53 G of cleaned read nucleotides were generated and assembled into 139,959 contigs with an N50 of 1801 bp. Among them, 66,817 (47.74 %) contigs were annotated. Differentially expressed genes were discovered by comparison of dark glands with adjacent leaf tissue and contrasting inner part leaf tissue without dark glands. A total of 799 upregulated genes were found in the tissues containing dark glands and 263 enzymes were identified, including candidate genes of hypericin biosynthesis, especially the genes coding for polyketide synthases and those involved in defense reactions. This study determined candidate genes involved in hypericin biosynthesis, providing a valuable source for prospective metabolic engineering of bioactive substances.

Keywords Hypericum perforatum · Secondary metabolism · Hypericin · Transcriptome · RNA-Seq · Differential gene expression

Electronic supplementary material The online version of this article (doi:10.1007/s11105-016-0982-2) contains supplementary material, which is available to authorized users.

* Miroslav Soták, miroslav.sotak@upjs.sk
Odeta Czeranková, odeta.czerankova@student.upjs.sk
Daniel Klein, daniel.klein@upjs.sk
Katarína Nigutová, katarina.nigutova@upjs.sk
Lothar Altschmied, lothar@ipk-gatersleben.de
Ling Li, liling@iastate.edu
Adarsch Jose, adarshjos@gmail.com
Eve Syrkin Wurtele, mash@iastate.edu
Eva Čellárová, eva.cellarova@upjs.sk

¹ Department of Genetics, Institute of Biology and Ecology, Faculty of Science, Pavol Jozef Šafárik University in Košice, Mánesova 23, 041 54 Košice, Slovakia
² Institute of Mathematics, Faculty of Science, Pavol Jozef Šafárik University in Košice, Jesenná 5, 040 01 Košice, Slovakia
³ Department of Molecular Genetics, Leibniz-Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstraße 3, 06466 Stadt Seeland, Germany
⁴ Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA 50011-3260, USA
⁵ Center for Metabolic Biology, Iowa State University, Ames, IA 50011-3260, USA
⁶ Bioinformatics and Computational Biology, NSF Engineering Research Center for Biorenewable Chemicals, Iowa State University, Ames, IA 50011-3260, USA

Introduction
The scientific effort to study new bioactive compounds from medicinal plants is rapidly increasing. Pure compounds isolated from Hypericum spp., namely naphthodianthrones and phloroglucinols, have been proven to possess anti-depressive, anti-cancer, anti-viral, anti-inflammatory, and other activities (Crockett and Robson 2011). Drugs based on Hypericum perforatum (St. John's wort) extracts are among the most widely prescribed pharmaceuticals for depression treatment (Linde and Knuppel 2005; Huang et al. 2011).
The H. perforatum leaf contains different secretory structures such as dark glands, translucent glands, and secretory canals, similarly to other aerial plant parts, especially flowers. The dark glands (nodules) accumulate naphthodianthrones (hypericin and pseudohypericin), including their protoforms, and are localized mostly along the leaf margins (Curtis and Lersten 1990). These structures are multi-cellular, nodule-like, and characterized by a cluster of irregularly shaped specialized cells surrounded by a single- or double-layered sheath (Ciccarelli et al. 2001). Their secretory activity begins at early stages of leaf development (Onelli et al. 2002). The accumulation of naphthodianthrones in the dark glands is most likely a mechanism to avoid the toxicity of these metabolites toward the adjacent cells (Ciccarelli et al. 2001). The size and number of these glands correlate positively with the content of naphthodianthrones. Zobayed et al. (2006) suggested that the site of hypericin biosynthesis is in the dark glands. Recently, Kusari et al. (2015) analyzed the metabolites of the leaves of selected Hypericum spp. by MALDI-HRMS and revealed the localization of emodin, hypericin, pseudohypericin, and their protoforms. The hypericin precursor emodin was detected throughout the leaf tissue, with the highest concentration in close proximity to the glands. Hypericin was detected inside the dark glands but also around and/or outside them, particularly around the bottom of the dark glands (Kusari et al. 2015). This metabolic study indicates that hypericin biosynthesis also takes place in the tissues surrounding the dark glands.

The naphthodianthrone hypericin, the compound of our interest, has been found in the plant kingdom only in some Hypericum species. It is estimated that representatives of about two thirds of the Hypericum sections are able to produce hypericin (Robson 2003). Outside the plant kingdom, only a few cases of hypericin presence have been reported. To our knowledge, these include the endophytic fungus Thielavia subthermophila isolated from H. perforatum (Kusari et al. 2008), Dermocybe austroveneta (Gill et al. 1988; Gill and Gimenez 1991), fossil crinoids (Wolkenstein et al. 2006), the integument of Australian lac insects of the Coccoidea family (Banks et al. 1976), and the ciliate protozoan Stentor coeruleus (Walker et al. 1979).
Hypericin biosynthesis is presumed to follow the acetate-malonate (polyketide) metabolic pathway (Fig. 1). It is assumed that one unit of acetyl-CoA is condensed with seven molecules of malonyl-CoA to form an octaketide chain that subsequently leads to the formation of emodin anthrone via cyclization and decarboxylation. Emodin is probably directly converted to protohypericin, which can undergo condensation and conversion to hypericin under irradiation with visible light (Bais et al. 2003; Zobayed et al. 2006; Karioti and Bilia 2010; Huang et al. 2014). Emodin, the proposed precursor of hypericin, was also observed to accumulate primarily in the dark glands; in addition, it was dispersed outside the dark glands throughout the leaves. Pillai and Nair (2014) tested hypotheses of anthraquinone synthesis to include or exclude alternative routes of emodin formation and to establish experimentally the existence of polyketide synthase (PKS)-mediated biosynthesis of hypericin. Inhibitors (glyphosate, mevinolin, and fosmidomycin) were applied to block specific steps in the plant secondary metabolic pathways that are possible biosynthetic routes for anthraquinone synthesis, which excluded the alternative routes (shikimate, mevalonate, and MEP pathways) of hypericin biosynthesis. Plant-specific type III polyketide synthases are involved in the biosynthesis of a large variety of plant secondary metabolites, including hypericin. Emodin anthrone, an anthraquinone precursor of hypericins, is biosynthesized via the polyketide pathway by a type III polyketide synthase, which represents the first part of the proposed hypericin biosynthesis reaction (Karppinen et al. 2008). These enzymes catalyze the formation of polyketides by condensing various CoA-thioesters with malonyl-CoA in a reaction sequence that closely parallels fatty acid biosynthesis (Austin and Noel 2003). Four different PKS family genes have been identified in the genus Hypericum. Chalcone synthase and benzophenone synthase have been cloned from both Hypericum androsaemum (Liu et al. 2003) and H. perforatum. In addition, two complementary DNAs (cDNAs) from H. perforatum encoding PKSs, designated HpPKS1 and HpPKS2 (58–87 % deduced amino acid sequence homology with some plant-specific PKS family proteins), were cloned (Karppinen and Hohtola 2008). HpPKS2 was found to be an octaketide synthase, specifically expressed in the dark glands accumulating hypericins (Karppinen et al. 2008).

Fig. 1 Proposed metabolic pathway of hypericin biosynthesis
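The stoichiometry of the proposed route can be checked with simple carbon bookkeeping. The sketch below is only an illustration of the arithmetic implied by the text (one acetyl-CoA starter, seven malonyl-CoA extenders, one decarboxylation); the function and constant names are hypothetical, and the assumption that each malonyl-CoA contributes two carbons after losing CO2 is the standard picture of polyketide chain extension rather than something measured in this study.

```python
# Carbon bookkeeping for the proposed octaketide route (illustrative sketch).
# Assumption: each malonyl-CoA (a C3 unit) loses one CO2 on condensation,
# so it adds two carbons to the growing chain.

ACETYL_CARBONS = 2         # carbons contributed by the acetyl-CoA starter
MALONYL_CARBONS_KEPT = 2   # carbons retained per malonyl-CoA extender


def polyketide_chain_carbons(n_malonyl: int) -> int:
    """Carbon count of a chain built from one acetyl and n malonyl units."""
    return ACETYL_CARBONS + n_malonyl * MALONYL_CARBONS_KEPT


# One acetyl-CoA plus seven malonyl-CoA gives the octaketide chain.
octaketide_carbons = polyketide_chain_carbons(7)

# Cyclization followed by a single decarboxylation gives emodin anthrone.
emodin_anthrone_carbons = octaketide_carbons - 1

# Hypericin is built from two emodin-type units.
dimer_carbons = 2 * emodin_anthrone_carbons

print(octaketide_carbons, emodin_anthrone_carbons, dimer_carbons)
```

The result (16, 15, and 30 carbons) agrees with the C15 skeleton of emodin anthrone and the C30 skeleton of hypericin.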
Next-generation sequencing (NGS), especially RNA-Seq and de novo transcriptome assembly, as a recently developed approach for profiling transcriptomes, is the preferred method to obtain knowledge of genes and their expression in non-model organisms (Strickler et al. 2012). RNA-Seq is more practical than whole-genome sequencing because of its cost-effectiveness, high sensitivity, simple sequence annotation, and reduced problems with repetitive sequence elements. De novo assembly of the sequenced transcriptomes of non-model species, including H. perforatum, allows identification of potential genes associated with secondary metabolism that code for specific enzymes involved in the respective metabolic pathways. Despite the global importance of secondary products from H. perforatum and the growing amount of raw genomic information provided by different resources, the knowledge of secondary metabolites and related pathways remains insufficient. Currently, the main repository for genomic data of H. perforatum is the NCBI SRA, which contains 37 entries of raw reads. The Medicinal Plant Genomic resource provided the first assembly and annotation in 2011. The PhytoMetaSyn Project (www.phytometasyn.ca), dedicated to research on the production of high-value plant metabolites, created a portal for transcriptomes of 75 non-model plants by February 2012, including H. perforatum, that produce natural metabolites belonging to three general categories: terpenoids, alkaloids, and polyketides (Xiao et al. 2013). The study of He et al. (2012) was the first scientific paper using Illumina/Solexa deep sequencing for identification of the unigenes within the H. perforatum gene pool. The authors analyzed samples from different developmental stages in order to cover the entire life cycle of the plant. Among 59,184 unigenes, 40,813 were annotated and 2359 were assigned to secondary metabolic pathways. Among them, 260 unigenes were involved in the production of hypericin, hyperforin, and melatonin. De novo pyrosequencing of the H. perforatum flower transcriptome was attempted by Galla et al. (2015) to identify genes related to plant reproduction, including transcripts specifically or preferentially expressed in anthers and/or pistils, and possibly differentially expressed during sexual and asexual meiosis and gametogenesis.
The aim of our work is to search for candidate hypericin biosynthetic genes and to gain new knowledge on their regulation. Illumina HiSeq™ 2000 was used for transcriptome sequencing of H. perforatum leaf tissues, and subsequent bioinformatic analysis was used for characterization of genes encoding enzymes in hypericin biosynthesis. Special attention was paid to genes expressed at a higher level in the leaf margin containing dark glands with adjacent leaf tissues.

Experimental Procedures
Plant Material and RNA Isolation
Diploid H. perforatum L. seeds were germinated and the seedlings were cultured in vitro on basal medium containing salts according to Murashige and Skoog (1962), Gamborg's B5 vitamins (Gamborg et al. 1968), 30 g l⁻¹ sucrose, 100 mg l⁻¹ myoinositol, 2 mg l⁻¹ glycine, and 7 g l⁻¹ agar. The pH of the media was adjusted to 5.6 before autoclaving. The plants were cultured at 23 °C, 40 % relative humidity, under a 16/8-h (day/night) photoperiod and artificial irradiance of 80 μmol m⁻² s⁻¹. Leaf tissues from 4-week-old seedlings were isolated. Each sample contained approximately 10 individual genetically identical plants representing biological replicates from one specimen. The dark glands with adjacent leaf tissue (MSN1, MSN2) and the inner part leaf tissue without dark glands (MSX1, MSX2) were selected for differential expression analysis (Fig. 2). The samples were processed on ice under sterile conditions, immediately frozen in liquid nitrogen, and stored at −80 °C. Total RNA was extracted in accordance with the Spectrum Plant Total RNA Kit (Sigma-Aldrich) protocol. The samples were homogenized by TissueLyser II (Qiagen) and placed in denaturing lysis buffer according to the manufacturer's instructions. The quality and quantity of isolated RNA were evaluated on the basis of UV absorption ratios (i.e., 260/280 and 260/230 nm) assessed by a Nanodrop 2000C spectrophotometer (Thermo Scientific), and the RNA integrity was tested on a 2100 Bioanalyzer (Agilent Technologies).
Transcriptome Sequencing
The messenger RNA (mRNA) was enriched from total RNA with oligo(dT) magnetic beads for transcriptome analysis. Then, the mRNA was fragmented into short pieces (180–500 bp) by fragmentation buffer treatment. The first-strand cDNA was synthesized with random hexamer primers using the mRNA fragments as templates. Buffer, dNTPs, RNase H, and DNA polymerase I were used to synthesize the second strand. The double-stranded cDNAs, purified with the QiaQuick PCR extraction kit, were used for end repair and base A addition. After ligation to sequencing adapters, the short fragments were purified by agarose gel electrophoresis and enriched by PCR to create the final cDNA library. Transcriptome sequencing on two technical replicates was carried out on an Illumina HiSeq™ 2000 platform that generated 100-bp paired-end raw reads (MSN1 and MSX1: BGI Americas, USA; MSN2 and MSX2: IPK, Gatersleben, Germany).

Fig. 2 H. perforatum leaf. (A) Dark glands with adjacent leaf tissue; (B) the inner part leaf tissue without dark glands
Data Processing and De Novo Transcriptome Assembly
The raw reads were filtered before data analysis, and the adaptor contamination was trimmed with the software Cutadapt (Martin 2010). A Phred quality score of 20 and a read length of 25 bp were set as thresholds, and low-quality reads were removed by FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit). Ribosomal RNA from H. perforatum, mitochondrial DNA, and plastid DNA from related species (Ricinus communis, Hevea brasiliensis) were downloaded from NCBI: Organelle Genome Resource (http://www.ncbi.nlm.nih.gov/genome/organelle/), and the reads with a positive hit were removed. RNA-Seq de novo assembly was carried out with the program Trinity (v2.0.6) (Grabherr et al. 2011). CD-HIT-EST (Fu et al. 2012) was used to reduce redundancy, with the sequence identity threshold set to 0.98 and the word length set to 10, while comparing both strands. The Sequence Cleaner program (SeqClean) was used to process and clean the contigs according to different criteria (https://sourceforge.net/projects/seqclean).
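The read filtering step can be illustrated with a small stand-alone sketch. It is not the FASTX-Toolkit implementation; it only shows how thresholds such as Phred 20 and a minimum length of 25 bp might be applied to already parsed reads. The Phred+33 quality encoding, the `min_fraction` parameter (the share of bases that must reach the quality threshold), and all function names are assumptions made for illustration.

```python
from typing import Iterable, Iterator, List, Tuple

# A read is (header, sequence, quality string); Phred+33 encoding is assumed.
Read = Tuple[str, str, str]


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode an ASCII quality string into per-base Phred scores."""
    return [ord(ch) - offset for ch in quality]


def passes_filter(quality: str, min_q: int = 20, min_len: int = 25,
                  min_fraction: float = 1.0) -> bool:
    """True if the read is long enough and enough bases reach min_q."""
    if not quality or len(quality) < min_len:
        return False
    scores = phred_scores(quality)
    good = sum(1 for q in scores if q >= min_q)
    return good / len(scores) >= min_fraction


def filter_reads(reads: Iterable[Read], **thresholds) -> Iterator[Read]:
    """Yield only the reads that pass the length and quality thresholds."""
    for read in reads:
        if passes_filter(read[2], **thresholds):
            yield read


# Toy data: "I" encodes Phred 40 and "#" encodes Phred 2 under Phred+33.
toy_reads = [
    ("read_ok", "A" * 30, "I" * 30),       # long enough, high quality
    ("read_short", "A" * 10, "I" * 10),    # shorter than 25 bp
    ("read_lowq", "A" * 30, "#" * 30),     # below Phred 20
]
kept = [header for header, _, _ in filter_reads(toy_reads)]
print(kept)
```

Relaxing `min_fraction` (for example to 0.8) would keep reads in which a minority of bases fall below the quality threshold; which setting was used in the study is not stated in the text.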
Functional Annotation and Gene Ontology Classification
The assembled sequences were compared and annotated on the basis of sequence similarity against the SwissProt database (Swiss Institute of Bioinformatics databases) of verified proteins (http://www.uniprot.org/downloads) by applying Blastx with a cutoff E-value of 10⁻⁵. Gene names were assigned to each assembled contig on the basis of the best five Blast hits.
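The hit selection described above can be sketched as follows. The example assumes the standard 12-column BLAST tabular output (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score); the text does not state which output format was actually used, and the function name, ranking by bit score, and the toy identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, float, float]  # (subject id, E-value, bit score)


def best_hits(lines: Iterable[str], max_evalue: float = 1e-5,
              top_n: int = 5) -> Dict[str, List[Hit]]:
    """Keep hits with E-value <= max_evalue and the top_n per query by bit score."""
    hits: Dict[str, List[Hit]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue <= max_evalue:
            hits[query].append((subject, evalue, bitscore))
    return {
        query: sorted(found, key=lambda h: h[2], reverse=True)[:top_n]
        for query, found in hits.items()
    }


# Toy tabular records with made-up identifiers.
toy_lines = [
    "contig1\tPROT_A\t90.0\t100\t5\t0\t1\t300\t1\t100\t1e-30\t200",
    "contig1\tPROT_B\t60.0\t80\t20\t1\t1\t240\t1\t80\t1e-3\t45",
    "contig1\tPROT_C\t95.0\t100\t2\t0\t1\t300\t1\t100\t1e-40\t250",
    "contig2\tPROT_D\t40.0\t50\t30\t2\t1\t150\t1\t50\t0.5\t20",
]
annotated = best_hits(toy_lines)
print(annotated)
```

In this toy input, `contig1` keeps two hits (PROT_C ranked before PROT_A) and `contig2` has none, so it would count as a contig without annotation.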
Gene Ontology (GO) is an international classification system for standardized gene functions, offering a controlled vocabulary and a strictly defined conceptualization for comprehensive description of the properties of genes and their products within an organism (Ashburner et al. 2000). GO annotation analysis was performed with Blast2GO software (Conesa and Götz 2008), an automated tool for the assignment of GO terms. The annotation results were categorized with respect to the biological process, molecular function, and cellular component. GO terms taxonomically specified to green plants (Viridiplantae) were assigned to each query sequence with reference to the top Blastx hit against the SwissProt database. Subsequently, to acquire the protein domain information for the putative sequences and to determine functional motifs, the embedded feature InterProScan (Zdobnov and Apweiler 2001) was run with default parameters. InterProScan annotation was also conducted via Blast2GO. The annotation was refined by running the “Augment Annotation by ANNEX” function (Myhre et al. 2006). “Validate annotation” and “Remove 1st level annotation” were used to remove all the redundant GO terms for a given sequence and to assign only the most specific GO terms.
The contigs were searched against the reference canonical pathways in the Kyoto Encyclopedia of Genes and Genomes (KEGG) to further explore the gene interactions and biological functions in H. perforatum leaves. The KEGG PATHWAY Database (http://www.genome.jp/kegg/pathway.html) is an important resource for interpreting high-level functions and utilities of the biological system, such as the cell, the organism, and the ecosystem, from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies. The KEGG Database is a collection of manually drawn pathway maps, and it helps our understanding of the biological functions of genes (Kanehisa et al. 2014).

Differential Expression Analysis and qRT-PCR Validation
RNA-Seq by Expectation-Maximization (RSEM) (Li and Dewey 2011), an RNA-Seq quantification tool requiring no reference genome, estimated the relative abundances and expected read counts for the contigs. RSEM first mapped reads onto the contigs of the de novo assembly with the Bowtie aligner (Langmead et al. 2009); to calculate the normalized read counts for each library, RNA composition bias was taken into account and normalization factors were determined.
Differential expression analysis of the dark glands with adjacent leaf tissue and the inner part leaf tissue without dark glands was performed by modeling the count data with negative binomial distributions using the DESeq2 method (Love et al. 2014). This method is a widely accepted and accurate approach to the analysis of RNA-Seq data; it is comparatively fast and simple, and the resulting fold change values are cleaner and more sensible. An FDR of 0.01 and a P value of 0.05 were set as the thresholds for the significance of the gene expression difference between the two tissue samples.
The upregulated genes were annotated by Blastx search against the NCBI non-redundant protein (NR) database (ftp://ftp.ncbi.nlm.nih.gov/) and the SwissProt database with an E-value cutoff of 10⁻⁵. GO terms were assigned and KEGG analyses were performed by Blast2GO. Web Gene Ontology Annotation Plot (WEGO) software (http://wego.genomics.org.cn/cgi-bin/wego/index.pl) was used to perform GO functional classification of all contigs, in order to view the distribution of gene functions within the H. perforatum leaf at the macrolevel (Ye et al. 2006).
The 10 randomly selected genes from NGS were validated by quantitative real-time (qRT)-PCR. Total RNAs were isolated and reverse transcribed into first-strand cDNA using RevertAid Reverse Transcriptase (Thermo Fisher Scientific) according to the manufacturer's instructions. The primer sets for each transcript were designed using Primer3Plus. qRT-PCR was performed using universal SYBR Green Supermix and run on a CFX96 Real time C1000 (BioRad). The amplification steps were as follows: first denaturation at 95 °C for 10 min, followed by 40 cycles of denaturation at 95 °C for 15 s, annealing at 60 °C for 10 s, and extension at 72 °C for 60 s with a single fluorescence measurement, a melting curve program (60–95 °C with a heating rate of 0.1 °C per second and a continuous fluorescence measurement), and a final cooling step. The relative expression levels for each gene were calculated using the 2^(−ΔΔCt) method. Three technical replicates were performed for each gene.

Results and Discussion

RNA Sequencing and De Novo Transcriptome Assembly of H. perforatum

RNA-Seq of H. perforatum was conducted with the aim of identifying candidate genes involved in hypericin biosynthesis on the basis of differential gene expression in tissues containing and lacking dark nodules. Total RNA was extracted from the dark glands with the adjacent leaf tissue (MSN1, MSN2) and the inner parts of the leaf tissue without dark glands (MSX1, MSX2), both with two technical replicates. cDNA libraries were constructed from the high-quality RNA samples (RIN >8). NGS by the Illumina HiSeq™ 2000 sequencing platform generated 6.02 G (MSN1), 7.12 G (MSX1), 4.96 G (MSN2), and 4.41 G (MSX2) of total read nucleotides. Sequencing results were deposited in the Sequence Read Archive (SRA-NCBI) under accession numbers SRR2062465 (MSN1), SRR2062466 (MSX1), SRR2062467 (MSN2), and SRR2062468 (MSX2). Of the reads, 95.03 % passed the Phred quality values at the level of Q20 (sequencing error rate, 1 %), and the average GC percentage was 49.91 %. Since adaptor sequences, ambiguous reads, and other low-quality reads could lead to incorrect assemblies, such sequences were removed from both ends of the reads, which resulted in 18.53 G of pre-processed read nucleotides. The Trinity platform, developed for transcript sequence reconstruction from RNA-Seq, was used to generate the de novo leaf transcriptome of H. perforatum. A total of 139,959 Trinity transcripts with a GC content of 44.67 % were produced with a median contig length of 603 bp, an average contig length of 1041.48 bp, and an N50 length of 1801 bp (50 % of the assembled bases were incorporated into sequences with a length of N50 or longer); the minimum contig length was set to 200 bp. A total of 97,394 contigs comprising 145,763,914 nucleotides were recovered after removal of redundant sequences (Fig. 3).

Fig. 3 Contig length distribution
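The assembly statistics quoted above (median, mean, and N50 contig length) can be computed from a list of contig lengths. The following is a minimal sketch on toy values, not the script used by Trinity or by the authors; the function name is hypothetical. It follows the definition given in the text: N50 is the length such that contigs of that length or longer contain at least half of all assembled bases.

```python
from statistics import mean, median
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Length L such that contigs of length >= L hold at least 50 % of all bases."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return ordered[-1]  # not reached for positive lengths; kept for safety


# Toy contig lengths in bp (not data from the study).
toy_lengths = [100, 200, 300, 400, 500]
print("N50:", n50(toy_lengths))
print("mean:", mean(toy_lengths))
print("median:", median(toy_lengths))
```

For the toy lengths, the total is 1500 bp; the two longest contigs (500 + 400 bp) already cover more than 750 bp, so the N50 is 400 bp while the mean and median are both 300 bp. This also shows why N50 (1801 bp) can be considerably larger than the median (603 bp) in a real assembly with many short contigs.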
Functional Annotation and Gene Ontology Classification
One of the crucial aspects of transcriptomic data mining is assigning the correct function to individual transcripts. Functional annotation is an effective way to categorize genes into physiological classes, to describe large quantities of transcripts, and to evaluate functional differences among subgroups of sequences. The assembled contigs were used for similarity searches against public protein databases after filtering out short and redundant sequences. All six frame translations of the sequences were searched against the SwissProt protein database using Blastx (E-value 10⁻⁵). The SwissProt database was selected due to our interest in the key enzymes of polyketide synthesis and the expectation of high similarity to already verified proteins. Database matches were found for 66,817 (47.74 %) contigs, while 72,999 returned no hits (Fig. 4). Protein domains and motif information were retrieved by InterProScan via Blast2GO. The expressed H. perforatum genes were searched against the GO database to categorize the standardized gene functions.

Fig. 4 Blast2GO data annotation score distribution of all assembled contigs


Fig. 5 Histogram of GO classifications of assembled H. perforatum contigs. Results are summarized in three main GO categories: BP, biological process; MF, molecular function; and CC, cellular component. The major subcategories (above 20,000 terms) among the biological processes were “cellular process” (51,001), “single-organism process” (46,891), “metabolic process” (46,852), “responses to stimulus” (32,061), “biological regulation” (30,679), “cellular component organization or biogenesis” (22,485), and “developmental process” (22,246). “Binding” (38,664) and “catalytic activity” (32,404) were the prevailing molecular functions. The most represented cellular component was “cell” (53,834), followed by “organelle” (45,944) and “membrane” (26,602)

The Blast2GO software assigned GO terms to 65,632 contigs out of the 66,817 previously annotated to the SwissProt database. Contigs were summarized into the three main GO categories (biological process, cellular component, and molecular function) and then into 42 subcategories (Fig. 5).
Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis was carried out in order to get an overview of the gene pathway networks. Annotations were conducted only for contigs with significant Blast hits below an E-value of 10⁻⁵, with a value of 55 as the annotation cutoff and a value of 5 as the GO weight. Based on sequence homology searches (Blastx) against the KEGG database, 45,863 contigs were assigned to 147 metabolic pathways. The most represented pathways were “Purine metabolism” (4221 members), “Thiamine metabolism” (3481 members), “Biosynthesis of antibiotics” (2173 members), “Aminobenzoate degradation” (1605 members), “Starch and sucrose metabolism” (1582 members), and “Drug metabolism - other enzymes” (1028 members).
Identification of Differentially Expressed Genes Using
RNA-Seq Data

Fig. 6 Distribution of differentially expressed genes. Red dots indicate significantly different expression (FDR ≤ 0.01, log2Ratio ≥ 1, and log2Ratio ≤ −1), and black dots indicate no significant differences (Color figure online)

Fig. 7 Blast2GO data annotation score distribution of upregulated genes in the dark glands with adjacent leaf tissue

Pre-processed reads (clean reads) from the two different samples were mapped back to the assembled contigs using Bowtie and quantified with RSEM. The identification of the differentially expressed genes (DEGs) in different H. perforatum tissues was performed with DESeq2. This enables detection and visualization of genes that are expressed at significantly different levels among samples. Upregulated transcripts in the dark glands with adjacent leaf tissue were found at an adjusted P value of <0.05 in 799 sequences (2058 isoforms) (Fig. 6). The Pearson correlation index between two technical replicates was calculated from raw read counts for tissues containing nodules (0.973) and for tissues without nodules (0.978) after adjusting outliers.
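The two calculations mentioned in this subsection, calling a transcript up- or downregulated from its statistics and correlating replicate read counts, can be illustrated with a short sketch. It does not re-implement DESeq2, which fits negative binomial models to the counts; it only applies thresholds to values such a tool would report. The cutoff of 1 on the absolute log2 fold change, the sign convention (positive means higher in tissue with dark glands), the function names, and the toy numbers are all assumptions for illustration.

```python
import math
from typing import Optional, Sequence


def classify(log2_fold_change: float, padj: Optional[float],
             alpha: float = 0.05, min_abs_log2fc: float = 1.0) -> str:
    """Label a transcript 'up', 'down', or 'ns' (not significant)."""
    if padj is None or math.isnan(padj) or padj >= alpha:
        return "ns"
    if log2_fold_change >= min_abs_log2fc:
        return "up"
    if log2_fold_change <= -min_abs_log2fc:
        return "down"
    return "ns"


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long count vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Toy per-transcript statistics: (log2 fold change, adjusted P value).
toy_stats = {"t1": (2.3, 0.001), "t2": (-1.5, 0.01),
             "t3": (3.0, 0.20), "t4": (0.4, 0.001)}
labels = {name: classify(lfc, p) for name, (lfc, p) in toy_stats.items()}
print(labels)

# Toy raw counts for two replicates of the same tissue.
rep1 = [10, 200, 35, 0, 1500]
rep2 = [12, 180, 40, 1, 1400]
print(round(pearson(rep1, rep2), 3))
```

With these toy values, `t1` is labeled "up", `t2` "down", and `t3` and `t4` "ns": the first fails the significance threshold and the second the fold change threshold.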
Annotation and GO Classification of the Upregulated
Genes in the Dark Glands with Adjacent Leaf Tissue
The upregulated sequences were annotated against the NCBI non-redundant protein (NR) database and the SwissProt database with a threshold E-value of 10⁻⁵. Blast2GO assigned GO terms using a pro-SimilarityHit-Filter value of 15, an annotation cutoff value of 55, and a GO weight value of 5. Approximately 89 % of the 799 upregulated genes had a Blast hit against the NR NCBI database (Fig. 7) and 75.95 % against the SwissProt database. The upregulated genes were further used for GO and pathway analysis with Blast2GO and grouped into 47 GO subcategories (Fig. 8).
KEGG analysis based on SwissProt annotation revealed that the upregulated contigs were highly associated with metabolism of purines, thiamine, starch and sucrose, drugs, and biosynthesis of antibiotics (Table 1).

Fig. 8 GO classifications comparison of SwissProt and NR-NCBI. The major subcategories (above 25 % genes) among the biological processes were “cellular process” (1306/756; 83.6 %/36.7 %), “metabolic process” (1155/866; 73.9 %/42.1 %), “response to stimuli” (805/103; 51.5 %/5 %), “multicellular organismal process” (778/137; 49.8 %/6.7 %), “cellular component organization” (658/185; 42.1 %/9 %), “developmental process” (741/118; 47.4 %/5.7 %), “biological regulation” (879/249; 56.2 %/12.1 %), “pigmentation” (760/217; 48.6 %/10.5 %), and “reproduction” (405/50; 25.7 %/2.4 %). “Binding” (956/831; 61.2 %/40.4 %) and “catalytic activity” (777/724; 49.7 %/35.2 %) were the dominant molecular functions. The most represented cellular components were “cell” (1345/677; 86.1 %/32.9 %), “cell part” (1340/675; 85.7 %/32.8 %), “organelle” (1132/470; 72.4 %/22.8 %), and “organelle part” (719/335; 46.0 %/16.3 %), followed by “macromolecular complex” (506/299; 32.4 %/14.5 %)

Table 1 KEGG biochemical mapping of H. perforatum leaf DEGs

Pathway                                         Sequences
Purine metabolism                               163
Thiamine metabolism                             155
Starch and sucrose metabolism                   65
Drug metabolism - other enzymes                 59
Biosynthesis of antibiotics                     51
Aminobenzoate degradation                       38
Cysteine and methionine metabolism              30
Drug metabolism - cytochrome P450               29
Metabolism of xenobiotics by cytochrome P450    29
Pentose and glucuronate interconversions        27
Glutathione metabolism                          26
Glycolysis/gluconeogenesis                      26
Phenylpropanoid biosynthesis                    26
Glycerolipid metabolism                         24
Fatty acid degradation                          21
Phenylalanine metabolism                        20
Fatty acid biosynthesis                         19
Galactose metabolism                            17
Fructose and mannose metabolism                 17
Tryptophan metabolism                           16
Retinol metabolism                              16
Lysine degradation                              15
Arachidonic acid metabolism                     14
Methane metabolism                              14
Pyrimidine metabolism                           14
Pentose phosphate pathway                       13
Glycerophospholipid metabolism                  13
Cyanoamino acid metabolism                      13
Glyoxylate and dicarboxylate metabolism         13
Amino sugar and nucleotide sugar metabolism     12
Flavonoid biosynthesis                          12
Pyruvate metabolism                             11
Arginine and proline metabolism                 10
Valine, leucine, and isoleucine degradation     10
Selenocompound metabolism                       10
Tyrosine metabolism                             10

Enzyme Function Predicted from Differentially Expressed Genes (DEGs)
Enzyme code statistics were calculated with the Blast2GO software for the DEGs upregulated in the tissue containing dark glands. Predicted enzymes from the groups of hydrolases, transferases, and oxidoreductases were the most frequent. After removal of all duplicates, the basic statistics comprised 263 enzymes in six major classes of enzymes (Fig. 9). This categorization provides valuable information about the presumed genes (Supplementary Tables S1, S2, S3, S4, S5, and S6).

Fig. 9 Basic statistics related to enzyme code mapping results

Candidate Genes Involved in the Hypericin Biosynthesis Pathway
Even after careful bioinformatic processing, the volume of NGS data produced by a basic transcriptome analysis
remains huge, and discovering candidate genes for hypericin biosynthesis is challenging. He et al.
(2012) investigated the complex transcriptome of H. perforatum tissues (roots, stems, leaves, and flowers)
and offered basic insight into hypericin, hyperforin, and melatonin biosynthesis. Differential gene expression
analysis of RNA-Seq data from contrasting leaf tissues, distinguished by the presence or absence of dark glands
as the site of hypericin accumulation, appears to be an effective way to reduce the number of candidate genes.
The dark glands together with the adjacent leaf tissue were selected for differential expression analysis because
emodin, hypericin, pseudohypericin, and their protoforms were recently found not only in the dark
glands but also around and/or outside them (Kusari et al. 2015).
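
The idea of narrowing the candidate list by differential expression can be sketched as a simple filter. This is a hedged illustration only: the record type `DegResult`, the function `upregulated`, the contig names, and the cutoffs (log2 fold change of at least 1, adjusted p-value of at most 0.05) are assumptions for the example and are not stated in this section.

```python
from dataclasses import dataclass


@dataclass
class DegResult:
    contig: str
    log2_fold_change: float  # tissue with dark glands vs. tissue without
    padj: float              # multiple-testing-adjusted p-value


def upregulated(results, min_log2fc=1.0, max_padj=0.05):
    """Return contigs upregulated in gland-containing tissue (assumed cutoffs)."""
    return [
        r.contig
        for r in results
        if r.log2_fold_change >= min_log2fc and r.padj <= max_padj
    ]


# Hypothetical differential expression results.
example = [
    DegResult("contig_A", 3.2, 0.001),
    DegResult("contig_B", 0.4, 0.20),
    DegResult("contig_C", -2.5, 0.002),
    DegResult("contig_D", 1.8, 0.03),
]
candidates = upregulated(example)
```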
Because very little is known about the genes of H. perforatum, several criteria were applied in the context of
the hypericin biosynthesis pathway. All biosynthetic reaction steps were analyzed with respect to enzyme type,
and the DEGs were searched for matching sequences. Sequences with exact
hits to the specific enzyme were selected. Potentially similar sequences were also considered, because annotations derive from other plant species while no H. perforatum genome is available. Candidate gene discovery for the potential hypericin biosynthesis pathway focused mainly on
polyketide synthase-related sequences and on sequences associated with
stress response and defense.
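
Searching annotated DEGs for sequences of a given enzyme type can be sketched as a keyword match on the sequence description. This is an illustrative assumption about how such a search might be scripted, not the authors' procedure: the keyword list and the function name `pks_related` are hypothetical, while the contig names and descriptions are taken from Tables 2 and 3.

```python
import re

# Assumed keyword list for PKS-related descriptions (hypothetical choice).
PKS_PATTERN = re.compile(
    r"polyketide synthase|chalcone synthase|benzophenone synthase|ketoacyl",
    re.IGNORECASE,
)


def pks_related(annotations):
    """Keep contigs whose description matches a PKS-related keyword."""
    return {
        contig: description
        for contig, description in annotations.items()
        if PKS_PATTERN.search(description)
    }


# Contig names and descriptions as listed in Tables 2 and 3.
annotations = {
    "TR25393|c0_g2_i1": "Benzophenone synthase",
    "TR37481|c0_g1_i1": "Polyketide synthase",
    "TR53602|c0_g1_i1": "Universal stress protein",
    "TR96435|c0_g1_i2": "3-ketoacyl-synthase 10",
}
hits = pks_related(annotations)
```

A keyword match of this kind only pre-selects candidates; each hit still has to be checked against its E-value and similarity.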
Among the PKSs, 8 DEG sequences were identified as benzophenone synthase, 1 as putative chalcone synthase, 1
as polyketide synthase, and 13 as PKS-like synthases (Table 2).
In the proposed pathway, one unit of acetyl-CoA condenses with seven molecules of malonyl-CoA to form an
octaketide that subsequently undergoes cyclization and decarboxylation, resulting in the formation of emodin anthrone
(Zobayed et al. 2006). No other ligases or cyclases with the potential to catalyze condensation or cyclization
reactions were found in the DEG dataset. Two decarboxylases were found, namely, lysine decarboxylase family protein
isoform 2 and aromatic-amino-acid decarboxylase-like protein, presumably unrelated to hypericin biosynthesis.
Emodin anthrone is further oxidized to emodin. Blast2GO identified only one aldehyde dehydrogenase (five isoforms),
an oxidoreductase acting on the aldehyde or oxo group.
Emodin-anthrone-oxygenase (Chen et al. 1995) was not found to be differentially expressed. For the conversion of
emodin to hypericin, an initial condensation reaction between emodin and emodin anthrone takes place, followed by
dehydration to form emodin dianthrone, which may undergo oxidation to form protohypericin and finally hypericin
upon irradiation with visible light (Zobayed et al. 2006; Huang et al. 2014).

Table 2  Contigs annotated to PKSs and related enzymes

Seq name  Seq description  Seq length  Min E-value  Mean similarity
    GO terms (indented line below each entry)

TR25393|c0_g2_i1  Benzophenone synthase  780  3.33153E-157  86.6 %
    F:benzophenone synthase activity; P:biosynthetic process; P:benzoyl-CoA metabolic process; P:malonyl-CoA metabolic process
TR56958|c0_g1_i1  Benzophenone synthase  1014  4.68185E-160  86.0 %
    F:benzophenone synthase activity; P:biosynthetic process; P:benzoyl-CoA metabolic process; P:malonyl-CoA metabolic process
TR56958|c0_g1_i2  Benzophenone synthase  767  5.32363E-120  87.6 %
    F:benzophenone synthase activity; P:biosynthetic process; P:benzoyl-CoA metabolic process; P:malonyl-CoA metabolic process
TR101727|c0_g1_i1  Benzophenone synthase  402  1.42911E-85  89.8 %
    F:benzophenone synthase activity; P:biosynthetic process; P:benzoyl-CoA metabolic process; P:malonyl-CoA metabolic process
TR101727|c0_g2_i1  Benzophenone synthase  824  1.16869E-161  91.4 %
    F:benzophenone synthase activity; P:biosynthetic process; P:benzoyl-CoA metabolic process; P:malonyl-CoA metabolic process
TR101727|c0_g2_i2  Benzophenone synthase  865  5.08618E-161  90.8 %
    F:benzophenone synthase activity; P:biosynthetic process; P:benzoyl-CoA metabolic process; P:malonyl-CoA metabolic process
TR101727|c0_g3_i1  Benzophenone synthase  827  9.13773E-124  83.4 %
    F:benzophenone synthase activity; P:biosynthetic process; P:benzoyl-CoA metabolic process; P:malonyl-CoA metabolic process
TR103454|c0_g1_i1  Benzophenone synthase  489  1.43533E-94  88.6 %
    F:naringenin-chalcone synthase activity; F:benzophenone synthase activity; P:flavonoid biosynthetic process; P:benzoyl-CoA metabolic process; P:malonyl-CoA metabolic process
TR37481|c0_g1_i1  Polyketide synthase  1680  0.0  87.8 %
    F:transferase activity, transferring acyl groups other than amino-acyl groups; P:biosynthetic process
TR59463|c0_g1_i1  Putative chalcone synthase  361  3.26544E-19  87.6 %
    F:benzophenone synthase activity; P:biosynthetic process; P:benzoyl-CoA metabolic process; P:malonyl-CoA metabolic process
TR5576|c0_g1_i1  Vinorine synthase-like  1595  8.65539E-111  59.6 %
    F:transferase activity, transferring acyl groups other than amino-acyl groups
TR63144|c0_g1_i1  Flavonol synthase flavanone 3-hydroxylase  1420  7.45769E-168  83.8 %
    F:flavonol synthase activity; F:metal ion binding; P:flavonol biosynthetic process; P:oxidation-reduction process
TR63144|c0_g2_i1  Flavonol synthase flavanone 3-hydroxylase  1424  3.49132E-155  83.8 %
    F:flavonol synthase activity; F:metal ion binding; P:flavonol biosynthetic process; P:oxidation-reduction process
TR101024|c0_g1_i1  Galactinol synthase  254  4.97738E-8  78.4 %
    F:inositol 3-alpha-galactosyltransferase activity; P:galactose metabolic process
TR101024|c0_g1_i2  Galactinol synthase  322  9.26818E-8  78.2 %
    F:inositol 3-alpha-galactosyltransferase activity; P:galactose metabolic process
TR96435|c0_g1_i2  3-ketoacyl-synthase 10  1229  0.0  90.6 %
    C:endoplasmic reticulum; C:membrane; F:transferase activity, transferring acyl groups other than amino-acyl groups; P:very long-chain fatty acid metabolic process; P:fatty acid biosynthetic process; P:cuticle development
TR96435|c0_g1_i3  3-ketoacyl-synthase 10  2090  0.0  90.6 %
    C:endoplasmic reticulum; C:membrane; F:transferase activity, transferring acyl groups other than amino-acyl groups; P:very long-chain fatty acid metabolic process; P:fatty acid biosynthetic process; P:cuticle development
TR96435|c0_g1_i1  3-ketoacyl-synthase 10-like  2028  0.0  91.2 %
    C:membrane; F:transferase activity, transferring acyl groups other than amino-acyl groups; P:fatty acid biosynthetic process
TR14589|c0_g1_i1  3-ketoacyl-synthase 11  318  7.05499E-68  99.0 %
    C:membrane; F:transferase activity, transferring acyl groups other than amino-acyl groups; P:fatty acid biosynthetic process
TR82074|c0_g1_i1  3-ketoacyl-synthase 11  333  1.25091E-14  94.6 %
    C:membrane; F:transferase activity, transferring acyl groups other than amino-acyl groups; P:fatty acid biosynthetic process
TR96435|c0_g1_i4  Beta-ketoacyl-synthase family protein  1989  0.0  91.2 %
    C:endoplasmic reticulum; C:membrane; F:transferase activity, transferring acyl groups other than amino-acyl groups; P:very long-chain fatty acid metabolic process; P:fatty acid biosynthetic process; P:cuticle development
TR61322|c1_g1_i7  Cyclopropane-fatty-acyl-phospholipid synthase family protein  3241  0.0  90.8 %
    F:cyclopropane-fatty-acyl-phospholipid synthase activity; F:oxidoreductase activity; P:lipid biosynthetic process; P:methylation; P:oxidation-reduction process
TR61322|c1_g1_i10  Cyclopropane-fatty-acyl-phospholipid synthase family protein  4171  0.0  90.2 %
    F:cyclopropane-fatty-acyl-phospholipid synthase activity; P:lipid biosynthetic process; P:methylation

Plant secondary products can be induced by elicitors, such
as jasmonic acid and its analogs, via the jasmonate pathway,
which is involved in the plant stress defense system (Zhao et al. 2005).
Hypericins are considered part of the plant's chemical defense arsenal against herbivores and pathogens,
and their levels can be increased by exogenous
methyl jasmonate (Sirvent and Gibson 2002). Based on
GO terms linked to stress and defense, 32 DEGs with functional similarities were identified (Table 3). Three transcripts
(four isoforms) were recognized as phenolic oxidative coupling protein. He et al. (2012) also found, in the complex
transcriptome, 12 contigs homologous to hyp-1 and, using
PKSIIIexplorer, 2291 contigs belonging to PKS III proteins.

The gene hyp-1 (phenolic oxidative coupling protein) was originally proposed to act in the
final steps of hypericin biosynthesis, possibly a condensation
followed by dehydration and two phenolic oxidative coupling
reactions (Bais et al. 2003). However, this assumption has
not been confirmed since. The highest expression level of hyp-1
was found in roots, which contain neither dark glands nor
hypericin (Košuth et al. 2007). The gene was also expressed
in 15 different Hypericum species regardless of whether
hypericins and emodin were detected in the plants (Košuth
et al. 2011). The Hyp-1 protein shares about 50 % sequence
similarity with pathogenesis-related proteins (PR-10) and responds
similarly to stress conditions, which implies a possible
role in defense mechanisms. The conserved position of an
intron in the hyp-1 gene at codon 62 is also a characteristic
feature of plant PR-10 proteins (Košuth et al. 2013). In addition,
other research groups (Michalska et al. 2010) failed to dimerize emodin to hypericin using the Hyp-1 enzyme.
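
The carbon bookkeeping behind the proposed route described earlier (one acetyl-CoA plus seven malonyl-CoA giving an octaketide, then emodin anthrone, then the condensation of emodin with emodin anthrone) can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the constant names are illustrative, and it assumes the usual polyketide chemistry in which each malonyl unit adds two carbons because one carbon leaves as CO2 during condensation. The target values are the known molecular formulas of emodin anthrone (C15H12O4), emodin (C15H10O5), and hypericin (C30H16O8).

```python
# Carbons contributed by the starter and extender units (assumed standard
# polyketide chemistry: each malonyl unit loses one carbon as CO2).
ACETYL_STARTER_CARBONS = 2
CARBONS_PER_MALONYL_UNIT = 2
MALONYL_UNITS = 7

# Octaketide chain built from one acetyl-CoA and seven malonyl-CoA.
octaketide_carbons = ACETYL_STARTER_CARBONS + MALONYL_UNITS * CARBONS_PER_MALONYL_UNIT

# Cyclization keeps all carbons; the decarboxylation step removes one.
emodin_anthrone_carbons = octaketide_carbons - 1

# Oxidation of emodin anthrone to emodin does not change the carbon count.
emodin_carbons = emodin_anthrone_carbons

# Condensation of emodin with emodin anthrone joins two C15 units.
dimer_carbons = emodin_carbons + emodin_anthrone_carbons
```

The result, 16 carbons for the octaketide, 15 for emodin anthrone, and 30 for the dimer, matches the C30 skeleton of hypericin.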
Since the phenolic oxidative coupling protein was found to be differentially expressed in leaf tissues while its highest expression level was detected in roots (Košuth et al. 2007), transferases may play a significant role in transporting it throughout
the plant. Given that hypericin is found in
small quantities outside the dark glands (Kusari et al. 2015),
the RNA-Seq analyses of dark glands with adjacent leaf tissue may not reveal all
differentially expressed genes, or some of the genes coding for
key enzymes may not be strongly linked with the dark glands.
Moreover, in some hypericin-producing Hypericum species,
the distribution of the dark glands differs from that of
H. perforatum.
Quantitative real-time PCR (qRT-PCR) was performed on a set of genes to validate the RNA-Seq gene expression analysis.
The correlation between RNA-Seq and qRT-PCR was evaluated
using log2 fold change and Ct (cycle threshold) values. The qRT-PCR results showed the same expression profiles as those calculated from
the original RNA-Seq data.
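
A comparison of RNA-Seq and qRT-PCR of the kind described above can be sketched as a Pearson correlation between two sets of log2 fold changes. This is a hedged example, not the authors' calculation: all numbers are invented, the helper names are hypothetical, and it assumes the common convention that the qRT-PCR log2 fold change equals the negative difference of reference-normalized Ct values between the two tissues.

```python
import math


def qpcr_log2_fold_change(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """log2 fold change from Ct values, normalized to a reference gene."""
    delta_test = ct_target_test - ct_ref_test
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl
    return -(delta_test - delta_ctrl)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented values for four genes: RNA-Seq log2 fold changes and qRT-PCR Ct
# values (target/reference in test tissue, target/reference in control).
rnaseq_log2fc = [3.1, 1.9, -2.2, 0.8]
qpcr_cts = [
    (22.0, 18.0, 25.0, 18.0),
    (24.0, 18.0, 26.0, 18.0),
    (27.0, 18.0, 25.0, 18.0),
    (23.5, 18.0, 24.5, 18.0),
]
qpcr_log2fc = [qpcr_log2_fold_change(*cts) for cts in qpcr_cts]
r = pearson_r(rnaseq_log2fc, qpcr_log2fc)
```

A coefficient close to 1 would indicate that both methods report the same direction and relative magnitude of expression change.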
Analysis of additional samples from other hypericin-producing species and tissues would shed more light on
the genetic regulation of the biosynthesis of these
valuable bioactive photodynamic compounds.

Table 3  Stress-related contigs

Seq name  Seq description  Seq length  Min. E-value  Mean similarity
    GO terms (indented line below each entry)

TR24220|c0_g1_i1  Phenolic oxidative coupling protein  915  2.2018E-52  71.6 %
    P:defense response; P:response to biotic stimulus
TR82269|c0_g1_i1  Phenolic oxidative coupling protein  906  1.69115E-50  71.0 %
    P:defense response; P:response to biotic stimulus
TR93881|c0_g1_i1  Phenolic oxidative coupling protein  1026  2.37227E-41  64.4 %
    P:response to biotic stimulus; P:defense response
TR93881|c0_g1_i2  Phenolic oxidative coupling protein  1071  3.33743E-41  64.4 %
    P:response to biotic stimulus; P:defense response
TR1044|c0_g1_i1  Pr-10 type pathogenesis-related protein  1060  9.4963E-33  59.4 %
    P:response to biotic stimulus; P:defense response
TR6737|c0_g2_i1  Cellulose synthase a catalytic subunit 3  3688  0.0  95.4 %
    C:integral component of membrane; F:cellulose synthase (UDP-forming) activity; F:1,4-beta-D-xylan synthase activity; F:mannan synthase activity; P:microtubule cytoskeleton organization; P:double-strand break repair via homologous recombination; P:cytokinesis by cell plate formation; P:starch metabolic process; P:sucrose metabolic process; P:UDP-glucose metabolic process; P:response to water deprivation; P:response to salt stress; P:leaf morphogenesis; P:response to cyclopentenone; P:cellulose biosynthetic process; P:regulation of cell proliferation; P:cell wall biogenesis; P:mannosylation
TR35073|c1_g1_i1  Cellulose synthase a catalytic subunit 1  1846  0.0  96.8 %
    C:plasma membrane; C:integral component of membrane; F:zinc ion binding; F:cellulose synthase (UDP-forming) activity; P:starch metabolic process; P:sucrose metabolic process; P:UDP-glucose metabolic process; P:cellulose biosynthetic process; P:cell wall organization
TR35073|c1_g2_i1  Cellulose synthase a catalytic subunit 3  3696  0.0  90.0 %
    C:integral component of membrane; F:cellulose synthase (UDP-forming) activity; P:starch metabolic process; P:sucrose metabolic process; P:UDP-glucose metabolic process; P:cellulose biosynthetic process
TR53602|c0_g1_i1  Universal stress protein  1090  6.48547E-68  80.6 %
    P:response to stress
TR56451|c0_g1_i2  Acid phosphatase vanadium-dependent haloperoxidase-related protein isoform 1  1250  1.26752E-41  85.0 %
    F:peroxidase activity; P:oxidation-reduction process; P:peroxidase reaction; P:response to oxidative stress
TR60524|c0_g1_i1  Aldehyde dehydrogenase family 2 member c4-like  411  4.14401E-56  84.6 %
    F:coniferyl-aldehyde dehydrogenase activity; P:systemic acquired resistance; P:phenylpropanoid biosynthetic process; P:response to nitrate; P:nitrate transport; P:response to endoplasmic reticulum stress; P:oxidation-reduction process
TR60524|c0_g1_i2  Aldehyde dehydrogenase family 2 member c4-like  617  6.6804E-70  79.2 %
    F:coniferyl-aldehyde dehydrogenase activity; P:systemic acquired resistance; P:phenylpropanoid biosynthetic process; P:response to nitrate; P:nitrate transport; P:response to endoplasmic reticulum stress; P:oxidation-reduction process
TR62294|c0_g1_i1  Gibberellin-regulated protein 14-like  720  6.13342E-20  85.2 %
    C:plasma membrane; P:response to salt stress; P:response to abscisic acid; P:regulation of reactive oxygen species metabolic process
TR74456|c1_g1_i1  Peroxiredoxin chloroplastic  1166  2.24936E-105  83.2 %
    C:chloroplast thylakoid; F:peroxidase activity; F:peroxiredoxin activity; P:peroxidase reaction; P:response to oxidative stress; P:oxidation-reduction process
TR74456|c1_g1_i2  Peroxiredoxin chloroplastic-like  794  6.70597E-94  91.4 %
    F:peroxidase activity; F:peroxiredoxin activity; P:peroxidase reaction; P:response to oxidative stress; P:oxidation-reduction process
TR102222|c0_g1_i2  nad-binding rossmann-fold superfamily protein isoform 1  1678  0.0  88.8 %
    C:fatty acid synthase complex; C:chloroplast stroma; C:thylakoid; C:chloroplast envelope; F:copper ion binding; F:enoyl-[acyl-carrier-protein] reductase activity; P:acetyl-CoA metabolic process; P:calcium ion transport; P:Golgi organization; P:response to salt stress; P:sterol biosynthetic process; P:brassinosteroid biosynthetic process; P:oxidation-reduction process
TR5230|c0_g1_i1  Auxin-binding protein abp20  1197  3.08115E-87  85.2 %
    C:nucleus; C:plant-type cell wall; C:extracellular matrix; C:apoplast; F:manganese ion binding; F:nutrient reservoir activity; P:auxin-activated signaling pathway; P:stomatal complex morphogenesis; P:photosynthesis, light reaction; P:cellular cation homeostasis; P:defense response to bacterium; P:divalent metal ion transport
TR5230|c0_g1_i2  Auxin-binding protein abp20  1216  1.75685E-88  85.8 %
    C:extracellular region; C:nucleus; C:plant-type cell wall; C:extracellular matrix; F:manganese ion binding; F:nutrient reservoir activity; P:stomatal complex morphogenesis; P:photosynthesis, light reaction; P:cellular cation homeostasis; P:defense response to bacterium; P:divalent metal ion transport
TR24258|c0_g2_i5  Histone protein hist2h3c1  468  1.93943E-69  95.2 %
    C:nuclear nucleosome; C:membrane; C:extracellular exosome; F:DNA binding; F:chromatin binding; F:histone binding; F:protein heterodimerization activity; P:negative regulation of transcription from RNA polymerase II promoter; P:positive regulation of defense response to virus by host; P:DNA replication-dependent nucleosome assembly; P:protein heterotetramerization; P:regulation of gene silencing
TR27746|c0_g1_i1  Auxin-binding protein abp19a-like  425  1.80674E-15  82.2 %
    C:extracellular region; C:nucleus; C:plant-type cell wall; C:extracellular matrix; F:manganese ion binding; F:nutrient reservoir activity; P:stomatal complex morphogenesis; P:photosynthesis, light reaction; P:cellular cation homeostasis; P:defense response to bacterium; P:divalent metal ion transport
TR36927|c0_g1_i1  mlo-like protein 4  1744  0.0  81.2 %
    C:integral component of membrane; F:calmodulin binding; P:defense response; P:response to biotic stimulus
TR36927|c0_g1_i2  mlo-like protein 4  1131  8.08737E-143  83.8 %
    C:integral component of membrane; F:calmodulin binding; P:defense response; P:response to biotic stimulus
TR53888|c0_g1_i1  mlp-like protein 328  951  1.26962E-49  68.8 %
    P:defense response; P:response to biotic stimulus
TR53888|c0_g1_i2  mlp-like protein 328  949  6.43926E-49  68.8 %
    P:defense response; P:response to biotic stimulus
TR77683|c0_g1_i2  Auxin-binding protein abp19a-like  388  1.26965E-15  82.2 %
    C:extracellular region; C:nucleus; C:plant-type cell wall; C:extracellular matrix; F:manganese ion binding; F:nutrient reservoir activity; P:stomatal complex morphogenesis; P:photosynthesis, light reaction; P:cellular cation homeostasis; P:defense response to bacterium; P:divalent metal ion transport
TR94097|c1_g1_i1  Homogentisate phytyltransferase chloroplastic  1196  8.85042E-148  76.2 %
    C:integral component of membrane; F:homogentisate phytyltransferase activity; P:sulfur amino acid metabolic process; P:glycine catabolic process; P:unsaturated fatty acid biosynthetic process; P:oxidoreduction coenzyme metabolic process; P:vitamin metabolic process; P:cellular amino acid biosynthetic process; P:aromatic amino acid family metabolic process; P:lipoate metabolic process; P:coenzyme biosynthetic process; P:nucleotide metabolic process; P:response to temperature stimulus; P:response to light stimulus; P:jasmonic acid biosynthetic process; P:phloem sucrose loading; P:leaf morphogenesis; P:chlorophyll metabolic process; P:regulation of lipid metabolic process; P:isopentenyl diphosphate biosynthetic process, methylerythritol 4-phosphate pathway; P:secondary metabolic process; P:cell differentiation; P:regulation of defense response; P:oxylipin biosynthetic process; P:cell wall organization
TR94340|c0_g1_i1  mlp-like protein 423  863  1.23722E-79  87.4 %
    C:membrane; P:defense response; P:response to biotic stimulus; P:mRNA modification
TR97567|c0_g1_i1  mlp-like protein 423  892  2.25496E-79  87.4 %
    C:membrane; P:defense response; P:response to biotic stimulus; P:mRNA modification
TR99891|c0_g1_i1  Porphobilinogen deaminase  432  3.98284E-86  95.0 %
    C:chloroplast stroma; C:chloroplast envelope; C:apoplast; F:hydroxymethylbilane synthase activity; P:pentose-phosphate shunt; P:rRNA processing; P:ubiquinone biosynthetic process; P:protoporphyrinogen IX biosynthetic process; P:aromatic amino acid family biosynthetic process; P:response to cold; P:salicylic acid biosynthetic process; P:defense response, incompatible interaction; P:leaf morphogenesis; P:chlorophyll biosynthetic process; P:peptidyl-pyrromethane cofactor linkage; P:isopentenyl diphosphate biosynthetic process, methylerythritol 4-phosphate pathway; P:cysteine biosynthetic process; P:photosynthesis, light reaction; P:cell differentiation; P:positive regulation of transcription, DNA-templated
TR30252|c0_g1_i1  tcp family transcription factor 4 isoform 1  1069  1.79908E-76  61.8 %
    P:single-organism cellular process; P:system development; P:regulation of cellular process; P:response to stimulus
TR39058|c0_g1_i2  atp-dependent DNA helicase srs2-like protein at4g25120 isoform x1  3633  0.0  76.8 %
    C:replication fork; F:DNA binding; F:ATP-dependent DNA helicase activity; F:ATP binding; P:DNA recombination; P:chromosome segregation; P:meiosis I; P:DNA geometric change; P:regulation of chromosome organization; P:single-organism metabolic process; P:response to stimulus
TR81320|c0_g1_i2  Condensin-2 complex subunit h2  528  6.98096E-12  70.2 %
    P:positive regulation of response to DNA damage stimulus

Conclusion
Knowledge of the biosynthetic pathways of unique metabolites is fundamental for their biotechnological commercial
production. Specialized plant metabolites often have complex
biosynthetic pathways, and it is challenging to identify all of
the enzymes that catalyze the numerous metabolic transformations. Our understanding of the biochemical pathways in
H. perforatum L. that synthesize the commercially relevant
compound hypericin is incomplete, due in part to a lack of
molecular, genetic, and genomic resources for identifying the genes involved in these specialized metabolic pathways.
To address these limitations, large-scale transcriptome
sequencing and differential expression analyses were performed on dark glands
with adjacent leaf tissue and on the inner part of the leaves without
dark glands. Using RNA-Seq technology, a
total of 18.53 G of cleaned read nucleotides were generated
and assembled into 139,959 contigs, of which 47.74 % were
annotated against the SwissProt database. The number of candidate
genes involved in the biosynthesis of hypericin was reduced
to 799 genes upregulated in dark glands with adjacent leaf
tissue, including 55 genes with a possible relation to polyketide biosynthesis
and 263 enzymes belonging to the six major
enzyme classes. Our findings predict individual genes with a
possible role in hypericin biosynthesis for further downstream
analyses and extend the information from our previous metabolic study. The acquired data provide a valuable resource for prospective bioengineering and in vitro synthesis of these natural compounds for medical research and potential drug development.

Acknowledgments This research was supported by the Slovak
Research and Development Agency APVV-14-0154, the Scientific
Grant Agency of Slovak Republic VEGA 1/0090/15, and the grant project SOFOS-knowledge and skill development of staff, students of P. J.
Šafárik University in Košice (contract number: 003/2013/1.2/OPV, ITMS
code: 26110230088), funded by the European Social Fund through the
Operational Program Education and KVARK (ITMS code:
26110230084). We thank Peter Pisarčík, Maroš Andrejko, Tomáš
Horváth, and Gabriel Semanišin from Institute of Computer Sciences,
Faculty of Science, Pavol Jozef Šafárik University, in Košice and
Libuse Brachova from Roy J. Carver Department of Biochemistry,
Biophysics, and Molecular Biology, Iowa State University, for support.

References
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM,
Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP,
Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE,
Ringwald M, Rubin GM, Sherlock G (2000) Consortium GO: Gene
Ontology: tool for the unification of biology. Nat Genet 25:25-29
Austin MB, Noel JP (2003) The chalcone synthase superfamily of type III
polyketide synthases. Nat Prod Rep 20:79-110
Bais HP, Vepachedu R, Lawrence CB, Stermitz FR, Vivanco JM (2003)
Molecular and biochemical characterization of an enzyme responsible for the formation of hypericin in St. John's Wort (Hypericum
perforatum L.). J Biol Chem 278(34):32413-22
Banks HJ, Cameron DW, Raverty WD (1976) Chemistry of the
coccoidea. II Condensed polycyclic pigments from two Australian
pseudococcids (Hemiptera). Aust J Chem 29:1509
Chen Z, Fujii I, Ebizuka Y, Sankawa U (1995) Purification and characterization of emodinanthrone oxygenase from Aspergillus terreus.
Phytochemistry 38:299-305
Ciccarelli D, Andreucci AC, Pagni AM (2001) Translucent glands and
secretory canals in Hypericum perforatum L. (Hypericaceae):
morphological, anatomical and histochemical studies during the
course of ontogenesis. Ann Bot-London 88:637-44
Conesa A, Götz S (2008) Blast2GO: a comprehensive suite for functional
analysis in plant genomics. Int J Plant Genomics 2008:619832
Crockett SL, Robson NKB (2011) Taxonomy and chemotaxonomy of the
genus Hypericum. MAPSB 5:1-13
Curtis JD, Lersten NR (1990) Internal secretory structures in Hypericum
(Clusiaceae): H. perforatum L. and H. balearicum L. New Phytol
114:571-580
Fu L, Niu B, Zhu Z, Wu S, Li W (2012) CD-HIT: accelerated for clustering
the next generation sequencing data. Bioinformatics 28:3150-3152
Galla G, Vogel H, Sharbel TF, Barcaccia G (2015) De novo sequencing of
the Hypericum perforatum L. flower transcriptome to identify potential genes that are related to plant reproduction sensu lato. BMC
Genomics 16:254
Gamborg OL, Miller RA, Ojima K (1968) Nutrient requirements of suspension cultures of soybean root cells. Exp Cell Res 50:151-158
Gill M, Gimenez A (1991) Austrovenetin, the principal pigment of the
toadstool Dermocybe austroveneta. Phytochemistry 30:951-955
Gill M, Gimenez A, McKenzie RW (1988) Pigments of fungi, part 8.
Bianthraquinones from Dermocybe austroveneta. J Nat Prod 51:
1251-1256
Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I,
Adiconis X, Fan L, Raychowdhury R, Zeng Q, Chen Z, Mauceli E,
Hacohen N, Gnirke A, Rhind N, di Palma F, Birren BW, Nusbaum
C, Lindblad-Toh K, Friedman N, Regev A (2011) Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol 29:644-652
He M, Wang Y, Hua W, Zhang Y, Wang Z (2012) De novo sequencing of
Hypericum perforatum transcriptome to identify potential genes involved in the biosynthesis of active metabolites. PLoS One 7:e42081
Huang N, Rizshsky L, Hauck C, Nikolau BJ, Murphy PA, Birt DF (2011)
Identification of anti-inflammatory constituents in Hypericum
perforatum and Hypericum gentianoides extracts using RAW
264.7 mouse macrophages. Phytochemistry 72:2015-23
Huang HL, Wang ZH, Chen SL (2014) Hypericin: chemical synthesis
and biosynthesis. CJNM 12:81-88
Kanehisa M, Goto S, Sato Y, Kawashima M, Furumichi M, Tanabe M
(2014) Data, information, knowledge and principle: back to metabolism in KEGG. Nucleic Acids Res 42:199-205
Karioti A, Bilia AR (2010) Hypericins as potential leads for new therapeutics. Int J Mol Sci 11:562-94
Karppinen K, Hohtola A (2008) Molecular cloning and tissue-specific
expression of two cDNAs encoding polyketide synthases from
Hypericum perforatum. J Plant Physiol 165:1079-1086
Karppinen K, Hokkanen J, Mattila S, Neubauer P, Hohtola A (2008)
Octaketide-producing type III polyketide synthase from
Hypericum perforatum is expressed in dark glands accumulating
hypericin. Febs J 275:4329-4342
Košuth J, Katkovčinová Z, Olexová P, Čellárová E (2007) Expression of
the hyp-1 gene in early stages of development of Hypericum
perforatum L. Plant Cell Rep 26:211-217
Košuth J, Smelcerovic A, Borsch T, Zuehlke S, Karppinen K, Spiteller M,
Hohtola A, Čellárová E (2011) The hyp-1 gene is not a limiting
factor for hypericin biosynthesis in the genus Hypericum. Funct
Plant Biol 38:35-43
Košuth J, Hrehorová D, Jaskolski M, Čellárová E (2013) Stress-induced
expression and structure of the putative gene hyp-1 for hypericin
biosynthesis. Plant Cell Tiss Org Cult 114:207-216
Kusari S, Lamshöft M, Zühlke S, Spiteller M (2008) An endophytic
fungus from Hypericum perforatum that produces hypericin. J Nat Prod 71:159-162
Kusari S, Selahaddin SS, Nigutová K, Čellárová E, Spiteller M (2015)
Spatial chemo-profiling of hypericin and related phytochemicals in
Hypericum species using MALDI-HRMS imaging. Anal Bioanal
Chem 407:4779-4791

Langmead B, Trapnell C, Pop M, Salzberg SL (2009) Ultrafast and
memory-efficient alignment of short DNA sequences to the human
genome. Genome Biol 10:R25
Li B, Dewey CN (2011) RSEM: accurate transcript quantification from
RNA-Seq data with or without a reference genome. BMC Bioinformatics 12:323
Linde K, Knuppel L (2005) Large-scale observational studies of
hypericum extracts in patients with depressive disorders: a systematic review. Phytomedicine 12:148-157
Liu B, Falkenstein-Paul H, Schmidt W, Beerhues L (2003)
Benzophenone synthase and chalcone synthase from Hypericum
androsaemum cell cultures: cDNA cloning, functional expression,
and site-directed mutagenesis of two polyketide synthases. Plant J
34:847-855
Love MI, Huber W, Anders S (2014) Moderated estimation of fold
change and dispersion for RNA-Seq data with DESeq2. Genome
Biol 15:550
Martin M (2010) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 17:10-12
Michalska K, Fernandes H, Sikorski M, Jaskolski M (2010) Crystal structure of Hyp-1, a St. John's wort protein implicated in the biosynthesis of hypericin. J Struct Biol 169:161-171
Murashige T, Skoog F (1962) A revised medium for rapid growth and
bioassays with tobacco tissue cultures. Physiol Plant 15:473-497
Myhre S, Tveit H, Mollestad T, Laegreid A (2006) Additional gene ontology structure for improved biological reasoning. Bioinformatics
22:2020-2027
Onelli E, Rivetta A, Giorgi A, Bignami M, Cocucci M, Patrignani G
(2002) Ultrastructural studies on the developing secretory nodules
of Hypericum perforatum. New Phytol 197:92-102
Pillai PP, Nair AR (2014) Hypericin biosynthesis in Hypericum
hookerianum Wight and Arn: investigation on biochemical pathways using metabolite inhibitors and suppression subtractive hybridization. C R Biologies 337:571-580
Robson NKB (2003) Hypericum botany. In: Ernst E (ed) Hypericum: the
genus Hypericum. Taylor and Francis, London, New York, pp 1-22
Sirvent T, Gibson D (2002) Induction of hypericin and hyperforin in
Hypericum perforatum L. in response to biotic and chemical elicitors.
Physiol Mol Plant P 60:311-320
Strickler SR, Bombarely A, Mueller LA (2012) Designing a transcriptome next-generation sequencing project for a nonmodel plant
species. Am J Bot 99:257-66
Walker EB, Lee T, Song PS (1979) Spectroscopic characterization of the
Stentor photoreceptor. Biochim Biophys Acta 587:129-144
Wolkenstein K, Gross JH, Falk H, Schöler HF (2006) Preservation of
hypericin and related polycyclic quinone pigments in fossil crinoids.
Proc Biol Sci 273:451-56
Xiao M, Zhang Z, Chen X, Lee E, Barber CJS, Chakrabarty R, Desgagné-Penix I, Haslam TM, Kim Y, Liu E, MacNevin G, Masada-Atsumi
S, Reed D, Stout JM, Zerbe P, Zhang Y, Bohlmann J, Covello PS,
Luca V, Page JE, Ro D, Martin VJJ, Facchini PJ, Sensen CW (2013)
Transcriptome analysis based on next-generation sequencing of
non-model plants producing specialized metabolites of biotechnological interest. J Biotechnol 166:122-134
Ye J, Fang L, Zheng H, Zhang Y, Chen J, Zhang Z, Wang J, Li S, Li R,
Bolund L, Wang J (2006) WEGO: a web tool for plotting GO annotations. Nucleic Acids Res 34:293-297
Zdobnov EM, Apweiler R (2001) InterProScan: an integration platform
for the signature-recognition methods in InterPro. Bioinformatics
17:847-848
Zhao J, Davis LC, Verpoorte R (2005) Elicitor signal transduction leading
to production of plant secondary metabolites. Biotechnol Adv 23:
283-333
Zobayed SM, Afreen F, Goto E, Kozai T (2006) Plant-environment interactions: accumulation of hypericin in dark glands of Hypericum
perforatum. Ann Bot-London 98:793-804
