ALMA MATER STUDIORUM
UNIVERSITÀ DI BOLOGNA
FACOLTÀ DI SCIENZE MATEMATICHE, FISICHE E NATURALI
MASTER'S DEGREE COURSE IN BIOINFORMATICS
Optimising Biofuel production

computational characterisation of gene and
related promoter and enhancer involved in
fatty acid production in algae
Candidate:
Id Antonino
Advisor:
Prof. Giovanni Perini
ACADEMIC YEAR 2007/2008

SESSION III
Supervision:
Prof. Ugur Sezerman
To my parents
Abstract
Photosynthetic organisms, including plants, algae, and some photosynthetic
bacteria, efficiently utilize the energy from the sun to convert water and CO2
from the air into biomass. Basic research efforts have been made to produce
renewable fuels and chemicals from biomass. Particularly, algae (microscopic,
photosynthetic organisms that live in saline or freshwater environments) were
investigated for their ability to produce lipids as a feedstock and primary storage
molecules for liquid fuel or chemical production. In this respect, the research
conducted on algae emphasized the use of photosynthetic organisms from
aquatic environments, especially species that grow in environments unsuitable
for crop production.
Fatty acids, the building blocks for triacylglycerols (TAG) and all other cellular
lipids, are synthesized in the chloroplast using a single set of enzymes, of which
acetyl CoA carboxylase (ACCase) is the key enzyme regulating the rate of fatty acid
synthesis.
However, the expression of genes involved in fatty acid synthesis is still
largely uncharacterized in algae. Synthesis and sequestration of TAG
into cytosolic lipid bodies appears to be a protective mechanism by which algal
cells cope with stress conditions, but little is known about the genomic sequences
involved or the regulation of TAG formation at the molecular and cellular level.
In this project we designed primers, based on the predicted gene from an already
sequenced strain, to recognize and characterize the ACCase gene from
Scenedesmus protuberans, a strain of algae that had previously been screened as a
potential feedstock of fatty acids among the strains investigated. We also used
the flanking regions of the homologous predicted gene to run two computational
tools that search for related conserved regulatory motifs, i.e.
Transcription Factor Binding Sites (TFBS), and to identify the differences among
the sequences in relation to the differing amounts of fatty acids in these strains.
Index
1 Introduction
1.1 Statement of the problem
2 Background information
2.1 Algae
2.2 Fatty acid composition
2.3 Biosynthesis of fatty acids and triacylglycerols
2.4 Regulation of fatty acid synthesis
2.5 Knowledge in other species
2.6 Feedback regulation
2.7 What controls promoter activity of FAS genes?
3 State of the art
3.1 Comparison of lipid metabolism in algae and higher plants
3.2 Factors affecting triacylglycerol accumulation and fatty acid composition
3.2.1 Nutrients
3.2.2 Temperature
3.2.3 Light intensity
3.2.4 Growth phase and physiological status
3.2.5 Physiological roles of triacylglycerol accumulation
3.3 Algae genomics and proposed model systems in biofuel production
3.4 Acetyl CoA carboxylase protein and genetic characterization
4 Methods
4.1 Genetic source
4.2 Sequence analysis
4.3 Computational methodologies
4.3.1 Bioprospector
4.3.1.1 Scoring segment with background Markov dependency
4.3.1.2 Using motif score distribution to measure goodness of a motif
4.3.2 GALF_P
4.3.2.1 Representation
4.3.2.2 Fitness evaluation
4.3.2.3 Selection and genetic operators
4.3.2.4 Local filtering operator
4.3.2.5 Replacement strategy
4.3.2.6 Shift operator
4.3.3 Implementation
4.4 Primer design
4.5 Experimental procedure
5 Result and future perspective
1 Introduction
1.1 Statement of the problem
The current global energy crisis, and the prospect of ever-decreasing oil
and gasoline production, make it necessary to research and explore effective
alternatives. Among these alternatives biodiesel stands out. Biodiesel
can be defined as the esters derived from oils and fats extracted from renewable
biological sources (1).
Produced by the trans-esterification of triglycerides with methanol, and
consisting of long-chain alkyl (methyl, propyl or ethyl) esters, biodiesel
is an alternative to petroleum-based diesel fuel (2). It can be used
alone, or mixed with conventional petro-diesel, in unmodified diesel-engine
vehicles. Biodiesel is cleaner than petroleum diesel, and is distinct from
straight vegetable oil (SVO), which specialists sometimes also classify as "waste
vegetable oil" (WVO), "used vegetable oil" (UVO) or "pure plant oil" (PPO) (3).
Biodiesel is virtually free of sulphur, which reduces the quantity of sulphur oxides
normally produced during combustion. Emissions of hydrocarbons, carbon
monoxide and particulates during combustion are also significantly lower than
those from petroleum diesel. These properties make biodiesel
effective and consistent with the requirements of the Kyoto Protocol concerning
greenhouse gas emissions. An additional benefit of biodiesel is that it is thought
to be a non-toxic, biodegradable fuel that provides essentially the same energy
content and power output as petroleum-based diesel fuel while reducing
emissions.
Biodiesel production is constantly increasing, with an average annual growth
rate of over 40% during the 2002-2006 period (4). In 2006, the amount of
biodiesel produced in the world was 5-6 million tonnes, with 4.9 million
tonnes processed in Europe (of which 2.7 million tonnes in Germany) and most
of the remaining quantity processed in the USA (5). European
production alone later increased to 5.7 million tonnes (6), and the volume of
biodiesel produced in Europe in 2008 has been estimated at a total of 16
million tonnes. These figures have to be set against the demand of circa 490
million tonnes (147 billion gallons) for diesel fuel in the US and
Europe (6). Moreover, the global production of vegetable oil for all purposes in
2005/06 reached 110 million tonnes, of which about 34 million tonnes were palm
oil and soybean oil (7).
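As a quick plausibility check on the figures above, the following minimal Python sketch computes the share of US and European diesel demand that the 2006 world biodiesel output would cover, and the overall growth implied by a constant 40% annual rate. The function names are illustrative, and counting four year-on-year steps between 2002 and 2006 is an assumption about how the period is defined.

```python
# Figures quoted in the text, in million tonnes.
BIODIESEL_2006_LOW = 5.0
BIODIESEL_2006_HIGH = 6.0
DIESEL_DEMAND_US_EUROPE = 490.0


def share_percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


def compound_growth_factor(annual_rate, steps):
    """Overall growth factor after `steps` years at a constant annual rate."""
    return (1.0 + annual_rate) ** steps


share_low = share_percent(BIODIESEL_2006_LOW, DIESEL_DEMAND_US_EUROPE)
share_high = share_percent(BIODIESEL_2006_HIGH, DIESEL_DEMAND_US_EUROPE)

# Assumption: 2002 -> 2006 is counted as four year-on-year steps.
growth_2002_2006 = compound_growth_factor(0.40, 4)

print(f"Biodiesel share of diesel demand: {share_low:.2f}% to {share_high:.2f}%")
print(f"Growth factor at 40% per year over four steps: {growth_2002_2006:.2f}")
```

Under these assumptions biodiesel covered only about 1% of the quoted diesel demand, even after output had grown nearly fourfold.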

However, the limited yield of feedstocks per acre is still the main obstacle to
considering biofuel a reliable supply able to sustain the current global fuel demand,
and the increasing oil demand, along with the high cost of good-quality
vegetable oil, constantly challenges the reliability of biofuels on an industrial scale. Some
typical yields in cubic decimeters (liters) of biodiesel per hectare (10,000 square
meters):
Algae: 2763 dm3 (liters) or more (~300 gallons per acre; estimated, see soy
figures and DOE quote below)
Hemp: 1535 dm3 (8)
Chinese tallow: 772 dm3 (9) - 970 GPA (10)
Palm oil: 780-1490 dm3 (11)
Coconut: 353 dm3 (11)
Rapeseed: 157 dm3 (11)
Soy: 76-161 dm3 in Indiana (12) (soy is used in 80% of USA biodiesel (13))
Peanut: 138 dm3 (11)
Sunflower: 126 dm3 (11)
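The list above mixes liters per hectare with US gallons per acre. A minimal sketch of the conversion between the two, using the exact definitions of the US liquid gallon and the international acre (the function names are illustrative and not part of the original work):

```python
LITERS_PER_US_GALLON = 3.785411784   # exact by definition
HECTARES_PER_ACRE = 0.40468564224    # exact by definition


def l_per_ha_to_gal_per_acre(liters_per_hectare):
    """Convert an areal yield from liters/hectare to US gallons/acre."""
    return liters_per_hectare * HECTARES_PER_ACRE / LITERS_PER_US_GALLON


def gal_per_acre_to_l_per_ha(gallons_per_acre):
    """Convert an areal yield from US gallons/acre to liters/hectare."""
    return gallons_per_acre * LITERS_PER_US_GALLON / HECTARES_PER_ACRE


algae_gal_per_acre = l_per_ha_to_gal_per_acre(2763)
print(f"2763 l/ha is about {algae_gal_per_acre:.0f} US gallons per acre")
```

The algae figure of 2763 dm3 per hectare comes out at roughly 295 gallons per acre, which agrees with the "~300 gallons per acre" quoted in the list.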

The price of food-grade vegetable oil is rising in step with food prices in general.
Non-food-grade vegetable oils, however, are also used to make biodiesel. In
some poor countries the rising price of vegetable oil is causing problems (14)
(15). Some scientists propose that fuel be made from non-edible vegetable
oils such as camelina, jatropha or seashore mallow, which can thrive on marginal
agricultural land where many trees and crops will not grow, or would yield only
scarce harvests.
Others trace this problem back to different sources. Some farmers may give up
producing food crops in favour of biofuel crops for economic reasons, even if the new
crops are not edible. The increasing demand for first-generation
biofuels is therefore likely to result in price increases for many kinds of food. On
the other hand, some have pointed out that this situation might bring
financial gain to poor farmers and poor countries that invest in biofuel
crops (16).
2 Background information
2.1 Algae
Algae represent an extremely diverse, yet highly specialized, group of organisms
that live in diverse ecological habitats such as freshwater, brackish, marine and
hyper-saline environments, with a range of temperatures and pH values and unique nutrient
availabilities (17).
With over 40 000 species already identified, algae are classified in multiple
major groupings as follows:
cyanobacteria (Cyanophyceae)
green algae (Chlorophyceae)
diatoms (Bacillariophyceae)
yellow-green algae (Xanthophyceae)
golden algae (Chrysophyceae)
red algae (Rhodophyceae)
brown algae (Phaeophyceae)
dinoflagellates (Dinophyceae)
pico-plankton (Prasinophyceae and Eustigmatophyceae)
Several additional divisions and classes of unicellular algae have been described,
and details of their structure and biology are available (18).
The ability of algae to survive or proliferate over a wide range of environmental
conditions is, to a large extent, reflected in the tremendous diversity and
sometimes unusual patterns of their cellular lipids, as well as in their ability to modify lipid
metabolism efficiently in response to changes in environmental conditions (19)
(20)(21).
The lipids may include, but are not limited to, neutral lipids, polar lipids, wax
esters, sterols and hydrocarbons, as well as prenyl derivatives such as tocopherols,
carotenoids, terpenes and quinones, and phytylated pyrrole derivatives such as the
chlorophylls. Unlike higher plants, where individual classes of lipid may be
synthesized and localized in a specific cell, tissue or organ, many of these
different types of lipids occur in a single algal cell. After being synthesized,
TAGs are deposited in densely packed lipid bodies located in the cytoplasm of the
algal cell, although formation and accumulation of lipid bodies also occur in
the inter-thylakoid space of the chloroplast in certain green algae, such as
Dunaliella bardawil (22). In the latter case, the chloroplastic lipid bodies are
referred to as plastoglobuli.
Hydrocarbons are another type of neutral lipid
that can be found in algae, generally at quantities below 5% of dry cell weight (DCW) (23). Only the
colonial green alga Botryococcus braunii has been shown to produce, under
adverse environmental conditions, large quantities (up to 80% DCW) of very-long-chain
(C23-C40) hydrocarbons, similar to those found in petroleum, and it has thus
been explored over the decades as a feedstock for biofuels and biomaterials (24).
As many algal species have been found to grow rapidly and produce substantial
amounts of TAG or oil, and are thus referred to as oleaginous algae, it has long
been postulated that algae could be employed as cell factories to produce oils
and other lipids for biofuels and biomaterials (25)(26).
The potential advantages of algae as a feedstock for biofuels and biomaterials
include their ability to:
synthesize and accumulate large quantities of neutral lipids/oil (20-50% DCW);
grow at high rates (e.g. 1-3 doublings per day);
thrive in saline/brackish water/coastal seawater, for which there are few competing demands;
tolerate marginal lands (e.g. desert, arid and semi-arid lands) that are not suitable for conventional agriculture;
utilize growth nutrients such as nitrogen and phosphorus from a variety of wastewater sources (e.g. agricultural run-off, concentrated animal feed operations, and industrial and municipal wastewater), providing the additional benefit of wastewater bio-remediation;
sequester carbon dioxide from flue gases emitted from fossil fuel-fired power plants and other sources, thereby reducing emissions of a major greenhouse gas;
produce value-added co-products or by-products (e.g. biopolymers, proteins, polysaccharides, pigments, animal feed, fertilizer and H2);
grow in suitable culture vessels (photo-bioreactors) throughout the year with an annual biomass productivity, on an area basis, exceeding that of terrestrial plants by approximately tenfold.
Based upon the photosynthetic efficiency and growth potential of algae,
theoretical calculations indicate that an annual oil production of 30 000 l, or about
200 barrels, of algal oil per hectare of land may be achievable in mass culture of
oleaginous algae, which is 100-fold greater than that of soybeans, a major
feedstock currently being used for biodiesel in the USA.
While the 'algae for fuel' concept has been explored in the USA and some other
countries, with interest and funding growing and waning according to the
fluctuations of the world petroleum oil market over the past few decades, no
effort in algae-based biofuel production has proceeded beyond the rather small
laboratory or field testing stage. The lipid yields obtained from algal mass
culture efforts performed to date fall short of the theoretical maximum (they are at least
10-20 times lower), and have historically made algal oil production technology
prohibitively expensive (27)(28).
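The theoretical yield quoted above can be restated in barrels, and the "10-20 times lower" gap translated into liters per hectare. A minimal sketch, assuming the 42-US-gallon oil barrel (the text does not say which barrel is meant) and treating the 10-fold and 20-fold shortfalls as simple divisors:

```python
LITERS_PER_OIL_BARREL = 158.987294928  # 42 US gallons; an assumption about the unit used

THEORETICAL_YIELD_L_PER_HA = 30_000.0  # theoretical annual oil yield quoted in the text


def liters_to_barrels(liters):
    """Convert liters to oil barrels of 42 US gallons."""
    return liters / LITERS_PER_OIL_BARREL


theoretical_barrels = liters_to_barrels(THEORETICAL_YIELD_L_PER_HA)

# Yields obtained to date are described as at least 10-20 times lower.
achieved_upper = THEORETICAL_YIELD_L_PER_HA / 10
achieved_lower = THEORETICAL_YIELD_L_PER_HA / 20

print(f"Theoretical yield: {theoretical_barrels:.0f} barrels per hectare per year")
print(f"Implied achieved yield: at most {achieved_lower:.0f}-{achieved_upper:.0f} l/ha per year")
```

With this barrel definition, 30 000 l corresponds to roughly 189 barrels, consistent with "about 200 barrels", and the yields reported to date would be no more than about 1500-3000 l per hectare per year.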
Recent soaring oil prices, diminishing world oil reserves, and the environmental
deterioration associated with fossil fuel consumption have generated renewed
interest in using algae as an alternative and renewable feedstock for fuel
production. However, before this concept can become a commercial reality, many
fundamental biological questions relating to the biosynthesis and regulation of
fatty acids and TAG in algae need to be fully answered. Clearly, physiological and
genetic manipulations of growth and lipid metabolism must be readily
implementable, and critical engineering breakthroughs related to algal mass
culture and down-stream processing are necessary.
2.2 Fatty acid composition
Algae synthesize fatty acids as building blocks for the formation of various types
of lipids. The most commonly synthesized fatty acids have chain lengths that
range from C16 to C18 (Table 1), similar to those of higher plants (29). Fatty
acids are either saturated or unsaturated, and unsaturated fatty acids may vary in
the number and position of double bonds on the carbon chain backbone. In
general, saturated and mono-unsaturated fatty acids are predominant in most
algae examined (26). Specifically, the major fatty acids are C16:0 and C16:1 in
the Bacillariophyceae, C16:0 and C18:1 in the Chlorophyceae, C16:0 and C18:1
in the Euglenophyceae, C16:0, C16:1 and C18:1 in the Chrysophyceae, C16:0
and C20:1 in the Cryptophyceae, C16:0 and C18:1 in the Eustigmatophyceae,
C16:0 and C18:1 in the Prasinophyceae, C16:0 in the Dinophyceae, C16:0,
C16:1 and C18:1 in the Prymnesiophyceae, C16:0 in the Rhodophyceae, C14:0,
C16:0 and C16:1 in the Xanthophyceae, and C16:0, C16:1 and C18:1 in
cyanobacteria (30).
Polyunsaturated fatty acids (PUFAs) contain two or more
double bonds. Based on the number of double bonds, individual fatty acids are
named dienoic, trienoic, tetraenoic, pentaenoic and hexaenoic fatty acids. Also,
depending on the position of the first double bond from the terminal methyl end
(ω) of the carbon chain, a fatty acid may be either an ω3 PUFA (i.e. the first double
bond is at the third carbon from the end of the fatty acid) or an ω6 PUFA (i.e. at the
sixth carbon from the end of the fatty acid). The major PUFAs are C20:5ω3 and C22:6ω3 in
Bacillariophyceae, C18:2 and C18:3ω3 in green algae, C18:2 and C18:3ω3 in
Euglenophyceae, C20:5, C22:5 and C22:6 in Chrysophyceae, C18:3ω3, C18:4 and
C20:5 in Cryptophyceae, C20:3 and C20:4ω3 in Eustigmatophyceae, C18:3ω3
and C20:5 in Prasinophyceae, C18:5ω3 and C22:6ω3 in Dinophyceae, C18:2,
C18:3ω3 and C22:6ω3 in Prymnesiophyceae, C18:2 and C20:5 in
Rhodophyceae, C16:3 and C20:5 in Xanthophyceae, and C16:0, C18:2 and
C18:3ω3 in cyanobacteria (30)(31).
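The shorthand used in this section (carbon number, number of double bonds and, optionally, the ω position of the first double bond counted from the methyl end, as in C20:5ω3) can be made explicit with a small parser. The following Python sketch is purely illustrative: the names `FattyAcid`, `parse` and `describe` are hypothetical helpers and not part of any tool used in this work.

```python
import re
from typing import NamedTuple, Optional


class FattyAcid(NamedTuple):
    carbons: int           # chain length
    double_bonds: int      # number of double bonds
    omega: Optional[int]   # position of first double bond from the methyl end, if given


# Accepts e.g. "C16:0", "18:4", "C20:5ω3" and "C20:4 ω3".
_SHORTHAND = re.compile(r"^C?(\d+):\s*(\d+)\s*(?:ω\s*(\d+))?$")

# Names for polyunsaturated fatty acids, as listed in the text.
POLYENE_NAMES = {2: "dienoic", 3: "trienoic", 4: "tetraenoic",
                 5: "pentaenoic", 6: "hexaenoic"}


def parse(shorthand: str) -> FattyAcid:
    """Parse a shorthand such as 'C20:5ω3' into its three components."""
    match = _SHORTHAND.match(shorthand.strip())
    if match is None:
        raise ValueError(f"not a fatty acid shorthand: {shorthand!r}")
    carbons, bonds, omega = match.groups()
    return FattyAcid(int(carbons), int(bonds), int(omega) if omega else None)


def describe(shorthand: str) -> str:
    """Return a short textual classification of a fatty acid shorthand."""
    fa = parse(shorthand)
    if fa.double_bonds == 0:
        kind = "saturated"
    elif fa.double_bonds == 1:
        kind = "mono-unsaturated"
    else:
        name = POLYENE_NAMES.get(fa.double_bonds, f"{fa.double_bonds} double bonds")
        kind = f"polyunsaturated, {name}"
    if fa.omega is not None:
        kind += f", ω{fa.omega}"
    return f"{fa.carbons} carbons, {kind}"


for example in ("C16:0", "C18:1", "C20:5ω3"):
    print(example, "->", describe(example))
```

For instance, `describe("C20:5ω3")` classifies eicosapentaenoic acid as a 20-carbon, pentaenoic ω3 PUFA.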
In contrast to higher plants, greater variation in fatty acid composition is found
in algal taxa. Some algae and cyanobacteria possess the ability to synthesize
medium-chain fatty acids (e.g. C10, C12 and C14) as predominant species,
whereas others produce very-long-chain fatty acids (>C20). For instance, a C10
fatty acid comprising 27-50% of the total fatty acids was found in the
filamentous cyanobacterium Trichodesmium erythraeum (32), and a C14 fatty
acid makes up nearly 70% of the total fatty acids in the golden alga Prymnesium
parvum (23). Another distinguishing feature of some algae is the large amounts
of very-long-chain PUFAs. For example, in the green alga Parietochloris incisa
(33), the diatom Phaeodactylum tricornutum and the dinoflagellate
Crypthecodinium cohnii (34), the very-long-chain fatty acids arachidonic acid
(C20:4ω6), eicosapentaenoic acid (C20:5ω3) or docosahexaenoic acid
(C22:6ω3) are the major fatty acid species, accounting for 33.6-42.5%,
approximately 30% and 30-50% of the total fatty acid content of the three
species, respectively.
It should be noted that much of the data provided previously comes from the
limited number of species of algae that have been examined to date, and most of
the analyses of fatty acid composition from algae have used total lipid extracts
rather than examining individual lipid classes. Therefore, these data represent
generalities, and deviations should be expected. This may explain why some
fatty acids seem to occur almost exclusively in an individual algal taxon. In
addition, the fatty acid composition of algae can vary both quantitatively and
qualitatively with their physiological status and culture conditions.
The properties of biodiesel are largely determined by the structure of its
component fatty acid esters (35). The most important characteristics include
ignition quality (i.e. cetane number), cold-flow properties and oxidative stability.
While saturation and fatty acid profile do not appear to have much of an impact
on the production of biodiesel by the trans-esterification process, they do affect
the properties of the fuel product. For example, saturated fats produce a
biodiesel with superior oxidative stability and a higher cetane number, but rather
poor low-temperature properties. Biodiesels produced using these saturated fats
are more likely to gel at ambient temperatures. Biodiesel produced from
feedstocks that are high in PUFAs, on the other hand, has good cold-flow
properties. However, these fatty acids are particularly susceptible to oxidation.
Therefore, biodiesel produced from feedstocks enriched with these fatty acid
species tends to have instability problems during prolonged storage.
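One simple way to compare feedstocks along the saturation axis discussed above is the mean number of double bonds per fatty acid, weighted by each fatty acid's share of the total. The sketch below is only an illustration: the two profiles are hypothetical numbers invented for the example, not measured compositions, and the index is a rough descriptor rather than a predictor of cetane number, cold-flow behaviour or oxidative stability.

```python
def mean_double_bonds(profile):
    """Weighted mean number of double bonds per fatty acid.

    `profile` maps shorthand such as 'C16:0' or 'C20:5ω3' to the
    percentage of total fatty acids.
    """
    total = sum(profile.values())
    weighted = 0.0
    for shorthand, percent in profile.items():
        # The digits between ':' and an optional 'ω' give the double-bond count.
        bonds = int(shorthand.split(":")[1].split("ω")[0])
        weighted += bonds * percent
    return weighted / total


# Hypothetical profiles, for illustration only.
saturated_rich = {"C16:0": 60.0, "C18:1": 30.0, "C18:2": 10.0}
pufa_rich = {"C16:0": 20.0, "C18:1": 20.0, "C20:5ω3": 60.0}

print(f"saturated-rich profile: {mean_double_bonds(saturated_rich):.1f} double bonds on average")
print(f"PUFA-rich profile: {mean_double_bonds(pufa_rich):.1f} double bonds on average")
```

A low value corresponds to the saturated end of the trade-off described in the text, and a high value to the PUFA-rich end.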
Table 1: Abbreviations of algal species: B.a., Biddulphia aurita (Orcutt and Patterson,
1975); C.sp., Chaetoceros sp. (Renaud et al., 2002); N.sp., Nannochloropsis sp.
(Sukenik, 1999); M.s., Monodus subterraneus (Cohen, 1999); C.s., Chlorella sorokiniana
(Patterson, 1970); C.v., Chlorella vulgaris (Harris et al., 1965); P.i., Parietochloris incisa
(Khozin-Goldberg et al., 2002); E.h., Emiliania huxleyi (Volkman et al., 1981); I.g.,
Isochrysis galbana (Volkman et al., 1981); P.p., Phaeomonas parva (Kawachi et al., 2002);
G.c., Glossomastrix chrysoplasta (Kawachi et al., 2002); A.sp., Aphanocapsa sp. (Kenyon,
1972); S.p., Spirulina platensis (Mühling et al., 2005); T.e., Trichodesmium erythraeum
(Parker et al., 1967); H.b., Hemiselmis brunescens (Chuecas and Riley, 1969); R.l.,
Rhodomonas lens (Beach et al., 1970); G.s., Gymnodinium sanguineum; S.sp., Scrippsiella
sp. (Mansour et al., 1999).
2.3 Biosynthesis of fatty acids and triacylglycerols
Lipid metabolism, particularly the biosynthetic pathways of fatty acids and TAG,
has been poorly studied in algae in comparison to higher plants. Based upon the
sequence homology and some shared biochemical characteristics of a number of
genes and/or enzymes isolated from algae and higher plants that are involved in
lipid metabolism, it is generally believed that the basic pathways of fatty acid
and TAG biosynthesis in algae are directly analogous to those demonstrated in
higher plants. It should be noted that because the evidence obtained from algal
lipid research is still fragmentary, some broad generalizations are made in this
section based on limited experimental data.
In algae, the de novo synthesis of fatty acids occurs primarily in the chloroplast. A
generalized scheme for fatty acid biosynthesis is shown in Figure 2. Overall,
the pathway produces a 16- or 18-carbon fatty acid or both. These are then used
as precursors for the synthesis of chloroplast and other cellular membranes as
well as for the synthesis of neutral storage lipids, mainly TAGs. The committed
step in fatty acid synthesis is the conversion of acetyl CoA to malonyl CoA,
catalyzed by acetyl CoA carboxylase (ACCase). In the
chloroplast, photosynthesis provides an endogenous source of acetyl CoA, and
more than one pathway may contribute to maintaining the acetyl CoA pool. In
oil seed plants, a major route of carbon flux to fatty acid synthesis may involve
cytosolic glycolysis to phosphoenolpyruvate (PEP), which is then preferentially
transported from the cytosol to the plastid, where it is converted to pyruvate and
subsequently to acetyl CoA (36). In green algae, as glycolysis and pyruvate
kinase (PK), which catalyzes the irreversible synthesis of pyruvate from PEP,
occur in the chloroplast in addition to the cytosol (37), it is possible that
glycolysis-derived pyruvate is the major photosynthate to be converted to acetyl
CoA for de novo fatty acid synthesis. ACCase is generally
considered to catalyze the first reaction of the fatty acid biosynthetic pathway:
the formation of malonyl CoA from acetyl CoA and CO2. This reaction takes
place in two steps and is catalyzed by a single enzyme complex. In the first step,
which is ATP-dependent, CO2 (from HCO3-) is transferred by the biotin
carboxylase portion of ACCase to a nitrogen of a biotin
prosthetic group attached to the ε-amino group of a lysine residue. In the second
step, catalyzed by carboxyltransferase, the activated CO2 is transferred from
biotin to acetyl CoA to form malonyl CoA (29). Malonyl CoA, the product of the
carboxylation reaction, is the central carbon donor for fatty acid synthesis. The
malonyl group is transferred from CoA to a protein co-factor, the acyl carrier
protein (ACP; Figure 2).
All subsequent reactions of the pathway involve ACP until the finished products
are ready for transfer to glycerolipids or export from the chloroplast. The
malonyl group of malonyl ACP participates in a series of condensation reactions
with acyl ACP (or acetyl CoA) acceptors. The first condensation reaction forms
a four-carbon product, and is catalyzed by the condensing enzyme 3-ketoacyl
ACP synthase III (KAS III). Another condensing enzyme, KAS I, is responsible
for producing varying chain lengths (6-16 carbons). Three additional reactions
occur after each condensation. To form a saturated fatty acid, the 3-ketoacyl ACP
product is reduced by the enzyme 3-ketoacyl ACP reductase, dehydrated by
hydroxyacyl ACP dehydratase and
then reduced by the enzyme enoyl ACP reductase (Figure 2). These four
reactions lead to a lengthening of the precursor fatty acid by two carbons. The
fatty acid biosynthesis pathway produces saturated 16:0- and 18:0-ACP. To
produce an unsaturated fatty acid, a double bond is introduced by the soluble
enzyme stearoyl ACP desaturase. The elongation of fatty acids is terminated
either when the acyl group is removed from ACP by an acyl-ACP thioesterase
that hydrolyzes the acyl ACP and releases free fatty acid, or when acyltransferases in
the chloroplast transfer the fatty acid directly from ACP to glycerol-3-phosphate
or monoacylglycerol-3-phosphate. The final fatty acid composition of individual
algae is determined by the activities of the enzymes that use these acyl ACPs at the
termination phase of fatty acid synthesis.
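The cycle described above (condensation, reduction, dehydration, reduction, adding two carbons per round) fixes the bookkeeping for a saturated fatty acid of a given length. The following Python sketch tallies it under stated assumptions: one acetyl CoA acts as the primer, every further two-carbon unit comes from one malonyl CoA, and each malonyl CoA costs one acetyl CoA and one ATP at the ACCase step. The function name and the dictionary keys are illustrative only.

```python
def de_novo_requirements(chain_length):
    """Tally the inputs needed to build a saturated fatty acid de novo.

    Assumes an even chain length of at least 4 carbons, built from one
    acetyl CoA primer plus two carbons per condensation cycle.
    """
    if chain_length < 4 or chain_length % 2 != 0:
        raise ValueError("chain_length must be an even number >= 4")
    cycles = (chain_length - 2) // 2
    return {
        "condensation_cycles": cycles,
        "malonyl_units": cycles,              # one malonyl ACP consumed per cycle
        "acetyl_coa_total": cycles + 1,       # primer plus one per malonyl CoA made
        "atp_for_carboxylation": cycles,      # one ATP per ACCase reaction
        "reduction_steps": 2 * cycles,        # two reductions per cycle
    }


for length in (16, 18):
    print(f"C{length}:0", de_novo_requirements(length))
```

Under these assumptions a 16:0 chain needs seven cycles (eight acetyl CoA in total, seven carboxylations and fourteen reduction steps), and an 18:0 chain needs one cycle more.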
Figure 2. Fatty acid de novo synthesis pathway in chloroplasts.
Acetyl CoA enters the pathway as a substrate for acetyl CoA carboxylase (Reaction 1) as well
as a substrate for the initial condensation reaction (Reaction 3). Reaction 2, which is catalyzed
by malonyl CoA:ACP transferase, transfers malonyl from CoA to ACP to form malonyl ACP.
Malonyl ACP is the carbon donor for subsequent elongation reactions. After subsequent
condensations, the 3-ketoacyl ACP product is reduced (Reaction 4), dehydrated (Reaction 5)
and reduced again (Reaction 6), by 3-ketoacyl ACP reductase, 3-hydroxyacyl ACP dehydrase
and enoyl ACP reductase, respectively (adapted from Ohlrogge and Browse 1995).
2.4 Regulation of fatty acid synthesis
All algae must produce fatty acids, and this synthesis must be tightly controlled
to balance the supply of and demand for acyl chains. This need can be highly
variable and depends on the stage of development, the rate of growth and
surrounding factors such as stress or nutrient deficiency (48), and therefore the rate of
fatty acid biosynthesis must be closely regulated to meet these changes. Overall
fatty acid synthesis, and consequently its regulation, may be more complicated in algae
than in other organisms: as in plants, fatty acid synthesis in algae is not localized
within the cytosol but occurs in an organelle, the plastid, and most of the
product is then exported into the cytosol for glycerolipid assembly at the endoplasmic
reticulum (ER) or other sites. Both the compartmentalization of lipid
metabolism and the intermixing of lipid intermediates in these pools present
special requirements for the regulation of the synthesis. A system for
communicating between the source and the sinks for fatty acid utilization is
essential. The nature of this communication and the signal molecules involved
remain unresolved in all plants.
Biochemists have used different approaches to identify where regulation occurs in
a pathway. One approach depends on examination of the in vitro properties of
the enzymes of the pathway. Enzymes that have low activity relative to other members of
the pathway are frequently considered as potentially rate limiting. Other
properties, such as changes during developmental regulation of the pathway, or
the influence (activation or inhibition) of intermediates of the
pathway on the enzyme, can provide additional evidence toward the identification of control points.
2.5 Knowledge in other species
By comparing algae with plants and other species, and on the basis of sequence
homology between the genes and enzymes of algae and plants involved in the lipid
pathway, some general hypotheses have been made about
which enzymes regulate fatty acid synthesis. The reaction catalyzed by Acetyl-CoA carboxylase is
frequently considered the first committed step in the fatty acid biosynthetic
pathway.
In animals (38) and in yeast (39) there is evidence for Acetyl-CoA carboxylase as
a major regulatory enzyme in fatty acid production. In plants too, this enzyme
was proposed to be rate determining for fatty acid biosynthesis. Several lines of
evidence supported this suggestion. Acetate or pyruvate was incorporated into
acetyl-CoA in the dark by isolated chloroplasts, but malonyl-CoA and fatty acids
were formed only in the light (40). Thus, the light-dependent step of fatty acid
synthesis appeared to be the Acetyl-CoA carboxylase reaction. Eastwell &
Stumpf (41) found that chloroplast and wheat germ Acetyl-CoA carboxylase
were inhibited by ADP and suggested this may account for light-dark regulation
of the enzyme. Nikolau & Hawke (42) characterized the pH, Mg, ATP, and ADP
dependence of maize Acetyl-CoA carboxylase activity and concluded that
changes in these parameters between dark and light conditions could account for
increased Acetyl-CoA carboxylase activity upon illumination of chloroplasts.
Finally, Acetyl-CoA carboxylase activity and protein levels are coincident with
increases and decreases in oil biosynthesis in developing seeds. However, in
vitro approaches are limited because they show only that the enzyme has in vitro
properties consistent with control. The sites of metabolic control of a pathway
can be more reliably identified by examination of the in vivo properties of enzymes.
Although it is often technically difficult, examining the concentrations of the
substrates and products of each enzymatic step in a pathway provides
information on which reactions are at equilibrium and which are displaced from
equilibrium. This information is important because an essential feature of almost
all regulatory enzymes is that the reactions they catalyze are displaced from equilibrium.
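The equilibrium argument can be made concrete by comparing the mass action ratio of a reaction in vivo (product concentrations divided by substrate concentrations) with its equilibrium constant. The sketch below is a generic illustration of that comparison, not a method taken from the cited studies: the concentrations and equilibrium constants are invented for the example, and the 0.05 cut-off is an arbitrary, hypothetical threshold.

```python
from math import prod


def mass_action_ratio(product_concs, substrate_concs):
    """Product of product concentrations over product of substrate concentrations."""
    return prod(product_concs) / prod(substrate_concs)


def disequilibrium_ratio(gamma, k_eq):
    """Ratio of the in vivo mass action ratio to the equilibrium constant.

    Values near 1 indicate a reaction close to equilibrium; values far
    below 1 indicate a reaction displaced from equilibrium.
    """
    return gamma / k_eq


def is_displaced(gamma, k_eq, threshold=0.05):
    """Flag a reaction as displaced from equilibrium (threshold is illustrative)."""
    return disequilibrium_ratio(gamma, k_eq) < threshold


# Invented numbers, for illustration only (arbitrary concentration units).
near_equilibrium = mass_action_ratio([0.9], [1.0])   # compare with K_eq = 1.0
far_from_equilibrium = mass_action_ratio([0.01], [1.0])  # compare with K_eq = 2.0

print("near-equilibrium step displaced?", is_displaced(near_equilibrium, 1.0))
print("candidate regulatory step displaced?", is_displaced(far_from_equilibrium, 2.0))
```

In this picture, a step whose product pool is far smaller than its equilibrium constant would allow is a candidate control point, which is the reasoning applied to the acyl-ACP pools in the following paragraphs.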
Most of the substrates and intermediates of plant fatty acid synthesis are
attached to acyl carrier protein (ACP) (Figure 2). Analysis of acyl-ACPs is aided
by the fact that the chain length of fatty acids attached to ACP alters the mobility of the
protein in native or urea PAGE gels. Because of these alterations in mobility,
most of the acyl-ACP intermediates of fatty acid synthesis can be resolved, and
when transferred to nitrocellulose, antibodies to ACP can provide sensitive
detection at nanogram levels. Although the acyl-ACP intermediates have a
half-life in vivo of only a few seconds (43)(44), by rapidly freezing tissues in liquid
nitrogen it has been possible to determine the relative concentrations of free,
nonacylated ACPs and of the individual acyl-ACPs. Analysis of acyl-ACP pools
has been used to study the regulation of fatty acid synthesis in spinach leaf and
seed (45), in chloroplasts, in developing castor seeds and in tobacco suspension
cultures (46)(47). The initial examination of the composition of the acyl-ACP
pools provided information about the potential regulatory reactions in plant fatty
acid biosynthesis. The various saturated acyl-ACP intermediates between 4:0
and 14:0 occur in approximately equal concentrations. Because the
3-ketoacyl-ACPs, enoyl-ACPs, or 3-hydroxyacyl-ACPs, which are substrates for the two
reductases and dehydrase reactions, were not detected, it is likely that these
reactions are close to equilibrium and that the in vivo activities of these enzymes
are in excess. Thus it is not likely that these enzymes are regulatory. In contrast,
the concentration of acetyl-ACP was considerably above that of malonyl-ACP.
23
This result suggests that the acetyl-CoA carboxylase reaction, which has an
equilibrium constant slightly favouring malonyl-CoA formation, is significantly
displaced from equilibrium and therefore potentially regulatory (43). The
condensing enzymes can also be considered displaced from equilibrium because
of the concentration of malonyl-ACP and the saturated acyl-ACPs. To obtain
more information on sites of regulation, the changes in pool sizes when flux
through the fatty acid biosynthetic pathway changes were examined. The rate of
spinach leaf fatty acid biosynthesis in the dark is approximately one sixth the
rate observed in the light (49).
In the light, the predominant form of ACP was the free, nonacylated form,
whereas acetyl-ACP represented about 5-6% of the total ACP (46). In the dark,
the level of acetyl-ACP increased substantially with a corresponding decrease in
free ACP, such that acetyl-ACP was now the predominant form of ACP. In
similar experiments, when chloroplasts are shifted to the dark, malonyl-ACP and
malonyl-CoA disappear within a few seconds, and acetyl-ACP levels increase
over a period of several minutes. The rapid decrease in malonyl-ACP and
malonyl-CoA when fatty acid synthesis slows, together with the increase in
acetyl-ACP and the lack of change in other intermediate acyl-ACP pools, all lead to
the conclusion that Acetyl-CoA carboxylase activity is the major determinant of
light/dark control over fatty acid synthesis rates in leaves.
The above experiments on acyl-ACP and acyl-CoA pools have been carried out
with dicot plants. Gramineae species such as maize and wheat have a
substantially different (homodimeric) structure of Acetyl-CoA carboxylase. A
different approach toward evaluating metabolic control was taken in (50), which took
advantage of the susceptibility of maize and barley plastid Acetyl-CoA
carboxylase to the herbicides fluazifop and sethoxydim. When chloroplasts or
leaves were incubated with herbicides and radiolabeled acetate, a flux control
coefficient of 0.5 to 0.6 was calculated for acetate incorporation into lipids. Flux
control coefficients of this magnitude indicate strong control by the Acetyl-CoA
carboxylase reaction over fatty acid synthesis (51). Thus, in a wide variety of
species and tissues, both in vivo and in vitro experiments point to Acetyl-CoA
carboxylase as a major regulatory point for plant fatty acid synthesis.
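The flux control coefficient quoted above can be illustrated with a short calculation. The following is a minimal sketch, assuming the usual definition of the coefficient as the fractional change in pathway flux divided by the fractional change in enzyme activity, estimated here from a single finite perturbation. The function name and the numbers are hypothetical and are not taken from reference (50).

```python
import math


def flux_control_coefficient(activity_before, activity_after,
                             flux_before, flux_after):
    """Finite-difference estimate of C = d ln(J) / d ln(E).

    J is the pathway flux (e.g. acetate incorporation into lipids) and
    E is the activity of the enzyme being titrated with an inhibitor.
    """
    d_ln_activity = math.log(activity_after / activity_before)
    if d_ln_activity == 0:
        raise ValueError("enzyme activity must change to estimate the coefficient")
    d_ln_flux = math.log(flux_after / flux_before)
    return d_ln_flux / d_ln_activity


# Hypothetical example: a herbicide lowers ACCase activity by 20 %
# and the measured flux into lipids falls by 11.5 %.
c = flux_control_coefficient(1.0, 0.8, 1.0, 0.885)
print(f"flux control coefficient = {c:.2f}")
```

A coefficient close to 1 would mean that the enzyme alone sets the flux, while a value near 0 would mean that changing its activity has almost no effect on the pathway.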
2.6 Feedback regulation
Most biochemical pathways are controlled in part by a feedback mechanism
which fine-tunes the flux of metabolites through the pathway. Whenever the
product of a pathway builds up in the cell to levels in excess of needs, the end
product inhibits the activity of the pathway. In most cases this inhibition occurs
at a regulatory enzyme, which is often the first committed step of the pathway.
When the activity of the regulatory enzyme is reduced, all subsequent reactions
are also slowed as their substrates become depleted by mass action. Because
enzyme activity can be rapidly changed by allosteric modulators, feedback
inhibition of regulatory enzymes provides almost instantaneous control of the
flux through the pathway. It has long been considered that fatty acid synthesis is
partly controlled by feedback on Acetyl-CoA carboxylase by long-chain acyl-CoAs,
since acyl-CoAs are one end product of the FAS pathway. Although this
inhibition seems logical, it has been called into question by the discovery that
acyl-CoA-binding proteins exist at high concentrations in the cytosol of animals
(52), yeast (53), and plants (54). Because these proteins have extremely high
affinity for acyl-CoAs, the concentration of free acyl-CoA in the cytoplasm may
be only nanomolar, a level unlikely to inhibit Acetyl-CoA carboxylase.
Several other potential feedback inhibitors such as acyl-CoA, free fatty acids,
and glycerolipids also fail to strongly inhibit the plant Acetyl-CoA carboxylase at
physiological concentrations (55). Because FAS occurs inside the plastid but the
major utilization of the products of fatty acid synthesis is at the ER membranes,
it is likely that feedback regulation must allow communication across the plastid
envelope. At this time we do not have any clear indications of what molecules
are involved in feedback regulation of plastid fatty acid synthesis.
2.7 What Controls Promoter Activity of FAS Genes?

A major challenge for the future is to discover how the level of expression of
genes of lipid synthesis is controlled. Efforts are under way to identify
transcription factors that may bind to the promoter elements of these genes in
different organisms. In addition, computational and genetic approaches may allow
identification of additional controls.
Understanding how cells regulate the production of these fatty acids and direct
them toward their different functions is thus central to understanding a large
range of fundamental questions in algal biology. We now have convincing
evidence that Acetyl-CoA carboxylase is one enzyme involved in
regulating fatty acid synthesis rates, but this is only the beginning. What
molecules regulate Acetyl-CoA carboxylase by feedback or other mechanisms,
and what metabolic signals or mechanisms control those molecules? We do not yet
have information about the nature of these controls. Thus, understanding the
regulation of fatty acid synthesis is a rich and relatively unexplored field with
much work left to be done.
3 State of the art
3.1 Comparison of lipid metabolism in algae and higher plants
Although algae generally share similar fatty acid and TAG synthetic pathways
with higher plants, there is some evidence that differences in lipid metabolism
do occur. In algae, for example, the complete pathway from carbon dioxide
fixation to TAG synthesis and sequestration takes place within a single cell,
whereas the synthesis and accumulation of TAG only occur in special tissues or
organs (e.g. seeds or fruits) of oil crop plants. In addition, very-long-chain PUFAs
above C18 cannot be synthesized in significant amounts by naturally occurring
higher plants, whereas many algae (especially marine species) have the ability to
synthesize and accumulate large quantities of very-long-chain PUFAs, such as
eicosapentaenoic acid (C20:5ω3), docosahexaenoic acid (C22:6ω3) and
arachidonic acid (C20:4ω6). Recently, annotation of the genes involved in lipid
metabolism in the green alga C. reinhardtii has revealed that algal lipid
metabolism may be less complex than in Arabidopsis, and this is reflected in the
presence and/or absence of certain pathways and the apparent sizes of the gene
families that represent the various activities (55).
3.2 Factors affecting triacylglycerol accumulation and fatty acid composition
Although the occurrence and the extent to which TAG is produced appear to be
species/strain-specific, and are ultimately controlled by the genetic make-up of
individual organisms, oleaginous algae produce only small quantities of TAG
under optimal growth or favourable environmental conditions (57). Synthesis
and accumulation of large amounts of TAG accompanied by considerable
alterations in lipid and fatty acid composition occur in the cell when oleaginous
algae are placed under stress conditions imposed by chemical or physical
environmental stimuli, either acting individually or in combination. The major
chemical stimuli are nutrient starvation, salinity and growth-medium pH. The
major physical stimuli are temperature and light intensity. In addition to
chemical and physical factors, growth phase and/or aging of the culture also
affect TAG content and fatty acid composition.
3.2.1 Nutrients
Of all the nutrients evaluated, nitrogen is the single most critical
nutrient affecting lipid metabolism in algae. A general trend towards
accumulation of lipids, particularly TAG, in response to nitrogen deficiency has
been observed in numerous species or strains of various algal taxa (58).
In diatoms, silicon is an equally important nutrient that affects cellular lipid
metabolism. For example, silicon-deficient Cyclotella cryptica cells had higher
levels of neutral lipids (primarily TAG) and higher proportions of saturated and
mono-unsaturated fatty acids than silicon-replete cells (59).
Other types of nutrient deficiency that promote lipid accumulation include
phosphate limitation and sulfate limitation. Phosphorus limitation resulted in
increased lipid content, mainly TAG, in Monodus subterraneus
(Eustigmatophyceae) (60), P. tricornutum and Chaetoceros sp. (Bacillariophyceae),
and I. galbana and Pavlova lutheri (Prymnesiophyceae), but decreased lipid
content in Nannochloris atomus (Chlorophyceae) and Tetraselmis sp.
(Prasinophyceae) (61). Of the marine species examined (61), increasing phosphorus
deprivation was found to result in a higher relative content of 16:0 and 18:1 and a
lower relative content of 18:4ω3, 20:5ω3 and 22:6ω3. Studies have also shown
that sulfur deprivation enhanced the total lipid content in the green algae
Chlorella sp. and C. reinhardtii (62).
Cyanobacteria appear to react to nutrient deficiency differently from eukaryotic
algae. Piorreck (1996) investigated the effects of nitrogen deprivation on
the lipid metabolism of the cyanobacteria Anacystis nidulans, Microcystis
aeruginosa, Oscillatoria rubescens and Spirulina platensis, and reported that
either lipid content or fatty acid composition of these organisms was changed
significantly under nitrogen-deprivation conditions. When changes in fatty acid
composition occur in an individual species or strain in response to nutrient
deficiency, the C18:2 fatty acid levels decrease, whereas those of both C16:0
and C18:1 fatty acids increase, similar to what occurs in eukaryotic algae. In
some cases, nitrogen starvation resulted in reduced synthesis of lipids and fatty
acids (63).
3.2.2 Temperature
Temperature has been found to have a major effect on the fatty acid composition
of algae. A general trend towards increasing fatty acid unsaturation with
decreasing temperature and increasing saturated fatty acids with increasing
temperature has been observed in many algae and cyanobacteria (64). It has
been generally speculated that the ability of algae to alter the physical properties
and thermal responses of membrane lipids represents a strategy for enhancing
physiological acclimatization over a range of temperatures, although the
underlying regulatory mechanism is unknown (65). Temperature also affects the
total lipid content in algae. For example, the lipid content in the chrysophytan
Ochromonas danica (66) and the eustigmatophyte Nannochloropsis salina (67)
increases with increasing temperature. In contrast, no significant change in the
lipid content was observed in Chlorella sorokiniana grown at various
temperatures (68). As only a limited amount of information is available on this
subject, a general trend cannot be established.
3.2.3 Light intensity

Algae grown at various light intensities exhibit remarkable changes in their gross
chemical composition, pigment content and photosynthetic activity (69).
Typically, low light intensity induces the formation of polar lipids, particularly
the membrane polar lipids associated with the chloroplast, whereas high light
intensity decreases total polar lipid content with a concomitant increase in the
amount of neutral storage lipids, mainly TAGs (70).
The degree of fatty acid saturation can also be altered by light intensity. In
Nannochloropsis sp., for example, the percentage of the major PUFA C20:5ω3
remained fairly stable (approximately 35% of the total fatty acids) under
light-limited conditions. However, it decreased approximately threefold under
light-saturated conditions, concomitant with an increase in the proportion of saturated
and mono-unsaturated fatty acids (i.e. C14, C16:0 and C16:1ω7) (71). Based
upon the algal species/strains examined (72), it appears, with a few exceptions,
that low light favors the formation of PUFAs, which in turn are incorporated into
membrane structures. On the other hand, high light alters fatty acid synthesis to
produce more of the saturated and mono-unsaturated fatty acids that mainly
make up neutral lipids.
3.2.4 Growth phase and physiological status

Lipid content and fatty acid composition are also subject to variability during the
growth cycle. In many algal species examined, an increase in TAGs is often
observed during stationary phase. For example, in the chlorophyte Parietochloris
incisa, TAGs increased from 43% (total fatty acids) in the logarithmic phase to
77% in the stationary phase (23), and in the marine dinoflagellate Gymnodinium
sp., the proportion of TAGs increased from 8% during the logarithmic growth
phase to 30% during the stationary phase. Coincident increases in the relative
proportions of both saturated and mono-unsaturated 16:0 and 18:1 fatty acids
and decreases in the proportion of PUFAs in total lipid were also associated with
growth-phase transition from the logarithmic to the stationary phase. In contrast
to these decreases in PUFAs, however, the PUFA arachidonic acid (C20:4ω6) is
the major constituent of TAG produced in Parietochloris incisa cells (23), while
docosahexaenoic acid (22:6ω3) and eicosapentaenoic acid (20:5ω3) are
partitioned to TAG in the eustigmatophyte N. oculata, the diatoms
P. tricornutum and T. pseudonana, and the haptophyte Pavlova lutheri (73).
Culture aging or senescence also affects lipid and fatty acid content and
composition. The total lipid content of cells increased with age in the green alga
Chlorococcum macrostigma, and the diatoms Nitzschia palea, Thalassiosira
fluviatilis and Coscinodiscus eccentricus. An exception to this was reported in
the diatom P. tricornutum, where culture age had almost no influence on the total
fatty acid content, although TAGs were accumulated and the polar lipid content
was reduced (74). Analysis of fatty acid composition in the diatoms
P. tricornutum and Chaetoceros muelleri revealed a marked increase in the levels
of saturated and mono-unsaturated fatty acids (e.g. 16:0, 16:1ω7 and 18:1ω9),
with a concomitant decrease in the levels of PUFAs (e.g. 16:3ω4 and 20:5ω3)
with increasing culture age (75). Most studies on algal lipid metabolism have
been carried out in a batch culture mode. Therefore, the age of a given culture
may or may not be associated with nutrient depletion, making it difficult to
separate true aging effects from nutrient deficiency-induced effects on lipid
metabolism.
3.2.5 Physiological roles of triacylglycerol accumulation
Synthesis of TAG and deposition of TAG into cytosolic lipid bodies may be,
with few exceptions, the default pathway in algae under environmental stress
conditions. In addition to the obvious physiological role of TAG serving as
carbon and energy storage, particularly in aged algal cells or under stress, the
TAG synthesis pathway may play more active and diverse roles in the stress
response. The de novo TAG synthesis pathway serves as an electron sink under
photo-oxidative stress. Under stress, excess electrons that accumulate in the
photosynthetic electron transport chain may induce over-production of reactive
oxygen species, which may in turn cause inhibition of photosynthesis and
damage to membrane lipids, proteins and other macromolecules. The formation
of a C18 fatty acid consumes approximately 24 NADPH derived from the
electron transport chain, which is twice that required for synthesis of a
carbohydrate or protein molecule of the same mass, and thus relaxes the over-reduced
electron transport chain under high light or other stress conditions. The
TAG synthesis pathway is usually coordinated with secondary carotenoid
synthesis in algae (76). The molecules (e.g. β-carotene, lutein or astaxanthin)
produced in the carotenoid pathway are esterified with TAG and sequestered into
cytosolic lipid bodies. The peripheral distribution of carotenoid-rich lipid bodies
serves as a 'sunscreen' to prevent or reduce excess light striking the chloroplast
under stress. TAG synthesis may also utilize PC, PE and galactolipids or toxic
fatty acids excluded from the membrane system as acyl donors, thereby serving
as a mechanism to detoxify membrane lipids and deposit them in the form of
TAG.
3.3 Algal genomics and proposed model systems in biofuel production
The genomes of several eukaryotic algae have been sequenced. These include C.
reinhardtii and Volvox carteri (green algae), Cyanidioschyzon merolae (red alga),
Ostreococcus lucimarinus and Ostreococcus tauri (marine pico-eukaryotes),
Aureococcus anophagefferens (a harmful algal bloom component), and P.
tricornutum and T. pseudonana (diatoms) (table 1).
Considerable effort has gone into sequencing eukaryotic algae, which represent
a diverse group of organisms, and many projects are currently in progress to
improve genetic knowledge. However, the only organism among these for
which extensive genomic, biological and physiological data exist is C.
reinhardtii, a unicellular, water-oxidizing green alga (77).
Chlamydomonas has been used as a model eukaryotic microbe for the study of
many processes, including photosynthesis, phototaxis, flagellar function,
nutrient acquisition, and the biosynthesis and functions of lipids. The advantage
of C. reinhardtii as a model for oxygenic photosynthesis derives mainly from its
ability to grow either photo-, mixo- or heterotrophically (in the dark and in the
presence of acetate) while maintaining an intact, functional photosynthetic
apparatus. This property has allowed researchers to study photosynthetic
mutations that are lethal in other organisms. Moreover, C. reinhardtii spends most
of its life cycle as a haploid organism of either mating type + or -.
Gametogenesis is triggered by environmental stresses, particularly nitrogen
deprivation, and its occurrence can be synchronized by light/dark periods of
growth. During its haploid stage, C. reinhardtii can be genetically engineered
and single genotypes easily generated. Additionally, different phenotypes can be
obtained by crossing two haploid mutants of different mating types carrying
different genotypes. Conversely, single-mutant genotypes can be unveiled by
back-crossing mutants carrying multiple mutations with the wild-type strain of
the opposite mating type.
Global expression profiling of Chlamydomonas under conditions that produce
biofuels (H2 in this case) (78) has been reported using second-generation
microarrays with 10 000 genes of the over 15 000 genes predicted (77). Much of
the information that was reported involves fermentative metabolism. No
concerted effort to characterize up- and downregulation of genes associated with
lipid metabolism when Chlamydomonas is exposed to nutrient stress has yet
been reported. Nevertheless, N-deprived C. reinhardtii will over-accumulate
starch and lipids that can be used for formate, alcohol and biodiesel production
(78).
Procedures for metabolite profiling of C. reinhardtii CC-125 cells, which quickly
inactivate enzymatic activity, optimize extraction capacity, and are amenable to
large sample sizes, were reported recently (79). Sulfur-, nitrogen-, phosphate-
and iron-deprivation profiles were examined, and each metabolic profile was
different. Sulfur depletion leads to the anaerobic conditions required for
induction of the hydrogenase enzyme and H2 production (80). Rapidly sampled
cells (cell leakage controls were determined by 14C-labeling techniques) were
analyzed by gas chromatography coupled to time-of-flight mass spectrometry,
and more than 100 metabolites (e.g. amino acids, carbohydrates, phosphorylated
intermediates, nucleotides and organic acids) out of about 800 detected could be
identified. The concentrations of a number of phosphorylated glycolysis
intermediates increase significantly during sulfur stress (79), consistent with the
upregulation of many genes associated with starch degradation and fermentation
observed in anaerobic Chlamydomonas cells (78). Unfortunately, lipid
metabolism was not studied. Finally, researchers are starting to ask whether
Chlamydomonas and other green algae have the required metabolic pathways to
produce other energy-rich products such as butanol.

Chlamydomonas proteomics is in its infancy, but there have been a number of
relevant studies, as reviewed by (81). However, to our knowledge, no proteomics
research has yet been reported in algae under biofuel-producing conditions.
Genus species                 Sequencing status                  Reference
Green algae
Chlamydomonas reinhardtii     complete assembly release, v3.0    JGI genome project
Chlorella sp. NC64A           complete assembly release, v1.0    JGI genome project
Chlorella vulgaris            in progress                        GenBank
Ostreococcus lucimarinus      v2.0                               JGI genome project
Ostreococcus tauri            complete assembly release, v2.0    JGI genome project
Volvox carteri                v1.0                               JGI genome project
Scenedesmus obliquus          minimal                            GenBank
Dunaliella salina             minimal                            GenBank
Diatoms
Phaeodactylum tricornutum     v2.0                               JGI genome project
Thalassiosira pseudonana      v3.0                               JGI genome project
Red algae
Cyanidioschyzon merolae       complete                           Cyanidioschyzon merolae project
Brown algae
Aureococcus anophagefferens   v1.0                               JGI genome project
3.4 Acetyl-CoA carboxylase protein and genetic characterization
To characterize the first committed step in the fatty acid biosynthetic pathway,
several studies have sought to identify Acetyl-CoA carboxylase, but the enzyme is
known in only a few species.
The gene that encodes Acetyl-CoA carboxylase in Cyclotella cryptica has been
isolated and cloned [82]. The gene was shown to encode a polypeptide
composed of 2089 amino acids, with a molecular mass of 230 kDa. The deduced
amino acid sequence exhibited strong similarity to the sequences of animal and
yeast Acetyl-CoA carboxylases in the biotin carboxylase and carboxyltransferase
domains. Less sequence similarity was observed in the biotin carboxyl carrier
protein domain, although the highly conserved Met-Lys-Met sequence of the
biotin binding site was present. The N-terminus of the predicted Acetyl-CoA
carboxylase sequence has characteristics of a signal sequence, indicating that the
enzyme may be imported into chloroplasts via the endoplasmic reticulum. Also
the protein has been purified and kinetically characterized from two unicellular
algae, the diatom Cyclotella cryptica [82] and the prymnesiophyte Isochrysis
galbana [83]. Native Acetyl-CoA carboxylase isolated from Cyclotella cryptica
has a molecular mass of approximately 740 kDa and appears to be composed of
four identical biotin-containing subunits. The molecular mass of the native
Acetyl-CoA carboxylase from I. galbana was estimated at 700 kDa. This
suggests that Acetyl-CoA carboxylases from algae and the majority of Acetyl-CoA
carboxylases from higher plants are similar in that they are composed of
multiple identical subunits, each of which is a multi-functional peptide
containing domains responsible for both biotin carboxylation and subsequent
carboxyl transfer to acetyl-CoA [83].
Changes in the activities of various lipid and carbohydrate
biosynthetic enzymes in the diatom Cyclotella cryptica in response to silicon
deficiency have also been investigated. The activity of Acetyl-CoA carboxylase increased approximately
two- and fourfold after 4 and 15 h of silicon-deficient growth, respectively,
suggesting that the higher enzymatic activity may partially result from a covalent
modification of the enzyme. As the increase in enzymatic activity can be
blocked by the addition of protein synthesis inhibitors, it was suggested that the
enhanced Acetyl-CoA carboxylase activity could also be the result of an increase
in the rate of enzyme synthesis[82].
No further experimental results are available on the sequence of this gene, but
some sequences have been predicted from the known algal genomes by comparison
with other species.
4 Methods
4.1 Genetic source
The purpose of this thesis is to analyse the ACCase gene and the amount of
fatty acids in algae. Thus the main sources of information are the
amount of fatty acids (TAGs), the predicted genes, and related information on them.
As mentioned above, only one gene, from Cyclotella cryptica, has been
experimentally characterized. To retrieve most of this information we
used different databases, principally AlgaeBase and the Joint Genome Institute (JGI)
databases, although GenBank and EMBL-Bank were also used. A brief
description of the less well-known databases is given below.
AlgaeBase is a database of information on algae that includes terrestrial, marine
and freshwater organisms. At present, the data for the marine algae, particularly
seaweeds, are the most complete and also include sea-grasses. Unfortunately it
is also a work in progress and much of the data is incomplete. This database was
used particularly to collect information about the quantity of fatty acids and
(TAGs) from previously published articles.
JGI maintains a comprehensive database of the genomes of eukaryotic species and
has the most complete genomic data for algal strains. The data are organized following
a scheme in which the main object is the genome; the genome is organized into
portions of genomic sequence reconstructed from end sequences, called
scaffolds, which are composed of contigs and gaps. One chromosome may be
represented by many scaffolds, depending on the extent of the genome
information. The database holds a list of predicted genes that can be retrieved
with the KOG (euKaryotic Orthologous Groups) browser. Using the KOG tool for a
JGI-sequenced organism provides a way to find predicted genes by functional
classification or identification number, together with the protein sequence and related
information. The genes are referred to using internal and external cross-referenced annotations.
The data were retrieved by querying the KOG browser by functional class
and by following the cross-reference annotations to NCBI GenBank; the results were
stored locally. More details are given in table 2.
Genus species               Alpha        Beta         Biotin       Lipids      Source
Chlamydomonas reinhardtii   ID:5722616   ID:5728708   ID:5727859   18-24% DW   NCBI
Chlorella sp. NC64A         ID:36222     -----        -----        29% DW      JGI
Chorella Protothecoides
-----
-----
15-55% DW
Xu et al. 2006
Volvox Carteri
ID: 106840*
Scenedesmus obliquus
ID:EC187118**
Scenedesmus TR84
-----
-----
ID: 82311
-----
-----
GJI
-----
-----
-----
EBI
-----
-----
Locus :CR954208
Ostreococcus Tauri
Sheehan et al.1998
-----
NCBI GenBank
ID:4999505
-----
NCBI
-----
-----
EBI
Ostreococcus Lucimaris
-----
Dunaliella Salina
-----
Dunaliella tertiolecta      -----        -----        -----        36-42% DW   Kischimoto et al. 1994
Stichococcus                -----        -----        -----        33% DW      Sheehan et al. 1998
Ankistrodesmus TR84         -----        -----        -----        28-40% DW   Tornabene et al. 1998
Botryococcus braunii        -----        -----        -----        29-75% DW   Metzger et al. 2005
Parietochloris incisa       -----        -----        -----        35% DW      Roessler (1988)
Thalassiosira Pseudonana
-----
-----
21-31% DW
GJI
Cyclotella Criptica
21% DW
NCBI GenBank
Cyclotella Di-35
Nitzschia TR-114
Hantzschia DI 160
M Phaeodactylum Triconutum
-----
45% DW
ID:EF363909*
ID: 6770
Locus :L20784
-------------
-----
-----
-----
42% DW
Sheehan et al.1998
-----
-----
28-50% DW
Kyle DJ, et al. 1991
-----
-----
11-31% DW
Sheehan et al.1998
11-31% DW
GJI
-----
ID:55209
Table 2: Gene sequences and quantity of fatty acids (*cDNA sequence; **based on protein
alignment)
4.2 Sequence analysis
In order to obtain a data set as reliable as possible, different strategies were applied,
depending on the database employed. In most cases, the Basic Local Alignment
Search Tool (BLAST) was used to compare nucleotide or protein sequences and to
identify the hits of particular relevance. BLAST uses statistical theory to produce a bit-score and expect
value (E-value) for each alignment pair. The E-value gives an indication of the
statistical significance of a given pairwise alignment and reflects the size of the
database and the scoring system used. The lower the E-value, the more
significant the hit. A sequence alignment that has an E-value of 0.05 means that
this similarity has a 5 in 100 (1 in 20) chance of occurring by chance alone. A
strict E-value threshold (1e-80) was applied in order to keep only the most
reliable sequences. Following these strategies we identified predicted ACCase
genes, as well as some sequences with E-values above the
threshold. To further evaluate the meaningful sequences, ClustalW was run locally.
ClustalW performs a global multiple sequence alignment by the progressive
method, with the following steps: perform pairwise alignment of all sequences
by dynamic programming; use the alignment scores to produce a phylogenetic
tree by neighbour-joining; align the multiple sequences sequentially, guided by
the phylogenetic tree. Thus, the most closely related sequences are aligned first,
and then additional sequences and groups of sequences are added, guided by the
initial alignments. Each column of the resulting alignment shows the variation
among the sequences at that position.
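The E-value filtering step can be sketched in a few lines. The example below is a minimal illustration, not the procedure actually run for this thesis: it assumes BLAST results saved in the standard 12-column tabular layout (E-value in column 11, bit score in column 12), it reads the threshold as 1e-80, and the hit names and numbers are invented. The helper `expect_value` shows the Karlin-Altschul relation E = m * n * 2^(-S') between a bit score S' and the E-value for a search space of m by n residues.

```python
def expect_value(bit_score, query_length, database_length):
    """Karlin-Altschul relation: E = m * n * 2**(-S'), S' being the bit score."""
    return query_length * database_length * 2.0 ** (-bit_score)


def filter_hits(tabular_lines, max_evalue=1e-80):
    """Keep hits whose E-value is at or below the cutoff.

    Assumes the 12-column tabular layout: qseqid, sseqid, pident, length,
    mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    Returns (subject id, E-value, bit score) tuples.
    """
    kept = []
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.rstrip("\n").split("\t")
        evalue = float(fields[10])
        if evalue <= max_evalue:
            kept.append((fields[1], evalue, float(fields[11])))
    return kept


# Hypothetical hits, for illustration only.
hits = [
    "query\thypothetical_hit_A\t78.5\t2000\t430\t5\t1\t2000\t10\t2009\t0.0\t3100",
    "query\thypothetical_hit_B\t35.2\t150\t97\t3\t40\t189\t5\t154\t2e-12\t70.1",
]
reliable = filter_hits(hits)
print([name for name, _, _ in reliable])
```

With such a strict cutoff only near full-length, highly similar matches survive, which is the intent when the goal is to retain only confidently predicted ACCase sequences.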
Using ClustalW it was possible to discriminate between the sequences of the
diatom and green algae groups and to highlight the differences between
them in the ACCase sequences, in the different sub-sequences of the same gene
(called alpha, beta and biotin), in accordance with the predicted sequences
shown in table 2. It was also useful for discarding those sequences that showed
a high E-value and low similarity to the others.
The best matches were determined in successive stages, refining the alignments by
setting ClustalW parameters such as the slow/accurate alignment mode, the gap
costs, and the IUB DNA weight matrix.
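The first step of the progressive method described in this section, pairwise global alignment by dynamic programming, can be sketched as follows. This is a simplified illustration under stated assumptions: it uses a plain Needleman-Wunsch score with a linear gap cost and arbitrary match/mismatch values, not ClustalW's own scoring scheme or weight matrices, and it stops at selecting the most similar pair (the pair a guide tree would join first) rather than building the neighbour-joining tree.

```python
from itertools import combinations


def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch score of the best global alignment of a and b."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def closest_pair(sequences):
    """Indices of the two sequences with the highest pairwise score."""
    return max(
        combinations(range(len(sequences)), 2),
        key=lambda pair: global_alignment_score(sequences[pair[0]], sequences[pair[1]]),
    )


# Toy sequences, for illustration only.
toy = ["ACGT", "ACGA", "TTTT"]
print(closest_pair(toy))
```

In a full progressive alignment the pairwise scores would be converted to distances, the guide tree built from them, and the sequences merged in the order given by the tree.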
4.3 Computational methodologies
This study relied on bioinformatic tools to identify the promoter sequence of the gene. First,
the theoretical background is introduced, followed by the tools that were used.
Computational methods for the discovery of conserved transcription factor binding
sites (TFBSs) can be split
into two broad categories: the 'single species, many genes' approach and the
'single gene, many species' one [84]. In the first case, a set of regions (i.e.,
promoters) from co-regulated genes is analysed to look for over-represented
motifs, that is, the TFBSs responsible for the co-regulation of the genes. In
the second approach, known as phylogenetic footprinting, a single gene is
investigated and its flanking non-coding regions are compared to their homologs
in other species. Non-coding sequence elements that are found to be conserved
by evolution are likely to be involved in the regulation of the expression of the
gene.
Clearly, the two approaches can be merged: each set of co-regulated genes
can be compared both to its homologs and to each other, and this analysis can also be
performed on a full-genomic scale. Given that an ever-increasing number of
annotated genomic sequences is available, phylogenetic footprinting has
become more and more widely used, since it avoids the need to assemble a set of
co-regulated genes (which in turn requires building a reliable dataset) and
allows for the investigation of a single gene.
The most commonly used methods first build an alignment, either local or
global, of the sequences investigated, or take advantage of the pre-computed
whole-genome alignments now available. One simple solution is to identify
conserved functional elements by using descriptors of the binding specificity
of TFs, i.e. position-specific weight matrices (PWMs) such as those provided
in the TRANSFAC database, and to look for conserved aligned regions fitting
the descriptor. This approach can be used for the detection of single TFBSs.
Methods of this kind need reliable descriptors of the binding specificity of the
different TFs. PWMs usually yield a large number of false positive matches;
requiring a match to be conserved across different sequences reduces them, but
the problem of deciding whether a match is significant in all the species
considered remains. Secondly, and more importantly, a reliable alignment of the
sequences investigated is needed. TFBSs tend to be quite short (8-15
nucleotides) in comparison with a typically analysed region of 500-1000 bp;
when the aligned sequences are too divergent, conserved TFBSs may be missed
simply because they are not aligned correctly.
To overcome this, two tools were used: BioProspector [85] and GALF_P [86].
Both use a heuristic method and both search for TFBSs without any prior
knowledge; a brief description of each program is given in the following
paragraphs.
4.3.1 Bioprospector
BioProspector examines the upstream regions of genes in the same gene
expression pattern group and looks for conserved sequence motifs. It uses zero-
to third-order Markov background models, whose parameters are either given by
the user or estimated from a specified sequence file. The significance of each
motif found is judged on the basis of a motif score distribution.
It takes the following input parameters: a file in which the flanking regions
are stored, a file with the background distribution (which can be calculated
from the bases of the input file), and the width of the motif. At the end of a
BioProspector run the following results are obtained: the motif score, its
significance value and the number of significantly aligned segments; a regular
expression of the motif consensus and its degenerate form, as well as a
probability matrix for the motif; and the number of segments each input
sequence contributes to the motif, with the starting position and the sequence
of each segment.
Within each run of BioProspector, a process called the threshold sampler is
performed a number of times. The threshold sampler initialises a motif
probability matrix from a random alignment of the input sequences and improves
the matrix iteratively and stochastically.
4.3.1.1 Scoring segments with background Markov dependency
Every possible segment of width w within a randomly chosen sequence s (from the
input file) is considered. A score A_x = Q_x / P_x is computed and a new
alignment position is sampled with probability proportional to A_x. Here Q_x
and P_x are the probabilities of generating segment x from the current motif
matrix and from the independent background model, respectively. In DNA,
however, the presence of a particular nucleotide usually influences its
neighbouring positions, so a better way to evaluate P_x is to base it on a
Markov background model.
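The scoring step can be illustrated with a minimal Python sketch. It assumes a
first-order Markov background; the function names, the toy motif matrix and the
uniform background below are hypothetical and are not taken from BioProspector
itself.

```python
def motif_probability(segment, motif):
    """Q_x: probability of the segment under the motif matrix.

    motif is a list (one entry per motif position) of dicts that map
    each nucleotide to its probability at that position.
    """
    q = 1.0
    for i, base in enumerate(segment):
        q *= motif[i][base]
    return q


def background_probability(segment, initial, transition):
    """P_x under a first-order Markov background (an assumption here).

    initial[b] is the probability of starting with base b and
    transition[a][b] is the probability of b following a.
    """
    p = initial[segment[0]]
    for prev, cur in zip(segment, segment[1:]):
        p *= transition[prev][cur]
    return p


def segment_score(segment, motif, initial, transition):
    """A_x = Q_x / P_x for one candidate segment."""
    return motif_probability(segment, motif) / background_probability(
        segment, initial, transition)


# Toy example: a width-2 motif favouring "AC" against a uniform background.
BASES = "ACGT"
motif = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
initial = {b: 0.25 for b in BASES}
transition = {a: {b: 0.25 for b in BASES} for a in BASES}
score_ac = segment_score("AC", motif, initial, transition)
score_gt = segment_score("GT", motif, initial, transition)
```

A segment that fits the motif better than the background receives a score above
1 and is therefore more likely to be sampled as the new alignment position.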
4.3.1.2 Using motif score distribution to measure goodness of a motif
$$\text{Motif Score} = \#\text{seg} \times \exp\left\{ \frac{1}{w} \sum_{i} \sum_{j} q_{i,j} \log\frac{q_{i,j}}{p_j} \right\}$$
in which #seg is the number of aligned segments in the motif, w is the motif
width, the sums run over all positions i and all nucleotides j, q_{i,j} is the
probability of observing nucleotide j at position i of the motif matrix, and
p_j is the probability of observing nucleotide j under the background
probabilities.
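As a hedged illustration, the motif score defined above can be computed as in
the following Python sketch. The function name `motif_score` and the toy
matrices are assumptions made for the example; a zero-order (independent)
background is assumed for p_j.

```python
import math


def motif_score(matrix, background, n_segments):
    """Motif score = #seg * exp(sum_i sum_j q_ij * log(q_ij / p_j) / w)."""
    w = len(matrix)
    total = 0.0
    for column in matrix:
        for base, q in column.items():
            if q > 0:  # a zero frequency contributes nothing to the sum
                total += q * math.log(q / background[base])
    return n_segments * math.exp(total / w)


uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
# A perfectly conserved width-3 motif and a motif identical to the background.
conserved = [{"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0}] * 3
flat = [dict(uniform)] * 3
score_conserved = motif_score(conserved, uniform, 5)
score_flat = motif_score(flat, uniform, 5)
```

With a uniform background a perfectly conserved motif multiplies the number of
segments by 4, whereas a motif indistinguishable from the background leaves it
unchanged.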
To see how significant an observed motif score is, the program generates M
independently and identically distributed sequence sets under the input sequence
probability model, where each generated set is identical to the input file in
number and length of sequences. For each generated sequence set, a number of
threshold sampler runs are performed and the highest motif score is recorded. A
normal distribution is then fitted to the M recorded scores. With this score
distribution, BioProspector runs the original sequences through the threshold
sampler and reports the motifs whose scores lie a given number (5 by default)
of standard deviations above the mean of the motif score distribution.
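The significance test described above amounts to a threshold of the form
mean + k x standard deviation. The Python sketch below shows that step only; the
list `null_scores` stands in for the M scores recorded on the generated
sequence sets and its values are invented for illustration, as are the function
names. The sample standard deviation is used here as one possible way of
fitting the normal distribution.

```python
import statistics


def significance_threshold(null_scores, n_sd=5.0):
    """Score that lies n_sd standard deviations above the null mean."""
    mean = statistics.fmean(null_scores)
    sd = statistics.stdev(null_scores)  # sample standard deviation
    return mean + n_sd * sd


def is_significant(score, null_scores, n_sd=5.0):
    """True if the observed motif score exceeds the threshold."""
    return score > significance_threshold(null_scores, n_sd)


# Hypothetical best scores from M = 5 simulated sequence sets.
null_scores = [1.00, 1.10, 0.90, 1.05, 0.95]
threshold = significance_threshold(null_scores)
```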
4.3.2 GALF_P
GALF_P is a genetic algorithm that combines both position-led and
consensus-led motif representations.
In the consensus-led approach an individual is encoded as a potential
consensus string over the alphabet (A, T, G, C). The position-led approach
represents an individual as an array of the possible starting positions of the
TFBS in each sequence. Both representations have advantages and disadvantages.
The individuals of the consensus-led method can be randomly generated or picked
at random from the subsequences of motif length among all the input sequences.
One disadvantage of consensus-led approaches is that evaluating a single
individual requires scanning all the sequences to align each one to the
consensus, which imposes an intensive computational load. Additionally, when
the consensus happens to be a shifted version of the true one, there is no easy
technique to correct it. Position-led approaches have more flexibility to move
around the search space than consensus-led ones, because any starting position
can be changed freely, which consequently changes the motif configuration; it
is also easy to shift the motif by changing all positions by the same small
step. However, this representation cannot provide a global view of the quality
of each TFBS position and cannot easily remove a small portion of unsuitable
positions.
The algorithm not only uses both representations to take advantage of each, but
also, to some extent, limits their drawbacks by implementing a filtering
operator, which employs the consensus-led representation and counters the
tendency of the position-led representation to accumulate false positives
within an individual. Consequently both efficiency and accuracy can be
achieved.
4.3.2.1 Representation
For an individual, the basic representation is the position-led one, which is an
array storing the starting position in each sequence. Starting from each
position in an individual, a subsequence of the motif width can be extracted;
this is called a motif instance. The consensus of an individual is represented
by a Position-specific Weight Matrix (PWM) generated from the motif instances.
Each cell in the PWM holds the normalised frequency of a nucleotide at a
particular position of the motif instances.
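A minimal Python sketch of this representation is given here, under the
assumption that positions are 0-based. The helper names `motif_instances` and
`build_pwm` and the three toy sequences are hypothetical and do not come from
the GALF_P implementation.

```python
def motif_instances(sequences, positions, width):
    """Extract one motif instance per sequence from a position-led individual."""
    return [seq[start:start + width]
            for seq, start in zip(sequences, positions)]


def build_pwm(instances, alphabet="ACGT"):
    """PWM as a list of columns; each column maps a base to its frequency."""
    n = len(instances)
    width = len(instances[0])
    pwm = []
    for i in range(width):
        column = [instance[i] for instance in instances]
        pwm.append({base: column.count(base) / n for base in alphabet})
    return pwm


sequences = ["TTACGTA", "ACGGCTA", "GGACTTT"]
individual = [2, 0, 2]  # one starting position per sequence
instances = motif_instances(sequences, individual, 3)
pwm = build_pwm(instances)
```

The individual itself stays a plain array of integers; the PWM is derived from
it whenever the consensus view is needed.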
4.3.2.2 Fitness Evaluation
The fitness function adopted for each individual is its information content.
For each position i in the extracted motif instances, the positional
information content is:
$$IC_i = \sum_{b} f_b \log \frac{f_b}{p_b}$$
where f_b is the observed frequency of nucleotide b in the column and p_b is the
background frequency of the same nucleotide. The summation is taken over the
four possible types of nucleotides (b ∈ {A, T, C, G}). The fitness is the sum
of the positional information content, which has the following form:
$$\text{Fitness} = \sum_{i=1}^{W} IC_i$$
where W is the motif width.

Though known regulatory motifs do not always have the highest information
content at every base position, the sum of the positional information content
is still a good measure of overall conservation since, for the moment, no
completely satisfactory measurement exists.
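The fitness can be sketched in Python as follows, assuming the usual definition
of positional information content, IC_i = sum over b of f_b log(f_b / p_b),
summed over the W motif positions. The base-2 logarithm, the function names and
the toy PWM are assumptions made for this example.

```python
import math


def positional_information(column, background):
    """Information content of one PWM column (base-2 logarithm assumed)."""
    return sum(f * math.log2(f / background[base])
               for base, f in column.items() if f > 0)


def fitness(pwm, background):
    """Sum of the positional information content over all motif positions."""
    return sum(positional_information(column, background) for column in pwm)


background = {base: 0.25 for base in "ACGT"}
pwm = [
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},  # fully conserved: 2 bits
    {"A": 0.0, "C": 1.0, "G": 0.0, "T": 0.0},  # fully conserved: 2 bits
    {"A": 0.0, "C": 0.0, "G": 0.5, "T": 0.5},  # two bases equally likely: 1 bit
]
total_information = fitness(pwm, background)
```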
As for the consensus representation, a similarity score is used to pick out the
false positive motif instances in a position-led individual. The instance
similarity is calculated as the sum of the scores of each corresponding letter
in the PWM of the consensus:
$$\text{Sim} = \sum_{i=1}^{W} PWM(b_i, i)$$
where b_i is the nucleotide at position i of the motif instance, and
PWM(b_i, i) is the score of b_i at position i in the PWM.
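The instance similarity is simple enough to sketch directly. In the Python
example below the PWM is a list of per-position dictionaries; this layout, the
function names and the toy data are assumptions for illustration only.

```python
def similarity(instance, pwm):
    """Sum of PWM(b_i, i) over the positions of one motif instance."""
    return sum(pwm[i][base] for i, base in enumerate(instance))


def rank_instances(instances, pwm):
    """Indices of the instances sorted from lowest to highest similarity."""
    return sorted(range(len(instances)),
                  key=lambda k: similarity(instances[k], pwm))


pwm = [
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.0, "C": 1.0, "G": 0.0, "T": 0.0},
    {"A": 0.0, "C": 0.0, "G": 0.5, "T": 0.5},
]
instances = ["ACG", "ACT", "TTT"]
scores = [similarity(instance, pwm) for instance in instances]
ranking = rank_instances(instances, pwm)
```

The instance at the front of the ranking is the most likely false positive.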
4.3.2.3 Selection and Genetic Operators
Binary tournament is employed for parent selection. In particular, when
choosing a parent, two individuals are picked at random and the one with the
higher fitness is chosen. The purpose is to maintain an appropriate selection
pressure under which some of the currently unfit individuals still have a
chance to reproduce, which may yield robust offspring in later generations. For
reproduction, single-point mutation of a single parent and single-point
crossover of two parents are applied. Mutation and crossover are performed with
a total probability of 1. When mutation is chosen, one of the positions of the
single parent is shifted randomly. When crossover is applied, a crossover
point is chosen at random from [1, SeqNum - 1], where SeqNum is the number
of sequences. The segments of the two parents after the crossover point are
then swapped, yielding two children, one of which is chosen at random as the
offspring. The offspring generated by reproduction increase the population by a
half before replacement.
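The selection and reproduction steps can be sketched in Python as follows. This
is an illustration under stated assumptions, not the GALF_P code: the function
names are hypothetical, mutation is modelled by redrawing one starting position
uniformly among the valid starts, and the split between mutation and crossover
(`p_mutation`) is an invented parameter whose two branches together have
probability 1.

```python
import random


def binary_tournament(population, fitness, rng):
    """Pick two distinct individuals at random and keep the fitter one."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b


def mutate(parent, max_starts, rng):
    """Single-point mutation: move one starting position at random."""
    child = list(parent)
    k = rng.randrange(len(child))
    child[k] = rng.randint(0, max_starts[k])  # inclusive bounds
    return child


def crossover(parent_a, parent_b, rng):
    """Single-point crossover; one of the two children is returned at random."""
    point = rng.randint(1, len(parent_a) - 1)  # point in [1, SeqNum - 1]
    child_1 = parent_a[:point] + parent_b[point:]
    child_2 = parent_b[:point] + parent_a[point:]
    return rng.choice([child_1, child_2])


def reproduce(population, fitness, max_starts, rng, p_mutation=0.5):
    """Produce one offspring by either mutation or crossover."""
    if rng.random() < p_mutation:
        return mutate(binary_tournament(population, fitness, rng),
                      max_starts, rng)
    return crossover(binary_tournament(population, fitness, rng),
                     binary_tournament(population, fitness, rng), rng)


rng = random.Random(0)  # fixed seed so that the sketch is reproducible
population = [[0, 0, 0], [1, 1, 1]]
winner = binary_tournament(population, sum, rng)
child = crossover(population[0], population[1], rng)
mutant = mutate(population[0], [4, 4, 4], rng)
offspring = reproduce(population, sum, [4, 4, 4], rng)
```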
4.3.2.4 Local filtering Operator
One of the characteristic operators in GALF is the local filtering operator,
which filters out the false positives in a position-led individual according to
the similarity of the motif instances to the consensus represented by the PWM.
Firstly, the motif instances within an individual are ranked by their
similarity scores to the consensus. Secondly, the sequence containing the
instance with the lowest similarity score is scanned. Among all the possible
starting sites in that sequence, the one giving the best similarity to the
consensus is chosen. If the rank does not change, this best instance is not, in
fact, better than the instance originally ranked just above it (which comes
from another sequence) in terms of similarity score, and the local filtering
stops. Otherwise the instance originally ranked just above becomes the worst,
and the sequence containing it is selected and scanned in the same way. This is
iterated until the rank does not change or the sequence containing the original
second-best instance has been scanned. Note that the PWM is not updated until
the local filtering has finished, for two reasons: one is to reduce the
computational load compared with an on-line update, and the other is to avoid
being too greedy. In order to preserve the contribution of the evolutionary
process, the filtering operator is triggered only once every certain number of
generations and is applied only to the newly generated individuals.
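One possible reading of the local filtering procedure is sketched below in
Python. It is an interpretation of the description above rather than the
GALF_P implementation: the function names are hypothetical, the PWM is a list
of per-position dictionaries, and the PWM is deliberately kept fixed while the
positions are refined.

```python
def similarity(instance, pwm):
    """Sum of the PWM scores of the letters of one motif instance."""
    return sum(pwm[i][base] for i, base in enumerate(instance))


def best_start(sequence, pwm, width):
    """Starting site in the sequence whose instance best fits the PWM."""
    starts = range(len(sequence) - width + 1)
    return max(starts, key=lambda s: similarity(sequence[s:s + width], pwm))


def local_filter(sequences, positions, pwm, width):
    """Rescan the worst instances in turn, keeping the PWM unchanged."""
    positions = list(positions)

    def score(k):
        start = positions[k]
        return similarity(sequences[k][start:start + width], pwm)

    # Ranking computed once, worst instance first.
    order = sorted(range(len(sequences)), key=score)
    # The sequence holding the original best instance is never scanned.
    for rank, k in enumerate(order[:-1]):
        positions[k] = best_start(sequences[k], pwm, width)
        # Stop when the rescanned instance is still not better than the
        # instance originally ranked just above it.
        if score(k) <= score(order[rank + 1]):
            break
    return positions


pwm = [
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.0, "C": 1.0, "G": 0.0, "T": 0.0},
    {"A": 0.0, "C": 0.0, "G": 1.0, "T": 0.0},
]
sequences = ["TTACG", "ACGTT", "TACGT"]
filtered = local_filter(sequences, [0, 0, 0], pwm, 3)
```

In this toy case the two misplaced instances are moved onto the "ACG" site of
their sequences, while the instance that already matched is left alone.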
4.3.2.5 Replacement Strategy
Replacement is applied to keep the population size constant after the increase
in individuals during reproduction. Before replacement, all duplicate
individuals are removed to avoid too fast a take-over rate. This is done by
assigning an arbitrarily low fitness to the duplicates. Each individual competes
with K randomly chosen other individuals and scores a win each time its fitness
is higher than that of its competitor. K is user-defined and was fixed at 10.
The number of wins of each individual is recorded and ranked; when two
different individuals tie in the number of wins, they are re-ranked by their
fitness. Individuals whose final ranking is beyond the desired population size
are eliminated.
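The replacement strategy can be sketched in Python as follows. The function
name `replace_population` and its signature are assumptions; duplicates are
given a fitness of minus infinity as one way of assigning an "arbitrarily low"
value.

```python
import random


def replace_population(population, fitness, size, k=10, rng=random):
    """Keep the `size` individuals with most wins, ties broken by fitness."""
    seen = set()
    scored = []
    for individual in population:
        key = tuple(individual)
        # Duplicates receive an arbitrarily low fitness so that they lose.
        value = float("-inf") if key in seen else fitness(individual)
        seen.add(key)
        scored.append((individual, value))

    wins = []
    for idx, (_, value) in enumerate(scored):
        others = scored[:idx] + scored[idx + 1:]
        rivals = rng.sample(others, min(k, len(others)))
        wins.append(sum(1 for _, rival_value in rivals if value > rival_value))

    ranked = sorted(range(len(scored)),
                    key=lambda i: (wins[i], scored[i][1]), reverse=True)
    return [scored[i][0] for i in ranked[:size]]


population = [[1, 2], [1, 2], [3, 4], [5, 6]]
# With k = 3 every individual meets all the others, so the outcome is fixed.
survivors = replace_population(population, sum, size=2, k=3)
```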
4.3.2.6 Shift Operator
When the individual with the best fitness stagnates, meaning that it does not
change for a certain number of generations (the generations of stagnation), a
small number of shifts (all of the positions of the individual are moved in
either direction by the same number of bases) are tested for an improvement in
fitness. Based on the gain in fitness, the smallest shift with a positive gain
is chosen. If both directions of the same shift size achieve an improvement,
the direction with the better gain is chosen. This moderate shift operator
prevents a drastic shift, which might drag the solutions to local optima too
quickly, before convergence.
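A hedged Python sketch of the shift operator follows. The function name, the
`max_shift` limit of 3 bases and the toy fitness functions are assumptions made
for the example; the fitness is passed in as a callable so that the sketch
stays independent of how the PWM is built.

```python
def shift_individual(positions, fitness, max_starts, max_shift=3):
    """Return the smallest shift of all positions that improves the fitness."""
    base = fitness(positions)
    for step in range(1, max_shift + 1):
        candidates = []
        for direction in (-1, 1):
            shifted = [p + direction * step for p in positions]
            # Skip shifts that would move an instance off its sequence.
            if all(0 <= s <= m for s, m in zip(shifted, max_starts)):
                gain = fitness(shifted) - base
                if gain > 0:
                    candidates.append((gain, shifted))
        if candidates:
            # If both directions improve, keep the one with the better gain.
            return max(candidates, key=lambda c: c[0])[1]
    return list(positions)


def toy_fitness(positions):
    """Toy fitness that peaks when the first position equals 5."""
    return -abs(positions[0] - 5)


def lopsided_fitness(positions):
    """Toy fitness that rewards a move to the right more than to the left."""
    distance = abs(positions[0] - 5)
    return distance * 2 if positions[0] > 5 else distance


improved = shift_individual([3, 3], toy_fitness, [10, 10])
unchanged = shift_individual([5, 5], toy_fitness, [10, 10])
both_ways = shift_individual([5, 5], lopsided_fitness, [10, 10])
```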
4.3.3 Implementation
Both programs were run on the same dataset, consisting of the upstream region
of the alpha fragment of Chlamydomonas reinhardtii, Volvox carteri and
Chlorella sp. NC64A. Only these sequences could be used, as no other sequences
had their upstream parts available.
For GALF_P the parameters were kept as the author suggested: the population is
initialised by randomly generating the start position in each sequence, the
population size was set to 500 and the number of offspring to 250. For
BioProspector, a background sequence from the whole genome of Chlamydomonas
reinhardtii was supplied and 40 reinitialisations, an option intended to avoid
local maxima, were set. Each program was run 5 times with different motif
widths (8, 10, 12, 14 and 16) to bring out the shared motifs. Only 3 shared
motifs were found; these occurred several times and showed a high motif score
or fitness evaluation, as shown in Table 3.
Sequences        BioProspector + Bg    GALF_P
TGTTTT(N)C       Motif score: 1.559    Fitness: 3.344442
AA(N)CCTGCA(N)   Motif score: 1.446    Fitness: 2.900421
AATC(N)TGC(N)C   Motif score: 1.547    Fitness: 3.004174
Table 3: shared motifs
4.4 Primer design
Once the sequences were found, primers were designed to characterise the ACCase
in Scenedesmus protuberans. To this end, the first step consisted in using the
sequences of homologous strains that were phylogenetically as close as
possible, in order to find highly conserved sub-sequences. Although few
sequences were available for this purpose, a Python script was written that
uses regular expressions to compute potential primer sequences. The script
takes as input a ClustalW multiple alignment file. In this way a list of
potential primers was obtained from the stretches of the alignment without
mismatches, extending each stretch until it reached a threshold on the
annealing temperature given by the formula Tm = 2(A + T) + 4(G + C). The GC
percentage, the presence of stop or start signals and the possible matches
between the primer sequences were also checked. Extension over a mismatch was
allowed if the sequence did not reach the temperature threshold. An example of
the output is shown in Table 4.
Primer design
Direction      Hypothetical primer    Length  %GC    Tm (°C)  Conserved bp  Gap  Fragment  Position  Alignment with
Forward 5'-3'  GTATCGGCAGCATCAATGG    19      52.63  58       15            3    4002      772       Cyclotella cryptica
Reverse 3'-5'  GTCACTATTGATGTCTCATCT  21      38.09  58       -             1    4002      4777
Forward 5'-3'  ACGAGCGTTGAGCTCAACG    19      57.89  60       -             3    4279      382
Reverse 3'-5'  CTTCTTGTCTTCGTACGGTT   20      45     58       -             1    4279      4661
Forward 5'-3'  GTATCGGCAGCATCAATGG    19      52.63  58       -             5    3889      772
Reverse 3'-5'  CTTCTTGTCTTCGTACGGTT   20      45     58       -             1    3889      4661
Forward 5'-3'  GTGCGTTGCCACGAGCGT     18      66.66  60       13            0    3920      372       O. tauri
Reverse 3'-5'  CGCGAACCGACCCGCG       16      81.25  58       11            1    3920      4292
Forward 5'-3'  ATCGGCAGCATCAATGGCA    19      52.63  58       13            2    2950      774
Reverse 3'-5'  CCCACCGTGCGCGCCA       16      81.25  58       12            1    2950      3724
Forward 5'-3'  TAGCGATAATGAGAGCAGGG   20      50     60       12            1    2442      853
Reverse 3'-5'  GCCCTCATTCACTTGCGTG    19      57.89  60       13            0    2442      3295
Forward 5'-3'  TCGCCGCAACTTCGGCAT     18      61.11  58       12            0    1796      1391
Reverse 3'-5'  GCTCCTCCTCTAGTTCGAC    19      57.89  60       12            0    1796      3187
Forward 5'-3'  TAACCCACTTGAGGAGCAC    19      52.63  58       12            0    4794      24
Reverse 3'-5'  CCAGGTTCCGACACATGC     18      61.11  58       11            1    4794      4821      O. tauri 2
Forward 5'-3'  CGCGTGCGTGGCCGCG       16      87.5   60       11            1    4441      243       O. tauri 2
Reverse 3'-5'  CCAACGTCGTACCACGCG     18      66.66  60       14            0    4441      4684
Forward 5'-3'  TAAGCGCATCAAGGAGGTG    19      52.63  58       13            2    4229      356       O. tauri 2
Reverse 3'-5'  CACTGTGCTGACAGCTGTT    19      52.63  58       14            1    4229      4585
Forward 5'-3'  ATGCTCCCTGTGGGCACA     18      61.11  58       12            1    4152      402       O. tauri 2
Reverse 3'-5'  TGGGTGCTGGGCGGGC       16      81.25  58       11            0    4152      4554
Forward 5'-3'  TGACGAAGACTCAGATTGTAT  21      38.09  58       14            1    3911      558       O. tauri 2
Reverse 3'-5'  CGACCGTCCCACGGCTC      17      76.47  60       13            1    3911      4469
Forward 5'-3'  GCGTCTGCAAGTCGCTCG     18      66.66  60       14            0    3549      647       O. tauri 2
Reverse 3'-5'  CCGCCTCGTTATTGCTTAC    19      52.63  58       13            2    3549      4196
Forward 5'-3'  TTTCGCGCCAAAAGCTATCT   20      45     58       13            1    2776      1170      O. tauri 2
Reverse 3'-5'  GGGGGCGGTCGTCGTTG      17      76.47  60       13            1    2776      3945
Forward 5'-3'  CCCACTTGAGGAGCACCT     18      61.11  58       12            0    4697      27        O. tauri 3
Reverse 3'-5'  GTAACGGGTTCTCGTCTATG   20      50     60       13            0    4697      4724
Forward 5'-3'  CACCACGACGCTTGAGTTT    19      52.63  58       14            2    4008      314
Reverse 3'-5'  GCGAAAGTACCGACCGCG     18      66.66  60       12            2    4008      4322
Forward 5'-3'  TGGATCTGGAGCAATTGGC    19      52.63  58       15            2    3615      536
Reverse 3'-5'  GGCCGAGCGGTCGCGG       16      87.5   60       12            1    3615      4151
Forward 5'-3'  AGCACCAGCTGGCGGGA      17      70.58  58       13            2    3030      930
Reverse 3'-5'  TTGCCGCGGCAGCAGTTG     18      66.66  60       12            1    3030      3960
Start/stop (column values in their original order): ATG(16), stop*, TGA(9),
ATG(14), ATG(16), ATG(14), N. start, N.stop, ATG(14), N.stop, start/ stop*,
TGA(11);TAA(7), N. start, N.stop, TGA(10);TAA(1), N.stop, N. start, ATG(10),
TAA(1), TGA(2), ATG(0), N.stop, TGA(1), N. start, ATG(17);TAA(11), N. start,
N.stop, TGA(6), N.stop, TGA(13), ATG(8), N. start, N.stop, N. start, N.stop
Table 4: example of output
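The core of the primer search can be sketched in Python as follows. This is not
the original script: it is a minimal illustration that assumes the alignment
has already been read into equal-length strings, marks conserved columns with
'*' in the manner of the ClustalW consensus line, finds runs of them with a
regular expression and applies the formula Tm = 2(A + T) + 4(G + C). All
function names and the two-sequence toy alignment are hypothetical, and the
extension over mismatches is not implemented.

```python
import re


def wallace_tm(primer):
    """Tm = 2(A + T) + 4(G + C), the formula used in the text."""
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def gc_percent(primer):
    """Percentage of G and C bases in the primer."""
    gc = primer.count("G") + primer.count("C")
    return 100.0 * gc / len(primer)


def conserved_stretches(aligned, min_length):
    """(start, sequence) of each run of identical, gap-free columns."""
    marks = "".join(
        "*" if len(set(column)) == 1 and column[0] != "-" else " "
        for column in zip(*aligned))
    pattern = r"\*{%d,}" % min_length
    return [(m.start(), aligned[0][m.start():m.end()])
            for m in re.finditer(pattern, marks)]


def candidate_primers(aligned, min_length, tm_threshold):
    """Conserved stretches whose Tm reaches the threshold."""
    return [(start, seq, wallace_tm(seq), round(gc_percent(seq), 2))
            for start, seq in conserved_stretches(aligned, min_length)
            if wallace_tm(seq) >= tm_threshold]


# Toy alignment built around the first primer of Table 4.
aligned = [
    "AAGTATCGGCAGCATCAATGGTC",
    "ATGTATCGGCAGCATCAATGGAC",
]
primers = candidate_primers(aligned, min_length=15, tm_threshold=58)
```

For the first primer of Table 4 the sketch reproduces the tabulated length
(19), GC percentage (52.63) and melting temperature (58).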
Unfortunately these alignments did not yield reliable primers, because they
were not specific enough to design a primer from them. Protein sequences were
therefore tried, in order to find a good-quality alignment and retrieve a cDNA
sequence from it; however, this strategy did not give good results either. The
last step was to scan the EST database using BLAST, which returned some
sequences, one of which was from Scenedesmus obliquus (see Table 2 and the
alignment).
From these alignments it was possible to design both perfectly matching and
degenerate primer sequences, as shown below:
Scene Forward        5' ACCTGCCTGGACATCATCCTNAACATCAC 3'  Tm = 64  GC = 51.7  no dimer
Scene Reverse        5' TACCGGAGCGGGACCGGGTCGA 3'         Tm = 68  GC = ~72   no dimer
reinhardtii Forward  5' ATCGGCCACCAGAAGGGC 3'             Tm = 62  GC = 66.6  no dimer
reinhardtii Reverse  5' CGTGGCGCATGAAGCGCA 3'             Tm = 60  GC = 66.6  no dimer
4.5 Experimental procedure

The organism Scenedesmus protuberans was obtained from Ege Biotecnology. To
grow the algae, two culture media were prepared based on the Provasoli and
Guillard (f/2) recipe, which has been used with a range of strains
(Chlamydomonas moewusii, Cyclotella nana and Detonula confervacea). The recipe
and its components are given in the table below.

Provasoli and Guillard recipe
Quantity    Compound                Molar concentration in final medium
500 mL      Filtered seawater
75 mg       NaNO3                   8.83 x 10^-4 M
5 mg        NaH2PO4 . H2O           3.63 x 10^-5 M
1 mL        Trace metal solution    ---
0.6 mL      Vitamin solution        ---

Trace metal solution (50 mL)
Quantity    Compound                Molar concentration in final medium
1575 mg     FeCl3 . 6 H2O           1 x 10^-5 M
2180 mg     Na2EDTA . 2 H2O         1 x 10^-5 M
0.49 mg     CuSO4 . 5 H2O           4 x 10^-8 M
0.315 mg    Na2MoO4 . 2 H2O         3 x 10^-8 M
1.1 mg      ZnSO4 . 7 H2O           8 x 10^-8 M
0.5 mg      CoCl2 . 6 H2O           5 x 10^-8 M
5 mg        MnCl2 . 4 H2O           9 x 10^-9 M

Vitamin solution
Quantity    Compound                Molar concentration in final medium
2.5 µl      Biotin                  2 x 10^-9 M
The medium was autoclaved and stored in a refrigerator. Na2SiO3 . 9 H2O was
omitted, as the author suggested, to avoid the formation of a silica
precipitate. The second medium was prepared in the same way, with glycerol
added.
Unfortunately the culture from Ege Biotecnology was contaminated, with the
result that the algae grew too slowly. To eliminate this unknown contamination
the antibiotic spectinomycin was added, but this was not successful. A DNA
extraction with a Qiagen kit was attempted from the culture but, as expected,
no algal DNA was obtained, as shown in the photo of the electrophoresis
analysis.
To carry out the experiment a new sample was ordered from EBILTEM (Ege
University). This time the algae were already grown, so it was possible to
extract the DNA directly by a standard phenol-chloroform extraction, obtaining
a large amount of pure DNA from each sample (ng/ul: sample 1, 163.28; sample 2,
158.49; sample 3, 214.47), as shown in the photo of the electrophoresis gel.
Using this sample, two different PCR analyses were carried out in order to
identify the sequence.
The first PCR analysis was conducted with the following thermal cycle: initial
denaturation at 94 °C for 2 min; denaturation at 94 °C for 20 sec, annealing at
61.5 °C for 10 sec and extension at 72 °C for 15 sec, repeated for 30 cycles;
and a final extension at 72 °C for 5 min. These settings were chosen according
to the primer temperature and the length of the expected product.
The second analysis was carried out as a gradient PCR with a different reverse
primer (Rcre), which made it possible to optimise the annealing temperature and
avoid unspecific products. The reaction was conducted with 8 samples at
different temperatures, calculated around the average annealing temperature of
58.5 °C, as shown in the picture.
The next analysis checked whether digestion with a restriction enzyme gave the
expected fragments, after gel purification of the 3 upper bands from the last
gradient PCR. SphI was used, given the enzymes currently available in the
laboratory and the restriction map, which shows just one SphI site in the
sequence. Unfortunately no negative control was possible due to the lack of
enzyme.
5 Results and future perspectives

Biodiesel produced from sea algae would create an alternative fuel source
without the need to displace any land currently used for food production; it
would also create many new jobs in the algaculture industry. For these reasons
algae were chosen as the biological system for this project. Because algae have
not yet been fully characterised by research, the project's first concern was
to create a database of the available information. The available data were then
used in an attempt to identify Transcription Factor Binding Sites (TFBSs).
These short sequences, usually 8-15 bp long, lie in the cis-regulatory region
and regulate gene expression through their interaction with Transcription
Factors (TFs). They are usually found in the 100-300 bp region upstream of the
transcriptional start site of the gene. In higher eukaryotes they can be found
upstream, downstream or in the introns of the genes that they regulate;
furthermore, they can be close to or far away from the regulated genes. TFBSs
are a crucial component of gene regulation, which affects the transcription
process and the final phenotype of an organism. Typically, when certain TFs
bind to the TFBS in the promoter region of the corresponding sequence, the
transcriptional process is signalled and initiated. On the other hand, when
other competing molecules interact with the binding site, the transcription
factor fails to bind and the transcriptional process is, in the worst case,
inhibited. The competing molecule, and hence the TFBS, can thus modulate the
transcriptional process, which in turn produces more or less mRNA. It was
therefore necessary to identify the TFBS sequences in order to decipher the
mechanism of gene regulation. On this basis, the hypothesis was made that algae
with a high fatty acid content would have stronger promoters for their ACCase
genes. Testing it requires identifying a conserved sequence that is common to
the strains producing a significant quantity of fatty acids and possibly
lacking in other strains. Unfortunately, owing to the limited genomic data and
the lack of quantitative data on the fatty acid content of algae currently
available, it was not possible to carry out this study. Instead, the upstream
part of the predicted homologous gene for the alpha fragment was used to search
for generic conserved sequences shared between strains. A bioinformatics
approach was used to explore this field: although biological experimental
methods such as DNA footprinting and gel electrophoresis would have been more
reliable and accurate in identifying TFBSs, they have the disadvantage of being
very expensive and time-consuming. Consequently two computational methods were
used to identify TFBSs from scratch.

TFBSs are regions that can be considered significantly conserved and hence
likely to possess a regulatory function. The simplest strategy is to single
out the most conserved parts of the alignments according to percentage
identity, since a highly conserved non-coding region can reasonably be
considered to have a functional role. However, one problem is knowing how much
conservation should be considered significant, given the rate of substitution
within the species. Moreover, sequence alignment methods provide only limited
insight into the problem because, although similar fragments may be aligned
together, the shared TFBS sequences are not necessarily proximal.
Various methodologies are available, but an exhaustive search is ruled out by
the exponential growth of the computation with increasing problem size. In this
study novel algorithms were used (as described in the computational sections)
in an attempt to find significant motifs in the selected dataset of ACCase
genes. Three shared motifs were found; these occurred several times with a high
motif score or fitness evaluation, as shown in Table 4.
However, it should be noted that the shared sequences were found only with
motif widths of 8 and 10, and that no shared fragments were found in the tests
with larger widths. Furthermore, there were some significant motifs not shared
by both programs; these showed a pattern of conserved regions between -719 and
-793 bp upstream of the coding region of Chlamydomonas reinhardtii, whilst in
the other strains this localisation was not found.
Sequences        Position     BioProspector + Bg    GALF_P
TGTTTT(N)C       -793 C.r.    Motif score: 1.559    Fitness: 3.344442
AA(N)CCTGCA(N)   -731 C.r.    Motif score: 1.446    Fitness: 2.900421
AATC(N)TGC(N)C   -969 C.r.    Motif score: 1.547    Fitness: 3.004174
Table 4: shared motifs; positions refer to Chlamydomonas reinhardtii
At this time it is not possible to draw any firm conclusions regarding the
TFBSs, but it would not be unreasonable to consider these motifs sound
candidates for further investigation.
In the experimental phase, the information collected allowed primers to be
designed in order to identify the gene in Scenedesmus protuberans, a green alga
already discussed in a previous study, which is able to store a large amount of
fatty acids in its cells. The degenerate primer sequences were determined using
ClustalW, by aligning the sequences of Scenedesmus obliquus and Chlamydomonas
reinhardtii in the alpha part. These primers made it possible to search for the
gene fragment in the strain in question, but the results of the analysis
remained ambiguous: unspecific PCR products were found, and even restriction
analysis did not make it possible to recognise the fragment that was searched
for. The identity of the gene fragments can be determined with some certainty
only by sequencing the fragments found. At present these results are not yet
available.
Bibliography
1: Agarwal A.K. Biodiesel development and characterization for use as biofuel J.Eng 2001
2: Durrett, T., Benning, C. and Ohlrogge, J, Plant triacylglycerols as feedstocks for the
production of biofuels., 2008
3: National Biodiesel Board, National Biodiesel Board, 2008
4: Exposy, exposy news 2007, 2007
5: Martinot, Eric, Renewables 2007 Global status Report, 2008
6: National Biodiesel Board, Statistic. the EU biodiesel industry, 2008
7: Biopower London, Biodiesel to drive up the price of the cooking oil, 2006
8: Herer, Jack, the Emperor Wears no Clothes, 1985
9: Klass, Donald ., Biomass for renewable Energy Fuels, and Chemicals.,
10: Kitani, Osamu, Energy and Biomass Engineering, 1999
11: Environmental news, Biofuel some numbers,
12: Purdue, Purdue report,
13: American fuel, Biodiesel Yield even Higher Energy Balance,
14: United States Department of Energy, Biodiesel just the basics, 2007
15: Australian Broadcasting Corporation, Biofuel demand makes fried food expensive in
Indonesia,
16: Lester Brown, How Food and Fuel Compete for land,
17: Falkowski, P.G. and Raven, J.A, Aquatic Photosynthesis, 1997
18: Van den Hoek, C., Mann, D.G. and Jahns, H.M., Algae: An Introduction to Phycology,
1995
19: Guschina and Harwood,, Lipids and lipid metabolism in eukaryotic algae, 2006
20: Thompson, G.A., Lipids and membrane function in green algae., 1996
21: Wada and Murata,, Membrane lipids in cyanobacteria. In Lipids in Photosynthesis:
Structure,, 1998
22: Ben-Amotz, A., Shaish, A. and Avron, M., Mode of action of the massively accumulated
β-carotene of Dunaliella bardawil in protecting the alga against damage by excess irradiation.,
23: Bigogno, C., Khozin-Goldberg, I., Boussiba, S., Vonshak, A. and Cohen, Z, Lipid and
fatty acid composition of the green oleaginous alga Parietochloris incisa, the richest plant
source of arachidonic aci, 1971
24: Metzger and Largeau, Botryococcus braunii: a rich source for hydrocarbons and related
ether lipids, 2005
25: Benemann, J.R., Pursoff, P. and Oswald, W.J, Microalgae as a Source of Liquid Fuels,
1982
26: Borowitzka, M, Fats, oils and hydrocarbons. In Microalgal Biotechnology , 1988
27: Hu, Q., Zhang, C.W. and Sommerfeld, M, Biodiesel from Algae: Lessons Learned Over
the Past 60 Years and Future Perspectives, 2006
28: Sheehan, J., Dunahay, T., Benemann, J. and Roessler, P.G., A Look Back at the US
Department of Energy's Aquatic Species Program Biodiesel from Algae, Close Out Report ,
1998
29: Ohlrogge and Browse, , 1995
30: Cobelas, M.A. and Lechado, J.Z., ,
31: Basova, M.M, Fatty acid composition of lipids in microalgae, 2005
32: Parker, P.L., van Baalen, C. and Maurer, L., , 1967
33: Bigogno, C., Khozin-Goldberg, I., Boussiba, S., Vonshak, A. and Cohen, Z, A.R, , 1971
34: De Swaaf, M.E., de Rijk, T.C., Eggink, G. and Sijtsma, L., Optimisation of
docosahexaenoic acid production in batch cultivation by Crypthecodinium cohnii.,

35: Knothe, G., , 2005
36: Baud, S., Wuillme, S., Dubreucq, B., de Almeida, A., Vuagnat, C., Lepiniec, L., Miquel,
M. and Rochat, C, Function of plastidial pyruvate kinases in seeds of Arabidopsis thaliana,
2007
37: Andre, C., Froehlich, J.E., Moll, M.R. and Benning, C., A heteromeric plastidic pyruvate
kinase complex involved in seed oil biosynthesis in Arabidopsis, 2007
38: Goodridge AG, Fatty acid synthesis in eucaryotes, 1985
39: von Meyenburg K, Jorgensen B, Deurs BV, Physiological and morphological effects of
overproduction of membrane-bound ATP synthase in Escherichia coli, 1984
40: Dudley J, Lambert RJ, de la Roche IA., Genetic analysis of crosses among corn strains
divergently selected for percent oil and protein, 1977
41: Eastwell KC, Stumpf PK, Regulation of plant acetyl-CoA carboxylase by adenylate
nucleotides, 1983
42: Nikolau B, Choi J-K, Guan X, Ke J, McKean AL, et al, , 1996
43: Jaworski JG, Post-Beittenmiller D, Ohlrogge JB. 1993, Acetyl-acyl carrier protein is not a
major intermediate in fatty acid biosynthesis in spinach, 1993
44: Soll J, Roughan G., Acyl-acyl carrier protein pool sizes during steady-state fatty acid
synthesis by isolated spinach chloroplasts, 189-192
45: Post-Beittenmiller D, Jaworski JG, Ohlrogge JB., In vivo pools of free and acylated acyl
carrier proteins in spinach: evidence for sites of regulation of fatty acid biosynthesis, 1991
46: Post-Beittenmiller D, Roughan G, Ohlrogge JB., Regulation of plant fatty acid
biosynthesis: analysis of acyl-CoA and acyl-acyl carrier protein substrate pools in spinach and
pea chloroplasts., 1992
47: Post-Beittenmiller D, Jaworski JG, Ohlrogge JB., Probing regulation of lipid
biosynthesis in oilseeds by the analysis of the in vivo acyl-ACP pools during seed
development. In Seed Oils for the Future,, 1993
49: Browse J, Roughan PG, Slack CR. , Light control of fatty acid synthesis and diurnal
fluctuations of fatty acid composition in leaves, 1981
50: Page RA, Okada S, Harwood JL., Acetyl-CoA carboxylase exerts strong flux control over
lipid synthesis in plants, 1994
51: Kascer H, Porteous JW., Control of metabolism: what do we have to measure?, 1987
52: Rasmussen JT, Rosendal J, Knudsen J., Interaction of acyl-CoA binding protein (ACBP)
on processes for which acyl-CoA is a substrate, product or inhibitor, 1993
53: Knudsen J, Faergeman NJ, Skott H, Hummel R, Borsting C, et al., Yeast acyl-CoA-binding protein: acyl-CoA-binding affinity and effect on intracellular acyl-CoA pool size,
1994
54: Hills MJ, Dann R, Lydiate D, Sharpe A, Molecular cloning of a cDNA from Brassica
napus L. for a homologue of acyl-CoA-binding protein, 1994
55: Roesler KR, Savage LJ, Shintani DK, Shorrosh BS, Ohlrogge JB., Co-purification,
co-immunoprecipitation, and coordinate expression of acetyl-coenzyme A carboxylase activity,
biotin carboxylase, and biotin carboxyl carrier protein of higher plants, 1996
57: Hu, Q., Environmental effects on cell composition. In Handbook of Microalgal Culture
(Richmond, A., ed.), 2004
58: Basova, M.M., Fatty acid composition of lipids in microalgae, 2005
59: Roessler, P.G., Changes in the activities of various lipid and carbohydrate biosynthetic
enzymes in the diatom Cyclotella cryptica in response to silicon deficiency, 1988
60: Khozin-Goldberg, I. and Cohen, Z., The effect of phosphate starvation on the lipid and fatty
acid composition of the fresh water eustigmatophyte Monodus subterraneus, 2006
61: Reitan, K.I., Rainuzzo, J.R. and Olsen, Y., 1994
62: Sato, N., Hagio, M., Wada, H. and Tsuzuki, M., Environmental effects on acidic lipids of
thylakoid membranes. In Recent Advances in the Biochemistry of Plant Lipids, 2000
63: Saha, S.K., Uma, L. and Subramanian, G., Nitrogen stress induced changes in the marine
cyanobacterium Oscillatoria willei BDU 130511, 2003
64: Renaud, S.M., Thinh, L.V., Lambrinidis, G. and Parry, D.L., Effect of temperature on
growth, chemical composition and fatty acid composition of tropical Australian microalgae
grown in batch cultures, 2002
65: Somerville, C., Direct tests of the role of membrane lipid composition in
low-temperature-induced photoinhibition and chilling sensitivity in plants and cyanobacteria, 1995
66: Aaronson, S, Effect of incubation temperature on the macromolecular and lipid content of
the phytoflagellate Ochromonas danica,
67: Boussiba, S., Vonshak, A., Cohen, Z., Avissar, Y. and Richmond, A., Lipid and biomass
production by the halotolerant microalga Nannochloropsis salina. Biomass, 1987
68: Patterson, G., Effect of temperature on fatty acid composition of Chlorella sorokiniana.
Lipids, ,
69: Falkowski, P.G. and Owens, Light-shade adaptation: two strategies in marine
phytoplankton, 1980
70: Khotimchenko, S.V. and Yakovleva, I.M., Lipid composition of the red alga Tichocarpus
crinitus exposed to different levels of photon irradiance., 2005
71: Fabregas, J., Maseda, A., Dominguez, A. and Otero, A., The cell composition of
Nannochloropsis sp. changes under different irradiances in semicontinuous culture, 2004
72: Sukenik, A., Yamaguchi, Y. and Livne, A., Alterations in lipid molecular species of the
marine eustigmatophyte Nannochloropsis sp. J., 1993
73: Tonon, T., Larson, T.R. and Graham, I.A., Long chain polyunsaturated fatty acid
production and partitioning to triacylglycerols in four microalgae, 2002
74: Alonso, D.L., Belarbi, E.H., Fernandez-Sevilla, J.M., Rodriguez-Ruiz, J. and Grima, E.M.,
Acyl lipid composition variation related to culture age and nitrogen concentration in
continuous culture of the microalga Phaeodactylum tricornutum, 2000
75: Liang, Y., Beardall, J. and Heraud, P., Changes in growth, chlorophyll fluorescence and
fatty acid composition with culture age in batch cultures of Phaeodactylum tricornutum and
Chaetoceros muelleri (Bacillariophyceae). Bot. Mar., 2006
76: Zhekisheva, M., Boussiba, S., Khozin-Goldberg, I., Zarka, A. and Cohen, Z.,
Accumulation of oleic acid in Haematococcus pluvialis (Chlorophyceae) under nitrogen
starvation or high light is correlated with that of astaxanthin esters, 2002
77: Merchant, S.S., Prochnik, S.E., Vallon, O. et al., The Chlamydomonas genome reveals
the evolution of key animal and plant functions, 2007
78: Mus, F., Dubini, A., Seibert, M., Posewitz, M.C. and Grossman, A.R., Anaerobic
adaptation in Chlamydomonas reinhardtii: anoxic gene expression, hydrogenase induction and
metabolic pathways, 2007
79: Bölling, C. and Fiehn, O., Metabolite profiling of Chlamydomonas reinhardtii under
nutrient deprivation, 2005
80: Ghirardi, M.L., Posewitz, M.C., Maness, P.C., Dubini, A., Yu, J. and Seibert, M.,
Hydrogenases and hydrogen photoproduction in oxygenic photosynthetic organisms, 2007
81: Stauber, E.J. and Hippler, M., Chlamydomonas reinhardtii proteomics, 2004
82: Roessler, P.G. and Ohlrogge, J.B., Cloning and characterization of the gene that encodes
acetyl-coenzyme A carboxylase in the alga Cyclotella cryptica. J. Biol. Chem., 1993
83: Livne, A. and Sukenik, A., Acetyl coenzyme A carboxylase from the marine
Prymnesiophyte Isochrysis galbana. Plant Cell Physiol. 31, 851-858, 1990
84: Pavesi, G., Zambelli, F. and Pesole, G., WeederH: an algorithm for finding conserved
regulatory motifs and regions in homologous sequences. BMC Bioinformatics, 2007
85: BioProspector, http://ai.stanford.edu/~xsliu/BioProspector/
86: GALF-P, http://appsrv.cse.cuhk.edu.hk/~tmchan/GALFP/
Acknowledgements
First of all, I would like to thank my supervisor, Prof. Dr. Ugur Sezerman, for
accepting me as one of his students, for his precious comments and remarks,
and for all the long talks, urgent e-mails and documents.
Thanks also to the other PhD students I met at Sabanci University: Alper
Kucukural and Gunseli Akcapinar.
Thanks to all the Erasmus students who invited me to parties and lunches,
and thanks to Paolo, who shared this experience with me. Special thanks to
Ceyda and Eda for the fun we had together, for the free English lessons and for
making me feel at home. Thanks also to the internet for all the free
calls home.
Finally, many thanks to my family and Anna, who keep supporting me all the
time.
