
Mammalian Genome 10, 638-641 (1999).

Incorporating Mouse Genome

© Springer-Verlag New York Inc. 1999

Complete nucleotide sequence, genomic organization, and promoter
analysis of the murine survival motor neuron gene (Smn)
Christine J. DiDonato, Thierry Brun, Louise R. Simard
Centre de Recherche, Hôpital Ste-Justine, 3175 Côte Ste-Catherine, Montréal, Québec, Canada H3T 1C5
Received: 11 November 1998 / Accepted: 14 January 1999

In humans, loss or mutation of the Survival Motor Neuron (SMN)
gene is responsible for proximal spinal muscular atrophy (SMA),
the second most common autosomal recessive disease of childhood after cystic fibrosis. This lethal neuropathy affects 1/10,000
live-born children. It is characterized by degeneration of the α-motor neurons in the spinal cord, which causes proximal, symmetrical
limb and trunk muscle weakness that progresses to paralysis. Since
SMA is clinically heterogeneous, patients are classified on the
basis of age at onset and disease severity (Munsat and Davies
1992). Type I SMA children are the most severely affected. They
have onset of symptoms prior to 6 months, are never able to sit,
and rarely live beyond 2 years of age. Type II and III SMA are
milder forms and show onset of symptoms between 6 months and
17 years.
Because of a 500-kb inverted duplication on Chr 5q13, the
SMN gene is present in two copies designated centromeric SMN
(cenSMN) and telomeric SMN (telSMN; Lefebvre et al. 1995).
The genes span 30 kb, are highly homologous, and their ubiquitous transcripts differ by only four nucleotides, which are silent
polymorphisms (Lefebvre et al. 1995; Burglen et al. 1996; Brahe
et al. 1996; Hahnen and Wirth 1996; Chen et al. 1998). Individuals
lacking cenSMN are normal, whereas SMA patients have no detectable telSMN (>95%) or small intragenic mutations (Lefebvre
et al. 1995; Bussaglia et al. 1995).
Analysis of mRNA indicated that the human SMN genes are
alternatively spliced and that telSMN produces three times the
amount of full-length transcript (90%) compared with cenSMN
(30%; Lefebvre et al. 1995; Gennarelli et al. 1995). SMN protein
is found in both the nucleus and cytoplasm and is present at high
levels in brain, kidney, and liver in normal tissues (Liu and Dreyfuss 1996; Coovert et al. 1997; Lefebvre et al. 1997; Francis et al.
1998). There is only one copy of the SMN gene in rodents. The
absence of alternative splicing in the mouse and rat suggests that
only the product for the full-length SMN transcript is required for
normal development (DiDonato et al. 1997; Viollet et al. 1997;
Battaglia et al. 1997). Recently, SMN protein was shown to be
essential for spliceosomal snRNP biogenesis (Liu et al. 1997;
Fischer et al. 1997) and, consistent with this housekeeping function, complete loss of Smn leads to embryonic death prior to uterine implantation (Schrank et al. 1997). Despite these advances, it
is not clear why defects in telSMN specifically affect motor neurons. One might speculate that motor neurons have a high requirement for full-length SMN protein, which is not met when the
telSMN locus is disrupted. Alternatively, SMN may possess an
additional function specific to motor neurons, and loss of this
function causes SMA. To address these questions in vivo, it will be
necessary to produce viable mice that harbor hypomorphic Smn
alleles and/or Smn alleles that can be disrupted conditionally. To
facilitate this process, we present the entire nucleotide sequence,
genomic organization, a panel of unique probes that span the Smn
gene, as well as our gene targeting experience at the Smn locus.
We have also analyzed the 5′ region of Smn and show that it
contains a functional promoter.

Correspondence to: L.R. Simard

The Smn locus and flanking regions were completely sequenced by a directed subcloning approach that was supplemented
by PCR products amplified from 129/SvJ genomic DNA. Previously characterized BAC clones, 411M1 and 227N6 (DiDonato et
al. 1997), were used as source DNA to create BamHI, EcoRI, PstI,
and SalI subclone libraries in pBSPT KS+ (Stratagene, Inc., La
Jolla, Calif.). Additionally, a partial Sau3A library cloned into the
BamHI site of pBSPT KS+ with an average insert size of 1.2 kb
was constructed from a 13-kb EcoRI fragment that contained
Smn exons 2a-8. Oligonucleotide hybridization was used to identify subclones containing the Smn locus and flanking sequence
(Fig. 1). The sequence from these subclones was obtained by dye-primer cycle sequencing with double-stranded DNA as template
on a Li-Cor automated sequencer. Manual sequencing, with a
cycle sequencing kit (Gibco BRL, Burlington, ON), was also performed on all exon/intron boundaries and for areas that showed
multiple base ambiguities (>10/100 bp). Nucleotide sequence was
edited manually and assembled with the fragment assembly program in the GCG package. More than 42% of the Smn gene was
sequenced at least twice, and 77% of this was confirmed on both
strands. When large numbers of base ambiguities (>10/100 bp)
occurred, they tended to be either at the beginning or end of
sequencing runs. In these instances, at least three sequence runs
were compared.
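The ambiguity criterion (>10 ambiguous calls per 100 bp) can be illustrated with a short sketch. This is not the procedure used here, where editing was done manually; it assumes that an ambiguous call is any character other than A, C, G, or T and that the read is scanned in non-overlapping 100-bp windows. The function name and the toy read are hypothetical.

```python
def ambiguous_windows(read, window=100, max_ambiguous=10):
    """Return (start, count) for each non-overlapping window in which the
    number of ambiguous base calls exceeds max_ambiguous."""
    unambiguous = set("ACGT")
    flagged = []
    for start in range(0, len(read), window):
        chunk = read[start:start + window].upper()
        # Assumption: anything that is not A, C, G, or T counts as ambiguous.
        count = sum(1 for base in chunk if base not in unambiguous)
        if count > max_ambiguous:
            flagged.append((start, count))
    return flagged


# Toy read: a clean first 100 bp, then a 100-bp stretch with 12 ambiguous calls.
toy_read = "ACGT" * 25 + "N" * 12 + "ACGT" * 22
print(ambiguous_windows(toy_read))
```

Windows flagged in this way would be the ones for which additional sequence runs are compared.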
The assembled genomic sequence of Smn (14,527 bp) was
analyzed with a variety of computer programs. This allowed us to
determine the genomic organization of the locus and to annotate
the sequence with putative transcription factor binding sites, as
well as simple and complex repetitive elements. We have deposited this sequence into GenBank under accession number U92641.
The Smn locus comprises nine exons and spans a distance of 13
kb as determined by comparison with the cDNA. The introns in
Smn vary in size from 208 bp to 3216 bp. Table 1 compares the
murine and human SMN loci and indicates exon size, intervening
sequence distances, and exon/intron borders. The murine locus is
approximately half the size of its human counterpart, mostly owing
to the large reduction in size of intron 1 as well as intron 2a.
Otherwise, the two genes are similar. The sequences flanking all of
the exon/intron boundaries conform to the consensus AG/GT
splice donor and acceptor sites (Breathnach et al. 1978). Furthermore, since all exons are in phase 0, exon skipping could occur
without the translational frame being altered.
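The frame argument is simple arithmetic: an exon whose length is a multiple of three can be skipped without shifting the reading frame. The sketch below checks this for the internal coding exons of the mouse gene, using the nucleotide coordinates listed in Table 1. Lengths are derived from the coordinates rather than from the listed exon sizes, and the helper name is hypothetical.

```python
# Mouse exon coordinates (start, end) from Table 1, internal coding exons only.
mouse_exons = {
    "2a": (3024, 3095),
    "2b": (3307, 3426),
    "3": (4244, 4444),
    "4": (4653, 4799),
    "5": (7520, 7615),
    "6": (8799, 8909),
}


def exon_length(start, end):
    """Length of an exon given inclusive start and end coordinates."""
    return end - start + 1


for name, (start, end) in mouse_exons.items():
    length = exon_length(start, end)
    # A length divisible by 3 means skipping the exon preserves the frame.
    print(name, length, length % 3 == 0)
```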
Analysis of the entire Smn gene with RepeatMasker (AFA
Smit and P. Green, unpublished results) revealed that 33% of this
sequence contains simple and complex repetitive elements. Three
simple sequence repeats are located in introns 2a, 2b, and 5 respectively, while complex repetitive elements cluster in introns 4,
6, and 7 (Fig. 1). BLAST searches against the non-redundant database using each intron identified high sequence similarity to the
A-kinase anchoring protein 220 (AKAP) gene (Lester et al. 1996)

C.J. DiDonato et al.: Murine survival motor neuron gene


Fig. 1. Schematic diagram of the Smn locus.
EcoRI, BamHI, PstI, and XhoI restriction sites are
shown. All repetitive elements identified by
RepeatMasker are indicated except three low
complexity repeats (5′ UTR region, intron 6, and
intron 7) and two LTR elements located in introns
4 and 7, respectively. Full details can be found
under GenBank accession No. U92641. Unique
probes that span the Smn gene are shown under
the exons (probes A-I). Probe A corresponds to
exons 2a-4, probe B to exons 4-6, probe C to
nucleotides 1201-1401, probe D to a 250-bp
EcoRI fragment (1920-2166), probe E to
nucleotides 3034-3600, probe F to nucleotides
4251-4974, probe G to nucleotides 8671-9000,
probe H to nucleotides 12,071-12,251, and probe
I to nucleotides 13,774-13,920. The area of
homology contained within two gene-targeting
vectors is indicated above the exons.
Table 1. Comparison of mouse/human exon-intron boundaries in Smn.

Exon  Species  3′ splice junction   Exon size (bp)  Nucleotide nos.    Amino acids  Amino acid no.  5′ splice junction   Intron size (bp)
1     M        -                    72              1,333-1,404        1-24         24              CAG/gtaaggtcgcggctc  1,619
1     H        -                    81              121,152-121,265    1-27         27              CAG/gtgaggtcgcagcca  13,841
2a    M        tttctccctcttcag/AGT  72              3,024-3,095        25-48        24              AAG/gtatgaaatggttaa  211
2a    H        tcttaccctttccag/AGC  72              107,423-107,494    28-51        24              AAG/gtatgaaatgcttgn  2,667
2b    M        ttactgttttcatag/CAT  120             3,307-3,426        49-88        40              CAG/gtgatagattgattt  817
2b    H        ttcctattttcgtag/CAT  120             104,826-104,945    52-91        40              CAG/gttattttaaaatgt  1,168
3     M        tcttytcttttgtag/TGG  201             4,244-4,444        89-115       67              GAG/gtaggccccgaaaga  208
3     H        tgatttcttttgtag/TGG  201             103,776-103,976    92-158       67              GAG/gtaaggatacaaaaa  511
4     M        cttttctttttaaag/AAT  148             4,653-4,799        156-204      49              AAG/gtaaatgttctgttg  2,720
4     H        tttttctttttaaag/AAT  153             103,464-103,616    159-209      51              AAG/gtaaaccttctatga  2,031
5     M        aaatattctttatag/CCA  96              7,520-7,615        205-236      32              CCA/gtaagtacacaagag  1,183
5     H        aaatattccttatag/CCA  96              101,584-101,679    210-241      32              CCA/gtaagtaaaaaagag  1,517
6     M        tttttccatttgcag/ATA  111             8,799-8,909        237-273      37              ATG/gtaaggactccctgg  3,216
6     H        tttttctgtctccag/ATA  111             100,161-100,271    242-278      37              ATG/gtaagtaatcactca  5,926
7     M        atgctctctttacag/GGT  53              12,126-12,177      274-288      15*             gaa/gtaagtctgtcattt  1,636
7     H        ttattttccttacag/GGT  56              94,344-94,397      279-294      16*             gga/gtaagtctgccagca  1,068
8     M        ttcttwcaaattcag/gtt  338             13,814-14,154      -            -               -                    -
8     H        atttctcatttgcag/gaa  570             93,328-93,899      -            -               -                    -

Note: M, mouse; H, human. Nucleotide numbering for the human sequence corresponds to Chen et al. (1998). The asterisks indicate a translational stop codon.
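The mouse intron sizes in Table 1 follow directly from the exon coordinates: each intron is the stretch of bases strictly between the end of one exon and the start of the next. The short sketch below reproduces that arithmetic; the function name is hypothetical and the coordinates are those listed for the mouse sequence.

```python
# Mouse exon coordinates (name, start, end) from Table 1, in 5' to 3' order.
mouse_exons = [
    ("1", 1333, 1404), ("2a", 3024, 3095), ("2b", 3307, 3426),
    ("3", 4244, 4444), ("4", 4653, 4799), ("5", 7520, 7615),
    ("6", 8799, 8909), ("7", 12126, 12177), ("8", 13814, 14154),
]


def intron_sizes(exons):
    """Map each exon name to the size of the intron that follows it."""
    sizes = {}
    for (name, _, end), (_, next_start, _) in zip(exons, exons[1:]):
        # Bases strictly between this exon's end and the next exon's start.
        sizes[name] = next_start - end - 1
    return sizes


sizes = intron_sizes(mouse_exons)
print(sizes)
print(min(sizes.values()), max(sizes.values()))
```

The smallest and largest values obtained this way correspond to the 208-bp and 3216-bp extremes quoted in the text.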

in introns 4 and 6. However, it is always the same cDNA sequence
and most likely represents an AKAP pseudogene. Using this information, coupled with standard techniques for identifying single
copy sequences, we developed a panel of unique probes spanning
the entire gene that can be used in Southern blot hybridization
(Fig. 1).
A total of 1.3 kb of sequence 5′ to the Smn translational start
site was obtained. Within this sequence there are 2 SINE elements,
1 LINE element, and a low complexity repeat (Fig. 2). After masking of repetitive elements, the sequence was analyzed with TESS
(transcription element search software; Schug and Overton 1997).
The non-repetitive sequence lacks consensus TATA and CAAT
boxes, typical of housekeeping genes. However, there are 2 Sp1
binding sites, which may play a role in transcriptional activation
and regulation (Pugh and Tjian 1991). To determine whether this
region contained a functional promoter, we ligated the full-length
1.3-kb fragment and a series of 5′ deletion fragments upstream of
the promoterless chloramphenicol acetyl transferase (CAT) reporter gene in pBLCAT6 (Boshart et al. 1992). Transient transfections into HepG2 cells, a human hepatoblastoma cell line, indicated that there was no statistically significant difference in promoter activity between the deletion fragments, -1020/CAT6 and
-455/CAT6, as compared with the full-length construct, -1332/CAT6
(p > 0.02 and p > 0.1, respectively). These constructs were
approximately two times more active than the positive control
vector, pBLCAT5 (Boshart et al. 1992), which contains the CAT
reporter gene driven by the HSV thymidine kinase (TK) promoter.
We also placed the -1332 to -455 region, which contains the
LINE and two SINE elements, upstream of the TK promoter in
pBLCAT5 (Fig. 2). As can be seen, this fragment had no effect on
TK promoter activity.
The power of exon prediction software [Grail/Exons, Grail/Gap2,
F-gene, FEX, HEXON, and Genscan (Burge and Karlin
1997; Solovyev et al. 1994a, 1994b, 1995)] was evaluated with our
Smn sequence. All programs were run under default parameters
chosen by NIX (nucleotide identification of unknown sequences;
UK HGMP Resource Center) for mouse genomic sequence. The
graphic output of each program was viewed simultaneously, which
facilitated annotation of the sequence. In general, each program identified 80% of the exons; taken together, however, the programs identified the
entire Smn gene. Overall, our results suggest that
multiple exon prediction programs should be used when screening


Fig. 2. Deletion analysis of Smn promoter activity in
HepG2 cells. The human hepatoblastoma cell line
HepG2 was maintained in minimal essential medium, supplemented with 10% (vol/vol) fetal bovine
serum (FBS). The cells were incubated in a humidified atmosphere of 5% CO2 at 37°C. The medium
was replaced twice a week, and cells were passaged
1 to 10 every week. For transfection, the cells were
counted, plated at a density of 2 × 10^5 cells/well on
multi-well tissue culture plates and grown for 18-24
h. The cells were transfected with 3.0 µg Lipofectin-treated DNA (Life Technologies, Inc.) as described
by the manufacturer, with two different DNA preparations per construct, and each transfection was performed in triplicate. To normalize the efficiencies of
transfection, 0.5 µg pCMV-β-gal DNA was cotransfected with 2.5 µg of the test plasmids. Seven hours after transfection, the cells were washed and fed with fresh media. The cells were lysed and
harvested 48 h later. Aliquots were used for total protein quantitation, β-galactosidase assays, and CAT assays. Protein content was determined with the
Bio-Rad reagent according to the manufacturer's directions. The β-galactosidase assay was performed by using the β-Galactosidase Enzyme Assay System
(Promega). CAT assays were performed with the CAT enzyme assay system with reporter lysis buffer (Promega), and enzyme activity was monitored by
liquid scintillation counting. The values for CAT activities of test plasmids were normalized by protein content and β-galactosidase activity. The CAT
values are the average of three independent transfection experiments. pBLCAT5, which has the HSV-thymidine kinase promoter driving CAT expression,
and the promoter-less pBLCAT6 serve as positive and negative controls for promoter activity, respectively. The error bars reflect deviations between
experiments and DNA preparations. Symbols mark the LINE, SINE, and low complexity repeat elements.
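The normalization described in this legend can be sketched as follows. The legend does not give the exact formula, so this example assumes that raw CAT counts are divided by protein content and by β-galactosidase activity, and that the result is then expressed relative to the pBLCAT5 control. All numbers and function names are hypothetical and serve only to show the arithmetic.

```python
from statistics import mean


def normalized_cat(cat_cpm, protein_ug, bgal_units):
    """CAT signal per unit protein, corrected for transfection efficiency.
    Assumed formula; the exact calculation is not stated in the legend."""
    return cat_cpm / protein_ug / bgal_units


def relative_activity(test_wells, control_wells):
    """Mean normalized activity of a test construct relative to a control."""
    test = mean(normalized_cat(*well) for well in test_wells)
    control = mean(normalized_cat(*well) for well in control_wells)
    return test / control


# Hypothetical triplicate wells: (CAT cpm, protein in µg, β-gal units).
construct = [(4000, 50, 2.0), (4400, 55, 2.0), (3600, 45, 2.0)]
tk_control = [(2000, 50, 2.0), (2200, 55, 2.0), (1800, 45, 2.0)]
print(relative_activity(construct, tk_control))
```

With these made-up values the test construct comes out twice as active as the control, which mirrors the kind of comparison reported for the Smn constructs and pBLCAT5.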

for the presence of coding sequence within genomic DNA, as no
single program was capable of identifying all Smn exons.
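The benefit of combining programs can be pictured as a set union. In the sketch below the per-program exon calls are invented for illustration and are not the results obtained here; the program labels and the helper function are likewise hypothetical. Each program misses some exons, yet the union covers the whole gene.

```python
ALL_EXONS = {"1", "2a", "2b", "3", "4", "5", "6", "7", "8"}

# Hypothetical calls for illustration only; not the actual program outputs.
predictions = {
    "program_A": {"1", "2a", "2b", "3", "4", "5", "6"},
    "program_B": {"2a", "2b", "3", "4", "5", "6", "7"},
    "program_C": {"2b", "3", "4", "5", "6", "7", "8"},
}


def sensitivity(found, truth):
    """Fraction of true exons that were predicted."""
    return len(found & truth) / len(truth)


# Union of all calls across programs.
combined = set().union(*predictions.values())

for name, found in predictions.items():
    print(name, round(sensitivity(found, ALL_EXONS), 2))
print("combined", sensitivity(combined, ALL_EXONS))
```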
BLAST (Altschul et al. 1990) analysis against a number of
different databases identified the murine, rat, and human cDNAs
for SMN (Lefebvre et al. 1995; DiDonato et al. 1997; Viollet et al.
1997; Battaglia et al. 1997), as well as three unpublished cDNAs
from bovine (AF016590, AF026810), canine (U50746), and zebrafish (AF083557). There were a total of 67 EST hits (26 murine,
32 human, 6 rat, and 3 zebrafish). Alignment of the open-reading
frames (ORFs) from these species, as well as ORFs from C. elegans and S. pombe (previously identified orthologs based on sequence similarity) demonstrates the power of cross-species comparisons to elucidate important structural and functional domains.
As shown in Fig. 3, the SIP-1, Sm, and SMN oligomerization
domains that were previously identified (Liu et al. 1997; Lorson et
al. 1998) are highly conserved throughout evolution. The polyproline stretches in exons 4, 5, and 6 are also conserved; however,
their functional significance has yet to be determined. Interestingly, a missense mutation (P245L), which replaces the second
proline of the poly-proline stretch in exon 6 with a leucine (a
non-conservative amino acid change), has been identified in an
SMA type III patient (Rochette et al. 1997). Since missense mutations are extremely rare in SMA, this finding underscores the functional
significance of this region.
Finally, it should also be noted that the frequency of homologous recombination (HR) in gene targeting experiments with ES
cells differs between the 5′ and 3′ ends of the Smn locus. The
frequency of HR events near the 5′ end of the locus (exon
2a) has been reported to be 1/25 Neo-resistant (NeoR) colonies
(Schrank et al. 1997). We have experienced a lower HR frequency of 1/219 NeoR colonies at the 3′ end of the Smn locus,
specifically exon 6 (unpublished results). This discrepancy could
be due to a number of technical factors inherent in this type
of work, including the ES cell line used, transfection conditions,
and the length of the targeting construct. An additional possibility
is the number and distribution of repetitive elements contained
within the gene targeting vectors. One can imagine that, as the
number of repeats increases, the chance of random integration also
increases. For example, the region of homology in Schrank and
coworkers' targeting vector extended from distal intron 1 to the
proximal/middle region of intron 4 and was relatively free of repetitive elements (Fig. 1). In contrast, our targeting vector spanned
exon 5 to the middle of intron 6 and lacked any substantial stretch

Fig. 3. Amino acid sequence homology of SMN-related proteins. SMN protein sequences from human
(U18423), mouse (U77714, U63294), canine
(U50746), rat (U75369), zebrafish (AF083557), C. elegans-related protein (Z81048), S. pombe-related protein (Z54354), and a partial bovine SMN (AF016590, AF026810) were
aligned with CLUSTAL. The exon boundaries are shown above the sequence, the SIP-1 (aa13-44), Sm (aa240-267), and SMN oligomerization
(aa249-278) domains are indicated by bold, double, and triple underlines,
respectively. Dashes represent breaks in the actual amino acid sequence of
the proteins to maximize sequence alignment. The murine Smn protein is
97%, 85%, 83%, and 52% identical to rat, canine, human, and zebrafish
orthologs, respectively.
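Percent identity values of this kind are computed from a pairwise comparison of aligned sequences. The sketch below shows one common convention, assumed here and not necessarily the one used for this figure: alignment columns containing a gap (dash) in either sequence are excluded, and identity is the fraction of remaining columns with matching residues. The two toy sequences are made up and are not SMN sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # Assumption: gapped columns are ignored.
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned fragments (one gap column, two mismatches).
toy_a = "MAMGSG-GAG"
toy_b = "MAMSSGGGVG"
print(round(percent_identity(toy_a, toy_b), 1))
```

Other conventions, such as counting gapped columns as mismatches, give lower values, so the denominator should always be stated alongside the percentage.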


of unique sequence, owing to the density and distribution of repetitive elements. In the future, as more of the mouse genome is
sequenced, it would be interesting to analyze how the density and
distribution of repetitive elements within gene targeting constructs
affect HR frequency. For the present, our findings are valuable for
those studying the in vivo consequences of SMA missense mutations, since most of these are located in exon 6. Furthermore, given
the repetitive nature of this region, the sequence information provided by these studies will facilitate PCR-based screening strategies to identify HR events.
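The practical consequence of the two HR frequencies for screening can be put in numbers. The sketch below computes the fold difference between 1/25 and 1/219 and, under the simplifying assumption that colonies are independent trials with a fixed HR probability (a binomial model that is not part of the experimental work), the number of NeoR colonies that would need to be screened for a 95% chance of recovering at least one HR event. Function names are hypothetical.

```python
import math
from fractions import Fraction


def fold_difference(freq_a, freq_b):
    """How many times larger freq_a is than freq_b."""
    return freq_a / freq_b


def colonies_needed(hr_frequency, confidence=0.95):
    """Smallest number of colonies giving at least `confidence` probability of
    one or more HR events, assuming independent colonies (binomial model)."""
    p = float(hr_frequency)
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


five_prime = Fraction(1, 25)    # near exon 2a (Schrank et al. 1997)
three_prime = Fraction(1, 219)  # exon 6 region (reported here)

print(float(fold_difference(five_prime, three_prime)))
print(colonies_needed(five_prime), colonies_needed(three_prime))
```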
In conclusion, the work presented illustrates how the nucleotide sequence of a gene can be used in a variety of ways, from
determining gene structure to creating and identifying subtle mutations in vivo. Moreover, this sequence can now be used for
interspecies sequence comparisons to identify conserved domains
within SMN that regulate gene expression. If such domains do
exist, they may provide a target for the development of novel
therapies for the treatment of SMA.
Acknowledgments. Special thanks to Ken Morgan and Mary Fujiwara for
helpful discussions, and Sebastien Meilleur for technical assistance. This
work was supported by grants from the Medical Research Council (MRC)
of Canada and Families of SMA. C.J. DiDonato is an MRC postdoctoral
fellow, and L.R. Simard is a Fonds de la Recherche en Santé du Québec
Scholar.

References
Altschul S, Gish W, Miller W, Myers E, Lipman D (1990) Basic local
alignment search tool. J Mol Biol 215, 403-410
Battaglia G, Princivalle A, Forti F, Lizier C, Zeviani M (1997) Expression
of the SMN gene, the spinal muscular atrophy determining gene, in the
mammalian central nervous system. Hum Mol Genet 6, 1961-1971
Boshart M, Kluppel M, Schmidt A, Schutz G, Luckow B (1992) Reporter
constructs with low background activity utilizing the cat gene. Gene 110,
129-130
Brahe C, Clermont O, Zappata S, Tiziano F, Melki J et al. (1996) Frameshift mutation in the survival motor neuron gene in a severe case of SMA
type I. Hum Mol Genet 5, 1971-1976
Breathnach R, Benoist C, O'Hare K, Gannon F, Chambon P (1978) Ovalbumin gene: evidence for a leader sequence in mRNA and DNA sequences at the exon-intron boundaries. Proc Natl Acad Sci USA 75,
4953-4957
Burge C, Karlin S (1997) Prediction of complete gene structures in human
genomic DNA. J Mol Biol 268, 78-94
Burglen L, Lefebvre S, Clermont O, Burlet P, Viollet L et al. (1996)
Structure and organization of the human survival motor neurone (SMN)
gene. Genomics 32, 479-482
Bussaglia E, Clermont O, Tizzano E, Lefebvre S, Burglen L et al. (1995)
A frame-shift deletion in the survival motor neuron gene in Spanish
spinal muscular atrophy patients. Nat Genet 5, 335-337
Chen Q, Baird SD, Mahadevan M, Besner-Johnston A, Farahani R et al.
(1998) Sequence of a 131-kb region of 5q13.1 containing the spinal
muscular atrophy candidate genes SMN and NAIP. Genomics 48,
121-127
Coovert DD, Le TT, McAndrew PE, Strasswimmer J, Crawford TO et al.
(1997) The survival motor neuron protein in spinal muscular atrophy.
Hum Mol Genet 6, 1205-1214
DiDonato CJ, Chen XN, Noya D, Korenberg JR, Nadeau JH et al. (1997)
Cloning, characterization, and copy number of the murine survival motor neuron gene: homolog of the spinal muscular atrophy-determining
gene. Genome Res 7, 339-352

Fischer U, Liu Q, Dreyfuss G (1997) The SMN-SIP1 complex has an
essential role in spliceosomal snRNP biogenesis. Cell 90, 1023-1029
Francis JW, Sandrock AW, Bhide PG, Vonsattel JP, Brown RH Jr (1998)
Heterogeneity of subcellular localization and electrophoretic mobility of
survival motor neuron (SMN) protein in mammalian neural cells and
tissues. Proc Natl Acad Sci USA 95, 6492-6497
Gennarelli M, Lucarelli M, Capon F, Pizzuti A, Merlini L et al. (1995)
Survival motor neuron gene transcript analysis in muscles from spinal
muscular atrophy patients. Biochem Biophys Res Commun 213,
342-348
Hahnen ET, Wirth B (1996) Frequent DNA variant in exon 2a of the
survival motor neuron gene (SMN): a further possibility for distinguishing the two copies of the gene. Hum Genet 98, 122-123
Lefebvre S, Burglen L, Reboullet S, Clermont O, Burlet P et al. (1995)
Identification and characterization of a spinal muscular atrophy-determining gene. Cell 80, 155-165
Lefebvre S, Burlet P, Liu Q, Bertrandy S, Clermont O et al. (1997) Correlation between severity and SMN protein level in spinal muscular
atrophy. Nat Genet 16, 265-269
Lester LB, Coghlan VM, Nauert B, Scott JD (1996) Cloning and characterization of a novel A-kinase anchoring protein. AKAP 220, association
with testicular peroxisomes. J Biol Chem 271, 9460-9465
Liu Q, Dreyfuss G (1996) A novel nuclear structure containing the survival
of motor neurons protein. EMBO J 15, 3555-3565
Liu Q, Fischer U, Wang F, Dreyfuss G (1997) The spinal muscular atrophy
disease gene product, SMN, and its associated protein SIP1 are in a
complex of spliceosomal snRNP proteins. Cell 90, 1013-1021
Lorson CL, Strasswimmer J, Yao JM, Baleja JD, Hahnen E et al. (1998)
SMN oligomerization defect correlates with spinal muscular atrophy
severity. Nat Genet 19, 63-66
Munsat TM, Davies KE (1992) Meeting report: International SMA consortium Meeting. Neuromusc Disorders 2, 423-428
Pugh BF, Tjian R (1991) Transcription from a TATA-less promoter requires a multisubunit TFIID complex. Genes Dev 5, 1935-1945
Rochette CF, Surh LC, Ray PN, McAndrew PE, Prior TW et al. (1997)
Molecular diagnosis of non-deletion SMA patients using quantitative
PCR of SMN exon 7. Neurogenetics 1, 101-107
Schrank B, Gotz R, Gunnersen JM, Ure JM, Toyka KV et al. (1997)
Inactivation of the survival motor neuron gene, a candidate gene for
human spinal muscular atrophy, leads to massive cell death in early
mouse embryos. Proc Natl Acad Sci USA 94, 9920-9925
Schug J, Overton G (1997) TESS: Transcription Element Search Software
on the WWW. Technical Report CBIL-TR-1997-1001-v0.0, of the
Computational Biology and Informatics Laboratory, School of Medicine, University of Pennsylvania
Solovyev VV, Salamov AA, Lawrence CB (1994a) The prediction of human exons by oligonucleotide composition and discriminant analysis of
spliceable open reading frames. In The Second International Conference
on Intelligent Systems for Molecular Biology, Altman R, Brutlag D,
Karp R, Latrop R, Searls D (eds.) (Menlo Park, Calif.: AAAI Press) p
354-362
Solovyev VV, Salamov AA, Lawrence CB (1994b) Predicting internal
exons by oligonucleotide composition and discriminant analysis of
spliceable open reading frames. Nucleic Acids Res 22, 5156-5163
Solovyev VV, Salamov AA, Lawrence CB (1995) Identification of human
gene structure using linear discriminant functions and dynamic programming. In Proceedings of the Third International Conference on Intelligent Systems for Molecular Biology. Rawling C, Clark D, Altman R,
Hunter L, Lengauer T, Wodak S (eds.) (Cambridge, England: AAAI
Press) p 367-375
Viollet L, Bertrandy S, Bueno Brunialti AL, Lefebvre S, Burlet P et al.
(1997) cDNA isolation, expression, and chromosomal localization of the
mouse survival motor neuron gene (Smn). Genomics 40, 185-188
