CFE

CFE / Loader / Boot Prompt
There is a small subset of commands that can be accessed only from the boot
prompt of the system.
These are useful in some circumstances, but the possibilities are quite limited.
In the examples below I am using a system that shows a CFE prompt, but
this may be Loader or something else depending on your system.
CFE> help
version         Print CFE version.
update_flash    Updates the boot flash with the firmware image on the PC-Card.
netboot         Boots the supplied URL off the network.
boot_diags      Boots the diagnostic image off of the PC-Card
boot_backup     Boots the backup image of Data ONTAP off of the PC-Card.
boot_primary    Boots the primary image of Data ONTAP off of the PC-Card.
boot_ontap      Boots the correct image of Data ONTAP
bye             Reset the system
flash           Update a flash memory device
autoboot        Automatic system bootstrap.
go              Start a previously loaded program
boot            Load an executable file into memory and execute it
load            Load an executable file into memory without executing it
set date        Set current date
set time        Set current time
ping            Ping a remote IP host
arp             Display or modify the ARP Table
ifconfig        Configure the Ethernet interface
show date       current time according to RTC
show time       current time according to RTC
show devices    Display information about the installed devices
unsetenv        Delete an environment variable
set-defaults    Reset all system environmental variables to default values.
printenv        Display the environment variables
setenv          Set an environment variable.
help            Obtain help for CFE commands
NETAPP Configuration backup
The command is very simple and straightforward. You start by dumping out
the configuration from the filer. This automatically goes into /etc/configs. From
here you can then clone the config if needed, or compare (diff) the config.
filer01> config
Usage:
config clone <filer> <remote_user>
config diff [-o <output_file>] <config_file1> [ <config_file2> ]
config dump [-f] [-v] <config_file>
config restore [-v] <config_file>
Failed disk replacement in NetApp
Disk failures are very common in storage environments, and as storage administrators
we come across this situation very often. How often depends on how many disks
your storage systems have; the more disks you manage, the more often you come
across this situation.
I have written this post considering RAID-DP with FC-AL disks, because it's always
better than RAID4, and we don't use SCSI loops. Due to its design, RAID-DP gives
protection from double disk failure in a single RAID group. That means you will
not lose data even if 2 disks fail in a single RG at the same time or one after
another.
Like any other storage system, ONTAP uses a disk from the spare disk pool to
rebuild the data from the surviving disks as soon as it encounters a failed disk,
and sends an AutoSupport message to NetApp for parts replacement. Once the
AutoSupport is received by NetApp, they initiate the RMA process and the part gets
delivered to the address listed for that failed system in NetApp records. Once the
disk arrives, you change the disk yourself or ask a NetApp engineer to come onsite
and change it. Either way, as soon as you replace the disk, your system finds the
new working disk and adds it to the spare pool.
Now wasn't that pretty simple and straightforward? Oh yes; because we are using
software-based disk ownership and disk auto assignment is turned on. Much like
your baby had a cold, so he called up the GP himself and got it cured rather than
asking you to take care of him. But what if there are some complications?
Now I will cover what other things can get in the way and any other complications.
Scenario 1:
I have replaced my drive and the light shows Green or Amber but 'sysconfig -r' still
shows the drive as broken.
Sometimes we face this problem because the system was not able to label the
disk properly, or the replacement disk itself is not good. The first thing to try is to
label the disk correctly. If that doesn't work, try replacing it with another disk or a
known good disk. If that doesn't work either, just contact NetApp and follow their
guidelines.
To change the disk label from "BROKEN" to "SPARE", first note down the broken
disk ID, which you can get from "aggr status -r". Now go to advanced mode with priv
set advanced and run disk unfail. At this stage your filer will throw 3-4 errors
on the console, syslog or SNMP traps, depending on how you have configured it, but
this was the final step and the disk should now be good, which you can confirm with
disk show for detailed status or the sysconfig -r command. Give it a few seconds to
recognize the changed status of the disk if the status change doesn't show at first.
Scenario 2:
Two disks have failed from the same RAID group and I don't have any spare disk in
my system!
Now in this case you are really in big trouble, because you always need to have at
least one spare disk available in your system, whereas NetApp recommends a 1:20
ratio, i.e. one spare for every 20 disks. In a dual disk failure you have a very
high chance of losing your data if another disk goes while you are rebuilding the
data on a spare disk or while you are waiting for new disks to arrive.
So always have a minimum of 2 spare disks available in your system. One disk is also
fine and the system will not complain about spare disks, but if you leave the system
with only one spare disk then Maintenance Center will not work and the system will
not scan any disk for potential failure.
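As a rough illustration of the spare guideline above (one spare per 20 disks, never fewer than 2), here is a small sketch that estimates how many spares to keep for a given disk count. The function name and the decision to round up are my own assumptions for illustration; this is not a NetApp tool.

```python
import math

def recommended_spares(total_disks, disks_per_spare=20, minimum=2):
    """Estimate spare disks: one per `disks_per_spare` disks, never below `minimum`.

    Rounding up is an assumption; the 1:20 ratio and the minimum of 2
    come from the text above.
    """
    if total_disks < 0:
        raise ValueError("total_disks must be non-negative")
    by_ratio = math.ceil(total_disks / disks_per_spare)
    return max(minimum, by_ratio)

if __name__ == "__main__":
    # Hypothetical system sizes, just to show the calculation.
    for n in (14, 40, 84):
        print(n, "disks ->", recommended_spares(n), "spares")
```

For small systems the minimum of 2 dominates; the ratio only starts to matter above 40 disks.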
Now, going back to the above situation, where you have a dual disk failure with no
spares available: the best bet is to just ring NetApp to replace the failed disks ASAP.
Or, if you think you are losing your patience, select the same type of disk from
another healthy system, do a disk fail, remove the disk and use it to replace the
failed disk on the other system.
After adding the disk to the other filer, if it shows a partial/failed volume, make sure
the volume reported as partial/failed belongs to the newly inserted disk by using the
vol status -v and "vol status -r" commands. If so, just destroy the volume with the
vol destroy command and then zero out the disk with disk zero spares.
This exercise will not take more than 15 min (except disk zeroing, which depends on
your disk type and capacity) and you will have a single disk failure in 2 systems, each
of which can survive another disk failure. But what if that doesn't happen and you
keep running your system with a dual disk failure? Your system will shut down by
itself after 24 hours; yes, it will shut itself down without any failover, to get your
attention. There is a registry setting to control how long your system should run after
a disk failure, but I think 24 hrs is a good time and you shouldn't increase or decrease
it unless you don't care about the data sitting there and anyone accessing it.
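To make the 24-hour window concrete, this sketch works out when the automatic shutdown would be due and how much time is left. The function names are hypothetical, and the failure time is something you would read off your own logs; the only figure taken from the text is the 24-hour default.

```python
from datetime import datetime, timedelta

def shutdown_deadline(failure_time, timeout_hours=24):
    """Time at which the system would shut itself down (24 h default per the text)."""
    return failure_time + timedelta(hours=timeout_hours)

def hours_remaining(failure_time, now, timeout_hours=24):
    """Hours left before the deadline, clamped at zero once it has passed."""
    remaining = shutdown_deadline(failure_time, timeout_hours) - now
    return max(0.0, remaining.total_seconds() / 3600)

if __name__ == "__main__":
    # Hypothetical timestamps for illustration only.
    failed_at = datetime(2024, 1, 1, 8, 0)
    checked_at = datetime(2024, 1, 1, 20, 0)
    print("Deadline:", shutdown_deadline(failed_at))
    print("Hours left:", hours_remaining(failed_at, checked_at))
```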
Scenario 3:
My drive failed but there is no disk with amber lights
A number of times this happens because the disk's electronics have failed and the
system can no longer recognize the disk as part of it. So in this situation you first
have to know the disk name. There are a couple of methods to find out which disk has
failed.
a) sysconfig -r: look for the broken disk list
b) From the AutoSupport message, check for the failed disk ID
c) "fcadmin device_map": look for a disk with an xxx or BYP message
d) In /etc/messages, look for a failed or bypassed disk warning, which gives the
disk ID
Now once you have identified the failed disk ID, run disk fail and check if you see
the amber light. If not, use blink_on in advanced mode to turn on the disk LED, or if
that fails, turn on the adjacent disk's light using the same blink_on command so you
can identify the disk correctly. Alternatively, you can use the led_on command
instead of blink_on to turn on the disk LEDs adjacent to the defective disk rather
than its red LED.
If you use the auto assign function then the system will assign the disk to the spare
pool automatically; otherwise use the disk assign command to assign the disk to the
system.
Scenario 4:
Disk LED remains orange after replacing failed disk
This happens because you were in a hurry and didn't give the system enough time to
recognize the change. When the failed disk is removed from the slot, the disk
LED will remain lit until Enclosure Services notices and corrects it; generally this
takes around 30 seconds after removing the failed disk.
Since you have already done it, use the led_off command from advanced mode.
If that doesn't work because the system believes that the LED is off when it is
actually on, simply turn the LED on and then back off again using the led_on and
then led_off commands.
Scenario 5:
Disk reconstruction failed
There are a number of issues that can make RAID reconstruction fail on a new disk,
including enclosure access error, file system disk not responding/missing, spare disk
not responding/missing or something else; however, the most common reason for this
failure is outdated firmware on the newly inserted disk.
Check whether the newly inserted disk has the same firmware as the other disks. If
not, first update the firmware on the newly inserted disk, and then reconstruction
should finish successfully.
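If you have noted down the firmware revision of each disk, a quick way to spot the odd one out is to compare every disk against the most common revision. The sketch below assumes you have already collected the revisions into a simple mapping by hand; the function name, disk names and revision strings are made up for illustration and do not come from any filer output.

```python
from collections import Counter

def firmware_mismatches(firmware_by_disk):
    """Return disks whose firmware differs from the most common revision."""
    if not firmware_by_disk:
        return {}
    counts = Counter(firmware_by_disk.values())
    most_common_rev, _ = counts.most_common(1)[0]
    return {
        disk: rev
        for disk, rev in firmware_by_disk.items()
        if rev != most_common_rev
    }

if __name__ == "__main__":
    # Hypothetical disk names and firmware revisions.
    disks = {"0a.16": "NA02", "0a.17": "NA02", "0a.18": "NA01"}
    print(firmware_mismatches(disks))
```

Any disk this returns is a candidate for a firmware update before you retry the reconstruction.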
Scenario 6:
Disk reconstruction stuck at 0% or failed to start
This might be an error, or due to a limitation in ONTAP, i.e. no more than 2
reconstructions should be running at the same time. An error you might find at times
is because RAID was in a degraded state and the system went through an unclean
shutdown, hence parity will be marked inconsistent and needs to be recomputed after
boot. However, as parity recomputation requires all data disks to be present in the
RAID group and we already have a failed disk in the RG, the aggregate will be marked
as WAFL_inconsistent. You can confirm this condition with the "aggr status -r"
command.
If this is the case then you have to run wafliron, giving the command aggr wafliron
start while you are in advanced mode. Make sure you contact NetApp before starting
wafliron, as it will unmount all the volumes hosted in the aggregate until the first
phase of tests is completed. The time wafliron takes to complete the first phase
depends on lots of variables, like the size of the volume/aggregate/RG, the number of
files/snapshots/LUNs and lots of other things, so you can't predict how much time it
will take to complete; it might be 1 hr or it might be 4-5 hrs. So if you are running
wafliron, contact NetApp first.
NetApp command line shortcuts
Just a few commands which I use frequently while on console.
CTRL+W - Deletes the word before the cursor
CTRL+R - Rewrites the entire line you have entered
CTRL+U - Deletes the whole line
CTRL+A - Go to start of the line
CTRL+E - Go to end of the line
CTRL+K - Deletes all the following text
A few more commands are there, but I feel the arrow keys work better than pressing
these sequences, like
CTRL+F - Right arrow
CTRL+B - Left arrow
CTRL+P - Up arrow
CTRL+N - Down arrow
CTRL+I - Tab key
