Architecture
NetApp Confidential
MODULE 4: ARCHITECTURE
2013 NetApp, Inc. This material is intended only for training. Reproduction is not authorized.
Module Objectives
After this module, you should be able to:
Show the end-to-end path of a file write request through a cluster
Answer questions about replicated database (RDB) concepts
Identify the differences between a vol0 root volume and a data virtual storage server (Vserver) root volume
Lesson 1
Components
Three major software components on every node:
The network module
The data module
The SCSI module
COMPONENTS
The modules are separate software state machines that are accessed only through well-defined APIs. Every
node contains a network module, a SCSI module, and a data module. Any network or SCSI module in the
cluster can talk to any data module in the cluster.
The network module and the SCSI module translate client requests into Spin Network Protocol (SpinNP)
requests and vice versa. The data module, which contains the WAFL (Write Anywhere File Layout) file
system, processes SpinNP requests. The cluster session manager (CSM) is the SpinNP layer between the
network, SCSI, and data modules. SpinNP is another form of RPC interface. It is used as the primary
intracluster traffic mechanism for file operations among the network, SCSI, and data modules.
The members of each replicated database (RDB) unit on every node in the cluster are in constant
communication with each other to remain synchronized. This RDB communication acts as the heartbeat of
each node. If the heartbeat cannot be detected by the other members of the unit, the unit corrects itself in a
manner that is discussed later in this course. The four RDB units on each node are the blocks configuration
and operations manager (BCOM), the volume location database (VLDB), VifMgr, and management.
[Figure: node architecture: the M-host (management), cluster traffic through the CSM to the data module, the RDB units (mgwd, VLDB, VifMgr, BCOM) with vol0, and a data Vserver root volume with Vol1 and Vol2]
[Figure: protocol labels FC, SCSI, SpinNP, and TCP/IP]
The CSM
Provides a communication mechanism between any network or SCSI module and any data module
Provides a reliable transport for SpinNP traffic
Is used regardless of whether the network or SCSI module and the data module are on the same node or on different nodes
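The routing role of the CSM can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not ONTAP code: the classes `SpinNPRequest`, `DataModule`, `CSM`, and `NetworkModule` are hypothetical, and a plain dictionary stands in for the lookup of which data module owns a volume. What it shows is that the network module always hands its translated request to the CSM, and the CSM delivers it the same way whether the owning data module is on the local node or on another node.

```python
from dataclasses import dataclass, field


@dataclass
class SpinNPRequest:
    """Hypothetical protocol-neutral request built by a network or SCSI module."""
    op: str
    volume: str
    payload: bytes = b""


@dataclass
class DataModule:
    node: str
    volumes: set

    def handle(self, req: SpinNPRequest) -> str:
        # A real data module would hand the request to WAFL; here we report who served it.
        return f"{self.node}:{req.op}:{req.volume}"


@dataclass
class CSM:
    # Hypothetical volume -> owning data module map.
    owners: dict = field(default_factory=dict)

    def register(self, dm: DataModule) -> None:
        for vol in dm.volumes:
            self.owners[vol] = dm

    def send(self, req: SpinNPRequest) -> str:
        # Same delivery path whether the owner is on this node or on another node.
        return self.owners[req.volume].handle(req)


@dataclass
class NetworkModule:
    node: str
    csm: CSM

    def nfs_write(self, volume: str, data: bytes) -> str:
        # Translate the client (NFS) request into a SpinNP request, then hand it to the CSM.
        return self.csm.send(SpinNPRequest("write", volume, data))


csm = CSM()
csm.register(DataModule("node1", {"vol1", "vol2"}))
csm.register(DataModule("node2", {"vol3", "vol4"}))
nm = NetworkModule("node1", csm)

print(nm.nfs_write("vol1", b"abc"))  # node1:write:vol1 (local data module)
print(nm.nfs_write("vol3", b"xyz"))  # node2:write:vol3 (remote data module)
```

Because `NetworkModule` never needs to know which node owns a volume, one code path serves both local and remote volumes, which is the point of the third statement on the slide.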
[Figure: two nodes (one labeled Node2), each with network and SCSI modules, a CSM, and a data module; the data modules host vol0, a Vserver root volume, and data volumes Vol1 through Vol4]
[Figure: the same two nodes, with requests and responses passing between the network and SCSI modules and the data modules through the CSM]
[Figure: network and SAN modules on each node connected through the cluster interconnect to data modules, each containing WAFL, RAID, and storage layers and NVRAM]
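[Figure: a node's data path: clients connect through the network and protocol layers to the CSM and the data module (WAFL, RAID, storage), which uses physical memory and NVRAM, with a link to the HA partner; management is shown separately]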
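The write half of the module objective can be sketched with the components in this diagram. The sketch is simplified and hedged: the class and method names are hypothetical, and it models only the ordering generally described for ONTAP, in which a write is logged in NVRAM and mirrored to the HA partner before the client is acknowledged, and the buffered data is later written through WAFL and RAID to storage at a consistency point.

```python
class HAPartner:
    """Hypothetical stand-in for the HA partner's mirrored copy of this node's NVRAM log."""

    def __init__(self) -> None:
        self.mirror: list = []


class DataModuleWritePath:
    """Toy model of the write path inside one data module (hypothetical class)."""

    def __init__(self, partner: HAPartner) -> None:
        self.partner = partner
        self.nvram: list = []    # local NVRAM log of acknowledged, not-yet-flushed writes
        self.dirty: dict = {}    # data buffered in physical memory, not yet on storage
        self.storage: dict = {}  # data written through WAFL and RAID to storage

    def write(self, path: str, data: bytes) -> str:
        self.dirty[path] = data
        entry = (path, data)
        self.nvram.append(entry)           # 1. log the write in local NVRAM
        self.partner.mirror.append(entry)  # 2. mirror the log entry to the HA partner
        return "ack"                       # 3. only then acknowledge the client

    def consistency_point(self) -> None:
        # Flush buffered data to storage; the NVRAM log entries are no longer needed.
        self.storage.update(self.dirty)
        self.dirty.clear()
        self.nvram.clear()
        self.partner.mirror.clear()


partner = HAPartner()
dm = DataModuleWritePath(partner)
status = dm.write("/vol1/file.txt", b"hello")
print(status, len(dm.nvram), len(partner.mirror), dm.storage)  # ack 1 1 {}
dm.consistency_point()
print(dm.storage)  # {'/vol1/file.txt': b'hello'}
```

Mirroring the log entry before acknowledging is what allows the partner to replay an acknowledged write if this node fails before the next consistency point.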
Data Vservers: 1 of 2
DATA VSERVERS: 1 OF 2
Think of a cluster as a group of hardware elements (nodes, disk shelves, and more). A data Vserver is a
logical piece of that cluster, but a Vserver is not a subset or partition of the nodes. A Vserver is more
flexible and dynamic: every Vserver can use all the hardware in the cluster, all at the same time.
Example: A storage provider has one cluster and two customers: ABC Company and XYZ Company. A
Vserver can be created for each company. The attributes that are related to specific Vservers (volumes, LIFs,
mirror relationships, and others) can be managed separately, while the same hardware resources can be used
for both. One company can have its own NFS server, while the other can have its own NFS, CIFS, and iSCSI
servers.
Data Vservers: 2 of 2
DATA VSERVERS: 2 OF 2
A one-to-many relationship exists between a Vserver and its volumes. The same is true for a Vserver and its
data LIFs. A data Vserver can have many volumes and many data LIFs, but each of those volumes and LIFs is
associated with only that one data Vserver.
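The one-to-many ownership can be expressed as a small data model. In this sketch (all class, method, and object names are hypothetical), volumes and LIFs can be created only through a Vserver and record exactly one owning Vserver, so the companies share the cluster while their objects stay separate. The example reuses ABC Company and XYZ Company from the previous page; scoping object names to their Vserver is a choice made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Volume:
    vserver: str  # the one data Vserver that owns this volume
    name: str


@dataclass(frozen=True)
class Lif:
    vserver: str  # the one data Vserver that owns this LIF
    name: str


@dataclass
class DataVserver:
    name: str
    volumes: dict = field(default_factory=dict)
    lifs: dict = field(default_factory=dict)

    def create_volume(self, name: str) -> Volume:
        if name in self.volumes:
            raise ValueError(f"{self.name} already has a volume named {name}")
        self.volumes[name] = Volume(self.name, name)
        return self.volumes[name]

    def create_lif(self, name: str) -> Lif:
        if name in self.lifs:
            raise ValueError(f"{self.name} already has a LIF named {name}")
        self.lifs[name] = Lif(self.name, name)
        return self.lifs[name]


abc = DataVserver("abc")
xyz = DataVserver("xyz")
for vol in ("vol1", "vol2", "vol3"):
    abc.create_volume(vol)
abc.create_lif("abc_data1")
abc.create_lif("abc_data2")
xyz.create_volume("vol1")  # a different volume that happens to share a name
xyz.create_lif("xyz_data1")

print(len(abc.volumes), len(abc.lifs))             # 3 2
print(abc.volumes["vol1"] == xyz.volumes["vol1"])  # False: different owning Vservers
```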
[Figure: independent namespaces for the PetCo, RonCo, and QuekCo data Vservers, each built from a Vserver root volume and additional volumes]
Namespaces
A namespace is the file system of a data Vserver.
A namespace consists of many volumes.
A namespace is independent of the namespaces of other data Vservers.
The root of the namespace is the data Vserver root volume.
A client mount or mapping can be to the data Vserver root volume or to a point further into the tree.
NAMESPACES
A namespace is a file system. A namespace is the external, client-facing representation of a Vserver. A
namespace consists of volumes that are joined together through junctions. Each Vserver has one namespace,
and the volumes in one Vserver cannot be seen by clients that are accessing the namespace of another
Vserver.
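How junctions join volumes into one namespace can be sketched as a lookup table from junction path to volume, resolved by the longest matching junction. This is an illustrative model, not how ONTAP stores junctions: the `resolve` function, the junction paths, and the volume names are hypothetical. Each Vserver gets its own table, so a path from one Vserver's clients can never land in another Vserver's volumes.

```python
def resolve(namespace: dict, path: str) -> tuple:
    """Map a client path to (volume, path inside that volume) using the longest junction."""
    best = None
    for junction in namespace:
        if junction == "/" or path == junction or path.startswith(junction + "/"):
            if best is None or len(junction) > len(best):
                best = junction
    if best is None:
        raise FileNotFoundError(path)
    inner = path if best == "/" else (path[len(best):] or "/")
    return namespace[best], inner


# One namespace per data Vserver; "/" is the Vserver root volume.
petco = {"/": "petco_root", "/sales": "sales", "/sales/2013": "sales_2013"}
ronco = {"/": "ronco_root", "/eng": "eng"}

print(resolve(petco, "/sales/2013/q1.csv"))  # ('sales_2013', '/q1.csv')
print(resolve(petco, "/sales"))              # ('sales', '/')
print(resolve(ronco, "/sales/q1.csv"))       # ('ronco_root', '/sales/q1.csv')
```

The last call shows the isolation: RonCo has no `/sales` junction, so the path stays inside RonCo's own root volume instead of reaching PetCo's `sales` volume.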
Lesson 2
The RDB
The RDB is the key to maintaining high-performance consistency in a distributed environment.
The RDB maintains data that supports the cluster, not the user data in the namespace.
Operations are transactional (atomic): entire transactions are either committed or rolled back.
Four RDB units exist: the volume location database (VLDB), management, VifMgr, and blocks configuration and operations manager (BCOM).
THE RDB
The RDB units do not contain user data. The RDB units contain data that helps to manage the cluster. These
databases are replicated; that is, each node has its own copy of the database, and that database is always
synchronized with the databases on the other nodes in the cluster. RDB reads are performed locally on each
node, so they are fulfilled without sending requests over the cluster interconnect. An RDB write is performed
on one master RDB database, and those changes are then replicated to the other databases throughout the cluster.
The RDB is transactional in that the RDB guarantees that when data is written to a database, either it all gets
written successfully or it all gets rolled back. No partial or inconsistent database writes are committed.
Four RDB units (the VLDB, management, VifMgr, and BCOM) exist in every cluster, which means that four
RDB unit databases exist on every node in the cluster.
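The read and write rules above can be sketched as a toy replicated store. It is a sketch under stated assumptions, not the RDB implementation: the `RDBUnit` class and its methods, the fixed master, the volume-to-aggregate entries, and the use of `None` as an invalid value are all hypothetical. Reads use the local copy; a write is built on a working copy from the master and reaches every node only if the whole transaction succeeds, so a failure part-way through leaves every copy unchanged.

```python
class RDBUnit:
    """Toy model of one RDB unit with a copy of its database on every node."""

    def __init__(self, name: str, nodes: list, master: str) -> None:
        self.name = name
        self.master = master
        self.replicas = {node: {} for node in nodes}

    def read(self, node: str, key: str):
        # Served from the node's local copy: no request crosses the cluster interconnect.
        return self.replicas[node].get(key)

    def write(self, updates: list) -> None:
        # Build the new state from the master's copy; nothing is visible until commit.
        working = dict(self.replicas[self.master])
        for key, value in updates:
            if value is None:
                # Aborting here is the rollback: no replica has been touched yet.
                raise ValueError(f"invalid value for {key!r}; transaction rolled back")
            working[key] = value
        # Commit: every replica receives the same complete result.
        for node in self.replicas:
            self.replicas[node] = dict(working)


nodes = ["node1", "node2", "node3", "node4"]
vldb = RDBUnit("vldb", nodes, master="node2")
vldb.write([("vol1", "aggr1"), ("vol2", "aggr2")])
print(vldb.read("node4", "vol1"))  # aggr1 (read from node4's local copy)

try:
    vldb.write([("vol3", "aggr3"), ("vol4", None)])
except ValueError as err:
    print(err)
print(vldb.read("node1", "vol3"))  # None: the partial transaction was not committed
```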
Management Gateway
Is also known as the M-host
Enables management of the cluster from any node
Provides the CLI
Runs as mgwd (the management gateway daemon) on every node
Stores its data in the management RDB unit
MANAGEMENT GATEWAY
The management RDB unit contains information that is needed by the management gateway daemon (mgwd)
process on each node. The kind of management data that is stored in the RDB is written infrequently and read
frequently. The management process on a given node can query the other nodes at run time to retrieve a great
deal of information, but some information is stored locally on each node, in the management RDB database.
VIF Manager
Runs as vifmgr
Stores and monitors LIF configuration
Stores and administers LIF failover policies
VIF MANAGER
The VifMgr is responsible for creating and monitoring NFS, CIFS, and iSCSI LIFs. It also handles automatic
NAS LIF failover and manual migration of NAS LIFs to other network ports and nodes.
RDB Databases
[Figure: four-node cluster (node1 through node4), each node holding one local database for each of the four RDB units]
RDB DATABASES
This slide shows a four-node cluster. The four databases that are shown for each node are the four RDB units
(management, VLDB, VifMgr, and BCOM). In this cluster, each unit consists of four distributed databases:
each node has one local database for each RDB unit.
The databases that are shown on this slide with dark borders are the masters. Note that the master of any
particular RDB unit is independent of the master of the other RDB units.
The node that is shown on this slide with a dark border has epsilon (the tie-breaking ability).
On each node, all the RDB databases are stored in the vol0 volume.
Quorum: 1 of 2
A quorum is a simple majority of connected, healthy, and eligible nodes.
Two RDB quorum concepts exist: a cluster-wide quorum and an individual RDB unit that is in or out of quorum.
RDB units never go out of quorum as a whole; only local units (processes) do.
When an RDB unit goes out of quorum, reads from the RDB unit can still occur, but changes to the RDB unit cannot.
Example: If the VLDB goes out of quorum, during the brief time that the database is out, no volumes can be created, deleted, or moved; however, access to the volumes from clients is not affected.
QUORUM: 1 OF 2
For a particular RDB unit, a master can be elected only when a majority of the local units on eligible nodes
are connected and healthy. A master is elected when each local unit agrees on the first reachable healthy
node in the RDB site list. A healthy node is one that is connected, can communicate with the other nodes,
has CPU cycles, and has reasonable I/O.
The master of a given unit can change. For example, when the node that is the master for the management
unit is rebooted, a new management master must be elected by the remaining members of the management unit.
A local unit goes out of quorum when cluster communication is interrupted for a few seconds, for example,
because of a node reboot or a cluster interconnect hiccup that lasts a few seconds. Because the RDB units
always work to monitor and maintain a good state, the local unit comes back into quorum automatically. When
a local unit goes out of quorum and then comes back into quorum, the RDB unit is synchronized again. Note
that the VLDB process on a node might go out of quorum while the VifMgr process on that same node has
no problem.
When a unit goes out of quorum, reads from that unit can be performed, but writes to that unit cannot. That
restriction is enforced so that no changes to that unit happen during the time that a master is not agreed upon.
In addition to the example above, if the VifMgr goes out of quorum, access to LIFs is not affected, but no LIF
failover can occur.
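Master election and the out-of-quorum behavior can be sketched for a single local unit. This is a simplified model under stated assumptions: `elect_master` and `LocalUnit` are hypothetical names, the site list is assumed to contain only eligible nodes, "healthy" is reduced to membership in a set, and epsilon is ignored. A master exists only while a strict majority of the site list is reachable; without one, reads still succeed and writes are refused.

```python
from typing import Optional


def elect_master(site_list: list, healthy: set) -> Optional[str]:
    """Return the master for one RDB unit, or None when no majority is available."""
    reachable = [node for node in site_list if node in healthy]
    # A simple majority of the unit's members must be connected and healthy.
    if 2 * len(reachable) <= len(site_list):
        return None
    # Each local unit agrees on the first reachable healthy node in the site list.
    return reachable[0]


class LocalUnit:
    """One node's local copy of an RDB unit (hypothetical class)."""

    def __init__(self, name: str, site_list: list) -> None:
        self.name = name
        self.site_list = site_list
        self.data: dict = {}
        self.master: Optional[str] = None

    def observe(self, healthy: set) -> None:
        self.master = elect_master(self.site_list, healthy)

    @property
    def in_quorum(self) -> bool:
        return self.master is not None

    def read(self, key: str):
        return self.data.get(key)  # reads still work out of quorum

    def write(self, key: str, value: str) -> None:
        if not self.in_quorum:
            raise RuntimeError(f"{self.name} is out of quorum; writes are blocked")
        self.data[key] = value


sites = ["node1", "node2", "node3", "node4"]
vldb = LocalUnit("vldb", sites)
vldb.observe({"node1", "node2", "node3", "node4"})
print(vldb.master)  # node1
vldb.write("vol1", "aggr1")

vldb.observe({"node2", "node3", "node4"})  # node1 reboots
print(vldb.master)  # node2

vldb.observe({"node3", "node4"})  # only half reachable: no majority
print(vldb.in_quorum, vldb.read("vol1"))  # False aggr1
try:
    vldb.write("vol2", "aggr2")
except RuntimeError as err:
    print(err)
```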
Quorum: 2 of 2
QUORUM: 2 OF 2
Marking a node as ineligible (by using the cluster modify command) means that the node no longer
affects RDB quorum or voting. If you mark the epsilon node as ineligible, epsilon is automatically given to
another node.
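Eligibility and epsilon can be added to the majority rule in a short sketch. The `ClusterQuorum` class is hypothetical, and the rule for choosing the new epsilon holder (the lowest node name) is an assumption made only for illustration. Ineligible nodes are removed from both the count and the total, and when a partition holds exactly half of the eligible nodes, only the half that holds epsilon has quorum.

```python
class ClusterQuorum:
    """Toy cluster-wide quorum check with eligibility and epsilon (hypothetical class)."""

    def __init__(self, nodes: list, epsilon: str) -> None:
        self.eligible = set(nodes)
        self.epsilon = epsilon

    def has_quorum(self, partition: set) -> bool:
        voters = partition & self.eligible  # ineligible nodes do not vote
        if 2 * len(voters) > len(self.eligible):
            return True  # simple majority
        # An exact half has quorum only if it holds epsilon (the tie-breaker).
        return 2 * len(voters) == len(self.eligible) and self.epsilon in voters

    def mark_ineligible(self, node: str) -> None:
        self.eligible.discard(node)
        if node == self.epsilon:
            # Epsilon moves to another node; choosing the lowest name is an assumption.
            self.epsilon = min(self.eligible) if self.eligible else None


q = ClusterQuorum(["node1", "node2", "node3", "node4"], epsilon="node1")
print(q.has_quorum({"node1", "node2"}))  # True: half of the nodes, plus epsilon
print(q.has_quorum({"node3", "node4"}))  # False: half of the nodes, no epsilon

q.mark_ineligible("node1")
print(q.epsilon)                         # node2
print(q.has_quorum({"node2", "node3"}))  # True: 2 of 3 eligible nodes
print(q.has_quorum({"node1", "node4"}))  # False: node1 no longer votes
```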
Two-Node Clusters
Two-node clusters are a special case:
No majority exists in the event of a cluster interconnect partition or during a failover situation.
The RDB manages this case under the covers but must be told that this cluster contains only two nodes.
cluster1::> cluster ha modify -configured true
See TR-3450 for more information.
TWO-NODE CLUSTERS
From Ron Kownacki, author of the RDB:
Basically, quorum majority doesn't work well when down to two nodes and there's a failure, so RDB is
essentially locking the fact that quorum is no longer being used and enabling a single replica to be artificially
writable during that outage.
The reason we require a quorum (a majority) is so that all committed data is durable: if you successfully
write to a majority, you know that any future majority will contain at least one instance that has seen the
change, so the update is durable. If we didn't always require a majority, we could silently lose committed
data. So in two nodes, the node with epsilon is a majority and the other is a minority, so you would only
have one-directional failover (need the majority). So epsilon gives you a way to get majorities where you
normally wouldn't have them, but it only gives unidirectional failover because it's static.
In two-node (high-availability) mode, we try to get bidirectional failover. To do this, we remove the
configuration epsilon and make both nodes equal, and form majorities artificially in the failover cases. So
quorum is two nodes available out of the total of two nodes in the cluster (no epsilon involved), but if there's
a failover, you artificially designate the survivor as the majority (and lock that fact). However, that means you
can't fail over the other way until both nodes are available, they sync up, and drop the lock; otherwise you
would be discarding data.
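The durability argument and the two-node HA rule can both be checked with a short sketch. The overlap check enumerates every strict majority of a node set and confirms that any two of them share a node; for two nodes, the only majority is both nodes, which is why a lone survivor needs either epsilon or the HA-mode lock. The `TwoNodeHA` class and its method names are hypothetical and model only the locking behavior described in the quote, not RDB source code.

```python
from itertools import combinations


def majorities(nodes: list) -> list:
    """All subsets of nodes that contain a strict majority."""
    n = len(nodes)
    return [set(c) for size in range(n // 2 + 1, n + 1) for c in combinations(nodes, size)]


def majorities_always_overlap(nodes: list) -> bool:
    # Durability argument: every two majorities share at least one node.
    sets = majorities(nodes)
    return all(a & b for a in sets for b in sets)


class TwoNodeHA:
    """Sketch of the HA-mode rule described in the quote (hypothetical class)."""

    def __init__(self) -> None:
        self.nodes = {"node1", "node2"}
        self.locked_survivor = None

    def fail_over_to(self, survivor: str) -> bool:
        # Only one survivor can be designated; the choice stays locked until resync.
        if self.locked_survivor is None:
            self.locked_survivor = survivor
        return self.locked_survivor == survivor

    def resync(self) -> None:
        # Both nodes are up and synchronized again, so the lock is dropped.
        self.locked_survivor = None

    def writable(self, node: str, up: set) -> bool:
        if self.locked_survivor is not None:
            return node == self.locked_survivor and node in up
        return up == self.nodes  # normal case: 2 of 2 nodes, no epsilon


print(majorities_always_overlap(["n1", "n2", "n3", "n4", "n5"]))  # True
print(majorities(["node1", "node2"]))  # only the full pair is a majority

ha = TwoNodeHA()
print(ha.fail_over_to("node1"), ha.writable("node1", {"node1"}))  # True True
print(ha.fail_over_to("node2"), ha.writable("node2", {"node2"}))  # False False
ha.resync()
print(ha.writable("node2", {"node1", "node2"}))  # True
```

The second failover is refused because the first survivor's lock is still held, which matches the point that failover cannot go the other way until both nodes are back and synchronized.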
Module Summary
Now that you have completed this module, you should be able to:
Show the end-to-end path of a file write request through a cluster
Answer questions about replicated database (RDB) concepts
Identify the differences between a vol0 root volume and a data virtual storage server (Vserver) root volume
Exercise
Module 4: Architecture
Time Estimate: 15 Minutes
EXERCISE
Please refer to your exercise guide.