Welcome to Data Domain System Administration.

Copyright © 2017 Dell Inc. or its subsidiaries. All Rights Reserved. Dell, EMC, and other trademarks are trademarks
of Dell Inc. or its subsidiaries. Other trademarks may be the property of their respective owners. Published in the
USA.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” DELL EMC MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO
THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR
PURPOSE.

Use, copying, and distribution of any DELL EMC software described in this publication requires an applicable software license. The trademarks, logos, and service marks
(collectively "Trademarks") appearing in this publication are the property of DELL EMC Corporation and other parties. Nothing contained in this publication should be construed
as granting any license or right to use any Trademark without the prior written permission of the party that owns the Trademark.

AccessAnywhere Access Logix, AdvantEdge, AlphaStor, AppSync ApplicationXtender, ArchiveXtender, Atmos, Authentica, Authentic Problems, Automated Resource Manager,
AutoStart, AutoSwap, AVALONidm, Avamar, Aveksa, Bus-Tech, Captiva, Catalog Solution, C-Clip, Celerra, Celerra Replicator, Centera, CenterStage, CentraStar, EMC
CertTracker. CIO Connect, ClaimPack, ClaimsEditor, Claralert ,CLARiiON, ClientPak, CloudArray, Codebook Correlation Technology, Common Information Model, Compuset,
Compute Anywhere, Configuration Intelligence, Configuresoft, Connectrix, Constellation Computing, CoprHD, EMC ControlCenter, CopyCross, CopyPoint, CX, DataBridge ,
Data Protection Suite. Data Protection Advisor, DBClassify, DD Boost, Dantz, DatabaseXtender, Data Domain, Direct Matrix Architecture, DiskXtender, DiskXtender 2000, DLS
ECO, Document Sciences, Documentum, DR Anywhere, DSSD, ECS, elnput, E-Lab, Elastic Cloud Storage, EmailXaminer, EmailXtender , EMC Centera, EMC ControlCenter,
EMC LifeLine, EMCTV, Enginuity, EPFM. eRoom, Event Explorer, FAST, FarPoint, FirstPass, FLARE, FormWare, Geosynchrony, Global File Virtualization, Graphic
Visualization, Greenplum, HighRoad, HomeBase, Illuminator , InfoArchive, InfoMover, Infoscape, Infra, InputAccel, InputAccel Express, Invista, Ionix, Isilon, ISIS,Kazeon, EMC
LifeLine, Mainframe Appliance for Storage, Mainframe Data Library, Max Retriever, MCx, MediaStor , Metro, MetroPoint, MirrorView, Mozy, Multi-Band
Deduplication,Navisphere, Netstorage, NetWitness, NetWorker, EMC OnCourse, OnRack, OpenScale, Petrocloud, PixTools, Powerlink, PowerPath, PowerSnap, ProSphere,
ProtectEverywhere, ProtectPoint, EMC Proven, EMC Proven Professional, QuickScan, RAPIDPath, EMC RecoverPoint, Rainfinity, RepliCare, RepliStor, ResourcePak,
Retrospect, RSA, the RSA logo, SafeLine, SAN Advisor, SAN Copy, SAN Manager, ScaleIO Smarts, Silver Trail, EMC Snap, SnapImage, SnapSure, SnapView, SourceOne,
SRDF, EMC Storage Administrator, StorageScope, SupportMate, SymmAPI, SymmEnabler, Symmetrix, Symmetrix DMX, Symmetrix VMAX, TimeFinder, TwinStrata, UltraFlex,
UltraPoint, UltraScale, Unisphere, Universal Data Consistency, Vblock, VCE. Velocity, Viewlets, ViPR, Virtual Matrix, Virtual Matrix Architecture, Virtual Provisioning, Virtualize
Everything, Compromise Nothing, Virtuent, VMAX, VMAXe, VNX, VNXe, Voyence, VPLEX, VSAM-Assist, VSAM I/O PLUS, VSET, VSPEX, Watch4net, WebXtender, xPression,
xPresso, Xtrem, XtremCache, XtremSF, XtremSW, XtremIO, YottaYotta, Zero-Friction Enterprise Storage.

Revision Date: March 2017

Course Number: MR-1XP-DDSADMIN

Copyright © 2017 Dell Inc. Data Domain System Administration 1


This course covers the knowledge and skills needed for configuring and maintaining Data Domain
systems.


This module focuses on some of the key features of the Data Domain Operating System (DD OS),
including deduplication, SISL and DIA, protocols used by DD OS, and how to access a Data Domain
system for administrative tasks.


This lesson covers a hardware overview of Data Domain, including current hardware models. An overview
of DD Virtual Edition is also presented here.


Dell EMC Data Domain storage systems are traditionally used for disk backup, archiving, and disaster
recovery. A Dell EMC Data Domain system can also be used for online storage, providing the user with
additional features and benefits.

A Data Domain system can connect to your network via Ethernet or Fibre Channel connections.

Data Domain systems consist of three components: a controller, disk drives, and enclosures to hold the
disk drives.

Data Domain systems use Serial Advanced Technology Attachment (SATA) disk drives and Serial
Attached SCSI (SAS) drives.


Here is the current Data Domain family. The Data Domain family includes systems for small to large
enterprise organizations.

By reducing storage requirements by 10 to 30x and archive storage requirements by up to 5x, Data
Domain systems can significantly reduce the storage footprint, from small enterprise/ROBO (Remote
Office/Branch Office) environments all the way up to large enterprise environments.
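
To make the reduction factors concrete, the following sketch converts a logical (pre-deduplication) data size into the physical capacity it would occupy. The 100 TB figure and the function name are illustrative assumptions, not values from the course material.

```python
def physical_tb(logical_tb: float, reduction_factor: float) -> float:
    """Physical capacity needed for a given logical size and reduction factor."""
    return logical_tb / reduction_factor

# Hypothetical example: 100 TB of logical backup data.
at_10x = physical_tb(100, 10)   # capacity needed at a 10x reduction
at_30x = physical_tb(100, 30)   # capacity needed at a 30x reduction

print(f"10x: {at_10x:.1f} TB, 30x: {at_30x:.1f} TB")
```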

The models currently shipping with DD OS 6.0 are:


• DD2200
• DD6300
• DD6800
• DD9300
• DD9800

Also available are the ES30 and DS60 expansion shelves that can be added to most Data Domain
systems for additional storage capacity.


This is the basic topology of a typical Data Domain implementation.

The Data Domain system (controller and any additional expansion shelves) is connected to storage
applications by means of VTL via Fibre Channel, or CIFS or NFS via Ethernet.

In the exploded view diagram, the Data Domain controller sits at the center of the topology implemented
through additional connectivity and system configuration, including:
• Expansion shelves for additional storage, depending on the model and site requirements
• Media server Virtual Tape Library storage via Fibre Channel
• LAN environments for connectivity for Ethernet based data storage, for basic data interactions,
and for Ethernet-based system management


Storage configuration features allow you to add and remove storage expansion enclosures from the
active, retention, and cloud tiers. Storage in an expansion enclosure (which is sometimes called an
expansion shelf) is not available for use until it is added to a tier.

For both active and retention tiers, DD OS 5.2 and later releases support ES30 shelves. DD OS 5.7 and
later support DS60 shelves.


The FS15 SSD shelf is a solid state expansion shelf used exclusively for the metadata cache in the active
or extended retention tiers of a Data Domain system.

It uses the same form factor as the earlier ES30 expansion shelves and offers different quantities of 800
GB SAS solid state drives depending on the capacity of the active tier.

With a DD9800, the FS15 can be configured as required with either 8 or 15 disks.

When configured for high availability, the DD6800 requires 2 or 5 disks and DD9300 models require 5 or 8
disks.

The FS15 SSD shelf is always counted in the number of ES30 shelf maximums but since it is only used
for metadata, it does not affect capacity.

The SSD shelf for metadata is not supported for ER and Cloud Tier use cases.


What is Data Domain Virtual Edition or DD VE?

Just like a traditional Data Domain appliance, DD VE is a data protection appliance, with one primary
difference: it has no Data Domain hardware tied to it. DD VE is a software-only virtual deduplication
appliance that provides data protection in an enterprise environment. It is intended to be used as a
cost-effective solution in customer remote and branch offices.

DD VE 3.0 is supported on Microsoft Hyper-V and VMware ESXi versions 5.1, 5.5, and 6.0.


This lesson covers a software overview of Data Domain, including deduplication fundamentals, SISL, DIA,
the Data Domain file system and supported protocols.


The latest Data Domain Operating System (DD OS) has several features and benefits, including:
• Support for backup, file archiving, and email archiving applications
• Simultaneous use of VTL, CIFS, NFS, NDMP, and Dell EMC Data Domain Boost protocols

• Data Domain Secure Multi-tenancy (SMT) is the simultaneous hosting, by an internal IT department or
an external provider, of an IT infrastructure for more than one consumer or workload (business unit,
department, or Tenant).

• SMT provides the ability to securely isolate many users and workloads in a shared infrastructure, so
that the activities of one Tenant are not apparent or visible to the other Tenants.

• Conformance with IT governance and regulatory compliance standards for archived data


There are many powerful features and capabilities to the Data Domain system. They are all concerned
with backing up data and taking up as little storage space as possible. They are also concerned with the
speed of the backup process and maintaining the reliability and integrity of the data that is backed up and
stored.


Deduplication is similar to data compression, but it looks for redundancy of large sequences of bytes.
Sequences of bytes identical to those previously encountered and stored are replaced with references to
the previously encountered data.

This is all hidden from users and applications. When the data is read, the original data is provided to the
application or user.

Deduplication performance depends on the amount of data, bandwidth, disk speed, CPU, and
memory of the hosts and devices performing the deduplication.

When processing data, deduplication recognizes data that is identical to previously stored data. When it
encounters such data, deduplication creates a reference to the previously stored data, thus avoiding
storing duplicate data.


Deduplication typically uses hashing algorithms.

Hashing algorithms yield a value, based on the content of the data being hashed, that is unique for
practical purposes. This value is called the hash or fingerprint, and is much smaller in size than the
original data.

Identical data always yields the same hash, and different data contents yield different hashes with
overwhelming probability; each hash can be checked against previously stored hashes.
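
As a minimal sketch of the fingerprint idea, the example below uses SHA-1 from the Python standard library. The choice of SHA-1 and the sample data are assumptions made for illustration; the course does not state which hash DD OS uses.

```python
import hashlib

def fingerprint(data: bytes) -> bytes:
    """Return a fixed-size fingerprint of the data (SHA-1, 20 bytes)."""
    return hashlib.sha1(data).digest()

original = b"A" * 1_000_000            # one megabyte of data
copy = bytes(original)                 # identical content
changed = original[:-1] + b"B"         # a single byte differs

fp_original = fingerprint(original)
fp_copy = fingerprint(copy)
fp_changed = fingerprint(changed)

print(len(fp_original))                # fingerprint size in bytes
print(fp_original == fp_copy)          # identical content, identical fingerprint
print(fp_original == fp_changed)       # different content, different fingerprint
```

Comparing 20-byte fingerprints is far cheaper than comparing the megabyte-sized inputs directly, which is what makes the check against previously stored hashes practical.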


There are three Deduplication methods:

• File-Based is one method.

• Fixed-Length and Variable-Length are the other two methods and are Segment-Based.


In file-based deduplication, only the original instance of a file is stored. Future identical copies of the file
use a small reference to point to the original file content. File-based deduplication is sometimes called
single-instance storage (SIS).

File-based deduplication enables storage savings. It can be combined with compression (a way to
transmit the same amount of data in fewer bits) for additional storage savings. It is popular in desktop
backups. It can be more effective for data restores. It doesn’t need to re-assemble files. It can be included
in backup software, so an organization doesn’t have to depend on a vendor disk.

File-based deduplication results are often not as great as with other types of deduplication (such as block-
and segment-based deduplication). The most important disadvantage is there is no deduplication with
previously backed up files if the file is modified.

File-based deduplication stores an original version of a file and creates a digital signature for it (such
as a SHA-1 hash of the file contents). Future exact copies of the file are not stored again; they are
replaced with pointers to the original, which is identified by its signature.
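
The behavior described above, including the main disadvantage, can be sketched as follows. The `FileStore` class, its method names, and the file names are hypothetical and exist only for this illustration.

```python
import hashlib

class FileStore:
    """Toy single-instance store: one copy per distinct file content."""

    def __init__(self):
        self.contents = {}   # signature -> file bytes, stored once
        self.catalog = {}    # file name -> signature (the small reference)

    def backup(self, name: str, data: bytes) -> bool:
        """Record the file; return True only if its content had to be stored."""
        signature = hashlib.sha1(data).hexdigest()
        is_new = signature not in self.contents
        if is_new:
            self.contents[signature] = data
        self.catalog[name] = signature
        return is_new

    def restore(self, name: str) -> bytes:
        return self.contents[self.catalog[name]]

    def stored_bytes(self) -> int:
        return sum(len(d) for d in self.contents.values())

store = FileStore()
report = b"quarterly report " * 1000
store.backup("mon/report.doc", report)          # first copy: stored
store.backup("tue/report.doc", report)          # identical copy: reference only
store.backup("wed/report.doc", report + b"!")   # one byte added: stored in full again

print(store.stored_bytes(), "bytes stored for", len(store.catalog), "files")
```

The third backup differs from the first by a single byte, yet the whole file is stored a second time. This is the limitation that segment-based methods address.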


Fixed-length segment deduplication (also called fixed block-based deduplication) reduces data storage
requirements by comparing incoming data segments (also called fixed data blocks or data chunks) with
previously stored data segments. It divides data into segments of a single, fixed length (for example,
4 KB, 8 KB, 12 KB, or larger).

Fixed-length segment deduplication reads data and divides it into fixed-size segments. These segments
are compared to other segments already processed and stored. If the segment is identical to a previous
segment, a pointer is used to point to that previous segment.

For data that is identical (does not change), fixed-length segment deduplication reduces storage
requirements.

When data is altered the segments shift, causing more segments to be stored. For example, when you
add a slide to a Microsoft PowerPoint deck, all subsequent blocks in the file are rewritten and are likely to
be considered as different from those in the original file, so the deduplication effect is less significant.
Smaller blocks get better deduplication than large ones, but it takes more resources to deduplicate.

In backup applications, the backup stream consists of many files. The backup streams are rarely entirely
identical even when they are successive backups of the same file system. A single addition, deletion, or
change of any file changes the number of bytes in the new backup stream. Even if no file has changed,
adding a new file to the backup stream shifts the rest of the backup stream. Fixed-sized segment
deduplication backs up large numbers of segments because of the new boundaries between the
segments.
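
The boundary-shift problem can be demonstrated with a short sketch. The 4 KB segment size, the random test data, and the function names are assumptions chosen for the illustration.

```python
import hashlib
import random

def fixed_segments(data: bytes, size: int) -> list:
    """Split data into consecutive segments of a fixed size."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def count_new_segments(old: bytes, new: bytes, size: int) -> int:
    """Count segments of `new` whose fingerprints were not seen in `old`."""
    known = {hashlib.sha1(s).digest() for s in fixed_segments(old, size)}
    return sum(1 for s in fixed_segments(new, size)
               if hashlib.sha1(s).digest() not in known)

rng = random.Random(0)
old = bytes(rng.getrandbits(8) for _ in range(64 * 1024))   # 16 segments of 4 KB

# Change one byte in place: segment boundaries stay where they were.
overwritten = old[:30000] + bytes([old[30000] ^ 0xFF]) + old[30001:]

# Insert one byte at the front: every boundary shifts by one byte.
prepended = b"A" + old

print(count_new_segments(old, overwritten, 4096))   # only the touched segment is new
print(count_new_segments(old, prepended, 4096))     # every segment looks new
```

An in-place change costs one segment, but a one-byte insertion makes all segments differ from their stored counterparts even though almost all of the data is unchanged.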


Variable-length segment deduplication evaluates data by examining its contents to look for the boundary
from one segment to the next. Variable-length segments are any number of bytes within a range
determined by the particular algorithm implemented.

Unlike fixed-length segment deduplication, variable-length segment deduplication uses the content of the
stream to divide the backup or data stream into segments based on the contents of the data stream.

When you apply variable-length segmentation to a data sequence, deduplication uses variable data
segments when it looks at the data sequence. In this example, byte A is added to the beginning of the
data. Only one new segment needs to be stored, since the data defining boundaries between the
remaining data were not altered.

Eventually, variable-length segment deduplication will find the segments that have not changed, and
back up fewer segments than fixed-size segment deduplication. Even for storing individual files,
variable-length segments have an advantage. Many files are very similar to, but not identical to, other
versions of the same file. Variable-length segments will isolate the changes, find more identical
segments, and store fewer segments than fixed-length deduplication.
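
A minimal sketch of content-defined segmentation follows. It places a boundary wherever a checksum of the last 16 bytes meets a fixed condition, so boundaries move with the data. This is a generic illustration of the technique and not the Data Domain algorithm; the window size, the divisor, and the use of CRC-32 are all assumptions.

```python
import hashlib
import random
import zlib

WINDOW = 16      # bytes examined when deciding on a boundary (assumed value)
DIVISOR = 256    # yields segments of roughly 256 bytes on average (assumed value)

def variable_segments(data: bytes) -> list:
    """Cut after any position whose trailing WINDOW bytes satisfy the condition."""
    segments, start = [], 0
    for end in range(WINDOW, len(data) + 1):
        if zlib.crc32(data[end - WINDOW:end]) % DIVISOR == 0:
            segments.append(data[start:end])
            start = end
    if start < len(data):
        segments.append(data[start:])   # trailing data after the last boundary
    return segments

def count_new_segments(old: bytes, new: bytes) -> int:
    """Count segments of `new` whose fingerprints were not seen in `old`."""
    known = {hashlib.sha1(s).digest() for s in variable_segments(old)}
    return sum(1 for s in variable_segments(new)
               if hashlib.sha1(s).digest() not in known)

rng = random.Random(0)
old = bytes(rng.getrandbits(8) for _ in range(20_000))
prepended = b"A" + old          # byte A added to the beginning of the data

print(len(variable_segments(old)), "segments in the original stream")
print(count_new_segments(old, prepended), "new segment(s) after prepending one byte")
```

Because each boundary depends only on the bytes immediately before it, the inserted byte disturbs only the start of the stream: one or two leading segments are new, and every later segment matches one already stored.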


With Data Domain inline deduplication, incoming data is examined as soon as it arrives to determine if a
segment is new or unique or a duplicate of a segment previously stored. Inline deduplication occurs in
RAM before the data is written to disk. Around 99% of data segments are analyzed in RAM without disk
access.

The process is shown in this slide, as follows:


• Inbound segments are analyzed in RAM.
• The stream is divided into segments, and each segment is given a unique ID.
• If a segment is redundant, a reference to the stored segment is created.
• If a segment is unique, it is compressed and stored.

Inline deduplication requires less disk space than post-process deduplication. With post-process
deduplication, files are written to disk first, then they are scanned and compressed.

There is less administration for an inline deduplication process, as the administrator does not need to
define and monitor the staging space.

Inline deduplication analyzes the data in RAM, and reduces disk seek times to determine if the new data
must be stored. Writes from RAM to disk are done in full-stripe batches to use the disk more efficiently,
reducing disk access.


When the deduplication occurs where data is created, it is often referred to as source-based deduplication,
whereas when it occurs where the data is stored, it is commonly called target-based deduplication.

Source-based deduplication
• Occurs where data is created.
• Uses a host-resident agent, or API, that reduces data at the server source and sends just changed
data over the network.
• Reduces the data stream prior to transmission, thereby reducing bandwidth usage.
• DD Boost is designed to offload part of the Data Domain deduplication process to a backup server
or application client, thus using source-based deduplication.

Target-based deduplication
• Occurs where the data is stored.
• Is controlled by a storage system, rather than a host.
• Provides an excellent fit for a virtual tape library (VTL) without substantial disruption to existing
backup software infrastructure and processes.
• Works best for high change-rate environments.
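
The bandwidth saving of source-based deduplication can be sketched with a toy exchange in which the source first sends fingerprints and then only the segments the target lacks. This is a generic illustration and does not represent the DD Boost protocol; the `Target` class and the function names are hypothetical.

```python
import hashlib

class Target:
    """Stand-in for the storage system that holds deduplicated segments."""

    def __init__(self):
        self.segments = {}   # fingerprint -> segment bytes

    def missing(self, fingerprints: list) -> list:
        """Report which fingerprints the target has not stored yet."""
        return [fp for fp in fingerprints if fp not in self.segments]

    def receive(self, segments: list) -> None:
        for seg in segments:
            self.segments[hashlib.sha1(seg).digest()] = seg

def source_side_backup(segments: list, target: Target) -> int:
    """Send only segments the target lacks; return segment bytes transmitted."""
    by_fp = {hashlib.sha1(seg).digest(): seg for seg in segments}
    needed = target.missing(list(by_fp))
    payload = [by_fp[fp] for fp in needed]
    target.receive(payload)
    return sum(len(seg) for seg in payload)

target = Target()
first_run = source_side_backup([b"aaaa", b"bbbb"], target)    # nothing stored yet
second_run = source_side_backup([b"aaaa", b"cccc"], target)   # b"aaaa" already stored

print(first_run, second_run)
```

In a target-based design, both runs would send all 8 bytes over the network and the storage system would discard the duplicate on arrival.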


Dell EMC Data Domain SISL™ Scaling Architecture is also called:
• Stream-Informed Segment Layout (SISL) scaling architecture
• SISL scaling architecture
• SISL architecture
• SISL technology

SISL architecture helps to speed up Data Domain systems.

SISL is used to implement Dell EMC Data Domain inline deduplication. SISL uses fingerprints and RAM to
identify segments already on disk.

SISL architecture provides fast and efficient deduplication by avoiding excessive disk reads to check if a
segment is on disk:
• 99% of duplicate data segments are identified inline in RAM before they are stored to disk.
• Scales with Data Domain systems using newer and faster CPUs and RAM.
• Increases new-data processing throughput-rate.


SISL does the following:
• Segment
The data is split into variable-length segments.
• Fingerprint
Each segment is given a fingerprint, or hash, for identification. It compares against other hashes in
the Summary Vector Array. It does not compare all hashes.
• Filter
The summary vector and segment locality techniques identify 99% of the duplicate segments in
RAM, inline, before storing to disk. If a segment is a duplicate, it is referenced and discarded. If a
segment is new, the data moves on to the Compress step.
• Compress
New segments are grouped and compressed using common algorithms: lz, gz, gzfast, or off/no
compression (lz by default).
• Write
Writes data (segments, fingerprints, metadata and logs) to containers stored on disk.
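
The five steps can be sketched as a toy pipeline. Several things here are assumptions made for illustration: the summary vector is modeled as a small Bloom-filter-style bit array, segment locality is not modeled, zlib stands in for the local compression algorithm, and the input is assumed to be already split into segments. None of the class or method names come from DD OS.

```python
import hashlib
import zlib

class SummaryVector:
    """In-RAM bit array answering 'definitely new' or 'possibly stored'."""

    def __init__(self, bits: int = 1 << 16):
        self.bits = bits
        self.array = bytearray(bits // 8)

    def _positions(self, fp: bytes) -> list:
        # Derive three bit positions from different parts of the fingerprint.
        return [int.from_bytes(fp[i:i + 4], "big") % self.bits for i in (0, 4, 8)]

    def add(self, fp: bytes) -> None:
        for p in self._positions(fp):
            self.array[p // 8] |= 1 << (p % 8)

    def might_contain(self, fp: bytes) -> bool:
        return all(self.array[p // 8] & (1 << (p % 8)) for p in self._positions(fp))

class Pipeline:
    def __init__(self):
        self.summary = SummaryVector()
        self.index = {}          # fingerprint -> container number (stand-in for on-disk index)
        self.containers = []     # each entry: compressed bytes of one batch of new segments
        self.index_lookups = 0   # how often the slower index had to be consulted

    def ingest(self, segments: list) -> list:
        """Fingerprint, filter, compress, and write; return the list of fingerprints."""
        recipe, new_segments = [], []
        for seg in segments:
            fp = hashlib.sha1(seg).digest()               # Fingerprint
            recipe.append(fp)
            if self.summary.might_contain(fp):            # Filter
                self.index_lookups += 1
                duplicate = fp in self.index
            else:
                duplicate = False                         # definitely new, no lookup needed
            if not duplicate:
                self.summary.add(fp)
                self.index[fp] = len(self.containers)
                new_segments.append(seg)
        if new_segments:
            packed = zlib.compress(b"".join(new_segments))  # Compress
            self.containers.append(packed)                  # Write
        return recipe

pipeline = Pipeline()
monday = pipeline.ingest([b"a" * 100, b"b" * 100])
tuesday = pipeline.ingest([b"a" * 100, b"c" * 100])
print(len(pipeline.index), "unique segments in", len(pipeline.containers), "containers")
```

In this sketch, the bit array lets a new segment skip the index lookup entirely, which is the kind of disk access the summary vector is meant to avoid.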


Dell EMC Data Domain Global Compression™ is the Dell EMC Data Domain trademarked name for
deduplication. It identifies previously stored segments and cannot be turned off.

Local compression compresses segments before writing them to disk. It uses common, industry-standard
algorithms (for example, lz, gz, and gzfast). The default compression algorithm used by Data Domain
systems is lz.

Local compression is similar to zipping a file to reduce the file size. Zip is a file format used for data
compression and archiving. A zip file contains one or more files that have been compressed, to reduce file
size, or stored as is. The zip file format permits a number of compression algorithms. Local compression
can be turned off.


Dell EMC Data Domain Data Invulnerability Architecture (DIA), is an important Dell EMC Data Domain
technology that provides safe and reliable storage. It provides this through end-to-end verification, fault
avoidance and containment as well as fault detection and healing. This technology ensures reliable file
system recovery.


The end-to-end verification check verifies all file system data and metadata. The end-to-end verification
flow:
• Receives a write request from backup software.
• Analyzes data for redundancy.
• Stores new data segments.
• Stores fingerprints.
• Verifies, after backup I/O, that the Data Domain OS (DD OS) can read the data from disk and
through the Data Domain file system.
• Verifies that the checksum that is read back matches the checksum written to disk.

If the checksum read back does not match the checksum written to disk, the system will attempt to
reconstruct the data. If the data cannot be successfully reconstructed, the backup will fail and an alert will
be issued.

Since every component of a storage system can introduce errors, an end-to-end test is the simplest way to
ensure data integrity. End-to-end verification means reading data after it is written and comparing it to
what was sent to disk, proving that it is reachable through the file system to disk, and proving that data is
not corrupted.


When the DD OS receives a write request from backup software, it computes a strong checksum over the
constituent data. After analyzing the data for redundancy, it stores the new data segments and all of the
checksums. After the backup I/O has completed and all data is synced to disk, the DD OS verifies that it
can read the entire file from the disk platter and through the Data Domain file system, and that the
checksums of the data read back match the checksums of the written data.

This ensures that the data on the disks is readable and correct and that the file system metadata
structures used to find the data are also readable and correct. This confirms that the data is correct and
recoverable from every level of the system. If there are problems anywhere, for example if a bit flips on a
disk drive, it is caught. Mostly, a problem is corrected through self-healing. If a problem can’t be corrected,
it is reported immediately, and a backup is repeated while the data is still valid on the primary store.
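
The write-then-verify flow can be sketched as follows. The dictionary-based "disk", the 4-byte block size, and the use of SHA-256 as the checksum are assumptions for the illustration; the `VerifyingStore` class is hypothetical.

```python
import hashlib

class VerifyingStore:
    """Toy store that re-reads data through its own metadata after every write."""

    def __init__(self):
        self.disk = {}       # block address -> bytes (stand-in for the disks)
        self.metadata = {}   # file name -> (block addresses, checksum taken at write time)

    def write(self, name: str, data: bytes, block: int = 4) -> bool:
        checksum = hashlib.sha256(data).hexdigest()   # computed when the request arrives
        addresses = []
        for i in range(0, len(data), block):
            addr = len(self.disk)                     # always a fresh address
            self.disk[addr] = data[i:i + block]
            addresses.append(addr)
        self.metadata[name] = (addresses, checksum)
        return self.verify(name)                      # read back before reporting success

    def verify(self, name: str) -> bool:
        """Follow the metadata to the blocks and compare checksums."""
        addresses, checksum = self.metadata[name]
        read_back = b"".join(self.disk[a] for a in addresses)
        return hashlib.sha256(read_back).hexdigest() == checksum

store = VerifyingStore()
ok_after_write = store.write("backup.img", b"hello world")

store.disk[0] = b"XXXX"                               # simulate corruption on disk
ok_after_corruption = store.verify("backup.img")

print(ok_after_write, ok_after_corruption)
```

The check exercises both the stored blocks and the metadata used to locate them, so a fault in either one causes the comparison to fail.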


Data Domain systems are equipped with a specialized log-structured file system that has important
benefits.

1. New data never overwrites existing data. (The system never puts existing data at risk.)

Traditional file systems often overwrite blocks when data changes, and then use the old block address.
The Data Domain file system writes only to new blocks. This isolates any incorrect overwrite (a software
bug problem) to only the newest backup data. Older versions remain safe.

As shown in this slide, the container log never overwrites or updates existing data. New data is written to
new containers. Old containers and references remain in place and safe even when software bugs or
hardware faults occur when new backups are stored.

2. There are fewer complex data structures.

In a traditional file system, there are many data structures (for example, free block bit maps and
reference counts) that support fast block updates. In a backup application, the workload is primarily
sequential writes of new data. Because a Data Domain system is simpler, it requires fewer data structures
to support it. New writes never overwrite old data. This design simplicity greatly reduces the chances of
software errors that could lead to data corruption.
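
A minimal sketch of the no-overwrite behavior follows. The `ContainerLog` class and the namespace dictionary are hypothetical; they only illustrate that an update appends a new container and moves a reference, leaving the old container untouched.

```python
class ContainerLog:
    """Append-only log: containers are added, never modified in place."""

    def __init__(self):
        self._containers = []

    def append(self, data: bytes) -> int:
        """Write data to a new container and return its ID."""
        self._containers.append(bytes(data))
        return len(self._containers) - 1

    def read(self, container_id: int) -> bytes:
        return self._containers[container_id]

log = ContainerLog()
namespace = {}                                       # file name -> current container ID

namespace["backup.img"] = log.append(b"version 1")
old_id = namespace["backup.img"]

# "Updating" the file writes a new container and repoints the name.
namespace["backup.img"] = log.append(b"version 2")

print(log.read(old_id))                              # the older data is still intact
print(log.read(namespace["backup.img"]))
```

The log offers no operation that rewrites an existing container, so a fault while storing the newer version cannot damage the older one.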


The system includes non-volatile RAM (NVRAM) for fast, safe restarts.

The system includes a non-volatile RAM (NVRAM) write buffer into which it puts all data not yet safely on
disk. The file system leverages the security of this write buffer to implement a fast, safe restart capability.

The file system includes many internal logic and data structure integrity checks. If a problem is found by
one of these checks, the file system restarts. The checks and restarts provide early detection and recovery
from the kinds of bugs that can corrupt data. As it restarts, the Data Domain file system verifies the
integrity of the data in the NVRAM buffer before applying it to the file system and thus ensures that no data
is lost due to a power outage.

In a traditional RAID system that updates a single block within a stripe, a power outage during the
update could leave the stripe inconsistent: the old data could be lost and a recovery attempt could fail.
For this reason, Data Domain systems never update just one block in a stripe. Following the no-overwrite
policy, all new writes go to new RAID stripes, and those new RAID stripes are written in their entirety. The
verification-after-write ensures that the new stripe is consistent (there are no partial stripe writes). New
writes never put existing backups at risk.


Continuous fault detection and healing provide an extra level of protection within the Data Domain
operating system. The DD OS detects faults and recovers from them continuously. Continuous fault
detection and healing ensures successful data restore operations.

Here is the flow for continuous fault detection and healing:


• The Data Domain system periodically rechecks the integrity of the RAID stripes and container logs.
• The Data Domain system uses RAID system redundancy to heal faults. RAID 6 is the foundation for
continuous fault detection and healing on Data Domain systems. Its dual-parity architecture offers
advantages over conventional architectures, including RAID 1 (mirroring) and the RAID 3, RAID 4, or
RAID 5 single-parity approaches.

RAID 6:
– Protects against two disk failures.
– Protects against disk read errors during reconstruction.
– Protects against the operator pulling the wrong disk.
– Guarantees RAID stripe consistency even during power failure without reliance on NVRAM or
an uninterruptible power supply (UPS).
– Verifies data integrity and stripe coherency after writes.

By comparison, after a single disk fails in other RAID architectures, any further simultaneous
disk errors cause data loss. A system whose focus is data protection must include the extra
level of protection that RAID 6 provides.


During every read, data integrity is re-verified.
Any errors are healed as they are encountered.
To ensure that all data returned to the user during a restore is correct, the Data Domain file system stores
all of its on-disk data structures in formatted data blocks. These are self-identifying and covered by a
strong checksum. On every read from disk, the system first verifies that the block read from disk is the
block expected. It then uses the checksum to verify the integrity of the data. If any issue is found, it asks
RAID 6 to use its extra level of redundancy to correct the data error. Because the RAID stripes are never
partially updated, their consistency is ensured and thus so is the ability to heal an error when it is
discovered.
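
The read path described above can be sketched with a block format that carries its own identifier and checksum. The layout, the use of CRC-32 as a stand-in for a strong checksum, and the modeling of RAID 6 redundancy as a second copy are all simplifying assumptions; the function names are hypothetical.

```python
import struct
import zlib

def pack_block(block_id: int, payload: bytes) -> bytes:
    """Build a self-identifying block: checksum, then block ID, then payload."""
    body = struct.pack(">Q", block_id) + payload
    return struct.pack(">I", zlib.crc32(body)) + body

def unpack_block(expected_id: int, raw: bytes) -> bytes:
    """Verify integrity and identity before returning the payload."""
    (stored_crc,) = struct.unpack(">I", raw[:4])
    body = raw[4:]
    if zlib.crc32(body) != stored_crc:
        raise ValueError("checksum mismatch")
    (block_id,) = struct.unpack(">Q", body[:8])
    if block_id != expected_id:
        raise ValueError("not the block that was expected")
    return body[8:]

def read_with_healing(expected_id: int, primary: bytes, redundant: bytes) -> bytes:
    """Fall back to redundant data when the primary read fails verification."""
    try:
        return unpack_block(expected_id, primary)
    except ValueError:
        return unpack_block(expected_id, redundant)

good = pack_block(7, b"segment data")
damaged = bytearray(good)
damaged[-1] ^= 0x01                                   # flip one bit in the payload

print(read_with_healing(7, bytes(damaged), good))     # recovered from redundancy
```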

Continuous error detection works well for data being read, but it does not address issues with data that
may be unread for weeks or months before being needed for a recovery. For this reason, Data Domain
systems actively re-verify the integrity of all data every week in an ongoing background process. This
scrub process finds and repairs defects on the disk before they can become a problem.


The Dell EMC Data Domain Data Invulnerability Architecture (DIA) file system recovery is a feature that
reconstructs lost or corrupted file system metadata. It includes file system check tools.

If a Data Domain system does have a problem, DIA file system recovery ensures that the system is
brought back online quickly.

This slide shows DIA file system recovery:


• Data is written in a self-describing format.
• The file system can be recreated by scanning the logs and rebuilding it from metadata stored with
the data.
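
The two points above can be sketched with a log whose records carry their own name and length, so that an index can be rebuilt from the log alone. The record layout and the function names are assumptions made for this illustration.

```python
import struct

HEADER = ">HI"                          # name length (2 bytes), data length (4 bytes)
HEADER_SIZE = struct.calcsize(HEADER)

def append_record(log: bytearray, name: str, data: bytes) -> None:
    """Append a self-describing record: header, name, then data."""
    encoded = name.encode()
    log += struct.pack(HEADER, len(encoded), len(data)) + encoded + data

def rebuild_index(log: bytes) -> dict:
    """Scan the log and recover name -> (data offset, data length)."""
    index, pos = {}, 0
    while pos < len(log):
        name_len, data_len = struct.unpack_from(HEADER, log, pos)
        pos += HEADER_SIZE
        name = log[pos:pos + name_len].decode()
        pos += name_len
        index[name] = (pos, data_len)   # later records supersede earlier ones
        pos += data_len
    return index

log = bytearray()
append_record(log, "a.txt", b"111")
append_record(log, "b.txt", b"22")
append_record(log, "a.txt", b"3333")    # newer version appended, not overwritten

index = rebuild_index(bytes(log))       # as if the original index had been lost
offset, length = index["a.txt"]
print(bytes(log[offset:offset + length]))
```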

In a traditional file system, consistency is not checked. Data Domain systems check through initial
verification after each backup to ensure consistency for all new writes. The usable size of a traditional file
system is often limited by the time it takes to recover the file system in the event of some sort of
corruption.

Imagine running fsck on a traditional file system with more than 80 TB of data. The reason the checking
process can take so long is the file system needs to sort out the locations of the free blocks so new writes
do not accidentally overwrite existing data. Typically, this entails checking all references to rebuild free
block maps and reference counts. The more data in the system, the longer this takes.

In contrast, since the Data Domain file system never overwrites existing data and doesn’t have block
maps and reference counts to rebuild, it has to verify only the location of the head of the log (usually the
start of the last completed write) to safely bring the system back online and restore critical data.


Two main components of the Data Domain file system are the administrative files, stored in /ddvar, and
the file storage, organized into MTrees.


Data Domain system administrative files are stored in /ddvar. This directory stores system core and log
files, generated support upload bundles, compressed core files, and .rpm (Red Hat package manager)
upgrade package files.

The ddvar file structure keeps administrative files separate from storage files.

You cannot rename or delete /ddvar, nor can you access all of its sub-directories.


An MTree is a logical partition of the Data Domain file system. MTrees act as destination directories for
deduplicated data. MTree operations can be performed on a specific MTree as opposed to the entire file
system.

The MTree file structure:


• Uses compression.
• Implements data integrity.
• Reclaims storage space with file-system cleaning. You will learn more about file-system cleaning
later in this course.

MTrees provide more granular space management and reporting. This allows for finer management of
replication, snapshots, and retention locking. These operations can be performed on a specific MTree
rather than on the entire file system. For example, you can configure directory export levels to separate
and organize backup files.

You can add subdirectories to MTree directories. You cannot add anything to the /data directory. /col1
cannot be changed; however, MTrees can be added under it. The backup MTree
(/data/col1/backup) cannot be deleted or renamed. If MTrees are added, they can be renamed and
deleted. You can replicate directories under /backup.


Here is a reference table of MTree Limits for specific Data Domain systems, DD OS versions, supported
configurable MTrees and supported concurrently active MTrees.


All Data Domain systems can be configured as storage destinations for leading backup and archiving
applications using NFS, CIFS, Boost, or VTL protocols:

• Network File System (NFS) clients can have access to the system directories or MTrees on the Data
Domain system.

• Common Internet File System (CIFS) clients also have access to the system directories on the Data
Domain system.

• Dell EMC Data Domain Virtual Tape Library (VTL) is a disk-based backup system that emulates the
use of physical tapes. It enables backup applications to connect to and manage DD system storage
using functionality almost identical to a physical tape library. VTL is a licensed feature, and you
must use either NDMP (Network Data Management Protocol) over IP (Internet Protocol) or
VTL directly over FC (Fibre Channel).

• Data Domain Boost (DD Boost) software provides advanced integration with backup and enterprise
applications for increased performance and ease of use. DD Boost distributes parts of the deduplication
process to the backup server or application clients, enabling client-side deduplication for faster, more
efficient backup and recovery. DD Boost software is an optional product that requires a separate
license to operate on the Data Domain system.



This lesson covers connecting Data Domain through different data paths.



Data paths specify how a Data Domain system fits into a typical backup environment.

Data Domain data paths include NFS, CIFS, DD Boost, NDMP, and VTL over Ethernet or Fibre
Channel.



Data Domain systems connect to backup servers as storage capacity to hold large collections of backup
data. This slide shows how a Data Domain system integrates non-intrusively into an existing storage
environment. Often a Data Domain system is connected directly to a backup server. The backup data flow
from the clients is simply redirected to the Data Domain device instead of to a tape library.

Data Domain systems integrate non-intrusively into typical backup environments and reduce the amount
of storage needed to back up large amounts of data by performing deduplication and compression on data
before writing it to disk. The data footprint is reduced, making it possible for tapes to be partially or
completely replaced.

Depending on an organization’s policies, a tape library can be either removed or retained.

An organization can replicate and vault duplicate copies of data when two Data Domain systems have the
Data Domain Replicator software option enabled.



A data path is the path that data travels from the backup (or archive) servers to a Data Domain system.
Data Domain systems use Ethernet and Fibre Channel.

An Ethernet data path supports the NFS, CIFS, NDMP, and DD Boost protocols that a Data Domain
system uses to move data.

In the data path over Ethernet, backup and archive servers send data from clients to Data Domain
systems on the network via TCP/IP or UDP/IP.

You can also use a direct connection between a dedicated port on the backup or archive server and a
dedicated port on the Data Domain system. The connection between the backup (or archive) server and
the Data Domain system can be Ethernet or Fibre Channel, or both if needed. This slide shows the
Ethernet connection.



If the Data Domain virtual tape library (VTL) option is licensed and a Fibre Channel host bus adapter
(HBA) is installed on the Data Domain system, the system can be connected to a Fibre Channel storage
area network (SAN). The backup or archive server sees the Data Domain system as one or multiple
VTLs with up to 512 virtual Linear Tape-Open (LTO-1, LTO-2, LTO-3, LTO-4, or LTO-5) tape drives and
20,000 virtual slots across up to 100,000 virtual cartridges.

VTL requires a Fibre Channel data path. DD Boost uses either a Fibre Channel or an Ethernet data path.
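
The protocol-to-data-path relationships in this lesson can be summarized as a small lookup table. This is a sketch for study purposes only: the dictionary and function names are hypothetical, and the mapping simply restates what these notes say (NFS, CIFS, and NDMP over Ethernet; VTL over Fibre Channel; DD Boost over either).

```python
# Which physical data paths each protocol uses, as stated in this lesson.
DATA_PATHS = {
    "NFS": {"Ethernet"},
    "CIFS": {"Ethernet"},
    "NDMP": {"Ethernet"},
    "DD Boost": {"Ethernet", "Fibre Channel"},
    "VTL": {"Fibre Channel"},
}


def protocols_for(transport: str) -> list[str]:
    """Return the protocols that can run over the given data path, sorted by name."""
    return sorted(name for name, paths in DATA_PATHS.items() if transport in paths)


if __name__ == "__main__":
    print("Ethernet:", protocols_for("Ethernet"))
    print("Fibre Channel:", protocols_for("Fibre Channel"))
```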



This lesson covers the Command Line Interface (CLI), Data Domain System Manager and Data Domain
Management Center.



There are three ways to administer a Data Domain system: the command line interface (CLI),
the System Manager GUI, or the Data Domain Management Center.



The Dell EMC Data Domain command line interface (CLI) enables you to manage Data Domain systems.

To access the Data Domain system for the first time, use the default administrator username and
password. The default administrator name is sysadmin. The initial password for the sysadmin user is the
system serial number.

After the initial configuration, use SSH or Telnet (if enabled) to access the system remotely and
open the CLI.

The DD OS Command Reference Guide provides information for using the commands to accomplish
specific administration tasks. Each command also has an online help page that gives the complete
command syntax. Help pages are available at the CLI using the help command. Any Data Domain system
command that accepts a list (such as a list of IP addresses) accepts entries separated by commas, by
spaces, or both.
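
The list-separator rule is easy to picture with a short sketch. The function below is not part of DD OS; it is a hypothetical helper that shows how a list written with commas, spaces, or both resolves to the same entries. The IP addresses are made-up examples.

```python
import re


def split_list_argument(raw: str) -> list[str]:
    """Split a list argument on commas, whitespace, or any mix of the two."""
    return [item for item in re.split(r"[,\s]+", raw.strip()) if item]


if __name__ == "__main__":
    # All three spellings yield the same two entries.
    for text in ("10.0.0.1,10.0.0.2", "10.0.0.1 10.0.0.2", "10.0.0.1, 10.0.0.2"):
        print(split_list_argument(text))
```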



Prior to DD OS 5.7 you could manage multiple DD Systems from within System Manager. Now, System
Manager only allows another system to be managed for Replication.

DD System Manager provides a single, consolidated management interface that allows for configuration
and monitoring of many system features and system settings. Note the Management options. As we
progress through the course we will use some of the Management options.

Also notice the information contained in the Footer: DDSM – OS – Model – User – Role.

Multiple DD Systems are now managed with Data Domain Management Center.

You can access the System Manager from many browsers:

• Microsoft Internet Explorer™
• Google Chrome™
• Mozilla Firefox™



Starting with DD OS 5.7, System Manager no longer allows management of multiple DD systems – except
for replication. Data Domain Management Center supports management of multiple DD systems. A
maximum of 100 DD systems can be added to a DD Management Center. It also allows multiple
simultaneous users.

It can be accessed from the following browsers.

On Microsoft Windows:
Microsoft Internet Explorer 9, 10, or 11; Mozilla Firefox 30 and higher; Google Chrome

On Apple OS X:
Mozilla Firefox 30 and higher; Google Chrome



The Data Domain Management Center provides capacity and replication resource management, health
and status monitoring, template-based reporting of aggregated data, and customizable grouping and
filtering of managed systems, all via activity monitoring dashboards that support multiple user roles.

The Data Domain Management Center can monitor all Data Domain platforms. The Data Domain
Management Center can monitor systems running DD OS version 5.1 and later.

The Data Domain Management Center includes an embedded version of the System Manager that can be
launched, providing convenient access to a managed Data Domain system for further investigation of an
issue or to perform configuration.



This lab covers the steps necessary to access a Data Domain system.



This module focused on some of the key features of the Data Domain Operating System (DD OS).

Deduplication improves data storage because it is performed inline. It looks for redundancy in large
sequences of bytes. Sequences of bytes identical to those previously encountered and stored are
replaced with references to the previously encountered data.

SISL gives Data Domain deduplication its speed. 99% of duplicate data segments are identified inline in
RAM before they are stored to disk. This performance scales as Data Domain systems use newer and
faster CPUs and RAM.
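
The idea of replacing repeated byte sequences with references can be shown with a toy example. This is a simplified sketch, not how DD OS or SISL is implemented: it assumes small fixed-size segments and SHA-256 fingerprints purely for readability, and all names are hypothetical.

```python
import hashlib

SEGMENT_SIZE = 8  # bytes; a tiny fixed size chosen only to keep the demo readable


def deduplicate(data: bytes, store: dict[str, bytes]) -> list[str]:
    """Split data into segments, store each unique segment once, and return references."""
    refs = []
    for offset in range(0, len(data), SEGMENT_SIZE):
        segment = data[offset:offset + SEGMENT_SIZE]
        fingerprint = hashlib.sha256(segment).hexdigest()
        store.setdefault(fingerprint, segment)  # keep only the first copy of a segment
        refs.append(fingerprint)                # every occurrence becomes a reference
    return refs


def restore(refs: list[str], store: dict[str, bytes]) -> bytes:
    """Rebuild the original data by following the references."""
    return b"".join(store[ref] for ref in refs)


if __name__ == "__main__":
    store: dict[str, bytes] = {}
    backup = b"ABCDEFGH" * 3 + b"12345678"
    refs = deduplicate(backup, store)
    print(len(refs), "segments referenced,", len(store), "segments stored")
    assert restore(refs, store) == backup
```

In this toy run, four segments are written but only two unique segments are kept, and the original data is still fully recoverable from the references.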

DIA provides safe and reliable storage because of:

• End-to-end verification
• Fault avoidance and containment
• Continuous fault detection and healing
• File system recovery

There are three ways to administer a Data Domain system: the command line interface (CLI),
the System Manager GUI, or the Data Domain Management Center.



