
Module 5: Advanced Storage
June 2018
v1.0

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.


Objectives

• Explain local NVMe SSD devices
• Describe Block Volume performance
• Describe File Storage Service replication options
• Review Object Storage S3 Compatibility API examples

OCI Storage Options
Ordered from lowest latency to highest durability:

• Local NVMe SSD storage: Persistent, high-performance storage local to a compute instance. Ideal for Big Data, OLTP, and other high-performance workloads.
• Block Volume storage: Persistent, durable, and high-performance storage. Ideal for apps that require SAN-like features and performance.
• File storage: Durable, scalable, enterprise-grade network file system. Ideal for enterprise applications that need shared files (NAS).
• Object Storage: Internet-scale, high-performance, highly durable storage. Ideal for storing an unlimited amount of unstructured data.
Local NVMe SSD Devices
• Some instance shapes in OCI include locally attached NVMe devices
• Local NVMe SSDs can be used for workloads that have high storage performance requirements
• The acronym NVMe stands for Non-Volatile Memory Express
• NVMe is a high-performance, NUMA (Non-Uniform Memory Access) optimized, and highly scalable storage protocol that connects the host to the memory subsystem
• Designed for high-performance, non-volatile storage media, NVMe stands out in highly demanding and compute-intensive enterprise, cloud, and edge data ecosystems

Instance type      NVMe SSD devices
BM.DenseIO2.52     16 drives = 51.2 TB raw
VM.DenseIO2.8      2 drives = 6.4 TB raw
VM.DenseIO2.16     4 drives = 12.8 TB raw
VM.DenseIO2.24     8 drives = 25.6 TB raw
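To see which NVMe devices a shape actually exposes, you can list them from the operating system. A quick check, assuming an Oracle Linux image (the second command additionally requires the nvme-cli package):

# ls /dev/nvme[0-9]n1
# sudo nvme list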

NVMe SSD Data Persistence: Reboot/Pause

[Diagram: two compute instances (VM/BM), each with a local NVMe SSD. On OCI, data on the local NVMe SSD is saved across an instance reboot or pause; where local NVMe is ephemeral, data is deleted on instance reboot or pause, making it unusable for primary data.]

“With Oracle Cloud Infrastructure, companies can leverage NVMe for persistent storage to host databases and applications. However, other cloud providers typically do not offer such a capability. In cases where NVMe storage was an option with other vendors, it was not persistent. This meant that the multi-terabyte database that researchers loaded to this storage was lost when the server stopped.”
– Accenture

SLA for NVMe Performance
• OCI provides a service-level agreement (SLA) for NVMe performance
• Measured against 4k block sizes with a 100% random write workload on DenseIO shapes where the drive is in a steady state of operation
• Tests are run on Oracle Linux shapes with the third-party benchmark suite at https://github.com/cloudharmony/block-storage

Shape            Minimum supported IOPS
VM.DenseIO1.4    200k
VM.DenseIO1.8    250k
VM.DenseIO1.16   400k
BM.DenseIO1.36   2.5MM
VM.DenseIO2.8    250k
VM.DenseIO2.16   400k
VM.DenseIO2.24   800k
BM.DenseIO2.52   3.0MM

NVMe Performance

On a bare metal shape with 52 OCPUs (BM.DenseIO2.52), the following command from the CloudHarmony test suite was executed; it yields ~500K IOPS on a 50/50 read/write mix test for a single NVMe device:

# run.sh --target=/dev/nvme1n1 --test=iops --nopurge --noprecondition --fio_direct=1 --fio_size=10g --skip_blocksize 512b --skip_blocksize 1m --skip_blocksize 8k --skip_blocksize 16k --skip_blocksize 32k --skip_blocksize 64k --skip_blocksize 128k

NVMe Performance

On the same BM.DenseIO2.52 shape, the following CloudHarmony test suite command was executed against all NVMe devices combined; it yields ~3.0MM IOPS on a 50/50 read/write mix test:

# run.sh `ls /dev/nvme[0-9]n1 | sed -e 's/\//\--target=\//'` --test=iops --nopurge --noprecondition --fio_direct=1 --fio_size=10g --skip_blocksize 512b --skip_blocksize 1m --skip_blocksize 8k --skip_blocksize 16k --skip_blocksize 32k --skip_blocksize 64k --skip_blocksize 128k

Block Volume

Block Volume Service
• The Block Volume service lets you store data on block volumes independently of, and beyond the lifespan of, compute instances
• Block volumes operate at the raw storage device level and manage data as a set of numbered, fixed-size blocks using a protocol such as iSCSI
• You can create, attach, connect, and move volumes as needed to meet your storage and application requirements
• Typical scenarios:
– Persistent and durable storage
– Expanding an instance's storage
– Instance scaling
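As a minimal sketch of that lifecycle using the OCI CLI (the OCIDs, availability domain, and size below are placeholders, not values from this course):

# oci bv volume create --compartment-id <compartment-ocid> --availability-domain <AD-name> --display-name my-volume --size-in-gbs 1024
# oci compute volume-attachment attach --instance-id <instance-ocid> --type iscsi --volume-id <volume-ocid>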



Block Volume Service – Key Features

Consistent High Performance: Get 60 IOPS per GB, up to a maximum of 25,000 IOPS per volume, backed by Oracle's first-in-the-industry performance SLA.

Integrated Data Protection: Block and boot volumes can be backed up seamlessly to OCI Object Storage, enabling frequent recovery points.

Easily Scale Up or Down: Dynamically detach and reattach up to 32 block storage volumes to any bare metal (BM) or virtual machine (VM) instance in your Virtual Cloud Network. That's up to 1 petabyte of remote block storage per instance.

Block Storage Cloning: Create one or more point-in-time direct disk-to-disk copies of an existing volume within seconds, for scenarios such as storage scale-out, disaster recovery, dev/test environment duplication, and production troubleshooting.

Boot Volumes: Manageable and versatile boot volumes for compute instances, with all the advantages of block volumes, including the backup and clone capabilities. Custom-size large boot volumes in 1 GB increments.

Volume Groups: Group multiple block and boot volumes, and perform crash-consistent, point-in-time coordinated backups and clones across all the volumes in the group.


Volume Groups
• A volume group represents a set of block storage volumes that can be treated as a single entity for backup and clone purposes.
• A volume group is associated with a single Availability Domain (AD), and the volumes within the group must also be within that same AD.
• You can add up to 32 volumes to a volume group.
• This simplifies the process of creating backups and clones of running enterprise applications that span multiple storage volumes across multiple instances. You can then restore an entire group of volumes from a volume group backup.



Volume Groups for Coordinated Clones
With volume groups, the Block Volume service enables you to create point-in-time backups and
clones of running enterprise applications that span multiple storage volumes across one or more
compute instances.



Volume Groups for Coordinated Backups
Coordinated multi-volume backups provide an end-to-end solution for creating, managing, and
restoring backups for applications by leveraging and extending the existing single-volume backup
and restore features that are already available for block storage volumes.



Volume Groups via CLI and SDK
Volume groups and coordinated backups and clones are generally available now via the CLI and SDK, with Console support coming soon.
Following are a few sample commands for creating and managing volume groups and coordinated backups and clones:

• Create a volume group:
# oci bv volume-group create --compartment-id ocid1.compartment.oc1..aaaaaaaa22azyzvp3et7tfvp2qdwgz7mwbyo6m5h4nk3nf6i64js3byiqwxa --availability-domain AkfI:PHX-AD-1 --source-details '{"type": "volumeIds", "volumeIds": ["ocid1.volume.oc1.phx.abyhqljtdn2nqquw7qr3rjwlnwa3y25sv6p6lvxaz7lfwkse6xmzmcuc46sq", "ocid1.volume.oc1.phx.abyhqljtnyrkcac4vg247g4v2ukplmhuluhtxxbqlnyvfmkah2rjuyieu4kq"]}'

• List volume groups:
# oci bv volume-group list --compartment-id ocid1.compartment.oc1..aaaaaaaa22azyzvp3et7tfvp2qdwgz7mwbyo6m5h4nk3nf6i64js3byiqwxa

• Create a coordinated backup:
# oci bv volume-group-backup create --volume-group-id ocid1.volumegroup.oc1.phx.abyhqljthc33hrlrqwnlbmoxmep74ykli7mwdr2ukjvuxlaordndl2knpxma

• Create a clone:
# oci bv volume-group create --compartment-id ocid1.compartment.oc1..aaaaaaaa22azyzvp3et7tfvp2qdwgz7mwbyo6m5h4nk3nf6i64js3byiqwxa --availability-domain AkfI:PHX-AD-1 --source-details '{"type": "volumeGroupId", "volumeGroupId": "ocid1.volumegroup.oc1.phx.abyhqljthc33hrlrqwnlbmoxmep74ykli7mwdr2ukjvuxlaordndl2knpxma"}'



Multi-Attach Block Volume Service (LA)
• A shared file system is a very common requirement, for example, to let multiple applications access the same data or to give multiple users access to the same information at the same time. On-premises this is easy to achieve using NAS or SAN devices.
• Multi-attach block volume allows you to attach the same block volume to two or more compute instances.
• Typical scenarios:
– High-availability applications
– Cluster applications using OCFS2 and GFS2
– Corosync/Pacemaker clusters



Multi-Attach Block Volume Service (LA)
As of today, the process is done through a preview version of the OCI CLI, which must be requested from Oracle. Once you have access to that OCI CLI version and your tenancy has been enabled for this feature, you can run the OCI command line to attach a block device to the multiple compute instances where you plan to use your cluster file system. Here is an example:

# oci compute volume-attachment attach --instance-id ocid1.instance-id --type iscsi --volume-id ocid1.volume.id --is-shareable true
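Attaching is not enough by itself: each instance still needs to log in to the volume over iSCSI. A sketch using the standard iscsiadm commands that OCI supplies at attach time; the IQN and IP address below are placeholders:

# sudo iscsiadm -m node -o new -T <volume-iqn> -p <iscsi-ip>:3260
# sudo iscsiadm -m node -o update -T <volume-iqn> -n node.startup -v automatic
# sudo iscsiadm -m node -T <volume-iqn> -p <iscsi-ip>:3260 -l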



Block Volume Performance
There are three aspects to block volume performance:
• IOPS
• Throughput
• Latency

• We use FIO as the industry-standard performance benchmark tool, especially for iSCSI devices using the network stack
• FIO provides different levers and options to tailor performance measurement to each of these three performance aspects
• This is the same benchmark tool used in the Gartner CloudHarmony test suite, which is also used to validate the performance of other cloud providers



Block Volume Performance
Our block volume service performance measurement methodology and characteristics are covered
in detail here:
https://blogs.oracle.com/cloud-infrastructure/block-volume-performance-analysis
https://docs.us-phoenix-1.oraclecloud.com/Content/Block/Concepts/blockvolumeperformance.htm
http://www.storagereview.com/oracle_cloud_infrastructure_compute_bare_metal_instances_review

WARNING:
Before running any tests, back up your data and operating system environment to prevent any data loss.
Do not run FIO tests directly against a device that is already in use, such as /dev/sdX. If it is in use as a formatted disk with data on it, running FIO with a write workload (readwrite, randrw, write, trimwrite) will overwrite the data on the disk and cause data corruption.
Run FIO only on unformatted raw devices that are not in use.
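If you need to exercise FIO without an unused raw device, you can instead point it at a file on a mounted file system, which FIO will create and lay out itself. A hedged variant of the read test (the mount point and file name are placeholders):

# sudo fio --direct=1 --ioengine=libaio --size=10g --bs=4k --runtime=60 --numjobs=8 --iodepth=64 --time_based --rw=randread --group_reporting --filename=/mnt/testvol/fio-testfile --name=iops-test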



Block Volume Performance
Here are sample FIO commands we use for IOPS measurement on a single volume:

Read-only:
# sudo fio --direct=1 --ioengine=libaio --size=10g --bs=4k --runtime=60 --numjobs=8 --iodepth=64 --time_based --rw=randread --group_reporting --filename=/dev/sdb --name=iops-test

Write-only:
# sudo fio --direct=1 --ioengine=libaio --size=10g --bs=4k --runtime=60 --numjobs=8 --iodepth=64 --time_based --rw=randwrite --group_reporting --filename=/dev/sdb --name=iops-test

Read/write mix:
# sudo fio --direct=1 --ioengine=libaio --size=10g --bs=4k --runtime=60 --numjobs=8 --iodepth=64 --time_based --rw=randrw --group_reporting --filename=/dev/sdb --name=iops-test

Note: In the read/write case, add the read result and the write result together for the duplex total.
Also note that all volumes attached to an instance share that instance's network bandwidth. If there is heavy network traffic, or other volumes are under I/O pressure, the apparent performance of a single volume may look degraded.
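For throughput rather than IOPS, FIO is typically run with a larger block size and sequential access. The variant below is a sketch under those assumptions, not the exact command from the methodology referenced earlier:

# sudo fio --direct=1 --ioengine=libaio --size=10g --bs=256k --runtime=60 --numjobs=8 --iodepth=64 --time_based --rw=read --group_reporting --filename=/dev/sdb --name=throughput-test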



Block Volume Performance – FIO Parameters

Parameter            Description
--filename=/dev/sdX  The raw device against which to run the tests
--direct=1           If true, use non-buffered I/O: the test avoids the host's page cache and writes directly to disk
--rw=randread        Run a random-read-only test
--bs=4k              The block size in bytes used for I/O units
--ioengine=libaio    Defines how the job issues I/O to the file; libaio is Linux native asynchronous I/O
--iodepth=64         Number of I/O units to keep in flight against the file
--runtime=60         Tell fio to terminate processing after the specified period of time, interpreted in seconds
--numjobs=8          Create the specified number of clones of this job; each clone is spawned as an independent thread or process
--time_based         If set, fio runs for the specified runtime even if the file(s) are completely read or written, looping over the same workload as many times as the runtime allows
--group_reporting    Display statistics for groups of jobs as a whole instead of for each individual job; especially useful when numjobs is used
--name               ASCII name of the job; may be used to override the name fio prints for this job
--size=10g           The total size of file I/O for each thread of this job; fio runs until this many bytes have been transferred


Block Volume Performance
Read-only test on a 1 TB block volume connected using iSCSI – BM.DenseIO2.52:
# sudo fio --direct=1 --ioengine=libaio --size=10g --bs=4k --runtime=60 --numjobs=8 --iodepth=64 --time_based --rw=randread --group_reporting --filename=/dev/sdb --name=iops-test

iops-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64


...
fio-2.1.11
Starting 8 processes
Jobs: 8 (f=8): [r(8)] [100.0% done] [98115KB/0KB/0KB /s] [24.6K/0/0 iops] [eta 00m:00s]
iops-test: (groupid=0, jobs=8): err= 0: pid=4152: Wed Jun 27 16:51:03 2018
read : io=5951.6MB, bw=101537KB/s, iops=25384, runt= 60021msec



Block Volume Performance

Write-only test on a 1 TB block volume connected using iSCSI – BM.DenseIO2.52:

# sudo fio --direct=1 --ioengine=libaio --size=10g --bs=4k --runtime=60 --numjobs=8 --iodepth=64 --time_based --rw=randwrite --group_reporting --filename=/dev/sdb --name=iops-test

iops-test: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64


...
fio-2.1.11
Starting 8 processes
Jobs: 4 (f=0): [_(1),E(1),w(2),_(1),E(1),w(2)] [33.9% done] [0KB/97.10MB/0KB /s] [0/25.8K/0 iops]
[eta 01m:59s]
iops-test: (groupid=0, jobs=8): err= 0: pid=4170: Wed Jun 27 16:54:13 2018
write: io=5962.4MB, bw=101722KB/s, iops=25430, runt= 60021msec



Block Volume Performance

Read/write test on a 1 TB block volume connected using iSCSI – BM.DenseIO2.52:

# sudo fio --direct=1 --ioengine=libaio --size=10g --bs=4k --runtime=60 --numjobs=8 --iodepth=64 --time_based --rw=randrw --group_reporting --filename=/dev/sdb --name=iops-test

iops-test: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64


...
fio-2.1.11
iops-test: (groupid=0, jobs=8): err= 0: pid=4117: Wed Jun 27 16:45:01 2018
read : io=2975.6MB, bw=50769KB/s, iops=12692, runt= 60016msec
write: io=2977.7MB, bw=50804KB/s, iops=12701, runt= 60016msec



NVMe Performance vs. Block Volume Performance
On a bare metal shape with 52 OCPUs (BM.DenseIO2.52), the following CloudHarmony test suite command was executed against a single NVMe device and against a single block volume:

# run.sh --target=/dev/<device> --test=iops --nopurge --noprecondition --fio_direct=1 --fio_size=10g --skip_blocksize 512b --skip_blocksize 1m --skip_blocksize 8k --skip_blocksize 16k --skip_blocksize 32k --skip_blocksize 64k --skip_blocksize 128k



File Storage Service



File Storage Service Data Replication
There are two options for FSS data replication as of today:
• AD data protection: Asynchronously copy your file system or snapshot data to another AD, using rsync
• Regional data protection: Asynchronously copy your file system or snapshot data to another region, using rsync

To help with this replication, we provide Terraform modules that replicate data across two Oracle Cloud Infrastructure File Storage Service (FSS) shared file systems. The module launches hosts and copies the data directly from the source FSS file system (or snapshot folder) to a destination FSS file system, using a cron job in conjunction with rsync:
https://orahub.oraclecorp.com/pts-cloud-dev/terraform-modules/tree/master/terraform-oci-fss
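The core of such a replication host is just an NFS mount of both file systems plus a scheduled rsync. A minimal sketch, assuming both mount targets export /fs-export and the replication host runs Linux; the IP addresses, paths, and schedule are placeholders:

# sudo mount -t nfs <source-mount-target-ip>:/fs-export /mnt/fss-source
# sudo mount -t nfs <destination-mount-target-ip>:/fs-export /mnt/fss-destination
# crontab entry: replicate hourly, preserving permissions, hard links, ACLs, and extended attributes
0 * * * * rsync -aHAX --delete /mnt/fss-source/ /mnt/fss-destination/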



File Storage Service Data Replication
Local Data Sync



File Storage Service Data Replication
Regional Data Sync



Object Storage Service



Object Storage - Amazon S3 Compatibility API Keys
Object Storage provides an API to enable interoperability with Amazon S3. To use this Amazon
S3 Compatibility API, you need to generate the signing key required to authenticate with Amazon
S3. This special signing key is an Access Key/Secret Key pair. Oracle provides the Access Key
that is associated with your Console user login. You or your administrator generates the Secret
Key to pair with the Access Key.
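In the OCI CLI, this Secret Key is created as a "customer secret key" on the user. A hedged example; the user OCID and display name are placeholders, and the returned key value is shown only once, so save it immediately:

# oci iam customer-secret-key create --user-id <user-ocid> --display-name s3-compat-key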



Rclone and Amazon S3 Compatibility API Keys
Rclone is an open source command-line utility that synchronizes files and directories between a local file system and a variety of cloud backends. Here is an example with Object Storage in OCI:
1. Install rclone on your local machine (https://rclone.org/install/)
2. Run the rclone config command to create a file under $HOME/.config/rclone/rclone.conf
3. Here is an example of the rclone.conf file:
[OCI]
type = s3
env_auth = false
access_key_id = ocid1.credential.aaa… <amazon_s3_api_key_ocid>
secret_access_key = <amazon_s3_api_key>
region = us-phoenix-1
endpoint = <tenant-name>.compat.objectstorage.us-phoenix-1.oraclecloud.com
location_constraint = us-phoenix-1
acl = private
server_side_encryption =
storage_class =



Rclone and Amazon S3 Compatibility API Keys
4. Verify whether rclone has recognized the configuration:
# rclone listremotes
OCI:

5. List your Object Storage buckets:
# rclone lsd OCI:
-1 2018-06-18 06:10:10 -1 OCI-Training
-1 2018-02-20 16:56:03 -1 Sanjay

6. Create a new bucket:
# rclone mkdir OCI:test_bucket

7. No specific configuration is required for using a local file system with rclone. You can simply choose a local directory as your source:
# export SOURCE=/Users/flperei/Data


Rclone and Amazon S3 Compatibility API Keys
8. After your configuration is complete, you can start the transfer. Depending on the amount of data and the connection speed, this transfer can take a very long time (days or weeks):
# rclone --verbose --cache-workers 64 --transfers 64 --retries 32 copy $SOURCE OCI:test_bucket
2018/06/22 14:54:48 INFO : S3 bucket test_bucket: Waiting for checks to finish
2018/06/22 14:54:48 INFO : S3 bucket test_bucket: Waiting for transfers to finish
2018/06/22 14:54:50 INFO : oracleproxy.sh: Copied (new)
2018/06/22 14:54:50 INFO : kubeconfig: Copied (new)
2018/06/22 14:54:50 INFO :
Transferred: 4.064 kBytes (1.331 kBytes/s)
Errors: 0
Checks: 0
Transferred: 2
Elapsed time: 3s
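To confirm the objects landed, you can list the bucket's contents with rclone (output will vary with your data):

# rclone ls OCI:test_bucket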





Summary

• Describe and validate storage performance for both NVMe and Block Volume storage
• Use volume groups to manage snapshot and cloning activities for logical volumes spanning
multiple block volumes
• Understand the multi-attach block volume feature for connecting the same block volume to
multiple hosts in the same Availability Domain
• Implement data replication to increase durability of File Storage Service file systems
• Utilize the S3 Compatibility API to enable interoperability with Amazon S3

