Module 1
Introduction to Troubleshooting
Nutanix Troubleshooting 5.x
Introduction to Troubleshooting
Course Agenda
1. Intro 7. AFS
2. Tools & Utilities 8. ABS
3. Services & Logs 9. DR
4. Foundation 10. AOS Upgrade
5. Hardware 11. Performance
6. Networking
Course Agenda
• This is the Introduction module.
Objectives
After completing this module, you will be able to:
• Explain the five steps in the life cycle of a customer support case.
Introductions
• Name?
• Company?
• Where?
• IT Experience?
Scenario-Based Approach
• This course uses a Scenario-based Approach that lets you practice these steps as
you learn troubleshooting techniques.
o Each scenario begins with a realistic customer call reporting a problem that you
need to troubleshoot.
• You will work through each step of the life cycle of the customer support call to
diagnose the customer's problem.
o As the course progresses, you will need to use the knowledge and skills that you
learned in earlier scenarios to troubleshoot the customer's problem.
The Life Cycle of a Case
Step 1: Isolate and Identify Problems
Step 2: Research Problems in Documentation
Step 3: Gather and Analyze Logs
Note: Steps 2 and 3 may be reversed, depending on the problem.
Level 2 Support Tasks
• Provide In-depth Analysis
• Re-create the Problem
• Step 2: Research Problems in Documentation
Step 1: Isolate and Identify Problems
• The NSS 5.x Written exam will consist of a pool of multiple-choice questions. The exam is closed-book and offered in a proctored environment. The candidate is not required to attend the NSS Troubleshooting Course before attempting the Written exam.
• BUT it is highly recommended!
• The candidate is required to pass the NSS 5.x Written exam prior to being offered the NSS 5.x Lab exam.
• The candidate will need to achieve a score of 70% on the Written exam in order to pass.
Written Exam
Lab Exam - Format
Recertification
Lab Equipment
Thank You!
Module 2
Tools and Utilities
1. Intro 7. AFS
2. Tools & Utilities 8. ABS
3. Services & Logs 9. DR
4. Foundation 10. Upgrade
5. Hardware 11. Performance
6. Networking
Course Agenda
• This is the Tools and Utilities module.
Objectives
• Provide a comprehensive view of the primary tools available to troubleshoot issues
on a Nutanix cluster and related applications.
• Run and interpret the output of Nutanix Cluster Check (NCC).
• Determine which tool is best to use when troubleshooting a Nutanix issue.
• Collect a comprehensive log bundle from the cluster to provide to Nutanix Support for
analysis.
Nutanix Troubleshooting Toolkit
NCC Overview
NCC Modules and Plugins
NCC is a framework of scripts to run various diagnostics commands against the cluster.
These scripts are called Plugins and are grouped by Modules. The scripts can help
diagnose cluster health. The scripts run standard commands against the cluster or
nodes depending on what type of information is being retrieved.
A Module is a logical grouping of Plugins. Some modules have sub-modules. Within a module, administrators have a choice to run all plugins (run_all=true) or just a specific plugin out of that module.
The scripts (plugins) run tests against the cluster to report on the overall health.
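For example, an entire module or a single plugin can be invoked from the CVM (the module-level invocation below is a sketch; the set of modules and plugins varies by NCC version):
ncc health_checks run_all
ncc health_checks network_checks run_all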
NCC Version
nutanix@NTNX-16SM6B490273-A-CVM:10.1.60.113:~$ ncc --version
3.0.3.2-16e3fe75
NCC Version
• Ensure that each node in the cluster is running the same version of NCC. Furthermore, Prism Central and each cluster managed by Prism Central are all required to run the same NCC version.
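A quick way to verify this from any CVM is the allssh helper, which runs the same command on every CVM in the cluster:
allssh ncc --version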
NCC Version (cont'd)
From Prism, the NCC version can be found either from the gear icon in the top right corner -> Upgrade Software -> NCC, or by clicking the user icon in the top right -> About Nutanix.
A single installer file can be run from the CLI of one CVM, and NCC will be installed across all CVMs on the cluster. After downloading the file from the Support Portal, copy it to the CVM using SCP/SFTP. Ensure that the directory it is copied to exists on all CVMs.
Ensure that the MD5SUM value of the installer file matches the value posted on the Support Portal.
Make the installation file executable and run it to install NCC.
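A typical installation session looks like the following (the installer file name is a placeholder; use the exact name of the file downloaded from the Support Portal):
nutanix@cvm$ md5sum ./nutanix-ncc-<version>-installer.sh
nutanix@cvm$ chmod u+x ./nutanix-ncc-<version>-installer.sh
nutanix@cvm$ ./nutanix-ncc-<version>-installer.sh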
From CLI:
• To execute all NCC health checks on a system from the CVM CLI, the following command is issued on the CVM. On a 4-node cluster, this takes approximately 5 minutes:
ncc health_checks run_all
• The resulting output of the checks is approximately 70 KB in size; however, this will depend on the return status of checks and those that don't PASS. The file will be saved to the following directory:
/home/nutanix/data/logs/ncc-output-latest.log
From PRISM:
• To execute all NCC health checks on a system from Prism (AOS 5.0+ & NCC 3.0+), navigate to the Health page -> Actions -> Run Checks. Select All checks and click Run.
Running Specific Health Checks - PRISM
NCC File Utilities
NCC Log Collector
• AOS implements many logs and configuration information files that are
useful for troubleshooting issues and finding out details about a particular
node or cluster.
o A log bundle can be collected from the NCC Log Collector utility to
view these details.
• Note: The NCC Log Collector does not collect the NCC health checks – these should be run separately.
• To use the NCC Log Collector, the following command is run from a CVM:
ncc log_collector [plugin_name]
• The output of the NCC Log Collector bundle will be stored in:
o /home/nutanix/data/log_collector
o Output of the last 2 collections will be saved.
• Available plugin options are:
o run_all – Runs all plugins, collects everything from the last 4 hours by default. This is the recommended plugin to run to begin troubleshooting cluster-related issues.
o cvm_config – Controller VM configuration
o sysstats – Controller VM system statistics
o cvm_logs – Controller VM logs
o cvm_kernel_logs – Controller VM kernel logs
o alerts – Cluster alerts
o hypervisor_config – Hypervisor configuration
o hypervisor_logs – Hypervisor logs
The log collector will also collect the alerts and cluster configuration details including:
- cluster_config
- cvm_config
- cvm_kernel_logs
- cvm_logs
- hypervisor_config
- hypervisor_logs
- sysstats
- df
- disk_usage
- fio_stats
- fio_status
- interrupts
- iostat
- ipmi_event
- ipmi_sensor
- lsof
- meminfo
- metadata_disk_usage
- mpstat
- ntpq
- ping_gateway
- ping_hosts
- sar
- top
In most instances, it is ideal to capture ncc log_collector run_all, unless the culprit
component has been identified. The output of the NCC Log Collector bundle will be
stored in the /home/nutanix/data/log_collector directory. Only the output of the last 2
collections will be saved.
NCC Log Collector Flags – Time Period
• The NCC Log Collector will only collect logs if there is more than 10% of available free space. If you would like to collect a log bundle and forego this prerequisite, the --force flag can be used:
o ncc log_collector --force=1 run_all
• The default value of the flag is 0. This should only be changed under the direction and supervision of Nutanix Support.
NCC Log Collector Flags
• If sensitive cluster information, such as IP addresses, needs to be anonymized, the --anonymize_output=true flag can be specified as part of the log collector command:
ncc log_collector --last_no_of_hours=[1-23] --anonymize_output=true [plugin_name]
• To specify a particular CVM or list of CVMs to collect logs from, use the --cvm_list flag:
ncc log_collector --cvm_list="10.4.45.54,10.4.45.55" [plugin_name]
Tools & Utilities
Prism
Prism Central
REST API
Nutanix REST API
REST API Explorer (cont'd)
Nutanix Command Line Interface (nCLI)
nCLI – CVM
• Executing the ncli command will drop the user into the nCLI shell, making it easier to use tab completion if the specific command to be run is unknown.
nCLI Help
nCLI Examples
Category : Starter
Sub Category :
Cluster Expiry Date : No Expiry
Use non-compliant feat... : true
nCLI Examples
Tools & Utilities
aCLI
• Acropolis provides a command line interface for managing hosts, networks, snapshots, and VMs. The Acropolis CLI can be accessed from an SSH session into any CVM in a Nutanix cluster using the acli command at the shell prompt. Alternatively, complete aCLI commands can be run from outside of the aCLI prompt, or the shell can be accessed from the 2030 Service Diagnostics Page of the Acropolis service. aCLI command history can be viewed in the /home/nutanix/.acli_history file on a CVM. Note that this is a hidden file and only available on hosts that are running the AHV hypervisor. It is persistent across reboots of the CVM.
aCLI Examples
<acropolis> vm.list
Output:
VM name VM UUID
Server01 40a5cde6-2b20-4fa9-a8b4-1dd4cfc72226
<acropolis> vm.clone Server02 clone_from_vm=Server01
aCLI Examples
• aCLI commands can also be run, with the output specified to be in JSON format.
This is again useful in scripting the configuration and management of an AHV
environment.
o nutanix@cvm:~$ acli vm.list
o VM name VM UUID
o Server01 40a5cde6-2b20-4fa9-a8b4-1dd4cfc72226
o nutanix@cvm:~$ acli -o json vm.list
o {"status": 0, "data": [{"name": "Server01", "uuid": "40a5cde6-2b20-4fa9-a8b4-
1dd4cfc72226"}], "error": null}
• The complete aCLI command line reference manual is available on the Nutanix
Support Portal.
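As a sketch of how the JSON output can be consumed in a script (based on the structure shown above; the one-liner below simply prints each VM name):
acli -o json vm.list | python -c 'import json,sys; data=json.load(sys.stdin)["data"]; sys.stdout.write("\n".join(vm["name"] for vm in data)+"\n")'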
Tools & Utilities
more/less and pipe
• If you need to browse log files without having to edit them, the more and less commands are very useful.
• The more command will print out a file one page at a time. The spacebar can be used to go to the next page.
• The less command is similar to more, but it allows you to navigate up/down through the file and allows for string searches.
• less +F is similar to tail -f; you can press Ctrl+C to exit follow mode and remain in less to search the file.
• The pipe (|) character is used to connect multiple programs together. The standard output of the first program is used as standard input for the second.
head/tail
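Typical usage against the CVM logs:
head -n 20 /home/nutanix/data/logs/genesis.out (first 20 lines)
tail -n 20 /home/nutanix/data/logs/genesis.out (last 20 lines)
tail -f /home/nutanix/data/logs/genesis.out (follow new entries as they are written)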
grep
grep Examples
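A few representative invocations against the CVM logs:
grep -i error /home/nutanix/data/logs/genesis.out (case-insensitive match)
grep -C 5 FATAL /home/nutanix/data/logs/stargate.out (show 5 lines of context around each match)
grep -c WARNING /home/nutanix/data/logs/curator.out (count matching lines)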
IPTables
Tools & Utilities
Hardware Tools
SuperMicro SUM Tool
SUM Tool
• This tool is not to be distributed to any customers or 3rd parties.
The SUM tool can be downloaded internally with the following command:
wget http://uranus.corp.nutanix.com/hardware/vendors/supermicro/tools/sum_1.5.0_Linux_x86_64_20150513.tar.gz
The following Evernote link contains further details and examples around using the SUM tool: https://www.evernote.com/shard/s28/sh/b9e3e5cd-82cd-4d16-823f-2717d31d394d/efdb0a118ce0afead0df29314e97b6f3
IPMItool
IPMItool (cont'd)
• Running ipmitool from the AHV host.
• When using ipmitool on an AHV host, you do not need to specify the IP address, username, or password for access.
• Allows IPMI user administration and resetting passwords:
ipmitool user list – to list all users by ID
ipmitool user set password <user_id> <password> – to reset an IPMI user password
ipmitool mc reset [warm|cold] – to reset the Management Controller
• Requires login credentials to the AHV host.
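For comparison, when ipmitool is run from a remote machine rather than on the host itself, the BMC address and credentials must be supplied (the values below are placeholders):
ipmitool -I lanplus -H <ipmi_ip> -U <username> -P <password> chassis status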
SMCIPMITool
• SuperMicro reference:
o https://www.supermicro.com/solutions/SMS_IPMI.cfm
SMCIPMITool Examples
• This tool is not to be distributed to any customers or third parties.
SuperMicro IPMIView
• Configuration details such as IP and login credentials can be saved to a file for repeated use.
SuperMicro IPMIView – Login/Connected
• The collect_perf utility is included with the CVM from AOS 4.1.1. When a performance issue is being observed in the Nutanix environment, performance data can be collected using collect_perf if the issue cannot be resolved by live troubleshooting or if the issue is not immediately reproducible.
• Avoid running the collect_perf utility for an extended amount of time (>3 hours) as it will take a LONG time to download the large bundle from the Nutanix cluster and require additional time for backend processing of the data.
• You should ALWAYS collect logs for any performance-related issue.
collect_perf
• Performance issues are often complex in nature. Information gathering is a crucial
first step in troubleshooting a performance issue in order to gain further insight
around the specifics of the environment and to identify when/where the problem
exists. Some of the important factors to note include, but are not limited to:
• What are the performance expectations?
• What part of the environment is the issue isolated to (UVMs, vDisks, hosts,
CVMs)?
• Is the issue constant or intermittent?
• Is the issue reproducible?
• When did the issue start and what is the business impact?
• After gathering environment-related information, the next step would be to run the
collect_perf utility in order to collect performance-related data. The utility should be
run during a timeframe when the performance issue is being observed so the
appropriate dataset can be analyzed to root cause the issue. It is ideal if the
performance issue is reproducible so the collect_perf utility can be executed in
conjunction with a recreate. If the issue is not reproducible, the collect_perf utility
should be run during a timeframe that overlaps with the reported performance
problem. While the tool will run and collect performance data on a stable cluster,
there will not be much to be done by way of identification of an issue.
collect_perf
• The collect_perf utility is generally safe to run on an active Nutanix cluster in production. There are some additional flags that can be specified in conjunction with the collect_perf command. You can control the amount of data collected using the --space_limit parameter. This is set in bytes. There is also a 20% space limit – the data collector will choose the lower of those two limits. Note that the command flag must be specified before the start option:
nutanix@NTNX-A-CVM:~$ collect_perf --space_limit=4294967296 start
collect_perf
• If there is a particular time when a performance issue impacts the Nutanix
environment, perhaps after hours or some other inconvenient time, it is possible to
schedule the collect_perf utility to run through a crontab entry. You will need to
ensure that you:
• Are comfortable with editing the crontab using crontab -e
• Create a start AND a stop entry
• Do not delete any of the existing default crontab entries.
• The following example would configure the collect_perf script to run between 1:50 am
and 3:30 am on October 22, 2017:
50 01 22 10 * /usr/local/nutanix/bin/collect_perf start
30 03 22 10 * /usr/local/nutanix/bin/collect_perf stop
• Try to avoid running the collect_perf utility for an extended amount of time (greater than 3 hours), as it will take a long time to download the large bundle from the Nutanix cluster and require additional time for backend processing of the data.
collect_perf & Illuminati
Illuminati Server
Illuminati Weather Report
Clicking on Full Weather Report will provide more detail on the checks and further insight into the cluster. All checks have independent watermarks that will flag the component for further investigation if exceeded.
• Watermarks:
o The watermarks for this check are based on how often metadata lookups result in more than 20% of the total latency.
o Warn if lookups exceeded the watermark in more than 50% and less than 80% of the samples.
o Alert if lookups exceeded the watermark in more than 80% of the samples.
Each page will show related statistics, tasks, and operation data for the respective service it is for.
for i in `svmips`; do ssh $i "sudo iptables -t filter -A WORLDLIST -p tcp -m tcp --dport [port #] -j ACCEPT"; done
• This change can be made persistent using the iptables-save command; however, it is not recommended to leave this as a permanent configuration:
for i in `svmips`; do ssh $i "sudo iptables -t filter -A WORLDLIST -p tcp -m tcp --dport 2009 -j ACCEPT && sudo iptables-save"; done
• The -D flag can be used to delete these specific rules once access to the ports is no longer needed.
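For example, to remove the port 2009 rule added above from every CVM (the same rule specification with -D in place of -A):
for i in `svmips`; do ssh $i "sudo iptables -t filter -D WORLDLIST -p tcp -m tcp --dport 2009 -j ACCEPT"; done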
• The IPTables service can be stopped completely with the following command: sudo service iptables stop
• Once troubleshooting has completed, the service should be re-enabled using sudo service iptables start.
• The links command can be used to launch a service diagnostics page without altering the current IPTables configuration. links http://localhost:[port #] or links http://0:[port #] will launch a text-mode browser view for the specified port.
Service Diagnostics Pages – Web Browser
Accessible from: http://portal.nutanix.com
Software Documentation
• Administration and Setup guides contain details around the operational theory behind specific features, as well as the correct configuration procedure.
• Reference Architectures/Best Practices Guides/Tech Notes provide insight into expected benchmark values and recommended configuration settings for various features and software integrations.
Knowledge Base
Knowledge Base
• The Nutanix Knowledge Base is a collection of articles published and maintained by
Nutanix employees. There are a wide variety of articles that cover everything from
troubleshooting to initial configuration to product- and feature-related details. Specific
KBs can be referenced with the shorthand notation of:
http://portal.nutanix.com/kb/<KB #>. If you believe a particular KB article requires improvements or further clarification, there is a section titled "Submit feedback on this article" which will notify the appropriate Nutanix engineers to make the changes that are specified.
Nutanix NEXT Community
Remote Support
• SSH tunnels are established from the external cluster to backend servers internal to Nutanix.
• This feature is disabled by default, but can be enabled from Prism or the command line only by the customer.
• Can be enabled for a time window between 1 minute and 24 hours.
Remote Support
• Remote Support is a feature that allows Nutanix engineers to access a Nutanix
cluster remotely. Customers have the option to enable the Remote Support feature
from their end for a pre-determined time in order to allow Nutanix engineers to
provide remote troubleshooting or proactive support monitoring. Once Remote
Support is enabled at the customer site, one of the CVMs establishes an SSH tunnel
connection to an internal Nutanix server, allowing engineers to access the system.
Remote Support - Enable/Configuration
• The status of the Remote Tunnel can be verified from Prism or from the command
line.
• The Remote Tunnel can also be enabled via ncli with the following command:
• In order to ensure a successful Remote Support connection, the CVMs must be able
to resolve DNS. Name servers can be configured on the CVMs from within Prism, or
with the ncli cluster add-to-name-servers servers=[comma separated list of name
servers]. If a firewall device exists in the customer’s infrastructure, port 8443 will need
to be opened.
Pulse/Pulse HD
Pulse
Pulse provides diagnostic systems data to Nutanix support for the delivery of proactive Nutanix solutions.
• The Nutanix cluster automatically collects this information with no effect on system performance.
• It is tuned to collect important system-level data and statistics in order to automatically detect issues and help make troubleshooting easier.
• Allows Nutanix to proactively reach out to customers for version-specific advisories and alerts.
• Cases from systems with Pulse/Pulse HD enabled are resolved 30% faster.
Pulse collects machine data only and does NOT collect any private customer information.
Pulse
• When Pulse is enabled, Pulse sends a message once every hour to a Nutanix
support server by default. Pulse also collects the most important data like system-
level statistics and configuration information more frequently to automatically detect
issues and help make troubleshooting easier. With this information, Nutanix support
can apply advanced analytics to optimize your implementation and to address
potential problems. Pulse sends messages through ports 80/8443/443, or if this is
not allowed, through your mail server. When logging in to Prism/Prism Central for the
first time after installation or an upgrade, the system checks whether Pulse is
enabled. If it is not, a message appears recommending that you enable Pulse. To
enable Pulse, click Continue in the message and follow the prompts; to continue
without enabling Pulse, check the Disable Pulse (not recommended) box and then
click Continue.
Pulse HD
• Nutanix Pulse HD provides diagnostic system data to Nutanix support teams to
deliver pro-active, context-aware support for Nutanix solutions. The Nutanix cluster
automatically and unobtrusively collects this information with no effect on system
performance. Pulse HD shares only basic system-level information necessary for
monitoring the health and status of a Nutanix cluster. This allows Nutanix support to
understand the customer environment better and is an effective troubleshooting tool
that drives down the time to resolution. Several different tools are available internally
for Nutanix Support to more easily parse and utilize the data collected via Pulse HD.
The following information is collected from the cluster when Pulse HD is enabled.
• Cluster Info
o - Cluster name
o - Uptime
o - NOS version
o - Cluster ID
o - Block serial number
o - HW model
o - Cluster IOPS
o - Cluster latency
o - Cluster memory
• Node / Hardware
o - Model number
o - Serial number
o - CPU - number of cores
o - Memory (size)
o - Hypervisor type
o - Hypervisor version
o - Disk model
o - Disk status
o - Node temperature
o - Network Interface Model
o - SATADOM firmware
o - PSU status
o - Node location
• Storage pool list
o - Name
o - Capacity (logical used capacity and total capacity)
o - IOPS and latency
• Container Info
o - Container Name
o - Capacity (logical used and total)
o - IOPS and latency
o - Replication factor
o - Compression ratio
o - Deduplication ratio
o - Inline or post-process compression
o - Inline deduplication
o - Post-process deduplication
o - Space available
o - Space used
o - Erasure coding and savings
• Controller VM
o - Details of logs, attributes, and configurations of services on each Controller VM
o - Controller VM memory
o - vCPU usage
o - Uptime
o - Network statistics
o - IP addresses
• VM
o - Name
o - VM state
o - vCPU
o - Memory
o - Disk – space available
o - Disk – space used
o - Number of vDisks
o - Name of the container that contains the VM
o - VM operating system
o - IOPS
o - Latency
o - VM protected?
o - Management VM?
o - IO Pattern - Read/ Read Write/Random/sequential
• Disk Status
o - Perf stats and usage
• Hypervisor
o - Hypervisor software and version
o - Uptime
o - Installed VMs
o - Memory usage
o - Attached datastore
• Datastore Information
o - Usage
o - Capacity
o - Name
• Protection Domain (DR)
o - Name
o - Count and names of VMs in each protection domain
• Currently Set Gflags
• BIOS / BMC info
o - Firmware versions
o - Revision / release date
• Disk List
o - Serial Numbers
o - Product part numbers
o - manufacturer
o - Firmware versions
o - Slot Location
o - Disk type
• Domain Fault Tolerance States
• Default Gateway
• SSH key List
• SMTP Configuration
• NTP Configuration
• Alerts
• Click Stream Data
• File Server Data
Pulse HD – Enable from Prism
Troubleshooting Pulse HD
• Look at KB 1585.
Tools & Utilities
• NCC/Log Collector
• Cluster Interfaces/CVM Commands (nCLI)
• AHV Commands (aCLI)
• Linux File/Log Analysis Tools
• Hardware Tools
• Performance Tools
• Service Diagnostics Pages
• SRE Tools
SRE Tools
SRE Tools Install
Module 2
Tools and Utilities
Labs
• Module 2 Tools and Utilities.
Thank You!
Module 3
Services and Logs
Nutanix Troubleshooting 5.x
1. Intro 7. AFS
2. Tools & Utilities 8. ABS
3. Services & Logs 9. DR
4. Foundation 10. AOS Upgrade
5. Hardware 11. Performance
6. Networking
Course Agenda
• This is the Services and Logs module.
Objectives
After completing this module, you will be able to:
• Describe the major components and what they do.
• Troubleshoot the components.
• Find and interpret logs of interest.
Components
Component Relationships
• All of the AOS components (services/processes) have dependencies on others.
Some are required, some are optional, depending on features being made available.
• Internal link:
o https://confluence.eng.nutanix.com:8443/display/STK/AOS+Services
• Finding leaders
o https://confluence.eng.nutanix.com:8443/download/attachments/13631678/How%2
0to%20determine%20the%20leaders%20in%20a%20cluster.docx?version=2&mod
ificationDate=1498665487454&api=v2
Cluster Components
Genesis, Prism, Zeus, Medusa, Pithos, Stargate, Curator, Acropolis, Uhura, Ergon, Chronos, Cerebro, Insights, Lazan, Minerva, and others.
• There are 30+ components in 5.x.
Some Useful Cluster Commands
$ cluster status
• Returns status of AOS services from all nodes in the cluster.
$ cluster stop
• Stops most of the cluster components on all nodes in the cluster. Makes storage unavailable to the UVMs. Will not execute if UVMs are running.
$ cluster start
• Signals Genesis on all nodes to start any cluster processes not running.
Genesis
• Genesis is responsible for the following, among other things:
• Assigning the 192.168.5.2 address to the eth1 interface of the Controller VM.
• Ensuring that the node is up to date (release version maintained by Zookeeper).
• Preparing and mounting the physical disks.
• Starting the other services.
Some Useful Genesis Commands
$ genesis status
• Returns status and PIDs for all cluster processes on the node.
$ genesis stop <service>
• Stops the specified service.
$ cluster start
• Signals Genesis on all nodes to start any cluster processes not running.
$ genesis restart
• Restarts the local Genesis process.
$ cluster restart_genesis
• Restarts Genesis on all nodes in the cluster.
If you execute the command cluster status and see the above output, this means that Genesis isn't running on the local node.
• This may be because the node or CVM was recently rebooted or the Genesis service isn't fully operational.
• If a reboot hasn't been performed, check the genesis.out log to see why the service isn't started. Also check for Genesis FATAL files to see if the service is crashing.
• All Hypervisors: The Controller VM can reach 192.168.5.1 from 192.168.5.254.
• ESXi/AHV: The Controller VM has password-less SSH access to 192.168.5.1 (the internal interface). If you encounter problems, run the fix_host_ssh script.
• Hyper-V: The Nutanix Host Agent service is running on the hypervisor, and the authorized certs file has the needed certificates. Run the winsh command to verify that the Nutanix Host Agent service is running, and then run a PowerShell command to verify that the certificates are correct.
• The command to start is issued by the nutanix user.
NOTE: Genesis must be started using the nutanix user context. If it is started with any other context (such as root), the cluster will be unstable.
Genesis Requirements
Genesis Logs
~/data/logs/genesis.out
Common Issues
• Unable to communicate with the hypervisor
• Cluster IPs changed without following proper procedure
• Zookeeper crashing or unresponsive
Genesis Logs
• /home/nutanix/data/logs is the location of most of the log files.
• Genesis.out is a symlink to the most recent genesis log file and is the one most often
needed. However, there are times when previous versions of the genesis logs can be
useful.
• Some of the details in the genesis logs are:
o Services start and stop.
o Connection problems between nodes.
o Some interactions with the hypervisor.
Genesis.out Contents
https://portal.nutanix.com/#/page/docs/details?targetId=Advanced-Admin-AOS-v50:tro-genesis-log-entries-c.html
Genesis.out Contents
• Have students go to
https://portal.nutanix.com/#/page/docs/details?targetId=Advanced-Admin-AOS-
v50:tro-genesis-log-entries-c.html
• …and read the file. It’s about 5 minutes of reading. Text of the page is on the next
couple slides which are hidden. This is the 5.0 version of the file, as of this writing,
there is also a 5.1 version of the file which appears to be very similar if not identical.
These pages are pulled from the Acropolis Advanced Administration Guide.
• As of 5.0 there are 34 services.
Genesis.out Contents (cont.)
When checking the status of the cluster services, if any of the services are down, or the Controller VM is reporting Down with no process listing, review the log at /home/nutanix/data/logs/genesis.out to determine why the service did not start, or why Genesis is not properly running.
Check the contents of genesis.out if a Controller VM reports multiple services as DOWN, or if the entire Controller VM status is DOWN.
Under normal conditions, the genesis.out file logs the following messages periodically:
• Unpublishing service Nutanix Controller
• Publishing service Nutanix Controller
• Zookeeper is running as [leader|follower]
Prior to these occasional messages, you should see Starting [n]th service. This is an indicator that all services were successfully started.
Possible Errors
• 2017-03-23 19:28:00 WARNING command.py:264 Timeout executing scp -q -o
CheckHostIp=no -o ConnectTimeout=15 -o StrictHostKeyChecking=no -o
TCPKeepAlive=yes -o UserKnownHostsFile=/dev/null -o
PreferredAuthentications=keyboard-interactive,password -o
BindAddress=192.168.5.254 'root@[192.168.5.1]:/etc/resolv.conf'
/tmp/resolv.conf.esx: 30 secs elapsed
• 2017-03-23 19:28:00 WARNING node_manager.py:2038 Could not load the local ESX
configuration
Any of the above messages mean that Genesis was unable to log on to the host using the configured password.
Genesis.out Contents (cont.)
Determine the cause of a Genesis failure based on the information available in the log files.
1. Examine the contents of the genesis.out file and locate the stack trace (indicated by the CRITICAL message type).
2. Analyze the ERROR messages immediately preceding the stack trace.
...
2017-03-23 19:30:00 INFO node_manager.py:4170 No cached Zeus configuration found.
2017-03-23 19:30:00 INFO hyperv.py:142 Using RemoteShell ...
2017-03-23 19:30:00 INFO hyperv.py:282 Updating NutanixUtils path
2017-03-23 19:30:00 ERROR hyperv.py:290 Failed to update the NutanixUtils path: [Errno 104] Connection reset by peer
2017-03-23 19:30:00 CRITICAL node_manager.py:3559 File "/home/nutanix/cluster/bin/genesis", line 207, in <module>
main(args)
File "/home/nutanix/cluster/bin/genesis", line 149, in main
Genesis().run()
File "/home/nutanix/jita/main/28102/builds/build-danube-4.1.3-stable-release/python-tree/bdist.linux-
x86_64/egg/util/misc/decorators.py", line 40, in wrapper
File "/home/nutanix/jita/main/28102/builds/build-danube-4.1.3-stable-release/python-tree/bdist.linux-
x86_64/egg/cluster/genesis/server.py", line 132, in run
File "/home/nutanix/jita/main/28102/builds/build-danube-4.1.3-stable-release/python-tree/bdist.linux-
x86_64/egg/cluster/genesis/node_manager.py", line 502, in initialize
File "/home/nutanix/jita/main/28102/builds/build-danube-4.1.3-stable-release/python-tree/bdist.linux-
x86_64/egg/cluster/genesis/node_manager.py", line 3559, in discover
...
In the example above, the certificates in AuthorizedCerts.txt were not updated, which means that Genesis could not connect to the Hyper-V host.
Stargate
• https://confluence.eng.nutanix.com:8443/display/STK/The+IO+Path%3A+A+Stargate+Story
o This is another source for Stargate architecture.
• At the end of this section (Stargate) there are 2 hidden slides that show another
diagram of the read and write orchestration within Stargate.
Stargate Components
(Diagram: Stargate components – NFS/SMB/iSCSI adapters, Admission Controller, Unified Cache and in-memory buffers, Oplog Store on SSD, and Extent Store spanning SSD and HDD.)
Stargate Components.
Adapters.
• iSCSI / NFS / SMB.
Admission Controller.
• Separate queues for user IO and background tasks.
• Responsible for “hosting” individual vdisks.
vDisk Controller.
• One vDisk controller per vDisk.
• Maintains metadata and data caches.
• Looks up and updates metadata.
• Directs I/O requests based on direction (read/write), size, and so on.
• Ops to fix extent groups, migrate extents and extent groups, drain oplog, copy block
map, and so forth.
Oplog.
• Maintains in-memory index of oplog store data.
• Draining optimizes metadata lookup, updates & extent store writes.
• Write log on SSD for faster response to writes.
• Persistent write cache of dirty data that has not yet been drained to Extent Store.
• Pipelines write requests to remote Oplog Store.
Extent Store.
• Handles extent group read and write requests from vDisk Controller.
• Maintains each extent group as a file on disk.
• Manages all disks in the Controller VM.
• Pipelines write requests to remote Extent Store.
Oplog
Oplog
Purpose of the Oplog
• Low write latency.
• Absorb bursts of random writes.
o Write-hot data may get overwritten soon, avoiding the cost of writing it to the extent store.
• Coalesce contiguous IO, reducing ops to the extent store.
Unified/Content Cache.
• Upon a read request of data not in the Content Cache (or based upon a particular
fingerprint), the data will be placed into the single-touch pool of the Content Cache
which completely sits in memory (Extent Cache).
o LRU (Least Recently Used) is used until it is evicted from the cache.
• Any subsequent read request will “move” (no data is actually moved, just cache
metadata) the data into the memory portion of the multi-touch pool, which consists of
both memory and SSD. From here there are two LRU cycles:
o One for the in-memory piece upon which eviction will move the data to the SSD
section of the multi-touch pool where a new LRU counter is assigned.
o Any read request for data in the multi-touch pool will cause the data to go to the
peak of the multi-touch pool where it will be given a new LRU counter.
Random Write Sequence
(Diagram: a random write flowing through the Stargate components, with metadata in Cassandra.)
1. Protocol WRITE request
2. Incoming traffic control
3. Check in-memory buffers
4. Write to the Oplog Store and (4a) its replicas
5. Receive Oplog Store replica ACK
6. Protocol Write ACK
Questions:
• What are the criteria to write data from the Memory Buffers into the OpLog Store? Is
it outstanding writes to a contiguous region (1.5MiB, according to documentation)
within a specific time or is it from the size of the protocol payload?
Oplog Drain Sequence
(Diagram: oplog data draining from the Oplog Store to the Extent Store.)
2. Generate an async WriteOp
3. Query Cassandra for extent groups
4. Cassandra extent group metadata response
5. Check each extent group for sufficient replicas
6. Commit the async WriteOp to ExtentGroup replicas
7. Replicate extents to ExtentGroup replicas and release Oplog space
Cached Read Sequence
(Diagram: a read served from the Unified Cache.)
1. Protocol READ request
2. Incoming traffic control
3. Check in-memory Oplog index
4. Check if the requested data is in the Unified Cache
5. Return the data from the cache
Read Sequence – Data Not in Cache (cont'd)
4. Check if the requested data is in the Unified Cache
5. Query Cassandra for Extent Groups
6. Cassandra Extent Group metadata response
7. Retrieve data from the Extent Store
8. Store retrieved data in the Unified Cache
9. Protocol READ ACK
Stargate Diagnostics
~/data/logs/stargate.out
~/data/logs/stargate.{INFO,WARNING,ERROR,FATAL}
sudo service iptables stop
• https://portal.nutanix.com/#/page/docs/details?targetId=Advanced-Admin-AOS-v51:tro-stargate-log-entries-c.html
2009 Page
vDisk Data.
• Name.
• Usage / Deduped (calculated during Curator scans – may be stale).
• Perf stats for both oplog and overall vDisk.
• AVG Latency – latency between Oplog and Extent Store - does not measure what’s
experienced by the guest VM.
2009 Page – Extent Store
Anatomy of a Write
Curator
Curator
Background Process
• Runs on all the nodes of the cluster.
• Master/slave architecture.
• Exposes interesting stats.
o Snapshot usage, vDisk usage, space savings.
Curator – Map/Reduce
Curator – Map/Reduce.
• Mapping and reducing are distinct tasks which are carried out on individual nodes.
o The Curator master will dictate which curator slaves will perform which tasks.
• Mapping is the master determining from the slaves what each node has and the
reducing tasks are manipulating that data.
o Data can be deleted, or more copies made of it, depending on the results of the
mapping tasks.
Curator Scans
Curator ILM
• ILM – Information Lifecycle Manager.
o Used to down migrate data from hot tier to cold tier.
o Up migrations are done by Stargate.
Curator – Other Tasks It Does
Curator Diagnostics
~/data/logs/curator.out
~/data/logs/curator.{INFO,WARNING,ERROR,FATAL}
sudo service iptables stop
• Talk about "salt stack blocking the port".
Medusa / Cassandra
Cassandra Ring
• Basic ring, 4 nodes in one cluster.
Cassandra Ring Updating
Monitoring Cassandra.
Medusa/Cassandra Diagnostics
~/data/logs/cassandra_monitor.{INFO,WARNING,ERROR,FATAL}
~/data/logs/cassandra.out
~/data/logs/cassandra/system.log
~/data/logs/dynamic_ring_changer.out
Medusa/Cassandra Diagnostics.
• https://portal.nutanix.com/#/page/docs/details?targetId=Advanced-Admin-AOS-v51:tro-cassandra-log-entries-c.html
Node States:
• Normal – All good.
• Forwarding – Data is being moved to other nodes – usually due to node removal.
• Detached – Node has been detached due to unresponsive Cassandra process.
Extent
• Key Role: Logically contiguous data.
• Description: An extent is a 1MB piece of logically contiguous data which consists of
n number of contiguous blocks (varies depending on guest OS block size). Extents
are written/read/modified on a sub-extent basis (aka slice) for granularity and
efficiency. An extent’s slice may be trimmed when moving into the cache depending
on the amount of data being read/cached.
Extent Group
• Key Role: Physically contiguous stored data
• Description: An extent group is a 1MB or 4MB piece of physically contiguous stored
data. This data is stored as a file on the storage device owned by the CVM. Extents
are dynamically distributed among extent groups to provide data striping across
nodes/disks to improve performance.
• NOTE: As of 4.0, extent groups can now be either 1MB or 4MB depending on
dedupe.
• There are several commands in the curator_cli that can be used to see vDisk info
such as vdisk chain info or vdisk usage.
Cerebro
Replication Management
Master/Slave Architecture
• Master:
o Task delegation to Cerebro Slaves.
o Coordinating with the remote Cerebro Master when remote replication is occurring.
o Determines which data needs to be replicated.
– Delegates replication tasks to the Cerebro Slaves.
• Slaves:
o Tell local Stargate which data to replicate and to where.
Cerebro Diagnostics
~/data/logs/cerebro.out
~/data/logs/cerebro.{INFO,WARNING,ERROR,FATAL}
~/data/logs/cerebro_cli.{INFO,WARNING,ERROR,FATAL}
Cerebro Diagnostics.
• The 2020 page on the Cerebro master shows protection domains (with number of
snapshots, Tx and Rx KB/s, and so forth), remote bandwidth, ping remote latency,
ergon tasks, slave data and Medusa latency.
Acropolis
Acropolis – What It Does
(Diagram: CVMs, with the Acropolis Master running Task Executors, Scheduler/HA, and Network Controller.)
Acropolis Components.
Acropolis is a distributed service.
• All of the above components run in a single Python thread (with coroutines).
• Acropolis master is responsible for all task execution and resource management.
• We will distribute master responsibilities if we run into scalability limits.
• Each Acropolis instance collects and publishes stats for the local hypervisor.
• Each Acropolis instance hosts a VNC proxy service for VM console access.
The Nutanix Bible has a good section about the dynamic scheduler, what it does and how it does it. It also has a good section on placement decisions.
Acropolis Interface
Acropolis Diagnostics
~/data/logs/acropolis.out
~/data/logs/acropolis.{INFO,WARNING,ERROR,FATAL}
~/data/logs/health_server.log (scheduler log)
Acropolis Diagnostics.
• Health_server.log – Look for scheduler.py issues.
Acropolis 2030 Pages on Acropolis Master
:2030 – Host / Task / Network Information
:2030/sched – Host status / VMs on each host
:2030/tasks – Acropolis tasks
:2030/vms – VM Name / UUID / VNC / Status
:2030/shell – Interactive acli shell
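These pages can be opened from the CVM CLI using the links text-mode browser described in the Tools & Utilities module, for example:
links http://0:2030
links http://0:2030/sched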
Labs
• Module 3 Services and Logs.
Thank You!
Module 4 Foundation
Nutanix Troubleshooting 5.x
Module 4 Foundation
Course Agenda
1. Intro 7. AFS
2. Tools & Utilities 8. ABS
3. Services & Logs 9. DR
4. Foundation 10. Upgrade
5. Hardware 11. Performance
6. Networking
Course Agenda
• This is the Foundation module.
Objectives
After completing this module, you will be able to:
• Explain what Foundation is and how it is used
• Define the difference between Standalone and CVM-based Foundation
• Describe how to Troubleshoot Foundation
• Move Nodes to other Blocks
• Find and Analyze Logs of Interest
This module is designed to provide an overview of Foundation with a focus on
how it can be used in Troubleshooting.
• It also provides information about the different versions of Foundation, the standalone
virtual machine and CVM-based.
o We will discuss when to use both Foundation types and the dependencies for each
type.
• We will cover general troubleshooting techniques for Foundation including
networking, virtual networking, and IP configuration.
• Finally, we will review the Foundation logs and cover how they can be used for
overall troubleshooting.
Foundation
What is Foundation?
• Foundation is a Nutanix tool to image (Bare Metal) or re-image nodes in the field
with the proper AOS version and Hypervisor.
o When a customer purchases a block of nodes, from the factory the nodes are
imaged with the latest version of the Acropolis Operating System (AOS) and the
latest Acropolis Hypervisor (AHV) version.
• If customers want to use a different hypervisor other than the one that the cluster
ships with (ESXi/HyperV), the nodes need to be imaged and configured using
Foundation.
o Nutanix supports various hypervisors including:
o AHV
o ESXi
o HyperV, and
o Xenserver.
Foundation in the Field
Standalone Foundation
• Standalone Foundation is available for download on the Support Portal.
o The latest version is 3.7.2.
• Download the Standalone Foundation (Foundation_VM-3.7.2.tar file) from the Portal.
o The standalone Foundation VM is for Nutanix Internal use only, not available for
customers.
• Once downloaded from the Support Portal the tar file needs to be unpackaged.
o After unpackaging the tar file, the virtual machine can be deployed using the ovf
file or creating a new VM and pointing to the vmdk virtual disk.
• A virtualization application such as Oracle VirtualBox or VMware Workstation (required) can be installed on the computer/laptop to run the Foundation virtual machine.
o The virtual machine is Linux-based and needs network access.
• The VM virtual networking in most instances will be bridged from the VM to the
computer/laptop physical NIC (Network Interface Card).
o This will give the Foundation virtual machine the ability to communicate with the
Nutanix nodes on the local network required for imaging.
o By default, the eth0 interface on the Foundation virtual machine is set to DHCP.
Through the console you can log in to the virtual machine with these credentials:
• Username: nutanix
• Password: nutanix/4u
There is also a script on the desktop of the Foundation virtual machine to configure the
IP address settings.
• If you make any changes and save them, the network service will be restarted on the Foundation virtual machine.
(Diagrams: the Foundation VM on a laptop, connected through a switch to the Nutanix cluster.)
Standalone Foundation
• Once the Foundation virtual machine is configured with the correct IP address you
can then access the Foundation Application at the following URL:
o http://ip_address_Foundation_vm:8000/gui
• The Foundation web User Interface is wizard-based and will walk the customer
through several screens/tabs to perform the node imaging.
o The various tabs will provide block discovery automatically for bare metal (No
hypervisor or CVM running on node) using the MAC addresses from the IPMI
interface, CVM and Hypervisor IP address configurations, subnet mask, DNS,
NTP, and default gateway settings.
• Foundation also provides a tab during the wizard-based deployment to upload the
AOS and hypervisor binaries.
o Scp or winscp can also be used to copy the binaries up to the Foundation Virtual
Machine.
• Copy nutanix_installer_package-version#.tar.gz
o …to the /home/nutanix/foundation/nos folder.
o It is not necessary to download an AHV ISO because the AOS bundle includes an
AHV installation bundle.
• However, you have the option to download an AHV upgrade installation
bundle if you want to install a non-default version of AHV.
• If you intend to install ESXi or Hyper-V, you must provide a supported ESXi or Hyper-
V ISO (Windows 2012R2 Server) image.
o Therefore, download the hypervisor ISO image into the appropriate folder for that
hypervisor.
• ESXi ISO image: /home/nutanix/foundation/isos/hypervisor/esx
• Hyper-V ISO image: /home/nutanix/foundation/isos/hypervisor/hyperv
• For third party hypervisors, reference the download page on the Nutanix Support
Portal for approved releases and the appropriate JSON file.
• Third-party hypervisor software can be obtained from that specific vendor.
o Example ESXi 6.5: Download from the VMware site.
o Example Hyper-V: Download from the Microsoft site.
Standalone Foundation
• Once the Foundation virtual machine is up and running, the network interface eth0 has to be configured with the correct IP address, subnet mask, default gateway, and DNS servers.
o By default, eth0 is configured for DHCP; a static IP address is recommended.
• After configuring all the IP settings, test connectivity with ping to make sure you can reach the gateway, the name servers, and most of all the IPMI IP addresses configured on the Nutanix nodes.
o Once the IP connectivity is established the nodes can be successfully imaged from
the Foundation virtual machine.
Standalone Foundation
Bare Metal
• No hypervisor or CVM exists on the node.
• No automatic discovery via IPv6 link-local address (FE80...).
• Requires the IPMI MAC address.
How to Obtain the IPMI MAC Address:
• Configure IP Address
• Keyboard and Monitor
• Web Browser to IPMI IP Address
• CVM or Hypervisor
• Physical Device
Standalone Foundation
• Standalone bare metal imaging is performed using the IPMI interface.
o The Foundation virtual machine needs access to each node's IPMI interface to successfully image.
o Since there is not an existing CVM on the node, automatic discovery via IPv6 link-
local address will not be available.
• The only way to discover the nodes is to obtain the MAC address to the IPMI
interface and manually add the block and nodes by MAC address.
o Foundation can also configure the IPMI IP address, but in most instances you will
console into each node and configure the IPMI with a static IP address ahead of
time or day of installation.
• Then test IP connectivity from the Foundation virtual machine using ping.
o If the Foundation virtual machine does not successfully ping the node’s IPMI
address, go back and review IP settings on the nodes (IPMI interface) and the eth0
interface on the Foundation virtual machine.
• To make things easier, you can cable the nodes IPMI or Shared LOM port and the
Laptop/Computer to a single flat switch and then assign a network ID and IP
addresses to the Laptop/Computer and Nutanix nodes.
o This way there are no VLAN configurations or complications.
o This setup provides basic network connectivity for imaging the nodes.
• Once the nodes are successfully imaged and IP addresses configured for each
node’s CVM and Hypervisor, they can then be moved into the production
network and subnet.
o At this point, proper VLANs must be assigned if applicable to the CVM and
Hypervisor external interfaces.
o To access the console of the node point a web browser to the IP address set for
the IPMI interface, or…
o You can also go into the datacenter and hook up a keyboard and Monitor
physically to the nodes.
Standalone Foundation
• Once on the console of the IPMI interface, under Configuration there is Network.
o The network configuration tab is where you can:
• Get the IPMI mac address and
• Obtain or modify existing IP address, gateway, and DNS settings.
• For the LAN interface settings there are three choices:
o Dedicate – Do not use!
o Share
o Failover
• The LAN interface setting should always be set to Failover (Default).
o IPMI port speeds vary depending on platform.
• Some platforms have 100Mb, others the IPMI will be 1Gb.
o All the ports are RJ-45 interfaces.
• The IPMI can failover or share access with one of the LOM (LAN on Motherboard)
ports.
o This allows the use of one port for both IPMI and Data.
• The shared failover LOM Ethernet port can be either 1Gb or 10Gb depending on the
platform.
Standalone Foundation
(Screenshot: IPMI network settings – IPv4/IPv6 Settings, VLAN Settings, LAN Interface.)
Standalone Foundation
• Notice in the IPMI web interface on the network settings page is where to configure
the network.
o The MAC address of the IPMI interface required by standalone Foundation for
bare metal imaging and cluster setup.
o The VLAN setting for the IPMI by default is set to 0.
• VLAN 0 means that the interface will use whatever default VLAN is set on the network switch port.
• Let’s deal with access mode switch ports first.
o For example, let’s say we have a Cisco Nexus 5K switch.
• When the switch is initialized and the ports are brought online (no shutdown)
they will be configured by default in access mode and the default VLAN will be
VLAN 1.
o If the IPMI port is configured for VLAN 0 then it will use the VLAN that is assigned
to the switch port in access mode.
• When the switch port is in access mode then all changes to the VLAN would be
configured on the switchport and do not change the VLAN ID from 0 in the IPMI
settings.
• Now let’s look at Trunk ports.
o Switch ports can also be configured in trunk mode.
• Why do we want to set a switch port to trunk mode (switchport mode trunk)?
- Changing a switchport to trunk mode allows you to do something on the host
network interfaces called VLAN tagging.
• VLAN tagging allows for creating multiple logical networks from a single
interface on a physical network adapter or a virtual network adapter.
• We can also apply a VLAN tag to the IPMI network interface.
- Now this is where the discussion can get a little confusing.
• If the IPMI port is cabled to a switchport that is configured in trunk mode, it
works slightly differently than if the switchport is in access mode.
• If the IPMI port is cabled to a switchport that is in trunk mode and the IPMI port is configured for VLAN ID 0, then the switch will determine what VLAN the port will be on via the native VLAN setting (switchport trunk native vlan 1).
o The default is VLAN ID 1 on Cisco switches.
o Think of VLAN 0 as non-tagged as opposed to tagged.
• If the IPMI port is cabled to a switchport that is in trunk mode and the IPMI port is configured for any VLAN ID other than 0, then that will be the VLAN ID for the IPMI port.
o Also, on the LAN interface setting, the default is Failover and there is no need to
change this setting.
Standalone Foundation
Configure IPMI to use a static IP:
ipmitool lan set 1 ipsrc static
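The static address itself is set with the related lan set subcommands, followed by lan print to verify (the addresses below are example values):
ipmitool lan set 1 ipaddr 10.30.15.50
ipmitool lan set 1 netmask 255.255.240.0
ipmitool lan set 1 defgw ipaddr 10.30.0.1
ipmitool lan print 1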
Standalone Foundation
• This is the backplane of the NX-1065-G4/G5 platform.
o The IPMI port is 1Gb with an RJ-45 connector.
• The LOM shared IPMI is 1Gb with an RJ-45 connector.
Standalone Foundation
NX-3175-G4
• This is the backplane of the NX-3175-G4 platform.
o The IPMI port is 1Gb with an RJ-45 connector.
• The LOM shared IPMI is 10Gb with an SFP+ connector.
CVM-Based Foundation
CVM-based Foundation
• All of the Binary files and the Foundation Service is now available on the CVMs.
o For the most part, the Foundation that runs directly from the CVMs is the same as
the standalone virtual machine with a few differences.
CVM versus Standalone Foundation
Similarities:
• The access URL is the same: http://CVM_IP_or_VM_IP:8000.
• The paths for the images and ISOs for AOS and hypervisors are the same.
• The log path locations are the same.
• Most of the pages in the wizards between the two are the same.
Differences (CVM-based):
• Must have a hypervisor installed on the node and a healthy CVM virtual machine up and running on the node.
• The node must not be part of an existing cluster.
• Requires IPv6 for node discovery.
• Serially images the first node.
• Must create a cluster. Standalone can image only; cluster creation is not required.
CVM-based Foundation
• The Foundation service must be running.
o The node cannot be part of an existing cluster.
• If the node is part of an existing cluster then all the other services will be running and
the Foundation service will not.
CVM Foundation Web Interface
10.30.15.47:8000/gui/index.html
Location to upload AOS software through the GUI or SCP: /home/nutanix/foundation/nos/
• Example: nutanix_installer_package-release-danube-4.7.5.2-stable.tar.gz
Location for the AHV hypervisor tarball: /home/nutanix/foundation/isos/hypervisor/kvm/
• Examples: host-bundle-el6.nutanix.20160217.2.tar.gz, host-bundle-el6.nutanix.20160217.tar.gz
Foundation CVM-based
• When you click Next on the Setup Node page, Foundation performs a pre-check and
tests connectivity between all the IP addresses and the gateway.
o If there is a connectivity issue, the wizard highlights the IP addresses in
question in red.
• The error is not very obvious from the web interface.
o At this point, there could be a number of issues causing the network connectivity
problem.
• To provide better troubleshooting we must access the Foundation Server logs
either:
o On the CVM if CVM-based or
o On the Standalone Virtual Machine.
• Both use the same log path locations and naming conventions.
• Foundation logs are stored in the following path:
o /home/nutanix/data/logs/foundation.out
o This log is for the overall Foundation Service.
o It also tells you where the individual logs for each node are being stored and their
names.
o You may have to examine the foundation.out log on multiple nodes to find the
one that indicates where the node log files will be stored.
cat foundation.out
tar: Cowardly refusing to create an empty archive
Try `tar --help' or `tar --usage' for more information.
'tar cf /home/nutanix/data/logs/foundation/archive/orphanlog-archive-20170511-
144648.tar -C /home/nutanix/data/logs/foundation/. ' failed; error ignored.
2017-05-11 14:46:48,304 console: Console logger is configured
2017-05-11 14:46:48,304 console: Configuring <class
'foundation.simple_logger.SmartFileLoggerHandler'> for None
2017-05-11 14:46:48,305 console: Configuring <class
'foundation.simple_logger.FoundationLogHandler'> for foundation
2017-05-11 14:46:48,305 console: Configuring <class
'foundation.simple_logger.SessionLogHandler'> for foundation.session
2017-05-11 14:46:48,428 console: Foundation 3.7 started
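When a run fails, a quick first pass is to grep the Foundation logs for failures; a hedged sketch (per-node log names under ~/data/logs/foundation/ vary by release, so adjust the glob):
nutanix@cvm$ grep -iE "error|fatal|fail" ~/data/logs/foundation.out
nutanix@cvm$ ls ~/data/logs/foundation/            # individual node logs live here
nutanix@cvm$ grep -iE "error|fatal" ~/data/logs/foundation/*.log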
Foundation CVM-based (cont.)
• The Foundation interface does not provide much insight as to the actual problem.
Foundation CVM-based
• By looking into the individual node log we discover the issue. The subnet mask was
incorrect at 255.255.255.128, which will not allow the CVM to reach the gateway.
o Subnet Mask: 255.255.255.128.
o The Gateway is 10.30.0.1.
o The CVM IP address is 10.30.15.47.
• So let's examine:
o A 255.255.255.128 subnet mask means the network IDs are the following:
• In the last octet we are borrowing one bit, which means every 128 addresses is a
new subnet.
• This divides the last octet (0 to 255) into two subnets:
o 0-127 is one subnet
o 128-255 is another subnet
• When assigning IP addresses, any address from 0 through 127 is on one subnet
and 128 through 255 is on a separate network.
• Examine the IP address on the CVM: 10.30.15.47/255.255.255.128.
o So the CVM is running on subnet 10.30.15.0-127.
• The Gateway is 10.30.0.1/255.255.240.0.
o This is the correct subnet mask for this network, which means in the third octet
every 16 decimals is a different subnet.
• So the Gateway is running on subnet 10.30.(0-15).xxx.
o These are not on the same subnet and therefore cannot communicate.
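A quick way to confirm this kind of mismatch from the CVM itself is with standard Linux tools; a sketch (assuming the interface is eth0):
nutanix@cvm$ ip addr show eth0       # confirm the configured address and prefix (/25 vs /20)
nutanix@cvm$ ping -c 3 10.30.0.1     # with the /25 mask the gateway will be unreachable
nutanix@cvm$ ip route get 10.30.0.1  # shows which route, if any, would be used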
Imaging Steps
• Get FRU Device Description via IPMItool
• Check to see what type of system:
o Lenovo
o Software Only
o Dell
• Prepare the NOS package and node-specific Phoenix
• Mount the Phoenix ISO and boot into Phoenix
Before G4 Platforms
• Modify the field "node_position":
o "A" in /etc/nutanix/factory_config.json
o Restart genesis
Since G4 Platforms
• Just modifying the factory_config.json file does not update the node location in Prism.
Case A
Copy an existing hardware_config.json
• You can copy the file from a correctly configured node installed at the same position
in a different block.
Easiest, not always applicable.
Solution
• In the example below, we have a node that's being displayed in slot D while
physically in slot A of the same chassis.
• We want to make sure this node shows up correctly in slot A of the hardware diagram
in Prism.
Case A (easiest, not always applicable):
• Copy an existing hardware_config.json file from a correctly configured node installed
at the same position in a different block.
Requirements
• You have access to a hardware_config.json file of the exact same node model as the
one you are going to modify (e.g., you can use the file from an A node in a different
chassis).
Procedure
• Move the original hardware_config.json file:
o <Wrong_positioned_node_IP>:~$ sudo mv /etc/nutanix/hardware_config.json /etc/nutanix/hardware_config.json.ORIG
• Copy /etc/nutanix/hardware_config.json from a node that is already in the desired
location to the node you want to correct (these two nodes should be in the same
position of another block):
o <Wrong_positioned_node_IP>:~$ sudo sh -c 'rsync -av nutanix@Well_positioned_node_IP:/etc/nutanix/hardware_config.json /etc/nutanix/hardware_config.json'
• Restart genesis:
o <Wrong_positioned_node_IP>:~$ genesis restart
Case B
Manually Modify the hardware_config.json File
hardware_config.json
• JSON file structure used by the Prism GUI
• Provides configuration information about the system
Make a backup copy of this file before you make modifications.
Labs
Thank You!
Module 5
Hardware Troubleshooting
Nutanix Troubleshooting 5.x
Objectives
After completing this module, you will be able to:
• Describe and Troubleshoot the Physical Hardware Components of a Nutanix Cluster
• Perform Failed Component Diagnosis
• Explain the Role of Disk Manager (Hades)
• Repair Single SSD Drive Failures
• Apply Firmware Updates using Lifecycle Manager (LCM)
• Identify Hardware Troubleshooting Logs of Interest
• Describe and Perform a Node Removal in a Nutanix Cluster
o How to Perform a Node Removal
o Node Removal Process
o Troubleshooting and Monitoring Node Removal
• Describe and Perform a Disk Removal on a Node
o How to Perform a Disk Removal in a Nutanix Cluster
o Disk Removal Process
o Troubleshooting and Monitoring Disk Removal
Components
Describe & Troubleshoot Nutanix Cluster Hardware Components
Example: NX-3460-G4
Model string format: NX-ABCD-GE
A = Product Family Series – (1 = entry – 1 SSD + 2 HDD; 3 = balanced – 2 SSDs + 2/4 HDDs;
6 = capacity – 1-2 SSDs + 4/5 HDDs; 7 = GPU node; 8 = Ent Apps; 9 = all flash)
B = # of nodes being sold in the chassis (some will always be 1, others will be 1, 2, 3, or 4)
C = Chassis form factor – nodes/rack units (5 = 1N-2U; 6 = 4N-2U; 7 = 1N-1U)
D = Drive form factor (0 = 2.5" disks; 5 = 3.5" disks)
GE = Generation (4 = Haswell; 5 = Broadwell)
The NX line of hardware consists of Blocks (chassis) and Nodes (x86 servers). There are
different Models to choose from including entry level platforms such as the 1000 series or the
All Flash 9000 series platform. Nutanix also provides a concept called storage heavy nodes in
the 6000 series. See the Nutanix Support Portal for details and specifications on all Nutanix NX
Series Platforms.
The chassis that the NX hardware ships in is called a Block. All of the Nutanix NX platform
chassis are a standard 2U form factor. Chassis come in several variations for node configuration.
Some chassis have four slots to contain up to a max of 4 nodes in the Chassis. Examples would
be the NX 1000 and NX 3000 series. Storage heavy nodes will have two slots for 2 nodes.
Nutanix even has single node backup only targets which only contain a single slot in the
chassis and one node.
The compute resources in the chassis are the nodes. Each node is a separate x86 hardware
server that runs the hypervisor, CVM, and UVMs. Nodes can be referred to as sleds or blades.
Each sled slides into a slot in the chassis and connects to the midplane for power and disk
connectivity. All chassis have disk shelves built in for storage. All nodes run Intel Processors
and come in different socket, core, Network, and memory configurations. The 8000 series
nodes support expansion GPU cards for GPU intensive workloads.
On the top of all the chassis, you will see the Nutanix branding label that confirms this is
Nutanix hardware. The chassis serial number is located on the side. This is the serial
number in the factory_config.json file used for node position and naming. The serial number
will also appear in the name of the CVM in the SSH shell.
Failed Component Diagnosis
SATA SSD Metadata Drive
SATA HDD or SATA SSD Data Drive
Node
Chassis or Node Fan
Memory
Power Supply
Chassis
Failure Indicators
• An alert was received on the Nutanix UI that a SATA SSD was marked offline.
• There are errors in the Cassandra log file indicating an issue reading metadata.
What to do Next
• If an alert was received that the disk was marked offline, run the following command:
o ncli> disk list id=xx – where xx is the disk ID in the alert, to verify the location
for replacement.
• If there are errors indicating metadata problems, replace the disk.
• SSD drive failures can be one of the most challenging issues to fix in a Nutanix
Cluster. This is especially true for Nutanix single SSD node systems. The SATA SSD
drive is where the Controller Virtual Machine lives. The Controller Virtual Machine
configuration files and data are stored on the SSD drives.
• Nutanix nodes ship with either a single or dual SSD drive configuration.
• For dual configurations, the CVM configuration files partition is mirrored for fault
tolerance. If only one SSD drive is lost, the Controller Virtual Machine should not
experience downtime. Note: The Controller VM will reboot if there is a SSD drive
failure. Replace the failed SSD drive and it should automatically be brought back into
the mirror and re-synced.
• For single SSD boot drive configurations, there is no mirrored drive so Controller
Virtual Machine failure will occur. Controller Virtual Machine failure will cause a loss of
local Stargate access. HA will provide failover for the hypervisor by pointing to a
remote Controller virtual machine for Stargate Services. The Controller virtual machine
has to be rebuilt.
• To replace the single SSD drive requires manual intervention. We will review the
detailed steps next.
Repair Single SSD Drive Failures
• Systems that have a single SSD drive cannot rely on the mirroring software to recover
from a failed drive. For these systems, follow this procedure to repair single SSD
drive failures.
• Gather information about the CVM on the failed SSD drive node.
• Next remove the failed SSD drive and replace with replacement drive.
• Log on to any other CVM in the cluster and run the following command to start the
SSD drive repair.
nutanix@cvm$ single_ssd_repair -s cvm_ip
• The repair process will partition, format, and rebuild the CVM on the node where the
SSD drive failed. ssd_repair_status can be used to track the status of the repair
process.
nutanix@cvm$ ssd_repair_status
Failure Indicators
• The Stargate process marks the disk offline if I/O stops for more than 20 seconds.
• An alert is generated.
Next Steps
• Run the command ncli> disk ls id=xx (where xx is the ID shown in the alert) to get
the disk location for replacement, and verify the following:
Storage Tier: SSD-SATA
Location: [2-6]
Online: false
SATA HDD or SATA SSD Data Drive – SATA Controller Failure
Verify by Running the smartctl Command on the Disk Checking for any
Errors
• nutanix@cvm$ sudo smartctl -a /dev/sdc
Sample Output:
• SMART overall-health self-assessment test result: PASSED
• SMART Error Log Version: 1
• No Errors Logged
Failure Indicators and Next Steps:
• No green light on the power button when trying to power up: the node does not
power on. Reseat the node; if that does not resolve the issue, replace the node.
• One of the on-board NIC ports is not working: troubleshoot NIC issues. If the NIC is
still not working, replace the node.
• A diagnosed memory failure / actual memory slot failure.
• Multiple HDDs go offline with no drive errors reporting.
• For any other failure indicators, replace the node.
Node Failure
Failed Component Diagnosis
SATA SSD Metadata Drive
SATA HDD or SATA SSD Data Drive
Node
Chassis or Node Fan
Memory
Power Supply
Chassis
Memory Failure
Failed Component Diagnosis
Chassis Failure
SATA SSD Boot Drive Failure – Rescue Shell
Log in to the Rescue Shell to diagnose issues with the boot device.
Steps to enable the Rescue Shell:
• Create the ISO image svmrescue.iso:
nutanix@cvm$ cd ~/data/installer/*version*   (the AOS version)
nutanix@cvm$ ./make_iso.sh svmrescue
• Launch the Recovery Shell:
o Shut down the Controller VM if still running.
• Note: Drive failures are not as severe in a Nutanix Cluster due to the Replication
Factor. With RF2 there is always another copy on another node in the cluster. With
RF3 there are always 2 copies on 2 other nodes in the cluster.
• Best practice is to watch for drive failures and replace as soon as possible.
• Each node has a CVM running on it that provides access to the disks for that node.
Each node’s local disks are placed into one large Nutanix Storage Pool, or what is
known as the Acropolis Distributed Storage Fabric (ADSF).
• The Service on the node responsible for all reads and writes to the direct attached
disks is Stargate. Stargate monitors the I/O times. If drives are having delayed I/O
times, then Stargate will mark the drive offline.
• Although slow I/O times can be a very good indicator of a drive going bad, it’s
sometimes not enough information to prove the drive is failing or failed. Smartctl self-
tests can be run on the offline drive to provide more extensive testing.
• After the drive is marked offline by the Stargate service, the Hades Disk Manager
service automatically removes the drive from the data path and runs smartctl self-
tests against the drive. The smartctl self-tests provide a more thorough testing of the
drive.
• If the drive passes the smartctl self-tests, the drive is marked back online and
returned to the data path.
• If the drive fails the smartctl self-tests or Stargate marks the disk offline 3 times within
an hour (regardless of smartctl results) the drive is removed from the cluster.
/home/nutanix/data/logs/hades.out
• This logfile will provide information to assist when troubleshooting drive failure issues.
Any drives marked offline by stargate will be tested automatically using smartctl by
the Hades service.
• Hades also performs other disk operations besides disk monitoring such as disk
adding, formatting, and removal. The Hades logfile will be helpful to assist with any
issues for these operations as well.
• There is also a smartctl log that can be examined for more detailed information about
the smartctl drive self-tests being performed and the diagnosis on the drive that is
failed or failing.
• To view drives marked offline search the stargate logfile for “marked offline”
entries. Review the time stamp of occurrence then review the smartctl log for the
corresponding drive self-tests for the prediction good or bad and detailed information
on the drive.
• There will also be entries in the Hades logfile specifying these actions running by
Hades to test the offline drive marked by Stargate.
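For example, a hedged pair of one-liners to correlate the two logs (log names per the paths above):
nutanix@cvm$ grep -i "marked offline" ~/data/logs/stargate.*   # when Stargate offlined the disk
nutanix@cvm$ grep -i smartctl ~/data/logs/hades.out            # Hades' follow-up self-tests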
Explain the Role of Disk Manager (Hades)
Prism – Hardware – Disk Page
• Run the following command to check the status or restart the Hades service:
Short Test
The goal of the short test is the rapid identification of a defective hard drive.
Therefore, a maximum run time for the short test is 2 min. The test checks the disk
by dividing it into three different segments. The following areas are tested:
~ Electrical Properties: The controller tests its own electronics, and since this is
specific to each manufacturer, it cannot be explained exactly what is being tested. It
is conceivable, for example, to test the internal RAM, the read/write circuits or the
head electronics.
~ Mechanical Properties: The exact sequence of the servos and the positioning
mechanism to be tested is also specific to each manufacturer.
~ Read/Verify: It will read a certain area of the disk and verify certain data, the size
and position of the region that is read is also specific to each manufacturer.
Long Test
The long test was designed as the final test in production and is the same as the
short test with two differences. The first: there is no time restriction and in the Read/
Verify segment the entire disk is checked and not just a section. The Long test can,
for example, be used to confirm the results of the short tests.
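These are the same tests you can trigger manually with smartctl; a sketch against a hypothetical device /dev/sdc:
nutanix@cvm$ sudo smartctl -t short /dev/sdc     # start the ~2 minute short self-test
nutanix@cvm$ sudo smartctl -t long /dev/sdc      # start the full-surface long self-test
nutanix@cvm$ sudo smartctl -l selftest /dev/sdc  # review the self-test results log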
Explain the Role of Disk Manager (Hades)
• If Stargate marks a drive offline 3 times within 1 hour, the drive is marked as failed
and removed from the cluster.
• If not, Hades will run self-tests on the drive to verify it is failed or failing. If the drive
self-tests fail Hades will mark the drive failed and remove from the cluster. If the drive
passes the drive self-tests it is brought back online.
• If the drive has failed, a replacement drive needs to be sent to the customer. The
physical drive replacement can be performed by Authorized Support Partners or by
the customer. The Support Portal has step-by-step documentation for drive
replacement procedures.
• Remove the date/time entries for when Hades thought the disk was bad.
• Restart Genesis on that local CVM.
• Stop and restart Stargate for this local CVM.
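A sketch of the restart steps using the standard CVM service commands (confirm with Nutanix Support before restarting services on a production cluster):
nutanix@cvm$ genesis restart          # restart Genesis on the local CVM
nutanix@cvm$ genesis stop stargate    # stop Stargate on the local CVM
nutanix@cvm$ cluster start            # start any stopped services, including Stargate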
Explain the Role of Disk Manager (Hades)
• If the replacement drive does not come online automatically, use Prism to manually
prepare the new drive: on the Hardware – Diagram page, select the new drive and
then click Repartition and Format.
• After re-inserting the existing disk, or any disk that has old data on it, the newly-
inserted disk will need to be manually repartitioned and formatted in Prism. The disk
cannot be used until these tasks are performed.
Explain the Role of Disk Manager (Hades)
Prism – Hardware – Diagram Page
• This can be helpful to examine disks being marked offline by Stargate and the
outcome of the drive self-tests performed by Hades. The inventory will show each
time a drive has been marked offline and the smartctl tests performed on the drive.
Sample Output:
slot_list {
location: 1
disk_present: true
disk {
serial: "BTHC512104MM800NGN"
model: "INTEL SSDSC2BX800G4"
is_mounted: true
current_firmware_version: "0140"
target_firmware_version: "0140"
is_boot_disk: true
vendor_info: "Not Available"
}
}
slot_list {
location: 2
disk_present: true
disk {
serial: "Z1X6QR49"
model: "ST2000NM0033-9ZM175"
is_mounted: true
current_firmware_version: "SN04"
target_firmware_version: "SN04"
vendor_info: "Not Available"
}
Lifecycle Manager (LCM)
• LCM can be released on a faster schedule than AOS and upgraded out-of-band.
• Note: LCM modules do not get updated with AOS; only the interface does.
LCM Architecture
LCM Inventory & Upgrade Workflow
Inventory:
• User clicks on the Inventory tab inside the LCM page.
• The LCM URL is set to the Nutanix portal by default. If this needs to be changed, the
customer can change it by clicking on Advanced Settings.
• User clicks on Options -> Perform Inventory to perform an inventory.
• Progress can be tracked through the LCM page or the Task view.
Update:
• If an update exists for an entity, it shows up on the Available Updates page.
• User can choose to update all entities by clicking on the Update All button.
• User can choose to update individual entities by clicking on Define.
More on Updates
• LCM is designed to automatically account for dependencies and upgrade paths for
firmware.
• Future updates will be tagged "Critical", "Recommended", or "Optional".
• LCM will group together and prioritize upgrades based on lowest "impact":
1. CVM (Example: LSI SAS 3008 HBA, LCM Framework)
2. Host (Example: NIC driver update)
3. Phoenix (Example: BIOS, Satadom)
Information displayed in Prism comes from Insights DB.
Logs
For Pre-check, Inventory, and other LCM framework-related issues you should look at the
genesis.out file on the LCM leader.
• Use “grep -i lcm” to see only the relevant entries.
For LCM module issues look at:
• ~/data/logs/lcm_ops.out
For module download issues look at:
• ~/data/logs/lcm_wget.out
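Putting these together, a hedged first pass on the LCM leader:
nutanix@cvm$ grep -i lcm ~/data/logs/genesis.out   # framework, pre-check, and inventory issues
nutanix@cvm$ tail -n 100 ~/data/logs/lcm_ops.out   # module execution issues
nutanix@cvm$ tail -n 100 ~/data/logs/lcm_wget.out  # module download issues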
To see the status of LCM tasks outside of Prism, use the Ergon page:
• http://<cvm-ip>:2090/tasks
• Possible states: Queued, Running, Succeeded, Failed, Aborted
NOTE: LCM logs are not gathered by NCC log_collector until NCC 3.1.
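The same task list can also be pulled from a CVM shell with the Ergon CLI; a sketch (ecli availability and output fields vary by AOS version):
nutanix@cvm$ ecli task.list include_completed=false   # list in-flight Ergon tasks, including LCM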
LCM Troubleshooting
To find the LCM leader:
nutanix@NTNX-A-CVM:xx.yy.zz.81:~$ zkcat /appliance/logical/pyleaders/lcm_leader/n_0000000004
xx.yy.zz.82
NOTE: The LCM leader will move to other CVMs during some types of upgrades.
Additional Info
• Modules are downloaded to ~/software_downloads/lcm/ and then copied to all CVMs
in the cluster.
• When LCM tasks fail they will not automatically restart. Place the cluster into a
“clean” state by performing the following to resume:
1. zkrm /appliance/logical/lcm/operation
Always consult with a Senior SRE before making Zookeeper edits.
2. cluster restart_genesis
3. In Prism: perform an LCM inventory.
4. In Prism: perform the LCM update as required.
LCM Troubleshooting
LCM Known Issues and Caveats
Node Removal
Describe and Perform a Node Removal in a Nutanix Cluster
• Nodes can be removed from the cluster dynamically also. Why would you want to
remove a node from a cluster?
o Maintenance, replacement, to move it to another cluster, and other reasons.
• When a node is marked for removal from the cluster, all user virtual machines and
data must be migrated to other nodes in the cluster. Until the data is migrated
throughout the existing nodes in the cluster the node cannot be removed.
• User virtual machines will be migrated to other existing nodes in the cluster using a
feature in AHV called Live Migration. In AHV, Live Migration is enabled by default.
• Live Migration moves a virtual machine from one node (hypervisor) to another. The
Live Migration of the User Virtual Machine is performed online requiring no downtime.
All virtual machines except for the CVM must be migrated off the node first. Time
consumption will depend on how many VMs need to be migrated. More VMs means
the operation will take more time.
• After all the virtual machines are migrated off the node, then the data on the disks
need to be replicated to other nodes in the cluster for access and redundancy
requirements. This data consists of Cassandra, Zookeeper, Oplog and Extent Store
data. Time consumption will depend on the amount of data stored on the disks.
• Once all the data is replicated from the disks the node is removed from the cluster.
How to Perform a Node Removal
Make sure there are at least 4 nodes in the cluster (a minimum of 3 nodes is required, so
you need at least 4 to remove one).
nCLI Method
In nCLI run:
• host list – to obtain the IP address and ID of the node to prepare for removal from the cluster.
• host remove-start id=“ID” – where the ID comes from the host list output.
• host list and host get-remove-status will show
MARKED_FOR_REMOVAL_BUT_NOT_DETACHABLE, which means data migration has started
and the node is being prepared for removal.
The node cannot be physically removed from the block until data migration completes.
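Putting the nCLI steps together, a sketch of a removal session (the node ID 8 is hypothetical; take the real ID from your own host list output):
ncli> host list
ncli> host remove-start id=8        # hypothetical ID from the host list output
ncli> host get-remove-status        # repeat until the node is detachable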
Hypervisor Caveats
• Acropolis Hypervisor – automatically performs live migrations of UVMs during node
removal.
• ESXi Hypervisor – manual migration of UVMs.
• Hyper-V Hypervisor – manual migration of UVMs.
Shows a Progress Bar Percentage Complete for the Node and Disks
Disk Removal
Describe and Perform a Disk Removal on a Node
Part II
Each Nutanix node in the cluster has its own associated disks to store data.
Nodes can contain Hard Disk Drives (HDD), Solid State Drives (SSD), or both.
• Hybrid nodes will have both, for example: 2 SSDs and 4 HDDs.
• All-flash nodes will have just SSD drives.
Nodes can sustain disk failures and remain operational.
For containers with RF2 (2 copies of all data) you can lose 1 disk per node and the node
will remain operational.
• Requires 3 nodes minimum.
For containers with RF3 (3 copies of all data) you can lose 2 disks per node and the node
will remain operational.
• Requires 5 nodes minimum.
Before a disk can be removed from a node the data has to be migrated.
• If the disk has failed then of course there will be no data migration.
• You still have to perform a disk removal for failed drives before you can physically
remove them.
Disks that fail need to be removed and replaced immediately!
nCLI Method
In nCLI run:
• disk list – to obtain the disk “ID”.
• disk remove-start id=“disk ID” – where the disk ID comes from the disk list output,
for example: 0005464c-247c-31ea-0000-00000000b7a0::415025
• disk get-remove-status – to verify data migration.
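Putting the steps together, a sketch of a removal session, reusing the sample disk ID above (substitute the ID from your own disk list output):
ncli> disk list
ncli> disk remove-start id=0005464c-247c-31ea-0000-00000000b7a0::415025
ncli> disk get-remove-status        # repeat until data migration completes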
Data_Migration_Status 0x111
Labs
• Module 5 Hardware Troubleshooting
Thank You!
Module 6 Networking
Course Agenda
1. Intro 7. AFS
2. Tools & Utilities 8. ABS
3. Services & Logs 9. DR
4. Foundation 10. AOS Upgrade
5. Hardware 11. Performance
6. Networking
Course Agenda
• This is the Networking module.
Objectives
After completing this module, you will be able to:
• Explain basic networking concepts.
• Gain an understanding and practice of IP Subnetting.
• Describe the functionality of Open vSwitch (OVS) in a Nutanix environment.
• Troubleshoot OVS issues on an AHV host.
• Become familiar with WireShark.
Networking Basics
Networking Overview
OSI Model
vLANs
vLAN Tagging
Subnetting
Subnetting Example
Interface Speeds
Ethtool
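As a quick sketch of what the ethtool slides cover (run on an AHV host; the interface name eth0 is an assumption):
root@ahv# ethtool eth0       # check Speed, Duplex, and "Link detected"
root@ahv# ethtool -S eth0    # per-interface statistics, e.g. error and drop counters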
Networking Configuration Files
OVS Architecture
What is Open vSwitch (OVS)?
• Sort of…
o DVS and N1K are distributed virtual switches – a centralized way to monitor the
network state of VMs spread across many hosts. Open vSwitch is not
distributed – a separate instance runs on each physical host.
• Open vSwitch includes tools such as ovs-ofctl and ovs-vsctl that developers can
script and extend to provide distributed switch capabilities.
• Open vSwitch provides the virtual networking on the AHV nodes in a Nutanix
cluster.
Open vSwitch Architecture
OVS Basic Configuration
Bridge Commands
Port Commands
Port Commands (cont’d)
* Note: a port resides on the bridge; an interface is a network device attached to a port.
Interface Commands
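A sketch of the common read-only queries behind these command groups (run on an AHV host; the bridge name br0 is an assumption):
root@ahv# ovs-vsctl show              # overall bridge/port/interface layout
root@ahv# ovs-vsctl list-br           # bridge commands: list bridges
root@ahv# ovs-vsctl list-ports br0    # port commands: ports on a bridge
root@ahv# ovs-vsctl list-ifaces br0   # interface commands: interfaces attached to ports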
OVS on Nutanix
• Each hypervisor hosts an OVS instance, and all OVS instances combine to form a
single switch.
OVS on Nutanix
Related Log Files
• /var/log/messages
• /var/log/openvswitch/ovs-vswitchd.log
manage_ovs Script
nutanix@cvm:~$ manage_ovs --helpshort
The update_uplinks action requires the --interfaces flag, which indicates the
desired set of uplinks for the OVS bridge. The script will remove any existing
uplinks from the bridge, and replace them with the specified set of uplinks on
a single bonded port.
flags:
/usr/local/nutanix/cluster/bin/manage_ovs:
--bond_name: Bond name to use
(default: 'bond0')
--bridge_name: Openvswitch on which to operate
(default: '')
--[no]dry_run: Just print what would be done instead of doing it
(default: 'false')
--[no]enable_vlan_splinters: Enable vLAN splintering on uplink interfaces
(default: 'true')
--[no]force: Reconfigure the bridge even if the set of uplinks has not changed
(default: 'false')
-?,--[no]help: show this help
--[no]helpshort: show usage only for this module
--[no]helpxml: like --help, but generates XML output
--host: Host on which to operate
(default: '192.168.5.1')
--interfaces: Comma-delimited list of interfaces to configure as bridge uplinks, or a
keyword: all, 10g, 1g
--mtu: Maximum transmission unit
(an integer)
--num_arps: Number of gratuitous ARPs to send on the bridge interface after updating
uplinks
(default: '3')
(an integer)
--[no]prevent_network_loop: Enables network loop prevention when bridge chain is
enabled.
(default: 'false')
--[no]require_link: Require that at least one uplink has link status
(default: 'true')
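Based on the help text above, a hedged usage sketch (verify the action names against your AOS version, and note that update_uplinks replaces the bridge's entire uplink set):
nutanix@cvm$ manage_ovs --bridge_name br0 show_uplinks
nutanix@cvm$ manage_ovs --bridge_name br0 --bond_name bond0 --interfaces 10g --dry_run update_uplinks
nutanix@cvm$ manage_ovs --bridge_name br0 --bond_name bond0 --interfaces 10g update_uplinks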
Packet Analysis Tools
• Wireshark
• tcpdump
tcpdump
tcpdump – Starting a Capture
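A minimal capture sketch (the interface name and filters are examples):
root@ahv# tcpdump -i eth0 -c 100 port 53                        # show 100 DNS packets on eth0
root@ahv# tcpdump -i eth0 -w /tmp/capture.pcap host 10.30.15.47 # save to a file for Wireshark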
Labs
Thank You!
Module 7
Acropolis File Services
Troubleshooting
Nutanix Troubleshooting 5.x
1. Intro 7. AFS
2. Tools & NCC 8. ABS
3. Services & Logs 9. DR
4. Foundation 10. Upgrade
5. Hardware 11. Performance
6. Networking
Course Agenda
• This is the AFS module.
Objectives
• In this Module you will learn about Acropolis File Services. We will discuss
Acropolis File Services Features and requirements. We will discuss how to setup
and implement Acropolis File Services. We will also be covering general
troubleshooting guidelines and techniques for Acropolis File Services,
including log file review.
• We will have a quick discussion on Active Directory and its role and requirements
for Acropolis File Services. Acropolis File Services depends on Microsoft Active
Directory for user authentication to provide secure file access. We will discuss the
role DNS and NTP play in Active Directory and AFS file services.
• We will cover how to implement Acropolis File Services in the Prism interface.
We will talk about where to obtain the AFS software from the Nutanix Support
Portal at the following URL:
o http://portal.nutanix.com under Downloads – Acropolis File Services (AFS).
• Please refer to the documentation and release notes for new features in the release,
installation instructions, and resolved/known issues.
• Upload the AFS software to the Cluster in Prism and launch the AFS Setup
Wizard on the File Server page. Go through the Setup Wizard and create an
Instance of AFS on the cluster.
• During the AFS Setup we will be examining the log files on the CVMs to review the
Setup process and the detailed steps. Learn how to use these CVM log files when
encountering AFS Setup issues. The AFS log files contain errors and warnings as
to why Setup is failing. We will learn how to look for errors in the AFS logfiles to
assist in troubleshooting the setup of AFS.
• After Setup, these same logfiles on the CVMs, and another set of log files on the
AFS UVMs, can be used in troubleshooting AFS issues. These logfiles can assist
to identify problems and resolve File Services issues. The CVM and FSVM
logfiles will provide in-depth troubleshooting information to help identify and solve
issues with Acropolis File Services.
Acropolis File Services (AFS)
• The External network is for client access to the shares hosted on the FSVMs.
o The external network is also used by the FSVMs for Active Directory
authentication, NTP, and Name Server access.
• The Internal network provides the FSVMs with iSCSI Initiator access to the External
Data Services IP which leverages the high availability features of Acropolis Block
Services (ABS).
• AFS virtual machines use volume groups and disks for file server storage. A new
storage container is built named Nutanix_afsservername_ctr to store the disks in the
volume groups.
FSVM IP Address requirements:
• The External network requires one IP address per FSVM (N).
o The Internal network requires one IP address per FSVM plus one extra IP
address for internal communications (N+1).
• If there are 3 FSVMs (N=3), 3 external IP addresses and 4 internal IP addresses
(N+1) are required, 7 in total. The IP addresses do not have to be
contiguous. The internal and external network can be the same or different vLANs. If
the two networks are on different vLANs make sure that the proper vLAN tags
are configured.
Acropolis File Services Features & Requirements
File Shares
• Home Shares
o Default Permissions – DA = Full access, DU = Read only
• General Purpose Shares
o Default Permissions – DA = Full access, DU = Full access
• File Shares are folders that can be accessed over the network. In AFS there are
two types of shares:
o Home Shares and
o General Purpose Shares.
Home Shares
• Home shares are a repository for a user’s personal files. By default a home share
is created for each file server during setup. The share is distributed at the top level
directories within the home folder share.
• For example, when creating directories in the home share for each user (user1, user2,
user3, and so on), they are automatically distributed across the file server virtual
machines hosting the file server. The user1 directory would be hosted on FSVM1,
user2 on FSVM2, user3 on FSVM3, and so on. DFS client referrals built into Windows
machines will connect the user to the hosting file server virtual machine.
Home Share Default Permissions:
• Domain administrator: Full access
• Domain User: Read only
• Creator Owner: Full access (inherited only)
• For example, when creating a share on the file server named share1, by default it will
be a general purpose share and will be placed on only one FSVM. When the next
share is created share2, this share will be placed on another FSVM. All the folders
and files created within the share are stored on only one FSVM.
• Access to the shares is controlled by Share level Access Control Lists (ACLs) and
folder and file level Access Control Lists (ACLs). The Share level permissions are
basically roles with several permissions included. The Share level permissions
provide secure access to users and groups over the network. If the user does
not have share permissions, they will not be able to access the share. Share
permissions are Full Control, Read, or Change.
• In AFS, permissions can also be set at the folder and file level. This is referred to as
the local NTFS permissions. There is a more advanced set of Access Control List
(ACL) permissions at the folder or file level. It is recommended to use the folder/file
NTFS permissions to control access. To set or modify the local NTFS permissions,
use the Security tab for the properties of the folder/file or the cacls.exe Windows
command line tool.
• Both Share level and folder/file level permissions have to be satisfied for access. If
the permissions defined are different between the two, permissions at the Share level
and permissions at the File/folder level, then the most restrictive of the two apply.
• Let’s say that at the Share level you have full control, but at the File level you have
read-only.
• What is the most restrictive of the two:
o Share level permissions or File/folder level?
• In this case, the most restrictive of full control at the Share level and read-only at
the file level would be read-only.
• Access-Based Enumeration (ABE) is a feature introduced in Windows Server 2003.
Prior to Windows Server 2003, when creating a share of an existing folder, anyone
who had permissions to the Share, even if it was read-only at the top level, could
actually view all the files in the shared folder regardless of the local user’s NTFS
permissions. This would allow users to see files in the Share without having local
NTFS permissions.
• Access-based enumeration, when enabled, honors the local NTFS permissions from
the Shares perspective. If the user has access to the share, with ABE enabled, they
must also have local NTFS permissions to the files and folders in the share. The user
needs minimum Read permissions (local NTFS permissions) to see any folders/files
in the share. AFS supports enabling/disabling ABE at the share level.
Acropolis File Services Features & Requirements
• In the AOS 5.1 release, the schedules can now be configured for hourly, nightly,
weekly, and monthly with different retentions for each. Only one occurrence of each
schedule type is allowed per file server. The snapshot scheduling is for the whole file
server all shares. You cannot have different snapshot schedules at the Share level.
• The default schedule starting in AOS 5.1 is:
o 24 hourly
o 7 daily
o 4 weekly
o 3 monthly
• Acropolis File Services snapshots are scheduled from the AFS Leader using crontab.
To troubleshoot we can look at the crontab file on the FSVMs to check the schedule
that is stored in the file /var/spool/cron/root. Do not attempt to modify the files in the
cron directory. Use the crontab command to view.
• The following command afs get_leader will get the leader FSVM. SSH to the AFS
Leader FSVM and review the crontab schedules.
o Note: You can view the crontab schedules from any FSVM.
• The following are crontab commands to review the snapshot schedules on the
FSVM.
o sudo crontab -u root -l (u=user, l=list) – The schedules are listed for the root user.
Sample Output:
• 0 0 1 * * /usr/sbin/zfs-auto-snapshot --quiet --syslog --label=monthly --prefix=afs-auto-
snap --keep=3 --interval=1 //
• 0 * * * * /usr/sbin/zfs-auto-snapshot --quiet --syslog --label=hourly --prefix=afs-auto-
snap --keep=24 --interval=1 //
• 0 0 * * * /usr/sbin/zfs-auto-snapshot --quiet --syslog --label=daily --prefix=afs-auto-
snap --keep=7 --interval=1 //
• 0 0 * * Sun /usr/sbin/zfs-auto-snapshot --quiet --syslog --label=weekly --prefix=afs-
auto-snap --keep=4 --interval=1 //
• crontab schedules will exist on all FSVMs for the home folder default share because
that share is hosted on all FSVMs. So you will see crontab schedules on each FSVM
of the file server cluster. All other shares that are of type General Purpose will be
hosted on only one FSVM. To see the schedules for those shares, you must find the
owner FSVM of the particular share and then list the crontab schedules on that
FSVM.
• Once we know the owner of the share we can look in messages for these entries to
assist in troubleshooting snapshot scheduling and retention.
• zfs list -t snapshot – This command will list the current snapshots that exist for a
given file server instance.
Acropolis File Services Features & Requirements
Supported Configurations
• Active Directory Domain Function Level
• Windows Server Editions Supported
Windows Client Support
SMB Versions
• 2.0
• 2.1
System Limits
SMB 2.0
• The SMB 2 protocol has a number of performance improvements over the former
SMB 1 implementation, including the following:
SMB 2.1
• SMB 2.1 brings important performance enhancements to the protocol in Windows
Server 2008 R2 and Windows 7. These enhancements include the following:
• System Limits:
Active Directory
• Security Provider
• Required by AFS
• Join Domain – Must be a Domain Admin
Domains
• FQDN – learn.nutanix.local
• Netbios – Learn
Domain Controllers
• LDAP Server
• Authentication Servers – KDC
DNS Dependencies to Verify and Troubleshoot
• SRV Records
• nslookup
• _ldap._tcp.learn.nutanix.local
Verify the Correct DNS Server in /etc/resolv.conf
• The Active Directory Services run on Microsoft Windows Servers. On the Microsoft
Windows servers, there is a role named AD DS (Active Directory Domain Services). This role
can be installed using Server Manager in the Roles and Features. After the role is
installed, the server can be promoted to be a Domain Controller for an Active
Directory Domain.
• Microsoft Active Directory uses the concept of Domains in its Directory Services
implementation. A Microsoft Active Directory Domain is a collection of the following
objects:
o Users
o Groups
o Computers
o Shares
• The Domain is also a security boundary for these objects. Within the domain, there
is one set of administrators who have domain-wide permissions to administer any of
the objects within the respective domain. There is a Global Group named Domain
Admins. Any user account that is placed in the specified Domain Admins group has
full rights to all objects in the Domain. Rights include creating, deleting, or
modifying domain objects such as users, groups, and computer accounts. Domain
Admins can also join computers to the domain. You must be a Domain Admin to join
the AFS file server to the domain.
• Active Directory Domain Names use DNS names as the naming convention. The
domain name is established when the first Windows server is promoted to be the first
domain controller in a new domain. The domain name is set by the administrator.
Learn.nutanix.local is an example of a domain name. There will also be a place to set
the backwards-compatible Netbios Domain Name for older Windows systems. By
default, it will make the Netbios name the left-most domain name separated by a
period (.) in the domain name, but you can change to anything if desired. If the
domain name is learn.nutanix.com, then by default the Netbios domain name will be
learn.
• Domain admins also have the permission to join computers to the domain which
creates a computer account object in the domain. Once a computer is joined to the
Domain, the computer can use the Active Directory Domain and Domain Controllers
for user authentication.
• Domain Controllers are the Windows Servers hosting the Active Directory
Domain database and provide the authentication services to the computers and
users. The Windows workstations servers and AFS use DNS services to locate
Domain Controllers for joining and authenticating with the Domain Controllers.
• Active Directory Domains use DNS names so that the Domain can be hosted in a
DNS forward lookup zone. DNS forward lookup zones provide name-to-IP address
mapping lookups. Reverse lookup zones provide IP address to name mapping
lookups.
• The Domain Controllers provide several services to the Domain. These services
include LDAP for directory queries, Kerberos Security protocol for authentication, and
kpasswd for user password changes. In the DNS forward lookup zone, to publish
services in the domain, the domain controllers via the netlogon service will perform
Dynamic DNS updates to the forward lookup zone file for their “A” host record and
“SRV” service records.
• The SRV records tell what services the Domain Controller provides. These SRV
records are used by the Windows workstations servers and AFS to find the Domain
Controllers for authentication services. The SRV records are also used by the
computers and AFS when joining the domain.
• nslookup can be used to test and troubleshoot SRV record lookup. From any FSVM
you can run the nslookup command to verify that the SRV records exist in the DNS
forward lookup zone. Ping the “A” record to verify name resolution and connectivity.
• ssh to the FSVM using Putty and run the following commands to test DNS:
• nutanix@NTNX-16SM13150152-B-CVM:10.30.15.48:~$ nslookup
• > set type=srv
• > _ldap._tcp.learn.nutanix.local
• Server: 10.30.15.91
• Address: 10.30.15.91#53
• Service = 0 100 389 – 0 is the priority, 100 is the weight, and 389 is the LDAP default
port. dc01.learn.nutanix.local is the host providing the service.
• If DNS replies with answers, then the DNS settings are correct and the FSVM can
find the Domain Controller services in the forward lookup zone.
• If DNS does not reply, verify that the FSVM is configured and using the correct DNS
server. The FSVM stores the DNS settings in the /etc/resolv.conf file.
How to Implement AFS
File Server
• If the cluster is running AHV as the hypervisor, you will also find the +Network
Config button that you can use to set up the vLAN tags (AHV calls them a
Network).
How to Implement Acropolis File Services
• At the top of the dialogue box is the License Warning: The current license does not
allow the use of this feature.
• In Nutanix, even if the license has not been installed you can still set up the feature.
o You will just receive a warning.
o To be in compliance with your Nutanix software license agreement Acropolis File
Services requires a Pro License.
• The New File Server Pre-Check dialogue box has a link to the AFS requirements
Learn more.
• After the AFS binaries have been uploaded to the cluster and an external data
services IP address has been configured you can click Continue to set up the file
server. You must click on the EULA blue link before clicking Continue!
How to Implement Acropolis File Services
“Basics” Page
• File Server Name
• File Server Size (Minimum 1 TiB)
• Number of File Server VMs (Minimum 3)
• Number of vCPUs per File Server VM (Default 4)
• Memory per File Server VM (Default 12 GiB)
• The file server name has to be unique for the cluster.
• The File server name is the computername/hostname that will be used to access
the file server shares.
To connect to an actual share on the file server, a Universal Naming
Convention (UNC) path has to be specified.
o UNC Format:
• \\servername\sharename
• …where servername will be the File Server Name specified on the
Basics tab.
• The sharename is created at the time the share is created.
• The sharename has to be unique as well.
• Aliases can also be defined in DNS for the file server name.
• The file server size is how much space the file server can use for storage for the
files.
o The file server size is minimum 1 TiB (tebibyte) or greater.
• This setting can be increased later if more storage space is required.
• The number of file server VMs to set up initially to support the clients:
o A minimum of 3 is required to create a clustered instance of the file server.
• There should be no more than one FSVM per node per instance.
o Acropolis File Services scale-out architecture has the ability to add more FSVMs
for scalability as the load demands increase or if new nodes are added to the
existing cluster.
• The Acropolis Files Services Guide on the Support Portal under Software
Documentation has overall guidelines for memory and vCPU based on the
number of users.
How to Implement Acropolis File Services
Basics Page
• Note: If the DNS or NTP settings are misconfigured, the domain join task will fail
during initial setup.
How to Implement Acropolis File Services
“Summary” Page
• Protection Domain (Default: NTNX-fileservername)
• The Join AD Domain tab is where you input the domain name, username and
password of a user account with domain admin rights.
o Use the SamAccountName.
• To review again:
o What service does the file server use to find the Domain Controllers to join the
domain and perform user authentication for file share access?
• The Summary page summarizes the settings and also by default builds a Protection
Domain.
o The Protection Domain is named NTNX-fileservername.
• Protection Domains are a feature in AOS to set up replication of VMs to a remote
cluster in a remote site for disaster recovery.
How to Implement Acropolis File Services
Summary Page
Troubleshooting Scenario
• In the setup of the file server, let’s make some changes.
o Enter an incorrect address for the DNS servers.
o Now try to perform a domain join and see what happens.
• On the File Server tab, click the file server to highlight it and then, over to the right,
click the Update link. This will bring up the Update File Server dialogue box. Click
the radio button to the left of Network Configuration and then click Continue.
• The Update File Server Network dialogue box is displayed with the Client Network
page to change any external IP addresses, Netmask, Default Gateway or modify
existing DNS/NTP settings. On this page change the DNS Servers to incorrect IP
addresses.
• Click Next to proceed to the Storage Network page. Do not change any settings on
this page. Click Save to save your changes.
• Now we will try to join the domain with the correct domain name and credentials.
Troubleshooting Scenario (cont.)
• The DNS server settings point to a non-existent or errant DNS server. This fails the
setup. The file server virtual machines hosting the shares need the correct DNS
settings to contact the Domain Controllers. The file server virtual machines use DNS
to query for the SRV records of the Active Directory domain controllers for
domain join operations and user authentication for share access.
• Tasks in Prism are very detailed and a good starting point to look for errors and
troubleshoot AFS deployment issues.
Troubleshooting Scenario (cont.)
• The NTP server settings point to a non-existent or errant NTP server. This fails the
setup. The file server virtual machines hosting the shares need to be time-synched
with the Domain Controllers. The Kerberos authentication protocol requires the
FSVM’s clocks to be within five minutes of the Active Directory domain controllers
for successful authentication.
• Tasks in Prism are very detailed and a good starting point to look for errors and
troubleshoot AFS deployment issues.
Troubleshooting
• During the initial setup of the file server, several tasks are created to carry out the
deployment of an instance of Acropolis File Services on a Nutanix Cluster.
o FSVMs must be created on multiple nodes of the cluster.
• A minerva leader CVM (minerva is the old internal name for AFS) is elected among
all CVMs in the cluster.
o The Leader is responsible for scheduling and running these tasks to set up and
deploy the file server instance.
• The fileserver logfile named minerva_cvm.log on the minerva leader will contain the
information for each task being performed during setup.
o This logfile will contain information and errors to assist in troubleshooting AFS
setup issues.
Furthermore, the minerva logfiles on both the CVMs and NVMs should be
consulted for help with troubleshooting Acropolis File Services.
Run the following command on any CVM to find the minerva leader:
• SSH to one of the CVMs
o $minerva get_leader
• The file server log file on the leader CVM is where to look for setup issues.
o Location: /home/nutanix/data/logs
o Filename: minerva_cvm.log
There are a lot of prechecks built in during setup of the file server. Reviewing the
minerva_cvm.log file on the minerva leader will provide details for all the setup tasks
and pre-checks. One pre-check verifies the internal and external IP addresses, using
ping to see if they are already in use.
The minerva_cvm.log file will also provide insight into Active Directory domain join
task failures. Incorrect NTP settings will fail the authentication to join the domain;
incorrect DNS settings will prevent finding and using the Active Directory services; and
typing the wrong domain name, username, or password credentials will also fail the join.
If any of these settings are misconfigured the domain join task will fail, and the log
should indicate which type of failure.
Previously we looked in Prism for Setup issues. In the below example, here are the
correct settings that we can change to incorrect values to cause setup issues. We will
then examine different domain join errors in the minerva_cvm.log on the minerva leader.
Really an extension of the minerva_cvm.log on the CVMs, there is also a logfile named
minerva_nvm.log on each individual NVM with specific tasks from the minerva leader.
We will cover the NVM logfile in more detail later.
Troubleshooting
• Examining the minerva_cvm.log file on the leader can be very helpful in overall
troubleshooting, but you should review the logfile on all CVMs for any other errors.
There will be a lot of duplication from the leader logfile to the other CVM logfiles in
the cluster, but it does not hurt to check.
• After setup, the minerva_cvm.log file on the minerva leader will be the first place to
look for errors. Any of the following will create file server tasks:
o Updates to the File Server Configuration
o Join and Leave Domain Operations
o Share Creation
o Quota Policies
• The minerva_cvm.log on the minerva leader will provide verbose logging to help
troubleshoot file server tasks. The file server tasks will actually be implemented on
the FSVM virtual machines that make up the clustered file server. On the FSVMs,
there is a local logfile named minerva_nvm.log to examine for file server task errors
specific to that FSVM.
Troubleshooting
• Now that we have just covered the important logfiles to assist in troubleshooting on
the CVMs, there is another set of logfiles on the FSVMs. The logfiles on the local
FSVMs will help troubleshoot issues specific to that FSVM. The logfiles are in the
same location as on the CVMs.
• When troubleshooting the local FSVMs there are two logfiles that will provide the
most help.
o The minerva_nvm.log and minerva_ha.log logfiles will assist in most
troubleshooting cases when it comes to file server tasks, high availability, iscsi
connectivity to volume groups, and vDisks.
minerva_nvm.log
• File server tasks are created on the minerva leader then implemented on the FSVM
virtual machines. For file server tasks being run on local FSVMs, review the
minerva_nvm.log to assist in troubleshooting. Look for errors to troubleshoot issues.
Any updates to the configuration including network settings changes, share
creation/deletion, domain join and unjoin operations, and Quota Policies are logged.
minerva_ha.log
• The minerva_ha.log logfile will help to troubleshoot the overall functions on the
FSVMs.
• The FSVMs are configured in a cluster for HA (High Availability). This provides fault
tolerance to the shares in case of an FSVM virtual machine failure. The file server
virtual machines use HA.py to redirect storage access from the failed FSVM hosting
the share’s vDisks to an existing healthy FSVM virtual machine. iSCSI is the protocol
being used to access the storage (vDisks) on the node of the failed FSVM.
• Ha.py events will be logged on all FSVMs. They also will indicate whether HA is
enabled on the FSVM.
• FSVMs use iSCSI block storage access to vDisks in ADSF. The minerva_ha.log will
log any of these events as well. The FSVM uses the Linux iSCSI software initiator for
ADSF iSCSI access. This log file will contain all the iscsiadm commands to
troubleshoot the iSCSI software initiator in the FSVM.
Troubleshooting
• During initial deployment of Acropolis File Services, one of the requirements is to join
the File Server to the Active Directory Domain for authentication and access to the
File Server shares. The domain join process needs to be established before any
access will be provided to shares on the File Server. During Setup, if the domain join
process fails, the FSVMs will be up and running but no share access will exist.
• Here are some of the things to check for when troubleshooting domain join issues
with Acropolis Files Services:
• For the domain name make sure you are using the Fully Qualified Domain Name
and that it is the correct domain name for the Active Directory Domain. Do not use
the Netbios domain name.
• SSH into the FSVMs and verify DNS settings and verify Service Location record
lookup.
• cat /etc/resolv.conf to confirm DNS server settings.
• Use nslookup to check A records and SRV records.
• Check the DNS Server and verify correct “A” and “SRV” entries in the forward
lookup zone for the AD domain.
• Check the NTP settings on the File Server. If possible use the Domain Controllers for
your NTP servers.
• The joining user must be a member of the Domain Admins group. You can verify with
the MMC Active Directory Users and Computers snap-in.
• AOS 5.1 can be configured to allow a non-Domain Admin to join the File Server to
the Domain. There is a gflag that can be set on the CVMs to allow for this in AOS
5.1. The File Server can be deployed without joining the Domain. Through Prism or
from NCLI, the Domain Join task can be done later and the user does not require
Domain Admin rights. There are a few prerequisites.
• gflag has to be set on CVMs to allow for non-Domain Admin permissions to join the
domain.
• The computer account for the File Server Name has to be manually created in the
Active Directory Domain prior to the domain join task.
• The User who will perform the Domain Join task needs permission on the
computer account in Active Directory to reset password.
Troubleshooting
• In the Join Domain dialogue box click Show Advanced Settings.
• Organizational Units are containers within the Active Directory domain to group
security principals for delegating administration tasks. They are also used to apply
Group Policy Settings. When the file server joins the Active Directory domain a
computer account will be created in the domain. By default the account will be stored
in the computers system container. You can specify what organization unit to use to
store the computer account.
• Hint: You can also use Active Directory Users and Computers to move the
computer account later.
• The Overwrite the existing AFS machine account (if present) check box will allow
the Join Domain even if the computer account exists in Active Directory. Instead of
checking this option, you can manually delete the computer account in Active
Directory.
Troubleshooting
• Acropolis File Services relies on DNS for proper hostname resolution. Not only is
DNS required for Active Directory, it is also very important that the AFS file servers
register their IP addresses in DNS for client hostname resolution.
• On any FSVM, the following command returns the correct host-to-IP
address mappings that are required in DNS:
o afs get_dns_mappings
• Each hostname should point to the external IP address configured for that FSVM in
the file server cluster.
Troubleshooting
• Here is a list of possible failures (issues) that an administrator or end user may face
while accessing file services from SMB clients.
Authentication Failures:
• Authentication is the first thing that happens before any share on the file server
can be accessed. By default, file services are available to all domain users.
Trusted domain users can also access the services, depending on the trust configuration.
• If primary domain (the domain to which AFS is joined) user authentication fails,
there could be multiple reasons. First check the following:
o Check the domain join status.
o Check the share access using machine account credentials works fine.
o Check the clock skew with the communicating domain controller (DC).
o Check smbd and winbindd status.
• You can simply run scli health-check on any of the FSVMs to validate the above
things. Here is the sample output of this command:
o nutanix@NTNX-10-30-15-247-A-FSVM:~$ scli health-check
o Cluster Health Status:
o Domain Status: PASSED
o No home shares found!
o Share Access: PASSED
Logon Failures:
• There can be multiple reasons for logon failures. Here are a few common ones.
Authorization issues:
• Access Denied errors: This is a post-authentication (or session setup)
issue where access to the share root fails with access denied, or specific files or
folders prevent the logged-in user from accessing them.
• Insufficient share-level permissions: If the share-level permissions are not open
to everyone, check whether the specific user, or a group the user belongs to, has
permissions (at least read access).
• Share-level permissions can be viewed in the properties window of the share. There is
currently no tool to view them from the AFS cluster side; a cluster-side tool may come
in a future release.
• Insufficient NTACLs on the share root: The user or group must have access (at least
read) on the share to successfully mount it.
• NTACLs corresponding to share-level permissions can be viewed on a Windows client
from the mount point properties.
• NTACLs for the Home share root are stored only in InsightsDB. Currently there is no
command-line tool to dump the ACLs for the Home share root. For any other file
system object, you can use smbcacls to dump the ACLs.
Syntax and example:
• #sudo smbcacls //FSVM-external-IP/share "full path to share root or folder/file path" -Pr
o You can get the path from "sudo zfs list".
• Lack of DFS support on the client: Make sure the client is DFS-capable.
By default, all Windows clients (Windows 7 onward) are DFS-capable, but this
capability can be disabled through a Windows registry setting. Make sure it has not
been turned off. The relevant registry key is:
o HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Mup
• DNS resolver issues: If clients connect to the server using the hostname (such as
\\AFS-server.domain.com\share1), then the referral node information will also be in
host FQDN format, so the client must be able to resolve the referred host FQDN from
the DNS server it uses. If not, share access will fail. If the client doesn't have
access to the DNS server, it can access the file server share using the IP
address (such as \\10.1.2.3\share1). End users connecting by hostname should make
sure they can ping the AFS hostname first.
• Possible DNS scavenging: Currently AFS refreshes its DNS records every 5 days. If
scavenging is enabled on the DNS server with a period shorter than this interval, the
DNS server may have deleted the DNS mappings for AFS. In that case, re-register the
DNS IPs for the cluster.
• To get the required mappings, run the command afs get_dns_mappings on any of the
FSVM nodes in the cluster.
• Rename: If a top-level directory (TLD) is hosted on a remote node (a node other than
the one the client is connected to), the client will not be able to rename it.
• Recursive delete: Currently all TLDs are exported as reparse points, so when a
client tries to delete a TLD, it expects the folder to be empty. If it is not, the delete
fails with a directory not empty error. To delete a non-empty TLD, the user needs to
delete the data one level down first and then delete the TLD itself.
• ACL propagation from the share root: This is again due to the same DFS
limitations. When permissions are changed on a Home share root that has existing
TLDs, permissions made at the share root are not propagated to all TLDs; they are
only propagated to TLDs that are local to the node the client connected to.
• An MMC plugin is being developed to perform all these operations without these
issues. It is planned for a post-Asterix.1 release.
• Non-domain admin user restriction: Due to these limitations, non-domain admin
users do not have the ability to Rename TLDs, Delete TLDs, or Change permissions
on the Home share root by default.
• If a non-domain administrator manages the Home share and needs access to
perform the above operations, use the following commands to grant access:
• #sudo net sam addmem BUILTIN\\Administrators <member>
o Example: sudo net sam addmem BUILTIN\\Administrators DOMAIN_NB\\user1
• #sudo net sam listmem BUILTIN\\Administrators
o Verify that the newly-added user or group is shown in the listing.
Miscellaneous issues:
• Reaching the connection limit: Currently there is a limit on the maximum number of
SMB connections that can be supported. This limit scales with the FSVM
memory configuration, starting at a minimum of 250 connections per node (12 GB)
and increasing to 500 (16 GB), 1000 (24 GB), 1500 (32 GB), 2000 (40 GB), and
4000 (>60 GB).
• A message is logged in smbd.log when the maximum limit is reached:
o Check for the message Maximum number of processes exceeded.
• Unable to write due to space limitations: If the total used space or the quota is
exceeded, further write operations are prevented.
• Folder visibility: If a user complains that they are unable to view some files
or folders while other users can, it may be that Access-Based Enumeration (ABE) is
enabled on the share and the user, or a group the user belongs to, does not have
READ access (or has an explicit READ deny) on those file system objects.
• In those cases, check the ACLs on the parent objects and look for any
denied permissions.
• You can run the scli share-info command to see the ABE status on all the shares.
• Here is an example that shows the ABE status.
o Share Name: dept1
o Share Type: DEFAULT
o ABE Enabled: DISABLED
o NVM Host: NTNX-mnrvatst-164525-1
• No SMB1 client support by default: Support for SMB1 clients is currently
disabled. Any clients using the SMB1 protocol will be denied access to the
share. SMB1 can be enabled using scli.
• The scli command to enable SMB1 is:
o #scli smbcli global "min protocol" NT1
• Mac and Linux client limitations: In the Asterix release, Mac and Linux clients are
not officially qualified, but users can mount AFS shares from them. These
clients have some limitations when accessing Home shares.
• Mac client limitations:
• From Mac clients, remote Top Level Directories (TLDs) cannot be created on
Home shares; the client does not understand folder DFS referrals during the
creation of these folders. Mac clients can, however, enter and browse existing TLDs.
• Sometimes Mac clients do not display all TLDs on a Home share. This happens
when some TLDs share the same file ID: the Mac keeps only one TLD (among
those entries sharing the same file ID) and filters out the rest. This is being addressed
in the Asterix.1 release.
• Linux client limitations:
• Similar to the Mac, Linux SMB2 clients have issues following folder-level DFS
referrals. Creating, deleting, or browsing remote Top Level Directories is not
feasible, whereas there is no issue in accessing general purpose shares. Here is the
command to perform an SMB2 mount from Linux:
o #sudo mount -t cifs //<AFS-host FQDN>/<share-name> <local mount path> -o domain=<domain netbios>,user=<AD user name>,pass=<password>,vers=2.1
• Example: sudo mount -t cifs //osl-fileserver.corp.nutanix.com/minerva /mnt/afs_minerva -o domain=CORP,user=<AD user name>,pass=<password>,vers=2.1
Troubleshooting
Logs of Interest
• CVM Logs
• Location: /home/nutanix/data/logs/
o minerva_cvm.log
• FSVM Logs
• Location: /home/nutanix/data/logs
o minerva_ha.log
o minerva_nvm.log
• ncc log_collector fileserver_logs
Logs of Interest
Labs
Module 7
Acropolis File Services Troubleshooting
Labs
Thank You!
Module 8
Acropolis Block Services
Troubleshooting
Nutanix Troubleshooting 5.x
1. Intro 7. AFS
2. Tools & Utilities 8. ABS
3. Services & Logs 9. DR
4. Foundation 10. AOS Upgrade
5. Hardware 11. Performance
6. Networking
Course Agenda
• This is the ABS module.
Objectives
After completing this module, you will be able to:
• Describe the difference between SAN and NAS Protocols
• Explain the Acropolis Block Services Feature
• Define SAN Technology Terms
• Explain the iSCSI Protocol and how it works
• Name the Supported Client Operating Systems
• Describe how ABS Load Balancing and Path Resiliency Work
• Explain How to Implement ABS on Windows
• Explain How to Implement ABS on Linux
• List Logs of Interest
• Discuss Key Points
• Perform Labs
SAN and NAS
• SAN provides block storage over a network to hosts. Providing block storage over
the network requires a transport protocol to carry the data and a physical
network. For SAN access there are three protocols that can be used for
block access. We will take a quick look at the different SAN protocols.
• FCP, or Fibre Channel Protocol, was invented in the early 1990s as a protocol to run
over high-speed networks and carry SCSI-3 protocol packets from initiators (hosts) to
targets (storage). Fibre Channel Protocol is the truck, or vehicle, that carries the
SCSI-3 protocol over the network to the storage. Nutanix clusters do not support
FCP.
• Fibre Channel Protocol uses FCP host bus adapters (HBAs) and Fibre Channel
switches to provide the physical network for FCP. The FCP protocol is
also lossless: buffer-to-buffer credits and pause frames guarantee no packet loss.
• Fibre Channel over Ethernet (FCoE) encapsulates the FCP protocol in
Ethernet frames and uses Ethernet equipment as the network, allowing FCP to
run over Ethernet networks. Quality of Service and pause frames have to be
implemented on the Ethernet network to provide a reliable, lossless network. The Cisco
Nexus 5k and 9k are two Ethernet switch examples that support FCoE. Nutanix
clusters do not support FCoE.
• The iSCSI SAN protocol uses TCP, the transport protocol of TCP/IP, to carry SCSI-3
packets over an Ethernet network. Hosts can use iSCSI hardware HBAs or standard
network interface cards with an iSCSI software initiator installed on the host.
• iSCSI uses TCP/IP as the transport. Unlike Fibre Channel, the underlying Ethernet
network is not lossless: if the receiving host gets more packets than it can handle, it
starts dropping them. When the sender does not receive an acknowledgement
from the receiver, it resends the packets, which can add congestion to the
network. This is the main difference between iSCSI and FCP. Nutanix clusters
support the iSCSI protocol for block access.
• What is NAS, or Network Attached Storage? NAS is a file sharing protocol that allows
access to files over the network. There are two NAS protocols: CIFS (Common
Internet File System), developed by Microsoft, and NFS (Network File System),
developed originally by Sun Microsystems (not supported on Nutanix clusters).
• Both protocols, CIFS and NFS, use TCP/IP as the transport over the network. Today
Nutanix supports CIFS, or SMB (Server Message Block). Acropolis File Services is the
Nutanix feature that provides file share access for Windows clients.
Explain the Acropolis Block Services Feature
Acropolis Block Services
iSCSI SAN
• LUNs
External Data Services IP – Virtual Target
Volume Groups
• vDisks / Disks / LUNs
• LUN – Logical Unit Number
Use Cases Supported by ABS
• iSCSI for Microsoft Exchange Server
• Shared Storage for Windows Server Failover Clustering (WSFC)
• Bare Metal Hardware
• All initiators will use the External Data Services IP address for target discovery and
initial connectivity to the cluster for block services. ABS exposes a single IP address
to the cluster as the virtual target for iSCSI connectivity. This allows for seamless
node additions without disruption to client connectivity and no need for client
reconfigurations. ABS also provides automatic load balancing of
vDisks/disks/LUNs across CVMs in the cluster.
• The External Data Services IP acts as an iSCSI redirector and dynamically
maps the initiator to one of the CVMs' external IP addresses for vDisk/disk/LUN
(Logical Unit Number) access on that node. The initiator only needs to connect to the
single External Data Services IP address; redirection occurs behind the scenes with
ABS.
• ABS also provides intelligent failover in the event the CVM on the node where
the vDisk/disk/LUN is currently being accessed goes down. ABS redirects the
initiator connection to a surviving CVM for vDisk/disk/LUN access without
disruption. Redirection to surviving nodes can be immediate or involve a minimal
delay of up to 10 seconds.
• Acropolis Block Services uses the iSCSI SAN protocol to provide block access
over the network to Nutanix Clusters. iSCSI leverages and uses the TCP/IP
transport protocol over the network to carry the iSCSI traffic.
• Using ABS does not require the use of multipath software on the Initiator but is
compatible with existing clients using MPIO.
• Acropolis Block Services exposes ADSF storage using Volume Groups and
disks. You can think of a Volume Group as a grouping of vDisks/disks/LUNs
mapped to a particular host or hosts (if using Microsoft Windows Server Failover
Clustering), called LUN mapping/masking.
• The Volume Group also serves as an Access Control List. After the Volume Group
is created, the initiator has to be added to the Volume Group to gain access to any
disks. The initiator's IQN (iSCSI Qualified Name, for example
iqn.1991-05.com.microsoft:server01) or IP address must be added to the Volume
Group to allow host access. This is called LUN masking and controls which hosts see
which disks on the Nutanix cluster.
Explain the Acropolis Block Services Feature
• The Initiator is the host that is going to connect to the storage using a block
protocol. The only block protocol that is supported today on Nutanix Clusters is
iSCSI. The host can be a virtual machine or bare metal.
• Hardware iSCSI initiator HBAs (Host Bus Adapters) have the iSCSI protocol built
into the adapter. A hardware initiator HBA also has its own processor to
offload the iSCSI and TCP/IP cycles from the host. Several vendors manufacture
hardware iSCSI HBAs, including QLogic and Emulex. A hardware HBA offers the most
benefit when the host processor is too busy, since it offloads the iSCSI and TCP/IP
processing from the host processor to the HBA.
• A software iSCSI initiator uses a standard network interface card and adds the iSCSI
protocol intelligence through software. All modern operating systems have
iSCSI software initiators built in.
• The Target is the Nutanix Cluster. In ABS, the Target is actually a virtual IP
Address to the External Data Services IP. The External Data Services IP is used
for discovery and the initial connection point. ABS then performs iSCSI redirection
mapping to a particular CVM.
• In iSCSI SAN, before initiators can gain access to disks in Volume Groups, a
point-to-point connection needs to be established between the initiator and the target
(or, in the case of ABS, a virtual target). To discover the target, the iSCSI software
initiator is configured to discover ABS by the External Data Services IP. The
discovery happens over port 3260 and is then redirected to the external IP address of
an online CVM hosting the vDisk on port 3205. These ports have to be open to access
Nutanix clusters for ABS.
• Once the Initiator discovers the target by the IP address, the Initiator has to create a
session to the target and then a connection over TCP. In the case of ABS, the
session and connection will be to the single External Data Services IP.
• When troubleshooting initial target discovery and connectivity there are several things
to verify:
o Is the External Data Services IP address configured on the cluster?
o What is the IP address of the External Data Services IP and is that the address we
are using for discovery?
o Are the ports open between the initiator and target?
o Is the Target IP address reachable from the initiator?
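• A minimal sketch of these checks from a Linux initiator (the IP addresses are
examples; substitute your External Data Services IP and a CVM external IP):
o #ping -c 3 10.30.15.90
o #nc -v -w 2 10.30.15.90 -z 3260   (iSCSI discovery port)
o #nc -v -w 2 10.30.15.47 -z 3205   (redirected iSCSI port on a CVM)
• On any CVM, ncli cluster info shows the cluster configuration; look there for the
External Data Services IP address.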
Define the iSCSI Protocol and SAN Terminology
• Before ABS, multipath software had to be installed and configured on each host to
provide load balancing and path resiliency to the Nutanix cluster.
• Cluster changes such as node additions and removals required manual iSCSI
reconfiguration on the hosts.
• In AOS 4.7, the Acropolis Block Services feature was introduced to the Nutanix
Cluster. With ABS, there is a single External Data Services IP Address that
represents a virtual target for the entire cluster. Initiators only have to discover the
single External Data Services IP and ABS uses iSCSI redirection on the back end to
map Initiators to the CVMs. The iSCSI redirection is transparent to the host.
• ABS exposes a single virtual target IP address for clients. Node additions and
removals will not require a host reconfiguration. If new nodes are added to the
cluster, hosts still configure iSCSI to use the External Data Services IP address.
iSCSI redirection will see the new nodes and load balance new disks to the CVMs on
the new nodes.
• By default, any node in the cluster can provide Target services for block access.
Supported Client Operating Systems
Supported Clients
[Diagram: iSCSI login and redirection – VM A's iSCSI initiator sends a logon request to the Data Services IP and is redirected to log on to CVM 1, which hosts Disks 1-3 of Volume Group VG 1 as the active (preferred) path.]
[Diagram: Virtual targets – VG 1 is exposed as virtual targets VG 1 VTA, VTB, and VTC; each virtual target and its disk has its active (preferred) path on a different CVM (CVM 1, CVM 2, CVM 3).]
[Diagram: Failover – when the CVM hosting the preferred path is down, the initiator's logon request to the Data Services IP is redirected to a surviving CVM.]
[Diagram: Cluster view – initiators are redirected to a common CVM.]
ABS on Windows
How to Implement ABS on Windows
[Screenshots: creating a Volume Group with disks from the Prism Storage page, then discovering the resulting virtual target from the Windows iSCSI initiator. The discovered target IQN looks like the following:]
o iqn.2010-06.com.nutanix:windows-90e7bb28-755c-4e02-833b-b786d634fe6d-tgt0
• The highlighted portion was the iSCSI Target Name Prefix setting from the Volume
Group.
• Highlight the IQN of the target in the Discovered Targets window. If the target is
not showing up, check whether a Volume Group with disks in it was created for
the initiator and mapped properly. If not, do so before proceeding. Once the Volume
Group is created and has disks, click Refresh to rediscover the target.
• Highlight the Target and click Connect to log in and make the point-to-point iSCSI
connection to the virtual target.
• The Add this connection to the list of Favorite Targets checkbox should remain
checked. Checking this box makes the iSCSI bindings persistent: persistent
bindings automatically reconnect after computer restarts.
• The Advanced button allows selection of a specific IP address on the initiator to
use for the iSCSI connection. This is useful when the initiator is configured with more
than one IP address; the Advanced Settings dialogue box provides the ability to
manually select which initiator IP address to use for the iSCSI connection.
• When the settings are complete click OK. The status now changes from Inactive to
Connected.
• The Properties button will show the session and connection information. Target
portal group information is also in the properties.
• At this point the initiator is SAN-attached to the Nutanix cluster and has access
to the disks in the Volume Group.
ABS on Windows – Prepare the Disk
[Screenshot: Disk Management in the Computer Management MMC, with the tree pane on the left and the new disk/LUN shown in the leaf pane.]
ABS on Linux
How to Implement ABS on Linux
• Sample output:
o # lsblk --scsi
o NAME HCTL TYPE VENDOR MODEL REV TRAN
o sda 1:0:0:0 disk NUTANIX VDISK 0
o sdb 4:0:0:0 disk NUTANIX VDISK 0 iscsi
o sr0 0:0:0:0 rom QEMU QEMU DVD-ROM 2.3. ata
How to Implement ABS on Linux
• mkfs is the Linux utility that makes a filesystem on the newly partitioned disk.
o ext4 is a commonly used Linux filesystem.
• mkfs.ext4 /dev/<device_name><partition_number>
• For example, if the disk is sdb and the partition is number 1, the path to the
device is:
• /dev/sdb1
How to Implement ABS on Linux
• Use the mount command to mount the disk into the filesystem.
• Type the following command to mount the disk to the new folder:
o #mount /dev/sdb1 /mnt/iscsi
• To make this persistent across reboots, place an entry in the /etc/fstab file:
o #nano /etc/fstab
o /dev/sdb1 /mnt/iscsi ext4 defaults,_netdev 0 0
• _netdev flags that this is a network device, so the mount waits for networking to
come up before mounting.
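• A minimal end-to-end sketch (assuming the iSCSI disk appears as /dev/sdb; fdisk,
mkfs, and findmnt are standard Linux tools, not Nutanix-specific):
o #sudo fdisk /dev/sdb            (create partition /dev/sdb1)
o #sudo mkfs.ext4 /dev/sdb1
o #sudo mkdir -p /mnt/iscsi
o #sudo mount /dev/sdb1 /mnt/iscsi
o #findmnt /mnt/iscsi             (confirm the mount)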
Troubleshooting
Troubleshooting - Storage
• When troubleshooting Acropolis Block Services start with network connectivity.
o The Initiator relies on the network for access to the disk on the target.
• Any type of network issue can cause access problems, performance issues,
or disconnections to the disk.
• ACLI is the Acropolis Command Line which is accessible via ssh to any CVM in the
cluster.
• On the cluster, there are several ACLI commands to verify Initiator connectivity and
provide details about the iSCSI client.
o If the Initiator does not see the disks or experiences disconnects, verify that the
client is currently connected to the cluster. Run the following command to
retrieve a list of iSCSI clients connected to the cluster:
• If the command output does not list the client of interest, then check the iSCSI
software Initiator on the client and verify connectivity.
o Ask, “Did anything change—IP addresses, VLAN tags, network reconfiguration,
and so forth?”
• Check the Data Services IP address.
o The virtual Data Services IP address lives on one CVM in the cluster and
automatically fails over if the CVM hosting it goes offline.
o The IP address is assigned to the virtual network interface eth0:2. Use allssh and
grep for eth0:2 to locate the CVM hosting the Data Services IP address (see the
sketch below).
o ping the Data Services IP address to test network connectivity.
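• A minimal sketch of those two checks, run from any CVM (the IP address is an
example):
o nutanix@cvm$ allssh "ifconfig | grep -A1 eth0:2"
o nutanix@cvm$ ping -c 3 10.30.15.90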
Troubleshooting - Client
• Check and verify the network settings on the Client.
o Any network problems or misconfigurations will cause disk access problems.
• The iSCSI Software Initiator software installed on the Windows and Linux clients
can be helpful in diagnosing iSCSI network connectivity issues.
o Use the iSCSI utilities to confirm Discovery and Session connectivity.
• Try to ping the Target's IP address from the client to verify TCP/IP connectivity.
o Verify the IP addresses of both the Target and the Initiator.
• Are the right IP addresses being used?
• Are the network ports configured for the proper VLAN?
• Have any switch configuration changes been implemented?
- Review all IP settings such as IP address, Subnet Mask, Gateway, and so
forth.
[Screenshot: verifying connectivity on the Windows initiator – click to highlight the target and view its status.]
• iscsiadm -m discovery
• The node and session modes return target information and TCP connectivity
info.
• The iscsiadm -m node command shows the IP address, port, and IQN for the ABS
virtual target.
• The iscsiadm -m session command shows the actual session via TCP.
• The /var/lib/iscsi filesystem has several directories and files that show iSCSI
connectivity.
o The ifaces subfolder shows the iSCSI interface being used (default).
o The nodes subfolder shows target IQN and IP information.
o The send_targets subfolder has all the detailed iSCSI settings.
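• A minimal sketch of these iscsiadm modes on a Linux initiator (the discovery address
is an example External Data Services IP):
o #sudo iscsiadm -m discovery -t sendtargets -p 10.30.15.90:3260
o #sudo iscsiadm -m node
o #sudo iscsiadm -m session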
Troubleshooting – LUN Mapping
• Initiators can be mapped to the Volume Group by either their IQN or IP address.
Mapping the IQN or IP address acts as a whitelist or Access Control List (ACL) of
which clients have access to the Volume Group.
• If initiators are being mapped using the IQN, first obtain the IQN of the initiator.
On Windows hosts, use the iscsicli command-line tool or the GUI tool on the
Configuration tab. On Linux hosts, read the contents of the
/etc/iscsi/initiatorname.iscsi file.
• If the client is mapped using an IP address, use ipconfig on Windows and ifconfig
on Linux to obtain the client's IP addresses.
Troubleshooting – LUN Mapping
Get Volume Group VDISK Details
• The slide shows the acli vg.get output side by side with the vdisk_config_printer
output for one of the Volume Group's disks:
o $ vdisk_config_printer -nfs_file_name 99bff767-1970-4e37-b84f-afa585650f21
o vdisk_id: 124055
o vdisk_name: "NFS:1:0:302"
o vdisk_size: 10737418240
o iscsi_target_name: "iqn.2010-06.com.nutanix:windows-90e7bb28-755c-4e02-833b-b786d634fe6d"
o iscsi_lun: 0
o container_id: 8
o creation_time_usecs: 1496950570611638
o vdisk_creator_loc: 6
o vdisk_creator_loc: 279
o vdisk_creator_loc: 95130973
o nfs_file_name: "99bff767-1970-4e37-b84f-afa585650f21"
o iscsi_multipath_protocol: kMpio
o scsi_name_identifier: "naa.6506b8d08d6857207fbb11caa356ec4e"
o vdisk_uuid: "5c0538cd-1857-4083-bae4-103fecc57520"
o chain_id: "8a14a147-0fda-406b-8f83-60e015512ca0"
o last_modification_time_usecs: 1496950570623304
• The corresponding acli vg.get Windows output is reproduced in full below.
• From the list you can then get more details for a specific volume group with vg.get
vgname:
o $acli vg.get Windows
o Windows {
o annotation: ""
o attachment_list {
o client_uuid: "5ba3f873-7c07-4348-971f-746ea45be9fd"
o external_initiator_network_id: "10.30.15.93"
o target_params {
o num_virtual_targets: 32
o }
o }
o disk_list {
o container_id: 8
o container_uuid: "ef583ded-60c5-441d-ab60-a1a9271bf0b6"
o flash_mode: False
o index: 0
o vmdisk_size: 10737418240
o vmdisk_uuid: "99bff767-1970-4e37-b84f-afa585650f21"
o }
o flash_mode: False
o iscsi_target_name: "windows-90e7bb28-755c-4e02-833b-b786d634fe6d"
o logical_timestamp: 2
o name: "Windows"
o shared: True
o uuid: "90e7bb28-755c-4e02-833b-b786d634fe6d"
o version: "kSecond"
o }
• In the output, look for attachment_list. The attachment_list will show the
external_initiator_network_id as either the IQN or IP address of the client. Verify that
the external_initiator_network_id matches the client requiring access to the disks in
the Volume Group.
• From the acli vg.get vgname command we can get the vmdisk_uuid. Using the
vmdisk_uuid with the vdisk_config_printer command, you can look up the iSCSI
target name, LUN ID, vDisk size, and other details of the vDisk.
Troubleshooting – pithos_cli
• The main service on AOS that manages vDisk configuration is Pithos. There is a
command-line tool on the CVM named pithos_cli to look up the virtual targets for all
initiators or for a specific initiator. This can be helpful in troubleshooting ABS issues
to pinpoint where the vDisk resides and on which CVM. From there, the local Stargate
logfiles can be examined to assist in diagnosing the issues.
• To view virtual targets for all initiators, type the following command:
o pithos_cli --lookup iscsi_client_params
Troubleshooting – pithos_cli
• To view virtual targets for a specific initiator, add the initiator's identifier, for example:
o pithos_cli --lookup iscsi_client_params --iscsi_initiator_network_identifier=10.30.15.95
• ABS by default configures the client with 32 virtual targets distributed across the
nodes in the cluster. We can see this with the previous pithos_cli --lookup
iscsi_client_params command.
• The AOS 5.1 release now supports mixed clusters. Mixed clusters can have hybrid
and all flash nodes in the same cluster. A hybrid node has a mix of SSD and HDD
drives. All flash nodes are equipped with all SSD drives.
• Customers may want to use the all-flash nodes for ABS. Configuring a preferred SVM
instructs the client's disks to be placed on that node, bypassing load balancing of
the disks to different nodes.
• The pithos_cli command is also where you set the preferred SVM for disk
placement for that client. Type the following command to edit the
iscsi_client_params with a text editor on the CVM:
• $pithos_cli --lookup iscsi_client_params --iscsi_initiator_network_identifier=10.30.15.95 --edit --editor=vi (or nano)
• At the bottom of the file, before the last bracket (}), insert a new line and add the
following:
o preferred_svm_id: id
• Modifying the virtual targets from 32 to 6 now shows only 6 virtual targets for this
Initiator.
Logs of Interest – View iSCSI Adapter
Messages
• The External Data Services IP address allows for a single IP address for
discovering and connecting to the virtual targets. The Data Services IP address is
hosted on one CVM in the cluster at a time. The Data Services IP is hosted on
interface eth0:2.
• The allssh ifconfig | grep eth0:2 command will show which CVM is currently hosting
the IP address. If the CVM hosting the Data Services IP address goes offline, another
CVM will be elected to host the IP address.
• The actual iSCSI session and connection are hosted by one of the CVMs after
login and iSCSI redirection to the CVM hosting the vDisk. By default, the vDisks for a
Volume Group are load balanced across the CVMs in the cluster. If the CVM through
which the host is connected to the vDisk goes offline, ABS redirects the
host to another CVM for data access.
• All messages pertaining to iSCSI will be available in the Stargate logfiles. The main
logfile to review for troubleshooting issues will be the stargate.INFO log on each
CVM.
• To view all iSCSI adapter messages, use allssh and zgrep to search through each
CVM's Stargate logs. Below are a few samples:
Sample Output
• I0619 13:50:52.103484 23730 iscsi_server.cc:2010] Processing 1 Pithos vdisk
configs for vzone
• I0619 13:50:52.103548 23730 iscsi_server.cc:2022] Processing vdisk config update
for 179371 timestamp 1497905452100683 config vdisk_id: 179371 vdisk_name:
"NFS:4:0:260" vdisk_size: 10000000000 iscsi_target_name: "iqn.2010-
06.com.nutanix:linuxvg-a4df083e-babe-47de-9857-5596e1a0effb" iscsi_lun: 1
container_id: 152253 creation_time_usecs: 1497905452088823 vdisk_creator_loc: 6
vdisk_creator_loc: 279 vdisk_creator_loc: 109079661 nfs_file_name: "bf86cde1-
ad54-4b53-9d48-6edce4670cc1" iscsi_multipath_protocol: kMpio chain_id:
"\210\332\305\275\231%I\311\256o\327CS\225m\037" vdisk_uuid:
"c\363\023v\010\211K\026\202tlo\312U\355\372" scsi_name_identifier:
"naa.6506b8d799ce64bc385db2c66e90af50"
• I0619 13:50:52.103561 23730 iscsi_server.cc:2701] Adding state for vdisk 179371
from Pithos
• I0619 13:50:52.103637 23730 iscsi_logical_unit.cc:67] Creating state for
10000000000 byte VDisk disk as LUN 1, Target iqn.2010-06.com.nutanix:linuxvg-
a4df083e-babe-47de-9857-5596e1a0effb, VDisk 179371, NFS:4:0:260
• I0619 13:50:52.103642 23730 iscsi_target.cc:59] Added LUN 1 for VDisk
'NFS:4:0:260' to iqn.2010-06.com.nutanix:linuxvg-a4df083e-babe-47de-9857-
5596e1a0effb
• Using zgrep to search for Creating state provides details on LUN creation.
The log shows the vDisk creation, the LUN ID (unique for each LUN in the
Volume Group), the size, and the IQN of the target.
• The following command searches the Stargate logfiles for iSCSI LUN creation
events:
o allssh "zgrep 'iscsi_logical_unit.*Creating state' /home/nutanix/data/logs/stargate*log*INFO*"
• On the initiator you can examine each LUN for the -tgtX value per device.
Logs of Interest – iSCSI Redirection Messages
• iSCSI redirection occurs during login to the Data Services IP address. iSCSI
redirection maps the LUN to a CVM in the cluster, using the external IP
address of the CVM and a non-standard iSCSI port (3205) for the connection.
• The iSCSI redirection will also occur whenever the CVM hosting iSCSI
connections goes offline. When the CVM goes offline this will cause a logout
event on the Initiator. The Initiator will automatically attempt to log in using the Data
Services IP address. iSCSI redirection will perform a logout and a login to an
existing CVM in the cluster.
• All of the iSCSI redirection happens on the CVM hosting the data services IP address
eth0:2.
• In the example above you can see this iSCSI redirection occurring. Performing:
o $allssh "zgrep 'iscsi_login.*redirect' /home/nutanix/data/logs/stargate*log*INFO*"
• …will also reveal which CVM is hosting the Data Services IP address. The
stargate*log*INFO* logfile on that CVM has the iSCSI redirection messages.
Logs of Interest – Virtual Target Session Messages
Find virtual target sessions:
$allssh "zgrep 'iscsi_server.*virtual target' /home/nutanix/data/logs/stargate*log*INFO*"
Sample Output:
/home/nutanix/data/logs/stargate.NTNX-16SM13150152-C-CVM.nutanix.log.INFO.20170613-162520.23713:I0615
05:57:17.110442 23727 iscsi_server.cc:1238] Adding virtual target iqn.2010-06.com.nutanix:windows-
90e7bb28-755c-4e02-833b-b786d634fe6d-tgt0 to session 0x4000013700000001 from base target iqn.2010-
06.com.nutanix:windows-90e7bb28-755c-4e02-833b-b786d634fe6d with 32 virtual targets
/home/nutanix/data/logs/stargate.NTNX-16SM13150152-C-CVM.nutanix.log.INFO.20170613-162520.23713:I0615
05:57:17.125102 23730 iscsi_server.cc:1238] Adding virtual target iqn.2010-06.com.nutanix:windows-
90e7bb28-755c-4e02-833b-b786d634fe6d-tgt0 to session 0x4000013700000001 from base target iqn.2010-
06.com.nutanix:windows-90e7bb28-755c-4e02-833b-b786d634fe6d with 32 virtual targets
/home/nutanix/data/logs/stargate.NTNX-16SM13150152-C-CVM.nutanix.log.INFO.20170613-162520.23713:I0615
05:57:17.139722 23728 iscsi_server.cc:1238] Adding virtual target iqn.2010-06.com.nutanix:windows-
90e7bb28-755c-4e02-833b-b786d634fe6d-tgt0 to session 0x4000013700000001 from base target iqn.2010-
06.com.nutanix:windows-90e7bb28-755c-4e02-833b-b786d634fe6d with 32 virtual targets
/home/nutanix/data/logs/stargate.NTNX-16SM13150152-C-CVM.nutanix.log.INFO.20170613-162520.23713:I0615
05:57:17.154194 23730 iscsi_server.cc:1238] Adding virtual target iqn.2010-06.com.nutanix:windows-
90e7bb28-755c-4e02-833b-b786d634fe6d-tgt0 to session 0x4000013700000001 from base target iqn.2010-
06.com.nutanix:windows-90e7bb28-755c-4e02-833b-b786d634fe6d with 32 virtual targets
• Review these logfile messages for errors occurring during the session establishment.
Logs of Interest – Session Login & Logout Messages
[Screenshot: log excerpt annotated with the preferred SVM, the IP address of the initiator, and the session creation.]
• Session information can also be viewed from the initiator. On Linux use the iscsiadm
utility; on Windows use the iSCSI management GUI, PowerShell, or iscsicli.
Logs of Interest – LUN Event Messages
Logical Unit (LUN) add, disable, enable, and replace events:
$allssh "zgrep 'scsi.*Processing vdisk config update' /home/nutanix/data/logs/stargate*log*INFO*"
Sample Output:
/home/nutanix/data/logs/stargate.NTNX-16SM13150152-C-CVM.nutanix.log.INFO.20170613-162520.23713:I0619 13:50:52.103637 23730
iscsi_logical_unit.cc:67] Creating state for 10000000000 byte VDisk disk as LUN 1, Target iqn.2010-06.com.nutanix:linuxvg-a4df083e-babe-47de-
9857-5596e1a0effb, VDisk 179371, NFS:4:0:260
/home/nutanix/data/logs/stargate.NTNX-16SM13150152-C-CVM.nutanix.log.INFO.20170613-162520.23713:I0619 13:50:52.103642 23730
iscsi_target.cc:59] Added LUN 1 for VDisk 'NFS:4:0:260' to iqn.2010-06.com.nutanix:linuxvg-a4df083e-babe-47de-9857-5596e1a0effb
/home/nutanix/data/logs/stargate.NTNX-16SM13150152-C-CVM.nutanix.log.INFO.20170613-162520.23713:I0619 13:59:25.283494 23730
iscsi_logical_unit.cc:67] Creating state for 5000000000 byte VDisk disk as LUN 2, Target iqn.2010 -06.com.nutanix:linuxvg-a4df083e-babe-47de-
9857-5596e1a0effb, VDisk 179588, NFS:4:0:261
/home/nutanix/data/logs/stargate.NTNX-16SM13150152-C-CVM.nutanix.log.INFO.20170613-162520.23713:I0619 13:59:25.283499 23730
iscsi_target.cc:59] Added LUN 2 for VDisk 'NFS:4:0:261' to iqn.2010-06.com.nutanix:linuxvg-a4df083e-babe-47de-9857-5596e1a0effb
/home/nutanix/data/logs/stargate.NTNX-16SM13150152-C-CVM.nutanix.log.INFO.20170613-162520.23713:I0619 15:19:30.812345 23727
iscsi_logical_unit.cc:898] Disabling LUN 2, Target iqn.2010-06.com.nutanix:linuxvg-a4df083e-babe-47de-9857-5596e1a0effb, VDisk 179588,
NFS:4:0:261
/home/nutanix/data/logs/stargate.NTNX-16SM13150152-C-CVM.nutanix.log.INFO.20170613-162520.23713:I0619 15:19:30.812364 23727
iscsi_target.cc:74] Removed LUN 2 from iqn.2010-06.com.nutanix:linuxvg-a4df083e-babe-47de-9857-5596e1a0effb
/home/nutanix/data/logs/stargate.NTNX-16SM13150152-C-CVM.nutanix.log.INFO.20170613-162520.23713:I0619 15:44:35.431586 23729
iscsi_logical_unit.cc:67] Creating state for 5000000000 byte VDisk disk as LUN 2, Target iqn.2010 -06.com.nutanix:linuxvg-a4df083e-babe-47de-
9857-5596e1a0effb, VDisk 179677, NFS:4:0:262
/home/nutanix/data/logs/stargate.NTNX-16SM13150152-C-CVM.nutanix.log.INFO.20170613-162520.23713:I0619 15:44:35.431592 23729
iscsi_target.cc:59] Added LUN 2 for VDisk 'NFS:4:0:262' to iqn.2010-06.com.nutanix:linuxvg-a4df083e-babe-47de-9857-5596e1a0effb
Module 8
Acropolis Block Services Troubleshooting
Labs
Thank You!
Module 9
DR
Nutanix Troubleshooting 5.x
Module 9 DR
Course Agenda
1. Intro 7. AFS
2. Tools & Utilities 8. ABS
3. Services & Logs 9. DR
4. Foundation 10. AOS Upgrade
5. Hardware 11. Performance
6. Networking
Course Agenda
• This is the DR module.
Data Protection
Data Protection
Data Protection Overview
Protection Domains
Data Protection Concepts
Metro Availability
Async DR
Async DR (cont'd)
• Clicking the Async DR tab displays information about Protection Domains configured
for asynchronous data replication in the cluster. This type of Protection Domain
consists of a defined group of virtual machines to be backed up (snapshots) locally
on a cluster and optionally replicated to one or more remote sites.
o The table at the top of the screen displays information about all the configured
Protection Domains, and the details column (lower left) displays additional
information when a Protection Domain is selected in the table. The following table
describes the fields in the Protection Domain table and detail column.
o When a Protection Domain is
selected, Summary: protection_domain_name appears below the table, and
action links relevant to that Protection Domain appear on the right of this line. The
actions vary depending on the state of the Protection Domain and can include one
or more of the following:
• Click the Take Snapshot link to create a snapshot (point-in-time backup) of this
Protection Domain.
• Click the Migrate link to migrate this protection domain.
• Click the Update link to update the settings for this protection domain.
• Click the Delete link to delete this protection domain configuration.
o Eight tabs appear that display information about the selected Protection Domain:
• Replications, VMs, Schedules, Local Snapshots, Remote
Snapshots, Metrics, Alerts, Events.
• Select a consistency group for the checked VMs. Click the Use VM Name button to
create a new consistency group for each checked VM with the same name as that
VM.
• Click the Use an existing CG button and select a consistency group from the
drop-down list to add the checked VMs to that consistency group, or click the Create
a new CG button and enter a name in the field to create a new consistency group
with that name for all the checked VMs.
• Click the Protect Selected Entities button to move the selected VMs into the
Protected VMs column.
• Click Next at the bottom of the screen when all desired consistency groups have
been created.
Async DR Configuration – Schedule
• Enter the number of snapshots to save locally in the Local line's keep the last xx
snapshots field. The default is 1.
o A separate line appears for each configured remote site. To replicate to a remote
site, check the remote site box and then enter the number of snapshots to save on
that site in the appropriate field. Only previously configured remote sites appear in
this list.
• The number of snapshots that are saved is equal to the value that you have entered
in the keep the last ## snapshots field + 1. For example, if you have entered the
value keep the last ## snapshots field as 20, a total of 21 snapshots are saved.
When the next (22nd) snapshot is taken, the oldest snapshot is deleted and replaced
by the new snapshot.
• When all of the field entries are complete, click Create Schedule.
Async Protection Domain Failover - Planned
• A Protection Domain can be migrated to a remote site as part
of planned system maintenance.
• This action is performed using the Migrate option from the
Async DR tab of the Data Protection pane.
Troubleshooting
Cerebro Service Master
Cerebro Diagnostics Page
snapshot_tree_printer
Connection Errors
• Ensure that the primary and remote sites are reachable via
ping.
• Verify that ports 2020 and 2009 are open between the source
and destination. The following loop can be used to verify port
connectivity when run from the primary site:
• for i in `svmips`; do (echo "From the CVM $i:"; ssh $i 'for i in `source /etc/profile;ncli rs ls |grep Remote|cut -d : -f2|sed 's/,//g'`; do echo Checking stargate and cerebro port connection to $i ...; nc -v -w 2 $i -z 2009; nc -v -w 2 $i -z 2020; done');done
Connection Errors
cerebro.INFO
• I0719 09:14:24.648830 8133 fetch_remote_cluster_id_op.cc:672] Fetched
cluster_id=47785 cluster_incarnation_id=1499975040912401
cluster_uuid=00055438-277c-4c11-0000-00000000baa9 for remote=remote01 with
dedup support 1 with vstore support 1 cluster operation mode 0
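• To spot these remote-site entries across the cluster, a simple search works (standard
grep; the pattern matches the log line above):
o nutanix@cvm$ allssh "grep 'Fetched cluster_id' /home/nutanix/data/logs/cerebro.INFO"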
Labs
Labs
Module 9 DR
Lab Configuration Protection Domains
Labs
• Module 9 DR
• Lab Configuration Protection Domains
Thank You!
Module 10
AOS Upgrade Troubleshooting
Nutanix Troubleshooting 5.x
1. Intro 7. AFS
2. Tools & Utilities 8. ABS
3. Services & Logs 9. DR
4. Foundation 10. AOS Upgrade
5. Hardware 11. Performance
6. Networking
Course Agenda
• This is the AOS Upgrade module.
Objectives
After completing this module, you will be able to:
• Review AOS Upgrade Prerequisites.
• Review all AOS Release Notes and Security Advisories.
• Understand Acropolis Recommended Upgrade Paths.
• Run NCC Before Performing AOS Upgrades.
• Prepare to Upgrade AOS.
• Perform an AOS Upgrade using One-Click in Prism.
• Troubleshoot AOS Upgrade Issues.
• Identify AOS Upgrade Logs of Interest.
• List Add Node to Cluster Considerations.
Before Upgrading AOS
• In the Prism Element interface there is a One-Click option to upgrade the AOS version
on the Nutanix cluster. Acropolis performs a rolling upgrade with no downtime for the
virtual machines.
• Click the Gear icon in Prism Element, then click Upgrade Software to open the
Upgrade Software dialogue box. The Upgrade Software dialogue box has several
tabs to perform different one-click upgrades for various software components in the
Nutanix cluster.
• The upgrades for software components include Acropolis, File Server, Hypervisor,
Firmware, Foundation, and Container.
• The Acropolis tab is where you perform the AOS rolling upgrades. On the Acropolis
tab in the Upgrade Software dialogue box you can check Enable Automatic Download.
• When Enable Automatic Download is checked, the cluster periodically checks
for new versions and automatically downloads the software package to the cluster
when available.
• The Cluster will connect to the Nutanix download server over the internet to check for
new versions. Internet connectivity and properly configured DNS settings are
required for the Enable Automatic Downloads to work successfully.
• For customers that cannot provide internet access to their Nutanix clusters (for
example, a dark site), the binaries and the corresponding .json file can be downloaded
manually from the Downloads link on the Nutanix Support Portal to a local machine
first. The Upgrade Software dialogue box has a link to upload the binaries manually:
click the link, browse the hard drive for the software and .json file, and choose
Upload now.
• Due to the increased number of services introduced in 5.x, more memory is needed
by the common core and Stargate processes. During the upgrade to release 5.1, a
feature was introduced to increase the memory allocated to the CVMs on nodes to
address the higher memory requirements of Acropolis.
• When upgrading to AOS 5.1, CVM memory is increased on all nodes in the cluster
with 64 GB of memory or greater. On any nodes in the cluster that have less than
64 GB, no memory increase occurs on the CVM.
• On nodes identified as candidates for the CVM memory increase (nodes having 64 GB
of RAM and less than 32 GB allocated to the CVM), CVM memory is increased by 4 GB.
Upgrade Prism Central First
• You can now upgrade AOS on some or all of the clusters registered to and managed
by Prism Central.
o The upgrade procedure, known as 1-click Centralized Upgrade, enables you to
upgrade each managed cluster to a specific version compatible with Prism
Central.
• Acropolis Upgrade support starts in Prism Central 5.1.
Review All AOS Release Notes & Security Advisories
Support Portal – Software Documents Page – AOS Version – Release Notes
AOS version numbers follow a W.X.Y.Z scheme (example: 5.1.0.3):
• W = Major (5); major and minor releases are typically made available every 3 to 6 months.
• X = Minor (1)
• Y = Maintenance (0); maintenance releases for the latest major or minor release are typically made available every 6 to 12 weeks.
• Z = Patch (3); patch releases are made available on an as-needed basis.
Actions
• Make sure that the latest version of NCC is installed on the cluster. The Upgrade
Software dialogue box has a one-click option for upgrading the NCC software. You can
manually upload the new NCC installer file to the cluster, or, if Enable Automatic
Download is checked, the cluster will check for new versions periodically.
• NCC can also be upgraded to the latest version manually. The NCC installer file will
be either a single installer file (ncc_installer_filename.sh) or an installer file in a
compressed tar file.
• The NCC single file or tar file needs to be copied to a directory on one of the
Controller VMs. The directory where the file is copied must exist on all CVMs in the
cluster.
• Using SCP or WinSCP, copy the file to the /home/nutanix directory on any CVM in
the cluster. Use the following command to check the MD5 value of the file after
copying it to the cluster:
nutanix@cvm$ md5sum ./ncc_installer_filename.sh
• If this does not match the MD5 value published on the Support Portal, the file
is corrupted. Try downloading the file again from the Support Portal.
• For a single NCC installer file perform the following steps to upgrade the NCC
software for the entire cluster:
• On the CVM where the single NCC installer file was copied run the following
commands:
nutanix@cvm$ ./ncc_installer_filename.sh
• If the Installer file is packaged in a tar file then run the following commands:
• This command copies the tar file to each CVM in the cluster and performs the
upgrade.
• To troubleshoot the upgrade there are log files depending on the version in the
following two locations:
• /home/nutanix/data/logs/
o OR
• /home/nutanix/data/serviceability/ncc
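• To confirm the installed version after the upgrade, ncc --version can be run from any
CVM (standard NCC invocation; the version banner at the top of any NCC run shows
the same information).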
Run NCC Before Performing AOS Upgrades
Command Line Output Snippet (annotations on the slide call out the NCC version, the
AOS version, and the hypervisor version):
nutanix@NTNX-16SM13150152-A-CVM:10.30.15.47:~$ ncc health_checks run_all
ncc_version: 3.0.2-a83acc39
cluster id: 47802
cluster name: Eric
node with service vm id 4
service vm external ip: 10.30.15.47
hypervisor address list: [u'10.30.15.44']
hypervisor version: 3.10.0-229.26.2.el6.nutanix.20160217.2.x86_64
ipmi address list: [u'10.30.15.41']
software version: euphrates-5.1.0.1-stable
software changeset ID: 419aa3a83df5548924198f85398deb20e8b615fe
node serial: ZM163S033945
rackable unit: NX-1065S
node with service vm id 6
service vm external ip: 10.30.15.49
hypervisor address list: [u'10.30.15.46']
hypervisor version: 3.10.0-229.26.2.el6.nutanix.20160217.2.x86_64
ipmi address list: [u'10.30.15.43']
software version: euphrates-5.1.0.1-stable
software changeset ID: 419aa3a83df5548924198f85398deb20e8b615fe
node serial: ZM163S033719
rackable unit: NX-1065S
Running : health_checks run_all
[================================== ] 98%
$cluster enable_auto_install
• If the cluster has any Protection Domains set up for disaster recovery, replication is
not allowed during the AOS upgrade. Use the following commands to verify
Protection Domains on the cluster and any outstanding replication updates
processing:
• This command will list all the Protection Domains for the cluster. The Protection
Domain name will be used in the following command to check the replication status:
• Be sure to check the replication status for all Protection Domains. If any replication
operations are in progress, output similar to the following is displayed:
ID : 12983253
Protection Domain : pd02
Replication Operation : Receiving
Start Time : 09/20/2013 17:12:56 PDT
Remote Site : rs01
Snapshot Id : rs01:54836727
Aborted : false
Paused : false
Bytes Completed : 63.15 GB (67,811,689,418 bytes)
Complete Percent : 84.88122
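• The specific commands are not shown on the slide; a likely sketch using the standard
ncli protection-domain (pd) namespace follows (verify the exact syntax with ncli pd
help on your AOS version):
o nutanix@cvm$ ncli pd ls
o nutanix@cvm$ ncli pd ls-repl-status name=pd02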
• List the Protection Domain schedules, delete them for the duration of the upgrade,
and then re-create them after the upgrade is complete.
• The CVMs are upgraded in parallel; then, serially, each CVM is rebooted (and its
memory increased if upgrading to 5.1). During the reboots, alerts will fire on the
cluster indicating a down CVM. You can disable the alerts during the upgrade and
then re-enable them after completion.
Upgrading AOS
Perform an AOS Upgrade Using One-Click in Prism Element
• Once the software binaries are uploaded to the cluster the Upgrade Software
dialogue box shows the Upgrade button. When you click Upgrade it gives you two
options:
o Pre-upgrade or
o Upgrade Now.
• The Pre-upgrade option runs through the pre-checks but does not actually perform the
upgrade. The pre-upgrade can reveal any issues in the cluster that would prevent the
upgrade from running successfully.
• The Upgrade Now option also performs the pre-checks; if there are no issues,
the upgrade proceeds.
• If the upgrade pre-checks fail, you can review the following logfile to assist in
troubleshooting:
/home/nutanix/data/logs/preupgrade.out
• This file is placed on one of the CVMs in the cluster (not necessarily the same one
each time). To find which CVM has the log file, run a command such as the one
sketched below, then SSH to that CVM and review the contents for errors that are
causing the upgrade pre-checks to fail.
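• A minimal sketch of locating the file (allssh is the standard CVM helper; the ls
usage is ordinary Linux):
o nutanix@cvm$ allssh "ls -l /home/nutanix/data/logs/preupgrade.out"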
Perform an AOS Upgrade Using One-Click in Prism Element
$cluster disable_auto_install
• Once the Genesis service on each node restarts, a hidden marker file named
.node_disable_auto_upgrade is created in the /home/nutanix directory.
• You can verify the file exists on each node (see the sketch below).
• Proceed to the CVM that you want to upgrade. Remove the hidden marker file on
that node and restart Genesis for the upgrade to proceed on that CVM. Once it
completes, go to the next CVM and repeat the process until all the nodes are
upgraded.
• After you delete the file, you must restart Genesis on that node to start the upgrade.
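• A minimal sketch (the marker file name comes from the text above; genesis restart is
the standard command to restart Genesis on a single CVM):
o nutanix@cvm$ allssh "ls -la /home/nutanix/.node_disable_auto_upgrade"
o nutanix@cvm$ rm /home/nutanix/.node_disable_auto_upgrade
o nutanix@cvm$ genesis restart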
SNMP Considerations
Enable Alerts
• The MIB can be downloaded from the Prism web console under the Gear icon —>
SNMP link.
• You will also have to re-create all the Protection Domain asynchronous
schedules.
Preupgrade.out Logfile
Install.out Logfile
• The example in the slide shows the steps performed to upgrade AOS. A task is
created for each of the following:
• Pre-upgrade steps succeeded or failed. If the task has failed, you can hover the
mouse over the task to get a tooltip with the actual error. For more details, review
the preupgrade.out logfile.
• Upgrading Acropolis task. Click the Details blue link to see the tasks for each
CVM. The CVM details show the steps processed to upgrade the CVM.
Review the install.out logfile for more advanced troubleshooting.
Troubleshoot AOS Upgrade Issues
Sample Output:
nutanix@NTNX-16SM13150152-B-CVM:10.30.15.48:~/data/logs$ upgrade_status
2017-06-28 14:49:44 INFO zookeeper_session.py:76 Using host_port_list:
zk1:9876,zk2:9876,zk3:9876
2017-06-28 14:49:44 INFO upgrade_status:38 Target release version: el6-release-euphrates-
5.1.0.1-stable-419aa3a83df5548924198f85398deb20e8b615fe
2017-06-28 14:49:44 INFO upgrade_status:43 Cluster upgrade method is set to: automatic rolling
upgrade
2017-06-28 14:49:44 INFO upgrade_status:96 SVM 10.30.15.47 still needs to be upgraded.
Installed release version: el6-release-danube-4.7.5.2-stable-
7c83bbaf29e9b603f3a0825bee65f568d79603b9
2017-06-28 14:49:44 INFO upgrade_status:96 SVM 10.30.15.48 still needs to be upgraded.
Installed release version: el6-release-danube-4.7.5.2-stable-
7c83bbaf29e9b603f3a0825bee65f568d79603b9, node is currently upgrading
2017-06-28 14:49:44 INFO upgrade_status:96 SVM 10.30.15.49 still needs to be upgraded.
Installed release version: el6-release-danube-4.7.5.2-stable-
7c83bbaf29e9b603f3a0825bee65f568d79603b9
Log on to the hypervisor host with SSH or the IPMI remote console.
Confirm that package installation has completed and that genesis is not running.
nutanix@cvm$ ps afx | grep genesis
nutanix@cvm$ ps afx | grep rc.nutanix
nutanix@cvm$ ps afx | grep rc.local
CVM Logfiles:
All Genesis details pre-, during, and post-upgrade:
nutanix@cvm$ /home/nutanix/data/logs/genesis.out
Details about svm_upgrade, the parallel part of the upgrade that runs on each CVM
simultaneously, where the new AOS bits are applied to the alternate boot partition:
nutanix@cvm$ /home/nutanix/data/logs/install.out
Details about the serial part of the upgrade that runs on one node at a time; the reboot
and memory increase happen here:
nutanix@cvm$ /home/nutanix/data/logs/finish.out
• Before the AOS upgrade begins, several upgrade pre-checks run. The
upgrade pre-checks make sure, among other things, that the upgrade version is
compatible with the current version running on the cluster.
• If the pre-checks fail, use the preupgrade.out logfile to investigate and resolve the
issues. If the pre-checks pass, the upgrade begins.
• The AOS tar file uploaded to the cluster is copied to each CVM. In parallel the
upgrade runs on all the CVMs in the cluster. The two logfiles that can be used to
troubleshoot the upgrade are genesis.out and install.out. install.out will have the
information needed to troubleshoot the actual upgrade on the CVM.
• genesis.out on the master will contain details about the task and upgrade being
scheduled on the cluster. Look in genesis.out if the upgrades are not starting on the
CVMs.
• All CVMs are upgraded in parallel. Once the CVMs are upgraded they need to be
rebooted and memory increased if the upgrade is from any previous AOS release
to any 5.1 releases.
• The reboot and memory increase is logged to the finish.out logfile on the CVM. If any
issues occur during this phase review the finish.out logfile to assist in solving the
issue.
Identify AOS Upgrade Logs of Interest
/home/nutanix/data/logs/boot.err
Detailed Information for Firstboot
/home/nutanix/data/logs/svm_upgrade.tar.out
• The boot.out and boot.err logfiles have detailed information about the firstboot
process.
• The svm_upgrade.tar.out logfile has information about the installer file, but you will
probably not use this logfile often.
Add Node to Cluster Considerations
CVM VLAN Configured
• Factory-prepared nodes: discovery and setup without VLAN configuration in AHV 4.5
or later
CVM Reboots
• Purchase new equipment (nodes and blocks) and cable it into the racks. In Prism Element, launch the Expand Cluster wizard and dialog box from the Gear icon; it can also be launched from the Hardware page.
• The Nutanix nodes are prepared and imaged at the factory. The default hypervisor on factory-prepared nodes is AHV.
• If the existing cluster has a vLAN tag configured for the CVM's public interface eth0, factory-prepared nodes should still be able to be discovered and set up. It is best to work with the network team to configure the new node's CVM with the same vLAN tag.
• The command to configure a vLAN tag for the eth0 ethernet interface is the following:
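A minimal sketch of that command, assuming the standard AHV procedure and run on the new node's CVM (the VLAN tag 10 is purely an example value):
nutanix@cvm$ change_cvm_vlan 10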
• CPU features are automatically set in AHV clusters. The cluster has to be running AOS version 4.5 or later for the automatic CPU configuration.
• When adding nodes from a different processor class to the existing cluster, ensure that there are no running VMs on the node.
• CVMs are rebooted serially during upgrades, while the software itself is applied in parallel. The User VMs do not have to be migrated during AOS upgrades.
Add Node - Older Generation Hardware Requirements
[Diagram: a cluster mixing G4 Haswell and G5 Broadwell nodes; UVMs must be power cycled to move between the G4 and G5 processor generations.]
[Flowchart: compare the new node's AOS version and hypervisor with the cluster's. Same AOS version: the node can be added; different AOS version or hypervisor: re-image the node.]
After the upgrade is complete, you can add the node to the cluster without re-imaging it. Alternately, if the AOS version on the node is higher than the cluster's, you must either upgrade the cluster to that version (see Upgrading AOS) or re-image the node.
If the AOS version is the same but the hypervisor is different, you are provided with the option to re-image the node before adding it. (Re-imaging is appropriate in many such cases, but in some cases, such as a minor version difference, it may not be necessary.)
Note: You can add multiple nodes to an existing cluster at the same time.
Labs
Module 10
AOS Upgrade
Thank You!
Module 11
Performance
Nutanix Troubleshooting 5.x
Course Agenda
1. Intro 7. AFS
2. Tools & NCC 8. ABS
3. Services & Logs 9. DR
4. Foundation 10. Upgrade
5. Hardware 11. Performance
6. Networking
Course Agenda
• This is the Performance module.
Objectives
After completing this module, you will be able to:
• Define performance-related terminology and basic statistics.
• Describe how to properly frame a performance issue.
• Describe the layered approach to performance troubleshooting.
• Use the Prism UI to analyze performance statistics.
Performance
What do we talk about when we talk about Performance?
How do you choose a line when checking out at the supermarket?
Pause and take input from the class, then proceed with the animated bullets.
Typically you'll hear things related to decisions aimed at ensuring the fastest path out of the store.
If a student jokes that they would pick "the hottest cashier," point out that their choice speaks to their definition of Quality of Service, where success is defined by enjoyment more than expedience.
Performance Concepts
and Basic Statistics
Performance Terminology
Most customers generally use the term latency to express dissatisfaction with
performance. We only get cases when latency is deemed to be too high.
Most customers generally talk about throughput when it’s deemed to be too low.
Performance Terminology
The “thought exercise” here points the way towards an appreciation of what averages
convey. For this example one might assume that this disk could not have been
saturated over the sample interval if only considering the average utilization over the
interval.
Performance Terminology
Typically, a very large opsize (> 64KB) might indicate a more sequential workload, whereas smaller ops are typically random in nature. This is completely application-dependent.
Basic Statistics
The majority of data we consider when doing performance analysis are averages
calculated over sample intervals.
• The Entity and Metric charts in Prism are derived from samples taken every 30
seconds at its most granular level.
• It’s important to understand what an average or statistical mean is meant to convey.
The average is meant to convey a sense of the general tendency of a specific variable
over a sample.
Basic Statistics
Talking points:
• When we consider the performance charts in Prism, we are seeing means calculated
over a sample. For “live” data, each sample interval covers 30 seconds. So when we
look at a chart that tells us something about read latency, we are relying upon the
calculated mean to get a sense of performance.
Variance conveys to us how meaningful the mean really is. Prism charts do not convey
variance, but it’s important to understand that an average or mean of a sample can be
skewed greatly by the presence of outliers.
Fun With Numbers
Consider the following set of read latency values collected over an interval (all in ms):
{ 0.71, 0.63, 0.88, 0.67, 0.66, 0.79, 0.82, 1.08, 1.1, 0.78, 0.79, 0.83 }
• Calculate the mean
(0.71 + 0.63 + 0.88 + 0.67 + 0.66 + 0.79 + 0.82 + 1.08 + 1.1 + 0.78 + 0.79 + 0.83) / 12 == 0.811666
o Let's call it 0.81ms
• How expressive is 0.81ms in describing the general tendency of this sample?
Fun With Numbers
We're using the same data as before with one exception: we've added a significant outlier to the set.
Consider the following set of read latency values collected over an interval (all in
ms):
{ 0.71, 0.63, 0.88, 0.67, 0.66, 0.79, 0.82, 1.08, 94.5, 0.78, 0.79, 0.83 }
• Calculate the mean
(0.71 + 0.63 + 0.88 + 0.67 + 0.66 + 0.79 + 0.82 + 1.08 + 94.5 + 0.78 + 0.79 + 0.83) / 12
== 8.595000
o Let’s call it 8.60ms
• How expressive is 8.60ms in describing the general tendency of this sample?
The presence of an outlier skews our mean to a point where it’s not a
reliable characterization of the general tendency of read latency in
this sample. This sample has a very high degree of variance due to
the one outlier.
Understand What The Data Conveys
The basic takeaway from the previous few slides is that an average may or may not reveal the general tendency of a variable.
• Suppose your customer is concerned because their UVM application is logging some
write operations exceeding 100ms every hour.
o Can you rule out Nutanix storage performance after reviewing write latency
averages collected every 30 seconds, when those averages show values that
range between 1ms and 10ms over time?
The answer here is no, quite obviously. We will cover this a bit more when we talk
about case framing techniques.
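To make the point concrete, here is a quick worked comparison on the outlier sample from earlier (the median here is our own illustration; it is not a statistic Prism reports):
Sorted sample: { 0.63, 0.66, 0.67, 0.71, 0.78, 0.79, 0.79, 0.82, 0.83, 0.88, 1.08, 94.5 }
• Mean == 8.595ms, even though 11 of the 12 values are at or below 1.08ms.
• Median == (0.79 + 0.79) / 2 == 0.79ms, which describes the general tendency far better; the single 94.5ms outlier only becomes visible in a histogram or a max value.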
Histograms - Finding the Needle in the Haystack
Mention that we do have some histogram information in Prism (IO Metrics section) and
also on the 2009/latency page.
Read Latency Histogram
The histogram is better suited for issues where we are trying to validate the
presence of very short-lived spikes in behavior. In this case, the one read operation
that was ~94.5ms would not be evident if all we had was a graph showing average read
latency of 8.6ms for the sample.
Little's Law
N = X * R
X = N / R
Concurrency of a system (N) is equal to throughput (X) multiplied by latency (R).
Throughput (X) of a system is equal to concurrency (N) divided by latency (R).
• How can you increase throughput (X) of a system?
o Increase concurrency (N) or reduce latency (R).
• We increase concurrency by increasing parallelism.
Little’s Law
N=X*R
X=N/R
Increasing Concurrency
• Open up more checkout lanes at the grocery store.
• Increasing the Number of threads used for processing operations.
• Increasing the Size of each operation, which increases the overall amount of work
done with each op.
Most of the time, a workload modification is what's needed. In other words, increasing concurrency is usually done within the I/O generator.
Decreasing Latency
• Locate the bottleneck and determine changes needed to reduce/eliminate.
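A quick worked example of the formula (our own illustrative numbers):
• A workload sustaining X = 1000 IOPS at R = 1ms (0.001s) per op implies N = 1000 * 0.001 = 1 outstanding op on average.
• To double throughput to 2000 IOPS at the same 1ms latency, concurrency must double: N = 2000 * 0.001 = 2.
• Alternatively, halving latency to 0.5ms with N = 1 also yields X = 1 / 0.0005 = 2000 IOPS.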
Correlation
• Customers will sometimes have concerns about a single metric, typically Latency.
They might see some increase in average Latency and express concern.
o When something like this occurs, it is our job to start investigating all of the
available workload characteristics in order to understand that change in Latency.
o In many cases, an explanation of “why” is what’s needed to satisfy the
customer.
Performance Case Framing
Framing the Issue
Essentially, our goal is to translate the customer's problem statement into a clear understanding of how it maps to the workings of the Nutanix platform.
Framing the Issue
During your initial engagement with the customer, it's imperative that you ensure that the system is generally healthy before diving too deeply into the performance analysis.
• Did you run NCC?
• Broken HW or misconfigurations?
• Have Best Practices been implemented?
Running NCC and getting the full output of all health tests should be done with any
performance engagement.
NCC Caveat: Use common sense. Did NCC identify something that might impact
performance?
Best Practices are a requirement for ensuring optimal performance for many
workloads, such as Microsoft SQL.
Framing the Issue
Customer service professionals will frequently restate their understanding of the key points of an issue, as well as the agreed-upon success criteria, and ensure that the customer agrees with how they are describing it.
One often overlooked detail that is crucial is the Success Criteria. Without having this
established up front, we cannot be sure when the case is resolved.
Framing the Issue
You should ensure that you can answer some key questions:
• What is being measured, how is it being measured, and where is it being measured
from?
• When is the issue experienced?
• When did this start occurring? Has anything changed?
• How is it occurring? Continuously or intermittently? Is it reproducible?
• Which systems are impacted? What is the business impact?
• What is the expectation (Success Criteria)?
While we probably won't ask a customer "Why do you think latency should be lower?" or "Why do you expect higher throughput?", the "Why" is often implied in the answers to the other questions. If the problem is something new, as in "It used to work fine" … that's your "Why".
On the other hand, if this is something that has never performed as expected, we may
have to evaluate things differently.
• Was the solution sized appropriately for this workload?
• Are they using best practices for their workload?
• Is anything broken?
Framing the Issue
This will not only get us thinking about likely bottlenecks but also focus our attention on
what we should be investigating.
Framing the Issue
That’s not to say that we might not attempt to do something to lower latency for a
Throughput-sensitive workload.
• However, we need to evaluate what would be required to do so.
o It might simply not make sense if this would require more/different hardware or
changes in how the system is configured or utilized.
• As an example, full backups will almost certainly read data that might've been down-migrated to the Cold Tier.
o Would it be worth it to go with an all-flash system just to mitigate HDD read costs
in such a scenario, particularly if all other workloads are doing quite well?
A Layered Approach
to Performance Analysis
A Layered Approach
Another way to say this is that we take a top-down approach when doing analysis of the
system.
• This also helps to convey the importance and relevance of asking the framing
question related to “what are you measuring and where are you measuring it from?”
Knowing where something is measured helps us start to visualize all the potential
places in the system where we might be seeing costs.
A Layered Approach - Thinking "Top Down"
A Layered Approach
Here we provide a very basic, high-level, top-down breakdown across 3 key components:
• UVM
• Hypervisor
• CVM
This is not meant to be a complete list of all possible “layers,” but it does show an
approach to thinking about storage I/O as it travels from UVM to our CVM and back.
There are many Stargate components other than those listed here.
A Layered Approach - Stargate Layers
A Layered Approach
At a very high level, we can characterize the round-trip time spent in the basic Stargate layers.
The example seems to convey that most of the time was spent within the vDisk controller layer.
• From here, we would then investigate that layer further, considering all of the associated costs within that layer to determine why we are spending time there.
o Answering why within the vDisk layer will lead us towards possible resolutions.
A Layered Approach - 2009/latency
A Layered Approach
In order to enable the 2009/latency stats, run the following on any CVM:
nutanix@cvm$ for i in `svmips`; do curl "http://$i:2009/h/gflags?latency_tracker_sampling_frequency=1"; curl "http://$i:2009/h/gflags?activity_tracer_default_sampling_frequency=1"; done
Run the same with '0' to set it back to default.
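For convenience, the reset form of the same loop spelled out (only the flag values change to 0):
nutanix@cvm$ for i in `svmips`; do curl "http://$i:2009/h/gflags?latency_tracker_sampling_frequency=0"; curl "http://$i:2009/h/gflags?activity_tracer_default_sampling_frequency=0"; done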
Mention that these are small portions of the overall 2009-latency page.
We are only focused on some of the fields here to illustrate the basic approach to
bottleneck detection.
It should be very clear that we are spending most of the time within the vDisk Controller
level.
A Layered Approach
It’s important to realize that observing where we spend the most time is only a small
part of being able to assist in resolving the issue.
• However, we do expect all SREs to be able to make such observations when framing
the case for internal collaboration.
A Layered Approach
Tell the class that there is only 1 disk assigned to this UVM.
While this is a very simple example, they should be able to conclude that the Stargate layer shows a much lower average latency than what is measured within the UVM.
• Therefore, this does not seem like a Nutanix storage I/O issue.
It seems that the issue is somewhere above our CVM, so the investigation would
probably focus on:
- The hypervisor stats,
- The UVM configuration,
- …and so forth.
A Layered Approach
The bottom line here is that a top-down layered approach is a useful method of
bottleneck identification.
• When one layer accounts for the majority of the overall cost, we’ve located the most
significant bottleneck.
• We can start at very high-level layers initially and then “zoom in” as needed.
o Each layer can be broken down into its corresponding layers as needed.
What we mean here is that layers can start off very generalized and then we can focus
in further as needed.
Key Points About
Nutanix Performance
Nutanix Performance
Understanding some basic aspects about the Nutanix platform will help you
develop a sense of intuition when it comes to performance issues.
• What does the I/O path look like for reads and writes?
• What are the key components and services that play a role in Nutanix performance?
Promote nutanixbible.com.
• There’s a lot of very good information there when it comes to basic architecture and
I/O path.
What we mean by intuition is the combination of knowledge and experience that will
allow you to quickly develop theories as to the likely reasons for a given performance
concern.
• It’s this intuition that helps to guide our exploration of the available data.
Nutanix Performance
From a Nutanix I/O path perspective, we will primarily consider the following
services
• Stargate - Handles storage I/O within the CVM.
• Cassandra - Distributed cluster-wide metadata store.
• Medusa - Stargate’s interface to Cassandra metadata.
Obviously there are a lot more services that run on the CVM, but we will only focus on
the most basic interactions that occur when servicing UVM I/O requests.
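As a quick, hedged sanity check before digging into performance data, the genesis status command on a CVM lists each service with its PIDs (the exact service list varies by AOS version):
nutanix@cvm$ genesis status | egrep "stargate|cassandra"
A service reported with an empty PID list is not running and should be investigated before any performance analysis.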
Nutanix Performance
[Diagram: the CVM storage I/O stack across Memory, SSD, and HDD tiers - Protocol Adapter, Admission Control (AdmCtl), vDisk Controller, Unified Cache, OpLog (Index and Store), Cassandra, and Extent Store. Callouts, in order:]
Let's start adding the basic components that play a role in the storage I/O path.
• The Protocol Adapter is the top layer of storage I/O from the perspective of the CVM. This is where we receive inbound I/O requests from the Hypervisor.
• Next we have Admission Control, which performs rate limiting on the inbound workload to ensure that large bursts do not overwhelm the underlying sub-systems. Both of these components are completely implemented in the CVM's Memory allocation.
• The next layer is the vDisk Controller. A separate instance is created for each vDisk in Memory, with some features leveraging the SSD tier.
• The Unified Cache spans Memory and SSD and provides the read cache for user data and metadata. The metadata cached here is accessed from Cassandra by the Medusa interface.
• The OpLog is implemented as two separate components: the OpLog Index (Memory), the metadata component of the OpLog, and the OpLog Store (SSD), the data store for inbound random writes.
• Cassandra data is contained within the SSD tier across all the CVMs in the cluster.
• Finally, we have the Extent Store, which spans the SSD and HDD tiers and provides the long-term persistent storage for user data.
Nutanix Performance
Protocol Adapter - This provides the SMB (Hyper-V), NFS (ESXi, AHV), or iSCSI
(Volume Groups) interface to our CVM.
Admission Control - Performs inbound throttling of IOPS with per vDisk, node overall,
and even underlying disk awareness. The overall goal is to smooth out aggressive
bursts of operations. It also serves to prioritize UVM I/O over backend/system I/O on a
node.
Unified Cache - Manages content via LRU across dual tiers: Single Touch and Multi Touch. Data first enters the Single Touch pool; subsequent accesses promote it into Multi Touch. Multi Touch data is first migrated down to SSD before being evicted. Single Touch items that do not get "touched" are evicted from Memory.
Oplog Index - Memory-resident metadata needed to index and access data in the OpLog.
Oplog Store - The SSD store for inbound random writes. These are eventually
coalesced and flushed down to the extent store.
Medusa/Cassandra - Medusa provides the interface for fetching file system metadata
from the Cassandra DB. Cassandra data is distributed across all nodes in the cluster.
Extent Store - Manages the underlying data repository spanning both SSD and HDD
(hybrid systems).
Nutanix Performance - Extent Store Read
1 - Basically it converts the protocol-specific read to what’s needed to access the data
via Stargate: vDisk, offset, size.
2 - This is where the number of outstanding operations plays a role in possible throttling
(queueing).
3 - Consult the OpLog Index to see if the data is there.
4 - Given that this is a “cold” read, we know that we won’t find our metadata here.
5 - A Medusa lookup is issued to get the necessary metadata needed to access the
data.
6 - Since this is the first time this metadata is read, it gets added into the "First Touch" pool of the Unified Cache (Memory).
7 - With the metadata information the read data can be retrieved from the extent store.
8 - The read data is also added into the “First Touch” pool in the Unified Cache.
9 - The data is sent up to the Protocol Adapter.
10 - The Protocol Adapter sends the appropriate response.
Nutanix Performance - Unified Cache Hit Read
1 - Basically it converts the protocol-specific read to what’s needed to access the data
via Stargate: vDisk, offset, size.
2 - This is where the number of outstanding operations plays a role in possible throttling
(queueing).
3 - Consult the OpLog Index to see if the data is there.
4 - We find all the metadata and user data in the Unified Cache -- This is how data gets
“hot”. Multiple accesses promotes from single-touch to multi-touch.
5 - The data is sent up to the Protocol Adapter.
6 - The Protocol Adapter sends the appropriate response.
Considered rare, based on how most workloads behave. It would require some kind of write-then-read access pattern.
Another possibility is that the UVM is struggling to keep data cached, so it has to read
something from storage that was just written.
1 - Basically it converts the protocol-specific read to what’s needed to access the data
via Stargate: vDisk, offset, size.
2 - This is where the number of outstanding operations plays a role in possible throttling
(queueing).
3 - This time we find a hit for the data in the OpLog Index.
4 - We read the data from the OpLog Store and send it to the Protocol Adapter.
5 - The Protocol Adapter sends the appropriate response.
Nutanix Performance - Two Write Paths
This design is quite sensible given that nearly all highly concurrent sequential writer
workloads, like backups or other bulk data load operations, are throughput-sensitive and
not latency-sensitive. As long as this work completes within a reasonable amount of
time, performance is acceptable. Random, low-concurrency write workloads, by contrast, are typically far more latency-sensitive. The OpLog allows us to complete these random
writes very quickly and then optimize/coalesce them when we flush them to the Extent
Store.
In rare cases where a customer has a workload that isn’t completing fast enough AND
the inbound writes are being handled as sequential AND the application cannot be
optimized any further, there is some tuning which can be done to allow more of this
work into the OpLog. This should only be done when working with senior-level support
resources and/or engineering engagements.
Nutanix Performance - Write to OpLog
1 - Converts the protocol write to what’s needed for Stargate: vDisk, offset, payload.
2 - This is where the number of outstanding operations plays a role in possible throttling
(queueing).
3 - Queue the write operation.
4 - A thread processes the queued writes and sends them to the participating oplog
stores.
• Ideally one copy will be saved locally, but if the local oplog is full, it will commit two
remote replicas.
5 - When we receive the acknowledgement that the local and remote writes have
completed, we send the completion to the Protocol Adapter.
6 - The Protocol Adapter sends the write acknowledgement.
For RF3, the remote writes are chained and not parallel. The node where the write
originates sends a remote RF write request that requires 2 replica commitments. The
node that receives this request sends out the next replica write in sequence. This is
done to avoid saturating the local network with parallel outbound writes.
Nutanix Performance - Bypassing OpLog
[Diagram: a sequential write bypassing the OpLog; the numbered steps (1-9) are described below.]
Nutanix Performance
1 - Converts the protocol write to what’s needed for Stargate: vDisk, offset, payload.
2 - This is where the number of outstanding operations plays a role in possible throttling
(queueing).
3 - Write is categorized as sequential so OpLog is bypassed.
4 - The Unified Cache is consulted for the necessary metadata for data placement; any misses result in Medusa ops.
5 - Any cache-miss data is added to the Unified Cache single touch pool.
6 - Data is persisted to the Extent store and replica write(s) is issued.
7 - Updated metadata is persisted to the Unified Cache.
8 - Once all local and remote operations complete, we let the protocol adapter know.
9 - The Protocol Adapter formulates the correct protocol ACK for the write.
Nutanix Performance - ILM
At a very high level, here are some key aspects to Nutanix performance that you
should be aware of:
• Unified Cache provides efficient reads for user data and metadata.
• OpLog is used to absorb random writes at low latency.
• vDisk writes that are both sequential and have more than 1.5MB of outstanding data are written directly to the Extent Store.
• Medusa Lookups add latency when there is a metadata miss.
• Cold Tier reads/writes see higher latency due mostly to HDD costs.
Obviously this isn’t all you need to know about, but these do cover many common
performance issues.
There is one special case where sequential writes don’t bypass OpLog and that’s when
they are overwriting data already in OpLog. This is done to ensure we won’t have stale
data in OpLog.
Using Prism Performance Charts
The Prism UI consumes data from the Arithmos service and provides some very useful charts that aid performance analysis.
• Arithmos - Service which runs per CVM that collects performance metrics for
various entities.
• Entities - Categorical object types, such as Cluster, Host, VM, Virtual Disk, and
others.
• Metrics – Entity-specific performance statistics provided by Arithmos.
In some cases, a single Prism chart might be a weighted average across multiple
entities in the cluster. We’ll see an example of this on the next slide.
Using Prism Performance Charts
Entity: Cluster
Metric: Storage Controller IOPS
Entity: Cluster
Metric: Hypervisor CPU Usage (%)
Metric: Hypervisor Memory Usage (%)
The default home page has many useful charts, derived from Arithmos, that are meant to convey cluster-level statistics.
Using Prism Performance Charts
You can explore the various Entity types available via Arithmos when you create a
custom Entity Chart.
There are numerous Entities across the cluster. We are going to focus on the VM type
next.
Using Prism Performance Charts
While Entity Charts allow you to choose a specific Entity and then see an applicable list
of Metrics, a custom Metric Chart does the opposite.
The custom Metric Chart allows you to start with a Metric and then it will provide a list
of applicable Entities.
The resulting charts are the same.
Using Prism Performance Charts
In addition to the Home page and the Analysis page, the VM page provides very
nice VM-specific performance charts.
• The VM view provides more detail when customers are focused on specific
applications.
• The VM landing page provides some top talker stats which might be a good place to
start when investigating unexpected changes in cluster performance stats.
Let’s walk through an example where the customer is concerned about a very
noticeable increase in cluster-wide latency.
Using Prism Performance Charts
We start on the Home page in the Prism UI and notice the increase in latency there.
We go from the Home page to the VM page.
On the VM page, we see the “Top talker” in latency is the UVM WindowsVM-2.
PAUSE AND ASK:
Q: Which VM is doing the most work from an IOPS perspective?
A: WindowsVM
Q: Do we see this VM in the top latencies?
A: No
We click on the WindowsVM-2 row in the Latency box to quickly get to the VM Table view.
Using Prism Performance Charts
Pause after “What’s the approximate OpSize for each write” and take input from the
class.
When a VM is selected in the table view, we can scroll down the page to get some nice
performance charts specific to that VM.
Let's take a look at things from the specific UVM "Disk" perspective.
Using Prism Performance Charts
The Virtual Disk entries are from the perspective of the UVM, but if we consult the UVM
configuration we could map that to the underlying vDisk.
From the “Additional Stats” section we can see the relative sequential vs. random
aspect of the inbound workload.
We also see the current approximation of the write working set size, but this is merely based on what has been observed so far. In other words, working set size calculations only reflect the I/O seen to date.
Using Prism Performance Charts
Clicking on I/O Metrics provides a view of read vs. write latency over time.
Clicking on the latency curve changes the view below. The view below provides some
useful histograms to consider.
Using Prism Performance Charts
The bottom line here is that the Prism UI does provide very useful data that aids in Performance Analysis.
• Remember that the VM I/O Metrics tab provides a histogram view of opsize and
latency for reads and writes.
o Very useful when evaluating average latency. Helps answer the question of “How
meaningful is my mean?”
• You should be able to build a very good high-level understanding of performance
characteristics from Prism UI.
You might have to remind them about histograms and how they can convey samples with a great deal of variance.
Stress that spending time in the Prism charts for a problem that has already happened can aid in very comprehensive framing. Take screenshots of the graphs to share with your colleagues.
Data Collection
When you are opening a customer case for a performance issue, please include:
• Any screenshots captured during the problem.
• Any Guest OS logs, or application logs, that convey the issue.
o Typically have timestamps when things are logged.
o Make sure you provide the relative time zone.
• Pretty much anything else you think will help your peers in Nutanix Support
understand the issue.
There’s no such thing as too much data! Just make sure you provide detailed notes
about each data item’s relevance:
- ”This is a screenshot of iostat data collected on the customer’s Linux VM when the
problem was happening at around 7:43AM PDT.”
- “This is the log from the customer’s application. It’s logging I/O failures due to
timeouts. The timestamps are in GMT-5.”
Data Collection
Running collect_perf
• The most basic collection is to simply run the following command on any CVM in the
cluster:
collect_perf start
• It can take a few minutes for it to start actively collecting data.
• Ensure that the problem occurs a few times during the collection.
o Keep notes about what was experienced and when during the collection.
• Stop the collection.
collect_perf stop
It can take several minutes for the command to stop as well, depending on the number of nodes.
This KB goes into great detail about the many options to collect_perf and, perhaps most importantly, cautions about its use.
Data Collection
Running collect_perf
• When the command completes, you’ll find a *.tgz file in
/home/nutanix/data/performance on the CVM where you ran the command.
• Attach this collection to the case and make sure you provide any relevant
observations that occurred during the collection.
If you collect any data from the UVM(s) during the collect_perf, include that as well. Just
be sure to specify timestamps to go along with corroborating evidence.
Remember this should accompany your properly framed case, so make sure you are
providing all the details:
- UVM(s) seeing the issue
- When it was seen
- What was seen
- …and so forth.
Data Collection
Now that you've shared all the details with Support and provided some meaningful data collections, we're much closer to resolution and a happy customer.
• However, performance cases are often iterative.
o Initial analysis may lead to suggestions to try.
o Once those suggestions have been implemented, reassess performance.
o If we haven’t yet achieved our goal, get new data and repeat the process.
There’s always a bottleneck. We might successfully eliminate one, only to see another
one surface. The customer may still be seeing the same outcome, but we will need new
data to see what we are dealing with following any changes.
Summary
Performance Troubleshooting - Summary
We also defined some terms like: latency, IOPS, throughput, utilization, and others.
Performance Troubleshooting - Summary
The layers can start off very generalized. Each high-level “layer” can also be broken
down into its relevant layers.
The goal is bottleneck isolation.
Performance Troubleshooting - Summary
Thank You!
Appendix A
/home/nutanix/foundation/log/20170509-144208-5
20170509 14:42:08 INFO Validating parameters. This may take few minutes
20170509 14:42:08 INFO Validating parameters. This may take few minutes
20170509 14:45:17 DEBUG Command '['ipmitool', '-U', u'ADMIN', '-P', u'ADMIN', '-H', '10.30.15.41',
'fru']' returned stdout:
onfig', '-g', 'idRacInfo']' returned stdout: Security Alert: Certificate is invalid - self signed certificate
Continuing execution. Use -S option for racadm to stop execution on certificate-related errors.
stderr:/opt/dell/srvadmin/sbin/racadm: line 13: printf: 0xError: invalid hex number ERROR: Unable to
connect to RAC at specified IP address.
20170509 14:46:17 INFO Detected class smc_wa for node with IPMI IP 10.30.15.41
20170509 14:46:17 INFO Preparing NOS package and making node specific Phoenix image
dation.node_10.30.15.47.iso
s/foundation.node_10.30.15.47.iso
20170509 14:47:04 DEBUG vmwa status: vmwa status Device 1: None Device 2: ISO File
[/home/nutanix/foundation/tmp/sessions/20170509-144208-
5/phoenix_node_isos/foundation.node_10.30.15.47.iso]
_node_isos/foundation.node_10.30.15.47.iso
20170509 14:50:30 INFO NIC with PCI address 04:00:0 will be used for NIC teaming if default teaming
fails
20170509 14:50:31 ERROR Command '/usr/bin/ipmicfg-linux.x86_64 -tp info' returned error code 13
stdout: stderr: Not TwinPro
20170509 14:50:31 ERROR Command '/usr/bin/ipmicfg-linux.x86_64 -tp nodeid' returned error code 13
stdout: stderr:Not TwinPro
20170509 14:50:31 INFO Making node specific Phoenix json. This may take few minutes
20170509 14:50:42 INFO Start downloading resources, this may take several minutes
20170509 14:50:42 INFO Waiting for Phoenix to finish downloading resources
ng = NX-1065S, node_position = A
20170509 14:50:43 INFO Downloading file 'nutanix_installer_package.tar' with size: 3702784000 bytes.
20170509 14:50:54 INFO Generating unique SSH identity for this Hypervisor-CVM pair.
20170509 14:50:54 INFO Generating SSL certificate for this Hypervisor-CVM pair.
20170509 14:50:55 INFO Extracting the SVM installer into memory. This will take some time...
20170509 14:50:56 INFO Using hcl from /phoenix/hcl.json with last_edit 1488347720
20170509 14:50:56 INFO Formatting all data disks ['sdd', 'sde', 'sdc']
', '--node_uuid=ca38fdbb-3e9a-43ae-b6a9-22905d61cf0d']
20170509 14:52:28 INFO Scanning all disks to assemble RAID. This may take few minutes
20170509 14:52:40 INFO Scanning all disks to assemble RAID. This may take few minutes
20170509 14:52:41 INFO Scanning all disks to assemble RAID. This may take few minutes
20170509 14:52:42 INFO Scanning all disks to assemble RAID. This may take few minutes
20170509 14:52:43 INFO Scanning all disks to assemble RAID. This may take few minutes
20170509 14:53:01 INFO Scanning all disks to assemble RAID. This may take few minutes
20170509 14:53:02 INFO Scanning all disks to assemble RAID. This may take few minutes
20170509 14:53:03 INFO Scanning all disks to assemble RAID. This may take few minutes
20170509 14:53:04 INFO Scanning all disks to assemble RAID. This may take few minutes
20170509 14:53:04 DEBUG 2017-05-09 21:51:20 INFO svm_rescue:602 Will image ['/dev/sdc'] from
/mnt/svm_installer/install/images
/svm.tar.xz.
2017-05-09 21:51:20 INFO svm_rescue:439 Disks detected from Phoenix: ['/dev/sdd', '/dev/sde',
'/dev/sdc']
2017-05-09 21:51:23 INFO svm_rescue:413 Need to repartition and format blank boot drive /dev/sdc
2017-05-09 21:52:00 INFO svm_rescue:97 exec_cmd: bash -c 'cd /mnt/disk/usr/local/nutanix; tar -xvzf
/mnt/svm_installer/install
/pkg/nutanix-bootstrap-el6-release-danube-4.7.5.2-stable-
7c83bbaf29e9b603f3a0825bee65f568d79603b9.tar.gz'
2017-05-09 21:52:01 INFO svm_rescue:97 exec_cmd: bash -c 'cd /mnt/disk/usr/local/nutanix; tar -xvzf
/mnt/svm_installer/install
/pkg/nutanix-core-el6-release-danube-4.7.5.2-stable-
7c83bbaf29e9b603f3a0825bee65f568d79603b9.tar.gz'
2017-05-09 21:52:08 INFO svm_rescue:97 exec_cmd: bash -c 'cd /mnt/disk/usr/local/nutanix; tar -xvzf
/mnt/svm_installer/install
/pkg/nutanix-diagnostics-el6-release-danube-4.7.5.2-stable-
7c83bbaf29e9b603f3a0825bee65f568d79603b9.tar.gz'
2017-05-09 21:52:12 INFO svm_rescue:97 exec_cmd: bash -c 'cd /mnt/disk/usr/local/nutanix; tar -xvzf
/mnt/svm_installer/install
/pkg/nutanix-infrastructure-el6-release-danube-4.7.5.2-stable-
7c83bbaf29e9b603f3a0825bee65f568d79603b9.tar.gz'
2017-05-09 21:52:12 INFO svm_rescue:97 exec_cmd: bash -c 'cd /mnt/disk/usr/local/nutanix; tar -xvzf
/mnt/svm_installer/install
/pkg/nutanix-serviceability-el6-release-danube-4.7.5.2-stable-
7c83bbaf29e9b603f3a0825bee65f568d79603b9.tar.gz'
2017-05-09 21:52:12 INFO svm_rescue:97 exec_cmd: bash -c 'cd /mnt/disk/usr/local/nutanix; tar -xvzf
/mnt/svm_installer/install
/pkg/nutanix-ncc-el6-release-ncc-3.0.1.1-latest.tar.gz'
2017-05-09 21:52:13 INFO svm_rescue:97 exec_cmd: bash -c 'cd /mnt/disk/usr/local/nutanix; tar -xvzf
/mnt/svm_installer/install
/pkg/nutanix-minervacvm-el6-release-danube-4.7.5.2-stable-
7c83bbaf29e9b603f3a0825bee65f568d79603b9.tar.gz'
2017-05-09 21:52:13 INFO svm_rescue:97 exec_cmd: bash -c 'cd /mnt/disk/usr/local/nutanix; tar -xvzf
/mnt/svm_installer/install
/pkg/nutanix-perftools-el6-release-danube-4.7.5.2-stable-
7c83bbaf29e9b603f3a0825bee65f568d79603b9.tar.gz'
2017-05-09 21:52:13 INFO svm_rescue:97 exec_cmd: bash -c 'cd /mnt/disk/usr/local/nutanix; tar -xvzf
/mnt/svm_installer/install
/pkg/nutanix-syscheck-el6-release-danube-4.7.5.2-stable-
7c83bbaf29e9b603f3a0825bee65f568d79603b9.tar.gz'
2017-05-09 21:52:13 INFO svm_rescue:97 exec_cmd: bash -c 'cd /mnt/disk/srv/; tar -xvzf
/mnt/svm_installer/install/pkg/nutanix-
salt-el6-release-danube-4.7.5.2-stable-7c83bbaf29e9b603f3a0825bee65f568d79603b9.tar.gz'
2017-05-09 21:52:13 INFO svm_rescue:367 Root filesystem (on /dev/sdc1) UUID is a394d0df-aa23-4d8d-
8f1c-4c3f6130dd6a instead of
198775c3-801f-44fb-9801-9bf243f623bc.
2017-05-09 21:52:13 INFO svm_rescue:175 Creating Nutanix boot marker file from grub.conf ...
KERNEL=/boot/vmlinuz-3.10.0-229.46.1.el6.nutanix.20170119.cvm.x86_64
200n8 console=tty0'
INITRD=/boot/initramfs-3.10.0-229.46.1.el6.nutanix.20170119.cvm.x86_64.img
installer/el6-release-danube-4.7.5.2-stable-7c83bbaf29e9b603f3a0825bee65f568d79603b9' -xvf -
9745.tar.gz -C /mnt/data/nutanix
Hypervisor installation
20170509 14:59:01 INFO Adding device [u'03:00.0', u'1000', u'0072'] to passthrough devices
20170509 14:59:30 INFO Copying network configuration crashcart scripts into /mnt/stage/root/nutanix-
network-crashcart
20170509 14:59:32 ERROR Could not copy installer logs to '/mnt/stage/var/log'! Continuing..
20170509 14:59:32 INFO Installation of Acropolis base software successful: Installation successful.
20170509 14:59:32 INFO Rebooting node. This may take several minutes: Rebooting node. This may
take several minutes
20170509 14:59:32 INFO Rebooting node. This may take several minutes
20170509 15:01:36 INFO Expanding boot partition. This may take some time.
20170509 15:01:36 INFO Running cmd ['/usr/bin/nohup /sbin/resize2fs /dev/sda1 &>/dev/null &']
FIPS mode initialized ssh: connect to host 192.168.5.2 port 22: No route to host]. Will retry in 5 seconds
FIPS mode initialized ssh: connect to host 192.168.5.2 port 22: No route to host]. Will retry in 5 seconds
FIPS mode initialized ssh: connect to host 192.168.5.2 port 22: No route to host]. Will retry in 5 seconds
FIPS mode initialized ssh: connect to host 192.168.5.2 port 22: Connection refused]. Will retry in 5
seconds
FIPS mode initialized ssh: connect to host 192.168.5.2 port 22: Connection refused]. Will retry in 5
seconds
20170509 15:02:14 INFO Cmd [['/usr/bin/ssh -i /root/firstboot/ssh_keys/nutanix -o
StrictHostKeyChecking=no -o NumberOfPassword
FIPS mode initialized ssh: connect to host 192.168.5.2 port 22: Connection refused]. Will retry in 5
seconds
FIPS mode initialized ssh: connect to host 192.168.5.2 port 22: Connection refused]. Will retry in 5
seconds
FIPS mode initialized ssh: connect to host 192.168.5.2 port 22: Connection refused]. Will retry in 5
seconds
FIPS mode initialized Warning: Permanently added '192.168.5.2' (RSA) to the list of known hosts.
Nutanix Controller VM
ls: cannot access /tmp/svm_boot_succeeded: No such file or directory]. Will retry in 5 seconds
FIPS mode initialized Warning: Permanently added '192.168.5.2' (RSA) to the list of known hosts.
Nutanix Controller VM ls: cannot access /tmp/svm_boot_succeeded: No such file or directory]. Will retry
in 5 seconds
FIPS mode initialized Warning: Permanently added '192.168.5.2' (RSA) to the list of known hosts.
Nutanix Controller VM
ls: cannot access /tmp/svm_boot_succeeded: No such file or directory]. Will retry in 5 seconds
FIPS mode initialized Warning: Permanently added '192.168.5.2' (RSA) to the list of known hosts.
Nutanix Controller VM ls: cannot access /tmp/svm_boot_succeeded: No such file or directory]. Will retry
in 5 seconds
FIPS mode initialized Warning: Permanently added '192.168.5.2' (RSA) to the list of known hosts.
Nutanix Controller VM ls: cannot access /tmp/svm_boot_succeeded: No such file or directory]. Will retry
in 5 seconds
FIPS mode initialized Warning: Permanently added '192.168.5.2' (RSA) to the list of known hosts.
Nutanix Controller VM
ls: cannot access /tmp/svm_boot_succeeded: No such file or directory]. Will retry in 5 seconds
FIPS mode initialized Warning: Permanently added '192.168.5.2' (RSA) to the list of known hosts.
Nutanix Controller VM ls: cannot access /tmp/svm_boot_succeeded: No such file or directory]. Will retry
in 5 seconds
20170509 15:03:07 INFO Cmd [['/usr/bin/ssh -i /root/firstboot/ssh_keys/nutanix -o
StrictHostKeyChecking=no -o NumberOfPassword
FIPS mode initialized Warning: Permanently added '192.168.5.2' (RSA) to the list of known hosts.
Nutanix Controller VM ls: cannot access /tmp/svm_boot_succeeded: No such file or directory]. Will retry
in 5 seconds
FIPS mode initialized Warning: Permanently added '192.168.5.2' (RSA) to the list of known hosts.
Nutanix Controller VM ls: cannot access /tmp/svm_boot_succeeded: No such file or directory]. Will retry
in 5 seconds
FIPS mode initialized Warning: Permanently added '192.168.5.2' (RSA) to the list of known hosts.
Nutanix Controller VM ls: cannot access /tmp/svm_boot_succeeded: No such file or directory]. Will retry
in 5 seconds
Nutanix Controller VM ls: cannot access /tmp/svm_boot_succeeded: No such file or directory]. Will retry
in 5 seconds
FIPS mode initialized Warning: Permanently added '192.168.5.2' (RSA) to the list of known hosts.
Nutanix Controller VM ls: cannot access /tmp/svm_boot_succeeded: No such file or directory]. Will retry
in 5 seconds
FIPS mode initialized Warning: Permanently added '192.168.5.2' (RSA) to the list of known hosts.
Nutanix Controller VM
ls: cannot access /tmp/svm_boot_succeeded: No such file or directory]. Will retry in 5 seconds
FIPS mode initialized Warning: Permanently added '192.168.5.2' (RSA) to the list of known hosts.
Nutanix Controller VM ls: cannot access /tmp/svm_boot_succeeded: No such file or directory]. Will retry
in 5 seconds
FIPS mode initialized Warning: Permanently added '192.168.5.2' (RSA) to the list of known hosts.
Nutanix Controller VM
ls: cannot access /tmp/svm_boot_succeeded: No such file or directory]. Will retry in 5 seconds
Nutanix Controller VM
ls: cannot access /tmp/svm_boot_succeeded: No such file or directory]. Will retry in 5 seconds
FIPS mode initialized Warning: Permanently added '192.168.5.2' (RSA) to the list of known hosts.
Nutanix Controller VM
ls: cannot access /tmp/svm_boot_succeeded: No such file or directory]. Will retry in 5 seconds
FIPS mode initialized Warning: Permanently added '192.168.5.2' (RSA) to the list of known hosts.
Nutanix Controller VM
ls: cannot access /tmp/svm_boot_succeeded: No such file or directory]. Will retry in 5 seconds
FIPS mode initialized Warning: Permanently added '192.168.5.2' (RSA) to the list of known hosts.
Nutanix Controller VM
ls: cannot access /tmp/svm_boot_succeeded: No such file or directory]. Will retry in 5 seconds
FIPS mode initialized Warning: Permanently added '192.168.5.2' (RSA) to the list of known hosts.
Nutanix Controller VM ls: cannot access /tmp/svm_boot_succeeded: No such file or directory]. Will retry
in 5 seconds
2017-05-26 13:40:11 INFO server.py:391 Creating upgrade zookeeper nodes if not already created
2017-05-26 13:40:11 INFO server.py:179 Registering method name and class name
2017-05-26 13:40:11 INFO server.py:183 Starting Dispatcher with component name: minerva_cvm
2017-05-26 13:40:11 INFO server.py:219 Initializing the CVM IP address list cache at zk node
/appliance/logical/samba/cvm_ipv4_address_list_cache
2017-05-26 13:40:11 INFO dispatcher.py:82 Fetching the task list for the component
2017-05-26 13:40:11 INFO server.py:206 Successfully cached the CVM IP address list present in zeus.
2017-05-26 13:40:11 INFO dispatcher.py:88 Fetched the task list
uuid: "9e2de0ba-3d4c-411f-8f85-621a2f5f6542"
name: "afs01"
nvm_list {
uuid: "ac3c7060-db9e-4297-b6f4-564b1083ef92"
name: "NTNX-afs01-1"
num_vcpus: 4
memory_mb: 12288
local_cvm_ipv4_address: ""
nvm_list {
uuid: "ddbf4012-800d-48e1-a92c-c6ca2c612563"
name: "NTNX-afs01-2"
num_vcpus: 4
memory_mb: 12288
local_cvm_ipv4_address: ""
nvm_list {
uuid: "66386092-223b-48d5-9340-9f25b6338229"
name: "NTNX-afs01-3"
num_vcpus: 4
memory_mb: 12288
local_cvm_ipv4_address: ""
internal_network {
ipv4_address_list: "10.30.14.244"
ipv4_address_list: "10.30.14.245"
ipv4_address_list: "10.30.14.246"
ipv4_address_list: "10.30.14.247"
netmask_ipv4_address: "255.255.240.0"
gateway_ipv4_address: "10.30.0.1"
virtual_network_uuid: "243cb3ae-8ba5-4fb5-ba8e-5fed88ca108b"
external_network_list {
ipv4_address_list: "10.30.15.241"
ipv4_address_list: "10.30.15.242"
ipv4_address_list: "10.30.15.243"
netmask_ipv4_address: "255.255.240.0"
gateway_ipv4_address: "10.30.0.1"
virtual_network_uuid: "243cb3ae-8ba5-4fb5-ba8e-5fed88ca108b"
dns_ipv4_address: "10.30.15.91"
ntp_ipv4_address: "10.30.15.91"
container_uuid: "cdc48f84-afcf-4b8d-b53c-7026d62512a0"
join_domain {
realm_name: "learn.nutanix.local"
username: "administrator"
password: "********"
set_spn_dns_only: false
}
size_bytes: 1099511627776
is_new_container: true
status: kNotStarted
time_stamp: 1496164063
task_header {
file_server_uuid: "9e2de0ba-3d4c-411f-8f85-621a2f5f6542"
file_server_name: "afs01"
version: "2.1.0.1"
size: 1360936448
md5_sum: "42b2b3336a5dd02e3ac58ce0b4b048e3"
service_vm_id: 6
filepath: "ndfs:///NutanixManagementShare/afs/2.1.0.1"
compatible_version_list: "2.1.0"
compatible_version_list: "2.0.2"
compatible_version_list: "2.0.1"
compatible_version_list: "2.0.0.3"
compatible_version_list: "2.0.0.2"
compatible_version_list: "2.0.0"
operation_type: kUpload
operation_status: kCompleted
operator_type: kUser
compatible_nos_version_list: "5.0"
compatible_nos_version_list: "5.0.0.*"
compatible_nos_version_list: "5.0.1"
compatible_nos_version_list: "5.0.1.*"
compatible_nos_version_list: "5.0.2"
compatible_nos_version_list: "5.0.2.*"
compatible_nos_version_list: "5.0.3"
compatible_nos_version_list: "5.0.3.*"
compatible_nos_version_list: "5.1"
compatible_nos_version_list: "5.1.0.1"
compatible_nos_version_list: "5.1.0.*"
download_time: 1495836373927000
release_date: 1494010736
full_release_version: "el6-release-euphrates-5.1.0.1-stable-
419aa3a83df5548924198f85398deb20e8b615fe"
2017-05-30 10:07:43 INFO genesis_utils.py:2484 Node 10.30.15.47 is not light compute node
2017-05-30 10:07:43 INFO genesis_utils.py:2484 Node 10.30.15.48 is not light compute node
2017-05-30 10:07:43 INFO genesis_utils.py:2484 Node 10.30.15.49 is not light compute node
uuid: "$<\263\256\213\245O\265\272\216_\355\210\312\020\213"
logical_timestamp: 1
type: kBridged
identifier: 0
name: "vlan0"
ipv4_address_list: "10.30.14.245"
ipv4_address_list: "10.30.14.246"
ipv4_address_list: "10.30.14.247"
netmask_ipv4_address: "255.255.240.0"
gateway_ipv4_address: "10.30.0.1"
virtual_network_uuid: "243cb3ae-8ba5-4fb5-ba8e-5fed88ca108b"
uuid: "$<\263\256\213\245O\265\272\216_\355\210\312\020\213"
logical_timestamp: 1
type: kBridged
identifier: 0
name: "vlan0"
ipv4_address_list: "10.30.15.243"
netmask_ipv4_address: "255.255.240.0"
gateway_ipv4_address: "10.30.0.1"
virtual_network_uuid: "243cb3ae-8ba5-4fb5-ba8e-5fed88ca108b"
2017-05-30 10:07:43 INFO genesis_utils.py:2484 Node 10.30.15.47 is not light compute node
2017-05-30 10:07:43 INFO genesis_utils.py:2484 Node 10.30.15.48 is not light compute node
2017-05-30 10:07:43 INFO genesis_utils.py:2484 Node 10.30.15.49 is not light compute node
2017-05-30 10:07:43 INFO validation.py:187 Total number of usable cvms: 3, nvms requested: 3
parent_task_uuid: "\037\3734t\277\004B\252\235\r?\212\267\224\324s"
name: "NTNX-afs01-1"
num_vcpus: 4
memory_size_mb: 12288
nic_list {
network_uuid: "$<\263\256\213\245O\265\272\216_\355\210\312\020\213"
}
ha_priority: 100
hwclock_timezone: "UTC"
▒<p`۞B▒▒▒VK▒▒
parent_task_uuid: "\037\3734t\277\004B\252\235\r?\212\267\224\324s"
name: "NTNX-afs01-2"
num_vcpus: 4
memory_size_mb: 12288
nic_list {
network_uuid: "$<\263\256\213\245O\265\272\216_\355\210\312\020\213"
ha_priority: 100
hwclock_timezone: "UTC"
H▒,▒▒,a%c
parent_task_uuid: "\037\3734t\277\004B\252\235\r?\212\267\224\324s"
name: "NTNX-afs01-3"
num_vcpus: 4
memory_size_mb: 12288
nic_list {
network_uuid: "$<\263\256\213\245O\265\272\216_\355\210\312\020\213"
ha_priority: 100
hwclock_timezone: "UTC"
f8`▒";HՓ@▒%▒3▒)
uuid: "$<\263\256\213\245O\265\272\216_\355\210\312\020\213"
logical_timestamp: 4
type: kBridged
identifier: 0
name: "vlan0"
uuid: "$<\263\256\213\245O\265\272\216_\355\210\312\020\213"
logical_timestamp: 4
type: kBridged
identifier: 0
name: "vlan0"
}
2017-05-30 10:07:44 ERROR uvm_network_utils.py:234 Network is acropolis managed.
uuid: "$<\263\256\213\245O\265\272\216_\355\210\312\020\213"
logical_timestamp: 4
type: kBridged
identifier: 0
name: "vlan0"
pipe 4
pipe 4
pipe 4
pipe 4
pipe 4
pipe 4
2017-05-30 10:07:47 INFO minerva_utils.py:500 Pinged ip list [u'10.30.15.240'], Non pinged ip list
[u'10.30.14.244', u'10.30.14.245', u'10.30.15.241', u'10.30.14.246', u'10.30.15.242', u'10.30.14.247',
u'10.30.15.243']
2017-05-30 10:07:47 INFO uvm_utils.py:1419 Creating VM anti-affinity group with vm group name
NTNX-afs01.
2017-05-30 10:07:47 INFO file_server.py:3068 Anti affinity rule of vm group name: NTNX-afs01
vm_uuid: "\254<p`\333\236B\227\266\364VK\020\203\357\222"
logical_timestamp: 2
config {
name: "NTNX-afs01-1"
num_vcpus: 4
num_cores_per_vcpu: 1
memory_size_mb: 12288
hwclock_timezone: "UTC"
ha_priority: 100
agent_vm: false
}
hypervisor {
hypervisor_type: kKvm
state: kOff
hypervisor_specific_id: "\254<p`\333\236B\227\266\364VK\020\203\357\222"
allow_live_migrate: true
vm_uuid: "\335\277@\022\200\rH\341\251,\306\312,a%c"
logical_timestamp: 2
config {
name: "NTNX-afs01-2"
num_vcpus: 4
num_cores_per_vcpu: 1
memory_size_mb: 12288
hwclock_timezone: "UTC"
ha_priority: 100
agent_vm: false
hypervisor {
hypervisor_type: kKvm
state: kOff
hypervisor_specific_id: "\335\277@\022\200\rH\341\251,\306\312,a%c"
allow_live_migrate: true
vm_uuid: "f8`\222\";H\325\223@\237%\2663\202)"
logical_timestamp: 2
config {
name: "NTNX-afs01-3"
num_vcpus: 4
num_cores_per_vcpu: 1
memory_size_mb: 12288
hwclock_timezone: "UTC"
ha_priority: 100
agent_vm: false
hypervisor {
hypervisor_type: kKvm
state: kOff
hypervisor_specific_id: "f8`\222\";H\325\223@\237%\2663\202)"
allow_live_migrate: true
2017-05-30 10:09:16 INFO file_server.py:543 Polling all file server Deploy Vm Task
resolv_conf:
nameservers: ['10.30.15.91']
runcmd:
- sudo /usr/local/nutanix/cluster/bin/configure_network_configs --
internal_ipconfig="10.30.14.245/255.255.240.0/10.30.0.1" --
external_ipconfig="10.30.15.241/255.255.240.0/10.30.0.1" --external_interface_name="eth0:2"
- sudo /usr/local/nutanix/cluster/bin/configure_network_routes --
internal_ipconfig="10.30.14.245/255.255.240.0/10.30.0.1" --
internal_config_iproute_list="10.30.0.0/20"
- sudo /usr/local/nutanix/cluster/bin/configure_services --enable_services="ntpdate" --
disable_services="" --dns_servers="10.30.15.91" --ntp_servers="10.30.15.91"
- sudo /etc/rc.d/rc.local
users:
- name: nutanix
lock_passwd: false
ssh-authorized-keys:
- ssh-rsa
AAAAB3NzaC1yc2EAAAADAQABAAABAQCgvTZfgYRWtpb3+cE/qJwW1K7oGVbQKEgaQrdujnE07bKHecslQ
gCD2VnFJaEzeRZHsX5GC9LDOVrvDKDV6DltgeKMrv1k4yO4xH9nttSZMMPfjgGddHy6pW7Dc/ibU6wl4G/9
VHtjm8+vVbBo3wAEguU/lAR5lrbVkyZ0OT+HxYiVAagCPljWGYFrO7U7/AMjSWC1zqKFgC1q2ye7wFejawiB
86nxuHT6uMbiTxrbzMFL8X3VBZKe5PRrBMiDAjvRmm69ZD2vEUnl2B+YyGDOyNOwvdDdzfjsFCXn5oRRU6
GNybmDXeu9XCy7zna2GwcQwMcn2HHhS71paxPuaY6N nutanix@NTNX-16SM13150152-C-CVM
- ssh-rsa
AAAAB3NzaC1yc2EAAAADAQABAAABAQDeSSYYtSwStBLEo6MYx0HLn6eatXKnJNdkBQJoOOiKD3b4dzsLLy
T0jaSjLcPHhE1m1KBoWtGyhRT8xzm76YRksU+6fvB3h/mFHnAj0hme7n1TYPr224z34yUZLuwYlp/wQhArxZ
YODo/1wZrGA1crfIvymYMEw52JBlFiJu6QMC6MfF9RHfxFeu1b9vj8aqrZlfhWqPkkAIGErAEpYsbPlH0t7PM
oBTRSkYmM73UCs0xIAGzIn+MK0hCcYYK6oGRLPtJGe7S6beyZtxp/xTHoHVJR6SC1ub5nGnR723O7/AwbC
qf5dWqjoWoXCxah7Jc3FOPyvk5ROLfRrOfD14at nutanix@NTNX-16SM13150152-B-CVM
- ssh-rsa
AAAAB3NzaC1yc2EAAAADAQABAAABAQCiIkYVSgNUKAGslHBDu1QswH66JA/X5tPThr4k96SQy9LrKnNUSc
jaUOIg5H0iglpEqEnOxC8CtEX5XUybT82GU2l9PUxa3uwd2cMWKqcYeg1w3WIs+x9vO1G6QOTPVGLVM07
7uyVlVs2CVO7uuDC3KTEQaszfi7NHMIQKig/9w3KCMLD44c/zj4FqIOuKezyhMjCIjITOASj6/yRLlF8wS1EY08
OmrXUyFugv1ORVndOmTaQOMsYPd2XGm/jLIPCfqVtIGt7Trs8XT+kh2d8uSvtOopKnJ4Ej+s2SL0c1xN6Wo6
A/LigTYCEaHVpeFyMZddmrdCojimTArWO/f3VD nutanix@NTNX-16SM13150152-A-CVM
2017-05-30 10:09:16 INFO file_server_misc.py:488 Starting iso creation for fsvm 10.30.14.245.
network-interfaces: |
auto eth0
address 10.30.14.245
network 10.30.0.0
netmask 255.255.240.0
broadcast 10.30.15.255
gateway 10.30.0.1
resolv_conf:
nameservers: ['10.30.15.91']
runcmd:
- sudo /usr/local/nutanix/cluster/bin/configure_network_configs --
internal_ipconfig="10.30.14.246/255.255.240.0/10.30.0.1" --
external_ipconfig="10.30.15.242/255.255.240.0/10.30.0.1" --external_interface_name="eth0:2"
- sudo /usr/local/nutanix/cluster/bin/configure_network_routes --
internal_ipconfig="10.30.14.246/255.255.240.0/10.30.0.1" --
internal_config_iproute_list="10.30.0.0/20"
- sudo /etc/rc.d/rc.local
users:
- name: nutanix
lock_passwd: false
ssh-authorized-keys:
- ssh-rsa
AAAAB3NzaC1yc2EAAAADAQABAAABAQCgvTZfgYRWtpb3+cE/qJwW1K7oGVbQKEgaQrdujnE07bKHecslQ
gCD2VnFJaEzeRZHsX5GC9LDOVrvDKDV6DltgeKMrv1k4yO4xH9nttSZMMPfjgGddHy6pW7Dc/ibU6wl4G/9
VHtjm8+vVbBo3wAEguU/lAR5lrbVkyZ0OT+HxYiVAagCPljWGYFrO7U7/AMjSWC1zqKFgC1q2ye7wFejawiB
86nxuHT6uMbiTxrbzMFL8X3VBZKe5PRrBMiDAjvRmm69ZD2vEUnl2B+YyGDOyNOwvdDdzfjsFCXn5oRRU6
GNybmDXeu9XCy7zna2GwcQwMcn2HHhS71paxPuaY6N nutanix@NTNX-16SM13150152-C-CVM
- ssh-rsa
AAAAB3NzaC1yc2EAAAADAQABAAABAQDeSSYYtSwStBLEo6MYx0HLn6eatXKnJNdkBQJoOOiKD3b4dzsLLy
T0jaSjLcPHhE1m1KBoWtGyhRT8xzm76YRksU+6fvB3h/mFHnAj0hme7n1TYPr224z34yUZLuwYlp/wQhArxZ
YODo/1wZrGA1crfIvymYMEw52JBlFiJu6QMC6MfF9RHfxFeu1b9vj8aqrZlfhWqPkkAIGErAEpYsbPlH0t7PM
oBTRSkYmM73UCs0xIAGzIn+MK0hCcYYK6oGRLPtJGe7S6beyZtxp/xTHoHVJR6SC1ub5nGnR723O7/AwbC
qf5dWqjoWoXCxah7Jc3FOPyvk5ROLfRrOfD14at nutanix@NTNX-16SM13150152-B-CVM
- ssh-rsa
AAAAB3NzaC1yc2EAAAADAQABAAABAQCiIkYVSgNUKAGslHBDu1QswH66JA/X5tPThr4k96SQy9LrKnNUSc
jaUOIg5H0iglpEqEnOxC8CtEX5XUybT82GU2l9PUxa3uwd2cMWKqcYeg1w3WIs+x9vO1G6QOTPVGLVM07
7uyVlVs2CVO7uuDC3KTEQaszfi7NHMIQKig/9w3KCMLD44c/zj4FqIOuKezyhMjCIjITOASj6/yRLlF8wS1EY08
OmrXUyFugv1ORVndOmTaQOMsYPd2XGm/jLIPCfqVtIGt7Trs8XT+kh2d8uSvtOopKnJ4Ej+s2SL0c1xN6Wo6
A/LigTYCEaHVpeFyMZddmrdCojimTArWO/f3VD nutanix@NTNX-16SM13150152-A-CVM
2017-05-30 10:09:16 INFO file_server_misc.py:488 Starting iso creation for fsvm 10.30.14.246.
network-interfaces: |
auto eth0
address 10.30.14.246
network 10.30.0.0
netmask 255.255.240.0
broadcast 10.30.15.255
gateway 10.30.0.1
resolv_conf:
nameservers: ['10.30.15.91']
runcmd:
- sudo /usr/local/nutanix/cluster/bin/configure_network_configs --
internal_ipconfig="10.30.14.247/255.255.240.0/10.30.0.1" --
external_ipconfig="10.30.15.243/255.255.240.0/10.30.0.1" --external_interface_name="eth0:2"
- sudo /usr/local/nutanix/cluster/bin/configure_network_routes --
internal_ipconfig="10.30.14.247/255.255.240.0/10.30.0.1" --
internal_config_iproute_list="10.30.0.0/20"
- sudo /etc/rc.d/rc.local
users:
- name: nutanix
lock_passwd: false
ssh-authorized-keys:
- ssh-rsa
AAAAB3NzaC1yc2EAAAADAQABAAABAQCgvTZfgYRWtpb3+cE/qJwW1K7oGVbQKEgaQrdujnE07bKHecslQ
gCD2VnFJaEzeRZHsX5GC9LDOVrvDKDV6DltgeKMrv1k4yO4xH9nttSZMMPfjgGddHy6pW7Dc/ibU6wl4G/9
VHtjm8+vVbBo3wAEguU/lAR5lrbVkyZ0OT+HxYiVAagCPljWGYFrO7U7/AMjSWC1zqKFgC1q2ye7wFejawiB
86nxuHT6uMbiTxrbzMFL8X3VBZKe5PRrBMiDAjvRmm69ZD2vEUnl2B+YyGDOyNOwvdDdzfjsFCXn5oRRU6
GNybmDXeu9XCy7zna2GwcQwMcn2HHhS71paxPuaY6N nutanix@NTNX-16SM13150152-C-CVM
- ssh-rsa
AAAAB3NzaC1yc2EAAAADAQABAAABAQDeSSYYtSwStBLEo6MYx0HLn6eatXKnJNdkBQJoOOiKD3b4dzsLLy
T0jaSjLcPHhE1m1KBoWtGyhRT8xzm76YRksU+6fvB3h/mFHnAj0hme7n1TYPr224z34yUZLuwYlp/wQhArxZ
YODo/1wZrGA1crfIvymYMEw52JBlFiJu6QMC6MfF9RHfxFeu1b9vj8aqrZlfhWqPkkAIGErAEpYsbPlH0t7PM
oBTRSkYmM73UCs0xIAGzIn+MK0hCcYYK6oGRLPtJGe7S6beyZtxp/xTHoHVJR6SC1ub5nGnR723O7/AwbC
qf5dWqjoWoXCxah7Jc3FOPyvk5ROLfRrOfD14at nutanix@NTNX-16SM13150152-B-CVM
- ssh-rsa
AAAAB3NzaC1yc2EAAAADAQABAAABAQCiIkYVSgNUKAGslHBDu1QswH66JA/X5tPThr4k96SQy9LrKnNUSc
jaUOIg5H0iglpEqEnOxC8CtEX5XUybT82GU2l9PUxa3uwd2cMWKqcYeg1w3WIs+x9vO1G6QOTPVGLVM07
7uyVlVs2CVO7uuDC3KTEQaszfi7NHMIQKig/9w3KCMLD44c/zj4FqIOuKezyhMjCIjITOASj6/yRLlF8wS1EY08
OmrXUyFugv1ORVndOmTaQOMsYPd2XGm/jLIPCfqVtIGt7Trs8XT+kh2d8uSvtOopKnJ4Ej+s2SL0c1xN6Wo6
A/LigTYCEaHVpeFyMZddmrdCojimTArWO/f3VD nutanix@NTNX-16SM13150152-A-CVM
2017-05-30 10:09:16 INFO file_server_misc.py:488 Starting iso creation for fsvm 10.30.14.247.
network-interfaces: |
auto eth0
address 10.30.14.247
network 10.30.0.0
netmask 255.255.240.0
broadcast 10.30.15.255
gateway 10.30.0.1
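• Note: For each FSVM, "Starting iso creation" builds a small cloud-init ISO carrying the first-boot configuration shown above: a static network-interfaces stanza, resolv_conf pointing at the cluster DNS server, runcmd entries that set the internal/external IPs and routes, and the CVM public SSH keys for the nutanix user. The ISO is attached as the FSVM's ide.0 CD-ROM (the 376832-byte is_cdrom disk in the records that follow). A standard cloud-init ISO typically exposes its user-data at the image root, so one can be inspected by loop-mounting it (illustrative; the path is hypothetical):

  nutanix@cvm$ sudo mount -o loop /tmp/NTNX-afs01-1.iso /mnt && cat /mnt/user-data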
vm_uuid: "\254<p`\333\236B\227\266\364VK\020\203\357\222"
logical_timestamp: 3
config {
name: "NTNX-afs01-1"
num_vcpus: 4
num_cores_per_vcpu: 1
memory_size_mb: 12288
hwclock_timezone: "UTC"
ha_priority: 100
agent_vm: false
hypervisor {
hypervisor_type: kKvm
host_uuid: "\266\253\330\330\326\300Jo\214^,\352\332HA\307"
state: kOn
hypervisor_specific_id: "\254<p`\333\236B\227\266\364VK\020\203\357\222"
allow_live_migrate: true
vm_uuid: "\335\277@\022\200\rH\341\251,\306\312,a%c"
logical_timestamp: 3
config {
name: "NTNX-afs01-2"
num_vcpus: 4
num_cores_per_vcpu: 1
memory_size_mb: 12288
hwclock_timezone: "UTC"
ha_priority: 100
agent_vm: false
hypervisor {
hypervisor_type: kKvm
host_uuid: "o\225\352\224V6J\327\203\250\315\014T\025\325\363"
state: kOn
hypervisor_specific_id: "\335\277@\022\200\rH\341\251,\306\312,a%c"
allow_live_migrate: true
vm_uuid: "f8`\222\";H\325\223@\237%\2663\202)"
logical_timestamp: 3
config {
name: "NTNX-afs01-3"
num_vcpus: 4
num_cores_per_vcpu: 1
memory_size_mb: 12288
hwclock_timezone: "UTC"
ha_priority: 100
agent_vm: false
hypervisor {
hypervisor_type: kKvm
host_uuid: "\266\253\330\330\326\300Jo\214^,\352\332HA\307"
state: kOn
hypervisor_specific_id: "f8`\222\";H\325\223@\237%\2663\202)"
allow_live_migrate: true
logical_timestamp: 6
uuid: "!\t\002\237$\277ON\215|\232\276l\264\207<"
sequence_id: 8
request {
method_name: "VmChangePowerState"
arg {
embedded:
"\022\020L>\350[\035\251H4\243\355\373\263\257\032\321\242\032\020\335\277@\022\200\rH\3
41\251,\306\312,a%c \002"
response {
error_code: 0
ret {
embedded: ""
create_time_usecs: 1496164156955905
start_time_usecs: 1496164156985891
complete_time_usecs: 1496164157876469
last_updated_time_usecs: 1496164157876469
entity_list {
entity_id: "\335\277@\022\200\rH\341\251,\306\312,a%c"
entity_type: kVM
operation_type: "VmChangePowerState"
message: ""
percentage_complete: 100
status: kSucceeded
parent_task_uuid: "L>\350[\035\251H4\243\355\373\263\257\032\321\242"
component: "Uhura"
canceled: false
internal_opaque:
"(dp0\nS\'internal_vm_change_power_state_op_handle\'\np1\nS\"S\'\\xcb\\x8d)vN\\xd5\\x9d%\\x8c\
\x0f07\\xfa\\xfe\"\np2\nsS\'internal_current_state_info\'\np3\n(S\'CONSOLIDATE_RESULTS\'\np4\nNt
p5\ns."
deleted: false
internal_task: false
cluster_uuid: "\000\005PsR\302I\247\000\000\000\000\000\000\272\272"
disable_auto_progress_update: true
weight: 1000
uuid: "!\t\002\237$\277ON\215|\232\276l\264\207<"
sequence_id: 8
request {
method_name: "VmChangePowerState"
arg {
embedded:
"\022\020L>\350[\035\251H4\243\355\373\263\257\032\321\242\032\020\335\277@\022\200\rH\3
41\251,\306\312,a%c \002"
response {
error_code: 0
ret {
embedded: ""
create_time_usecs: 1496164156955905
start_time_usecs: 1496164156985891
complete_time_usecs: 1496164157876469
last_updated_time_usecs: 1496164157876469
entity_list {
entity_id: "\335\277@\022\200\rH\341\251,\306\312,a%c"
entity_type: kVM
operation_type: "VmChangePowerState"
message: ""
percentage_complete: 100
status: kSucceeded
parent_task_uuid: "L>\350[\035\251H4\243\355\373\263\257\032\321\242"
component: "Uhura"
canceled: false
internal_opaque:
"(dp0\nS\'internal_vm_change_power_state_op_handle\'\np1\nS\"S\'\\xcb\\x8d)vN\\xd5\\x9d%\\x8c\
\x0f07\\xfa\\xfe\"\np2\nsS\'internal_current_state_info\'\np3\n(S\'CONSOLIDATE_RESULTS\'\np4\nNt
p5\ns."
deleted: false
internal_task: false
cluster_uuid: "\000\005PsR\302I\247\000\000\000\000\000\000\272\272"
disable_auto_progress_update: true
weight: 1000
vm_uuid: "\335\277@\022\200\rH\341\251,\306\312,a%c"
logical_timestamp: 4
config {
name: "NTNX-afs01-2"
num_vcpus: 4
num_cores_per_vcpu: 1
memory_size_mb: 12288
hwclock_timezone: "UTC"
ha_priority: 100
agent_vm: false
hypervisor {
hypervisor_type: kKvm
state: kOff
hypervisor_specific_id: "\335\277@\022\200\rH\341\251,\306\312,a%c"
allow_live_migrate: true
logical_timestamp: 6
uuid: "B#\227x`\313Mf\236:\006\014\217~\022\353"
sequence_id: 7
request {
method_name: "VmChangePowerState"
arg {
embedded:
"\022\020\336sN\001\250\327O\334\231b\004\3416\010&3\032\020\254<p`\333\236B\227\266\364
VK\020\203\357\222 \002"
response {
error_code: 0
ret {
embedded: ""
create_time_usecs: 1496164156927117
start_time_usecs: 1496164156953122
complete_time_usecs: 1496164157898542
last_updated_time_usecs: 1496164157898542
entity_list {
entity_id: "\254<p`\333\236B\227\266\364VK\020\203\357\222"
entity_type: kVM
operation_type: "VmChangePowerState"
message: ""
percentage_complete: 100
status: kSucceeded
parent_task_uuid: "\336sN\001\250\327O\334\231b\004\3416\010&3"
component: "Uhura"
canceled: false
internal_opaque:
"(dp0\nS\'internal_vm_change_power_state_op_handle\'\np1\nS\'0\\x98og\\x9dMK\\xcc\\xb8O\\xf8\
\x03ZO\\xb2\\xbf\'\np2\nsS\'internal_current_state_info\'\np3\n(S\'CONSOLIDATE_RESULTS\'\np4\nNt
p5\ns."
deleted: false
internal_task: false
cluster_uuid: "\000\005PsR\302I\247\000\000\000\000\000\000\272\272"
disable_auto_progress_update: true
weight: 1000
sequence_id: 7
request {
method_name: "VmChangePowerState"
arg {
embedded:
"\022\020\336sN\001\250\327O\334\231b\004\3416\010&3\032\020\254<p`\333\236B\227\266\364
VK\020\203\357\222 \002"
response {
error_code: 0
ret {
embedded: ""
create_time_usecs: 1496164156927117
start_time_usecs: 1496164156953122
complete_time_usecs: 1496164157898542
last_updated_time_usecs: 1496164157898542
entity_list {
entity_id: "\254<p`\333\236B\227\266\364VK\020\203\357\222"
entity_type: kVM
operation_type: "VmChangePowerState"
message: ""
percentage_complete: 100
status: kSucceeded
parent_task_uuid: "\336sN\001\250\327O\334\231b\004\3416\010&3"
component: "Uhura"
canceled: false
internal_opaque:
"(dp0\nS\'internal_vm_change_power_state_op_handle\'\np1\nS\'0\\x98og\\x9dMK\\xcc\\xb8O\\xf8\
\x03ZO\\xb2\\xbf\'\np2\nsS\'internal_current_state_info\'\np3\n(S\'CONSOLIDATE_RESULTS\'\np4\nNt
p5\ns."
deleted: false
internal_task: false
cluster_uuid: "\000\005PsR\302I\247\000\000\000\000\000\000\272\272"
disable_auto_progress_update: true
weight: 1000
vm_uuid: "\335\277@\022\200\rH\341\251,\306\312,a%c"
logical_timestamp: 4
config {
name: "NTNX-afs01-2"
num_vcpus: 4
num_cores_per_vcpu: 1
memory_size_mb: 12288
hwclock_timezone: "UTC"
ha_priority: 100
agent_vm: false
hypervisor {
hypervisor_type: kKvm
state: kOff
hypervisor_specific_id: "\335\277@\022\200\rH\341\251,\306\312,a%c"
allow_live_migrate: true
vm_uuid: "\254<p`\333\236B\227\266\364VK\020\203\357\222"
logical_timestamp: 4
config {
name: "NTNX-afs01-1"
num_vcpus: 4
num_cores_per_vcpu: 1
memory_size_mb: 12288
hwclock_timezone: "UTC"
ha_priority: 100
agent_vm: false
hypervisor {
hypervisor_type: kKvm
state: kOff
hypervisor_specific_id: "\254<p`\333\236B\227\266\364VK\020\203\357\222"
allow_live_migrate: true
vm_uuid: "\254<p`\333\236B\227\266\364VK\020\203\357\222"
logical_timestamp: 4
config {
name: "NTNX-afs01-1"
num_vcpus: 4
num_cores_per_vcpu: 1
memory_size_mb: 12288
hwclock_timezone: "UTC"
ha_priority: 100
agent_vm: false
hypervisor {
hypervisor_type: kKvm
state: kOff
hypervisor_specific_id: "\254<p`\333\236B\227\266\364VK\020\203\357\222"
allow_live_migrate: true
logical_timestamp: 5
uuid: ":\0228+d^@\341\203e\225\262T\261\364\226"
sequence_id: 9
request {
method_name: "VmChangePowerState"
arg {
embedded:
"\022\020\2226\306\311P\200Em\270\2177\264\236(\326\307\032\020f8`\222\";H\325\223@\237%
\2663\202) \002"
response {
error_code: 0
ret {
embedded: ""
create_time_usecs: 1496164156971550
start_time_usecs: 1496164157003359
complete_time_usecs: 1496164158248746
last_updated_time_usecs: 1496164158248746
entity_list {
entity_id: "f8`\222\";H\325\223@\237%\2663\202)"
entity_type: kVM
operation_type: "VmChangePowerState"
message: ""
percentage_complete: 100
status: kSucceeded
parent_task_uuid: "\2226\306\311P\200Em\270\2177\264\236(\326\307"
component: "Uhura"
canceled: false
internal_opaque:
"(dp0\nS\'internal_vm_change_power_state_op_handle\'\np1\nS\'wO\"FQ9J\\xf3\\x8f\\x15\\x96\\xb7
\\xd2\\x1c\\x95D\'\np2\nsS\'internal_current_state_info\'\np3\n(S\'CONSOLIDATE_RESULTS\'\np4\nNt
p5\ns."
deleted: false
internal_task: false
cluster_uuid: "\000\005PsR\302I\247\000\000\000\000\000\000\272\272"
disable_auto_progress_update: true
weight: 1000
uuid: ":\0228+d^@\341\203e\225\262T\261\364\226"
sequence_id: 9
request {
method_name: "VmChangePowerState"
arg {
embedded:
"\022\020\2226\306\311P\200Em\270\2177\264\236(\326\307\032\020f8`\222\";H\325\223@\237%
\2663\202) \002"
response {
error_code: 0
ret {
embedded: ""
}
create_time_usecs: 1496164156971550
start_time_usecs: 1496164157003359
complete_time_usecs: 1496164158248746
last_updated_time_usecs: 1496164158248746
entity_list {
entity_id: "f8`\222\";H\325\223@\237%\2663\202)"
entity_type: kVM
operation_type: "VmChangePowerState"
message: ""
percentage_complete: 100
status: kSucceeded
parent_task_uuid: "\2226\306\311P\200Em\270\2177\264\236(\326\307"
component: "Uhura"
canceled: false
internal_opaque:
"(dp0\nS\'internal_vm_change_power_state_op_handle\'\np1\nS\'wO\"FQ9J\\xf3\\x8f\\x15\\x96\\xb7
\\xd2\\x1c\\x95D\'\np2\nsS\'internal_current_state_info\'\np3\n(S\'CONSOLIDATE_RESULTS\'\np4\nNt
p5\ns."
deleted: false
internal_task: false
cluster_uuid: "\000\005PsR\302I\247\000\000\000\000\000\000\272\272"
disable_auto_progress_update: true
weight: 1000
vm_uuid: "f8`\222\";H\325\223@\237%\2663\202)"
logical_timestamp: 4
config {
name: "NTNX-afs01-3"
num_vcpus: 4
num_cores_per_vcpu: 1
memory_size_mb: 12288
hwclock_timezone: "UTC"
ha_priority: 100
agent_vm: false
hypervisor {
hypervisor_type: kKvm
state: kOff
hypervisor_specific_id: "f8`\222\";H\325\223@\237%\2663\202)"
allow_live_migrate: true
vm_uuid: "f8`\222\";H\325\223@\237%\2663\202)"
logical_timestamp: 4
config {
name: "NTNX-afs01-3"
num_vcpus: 4
num_cores_per_vcpu: 1
memory_size_mb: 12288
hwclock_timezone: "UTC"
ha_priority: 100
agent_vm: false
hypervisor {
hypervisor_type: kKvm
state: kOff
hypervisor_specific_id: "f8`\222\";H\325\223@\237%\2663\202)"
allow_live_migrate: true
vm_uuid: "\335\277@\022\200\rH\341\251,\306\312,a%c"
logical_timestamp: 5
config {
name: "NTNX-afs01-2"
num_vcpus: 4
num_cores_per_vcpu: 1
memory_size_mb: 12288
disk_list {
disk_addr {
adapter_type: kIDE
device_index: 0
vmdisk_uuid: "\354m\267\271dRH\200\234m\343/\343\001\035\225"
disk_label: "ide.0"
}
source_vmdisk_addr {
nfs_path: "/Nutanix_afs01_ctr/NTNX-afs01-2.iso"
container_id: 12406
scsi_passthrough_enabled: true
size_bytes: 376832
is_cdrom: true
hwclock_timezone: "UTC"
ha_priority: 100
agent_vm: false
hypervisor {
hypervisor_type: kKvm
state: kOff
hypervisor_specific_id: "\335\277@\022\200\rH\341\251,\306\312,a%c"
allow_live_migrate: true
adapter_type: kIDE
device_index: 0
vmdisk_uuid: "\354m\267\271dRH\200\234m\343/\343\001\035\225"
disk_label: "ide.0"
source_vmdisk_addr {
nfs_path: "/Nutanix_afs01_ctr/NTNX-afs01-2.iso"
}
container_id: 12406
scsi_passthrough_enabled: true
size_bytes: 376832
is_cdrom: true
vm_uuid: "\335\277@\022\200\rH\341\251,\306\312,a%c"
logical_timestamp: 5
config {
name: "NTNX-afs01-2"
num_vcpus: 4
num_cores_per_vcpu: 1
memory_size_mb: 12288
hwclock_timezone: "UTC"
ha_priority: 100
agent_vm: false
hypervisor {
hypervisor_type: kKvm
state: kOff
hypervisor_specific_id: "\335\277@\022\200\rH\341\251,\306\312,a%c"
allow_live_migrate: true
}
vm_uuid: "\254<p`\333\236B\227\266\364VK\020\203\357\222"
logical_timestamp: 5
config {
name: "NTNX-afs01-1"
num_vcpus: 4
num_cores_per_vcpu: 1
memory_size_mb: 12288
disk_list {
disk_addr {
adapter_type: kIDE
device_index: 0
vmdisk_uuid: "\346\315\034\272{BB\206\206\022,G\345\312\026D"
disk_label: "ide.0"
source_vmdisk_addr {
nfs_path: "/Nutanix_afs01_ctr/NTNX-afs01-1.iso"
container_id: 12406
scsi_passthrough_enabled: true
size_bytes: 376832
is_cdrom: true
}
hwclock_timezone: "UTC"
ha_priority: 100
agent_vm: false
hypervisor {
hypervisor_type: kKvm
state: kOff
hypervisor_specific_id: "\254<p`\333\236B\227\266\364VK\020\203\357\222"
allow_live_migrate: true
adapter_type: kIDE
device_index: 0
vmdisk_uuid: "\346\315\034\272{BB\206\206\022,G\345\312\026D"
disk_label: "ide.0"
source_vmdisk_addr {
nfs_path: "/Nutanix_afs01_ctr/NTNX-afs01-1.iso"
container_id: 12406
scsi_passthrough_enabled: true
size_bytes: 376832
is_cdrom: true
vm_uuid: "\254<p`\333\236B\227\266\364VK\020\203\357\222"
logical_timestamp: 5
config {
name: "NTNX-afs01-1"
num_vcpus: 4
num_cores_per_vcpu: 1
memory_size_mb: 12288
hwclock_timezone: "UTC"
ha_priority: 100
agent_vm: false
hypervisor {
hypervisor_type: kKvm
state: kOff
hypervisor_specific_id: "\254<p`\333\236B\227\266\364VK\020\203\357\222"
allow_live_migrate: true
2017-05-30 10:09:18 INFO file_server.py:363 Waiting for afs disk attachment to FileServer Vm NTNX-
afs01-2
2017-05-30 10:09:18 INFO file_server.py:363 Waiting for afs disk attachment to FileServer Vm NTNX-
afs01-1
vm_uuid: "f8`\222\";H\325\223@\237%\2663\202)"
logical_timestamp: 5
config {
name: "NTNX-afs01-3"
num_vcpus: 4
num_cores_per_vcpu: 1
memory_size_mb: 12288
disk_list {
disk_addr {
adapter_type: kIDE
device_index: 0
vmdisk_uuid: "\002\253D3\326\237B\026\210ET.\265\253!\347"
disk_label: "ide.0"
source_vmdisk_addr {
nfs_path: "/Nutanix_afs01_ctr/NTNX-afs01-3.iso"
container_id: 12406
scsi_passthrough_enabled: true
size_bytes: 376832
is_cdrom: true
hwclock_timezone: "UTC"
ha_priority: 100
agent_vm: false
hypervisor {
hypervisor_type: kKvm
state: kOff
hypervisor_specific_id: "f8`\222\";H\325\223@\237%\2663\202)"
allow_live_migrate: true
adapter_type: kIDE
device_index: 0
vmdisk_uuid: "\002\253D3\326\237B\026\210ET.\265\253!\347"
disk_label: "ide.0"
source_vmdisk_addr {
nfs_path: "/Nutanix_afs01_ctr/NTNX-afs01-3.iso"
container_id: 12406
scsi_passthrough_enabled: true
size_bytes: 376832
is_cdrom: true
logical_timestamp: 5
config {
name: "NTNX-afs01-3"
num_vcpus: 4
num_cores_per_vcpu: 1
memory_size_mb: 12288
hwclock_timezone: "UTC"
ha_priority: 100
agent_vm: false
hypervisor {
hypervisor_type: kKvm
state: kOff
hypervisor_specific_id: "f8`\222\";H\325\223@\237%\2663\202)"
allow_live_migrate: true
2017-05-30 10:09:18 INFO file_server.py:363 Waiting for afs disk attachment to FileServer Vm NTNX-
afs01-3
2017-05-30 10:09:23 INFO file_server.py:373 Afs Disk attached, attaching home disk to FileServer Vm
NTNX-afs01-1
2017-05-30 10:09:23 INFO uvm_utils.py:737 vm_info_list {
vm_uuid: "\254<p`\333\236B\227\266\364VK\020\203\357\222"
logical_timestamp: 6
config {
name: "NTNX-afs01-1"
num_vcpus: 4
num_cores_per_vcpu: 1
memory_size_mb: 12288
disk_list {
disk_addr {
adapter_type: kIDE
device_index: 0
vmdisk_uuid: "\346\315\034\272{BB\206\206\022,G\345\312\026D"
disk_label: "ide.0"
source_vmdisk_addr {
nfs_path: "/Nutanix_afs01_ctr/NTNX-afs01-1.iso"
container_id: 12406
scsi_passthrough_enabled: true
size_bytes: 376832
is_cdrom: true
disk_list {
disk_addr {
adapter_type: kSCSI
device_index: 0
vmdisk_uuid: "\r~\300\320\230\325FY\240l\214\340K\200\274T"
disk_label: "scsi.0"
}
source_vmdisk_addr {
nfs_path: "/Nutanix_afs01_ctr/el6-release-euphrates-5.1.0.1-stable-
419aa3a83df5548924198f85398deb20e8b615fe"
container_id: 12406
scsi_passthrough_enabled: true
size_bytes: 12884901888
hwclock_timezone: "UTC"
ha_priority: 100
agent_vm: false
hypervisor {
hypervisor_type: kKvm
state: kOff
hypervisor_specific_id: "\254<p`\333\236B\227\266\364VK\020\203\357\222"
allow_live_migrate: true
adapter_type: kIDE
device_index: 0
vmdisk_uuid: "\346\315\034\272{BB\206\206\022,G\345\312\026D"
disk_label: "ide.0"
source_vmdisk_addr {
nfs_path: "/Nutanix_afs01_ctr/NTNX-afs01-1.iso"
}
container_id: 12406
scsi_passthrough_enabled: true
size_bytes: 376832
is_cdrom: true
adapter_type: kSCSI
device_index: 0
vmdisk_uuid: "\r~\300\320\230\325FY\240l\214\340K\200\274T"
disk_label: "scsi.0"
source_vmdisk_addr {
nfs_path: "/Nutanix_afs01_ctr/el6-release-euphrates-5.1.0.1-stable-
419aa3a83df5548924198f85398deb20e8b615fe"
container_id: 12406
scsi_passthrough_enabled: true
size_bytes: 12884901888
vm_uuid: "\254<p`\333\236B\227\266\364VK\020\203\357\222"
logical_timestamp: 6
config {
name: "NTNX-afs01-1"
num_vcpus: 4
num_cores_per_vcpu: 1
memory_size_mb: 12288
hwclock_timezone: "UTC"
ha_priority: 100
agent_vm: false
hypervisor {
hypervisor_type: kKvm
state: kOff
hypervisor_specific_id: "\254<p`\333\236B\227\266\364VK\020\203\357\222"
allow_live_migrate: true
2017-05-30 10:09:23 INFO file_server.py:397 Waiting for home disk attachment to FileServer Vm NTNX-
afs01-1
2017-05-30 10:09:24 INFO file_server.py:407 Home Disk attached, attaching cassandra disk to FileServer
Vm NTNX-afs01-1
vm_uuid: "\254<p`\333\236B\227\266\364VK\020\203\357\222"
logical_timestamp: 7
config {
name: "NTNX-afs01-1"
num_vcpus: 4
num_cores_per_vcpu: 1
memory_size_mb: 12288
disk_list {
disk_addr {
adapter_type: kIDE
device_index: 0
vmdisk_uuid: "\346\315\034\272{BB\206\206\022,G\345\312\026D"
disk_label: "ide.0"
source_vmdisk_addr {
nfs_path: "/Nutanix_afs01_ctr/NTNX-afs01-1.iso"
container_id: 12406
scsi_passthrough_enabled: true
size_bytes: 376832
is_cdrom: true
disk_list {
disk_addr {
adapter_type: kSCSI
device_index: 0
vmdisk_uuid: "\r~\300\320\230\325FY\240l\214\340K\200\274T"
disk_label: "scsi.0"
source_vmdisk_addr {
nfs_path: "/Nutanix_afs01_ctr/el6-release-euphrates-5.1.0.1-stable-
419aa3a83df5548924198f85398deb20e8b615fe"
container_id: 12406
scsi_passthrough_enabled: true
size_bytes: 12884901888
disk_list {
disk_addr {
adapter_type: kSCSI
device_index: 1
vmdisk_uuid: "\235\274x7\3643L\362\266\010[\311\031!\371\332"
disk_label: "scsi.1"
container_id: 12406
scsi_passthrough_enabled: true
size_bytes: 48318382080
hwclock_timezone: "UTC"
ha_priority: 100
agent_vm: false
hypervisor {
hypervisor_type: kKvm
state: kOff
hypervisor_specific_id: "\254<p`\333\236B\227\266\364VK\020\203\357\222"
allow_live_migrate: true
adapter_type: kIDE
device_index: 0
vmdisk_uuid: "\346\315\034\272{BB\206\206\022,G\345\312\026D"
disk_label: "ide.0"
source_vmdisk_addr {
nfs_path: "/Nutanix_afs01_ctr/NTNX-afs01-1.iso"
container_id: 12406
scsi_passthrough_enabled: true
size_bytes: 376832
is_cdrom: true
adapter_type: kSCSI
device_index: 0
vmdisk_uuid: "\r~\300\320\230\325FY\240l\214\340K\200\274T"
disk_label: "scsi.0"
source_vmdisk_addr {
nfs_path: "/Nutanix_afs01_ctr/el6-release-euphrates-5.1.0.1-stable-
419aa3a83df5548924198f85398deb20e8b615fe"
container_id: 12406
scsi_passthrough_enabled: true
size_bytes: 12884901888
adapter_type: kSCSI
device_index: 1
vmdisk_uuid: "\235\274x7\3643L\362\266\010[\311\031!\371\332"
disk_label: "scsi.1"
container_id: 12406
scsi_passthrough_enabled: true
size_bytes: 48318382080
vm_uuid: "\254<p`\333\236B\227\266\364VK\020\203\357\222"
logical_timestamp: 7
config {
name: "NTNX-afs01-1"
num_vcpus: 4
num_cores_per_vcpu: 1
memory_size_mb: 12288
hwclock_timezone: "UTC"
ha_priority: 100
agent_vm: false
hypervisor {
hypervisor_type: kKvm
state: kOff
hypervisor_specific_id: "\254<p`\333\236B\227\266\364VK\020\203\357\222"
allow_live_migrate: true
}
2017-05-30 10:09:24 INFO file_server.py:430 Waiting for cassandra disk attachment to FileServer Vm
NTNX-afs01-1
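• Note: At this point the standard FSVM disk layout is taking shape on NTNX-afs01-1: ide.0 is the cloud-init CD-ROM (376832 bytes), scsi.0 is the AFS boot image cloned from the el6-release-euphrates-5.1.0.1 source vmdisk (12884901888 bytes = 12 GiB), and scsi.1 is the home disk (48318382080 bytes = 45 GiB). The attach order is always AFS (boot) disk, then home disk, then Cassandra metadata disk, with the workflow waiting on each attachment before starting the next.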
2017-05-30 10:09:29 INFO file_server.py:373 Afs Disk attached, attaching home disk to FileServer Vm
NTNX-afs01-2
vm_uuid: "\335\277@\022\200\rH\341\251,\306\312,a%c"
logical_timestamp: 6
config {
name: "NTNX-afs01-2"
num_vcpus: 4
num_cores_per_vcpu: 1
memory_size_mb: 12288
disk_list {
disk_addr {
adapter_type: kIDE
device_index: 0
vmdisk_uuid: "\354m\267\271dRH\200\234m\343/\343\001\035\225"
disk_label: "ide.0"
}
source_vmdisk_addr {
nfs_path: "/Nutanix_afs01_ctr/NTNX-afs01-2.iso"
container_id: 12406
scsi_passthrough_enabled: true
size_bytes: 376832
is_cdrom: true
disk_list {
disk_addr {
adapter_type: kSCSI
device_index: 0
disk_label: "scsi.0"
source_vmdisk_addr {
nfs_path: "/Nutanix_afs01_ctr/el6-release-euphrates-5.1.0.1-stable-
419aa3a83df5548924198f85398deb20e8b615fe"
container_id: 12406
scsi_passthrough_enabled: true
size_bytes: 12884901888
hwclock_timezone: "UTC"
ha_priority: 100
agent_vm: false
hypervisor {
hypervisor_type: kKvm
}
state: kOff
hypervisor_specific_id: "\335\277@\022\200\rH\341\251,\306\312,a%c"
allow_live_migrate: true
adapter_type: kIDE
device_index: 0
vmdisk_uuid: "\354m\267\271dRH\200\234m\343/\343\001\035\225"
disk_label: "ide.0"
source_vmdisk_addr {
nfs_path: "/Nutanix_afs01_ctr/NTNX-afs01-2.iso"
container_id: 12406
scsi_passthrough_enabled: true
size_bytes: 376832
is_cdrom: true
adapter_type: kSCSI
device_index: 0
disk_label: "scsi.0"
source_vmdisk_addr {
nfs_path: "/Nutanix_afs01_ctr/el6-release-euphrates-5.1.0.1-stable-
419aa3a83df5548924198f85398deb20e8b615fe"
}
container_id: 12406
scsi_passthrough_enabled: true
size_bytes: 12884901888
vm_uuid: "\335\277@\022\200\rH\341\251,\306\312,a%c"
logical_timestamp: 6
config {
name: "NTNX-afs01-2"
num_vcpus: 4
num_cores_per_vcpu: 1
memory_size_mb: 12288
hwclock_timezone: "UTC"
ha_priority: 100
agent_vm: false
hypervisor {
hypervisor_type: kKvm
state: kOff
hypervisor_specific_id: "\335\277@\022\200\rH\341\251,\306\312,a%c"
allow_live_migrate: true
}
2017-05-30 10:09:29 INFO file_server.py:397 Waiting for home disk attachment to FileServer Vm NTNX-
afs01-2
logical_timestamp: 6
sequence_id: 18
request {
method_name: "VmChangePowerState"
arg {
embedded:
"\022\020\336sN\001\250\327O\334\231b\004\3416\010&3\032\020\254<p`\333\236B\227\266\364
VK\020\203\357\222 \001"
response {
error_code: 0
ret {
embedded: ""
create_time_usecs: 1496164164443324
start_time_usecs: 1496164164468599
complete_time_usecs: 1496164170607489
last_updated_time_usecs: 1496164170607489
entity_list {
entity_id: "\254<p`\333\236B\227\266\364VK\020\203\357\222"
entity_type: kVM
operation_type: "VmChangePowerState"
message: ""
percentage_complete: 100
status: kSucceeded
parent_task_uuid: "\336sN\001\250\327O\334\231b\004\3416\010&3"
component: "Uhura"
canceled: false
internal_opaque:
"(dp0\nS\'internal_vm_change_power_state_op_handle\'\np1\nS\'3\\x92\\xaa\\x95\\x16?L\\xa7\\x96
<4\\x87\\xfc\\xa8kJ\'\np2\nsS\'internal_current_state_info\'\np3\n(S\'CONSOLIDATE_RESULTS\'\np4\n
Ntp5\ns."
deleted: false
internal_task: false
cluster_uuid: "\000\005PsR\302I\247\000\000\000\000\000\000\272\272"
disable_auto_progress_update: true
weight: 1000
sequence_id: 18
request {
method_name: "VmChangePowerState"
arg {
embedded:
"\022\020\336sN\001\250\327O\334\231b\004\3416\010&3\032\020\254<p`\333\236B\227\266\364
VK\020\203\357\222 \001"
}
}
response {
error_code: 0
ret {
embedded: ""
create_time_usecs: 1496164164443324
start_time_usecs: 1496164164468599
complete_time_usecs: 1496164170607489
last_updated_time_usecs: 1496164170607489
entity_list {
entity_id: "\254<p`\333\236B\227\266\364VK\020\203\357\222"
entity_type: kVM
operation_type: "VmChangePowerState"
message: ""
percentage_complete: 100
status: kSucceeded
parent_task_uuid: "\336sN\001\250\327O\334\231b\004\3416\010&3"
component: "Uhura"
canceled: false
internal_opaque:
"(dp0\nS\'internal_vm_change_power_state_op_handle\'\np1\nS\'3\\x92\\xaa\\x95\\x16?L\\xa7\\x96
<4\\x87\\xfc\\xa8kJ\'\np2\nsS\'internal_current_state_info\'\np3\n(S\'CONSOLIDATE_RESULTS\'\np4\n
Ntp5\ns."
deleted: false
internal_task: false
cluster_uuid: "\000\005PsR\302I\247\000\000\000\000\000\000\272\272"
disable_auto_progress_update: true
weight: 1000
2017-05-30 10:09:30 INFO file_server.py:407 Home Disk attached, attaching cassandra disk to FileServer
Vm NTNX-afs01-2
vm_uuid: "\335\277@\022\200\rH\341\251,\306\312,a%c"
logical_timestamp: 7
config {
name: "NTNX-afs01-2"
num_vcpus: 4
num_cores_per_vcpu: 1
memory_size_mb: 12288
disk_list {
disk_addr {
adapter_type: kIDE
device_index: 0
vmdisk_uuid: "\354m\267\271dRH\200\234m\343/\343\001\035\225"
disk_label: "ide.0"
source_vmdisk_addr {
nfs_path: "/Nutanix_afs01_ctr/NTNX-afs01-2.iso"
container_id: 12406
scsi_passthrough_enabled: true
size_bytes: 376832
is_cdrom: true
}
disk_list {
disk_addr {
adapter_type: kSCSI
device_index: 0
disk_label: "scsi.0"
source_vmdisk_addr {
nfs_path: "/Nutanix_afs01_ctr/el6-release-euphrates-5.1.0.1-stable-
419aa3a83df5548924198f85398deb20e8b615fe"
container_id: 12406
scsi_passthrough_enabled: true
size_bytes: 12884901888
disk_list {
disk_addr {
adapter_type: kSCSI
device_index: 1
vmdisk_uuid: "\312\275\374\235.\347E\022\206J\344K,k\363\344"
disk_label: "scsi.1"
container_id: 12406
scsi_passthrough_enabled: true
size_bytes: 48318382080
hwclock_timezone: "UTC"
ha_priority: 100
agent_vm: false
}
hypervisor {
hypervisor_type: kKvm
state: kOff
hypervisor_specific_id: "\335\277@\022\200\rH\341\251,\306\312,a%c"
allow_live_migrate: true
adapter_type: kIDE
device_index: 0
vmdisk_uuid: "\354m\267\271dRH\200\234m\343/\343\001\035\225"
disk_label: "ide.0"
source_vmdisk_addr {
nfs_path: "/Nutanix_afs01_ctr/NTNX-afs01-2.iso"
container_id: 12406
scsi_passthrough_enabled: true
size_bytes: 376832
is_cdrom: true
adapter_type: kSCSI
device_index: 0
disk_label: "scsi.0"
}
source_vmdisk_addr {
nfs_path: "/Nutanix_afs01_ctr/el6-release-euphrates-5.1.0.1-stable-
419aa3a83df5548924198f85398deb20e8b615fe"
container_id: 12406
scsi_passthrough_enabled: true
size_bytes: 12884901888
adapter_type: kSCSI
device_index: 1
vmdisk_uuid: "\312\275\374\235.\347E\022\206J\344K,k\363\344"
disk_label: "scsi.1"
container_id: 12406
scsi_passthrough_enabled: true
size_bytes: 48318382080
vm_uuid: "\335\277@\022\200\rH\341\251,\306\312,a%c"
logical_timestamp: 7
config {
name: "NTNX-afs01-2"
num_vcpus: 4
num_cores_per_vcpu: 1
memory_size_mb: 12288
hwclock_timezone: "UTC"
ha_priority: 100
agent_vm: false
hypervisor {
hypervisor_type: kKvm
state: kOff
hypervisor_specific_id: "\335\277@\022\200\rH\341\251,\306\312,a%c"
allow_live_migrate: true
2017-05-30 10:09:30 INFO file_server.py:430 Waiting for cassandra disk attachment to FileServer Vm
NTNX-afs01-2
logical_timestamp: 6
uuid: "c9i\202\331A@\353\215\253\r\007C\221\241?"
sequence_id: 21
request {
method_name: "VmChangePowerState"
arg {
embedded:
"\022\020L>\350[\035\251H4\243\355\373\263\257\032\321\242\032\020\335\277@\022\200\rH\3
41\251,\306\312,a%c \001"
response {
error_code: 0
ret {
embedded: ""
create_time_usecs: 1496164171056577
start_time_usecs: 1496164171089171
complete_time_usecs: 1496164173615734
last_updated_time_usecs: 1496164173615734
entity_list {
entity_id: "\335\277@\022\200\rH\341\251,\306\312,a%c"
entity_type: kVM
operation_type: "VmChangePowerState"
message: ""
percentage_complete: 100
status: kSucceeded
parent_task_uuid: "L>\350[\035\251H4\243\355\373\263\257\032\321\242"
component: "Uhura"
canceled: false
internal_opaque:
"(dp0\nS\'internal_vm_change_power_state_op_handle\'\np1\nS\'\\x93\\xfc\\x1a7\\xc4UB/\\x91\\x8
4\\x7f\\xe9\\xb3Y\\xa0\\xcd\'\np2\nsS\'internal_current_state_info\'\np3\n(S\'CONSOLIDATE_RESULT
S\'\np4\nNtp5\ns."
deleted: false
internal_task: false
cluster_uuid: "\000\005PsR\302I\247\000\000\000\000\000\000\272\272"
disable_auto_progress_update: true
weight: 1000
uuid: "c9i\202\331A@\353\215\253\r\007C\221\241?"
sequence_id: 21
request {
method_name: "VmChangePowerState"
arg {
embedded:
"\022\020L>\350[\035\251H4\243\355\373\263\257\032\321\242\032\020\335\277@\022\200\rH\3
41\251,\306\312,a%c \001"
response {
error_code: 0
ret {
embedded: ""
create_time_usecs: 1496164171056577
start_time_usecs: 1496164171089171
complete_time_usecs: 1496164173615734
last_updated_time_usecs: 1496164173615734
entity_list {
entity_id: "\335\277@\022\200\rH\341\251,\306\312,a%c"
entity_type: kVM
operation_type: "VmChangePowerState"
message: ""
percentage_complete: 100
status: kSucceeded
parent_task_uuid: "L>\350[\035\251H4\243\355\373\263\257\032\321\242"
component: "Uhura"
canceled: false
internal_opaque:
"(dp0\nS\'internal_vm_change_power_state_op_handle\'\np1\nS\'\\x93\\xfc\\x1a7\\xc4UB/\\x91\\x8
4\\x7f\\xe9\\xb3Y\\xa0\\xcd\'\np2\nsS\'internal_current_state_info\'\np3\n(S\'CONSOLIDATE_RESULT
S\'\np4\nNtp5\ns."
deleted: false
internal_task: false
cluster_uuid: "\000\005PsR\302I\247\000\000\000\000\000\000\272\272"
disable_auto_progress_update: true
weight: 1000
2017-05-30 10:09:33 INFO file_server.py:373 Afs Disk attached, attaching home disk to FileServer Vm
NTNX-afs01-3
vm_uuid: "f8`\222\";H\325\223@\237%\2663\202)"
logical_timestamp: 6
config {
name: "NTNX-afs01-3"
num_vcpus: 4
num_cores_per_vcpu: 1
memory_size_mb: 12288
disk_list {
disk_addr {
adapter_type: kIDE
device_index: 0
vmdisk_uuid: "\002\253D3\326\237B\026\210ET.\265\253!\347"
disk_label: "ide.0"
source_vmdisk_addr {
nfs_path: "/Nutanix_afs01_ctr/NTNX-afs01-3.iso"
container_id: 12406
scsi_passthrough_enabled: true
size_bytes: 376832
is_cdrom: true
disk_list {
disk_addr {
adapter_type: kSCSI
device_index: 0
vmdisk_uuid: "2ZH\024\023\364B\310\240\003\307~i\267\r\023"
disk_label: "scsi.0"
source_vmdisk_addr {
nfs_path: "/Nutanix_afs01_ctr/el6-release-euphrates-5.1.0.1-stable-
419aa3a83df5548924198f85398deb20e8b615fe"
}
container_id: 12406
scsi_passthrough_enabled: true
size_bytes: 12884901888
hwclock_timezone: "UTC"
ha_priority: 100
agent_vm: false
hypervisor {
hypervisor_type: kKvm
state: kOff
hypervisor_specific_id: "f8`\222\";H\325\223@\237%\2663\202)"
allow_live_migrate: true
adapter_type: kIDE
device_index: 0
vmdisk_uuid: "\002\253D3\326\237B\026\210ET.\265\253!\347"
disk_label: "ide.0"
source_vmdisk_addr {
nfs_path: "/Nutanix_afs01_ctr/NTNX-afs01-3.iso"
container_id: 12406
scsi_passthrough_enabled: true
size_bytes: 376832
is_cdrom: true
2017-05-30 10:09:33 INFO uvm_utils.py:742 disk_addr {
adapter_type: kSCSI
device_index: 0
vmdisk_uuid: "2ZH\024\023\364B\310\240\003\307~i\267\r\023"
disk_label: "scsi.0"
source_vmdisk_addr {
nfs_path: "/Nutanix_afs01_ctr/el6-release-euphrates-5.1.0.1-stable-
419aa3a83df5548924198f85398deb20e8b615fe"
container_id: 12406
scsi_passthrough_enabled: true
size_bytes: 12884901888
vm_uuid: "f8`\222\";H\325\223@\237%\2663\202)"
logical_timestamp: 6
config {
name: "NTNX-afs01-3"
num_vcpus: 4
num_cores_per_vcpu: 1
memory_size_mb: 12288
hwclock_timezone: "UTC"
ha_priority: 100
agent_vm: false
}
hypervisor {
hypervisor_type: kKvm
state: kOff
hypervisor_specific_id: "f8`\222\";H\325\223@\237%\2663\202)"
allow_live_migrate: true
2017-05-30 10:09:33 INFO file_server.py:397 Waiting for home disk attachment to FileServer Vm NTNX-
afs01-3
2017-05-30 10:09:33 INFO file_server.py:407 Home Disk attached, attaching cassandra disk to FileServer
Vm NTNX-afs01-3
vm_uuid: "f8`\222\";H\325\223@\237%\2663\202)"
logical_timestamp: 7
config {
name: "NTNX-afs01-3"
num_vcpus: 4
num_cores_per_vcpu: 1
memory_size_mb: 12288
disk_list {
disk_addr {
adapter_type: kIDE
device_index: 0
vmdisk_uuid: "\002\253D3\326\237B\026\210ET.\265\253!\347"
disk_label: "ide.0"
source_vmdisk_addr {
nfs_path: "/Nutanix_afs01_ctr/NTNX-afs01-3.iso"
container_id: 12406
scsi_passthrough_enabled: true
size_bytes: 376832
is_cdrom: true
disk_list {
disk_addr {
adapter_type: kSCSI
device_index: 0
vmdisk_uuid: "2ZH\024\023\364B\310\240\003\307~i\267\r\023"
disk_label: "scsi.0"
source_vmdisk_addr {
nfs_path: "/Nutanix_afs01_ctr/el6-release-euphrates-5.1.0.1-stable-
419aa3a83df5548924198f85398deb20e8b615fe"
container_id: 12406
scsi_passthrough_enabled: true
size_bytes: 12884901888
disk_list {
disk_addr {
adapter_type: kSCSI
device_index: 1
vmdisk_uuid: "\213\205\023\2733ZJf\254\237\r\336},\3403"
disk_label: "scsi.1"
container_id: 12406
scsi_passthrough_enabled: true
size_bytes: 48318382080
hwclock_timezone: "UTC"
ha_priority: 100
agent_vm: false
hypervisor {
hypervisor_type: kKvm
state: kOff
hypervisor_specific_id: "f8`\222\";H\325\223@\237%\2663\202)"
allow_live_migrate: true
adapter_type: kIDE
device_index: 0
vmdisk_uuid: "\002\253D3\326\237B\026\210ET.\265\253!\347"
disk_label: "ide.0"
source_vmdisk_addr {
nfs_path: "/Nutanix_afs01_ctr/NTNX-afs01-3.iso"
}
container_id: 12406
scsi_passthrough_enabled: true
size_bytes: 376832
is_cdrom: true
adapter_type: kSCSI
device_index: 0
vmdisk_uuid: "2ZH\024\023\364B\310\240\003\307~i\267\r\023"
disk_label: "scsi.0"
source_vmdisk_addr {
nfs_path: "/Nutanix_afs01_ctr/el6-release-euphrates-5.1.0.1-stable-
419aa3a83df5548924198f85398deb20e8b615fe"
container_id: 12406
scsi_passthrough_enabled: true
size_bytes: 12884901888
adapter_type: kSCSI
device_index: 1
vmdisk_uuid: "\213\205\023\2733ZJf\254\237\r\336},\3403"
disk_label: "scsi.1"
container_id: 12406
scsi_passthrough_enabled: true
size_bytes: 48318382080
2017-05-30 10:09:33 INFO file_server.py:419 Attaching cassandra disk to FileServerVm NTNX-afs01-3
vm_uuid: "f8`\222\";H\325\223@\237%\2663\202)"
logical_timestamp: 7
config {
name: "NTNX-afs01-3"
num_vcpus: 4
num_cores_per_vcpu: 1
memory_size_mb: 12288
hwclock_timezone: "UTC"
ha_priority: 100
agent_vm: false
hypervisor {
hypervisor_type: kKvm
state: kOff
hypervisor_specific_id: "f8`\222\";H\325\223@\237%\2663\202)"
allow_live_migrate: true
2017-05-30 10:09:33 INFO file_server.py:430 Waiting for cassandra disk attachment to FileServer Vm
NTNX-afs01-3
logical_timestamp: 6
uuid: "=\367MBd\304L\311\246x\342`17\243."
sequence_id: 24
request {
method_name: "VmChangePowerState"
arg {
embedded:
"\022\020\2226\306\311P\200Em\270\2177\264\236(\326\307\032\020f8`\222\";H\325\223@\237%
\2663\202) \001"
response {
error_code: 0
ret {
embedded: ""
create_time_usecs: 1496164174221192
start_time_usecs: 1496164174243824
complete_time_usecs: 1496164175618069
last_updated_time_usecs: 1496164175618069
entity_list {
entity_id: "f8`\222\";H\325\223@\237%\2663\202)"
entity_type: kVM
operation_type: "VmChangePowerState"
message: ""
percentage_complete: 100
status: kSucceeded
parent_task_uuid: "\2226\306\311P\200Em\270\2177\264\236(\326\307"
component: "Uhura"
canceled: false
internal_opaque:
"(dp0\nS\'internal_vm_change_power_state_op_handle\'\np1\nS\'\\x8ba\\xf6\\xc0\\xa5\\xfbJ\\xdf\\x
bd
E\\x8b\\xda\\xd7\\x1a\\xd1\'\np2\nsS\'internal_current_state_info\'\np3\n(S\'CONSOLIDATE_RESULTS
\'\np4\nNtp5\ns."
deleted: false
internal_task: false
cluster_uuid: "\000\005PsR\302I\247\000\000\000\000\000\000\272\272"
disable_auto_progress_update: true
weight: 1000
uuid: "=\367MBd\304L\311\246x\342`17\243."
sequence_id: 24
request {
method_name: "VmChangePowerState"
arg {
embedded:
"\022\020\2226\306\311P\200Em\270\2177\264\236(\326\307\032\020f8`\222\";H\325\223@\237%
\2663\202) \001"
}
}
response {
error_code: 0
ret {
embedded: ""
create_time_usecs: 1496164174221192
start_time_usecs: 1496164174243824
complete_time_usecs: 1496164175618069
last_updated_time_usecs: 1496164175618069
entity_list {
entity_id: "f8`\222\";H\325\223@\237%\2663\202)"
entity_type: kVM
operation_type: "VmChangePowerState"
message: ""
percentage_complete: 100
status: kSucceeded
parent_task_uuid: "\2226\306\311P\200Em\270\2177\264\236(\326\307"
component: "Uhura"
canceled: false
internal_opaque:
"(dp0\nS\'internal_vm_change_power_state_op_handle\'\np1\nS\'\\x8ba\\xf6\\xc0\\xa5\\xfbJ\\xdf\\x
bd
E\\x8b\\xda\\xd7\\x1a\\xd1\'\np2\nsS\'internal_current_state_info\'\np3\n(S\'CONSOLIDATE_RESULTS
\'\np4\nNtp5\ns."
deleted: false
internal_task: false
cluster_uuid: "\000\005PsR\302I\247\000\000\000\000\000\000\272\272"
disable_auto_progress_update: true
weight: 1000
vm_uuid: "\254<p`\333\236B\227\266\364VK\020\203\357\222"
logical_timestamp: 9
config {
name: "NTNX-afs01-1"
num_vcpus: 4
num_cores_per_vcpu: 1
memory_size_mb: 12288
hwclock_timezone: "UTC"
ha_priority: 100
agent_vm: false
hypervisor {
hypervisor_type: kKvm
host_uuid: "\266\253\330\330\326\300Jo\214^,\352\332HA\307"
state: kOn
hypervisor_specific_id: "\254<p`\333\236B\227\266\364VK\020\203\357\222"
allow_live_migrate: true
}
vm_info_list {
vm_uuid: "\335\277@\022\200\rH\341\251,\306\312,a%c"
logical_timestamp: 9
config {
name: "NTNX-afs01-2"
num_vcpus: 4
num_cores_per_vcpu: 1
memory_size_mb: 12288
hwclock_timezone: "UTC"
ha_priority: 100
agent_vm: false
hypervisor {
hypervisor_type: kKvm
host_uuid: "v$\262(\336;CW\252\322\021\261<%l\037"
state: kOn
hypervisor_specific_id: "\335\277@\022\200\rH\341\251,\306\312,a%c"
allow_live_migrate: true
vm_info_list {
vm_uuid: "f8`\222\";H\325\223@\237%\2663\202)"
logical_timestamp: 9
config {
name: "NTNX-afs01-3"
num_vcpus: 4
num_cores_per_vcpu: 1
memory_size_mb: 12288
hwclock_timezone: "UTC"
ha_priority: 100
agent_vm: false
hypervisor {
hypervisor_type: kKvm
host_uuid: "o\225\352\224V6J\327\203\250\315\014T\025\325\363"
state: kOn
hypervisor_specific_id: "f8`\222\";H\325\223@\237%\2663\202)"
allow_live_migrate: true
2017-05-30 10:14:01 INFO file_server.py:2560 pre cloud init check for NTNX-afs01-1
2017-05-30 10:14:01 INFO file_server.py:2564 Checking if able to ssh for FileServerVm:NTNX-afs01-1 ip
10.30.14.245
2017-05-30 10:14:01 INFO file_server.py:2560 pre cloud init check for NTNX-afs01-2
2017-05-30 10:14:01 INFO file_server.py:2560 pre cloud init check for NTNX-afs01-3
2017-05-30 10:14:02 INFO minerva_utils.py:492 Pinging 10.30.0.1 PING 10.30.0.1 (10.30.0.1) 56(84)
bytes of data.
2017-05-30 10:14:02 INFO minerva_utils.py:500 Pinged ip list [u'10.30.0.1', u'10.30.0.1'], Non pinged ip
list []
2017-05-30 10:14:02 INFO file_server.py:2583 Checking if ntp timed out for FileServerVm:NTNX-afs01-2
ip 10.30.14.246
2017-05-30 10:14:02 INFO file_server.py:2587 Checking if ntp sync 10.30.15.91 for FileServerVm:NTNX-
afs01-2 ip 10.30.14.246
2017-05-30 10:14:02 INFO minerva_utils.py:492 Pinging 10.30.0.1 PING 10.30.0.1 (10.30.0.1) 56(84)
bytes of data.
2017-05-30 10:14:02 INFO minerva_utils.py:492 Pinging 10.30.0.1 PING 10.30.0.1 (10.30.0.1) 56(84)
bytes of data.
2017-05-30 10:14:02 INFO minerva_utils.py:500 Pinged ip list [u'10.30.0.1', u'10.30.0.1'], Non pinged ip
list []
2017-05-30 10:14:02 INFO file_server.py:2583 Checking if ntp timed out for FileServerVm:NTNX-afs01-3
ip 10.30.14.247
2017-05-30 10:14:02 INFO file_server.py:2587 Checking if ntp sync 10.30.15.91 for FileServerVm:NTNX-
afs01-3 ip 10.30.14.247
2017-05-30 10:14:02 INFO minerva_utils.py:492 Pinging 10.30.0.1 PING 10.30.0.1 (10.30.0.1) 56(84)
bytes of data.
2017-05-30 10:14:02 INFO minerva_utils.py:492 Pinging 10.30.0.1 PING 10.30.0.1 (10.30.0.1) 56(84)
bytes of data.
2017-05-30 10:14:02 INFO file_server.py:2583 Checking if ntp timed out for FileServerVm:NTNX-afs01-1
ip 10.30.14.245
2017-05-30 10:14:02 INFO file_server.py:2587 Checking if ntp sync 10.30.15.91 for FileServerVm:NTNX-
afs01-1 ip 10.30.14.245
2017-05-30 10:14:03 INFO file_server.py:2597 ntp offset drifted 25238.626737 for FileServerVm:NTNX-
afs01-2 ip 10.30.14.246
2017-05-30 10:14:03 INFO file_server.py:2600 ntp offset not drifted 25238.626737 for
FileServerVm:NTNX-afs01-2 ip 10.30.14.246
2017-05-30 10:14:03 INFO file_server.py:2614 Checking if stargate vip pingable 10.30.15.240 for
FileServerVm:NTNX-afs01-2 ip 10.30.14.246
2017-05-30 10:14:03 INFO file_server.py:2597 ntp offset drifted 25238.222605 for FileServerVm:NTNX-
afs01-3 ip 10.30.14.247
2017-05-30 10:14:03 INFO file_server.py:2600 ntp offset not drifted 25238.222605 for
FileServerVm:NTNX-afs01-3 ip 10.30.14.247
2017-05-30 10:14:03 INFO file_server.py:2614 Checking if stargate vip pingable 10.30.15.240 for
FileServerVm:NTNX-afs01-3 ip 10.30.14.247
2017-05-30 10:14:03 INFO file_server.py:2597 ntp offset drifted 25238.286112 for FileServerVm:NTNX-
afs01-1 ip 10.30.14.245
2017-05-30 10:14:03 INFO file_server.py:2600 ntp offset not drifted 25238.286112 for
FileServerVm:NTNX-afs01-1 ip 10.30.14.245
2017-05-30 10:14:03 INFO file_server.py:2614 Checking if stargate vip pingable 10.30.15.240 for
FileServerVm:NTNX-afs01-1 ip 10.30.14.245
2017-05-30 10:14:03 INFO minerva_utils.py:500 Pinged ip list [u'10.30.15.240'], Non pinged ip list []
2017-05-30 10:14:03 INFO minerva_utils.py:500 Pinged ip list [u'10.30.15.240'], Non pinged ip list []
2017-05-30 10:14:03 INFO minerva_utils.py:500 Pinged ip list [u'10.30.15.240'], Non pinged ip list []
2017-05-30 10:14:03 INFO file_server.py:2656 Nvm info dict {u'NTNX-afs01-1': [], u'NTNX-afs01-3': [],
u'NTNX-afs01-2': []}
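• Note: The pre-cloud-init health pass (file_server.py:2560-2656) runs the same checks against every FSVM: SSH reachability on its internal IP, a ping of the default gateway (10.30.0.1), an NTP sanity check against the configured server (10.30.15.91), and a ping of the Stargate VIP (10.30.15.240) the FSVMs will use for storage. The paired "ntp offset drifted" / "ntp offset not drifted" lines report the same measured offset; the :2600 line appears to be the check's verdict. An equivalent manual NTP query (query only, no clock change) from an FSVM, illustratively:

  nutanix@fsvm$ ntpdate -q 10.30.15.91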
2017-05-30 10:14:15 INFO file_server.py:3733 Bringing up File server services for afs01
2017-05-30 10:14:18 WARNING minerva_utils.py:1040 Cannot get cluster status for 10.30.14.244,
retrying
2017-05-30 10:14:24 WARNING minerva_utils.py:1040 Cannot get cluster status for 10.30.14.244,
retrying
2017-05-30 10:14:27 WARNING minerva_utils.py:1040 Cannot get cluster status for 10.30.14.244,
retrying
2017-05-30 10:14:30 WARNING minerva_utils.py:1040 Cannot get cluster status for 10.30.14.244,
retrying
2017-05-30 10:14:33 WARNING minerva_utils.py:1040 Cannot get cluster status for 10.30.14.244,
retrying
2017-05-30 10:14:36 WARNING minerva_utils.py:1040 Cannot get cluster status for 10.30.14.244,
retrying
2017-05-30 10:14:39 WARNING minerva_utils.py:1040 Cannot get cluster status for 10.30.14.244,
retrying
2017-05-30 10:14:42 WARNING minerva_utils.py:1040 Cannot get cluster status for 10.30.14.244,
retrying
2017-05-30 10:14:45 WARNING minerva_utils.py:1040 Cannot get cluster status for 10.30.14.244,
retrying
2017-05-30 10:14:48 WARNING minerva_utils.py:1040 Cannot get cluster status for 10.30.14.244,
retrying
2017-05-30 10:14:51 WARNING minerva_utils.py:1040 Cannot get cluster status for 10.30.14.244,
retrying
2017-05-30 10:14:57 WARNING minerva_utils.py:1040 Cannot get cluster status for 10.30.14.244,
retrying
2017-05-30 10:15:00 WARNING minerva_utils.py:1040 Cannot get cluster status for 10.30.14.244,
retrying
2017-05-30 10:15:03 WARNING minerva_utils.py:1040 Cannot get cluster status for 10.30.14.244,
retrying
2017-05-30 10:15:06 WARNING minerva_utils.py:1040 Cannot get cluster status for 10.30.14.244,
retrying
2017-05-30 10:15:09 WARNING minerva_utils.py:1040 Cannot get cluster status for 10.30.14.244,
retrying
2017-05-30 10:15:12 WARNING minerva_utils.py:1040 Cannot get cluster status for 10.30.14.244,
retrying
2017-05-30 10:15:15 WARNING minerva_utils.py:1040 Cannot get cluster status for 10.30.14.244,
retrying
2017-05-30 10:15:18 WARNING minerva_utils.py:1040 Cannot get cluster status for 10.30.14.244,
retrying
2017-05-30 10:15:21 WARNING minerva_utils.py:1040 Cannot get cluster status for 10.30.14.244,
retrying
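• Note: Once "Bringing up File server services" starts the internal cluster on the FSVMs, Minerva polls the FSVM virtual IP (10.30.14.244) for cluster status every few seconds. A run of "Cannot get cluster status ... retrying" WARNINGs like this — roughly a minute here, 10:14:18 through 10:15:21 — is expected while FSVM services come up; it is only a concern if the retries never stop. Because the CVM keys were injected via cloud-init, the same thing can be checked by hand (illustrative):

  nutanix@cvm$ ssh nutanix@10.30.14.244 'cluster status'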
2017-05-30 10:15:56 INFO file_server.py:3972 Sending add File server message to FSVM IP:
10.30.14.244
nvm_list {
uuid: "ac3c7060-db9e-4297-b6f4-564b1083ef92"
name: "NTNX-afs01-1"
num_vcpus: 4
memory_mb: 12288
local_cvm_ipv4_address: ""
external_ip_list {
ip_address: "10.30.15.241"
virtual_network_uuid: "243cb3ae-8ba5-4fb5-ba8e-5fed88ca108b"
internal_ip_address: "10.30.14.245"
nvm_list {
uuid: "ddbf4012-800d-48e1-a92c-c6ca2c612563"
name: "NTNX-afs01-2"
num_vcpus: 4
memory_mb: 12288
local_cvm_ipv4_address: ""
external_ip_list {
ip_address: "10.30.15.242"
virtual_network_uuid: "243cb3ae-8ba5-4fb5-ba8e-5fed88ca108b"
internal_ip_address: "10.30.14.246"
nvm_list {
uuid: "66386092-223b-48d5-9340-9f25b6338229"
name: "NTNX-afs01-3"
num_vcpus: 4
memory_mb: 12288
local_cvm_ipv4_address: ""
external_ip_list {
ip_address: "10.30.15.243"
virtual_network_uuid: "243cb3ae-8ba5-4fb5-ba8e-5fed88ca108b"
internal_ip_address: "10.30.14.247"
internal_network {
ipv4_address_list: "10.30.14.245"
ipv4_address_list: "10.30.14.246"
ipv4_address_list: "10.30.14.247"
virtual_ipv4_address: "10.30.14.244"
netmask_ipv4_address: "255.255.240.0"
gateway_ipv4_address: "10.30.0.1"
virtual_network_uuid: "243cb3ae-8ba5-4fb5-ba8e-5fed88ca108b"
external_network_list {
ipv4_address_list: "10.30.15.241"
ipv4_address_list: "10.30.15.242"
ipv4_address_list: "10.30.15.243"
netmask_ipv4_address: "255.255.240.0"
gateway_ipv4_address: "10.30.0.1"
virtual_network_uuid: "243cb3ae-8ba5-4fb5-ba8e-5fed88ca108b"
dns_ipv4_address: "10.30.15.91"
ntp_ipv4_address: "10.30.15.91"
container_uuid: "cdc48f84-afcf-4b8d-b53c-7026d62512a0"
join_domain {
realm_name: "learn.nutanix.local"
username: "administrator"
password: "********"
set_spn_dns_only: false
cvm_ipv4_address_list: "10.30.15.47"
cvm_ipv4_address_list: "10.30.15.48"
cvm_ipv4_address_list: "10.30.15.49"
stargate_vip: "10.30.15.240"
size_bytes: 1099511627776
number_of_schedulable_cvms: 3
is_new_container: true
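• Note: This add-file-server payload is the single best summary of the deployed topology when troubleshooting connectivity: internal (storage-side) network 10.30.14.245-247 with virtual IP 10.30.14.244, external (client-facing) network 10.30.15.241-243, Stargate VIP 10.30.15.240, DNS and NTP at 10.30.15.91, a 1 TiB (1099511627776-byte) file server on a new container, and the AD realm learn.nutanix.local to be joined. Here both networks share one virtual_network_uuid, netmask, and gateway; in production they are often separate VLANs. An illustrative external-IP reachability sweep:

  nutanix@cvm$ for ip in 10.30.15.241 10.30.15.242 10.30.15.243; do ping -c 1 -W 1 $ip; done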
2017-05-30 10:15:56 INFO file_server.py:3978 Add File server (CVM-to-FSVM) IP: 10.30.14.244
/usr/local/nutanix/minerva/lib/py/requests-2.12.0-
py2.6.egg/requests/packages/urllib3/util/ssl_.py:334: SNIMissingWarning: An HTTPS request has been
made, but the SNI (Subject Name Indication) extension to TLS is not available on this platform. This may
cause the server to present an incorrect TLS certificate, which can cause validation failures. You can
upgrade to a newer version of Python to solve this. For more information, see
https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
/usr/local/nutanix/minerva/lib/py/requests-2.12.0-
py2.6.egg/requests/packages/urllib3/util/ssl_.py:132: InsecurePlatformWarning: A true SSLContext
object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain
SSL connections to fail. You can upgrade to a newer version of Python to solve this. For more
information, see https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
/usr/local/nutanix/minerva/lib/py/requests-2.12.0-
py2.6.egg/requests/packages/urllib3/connectionpool.py:843: InsecureRequestWarning: Unverified
HTTPS request is being made. Adding certificate verification is strongly advised. See:
https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
/usr/local/nutanix/minerva/lib/py/requests-2.12.0-
py2.6.egg/requests/packages/urllib3/connectionpool.py:843: InsecureRequestWarning: Unverified
HTTPS request is being made. Adding certificate verification is strongly advised. See:
https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
/usr/local/nutanix/minerva/lib/py/requests-2.12.0-
py2.6.egg/requests/packages/urllib3/connectionpool.py:843: InsecureRequestWarning: Unverified
HTTPS request is being made. Adding certificate verification is strongly advised. See:
https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
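• Note: The urllib3 SNIMissingWarning, InsecurePlatformWarning, and InsecureRequestWarning lines come from the bundled requests-2.12.0 library running on Python 2.6 whenever Minerva makes HTTPS calls to the file server's Prism gateway; they are noise rather than deployment errors, and they recur for the remainder of the log.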
uuid: "9e2de0ba-3d4c-411f-8f85-621a2f5f6542"
name: "afs01"
nvm_list {
uuid: "ac3c7060-db9e-4297-b6f4-564b1083ef92"
name: "NTNX-afs01-1"
local_cvm_ipv4_address: ""
internal_ip_address: "10.30.14.245"
nvm_list {
uuid: "ddbf4012-800d-48e1-a92c-c6ca2c612563"
name: "NTNX-afs01-2"
local_cvm_ipv4_address: ""
internal_ip_address: "10.30.14.246"
nvm_list {
uuid: "66386092-223b-48d5-9340-9f25b6338229"
name: "NTNX-afs01-3"
local_cvm_ipv4_address: ""
internal_ip_address: "10.30.14.247"
internal_network {
virtual_ipv4_address: "10.30.14.244"
netmask_ipv4_address: "255.255.240.0"
gateway_ipv4_address: "10.30.0.1"
virtual_network_uuid: "243cb3ae-8ba5-4fb5-ba8e-5fed88ca108b"
external_network_list {
netmask_ipv4_address: "255.255.240.0"
gateway_ipv4_address: "10.30.0.1"
virtual_network_uuid: "243cb3ae-8ba5-4fb5-ba8e-5fed88ca108b"
dns_ipv4_address: "10.30.15.91"
ntp_ipv4_address: "10.30.15.91"
container_uuid: "cdc48f84-afcf-4b8d-b53c-7026d62512a0"
cvm_ipv4_address_list: "10.30.15.47"
cvm_ipv4_address_list: "10.30.15.48"
cvm_ipv4_address_list: "10.30.15.49"
stargate_vip: "10.30.15.240"
afs_version: "2.1.0-419aa3a83df5548924198f85398deb20e8b615fe\n"
nvm_uuid_vm_uuid_list {
nvm_uuid: "3a33d6ff-98f8-454f-b16e-9433d9ba4980"
vm_uuid: "ac3c7060-db9e-4297-b6f4-564b1083ef92"
nvm_uuid_vm_uuid_list {
nvm_uuid: "aa391c08-39fd-4f67-bcd1-7338d2b4fecb"
vm_uuid: "ddbf4012-800d-48e1-a92c-c6ca2c612563"
nvm_uuid_vm_uuid_list {
nvm_uuid: "b94bbb55-b07f-4f3b-8fff-36e84e694a57"
vm_uuid: "66386092-223b-48d5-9340-9f25b6338229"
vm_uuid: "\254<p`\333\236B\227\266\364VK\020\203\357\222"
logical_timestamp: 9
config {
name: "NTNX-afs01-1"
num_vcpus: 4
num_cores_per_vcpu: 1
memory_size_mb: 12288
hwclock_timezone: "UTC"
ha_priority: 100
agent_vm: false
hypervisor {
hypervisor_type: kKvm
host_uuid: "\266\253\330\330\326\300Jo\214^,\352\332HA\307"
state: kOn
hypervisor_specific_id: "\254<p`\333\236B\227\266\364VK\020\203\357\222"
allow_live_migrate: true
vm_info_list {
vm_uuid: "\335\277@\022\200\rH\341\251,\306\312,a%c"
logical_timestamp: 9
config {
name: "NTNX-afs01-2"
num_vcpus: 4
num_cores_per_vcpu: 1
memory_size_mb: 12288
hwclock_timezone: "UTC"
ha_priority: 100
agent_vm: false
}
hypervisor {
hypervisor_type: kKvm
host_uuid: "v$\262(\336;CW\252\322\021\261<%l\037"
state: kOn
hypervisor_specific_id: "\335\277@\022\200\rH\341\251,\306\312,a%c"
allow_live_migrate: true
vm_info_list {
vm_uuid: "f8`\222\";H\325\223@\237%\2663\202)"
logical_timestamp: 9
config {
name: "NTNX-afs01-3"
num_vcpus: 4
num_cores_per_vcpu: 1
memory_size_mb: 12288
hwclock_timezone: "UTC"
ha_priority: 100
agent_vm: false
hypervisor {
hypervisor_type: kKvm
host_uuid: "o\225\352\224V6J\327\203\250\315\014T\025\325\363"
state: kOn
hypervisor_specific_id: "f8`\222\";H\325\223@\237%\2663\202)"
allow_live_migrate: true
}
2017-05-30 10:16:03 INFO disaster_recovery.py:614 Updated the consistency group
protection_domain_name: "NTNX-afs01"
consistency_group_name: "NTNX-afs01-NVMS"
add_vm {
vm_id: "ac3c7060-db9e-4297-b6f4-564b1083ef92"
vm_name: "NTNX-afs01-1"
power_on: false
add_vm {
vm_id: "ddbf4012-800d-48e1-a92c-c6ca2c612563"
vm_name: "NTNX-afs01-2"
power_on: false
add_vm {
vm_id: "66386092-223b-48d5-9340-9f25b6338229"
vm_name: "NTNX-afs01-3"
power_on: false
2017-05-30 10:16:03 INFO file_server.py:4250 Adding public keys for fsvm 10.30.14.244
/usr/local/nutanix/minerva/lib/py/requests-2.12.0-
py2.6.egg/requests/packages/urllib3/util/ssl_.py:132: InsecurePlatformWarning: A true SSLContext
object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain
SSL connections to fail. You can upgrade to a newer version of Python to solve this. For more
information, see https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
/usr/local/nutanix/minerva/lib/py/requests-2.12.0-
py2.6.egg/requests/packages/urllib3/connectionpool.py:843: InsecureRequestWarning: Unverified
HTTPS request is being made. Adding certificate verification is strongly advised. See:
https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
virtual-ip: 10.30.14.244
response: {"name":"7624b228-de3b-4357-aad2-11b13c256c1f","key":"ssh-rsa
AAAAB3NzaC1yc2EAAAADAQABAAABAQCgvTZfgYRWtpb3+cE/qJwW1K7oGVbQKEgaQrdujnE07bKHecslQ
gCD2VnFJaEzeRZHsX5GC9LDOVrvDKDV6DltgeKMrv1k4yO4xH9nttSZMMPfjgGddHy6pW7Dc/ibU6wl4G/9
VHtjm8+vVbBo3wAEguU/lAR5lrbVkyZ0OT+HxYiVAagCPljWGYFrO7U7/AMjSWC1zqKFgC1q2ye7wFejawiB
86nxuHT6uMbiTxrbzMFL8X3VBZKe5PRrBMiDAjvRmm69ZD2vEUnl2B+YyGDOyNOwvdDdzfjsFCXn5oRRU6
GNybmDXeu9XCy7zna2GwcQwMcn2HHhS71paxPuaY6N nutanix@NTNX-16SM13150152-C-CVM"}
url: https://10.30.14.244:9440/PrismGateway/services/rest/v1/cluster/public_keys
2017-05-30 10:16:03 INFO file_server.py:4250 Adding public keys for fsvm 10.30.14.244
/usr/local/nutanix/minerva/lib/py/requests-2.12.0-py2.6.egg/requests/packages/urllib3/connectionpool.py:843: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
virtual-ip: 10.30.14.244
response: {"name":"b6abd8d8-d6c0-4a6f-8c5e-2ceada4841c7","key":"ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDeSSYYtSwStBLEo6MYx0HLn6eatXKnJNdkBQJoOOiKD3b4dzsLLyT0jaSjLcPHhE1m1KBoWtGyhRT8xzm76YRksU+6fvB3h/mFHnAj0hme7n1TYPr224z34yUZLuwYlp/wQhArxZYODo/1wZrGA1crfIvymYMEw52JBlFiJu6QMC6MfF9RHfxFeu1b9vj8aqrZlfhWqPkkAIGErAEpYsbPlH0t7PMoBTRSkYmM73UCs0xIAGzIn+MK0hCcYYK6oGRLPtJGe7S6beyZtxp/xTHoHVJR6SC1ub5nGnR723O7/AwbCqf5dWqjoWoXCxah7Jc3FOPyvk5ROLfRrOfD14at nutanix@NTNX-16SM13150152-B-CVM"}
url: https://10.30.14.244:9440/PrismGateway/services/rest/v1/cluster/public_keys
2017-05-30 10:16:03 INFO file_server.py:4250 Adding public keys for fsvm 10.30.14.244
/usr/local/nutanix/minerva/lib/py/requests-2.12.0-py2.6.egg/requests/packages/urllib3/connectionpool.py:843: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
virtual-ip: 10.30.14.244
response: {"name":"6f95ea94-5636-4ad7-83a8-cd0c5415d5f3","key":"ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCiIkYVSgNUKAGslHBDu1QswH66JA/X5tPThr4k96SQy9LrKnNUScjaUOIg5H0iglpEqEnOxC8CtEX5XUybT82GU2l9PUxa3uwd2cMWKqcYeg1w3WIs+x9vO1G6QOTPVGLVM077uyVlVs2CVO7uuDC3KTEQaszfi7NHMIQKig/9w3KCMLD44c/zj4FqIOuKezyhMjCIjITOASj6/yRLlF8wS1EY08OmrXUyFugv1ORVndOmTaQOMsYPd2XGm/jLIPCfqVtIGt7Trs8XT+kh2d8uSvtOopKnJ4Ej+s2SL0c1xN6Wo6A/LigTYCEaHVpeFyMZddmrdCojimTArWO/f3VD nutanix@NTNX-16SM13150152-A-CVM"}
url: https://10.30.14.244:9440/PrismGateway/services/rest/v1/cluster/public_keys
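
Each CVM public key above is pushed through the same Prism v1 REST endpoint, and the InsecureRequestWarning lines simply reflect the FSVM's self-signed certificate. To confirm all three keys landed, the endpoint can be read back; the sketch below assumes it also answers GET (only the push side appears in this log) and uses placeholder credentials:

#!/usr/bin/env python
# Minimal sketch: read back the public keys just added on the file server VIP.
# Assumptions: the endpoint answers GET with JSON carrying the "name"/"key"
# fields seen in the responses above; the credentials are placeholders.
import requests

requests.packages.urllib3.disable_warnings()  # same self-signed-cert situation as in the log

VIP = "10.30.14.244"  # file server virtual IP from the log
url = "https://%s:9440/PrismGateway/services/rest/v1/cluster/public_keys" % VIP

resp = requests.get(url, auth=("admin", "password"), verify=False, timeout=30)
resp.raise_for_status()
body = resp.json()
for entry in (body if isinstance(body, list) else [body]):
    print("%s %s..." % (entry.get("name"), entry.get("key", "")[:40]))

Expect one entry per CVM (the -A, -B, and -C keys above).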
2017-05-30 10:16:03 INFO nvm_rpc_client.py:122 Sending rpc with arg file_server_info_needed: true
svm_uuids_needed: true
all_volume_group_set_needed: true
nvm_uuid_to_vm_uuid_needed: true
task_header {
file_server_uuid: "9e2de0ba-3d4c-411f-8f85-621a2f5f6542"
virtual_ip: "10.30.14.244"
file_server_name: "afs01"
2017-05-30 10:16:03 INFO disaster_recovery.py:1220 PD: NTNX-afs01 schedule status: False, suspend status: False
2017-05-30 10:16:03 INFO disaster_recovery.py:1224 PD: NTNX-afs01 schedule suspend intent updated to False
realm_name: "learn.nutanix.local"
organizational_unit: ""
username: "administrator"
password: "********"
set_spn_dns_only: false
validate_credential_only: false
nvm_only: true
preferred_domain_controller: ""
2017-05-30 10:16:03 INFO file_server.py:4139 Joining domain learn.nutanix.local for file-server 9e2de0ba-3d4c-411f-8f85-621a2f5f6542
/usr/local/nutanix/minerva/lib/py/requests-2.12.0-py2.6.egg/requests/packages/urllib3/util/ssl_.py:132: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
/usr/local/nutanix/minerva/lib/py/requests-2.12.0-py2.6.egg/requests/packages/urllib3/connectionpool.py:843: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
virtual-ip: 10.30.14.244
response: {"taskUuid":"0ab11425-cb3e-4141-ac34-33bb041767b3"}
url: https://10.30.14.244:9440/PrismGateway/services/rest/v1/vfilers/9e2de0ba-3d4c-411f-8f85-621a2f5f6542/joinDomain
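
The joinDomain call is asynchronous: Prism immediately returns the taskUuid in the response above, and the actual AD join runs in the background. A stalled join is easiest to spot by polling that task. The sketch below assumes the v2.0 tasks endpoint and its progress_status field; only the v1 joinDomain POST appears in this log:

#!/usr/bin/env python
# Minimal sketch: poll the joinDomain task returned above until it finishes.
# Assumptions: v2.0 tasks endpoint, "progress_status" field, and the
# "Queued"/"Running" in-flight states; the credentials are placeholders.
import time
import requests

requests.packages.urllib3.disable_warnings()

VIP = "10.30.14.244"
TASK = "0ab11425-cb3e-4141-ac34-33bb041767b3"  # taskUuid from the response above
url = "https://%s:9440/PrismGateway/services/rest/v2.0/tasks/%s" % (VIP, TASK)

while True:
    task = requests.get(url, auth=("admin", "password"), verify=False, timeout=30).json()
    status = task.get("progress_status")
    print("joinDomain task: %s" % status)
    if status not in ("Queued", "Running"):
        break  # Succeeded, Failed, or anything else terminal
    time.sleep(5)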
2017-05-30 10:16:08 INFO file_server.py:4837 Creating home share for fileserver 9e2de0ba-3d4c-411f-8f85-621a2f5f6542
2017-05-30 10:16:08 INFO file_server.py:4839 Args: ergon client:<ergon.client.client.ErgonClient object at 0x77ac...>, task_timeout:1496168163
name: "home"
file_server_uuid: "9e2de0ba-3d4c-411f-8f85-621a2f5f6542"
is_win_prev_version_enabled: false
container_uuid: "cdc48f84-afcf-4b8d-b53c-7026d62512a0"
share_type: kHomes
quota_policy_list {
principal_type: kUser
principal_name: "_default"
quota_size_bytes: 0
enforcement_type: kSoft
is_abe_enabled: false
2017-05-30 10:16:08 INFO nvm_rpc_client.py:122 Sending rpc with arg file_server_info_needed: true
svm_uuids_needed: true
all_volume_group_set_needed: true
nvm_uuid_to_vm_uuid_needed: true
2017-05-30 10:16:08 INFO nvm_rpc_client.py:122 Sending rpc with arg share_name: "home"
task_header {
file_server_uuid: "9e2de0ba-3d4c-411f-8f85-621a2f5f6542"
share_name_list: "home"
file_server_name: "afs01"
2017-05-30 10:16:08 INFO nvm_rpc_client.py:122 Sending rpc with arg svm_uuids_needed: true
container_uuid_for_vg_create: "cdc48f84-afcf-4b8d-b53c-7026d62512a0"
vg_info_by_container_to_create: true
nvm_uuid_to_vm_uuid_needed: true
share_type: kHomes
2017-05-30 10:16:08 INFO disaster_recovery.py:1220 PD: NTNX-afs01 schedule status: False, suspend status: False
2017-05-30 10:16:08 INFO disaster_recovery.py:1224 PD: NTNX-afs01 schedule suspend intent updated to False
2017-05-30 10:16:08 INFO minerva_task_util.py:1505 ShareAddTask: 7 : Updated wal with status 300
2017-05-30 10:16:08 INFO storage_manager.py:529 Get a client to the local acropolis server
2017-05-30 10:16:08 INFO storage_manager.py:540 Checking if the volume groups are already present
2017-05-30 10:16:08 INFO storage_manager.py:282 creating vdisks and attaching to volume group
2017-05-30 10:16:08 INFO storage_manager.py:411 Retrieving the configured volume group 548d055f-5b54-481d-b22d-253172493e85
2017-05-30 10:16:09 INFO storage_manager.py:307 Volume Disk Created and attached to VG.
2017-05-30 10:16:09 INFO storage_manager.py:411 Retrieving the configured volume group 548d055f-5b54-481d-b22d-253172493e85
2017-05-30 10:16:09 INFO storage_manager.py:184 Added VDisks for SSD to the volume group
2017-05-30 10:16:09 INFO storage_manager.py:595 Not creating any Vdisk to pin to SSD
2017-05-30 10:16:09 INFO storage_manager.py:623 Max possible target for vg_uuid 548d055f-5b54-481d-b22d-253172493e85, number 6
2017-05-30 10:16:09 INFO storage_manager.py:641 Retrieving the configured volume group 548d055f-5b54-481d-b22d-253172493e85
2017-05-30 10:16:09 INFO storage_manager.py:529 Get a client to the local acropolis server
2017-05-30 10:16:09 INFO storage_manager.py:540 Checking if the volume groups are already present
2017-05-30 10:16:09 INFO storage_manager.py:282 creating vdisks and attaching to volume group
2017-05-30 10:16:09 INFO storage_manager.py:411 Retrieving the configured volume group b822259b-a259-47df-a6f1-093f02c6ac8a
2017-05-30 10:16:09 INFO storage_manager.py:307 Volume Disk Created and attached to VG.
2017-05-30 10:16:09 INFO storage_manager.py:411 Retrieving the configured volume group b822259b-a259-47df-a6f1-093f02c6ac8a
2017-05-30 10:16:09 INFO storage_manager.py:184 Added VDisks for SSD to the volume group
2017-05-30 10:16:09 INFO storage_manager.py:595 Not creating any Vdisk to pin to SSD
2017-05-30 10:16:09 INFO storage_manager.py:623 Max possible target for vg_uuid b822259b-a259-47df-a6f1-093f02c6ac8a, number 6
2017-05-30 10:16:09 INFO storage_manager.py:641 Retrieving the configured volume group b822259b-a259-47df-a6f1-093f02c6ac8a
2017-05-30 10:16:09 INFO storage_manager.py:529 Get a client to the local acropolis server
2017-05-30 10:16:09 INFO storage_manager.py:540 Checking if the volume groups are already present
2017-05-30 10:16:10 INFO storage_manager.py:282 creating vdisks and attaching to volume group
2017-05-30 10:16:10 INFO storage_manager.py:411 Retrieving the configured volume group a42e4bcf-3a13-4178-8e21-19829e58700f
2017-05-30 10:16:10 INFO storage_manager.py:307 Volume Disk Created and attached to VG.
2017-05-30 10:16:10 INFO storage_manager.py:411 Retrieving the configured volume group a42e4bcf-3a13-4178-8e21-19829e58700f
2017-05-30 10:16:10 INFO storage_manager.py:184 Added VDisks for SSD to the volume group
2017-05-30 10:16:10 INFO storage_manager.py:595 Not creating any Vdisk to pin to SSD
2017-05-30 10:16:10 INFO storage_manager.py:623 Max possible target for vg_uuid a42e4bcf-3a13-4178-8e21-19829e58700f, number 6
2017-05-30 10:16:10 INFO storage_manager.py:641 Retrieving the configured volume group a42e4bcf-3a13-4178-8e21-19829e58700f
2017-05-30 10:16:10 INFO storage_manager.py:529 Get a client to the local acropolis server
2017-05-30 10:16:10 INFO storage_manager.py:540 Checking if the volume groups are already present
2017-05-30 10:16:10 INFO storage_manager.py:282 creating vdisks and attaching to volume group
2017-05-30 10:16:10 INFO storage_manager.py:411 Retrieving the configured volume group 52be82b1-88ad-4a49-b00b-af4e1d4236e3
2017-05-30 10:16:10 INFO storage_manager.py:307 Volume Disk Created and attached to VG.
2017-05-30 10:16:10 INFO storage_manager.py:411 Retrieving the configured volume group 52be82b1-88ad-4a49-b00b-af4e1d4236e3
2017-05-30 10:16:10 INFO storage_manager.py:184 Added VDisks for SSD to the volume group
2017-05-30 10:16:10 INFO storage_manager.py:595 Not creating any Vdisk to pin to SSD
2017-05-30 10:16:10 INFO storage_manager.py:623 Max possible target for vg_uuid 52be82b1-88ad-4a49-b00b-af4e1d4236e3, number 6
2017-05-30 10:16:10 INFO storage_manager.py:641 Retrieving the configured volume group 52be82b1-88ad-4a49-b00b-af4e1d4236e3
2017-05-30 10:16:10 INFO storage_manager.py:529 Get a client to the local acropolis server
2017-05-30 10:16:10 INFO storage_manager.py:540 Checking if the volume groups are already present
2017-05-30 10:16:11 INFO storage_manager.py:282 creating vdisks and attaching to volume group
2017-05-30 10:16:11 INFO storage_manager.py:411 Retrieving the configured volume group 11bbfd7d-8882-4237-9b44-b444bfdb80a0
2017-05-30 10:16:11 INFO storage_manager.py:307 Volume Disk Created and attached to VG.
2017-05-30 10:16:11 INFO storage_manager.py:411 Retrieving the configured volume group 11bbfd7d-8882-4237-9b44-b444bfdb80a0
2017-05-30 10:16:11 INFO storage_manager.py:184 Added VDisks for SSD to the volume group
2017-05-30 10:16:11 INFO storage_manager.py:595 Not creating any Vdisk to pin to SSD
2017-05-30 10:16:11 INFO storage_manager.py:623 Max possible target for vg_uuid 11bbfd7d-8882-4237-9b44-b444bfdb80a0, number 6
2017-05-30 10:16:11 INFO storage_manager.py:641 Retrieving the configured volume group 11bbfd7d-8882-4237-9b44-b444bfdb80a0
2017-05-30 10:16:11 INFO storage_manager.py:529 Get a client to the local acropolis server
2017-05-30 10:16:11 INFO storage_manager.py:540 Checking if the volume groups are already present
2017-05-30 10:16:11 INFO storage_manager.py:282 creating vdisks and attaching to volume group
2017-05-30 10:16:11 INFO storage_manager.py:411 Retrieving the configured volume group dd70c208-12aa-4310-ac9d-673bdac07259
2017-05-30 10:16:11 INFO storage_manager.py:307 Volume Disk Created and attached to VG.
2017-05-30 10:16:11 INFO storage_manager.py:411 Retrieving the configured volume group dd70c208-12aa-4310-ac9d-673bdac07259
2017-05-30 10:16:11 INFO storage_manager.py:184 Added VDisks for SSD to the volume group
2017-05-30 10:16:11 INFO storage_manager.py:595 Not creating any Vdisk to pin to SSD
2017-05-30 10:16:11 INFO storage_manager.py:623 Max possible target for vg_uuid dd70c208-12aa-4310-ac9d-673bdac07259, number 6
2017-05-30 10:16:11 INFO storage_manager.py:641 Retrieving the configured volume group dd70c208-12aa-4310-ac9d-673bdac07259
2017-05-30 10:16:11 INFO storage_manager.py:529 Get a client to the local acropolis server
2017-05-30 10:16:11 INFO storage_manager.py:540 Checking if the volume groups are already present
2017-05-30 10:16:12 INFO storage_manager.py:282 creating vdisks and attaching to volume group
2017-05-30 10:16:12 INFO storage_manager.py:411 Retrieving the configured volume group c6681eb9-8a75-4cf9-a2e4-b378e2ced1b6
2017-05-30 10:16:12 INFO storage_manager.py:307 Volume Disk Created and attached to VG.
2017-05-30 10:16:12 INFO storage_manager.py:411 Retrieving the configured volume group c6681eb9-8a75-4cf9-a2e4-b378e2ced1b6
2017-05-30 10:16:12 INFO storage_manager.py:184 Added VDisks for SSD to the volume group
2017-05-30 10:16:12 INFO storage_manager.py:595 Not creating any Vdisk to pin to SSD
2017-05-30 10:16:12 INFO storage_manager.py:623 Max possible target for vg_uuid c6681eb9-8a75-4cf9-a2e4-b378e2ced1b6, number 6
2017-05-30 10:16:12 INFO storage_manager.py:641 Retrieving the configured volume group c6681eb9-8a75-4cf9-a2e4-b378e2ced1b6
2017-05-30 10:16:12 INFO storage_manager.py:529 Get a client to the local acropolis server
2017-05-30 10:16:12 INFO storage_manager.py:540 Checking if the volume groups are already present
2017-05-30 10:16:12 INFO storage_manager.py:282 creating vdisks and attaching to volume group
2017-05-30 10:16:12 INFO storage_manager.py:411 Retrieving the configured volume group 732ed7e0-f44f-400b-9872-7fda49571778
2017-05-30 10:16:13 INFO storage_manager.py:307 Volume Disk Created and attached to VG.
2017-05-30 10:16:13 INFO storage_manager.py:411 Retrieving the configured volume group 732ed7e0-f44f-400b-9872-7fda49571778
2017-05-30 10:16:13 INFO storage_manager.py:184 Added VDisks for SSD to the volume group
2017-05-30 10:16:13 INFO storage_manager.py:595 Not creating any Vdisk to pin to SSD
2017-05-30 10:16:13 INFO storage_manager.py:623 Max possible target for vg_uuid 732ed7e0-f44f-400b-9872-7fda49571778, number 6
2017-05-30 10:16:13 INFO storage_manager.py:641 Retrieving the configured volume group 732ed7e0-f44f-400b-9872-7fda49571778
2017-05-30 10:16:13 INFO storage_manager.py:529 Get a client to the local acropolis server
2017-05-30 10:16:13 INFO storage_manager.py:540 Checking if the volume groups are already present
2017-05-30 10:16:13 INFO storage_manager.py:282 creating vdisks and attaching to volume group
2017-05-30 10:16:13 INFO storage_manager.py:411 Retrieving the configured volume group 953300f2-155e-4328-a8f6-ea1e522a7ccd
2017-05-30 10:16:13 INFO storage_manager.py:307 Volume Disk Created and attached to VG.
2017-05-30 10:16:13 INFO storage_manager.py:411 Retrieving the configured volume group 953300f2-155e-4328-a8f6-ea1e522a7ccd
2017-05-30 10:16:13 INFO storage_manager.py:184 Added VDisks for SSD to the volume group
2017-05-30 10:16:13 INFO storage_manager.py:595 Not creating any Vdisk to pin to SSD
2017-05-30 10:16:13 INFO storage_manager.py:623 Max possible target for vg_uuid 953300f2-155e-4328-a8f6-ea1e522a7ccd, number 6
2017-05-30 10:16:13 INFO storage_manager.py:641 Retrieving the configured volume group 953300f2-155e-4328-a8f6-ea1e522a7ccd
2017-05-30 10:16:13 INFO storage_manager.py:529 Get a client to the local acropolis server
2017-05-30 10:16:13 INFO storage_manager.py:540 Checking if the volume groups are already present
2017-05-30 10:16:13 INFO storage_manager.py:282 creating vdisks and attaching to volume group
2017-05-30 10:16:13 INFO storage_manager.py:411 Retrieving the configured volume group 4c070990-4eae-497f-897e-4a63828a4138
2017-05-30 10:16:14 INFO storage_manager.py:307 Volume Disk Created and attached to VG.
2017-05-30 10:16:14 INFO storage_manager.py:411 Retrieving the configured volume group 4c070990-4eae-497f-897e-4a63828a4138
2017-05-30 10:16:14 INFO storage_manager.py:184 Added VDisks for SSD to the volume group
2017-05-30 10:16:14 INFO storage_manager.py:595 Not creating any Vdisk to pin to SSD
2017-05-30 10:16:14 INFO storage_manager.py:623 Max possible target for vg_uuid 4c070990-4eae-497f-897e-4a63828a4138, number 6
2017-05-30 10:16:14 INFO storage_manager.py:641 Retrieving the configured volume group 4c070990-4eae-497f-897e-4a63828a4138
2017-05-30 10:16:14 INFO storage_manager.py:529 Get a client to the local acropolis server
2017-05-30 10:16:14 INFO storage_manager.py:540 Checking if the volume groups are already present
2017-05-30 10:16:14 INFO storage_manager.py:282 creating vdisks and attaching to volume group
2017-05-30 10:16:14 INFO storage_manager.py:411 Retrieving the configured volume group a038f292-c981-4fe5-8d20-b44382f23feb
2017-05-30 10:16:14 INFO storage_manager.py:307 Volume Disk Created and attached to VG.
2017-05-30 10:16:14 INFO storage_manager.py:411 Retrieving the configured volume group a038f292-c981-4fe5-8d20-b44382f23feb
2017-05-30 10:16:14 INFO storage_manager.py:184 Added VDisks for SSD to the volume group
2017-05-30 10:16:14 INFO storage_manager.py:595 Not creating any Vdisk to pin to SSD
2017-05-30 10:16:14 INFO storage_manager.py:623 Max possible target for vg_uuid a038f292-c981-4fe5-8d20-b44382f23feb, number 6
2017-05-30 10:16:14 INFO storage_manager.py:641 Retrieving the configured volume group a038f292-c981-4fe5-8d20-b44382f23feb
2017-05-30 10:16:14 INFO storage_manager.py:529 Get a client to the local acropolis server
2017-05-30 10:16:14 INFO storage_manager.py:540 Checking if the volume groups are already present
2017-05-30 10:16:14 INFO storage_manager.py:282 creating vdisks and attaching to volume group
2017-05-30 10:16:14 INFO storage_manager.py:411 Retrieving the configured volume group b466e86d-0837-4422-8ad8-94263440842a
2017-05-30 10:16:15 INFO storage_manager.py:307 Volume Disk Created and attached to VG.
2017-05-30 10:16:15 INFO storage_manager.py:411 Retrieving the configured volume group b466e86d-0837-4422-8ad8-94263440842a
2017-05-30 10:16:15 INFO storage_manager.py:184 Added VDisks for SSD to the volume group
2017-05-30 10:16:15 INFO storage_manager.py:595 Not creating any Vdisk to pin to SSD
2017-05-30 10:16:15 INFO storage_manager.py:623 Max possible target for vg_uuid b466e86d-0837-4422-8ad8-94263440842a, number 6
2017-05-30 10:16:15 INFO storage_manager.py:641 Retrieving the configured volume group b466e86d-0837-4422-8ad8-94263440842a
2017-05-30 10:16:15 INFO storage_manager.py:529 Get a client to the local acropolis server
2017-05-30 10:16:15 INFO storage_manager.py:540 Checking if the volume groups are already present
2017-05-30 10:16:15 INFO storage_manager.py:282 creating vdisks and attaching to volume group
2017-05-30 10:16:15 INFO storage_manager.py:411 Retrieving the configured volume group 8f2e1f4e-6c6c-4f8a-8d3a-f04d51de545c
2017-05-30 10:16:15 INFO storage_manager.py:307 Volume Disk Created and attached to VG.
2017-05-30 10:16:15 INFO storage_manager.py:411 Retrieving the configured volume group 8f2e1f4e-6c6c-4f8a-8d3a-f04d51de545c
2017-05-30 10:16:15 INFO storage_manager.py:184 Added VDisks for SSD to the volume group
2017-05-30 10:16:15 INFO storage_manager.py:595 Not creating any Vdisk to pin to SSD
2017-05-30 10:16:15 INFO storage_manager.py:623 Max possible target for vg_uuid 8f2e1f4e-6c6c-4f8a-8d3a-f04d51de545c, number 6
2017-05-30 10:16:15 INFO storage_manager.py:641 Retrieving the configured volume group 8f2e1f4e-6c6c-4f8a-8d3a-f04d51de545c
2017-05-30 10:16:15 INFO storage_manager.py:529 Get a client to the local acropolis server
2017-05-30 10:16:15 INFO storage_manager.py:540 Checking if the volume groups are already present
2017-05-30 10:16:16 INFO storage_manager.py:282 creating vdisks and attaching to volume group
2017-05-30 10:16:16 INFO storage_manager.py:411 Retrieving the configured volume group c5f1bc91-4990-4cd7-ad4b-c2e7e4393c93
2017-05-30 10:16:16 INFO storage_manager.py:307 Volume Disk Created and attached to VG.
2017-05-30 10:16:16 INFO storage_manager.py:411 Retrieving the configured volume group c5f1bc91-4990-4cd7-ad4b-c2e7e4393c93
2017-05-30 10:16:16 INFO storage_manager.py:184 Added VDisks for SSD to the volume group
2017-05-30 10:16:16 INFO storage_manager.py:595 Not creating any Vdisk to pin to SSD
2017-05-30 10:16:16 INFO storage_manager.py:623 Max possible target for vg_uuid c5f1bc91-4990-4cd7-ad4b-c2e7e4393c93, number 6
2017-05-30 10:16:16 INFO storage_manager.py:641 Retrieving the configured volume group c5f1bc91-4990-4cd7-ad4b-c2e7e4393c93
2017-05-30 10:16:16 INFO storage_manager.py:529 Get a client to the local acropolis server
2017-05-30 10:16:16 INFO storage_manager.py:540 Checking if the volume groups are already present
2017-05-30 10:16:16 INFO storage_manager.py:282 creating vdisks and attaching to volume group
2017-05-30 10:16:16 INFO storage_manager.py:411 Retrieving the configured volume group e3dee4d6-b939-4c14-af72-c19d2be0882d
2017-05-30 10:16:16 INFO storage_manager.py:307 Volume Disk Created and attached to VG.
2017-05-30 10:16:16 INFO storage_manager.py:411 Retrieving the configured volume group e3dee4d6-b939-4c14-af72-c19d2be0882d
2017-05-30 10:16:16 INFO storage_manager.py:184 Added VDisks for SSD to the volume group
2017-05-30 10:16:16 INFO storage_manager.py:595 Not creating any Vdisk to pin to SSD
2017-05-30 10:16:16 INFO storage_manager.py:623 Max possible target for vg_uuid e3dee4d6-b939-4c14-af72-c19d2be0882d, number 6
2017-05-30 10:16:16 INFO storage_manager.py:641 Retrieving the configured volume group e3dee4d6-b939-4c14-af72-c19d2be0882d
2017-05-30 10:16:17 INFO minerva_task_util.py:1505 ShareAddTask: 7 : Updated wal with status 400
2017-05-30 10:16:17 INFO minerva_task_util.py:1505 ShareAddTask: 7 : Updated wal with status 500
2017-05-30 10:16:27 INFO minerva_task_util.py:1505 ShareAddTask: 7 : Updated wal with status 600
2017-05-30 10:16:27 INFO minerva_task_util.py:1505 ShareAddTask: 7 : Updated wal with status 700
nutanix@NTNX-16SM13150152-A-CVM:10.30.15.47:~/data/logs$
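
Note how the ShareAddTask WAL status advances 300 → 400 → 500 → 600 → 700 as the home share is laid out, and how one volume group (vg_uuid) is built per data disk along the way. A share create that hangs simply stops updating its WAL status, so a quick way to triage a saved copy of this log is to pull out those two patterns; the file name and regexes below are assumptions based on the line format shown above:

#!/usr/bin/env python
# Minimal sketch: summarize ShareAddTask WAL transitions and the volume
# groups built, from a saved minerva log (e.g. one pulled from ~/data/logs
# on the CVM). Regexes follow the line format shown above.
# Usage: python share_add_triage.py <saved_minerva_log>
import re
import sys

wal_re = re.compile(r"(ShareAddTask: \d+) : Updated wal with status (\d+)")
vg_re = re.compile(r"Max possible target for vg_uuid ([0-9a-f-]+)")

transitions, vgs = [], []
with open(sys.argv[1]) as log:
    for line in log:
        m = wal_re.search(line)
        if m:
            transitions.append("%s -> %s" % (m.group(1), m.group(2)))
        m = vg_re.search(line)
        if m:
            vgs.append(m.group(1))

print("\n".join(transitions))  # should end at status 700 for a clean run
print("volume groups built: %d" % len(vgs))

For the run above this reports five WAL transitions ending at status 700 and fifteen volume groups.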