Identify:

Identifying the Issue: Can you describe the problem precisely? I want to know its
exact symptoms.
Timing: Did it happen once or many times? Does it happen within a certain business window?
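If the customer can list when each occurrence happened, grouping those times by hour of day quickly shows whether the problem clusters in a business window. A minimal Python sketch; the timestamps are hypothetical, and in practice they would come from the customer's notes or from vCenter events:

```python
from collections import Counter
from datetime import datetime


def incidents_by_hour(timestamps):
    """Count incidents per hour of day (0-23) to expose a recurring window."""
    return Counter(ts.hour for ts in timestamps)


# Hypothetical incident times reported by the customer
incidents = [
    datetime(2024, 3, 4, 9, 15),
    datetime(2024, 3, 5, 9, 40),
    datetime(2024, 3, 6, 9, 5),
    datetime(2024, 3, 6, 14, 30),
]
print(incidents_by_hour(incidents).most_common(1))  # [(9, 3)]
```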
Location: Is it system-wide or localized? For example, does it affect all hosts in the cluster,
a single host, or a single datastore? What do the affected VMs have in common?
I need to eliminate the areas where the problem doesn't exist.
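The elimination step can be sketched as set arithmetic: keep the attributes that every affected VM shares, then drop any that also appear on a VM without the issue. A minimal Python sketch; the attribute names (cluster, host, datastore) and their values are hypothetical examples:

```python
def suspect_areas(affected, healthy):
    """Return attribute/value pairs shared by all affected VMs and seen on no healthy VM.

    Each VM is described by a dict such as {"host": "esx01", "datastore": "DS-A"}.
    """
    if not affected:
        return set()
    # Start from what every affected VM has in common
    suspects = set.intersection(*(set(vm.items()) for vm in affected))
    # Eliminate anything that also appears where the problem doesn't exist
    for vm in healthy:
        suspects -= set(vm.items())
    return suspects


affected = [
    {"cluster": "CL1", "host": "esx01", "datastore": "DS-A"},
    {"cluster": "CL1", "host": "esx02", "datastore": "DS-A"},
]
healthy = [{"cluster": "CL1", "host": "esx01", "datastore": "DS-B"}]
print(suspect_areas(affected, healthy))  # {('datastore', 'DS-A')}
```

This is only a narrowing aid: an intermittent issue can still hit a VM that currently looks healthy, so an attribute dropped here becomes less likely, not ruled out.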
Apparent Reason: Did you make any change that could have led to the problem? I want
to identify the things that may have caused it.
o Why did you make that change?
o Verifying Reason: Can you revert that change? I want to verify whether it
is the only cause of the problem. Please also document each step of the
revert and the exact step at which the error disappears, so I can
pinpoint whether a certain configuration or step led to the error.
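Recording the revert as an ordered log makes the pin-pointing mechanical: the first step after which the error is gone is the prime suspect. A minimal Python sketch; `RevertStep` and the port-group step descriptions are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class RevertStep:
    description: str
    error_present: bool  # was the error still observed after this step?


def first_clearing_step(steps):
    """Return (index, description) of the first step after which the error disappeared."""
    for index, step in enumerate(steps):
        if not step.error_present:
            return index, step.description
    return None  # the error survived every revert step


# Hypothetical revert log for a change made to a port group
log = [
    RevertStep("Restore original NIC teaming policy", True),
    RevertStep("Restore original VLAN ID", False),
    RevertStep("Restore original security policy", False),
]
print(first_clearing_step(log))  # (1, 'Restore original VLAN ID')
```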
Collecting Evidence: Did you open a ticket with VMware support? Any hint
can help us point to a certain layer instead of investigating all the layers.
o Do you have any alarms or events recorded?
o Do you have any screenshots from the hosts or guest OSes, or any graphs
extracted from the network/storage hardware?
o Is your solution compatible with your vSphere version? Any incompatibility may lead to unwanted results or problems.
At the end, you should be left with one or two layers to investigate. Go straight to
these layers and use either a top-down or a bottom-up approach. DON'T
JUMP AROUND BETWEEN LAYERS, even if you think you have found the solution.

Compute:
Get the compute-layer details from them and draw the diagram.
Is the cluster DRS-balanced or not? Are there any DRS faults or
unapplied recommendations? If the cluster is unbalanced, some hosts
are under contention while others are underutilized.
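One rough way to quantify imbalance is the spread of per-host load. Earlier vSphere releases displayed a similar idea in the DRS summary as a current host load standard deviation compared with a target value, but DRS derives load from VM entitlements; this Python sketch substitutes raw CPU usage over capacity, and the host figures are hypothetical:

```python
from statistics import pstdev


def host_load_spread(hosts):
    """Population standard deviation of per-host CPU load (used / capacity).

    `hosts` maps a host name to (used_mhz, capacity_mhz).
    """
    loads = [used / capacity for used, capacity in hosts.values()]
    return pstdev(loads)


# Hypothetical three-host cluster
hosts = {
    "esx01": (18000, 20000),  # 90% loaded
    "esx02": (4000, 20000),   # 20% loaded
    "esx03": (5000, 20000),   # 25% loaded
}
print(round(host_load_spread(hosts), 3))  # 0.319
```

A value near zero means hosts carry similar load; a large value like this one points to contention on esx01 while the other hosts sit mostly idle.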
What is your HA policy, and what type of Admission Control is configured? If Admission Control
reserves a high amount of failover capacity, it may prevent VMs from powering on.
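For the "percentage of cluster resources" admission control policy, a power-on is refused when the capacity left unreserved afterwards would fall below the configured failover percentage. A simplified, CPU-only Python sketch of that check; the real feature evaluates memory as well and applies its own rules for VM overhead and default reservations, and the numbers here are hypothetical:

```python
def can_power_on(total_mhz, running_reservations_mhz, new_vm_reservation_mhz,
                 configured_failover_pct):
    """CPU-only approximation of the 'percentage of cluster resources' policy."""
    requirements = running_reservations_mhz + new_vm_reservation_mhz
    # Share of cluster capacity still unreserved if this VM powers on
    current_failover_pct = (total_mhz - requirements) * 100 / total_mhz
    return current_failover_pct >= configured_failover_pct


# Hypothetical 100 GHz cluster with 60 GHz already reserved by running VMs
print(can_power_on(100000, 60000, 10000, 25))  # True  (30% left >= 25%)
print(can_power_on(100000, 60000, 10000, 50))  # False (30% left < 50%)
```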
Are there any resource pools? If so, tell me their settings.
Misconfigured resource pool settings can lead to contention for VMs.
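A common resource-pool misconfiguration is sibling pools whose shares ignore how many VMs each contains: under contention a pool's share is split among its VMs, so a pool with more shares but many VMs can give each VM less than a pool with fewer shares. A simplified Python sketch, assuming full contention, equal VM shares inside each pool, and no reservations or limits; the pool names and figures are hypothetical:

```python
def per_vm_entitlement(capacity_mhz, pools):
    """Split capacity between sibling pools by shares, then evenly among each pool's VMs.

    `pools` maps a pool name to (pool_shares, vm_count). Assumes every pool is
    fully contended, VMs inside a pool have equal shares, and no reservations
    or limits are set.
    """
    total_shares = sum(shares for shares, _ in pools.values())
    return {
        name: capacity_mhz * shares / total_shares / vm_count
        for name, (shares, vm_count) in pools.items()
    }


# Hypothetical: "Production" has twice the shares of "Test" but five times the VMs
print(per_vm_entitlement(12000, {"Production": (8000, 10), "Test": (4000, 2)}))
# {'Production': 800.0, 'Test': 2000.0}
```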
Can we run ESXTOP on one of the ESXi hosts where the problem exists?
ESXTOP gives more insight into CPU and memory usage on the host.
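To capture data for later analysis, esxtop can run in batch mode, for example `esxtop -b -d 5 -n 60 > /tmp/esxtop.csv` (60 samples, 5 seconds apart). The Python sketch below pulls one CPU counter per VM out of that CSV. It assumes column headers shaped like `\\host\Group Cpu(<id>:<name>)\% Ready`; check the header row of your own capture before relying on it. The sample data is made up:

```python
import csv
import io


def read_group_cpu_counter(csv_text, counter="% Ready"):
    r"""Collect samples of one 'Group Cpu' counter from esxtop batch-mode CSV text.

    Assumes column headers shaped like \\host\Group Cpu(<id>:<name>)\<counter>.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    suffix = "\\" + counter
    # Keep only the per-VM CPU columns for the requested counter
    columns = {
        index: name
        for index, name in enumerate(header)
        if "\\Group Cpu(" in name and name.endswith(suffix)
    }
    samples = {name: [] for name in columns.values()}
    for row in reader:
        for index, name in columns.items():
            samples[name].append(float(row[index]))
    return samples


# Made-up two-sample capture for one VM
sample = (
    '"(PDH-CSV 4.0)","\\\\esx01\\Group Cpu(1234:app01)\\% Ready",'
    '"\\\\esx01\\Group Cpu(1234:app01)\\% Used"\n'
    '"03/04/2024 09:00:00","12.5","40.0"\n'
    '"03/04/2024 09:00:05","14.0","42.0"\n'
)
print(list(read_group_cpu_counter(sample).values()))  # [[12.5, 14.0]]
```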
ESXTOP CPU value (threshold): meaning
o No. of vCPUs (threshold: N/A): may indicate CPU over-provisioning. Look at RDY% and CSTP%.
o CPU Load Avg. (threshold: N/A): may indicate CPU contention. Look at USED%.
o USED% (threshold: N/A): high CPU utilization. USED% = RUN% + SYS% - OVRLP%.
o RUN% (threshold: N/A): high CPU usage by a VM.
o SYS% (threshold: 20%): indicates high I/O, as the VMkernel is spending this CPU time
working on behalf of the VM.
o RDY% (threshold: 10% max., per vCPU): indicates high CPU overcommitment, vCPUs allocated
without need, or a CPU limit. Refer to MLMTD%, CSTP%, SWPWT%, WAIT%, IDLE%, and CPU Load Avg.
o MLMTD% (threshold: 1%): indicates an artificial CPU limit.
o WAIT% (threshold: 20-30%, per vCPU): indicates either that the VM is idle most of the time,
and therefore has more vCPUs than it needs, or, if WAIT% - IDLE% is high, latency somewhere:
the guest OS is waiting a long time for I/O, or for swapped pages to be read back in.
Refer to IDLE% or SWPWT%. WAIT% includes IDLE% + SWPWT% + blocked time,
and WAIT% - IDLE% = VMWAIT%.
o IDLE% (threshold: 20-30%, per vCPU): indicates vCPUs allocated without need.
o SWPWT% (threshold: 5%): indicates a high swap rate and memory overcommitment.
o CSTP% (threshold: 3%): indicates too many vCPUs, so the vCPUs that run ahead must wait
for the slower vCPUs of the same VM.
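The thresholds above can be applied mechanically to a VM's esxtop readings. A minimal Python sketch that computes USED% (esxtop's %USED) and VMWAIT% from the formulas given and flags counters over their single-value thresholds; following the table, only RDY% is divided by the vCPU count before comparison, and the WAIT%/IDLE% 20-30% ranges are left out. The readings are hypothetical:

```python
# Single-value thresholds from the table (percent). WAIT% and IDLE% are given
# as a 20-30% range, so they are left out of this simple check.
THRESHOLDS = {"SYS%": 20.0, "RDY%": 10.0, "MLMTD%": 1.0, "SWPWT%": 5.0, "CSTP%": 3.0}
# Of the counters above, the table states RDY% per vCPU
PER_VCPU = {"RDY%"}


def used_pct(run_pct, sys_pct, ovrlp_pct):
    """USED% = RUN% + SYS% - OVRLP%"""
    return run_pct + sys_pct - ovrlp_pct


def vmwait_pct(wait_pct, idle_pct):
    """VMWAIT% = WAIT% - IDLE%"""
    return wait_pct - idle_pct


def flag_counters(values, vcpus):
    """Return counters above their threshold, normalizing per-vCPU counters first."""
    flagged = {}
    for counter, limit in THRESHOLDS.items():
        if counter not in values:
            continue
        value = values[counter]
        if counter in PER_VCPU:
            value = value / vcpus
        if value > limit:
            flagged[counter] = value
    return flagged


# Hypothetical group-level readings for a 4-vCPU VM
vm = {"RUN%": 150.0, "SYS%": 5.0, "OVRLP%": 2.0, "RDY%": 48.0,
      "CSTP%": 4.5, "MLMTD%": 0.0, "SWPWT%": 0.0}
print(used_pct(vm["RUN%"], vm["SYS%"], vm["OVRLP%"]))  # 153.0
print(flag_counters(vm, vcpus=4))  # {'RDY%': 12.0, 'CSTP%': 4.5}
```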

Memory:
ESXTOP value (threshold): meaning
o PMEM (threshold: N/A): displays the machine memory statistics for the server.
All numbers are in megabytes.
   total: total amount of machine memory in the server.
   vmk: amount of machine memory being used by the ESXi VMkernel.
   other: amount of machine memory being used by everything other than the ESXi VMkernel.
   free: amount of machine memory that is free.
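To read PMEM at a glance, the megabyte figures can be turned into percentages of total. A minimal Python sketch with hypothetical values for a 256 GB host:

```python
def pmem_breakdown(total, vmk, other, free):
    """Percentage of machine memory in each PMEM bucket (inputs in MB)."""
    return {
        "vmk": round(100 * vmk / total, 1),
        "other": round(100 * other / total, 1),
        "free": round(100 * free / total, 1),
    }


# Hypothetical 256 GB host
print(pmem_breakdown(total=262144, vmk=4096, other=131072, free=126976))
# {'vmk': 1.6, 'other': 50.0, 'free': 48.4}
```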
