
Best Practices in Capacity Planning

Brady Kimball

Cluster Resources, Inc.

Overview
- Definition
- End-User Goals & Practices
- Administrative Goals & Practices
- Evaluation Steps
- Other Moab Features and Policies
- Q&A


Capacity Planning
"The process of determining the production capacity needed by an organization to meet changing demands for its products." (Wikipedia)
- How can I tell when I need more hardware?
- What existing tools do I have to improve throughput?


End-User Goals
- Increase utilization and throughput
- Improve application scalability
- Decrease the chance of job failure


Meeting End-User Goals
- Give them feedback!
- Set expectations correctly
- Improve job size and duration requests
- Improve resource requirement specifications
- Moab has many tools for educating users:
  - showq
  - showbf
  - showstart

showq
Gives a basic view of Moab's workload.
Example:

    > showq
    active jobs-----------------------
    JOBID   USERNAME  STATE    PROCS  REMAINING  STARTTIME
    j2483   bob       Running  2      00:19:07   Tue May 13 15:23:52

    1 active jobs

    2 of 6 processors in use by local jobs (33.33%)
    1 of 3 nodes active (33.33%)

    eligible jobs---------------------
    JOBID   USERNAME  STATE    PROCS  WCLIMIT    QUEUETIME

    blocked jobs----------------------
    JOBID   USERNAME  STATE    PROCS  WCLIMIT    QUEUETIME
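When giving users feedback from a script, you can turn the two showq summary lines into numbers. The sketch below is a minimal example. It assumes the summary wording matches the example output exactly. The function name `utilization` is hypothetical, and in practice the text would come from capturing showq's output.

```python
import re

# Matches lines such as "2 of 6 processors in use by local jobs (33.33%)"
# and "1 of 3 nodes active (33.33%)". Wording assumed from the example output.
SUMMARY_RE = re.compile(r"^(\d+) of (\d+) (processors|nodes)\b")


def utilization(showq_text):
    """Return {'processors' or 'nodes': (used, total, percent)} from showq text."""
    result = {}
    for line in showq_text.splitlines():
        match = SUMMARY_RE.match(line.strip())
        if match:
            used, total, kind = int(match.group(1)), int(match.group(2)), match.group(3)
            percent = 100.0 * used / total if total else 0.0
            result[kind] = (used, total, round(percent, 2))
    return result


sample = """2 of 6 processors in use by local jobs (33.33%)
1 of 3 nodes active (33.33%)"""
print(utilization(sample))  # {'processors': (2, 6, 33.33), 'nodes': (1, 3, 33.33)}
```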


showbf
Shows resources available for immediate use.
Example:

    > showbf -r 16 -d 3:00:00
    backFill window (user: 'john' group: 'staff' partition: ALL) Mon Feb 16 08:28:54

    partition ALL:
      33 procs available with no time limit
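A script that decides how to size a job can mirror the fit check that showbf reports. The sketch below is minimal and rests on three assumptions:
- Durations use Moab's [[[DD:]HH:]MM:]SS form.
- A window with no time limit is represented as `None`.
- Both function names are hypothetical.

```python
def walltime_to_seconds(text):
    """Convert '[[[DD:]HH:]MM:]SS' (e.g. '3:00:00') to seconds."""
    parts = [int(part) for part in text.split(":")]
    if len(parts) > 4:
        raise ValueError(f"unrecognized walltime: {text!r}")
    multipliers = (1, 60, 3600, 86400)  # seconds, minutes, hours, days
    return sum(value * mult for value, mult in zip(reversed(parts), multipliers))


def fits_now(req_procs, req_walltime, avail_procs, avail_seconds=None):
    """True if the request fits the window; avail_seconds=None means no time limit."""
    if req_procs > avail_procs:
        return False
    return avail_seconds is None or walltime_to_seconds(req_walltime) <= avail_seconds


# Window from the example: 33 procs available with no time limit.
print(fits_now(16, "3:00:00", 33))  # True
print(fits_now(48, "3:00:00", 33))  # False
```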


showstart
Shows a best-guess estimate of a job's start time. Either the job ID of a non-running job or a processor count and duration (procs@duration) is required.
Example:

    > showstart 12@3600
    job 12@3600 requires 12 procs for 1:00:00
    Earliest start in       00:01:39 on Wed Aug 31 16:30:45
    Earliest completion in   1:01:39 on Wed Aug 31 17:30:45
    Best Partition: 32Bit
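The showstart figures are linked by simple arithmetic: the earliest completion is the earliest start plus the requested duration. The sketch below is minimal. It assumes the request is written as PROCS@SECONDS, as in the example. The helper names are hypothetical.

```python
def parse_request(spec):
    """Split a 'PROCS@SECONDS' request such as '12@3600' into integers."""
    procs, seconds = spec.split("@")
    return int(procs), int(seconds)


def hms(seconds):
    """Format seconds as H:MM:SS, the style showstart uses for 1:00:00."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


procs, duration = parse_request("12@3600")
start_in = 1 * 60 + 39  # "Earliest start in 00:01:39"
print(procs, hms(duration), hms(start_in + duration))  # 12 1:00:00 1:01:39
```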


Administrator Goals
- Maximize use of current resources
- Ensure policies are being met
- Ensure each project is getting its share of the resources
- Understand when hardware upgrades are needed
- Minimize the effect of maintenance on other workload


Meeting Administrator Goals
- Provide statistics to highlight policy problems and hardware failures
- Tweak Moab policies to manage system strain
- Simulate changes to the supply of resources
- Provide logging and notification mechanisms


Statistics Tools
- Command line: showstats, showstats -f, showhist
- Graphical: Moab Cluster Manager


Statistics Setup
- Credential statistics
  - Users, groups, accounts, classes/queues, QoSs
- Node statistics
- Job template statistics

    # moab.cfg
    USERCFG[DEFAULT]    ENABLEPROFILING=TRUE
    NODECFG[DEFAULT]    ENABLEPROFILING=TRUE
    JOBCFG[large.min]   TASKS=32
    JOBCFG[large.set]   PRIORITY=+10000
    JOBMATCHCFG[large]  JMIN=large.min JSET=large.set JSTAT=large
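You can read the job template lines as a match rule. The sketch below is a simplified in-memory model, not Moab code, and it rests on three assumptions:
- TASKS=32 belongs to large.min, a minimum the job must meet.
- PRIORITY=+10000 belongs to large.set and is applied on a match.
- JSTAT=large makes matching jobs count toward a "large" statistics bucket.

The class and function names are hypothetical.

```python
from dataclasses import dataclass, field

# Simplified model of the moab.cfg job template lines above (not Moab code).
LARGE_MIN_TASKS = 32          # assumed JOBCFG[large.min] TASKS=32: minimum to match
LARGE_PRIORITY_BOOST = 10000  # assumed JOBCFG[large.set] PRIORITY=+10000: applied on match


@dataclass
class Job:
    job_id: str
    tasks: int
    priority: int = 0
    templates: list = field(default_factory=list)


def apply_large_template(job, stats):
    """If the job meets the minimum (JMIN), apply the set template (JSET)
    and count it in the 'large' statistics bucket (JSTAT)."""
    if job.tasks >= LARGE_MIN_TASKS:
        job.priority += LARGE_PRIORITY_BOOST
        job.templates.append("large")
        stats["large"] = stats.get("large", 0) + 1
    return job


stats = {}
for job in (Job("j1", 64), Job("j2", 8)):
    apply_large_template(job, stats)
    print(job.job_id, job.priority, job.templates)
print(stats)
```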


showstats Scheduler Example

    > showstats -s
    moab active for 1:23:07:08  stats initialized on Fri May 9 09:40:45

    Eligible/Idle Jobs:            0/10          (0.000%)
    Active Jobs:                   5
    Successful/Completed Jobs:     3258/4083     (79.794%)
    Avg/Max QTime (Hours):         0.74/114.54
    Avg/Max XFactor:               1.74/6873.52
    Dedicated/Total ProcHours:     3.24K/3.37K   (96.239%)
    Current Active/Total Procs:    17/20         (85.000%)
    Avg WallClock Accuracy:        58.377%
    Avg Job Proc Efficiency:       100.000%
    Est/Avg Backlog:               00:12:38/INFINITY
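The percentages in parentheses in the showstats -s output are plain ratios of the two numbers before them. XFactor measures how much waiting stretches a job's turnaround. The sketch below is minimal. The XFactor formula is the common expansion-factor definition based on actual run time. Whether Moab uses run time or the requested wallclock limit here is an assumption.

```python
def ratio_pct(numer, denom):
    """Percentage in the style of showstats, e.g. 3258/4083 -> 79.794."""
    return round(100.0 * numer / denom, 3) if denom else 0.0


def xfactor(queue_seconds, run_seconds):
    """Expansion factor: (queue time + run time) / run time; 1.0 means no wait."""
    return (queue_seconds + run_seconds) / run_seconds


print(ratio_pct(3258, 4083))  # 79.794 (Successful/Completed Jobs)
print(ratio_pct(17, 20))      # 85.0   (Current Active/Total Procs)
print(xfactor(3600, 3600))    # 2.0    (waited as long as it ran)
```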


showstats Account Example
- Efficiency shows the total processor utilization for that account.
- Wallclock accuracy is the percentage of the requested wallclock time actually used.

    > showstats -a
    statistics initialized Mon Mar 24 10:49:03

                     |------ Active ------|  |--------- Completed ---------|
    acct             Jobs Procs ProcHours     Jobs      %  ...  Effic  WCAcc
    Engineering        26   240  37731.98      657  34.06  ...  75.21  93.13
    Research            1    30   1122.56      643  33.33  ...  98.99  78.90
    Administration      2    10    249.91      425  22.03  ...  20.75 100.00
    Shared              0     0      0.00      122   6.32  ...  89.78  52.12
    Test                0     0      0.00       89   4.61  ...  95.21 100.00
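You can compute the two account metrics described on this slide for each job and then average them per account. The sketch below is minimal and rests on these assumptions:
- Efficiency is CPU time divided by (processors times wallclock used).
- Wallclock accuracy is wallclock used divided by wallclock requested.
- The record layout is hypothetical.

The plain per-job average may differ from how Moab weights its totals.

```python
from dataclasses import dataclass


@dataclass
class JobRecord:
    account: str
    procs: int
    wallclock_limit: float  # requested wallclock, seconds
    wallclock_used: float   # actual wallclock, seconds
    cpu_seconds: float      # CPU time summed over all processors


def wallclock_accuracy(job):
    """Percent of the requested wallclock the job actually used."""
    return 100.0 * job.wallclock_used / job.wallclock_limit


def proc_efficiency(job):
    """Percent of the allocated processor time that was spent computing."""
    return 100.0 * job.cpu_seconds / (job.procs * job.wallclock_used)


def account_summary(jobs):
    """Unweighted per-account averages: {account: (efficiency, wallclock accuracy)}."""
    sums = {}
    for job in jobs:
        eff, acc, count = sums.get(job.account, (0.0, 0.0, 0))
        sums[job.account] = (eff + proc_efficiency(job), acc + wallclock_accuracy(job), count + 1)
    return {acct: (eff / count, acc / count) for acct, (eff, acc, count) in sums.items()}


# Hypothetical job: asked for 2 hours on 4 procs, ran 1 hour, used 3 CPU-hours.
job = JobRecord("Engineering", 4, 7200, 3600, 10800)
print(account_summary([job]))  # {'Engineering': (75.0, 50.0)}
```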


Review Fairshare
- Is each project getting what was promised to it?
- Are projects receiving more than their share, and why?

    > mdiag -f -o acct
    FairShare Information

    Depth: 8 intervals   Interval Length: 12:00:00   Decay Rate: 1.00

    FS Policy: DEDICATEDPS
    System FS Settings:  Target Usage: 0.00   Flags: 0

    FSInterval         %   Target        0
    FSWeight     -------  -------   1.0000
    TotalUsage    100.00  -------   1531.8

    ACCT
    -------------
    Test            6.54     5.00     6.54
    Research       29.80    15.00    29.80
    Administration 20.59     5.00    20.59
    Engineering    30.46    65.00    30.46
    Shared         12.61    10.00    12.61
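Comparing each account's usage with its target answers this slide's two questions directly. The sketch below is minimal and uses the percentages from the mdiag example. `decayed_usage` illustrates the usual idea of weighting interval i by the decay rate raised to the power i. With Decay Rate 1.00, every interval counts equally. This is an assumption about Moab's exact fairshare formula, not a copy of it.

```python
# Usage and target percentages from the mdiag -f -o acct example.
USAGE = {"Test": 6.54, "Research": 29.80, "Administration": 20.59,
         "Engineering": 30.46, "Shared": 12.61}
TARGET = {"Test": 5.00, "Research": 15.00, "Administration": 5.00,
          "Engineering": 65.00, "Shared": 10.00}


def decayed_usage(per_interval, decay):
    """Weight interval i (0 = most recent) by decay**i and sum."""
    return sum(usage * decay ** i for i, usage in enumerate(per_interval))


def deviations(usage, target):
    """Usage minus target in percentage points; positive means over target."""
    return {acct: round(usage[acct] - target[acct], 2) for acct in usage}


# Most under-served account first.
for acct, delta in sorted(deviations(USAGE, TARGET).items(), key=lambda kv: kv[1]):
    print(f"{acct:15s} {delta:+7.2f}")
```

In this example, Engineering sits far below its 65% target, while Research and Administration each use roughly three to four times their targets.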


Inspect Projects' Usage
- If a project's metrics are not what you expect, inspect the jobs and nodes used by the project.
- Look for a common pattern in submission.
- Graph the project's metrics, jobs, and nodes.
- Use showhist in the Moab tools directory:

    showhist [-a account_name] [-c|-q class_name] [-g group_name] [-q qos_name] [-u user_name] [-n days] [-s start date] [-e end date] [--help] [--man] [[-j] <job id>]


showhist example
Retrieve node information from the job history of a project:

    > /usr/local/tools/showhist.moab.pl -a Engineering
    Job Id            : j3324
    User Name         : user1
    Group Name        : group1
    Account Name      : Engineering
    Queue Name        : batch
    Quality Of Service: Deadline
    Processor Count   : 2
    Wallclock Duration: 12:00:00:00
    Submit Time       : Mon May 15 18:04:30 2008
    Start Time        : Mon May 16 01:10:36 2008
    End Time          : Mon May 26 07:12:38 2008
    Allocated Nodelist: node001
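To find nodes that keep appearing in a project's job history, you can split the showhist output into records and count the allocated nodes. The sketch below is minimal and rests on these assumptions:
- The output uses the "Key : value" layout shown in the example.
- Each record starts with "Job Id".
- Multi-node lists are comma-separated (the example only shows a single node).

The second job in the sample is hypothetical.

```python
from collections import Counter


def parse_showhist(text):
    """Split 'Key : value' lines into one dict per job; 'Job Id' starts a record."""
    records, current = [], None
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "Job Id":
            current = {}
            records.append(current)
        if current is not None:
            current[key] = value
    return records


def node_usage(records):
    """Count how many jobs used each node (each node counted once per job)."""
    counts = Counter()
    for record in records:
        nodes = record.get("Allocated Nodelist", "")
        counts.update({node for node in nodes.split(",") if node})
    return counts


sample = """Job Id            : j3324
Account Name      : Engineering
Allocated Nodelist: node001
Job Id            : j3325
Account Name      : Engineering
Allocated Nodelist: node001,node002"""  # j3325 is a hypothetical second job
print(node_usage(parse_showhist(sample)).most_common())
```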


Node Graph Example


Node Categorization
- Set FORCERSVSUBTYPE to TRUE to enforce the use of a category.
- Route category information through reservation creation (the -b option in the example below).
- Graphical charts show both time-based values and the integral over the selected time range.

    > mrsvctl -c -b NetworkFailure -h node001,node002,node003,node004
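The "integral of time" in the category charts amounts to node-hours: the number of nodes in a category multiplied by the time they spent there, clipped to the selected window. The sketch below is minimal and uses hypothetical reservation tuples measured in hours. "Maintenance" is an invented second category, used only for illustration.

```python
from collections import defaultdict


def category_node_hours(reservations, window_start, window_end):
    """Sum node-hours per category inside [window_start, window_end).

    reservations: iterable of (category, node_count, start_hour, end_hour).
    """
    totals = defaultdict(float)
    for category, nodes, start, end in reservations:
        overlap = min(end, window_end) - max(start, window_start)
        if overlap > 0:
            totals[category] += nodes * overlap
    return dict(totals)


reservations = [
    ("NetworkFailure", 4, 0, 2),  # e.g. node001-node004 for two hours
    ("Maintenance", 2, 1, 5),     # hypothetical second category
]
print(category_node_hours(reservations, 0, 4))
# {'NetworkFailure': 8.0, 'Maintenance': 6.0}
```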


Node Categorization Charts


Visual Cluster Diagnostics and Statistics
- The Visual Cluster concisely shows the status of each node.
- Visual Cluster uses:
  - Diagnose failures
  - View node attributes
  - View jobs on nodes
  - View credentials
  - View reservations, and more


Node Attributes
- Many attributes pertain to capacity planning
- Built-in hardware information
- Generic metric hooks


Graphing Node Attributes


Graphing Node Attributes Based on Node State


Graphing Node Attributes Based on Current Load


Graphing Node Attributes Based on Historical Load


Capacity Planning Charts


Matrix Statistics
- showstats -f <STATISTIC_TYPE>
- Capacity planning metrics include job efficiency, wallclock accuracy, QOS met, and others
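A matrix statistic groups jobs by size and duration and reports one metric per cell. The sketch below shows that grouping for custom reports and rests on these assumptions:
- The bin edges are hypothetical, not Moab's configured matrix layout.
- Each job is a (procs, hours, metric) tuple with at least one processor.

```python
import bisect

# Hypothetical bin edges (not Moab's configured matrix layout).
PROC_EDGES = [1, 4, 16, 64]  # columns: 1-3, 4-15, 16-63, 64+ procs
HOUR_EDGES = [0, 1, 8, 24]   # rows: <1h, 1-8h, 8-24h, 24h+


def bin_index(edges, value):
    """Index of the bin whose lower edge is the largest edge <= value."""
    return bisect.bisect_right(edges, value) - 1


def matrix_average(jobs):
    """Average a metric per (duration row, size column) cell.

    jobs: iterable of (procs, hours, metric) tuples, e.g. wallclock accuracy.
    """
    cells = {}
    for procs, hours, metric in jobs:
        key = (bin_index(HOUR_EDGES, hours), bin_index(PROC_EDGES, procs))
        total, count = cells.get(key, (0.0, 0))
        cells[key] = (total + metric, count + 1)
    return {key: total / count for key, (total, count) in cells.items()}


jobs = [(2, 0.5, 90.0), (3, 0.25, 70.0), (32, 10, 40.0)]
print(matrix_average(jobs))  # {(0, 0): 80.0, (2, 2): 40.0}
```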


Other areas of inspection
- Moab's logs ($MOABHOMEDIR/log)
- Moab's mdiag commands
- Event notifications
  - RM failure messages
  - Generic events
  - Triggers


Moab Events
- If the command-line and graphical tools aren't sufficient, inspect Moab's events.
- Configurable via RECORDEVENTLIST
- Stored in the $MOABHOMEDIR/stats/events.* files
- Tools included with Moab: the ACCOUNTINGINTERFACEURL parameter and the showevents script


RECORDEVENTLIST
- The default list does not include every event type that can be recorded.
- Generic events placed on nodes can be viewed.

    # moab.cfg
    RECORDEVENTLIST JOBSTART,JOBCANCEL,JOBCOMPLETE,JOBFAIL,RSVCREATE,RSVSTART,RSVEND,SCHEDPAUSE,SCHEDSTART,SCHEDSTOP,NODEDOWN,NODEFAILURE,NODEUP


ACCOUNTINGINTERFACEURL
- Receives the events specified in RECORDEVENTLIST in real time.
- The event being logged is passed to the URL on its stdin.
- Usually the exec protocol is used so that a script can process the event.

    # moab.cfg
    ACCOUNTINGINTERFACEURL exec:///$TOOLSDIR/dumpacc.pl
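A handler for the exec URL reads event records from stdin. The sketch below is minimal and rests on three assumptions:
- Each record uses the same whitespace-separated layout as the events-file record in the showevents example: time, epoch, object type, object ID, event type, details.
- The script is named handle_event.py (hypothetical).
- Only node events are reported (an illustrative choice).

To be used as, for example, `ACCOUNTINGINTERFACEURL exec:///$TOOLSDIR/handle_event.py`, the script would need its shebang line and execute permission.

```python
#!/usr/bin/env python3
"""Hypothetical ACCOUNTINGINTERFACEURL handler: reports node events read on stdin."""
import sys

NODE_EVENTS = {"NODEDOWN", "NODEFAILURE", "NODEUP"}


def parse_event(line):
    """Split an event record into fields; return None for short or malformed lines.

    Assumed layout: HH:MM:SS EPOCH OBJECT_TYPE OBJECT_ID EVENT_TYPE DETAILS...
    """
    fields = line.split()
    if len(fields) < 5 or not fields[1].isdigit():
        return None
    return {
        "time": fields[0],
        "epoch": int(fields[1]),
        "object_type": fields[2],
        "object_id": fields[3],
        "event": fields[4],
        "details": fields[5:],
    }


def handle(stream):
    """Print one summary line for every node event found in the stream."""
    for line in stream:
        event = parse_event(line)
        if event and event["event"] in NODE_EVENTS:
            print(f"{event['time']} {event['object_id']} {event['event']}")


if __name__ == "__main__":
    handle(sys.stdin)
```

In a real deployment, the print call would be replaced by whatever notification mechanism the site uses.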


showevents example
- Displays events found in the events files
- Part of Moab's tools directory
- Should be a last resort for getting the information you need

    > /usr/local/tools/showevents.pl -n 4
    Thu May 29 11:14:24 2008
    job j6062 JOBSTART
    11:14:23 1212081264 job j6062 JOBSTART 0 4 eng1 nobody 3600 Idle [NONE] 1212079311 1212081264 1212081264 1212081264 - >= 0M >= 0M - 1212079311 4 0 -:Standard - Engineering 9 0.00 ALL 1 0M 0M 0M 0 2140000000 node009,node009,node008,node008 local - [DEFAULT] - - 0.00 - - - 0 - -
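For after-the-fact analysis, you can tally the events.* files directly, for instance to see which event types dominate. The sketch below is minimal and rests on these assumptions:
- Records use the same field layout as the example record: object type in the third field, event type in the fifth.
- Lines starting with "#" are treated as comments.
- The files live under $MOABHOMEDIR/stats.

```python
import glob
import os
from collections import Counter


def count_events(lines, object_type=None):
    """Tally event types (5th field), optionally for one object type (3rd field)."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 5 or fields[0].startswith("#"):
            continue
        if object_type and fields[2] != object_type:
            continue
        counts[fields[4]] += 1
    return counts


def count_event_files(moab_home):
    """Tally events across every $MOABHOMEDIR/stats/events.* file."""
    counts = Counter()
    for path in sorted(glob.glob(os.path.join(moab_home, "stats", "events.*"))):
        with open(path, errors="replace") as handle:
            counts.update(count_events(handle))
    return counts


if __name__ == "__main__":
    home = os.environ.get("MOABHOMEDIR", ".")
    for event, total in count_event_files(home).most_common():
        print(f"{event:15s} {total}")
```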


Policy Evaluation
- Simulation Mode
- Monitor Mode
- Side-by-Side Mode

[Diagram of the three modes; components shown: Moab, MoabMonitor, Production Scheduler, Trace Files, RM, Cluster]
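Simulation mode answers "what if" questions by replaying trace files against a changed configuration. The toy sketch below shows the idea for the capacity question "how much would more processors cut queue time?". It is not Moab's simulator. It assumes strict FIFO order with no backfill or priorities, and the trace tuples (submit time, procs, duration) are hypothetical.

```python
import heapq


def average_wait(jobs, total_procs):
    """Strict FIFO replay of (submit, procs, duration) jobs; returns mean queue wait."""
    running = []  # min-heap of (end_time, procs) for started jobs
    free = total_procs
    clock = 0.0
    waits = []
    for submit, procs, duration in sorted(jobs, key=lambda job: job[0]):
        if procs > total_procs:
            raise ValueError("job larger than the whole cluster")
        clock = max(clock, submit)  # FIFO: never start before the previous job
        # Release jobs that have already finished by the current time.
        while running and running[0][0] <= clock:
            free += heapq.heappop(running)[1]
        # Wait for the earliest-finishing jobs until enough processors are free.
        while free < procs:
            end, released = heapq.heappop(running)
            clock = max(clock, end)
            free += released
        heapq.heappush(running, (clock + duration, procs))
        free -= procs
        waits.append(clock - submit)
    return sum(waits) / len(waits) if waits else 0.0


trace = [(0, 4, 10), (0, 4, 10), (0, 4, 10), (5, 8, 20)]
for procs in (8, 12, 24):
    print(procs, "procs -> average wait", average_wait(trace, procs))
```

Moab's own simulation also models its scheduling policies (backfill, priorities, reservations). This toy therefore only shows the shape of the analysis, not realistic numbers.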


What Have We Learned?
- The more information given to Moab, the better.
- Educate the users of your system.
- Use the reporting tools within Moab to find points of failure and inefficiency.
- Tweak policies as a last-mile effort.


Discussion

