Brady Kimball
Overview
Definition
End-User Goals & Practices
Administrative Goals & Practices
Evaluation Steps
Other Moab Features and Policies
Q&A
Capacity Planning
"The process of determining the production capacity needed by an organization to meet changing demands for its products." (Wikipedia)
How can I tell when I need more hardware?
What existing tools do I have to improve throughput?
End-User Goals
Increase utilization and throughput
Improve application scalability
Decrease chance of job failure
showq
Gives a basic view of Moab's workload
Example
> showq
active jobs------------------------
JOBID     USERNAME  STATE    PROCS  REMAINING  STARTTIME
j2483     bob       Running  2      00:19:07   Tue May 13 15:23:52

1 active jobs
2 of 6 processors in use by local jobs (33.33%)
1 of 3 nodes active (33.33%)

eligible jobs----------------------
JOBID     USERNAME  STATE    PROCS  WCLIMIT    QUEUETIME

blocked jobs-----------------------
JOBID     USERNAME  STATE    PROCS  WCLIMIT    QUEUETIME
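The utilization summary in showq output can be tracked over time as a simple capacity signal. The following is a minimal sketch, assuming the two summary lines keep the wording shown in the example above; the function names are hypothetical and are not part of Moab.

```python
import re

def parse_utilization(showq_text):
    """Extract (used, total) pairs from the showq summary lines.

    Assumes the wording "N of M processors in use" and
    "N of M nodes active", as in the example output above.
    """
    result = {}
    match = re.search(r"(\d+) of (\d+) processors in use", showq_text)
    if match:
        result["processors"] = (int(match.group(1)), int(match.group(2)))
    match = re.search(r"(\d+) of (\d+) nodes active", showq_text)
    if match:
        result["nodes"] = (int(match.group(1)), int(match.group(2)))
    return result

def percent(used, total):
    """Utilization as a percentage; 0.0 when the total is zero."""
    return 100.0 * used / total if total else 0.0

# Summary lines copied from the showq example above.
sample = (
    "2 of 6 processors in use by local jobs (33.33%)\n"
    "1 of 3 nodes active (33.33%)\n"
)

usage = parse_utilization(sample)
for name, (used, total) in usage.items():
    print(f"{name}: {used}/{total} = {percent(used, total):.2f}%")
```

Sampling this at a fixed interval and storing the results gives a utilization history without any change to the scheduler configuration.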
showbf
Shows resources available for immediate use
Example
> showbf -r 16 -d 3:00:00
backFill window (user: 'john' group: 'staff' partition: ALL) Mon Feb 16 08:28:54
partition ALL:
33 procs available with no time limit
showstart
Shows a best-guess estimate of a job's start time
Requires either the job ID of a non-running job or a procs/duration specification
Example
> showstart 12@3600
job 12@3600 requires 12 procs for 1:00:00
Earliest start in 00:01:39 on Wed Aug 31 16:30:45
Earliest completion in 1:01:39 on Wed Aug 31 17:30:45
Best Partition: 32Bit
Administrator Goals
Maximize use of current resources
Ensure policies are being met
Ensure each project is getting its share of the resources
Understand when hardware upgrades are needed
Minimize the effect of maintenance on other workload
Statistics Tools
Command line: showstats, showstats -f, showhist
Graphical: Moab Cluster Manager
Statistics Setup
Credential statistics
Users, groups, accounts, classes/queues, QoSs
JSET=large.set JSTAT=large
showstats summary fields (the values were not preserved, apart from two percentages, 96.239% and 85.000%):
Eligible/Idle Jobs
Active Jobs
Successful/Completed Jobs
Avg/Max QTime (Hours)
Avg/Max XFactor
Dedicated/Total ProcHours
Current Active/Total Procs
Avg WallClock Accuracy
Avg Job Proc Efficiency
Est/Avg Backlog
Review Fairshare
Is each project getting what was promised to it?
Are projects receiving more than their share, and why?
> mdiag -f -o acct
FairShare Information

Depth: 8 intervals   Interval Length: 12:00:00   Decay Rate: 1.00

FS Policy: DEDICATEDPS
System FS Settings:  Target Usage: 0.00   Flags: 0

FSInterval        %     Target       0
FSWeight       ------- -------  1.0000
TotalUsage      100.00 -------  1531.8

ACCT
-------------
Test
Research
Administration
Engineering
Shared
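To see how Depth and Decay Rate shape the numbers in this report, here is a minimal sketch of decayed fairshare usage. It assumes each interval's usage is weighted by the decay rate raised to the interval index (consistent with the FSWeight of 1.0000 shown for interval 0); the usage figures are invented for illustration, and the function names are hypothetical.

```python
def interval_weights(depth, decay):
    """Weight for each fairshare interval; interval 0 is the most recent."""
    return [decay ** i for i in range(depth)]

def effective_usage_percent(cred_usage, total_usage, decay):
    """Decay-weighted share of usage for one credential, as a percentage.

    cred_usage and total_usage list per-interval usage, most recent first.
    """
    weights = interval_weights(len(total_usage), decay)
    cred = sum(w * u for w, u in zip(weights, cred_usage))
    total = sum(w * u for w, u in zip(weights, total_usage))
    return 100.0 * cred / total if total else 0.0

# Invented per-interval usage for one account and for the whole system.
account = [100.0, 300.0]
system = [1000.0, 1000.0]

# With a decay rate of 1.00 (as in the report above) every interval counts equally.
print(effective_usage_percent(account, system, 1.0))

# With a decay rate of 0.5 the older, heavier interval counts for half as much.
print(effective_usage_percent(account, system, 0.5))
```

Comparing the result against each account's target is one way to answer whether a project is receiving more or less than its promised share.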
showhist example
Retrieve node information from the job history of a project
> /usr/local/tools/showhist.moab.pl -a Engineering
Job Id            : j3324
User Name         : user1
Group Name        : group1
Account Name      : Engineering
Queue Name        : batch
Quality Of Service: Deadline
Processor Count   : 2
Wallclock Duration: 12:00:00:00
Submit Time       : Mon May 15 18:04:30 2008
Start Time        : Mon May 16 01:10:36 2008
End Time          : Mon May 26 07:12:38 2008
Allocated Nodelist: node001
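Queue wait time is one of the simplest capacity signals that can be derived from a showhist record. The sketch below assumes the timestamp layout shown in the example (weekday, month, day, time, year); the helper names are hypothetical.

```python
from datetime import datetime

def parse_moab_time(text):
    """Parse a timestamp such as 'Mon May 15 18:04:30 2008'.

    The weekday name is dropped before parsing, since the date alone
    determines the result.
    """
    _, rest = text.split(" ", 1)
    return datetime.strptime(rest, "%b %d %H:%M:%S %Y")

def queue_wait(submit, start):
    """Time a job spent waiting between submission and start."""
    return parse_moab_time(start) - parse_moab_time(submit)

# Submit and start times taken from the showhist example above.
wait = queue_wait("Mon May 15 18:04:30 2008", "Mon May 16 01:10:36 2008")
print(wait)
```

Averaging this value per account or per queue over a period shows where backlog is building up.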
Node Categorization
Set FORCERSVSUBTYPE to TRUE if you want to enforce use of a category
Route category information through reservation creation
Graphical charts show both time-based views and the integral over the selected time range
> mrsvctl -c -b NetworkFailure -h node001,node002,node003,node004
Node Attributes
Many attributes pertain to capacity planning
Built-in hardware information
Generic metric hooks
Matrix Statistics
showstats -f <STATISTIC_TYPE>
Capacity planning metrics include job efficiency, wallclock accuracy, QOS met, and others
Moab Events
If the command line and graphical tools aren't sufficient, inspect Moab's events
Configurable via RECORDEVENTLIST
Stored in $MOABHOMEDIR/stats/events.* files
Tools included with Moab: ACCOUNTINGINTERFACEURL and showevents
RECORDEVENTLIST
The default list is not the complete list
Generic events placed on nodes can be viewed
# moab.cfg
RECORDEVENTLIST JOBSTART,JOBCANCEL,JOBCOMPLETE,JOBFAIL,RSVCREATE,RSVSTART,RSVEND,SCHEDPAUSE,SCHEDSTART,SCHEDSTOP,NODEDOWN,NODEFAILURE,NODEUP
ACCOUNTINGINTERFACEURL
Receives the events specified in RECORDEVENTLIST in real time
The event being logged is passed to the URL on its stdin
Usually the exec protocol is used so a script can handle processing of the event
# moab.cfg
ACCOUNTINGINTERFACEURL exec:///$TOOLSDIR/dumpacc.pl
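A handler invoked through the exec protocol only needs to read the event from stdin and store or forward it. The following is a minimal sketch of such a handler, not a reproduction of dumpacc.pl; the log path and function names are hypothetical, and the record is stored verbatim because its exact layout is not described here.

```python
import sys
import time

def handle_event(stream, log_path):
    """Append one event record, read from stream, to log_path.

    Returns True if a record was written, False if the input was empty.
    """
    record = stream.read().strip()
    if not record:
        return False
    with open(log_path, "a", encoding="utf-8") as log:
        # Prefix with the receive time so arrival order can be audited.
        log.write(f"{int(time.time())} {record}\n")
    return True

def main():
    # Hypothetical log location; adjust for the local site.
    written = handle_event(sys.stdin, "/tmp/moab-events.log")
    return 0 if written else 1
```

When saved as an executable script, calling main() under the usual `if __name__ == "__main__":` guard makes it usable as the exec target.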
showevents example
Displays events found in the events files
Part of Moab's tools directory
Should be a last resort for getting the information you need
> /usr/local/tools/showevents.pl -n 4
Thu May 29 11:14:24 2008 job j6062 JOBSTART
11:14:23 1212081264 job j6062 JOBSTART 0 4 eng1 nobody 3600 Idle [NONE] 1212079311 1212081264 1212081264 1212081264 - >= 0M >= 0M - 1212079311 4 0 -:Standard - Engineering 9 0.00 ALL 1 0M 0M 0M 0 2140000000 node009,node009,node008,node008 local - [DEFAULT] - - 0.00 - - - 0 - -
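When events have to be processed by script, the leading fields of each record are usually enough to filter and count them. The sketch below assumes, based only on the sample record above, that a record begins with a clock time, an epoch timestamp, an object type, an object ID, and an event type; the remaining fields are kept unparsed. The names Event and parse_event are hypothetical.

```python
from collections import namedtuple
from datetime import datetime, timezone

Event = namedtuple(
    "Event", "clock epoch object_type object_id event_type details"
)

def parse_event(line):
    """Split an event record into its leading fields.

    Field meanings are inferred from the sample record, not from a
    format specification; everything after the fifth field is returned
    as an uninterpreted list.
    """
    fields = line.split()
    if len(fields) < 5:
        raise ValueError(f"too few fields in event record: {line!r}")
    clock, epoch, object_type, object_id, event_type = fields[:5]
    return Event(clock, int(epoch), object_type, object_id, event_type, fields[5:])

# Start of the record from the showevents example above.
record = "11:14:23 1212081264 job j6062 JOBSTART 0 4 eng1 nobody 3600 Idle"
event = parse_event(record)
when = datetime.fromtimestamp(event.epoch, tz=timezone.utc)
print(event.object_id, event.event_type, when.isoformat())
```

Using the epoch field rather than the clock field avoids ambiguity about dates and time zones when records from several days are combined.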
Policy Evaluation
Simulation Mode
Monitor Mode
Side-by-Side Mode
[Diagram: the three evaluation modes, with boxes labeled Moab, MoabMonitor, Production Scheduler, Trace Files, RM, and Cluster]
Discussion