
QCT Clustered Computing using LSF

An Introduction to LSF for QCT Engineering


revision 1.19 (1/27/2006)
2
About LSF
Platform LSF (Load Sharing Facility) software enables efficient use of
resources through one common interface.
LSF acts as the glue that makes license resources, hosts, CPUs, operating
systems, memory, and other resources available for jobs.
Without LSF, everyone would have to manually coordinate who gets
access to which resources.
LSF acts as a master scheduler and coordinates access to resources
based on management-defined policy.
LSF balances the workload across the compute pools.
Users submit jobs to the LSF system, which uses priorities, job requirements,
and resource availability to determine where to place each job.
3
About LSF
QCT runs LSF to:
Dynamically add resources based on customer need (hosts, memory, etc.).
Allow projects with aggressive tapeout schedules to get priority access to
resources.
Provide a seamless compute environment to QCT engineers.
Track resource usage.
Additional future uses:
High availability - restart failed jobs automatically
Manage licenses between groups
Use idle resources at remote sites
4
What LSF Does
LSF collects information on the status of all resources in the cluster
(a resource can be any hardware or software entity):
Workstations (Solaris, HPUX, Linux) and servers (Unix, HPUX, Linux)
License availability, free memory
Idle time, number of CPUs
Host status, disk I/O rate
Available disk space
5
What LSF Does
Workstations (Unix, HPUX, Linux) and servers (Unix, HPUX, Linux) running
tools such as vsim, BuildGates, Calibre, Verilog, vcs, and Design Compiler
for projects such as SDDR, High Road, and Fast I.O.
LSF chooses the ideal, available, and appropriate resource to process the
computational job (i.e., specific hardware, O/S, and software license).
6
QCT USA LSF Architecture
Campbell
75 hosts
San Diego
1,800 hosts
Raleigh
200 hosts
Austin
50 hosts
7
Overview of LSF Job Flow
Gather Job Data -> Submit Job (command wrapper, bsub) -> LSF Queue ->
LSF Cluster -> Monitor Job -> Job Done (e-mail, app log) -> Analyze Output
8
Basic Job Flow
Gather job requirements
Submit job into LSF
The job will PEND in a queue until the resources for the job are available
When resources are available, the job is dispatched to the best available
execution host(s)
LSF re-creates your environment on the execution host(s) exactly as it
was when the job was submitted, then starts the job.
The user can monitor the job while it is running
When the job is complete, results are e-mailed to the user or written to an output file
9
LSF Terminology
Server - a host where jobs run.
Can submit, query, and execute jobs.
Client - does NOT execute or run LSF jobs.
Can only submit and query jobs. (login servers, sunray servers)
Queue - a container for jobs. (Like checkout lines at a market)
Cluster name - ICEng for San Diego, CAM for Campbell, RTP for
Raleigh, AUS for Austin, TEST for the San Diego test cluster
Job - a command submitted to LSF for execution. LSF schedules,
controls, and tracks the job according to configured policy.
Job slot - a bucket into which a single unit of work is assigned. Each
host has a configured number of job slots. (We configure 1 job slot per
physical processor.)
10
Gathering Data
Before submitting jobs to LSF, gather data about the program you want
to submit.
What operating system does it run best on? (Linux, Linux 64-bit, Solaris)
Which project is this job for?
How much memory does the job require?
Which license features does your job need to run?
Can your job run in parallel across multiple machines / processors?
Can your job be broken into logical, smaller chunks for faster processing?
Check the QCT tool dependency database:
http://qctweb1.qualcomm.com/Resources/Dependency
Many wrappers may already encode job submission with the proper resource
requests.
11
Submitting to LSF - Which Queue?
You submit jobs into an LSF queue, which specifies user access and
priorities to resources.
SunRay and MFU users have a specialized queue - app
priority/priority_linux - urgent or final tapeout-related jobs
normal/linux - regular jobs
regression_queue - regression jobs
A few application queues (hspice, smartest, etc.) also exist.
Use bqueues -w for a listing of queues and data, or
bqueues -l {queue_name} for details on a queue and its listed administrators
12
Submitting to LSF - Which Queue?
On a sunray or MFU server (must submit all jobs through LSF):
Interactive design work and interactive simulations - use the app queue
Priority queues (priority, priority_linux - to be combined)
Regular queues (normal, linux, idle)
On a desktop workstation (recommended to submit all jobs through LSF):
Regular queues (normal, linux, idle, etc.)
Find your authorized queues: bqueues -u myusername
13
LSF Queues
app - interactive jobs for users with a sunray or MFU session
priority, priority_linux - urgent solaris or linux jobs, or final tapeout-related
work
normal/linux - regular queues
idle - desktop solaris hosts only (this is the default queue, lowest priority)
short - jobs running longer than 30 minutes are killed automatically (SPARC only)
hspice - hspice jobs only
smartest - special queue for the Teradyne tester group
night - jobs submitted to this queue only run at night
regression_queue - regression jobs
14
Access to resources
Access to LSF resources is determined by Engineering Management.
QCT Engineering Services only implements the policy given to us.
- normal, linux, idle, and night queues are accessible to all QCT engineers.
- The app queue is only accessible from sunray or MFU servers.
For project queues, ask the queue administrator for access:
bqueues -l {queue_name} and look for the ADMINISTRATORS line.
15
Submitting your job - bsub
The bsub command submits your job to LSF for processing.
your_host> bsub -q {queue name} {resource / options} {command}
With bsub, you define the resources and options needed for your job to
run.
Many wrappers produce the desired bsub command with the appropriate
resource options.
You can also specify job run options, such as output files and suppressing
LSF e-mail notification.
LSF allows for parallel job processing.
LSF dispatches the job to the host that best matches your request. The
more restrictive your resource request, the longer it will take to place the
job.
LSF dispatches to hosts until the job slots are full or resources are
exhausted (CPUs loaded, no available application licenses, etc.)
16
QCT LSF Compute Policy
All compute-intensive jobs that use EDA tool licenses must go through LSF
No camping out on LSF compute servers
Jobs submitted through LSF must specify the necessary license resources
http://qctes.qualcomm.com/twiki/bin/view/QCTES/UsagePolicy
17
Submitting your job
The bsub command is broken into several sections:
LSF options:
-q {queue_name} to specify the queue to dispatch the job to.
-o to designate an output file for job output.
-n for processor spanning / parallel jobs.
-I | -Ip | -Is for interactive jobs. (pseudo-terminal, and pseudo-terminal with
shell support)
Many other options are covered in the product documentation.
Resource request strings: -R, specifying select and rusage statements.
select specifies the characteristics a host must have to be considered a
potential execution host.
rusage specifies which resources the job will consume once a host is
selected.
Path and command to run.
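The sections above can be sketched as a small wrapper that assembles a bsub command line. This is a minimal illustration, not a real QCT wrapper: build_bsub, the queue name, and the resource strings are all hypothetical examples.

```shell
#!/bin/sh
# Sketch: assemble a bsub command line from the sections described above
# (queue, output file, -R resource string, then the command itself).
# All names here are examples, not real QCT wrappers or queues.
build_bsub() {
    queue=$1; outfile=$2; select_str=$3; rusage_str=$4
    shift 4
    printf 'bsub -q %s -o %s -R "select[%s] rusage[%s]" %s\n' \
        "$queue" "$outfile" "$select_str" "$rusage_str" "$*"
}

# Example: a linux job needing 2 GB, output to a per-job file
build_bsub linux job.%J.out 'type==LINUX' 'mem=2000' ./myjob
```

A wrapper like this keeps the resource request in one place, so every user of a tool submits with the same, correct select and rusage strings.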
18
bsub examples
bsub -q eagle_queue -Ip myjobscript
bsub -q dove_queue -R
"select[(realpower>=1) && (type==SPARC)]
rusage[mem=1000, realpower=1:duration=1]" myjob
bsub -R "select[select-string] rusage[rusage-string]" ...
Specify the requirements your job needs to run and let LSF place the job.
If you don't select any resources, your job will be dispatched to any
available host in the queue you specify.
Queues are mixed-architecture. Specify the resources needed for your
application.
For multi-processor jobs: bsub -n 4 (4 CPUs required)
bsub -n 4 -q eagle_queue myjob
19
Common Resource Select Options
bsub -q saber_queue -R "select[compute]" myjob
Run the job only on a solaris compute server (a host in the computer server room)
Use compute when you don't want your job to be suspended.
bsub -q idle -R "select[desktop]" myjob
Run the job only on a desktop workstation (may be suspended)
bsub -R "select[type==SPARC]" myjob
Run the job on a solaris host
bsub -R "select[(type==SPARC) && (os_version==2.8)]" myjob
Run the job only on a solaris 2.8 host
bsub -R "select[type==LINUX]" myjob
Run the job on an x86 (32-bit) host running linux
bsub -R "select[(type==LINUX) || (type==SPARC)]" myjob
Run the job on either solaris or linux
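The && and || combinations above can be generated rather than typed by hand. A minimal sketch, assuming nothing beyond POSIX sh (sel_and is a hypothetical helper, not an LSF command):

```shell
#!/bin/sh
# Sketch: compose a select[] expression by AND-ing individual conditions,
# as in the examples above. sel_and is a hypothetical helper.
sel_and() {
    e=$1; shift
    for c in "$@"; do
        e="($e) && ($c)"     # parenthesize each added condition
    done
    printf 'select[%s]\n' "$e"
}

sel_and 'type==SPARC' 'os_version==2.8'
```

This produces the same string as the solaris 2.8 example on this slide, so a script can build progressively stricter requests from a list of conditions.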
20
Common Resource Select Options
Old       New                                    Meaning
os2_8     (type==SPARC) && (os_version==2.8)    solaris 2.8
linux     (type==LINUX)                         linux (x86_32)
linux     (type==LINUX64)                       opteron (x86_64)
linux     (type==LINUXIA64)                     linux (ia64)
desktop - solaris sun workstation in an office
compute - solaris sun server in the server room
mem - free memory (dynamic; changes depending on allocated memory)
maxmem - total memory installed on the host (static; does not change)
cpuf - do not use this resource
21
Common Resource Errors
Do NOT use the compute resource when submitting to linux
queues
Do NOT use the compute resource when submitting to the app
queue
Do NOT use the os2_8 resource when submitting to linux
queues
In the future we may have a script in place to monitor users for submitting
jobs with invalid resource combinations.
We are also working with Platform to prevent submission of jobs with invalid
resource combinations.
22
Rusage Options
rusage virtually reserves the specified resources for your job
rusage[xsim=1:duration=1, mem=10000]
rusage[xsim=1:duration=1, ysim=1:duration=1, zsim=1:duration=1]
LSF does not check out a license from the license server; only your
application can do that. The LSF scheduler virtually reserves the license
resource for the specified duration to give your application time to start
and check out the license itself. This prevents new jobs from being
dispatched and taking the license.
LSF monitors your job and decrements the reserved memory by the
actual memory used by the job.
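Wrappers often assemble the rusage string above from the tool's license feature and memory need. A minimal sketch (mk_rusage is hypothetical; xsim is the example license resource from this slide):

```shell
#!/bin/sh
# Sketch: build a rusage[] string for one license feature plus memory,
# matching the form shown above. mk_rusage is a hypothetical helper;
# the duration=1 gives the application one minute to check the license out.
mk_rusage() {
    lic=$1; mem_mb=$2
    printf 'rusage[%s=1:duration=1, mem=%s]\n' "$lic" "$mem_mb"
}

mk_rusage xsim 10000
```

The output can then be dropped straight into a -R option on the bsub line.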
23
Rusage Memory and Licenses
Ensure your job gets the necessary memory with rusage:
bsub -R "rusage[mem=16000]" myjob
All jobs requiring large amounts of memory should use rusage; otherwise
another job could be placed on the host and cause your job to crash.
Be sure to use mem in your rusage (NOT maxmem)
24
Job submission
Submit with -Is (interactive shell support)
Creates a pseudo-terminal with shell mode support (handles CTRL-C
and CTRL-Z properly) and sends output to the terminal.
(type==SPARC) || (type==LINUX)
Candidate execution hosts can be either solaris or linux.
Shell limits - for long-running simulations, your wrapper may need to
unlimit settings from the shell: unlimit cputime; unlimit datasize
25
Prohibited Job Submission
All jobs should be submitted through the LSF batch system via bsub
Not allowed: submitting xterms or shells to project queues or compute
servers. (Only allowed in the app queue.)
Never allowed: use of lsrun, lsgrun, lslogin, or ch
Users running xterms or shells (tcsh/csh/bash) on compute hosts may
cause their own jobs or other users' jobs to crash. Any process
launched from the shell may steal the CPU, memory, etc. from
another process.
26
Checking job status
Once your job is submitted, you can check on the status of the job with
the bjobs command.
The command by itself shows your jobs.
bjobs -u all shows all jobs in the system.
bjobs -u all -q {queue_name} shows all jobs in that queue.
bjobs -u all -m {hostname / host group} shows all user jobs on a single host
or host group.
A job can be in several states:
PEND - not yet started
RUN - job is running
USUSP - suspended by user or admin
SSUSP - suspended by the system (load threshold, run window)
DONE - job completed
EXIT - job completed with non-zero status
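The STAT column of bjobs output can be tallied per state with a one-line awk filter. A quick sketch; the sample lines below are fabricated for illustration, not real cluster output:

```shell
#!/bin/sh
# Count jobs per state from bjobs-style output (column 3 is STAT).
# The sample data is made up for illustration.
bjobs_sample='JOBID USER STAT QUEUE
101 user1 RUN linux
102 user2 PEND linux
103 user3 RUN normal
104 user4 SSUSP linux'

# Skip the header row, tally column 3, print sorted state counts
echo "$bjobs_sample" | awk 'NR>1 {n[$3]++} END {for (s in n) print s, n[s]}' | sort
```

The same pipeline works on live output, e.g. `bjobs -u all | awk ...`, to see at a glance how much of a queue is pending versus running.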
27
Checking job status
If your job is pending, it can be pending for any number of causes. Run
bjobs -lp {job_ID} for a detailed description of why it is pending.
The more restrictive your select and rusage statements, the more time it will
take LSF to find a candidate host for you.
LSF does not validate the logic in your select or rusage statements, so if
you request more memory than is available on any LSF host, your job will
pend forever.
Your job may also pend because a license feature is not available for the
job.
Your job may also be pending because you have reached your limit on a
queue, and LSF has throttled the number of jobs it can dispatch for you.
28
Job status commands
bjobs
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
1869442 user1 RUN phoenix_qu kenvil compute-spa xterm Sep 15 03:42
1979970 user2 RUN devo_queue bridger compute-spa *c_new1.do Sep 17 10:42
1879711 user3 RUN dora65_que olgly compute-x86 *04.12-SP3 Sep 15 09:20
1895128 user4 RUN raven_queu oreana compute-x86 *libre_drv Sep 15 19:30
1880282 user5 RUN phoenix_qu kamoro compute-x86 *aPlace.sc Sep 15 10:05
1888906 user6 RUN dora65_que sr-san-08 compute-x86 *no_ar.run Sep 15 15:48
1977333 user7 RUN conan_queu sr-san-18 compute-x86 *n_sta.tcl Sep 17 07:58
1893440 user8 RUN phoenix_qu kamoro compute-x86 *ace_topo Sep 15 18:23
2025726 user9 RUN phoenix_qu mfu-san-02 compute-x86 *n SUSE.64 Sep 18 23:01
29
Job status commands
bjobs -u all -q eagle_queue
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
591912 user1 RUN eagle_queu lisbon strontium */eagle.pt Feb 15 16:47
632827 usre2 RUN eagle_queu adelaide strontium xterm Feb 17 10:18
645065 user3 RUN eagle_queu adelaide strontium xterm Feb 17 15:07
645769 user4 RUN eagle_queu olgly strontium *_9_CLK $* Feb 17 15:20
633299 user5 RUN eagle_queu lisbon rubidium *_oen_dout Feb 17 11:10
633343 user6 RUN eagle_queu harper tsuchiura realpowerX Feb 17 11:20
645642 user7 RUN eagle_queu lisbon evaki *_clk_dout Feb 17 15:1
bjobs -lp {job_ID}
Job <636830>, User <johndoe>, Project <default>, Status <PEND>, Queue <linux>,
Job Priority <50>, Command <./gpssim-fast-linux ../Phoenix
_BW.cfg GPS_N20_M176_C12.20_101.cfg>
Tue Feb 17 12:47:41: Submitted from host <ackerly>, CWD </prj/gpsone/systems/si
ms/egps_roc/test_unif_pfa>, Output File </dev/null>;
PENDING REASONS:
User has reached the per-user job slot limit of the queue;
30
Job output
Your job may complete in either the DONE or EXIT state.
DONE indicates the job ran to successful completion.
EXIT indicates it exited for a given reason / exit code.
Exit codes can be LSF- or application-specific. Common reasons for exiting are
failure to obtain a license (license resource not specified in rusage) or
problems with the LSF execution host. Re-submit your job, or if you see an
unknown exit code, notify vlsi.unix.help.
When your job completes, you will receive e-mail notification from LSF.
It contains a summary of job output and statistics.
E-mail notification is disabled when you specify an output file (-o) or direct
output to /dev/null.
Check your application log for further output.
31
LSF User Fairshare
Some queues have fairshare enabled (linux, hspice, regression_queue)
Dynamic priority is based on recent job history and other similar
pending jobs.
Jobs in fairshare queues are not dispatched first-come, first-served.
Use bqueues -l to see user priority in a queue.
Priority updates based on new and completed jobs.
This allows users who submit a small number of jobs to go ahead of others.
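The fairshare idea can be illustrated with a toy calculation: priority falls as a user's current usage rises. This is NOT LSF's actual formula (which also weighs CPU time, run time, and configurable factors; see the Platform documentation); it only shows the shape of the behavior.

```shell
#!/bin/sh
# Toy fairshare illustration: shares divided by a usage term.
# Deliberately simplified; LSF's real dynamic-priority formula differs.
priority() {
    shares=$1; running_jobs=$2
    # Scale by 100 to keep integer arithmetic in sh
    echo $(( shares * 100 / (1 + running_jobs) ))
}

priority 10 0    # user with no running jobs: high priority
priority 10 4    # same shares, 4 jobs already running: lower priority
```

A heavy user's next job therefore sorts behind a light user's, which is why fairshare queues are not first-come, first-served.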
32
LSF Host Status
To check the status of LSF hosts, you can run any of the following
commands:
bhosts -w shows the general LSF state of each host, whether it is accepting
jobs, and what job slots / jobs are running on the host.
lshosts -w shows the architecture type / model of the hosts within LSF,
the CPU factors, number of CPUs, memory, and defined LSF resources (to
use with a select statement).
Both commands accept -l {hostname} for detailed status.
Ex: bhosts -l strontium
Note: even with the -w option, lsload will still truncate long hostnames.
33
LSF Host Status
Check on the status of LSF hosts by using the bhosts -w command.
HOST_NAME STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV
abbotsford closed_Adm - 1 0 0 0 0 0
ashtabula ok - 1 0 0 0 0 0
beni closed_Full - 1 1 1 0 0 0
beryllium ok - 4 3 3 0 0 0
bray unavail - 1 0 0 0 0 0
jakarta closed_Busy - 2 1 0 1 0 0
The numbers indicate the maximum number of jobs a host can run, the
number it is running, and the jobs in each state.
A host may be in several states:
ok - everything on the host is OK and it is ready for jobs.
closed_Adm - the host is undergoing sys admin maintenance.
closed_Full | closed_Busy - the host has no open job slots to dispatch
jobs to, or a load threshold has been exceeded.
unavail - LSF cannot communicate with the host.
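To find hosts that are actually ready to take jobs, the STATUS column of bhosts -w output can be filtered with awk. A sketch using sample lines from the table above:

```shell
#!/bin/sh
# Sketch: list hosts ready for jobs (STATUS == ok) from bhosts -w style
# output. Sample lines are taken from the slide above.
bhosts_sample='HOST_NAME STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV
abbotsford closed_Adm - 1 0 0 0 0 0
ashtabula ok - 1 0 0 0 0 0
beni closed_Full - 1 1 1 0 0 0
beryllium ok - 4 3 3 0 0 0
bray unavail - 1 0 0 0 0 0'

# Print the host name wherever column 2 is exactly "ok"
echo "$bhosts_sample" | awk '$2 == "ok" {print $1}'
```

On a live cluster the same filter is simply `bhosts -w | awk '$2 == "ok" {print $1}'`.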
34
LSF Host Status
lshosts -w
lshosts -w {hostname}
lshosts -w -R {resource}
HOST_NAME type model cpuf ncpus maxmem maxswp server RESOURCES
abbotsford SPARC Ultra_60_1 12.0 1 512M 839M Yes (desktop os2_8)
ashtabula SPARC Ultra_60_1 12.0 1 1024M 3667M Yes (ikos os2_8)
beryllium SPARC Sun_Fire_480R_4 40.0 4 32768M 4038M Yes (compute os2_8)
compute-ia64-san-001 LINUX64 Itanium2 100.0 4 48777M 8691M Yes ()
mottelson LINUX XEON_2 99.0 2 5940M 4000M Yes (linux linux_2_4)
schwarzes023 LINUX Opteron_246_2 100.0 2 8028M 1027M Yes (linux linux_2_4)
Use any of the information in the output to help correctly format your
bsub select statement.
35
Job Control Commands
After your job is submitted to LSF, you can run a series of tools to control
your job.
The bkill command terminates your job. If you kill a running job, you will
still receive LSF output, with an exit code indicating the job was terminated.
Ex: bkill {job_ID}
The bmod command modifies your job submission. It is best
used to modify a job that is pending.
While a job is pending, all bsub options can be modified.
Only certain options can be modified for a running job, including the resource
requirement string, CPU and memory limits, and job output options.
Ex: bmod {bsub options} {job_ID}
36
Job History
LSF keeps a log of all submitted jobs. Several LSF commands pull job
information and summary statistics.
The bhist command shows a detailed history of jobs. It can be used to
get bjobs-like status information in a brief format, or more detailed
information about jobs.
You can specify one or multiple job IDs to report on.
You can specify queues, users, projects, hosts, or host groups that jobs ran
in/on.
You can specify date ranges for when jobs completed, were dispatched, or
started.
LSF keeps job information in memory for 1 hour. If your job is older
than that, you will need to tell LSF to search all log files with the -n 0
option.
The bhist command can be submitted to LSF just like other jobs, and
you will get the output in e-mail.
37
bhist
bhist {job_ID}
bhist -l {job_ID}
Shows summary job information or detailed job status (by time) for that job.
bhist -q {queue_name} -u {username} -P {project identifier} -l
Produces similar output for selected queues, users, and/or projects.
To specify a timeframe, use the time format YYYY/MM/DD/HH:MM with the
following options:
Dispatched during a specified time: -D{time0},{time1}
Completed or exited during a specified time: -C{time0},{time1}
Started during a specified time: -S{time0},{time1}
All jobs (all states) during a specified time: -T{time0},{time1}
In most cases, the -n 0 option will also be needed.
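The time-window options above share one shape: a flag immediately followed by two timestamps separated by a comma. A sketch that formats such a window (time_window is a hypothetical helper; the timestamps match the bacct example later in the deck):

```shell
#!/bin/sh
# Sketch: format a bhist/bacct time-window option, e.g. -C{time0},{time1},
# using the YYYY/MM/DD/HH:MM format. time_window is a hypothetical helper.
time_window() {
    flag=$1; t0=$2; t1=$3
    printf '%s%s,%s\n' "-$flag" "$t0" "$t1"
}

# Jobs that completed or exited during Jan 1, 2004
time_window C 2004/01/01/00:00 2004/01/02/00:00
```

Generating the option this way avoids hand-typing the fiddly flag-plus-timestamps syntax in reporting scripts.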
38
bhist output
JOBID USER JOB_NAME PEND PSUSP RUN USUSP SSUSP UNKWN TOTAL
678129 johndoe *M_FIXED 11 0 60788 0 0 0 60799
678133 johndoe *M_FIXED 22 0 60770 0 0 0 60792
678220 johndoe *M_FIXED 51 0 60183 0 495 0 60729
678233 johndoe *M_FIXED 88 0 51132 0 9497 0 60717
678247 johndoe *M_FIXED 81 0 23972 0 36657 0 60710
Job <678129>, User <johndoe>, Project <default>, Command <#!/bin/bash;/prj/vlsi/ch
eetah/systems/bin/CHEETAH_SIM_FIXED_7.1-linux -ref 2>&1 >
results.0217.232007>
Tue Feb 17 23:20:12: Submitted from host <pauling>, to Queue <linux>, CWD </usr
/local/projects/vlsi/saber/systems/users/nyee/CSIM_PERFORM
ANCE/DL/cfg_cheetah_NM_sttd/test20805_EcIor-12>, Requested
Resources <type=any>;
Tue Feb 17 23:20:23: Dispatched to <skou>;
Tue Feb 17 23:20:23: Starting (Pid 1307);
Tue Feb 17 23:20:23: Running with execution home </usr2/johndoe>, Execution CWD </
usr/local/projects/vlsi/saber/systems/users/johndoe/CSIM_PERF
ORMANCE/DL/cfg_cheetah_NM_sttd/test20805_EcIor-12>, Execut
ion Pid <1307>;
Summary of time in seconds spent in various states by Wed Feb 18 16:17:03
PEND PSUSP RUN USUSP SSUSP UNKWN TOTAL
11 0 61000 0 0 0 61011
39
Job History
Another command is bacct, which shows summary statistics as well as
job output (like bhist).
It takes the same options as bhist, allowing you to report by:
One or multiple job IDs.
Queues, users, projects, hosts, or host groups that jobs ran in/on.
Date ranges for when jobs completed, were dispatched, or started.
Unlike bhist, bacct automatically searches through all LSF accounting
files.
The bacct command can be submitted to LSF just like other jobs, and
you will get the output in e-mail.
40
bacct command
bacct {job_ID}
bacct -l {job_ID}
Shows summary job information or detailed job status (by time) for that job.
bacct -q {queue_name} -u {username} -P {project identifier} -l
Produces similar output for selected queues, users, and/or projects.
To specify a timeframe, use the time format YYYY/MM/DD/HH:MM with the
following options:
Dispatched during a specified time: -D{time0},{time1}
Completed or exited during a specified time: -C{time0},{time1}
Started during a specified time: -S{time0},{time1}
41
bacct command
Accounting information about jobs that are:
- submitted by all users.
- accounted on all projects.
- completed normally or exited
- completed between Thu Jan 1 00:00:00 2004 and Fri Jan 2 00:00:00
2004 - executed on all hosts.
- submitted to all queues.
------------------------------------------------------------------------------
SUMMARY: ( time unit: second )
Total number of done jobs: 14443 Total number of exited jobs: 1892
Total CPU time consumed: 12806144.0 Average CPU time consumed: 784.0
Maximum CPU time of a job: 401677.1 Minimum CPU time of a job: 0.0
Total wait time in queues: 32101168.0
Average wait time in queue: 1965.2
Maximum wait time in queue:746718.0 Minimum wait time in queue: 0.0
Average turnaround time: 3206 (seconds/job)
Maximum turnaround time: 875735 Minimum turnaround time: 3
Average hog factor of a job: 0.74 ( cpu time / turnaround time )
Maximum hog factor of a job: 1.88 Minimum hog factor of a job: 0.00
Total throughput: 680.95 (jobs/hour) during 23.99 hours
Beginning time: Jan 1 00:00 Ending time: Jan 2 00:00
bacct u all C2004/01/01/00:00,2004/01/02/00:00
42
LSF Analytics
QCT-ES has installed Platform Analytics, which captures and reports LSF
and license information.
It provides LSF data reporting as well as license usage reporting.
QCT-IT uses this information to:
Predict future usage.
Analyze and fix resource bottlenecks.
Purchase host resources to add to the cluster to make job placement faster.
Understand license usage patterns to purchase or re-mix license features.
Data is available to all QCT employees who are interested.
E-mail analytics for report requests.
The portal is accessible from http://lsf.qualcomm.com
QCT-IT provides customized reporting services for CAD management
and chip leads.
43
LSF Analytics - Architecture
License servers (FLEX, lmstat) and LSF clusters in San Diego and at other
sites each feed an agent and data file; the data is loaded via ETL into an
Oracle database in San Diego, from which Cognos / web reports are generated.
44
LSF Analytics Sample Report
45
Recent Job Memory Trends - jobs using 4-8G of memory (8/2003-8/15/2004).
[Chart: number of jobs per two-week period, 7/27/2003 through 8/8/2004,
ranging from 0 to 160.]
46
QCT-ES Role in LSF
QCT-ES provides help with your LSF job needs. We will troubleshoot
issues you have with running LSF.
We monitor the cluster for performance, improving LSF performance
when needed.
We make modifications to LSF through host adjustments, queue
changes, user changes, and LSF settings.
We physically maintain the hosts within the LSF cluster.
We work with engineers to understand their compute needs and plan
appropriate resources to meet those needs.
We install and monitor license keys / resources.
We work with engineers on their job wrapper needs.
47
LSF Resources
QCT-ES LSF web site, with documentation, vendor docs, and FAQs:
http://lsf.qualcomm.com
LSF man pages: man {lsf_command}
Platform LSF documentation (pdf and html)
Additional LSF "slice of knowledge" training sessions
PERL AVL:
http://qctes.qualcomm.com/twiki/bin/view/QCTES/QCTESAvlDocs
On the QCT-ES web site:
- FAQ of LSF questions
- Introduction to LSF guide (pdf and html) by Platform Computing
- Much more
48
Common problems
Batch system not responding
Reconfiguration in progress
LSF event file rotation in progress
Failed simulation
No license, out of disk space, linux automounter/nfs, host out of memory
Disk quota (home directory)
linux x86 kernel file size limit
Long pend times
All hosts busy
Invalid resource options
Out of licenses
Queue limit
49
How to Get Help
E-mail vlsi.unix.help for any issues.
When submitting a ticket, please include as much information about your
job as possible. For example:
Provide the job ID and the output of bjobs -lp {job_ID} so we can see how
the job is run.
Provide the host / queue you are having problems with.
Provide the license resource that is unavailable that you are trying to use.
Provide any LSF or application exit codes you received.
Provide the output of your program log for additional support.
Be descriptive and explain your issue.
http://lsf.qualcomm.com
