Cluster Computing at PIK

a tutorial

Ciaron Linstead

10th May 2016


Introduction Environment Modules SLURM Python Documentation Questions

Outline

1 Introduction
2 Environment Modules
3 SLURM
4 Python
5 Documentation
6 Questions

Ciaron Linstead IT Services 2


Introduction

Cluster configuration
Environment modules
SLURM - the workload scheduler
Create, submit and monitor jobs
Anaconda Python environment
Documentation

Introduction

Logged into the cluster?
Run a Python, R or shell script?
Compiled and run C, C++ or Fortran code?
Submitted jobs via SLURM?
Comfortable on the command line (cd, ls, mkdir)?
Downloaded, compiled and installed 3rd-party software on Linux?

Cluster configuration
Servers:
Compute: 312x (2x Intel Xeon E5-2667v3)
RAM+: 4x (compute node with 256GB RAM, 16GB per core)
GPU: 2x (RAM+ node with NVIDIA Tesla K40c)
Login: 2x (2x E5-2690v3 (12-core), 256GB RAM)
Visualisation: 2x (2x E5-2680v3 (12-core), 256GB RAM)
Interconnect:
Mellanox FDR Connect-IB InfiniBand (non-blocking)
3.5x the throughput of the iPlex InfiniBand
0.25x the latency of the iPlex InfiniBand
Filesystem:
2 petabytes of GPFS (parallel) filesystem storage
/p/projects, /p/tmp, /p/system

Compute node configuration

2x Intel Xeon E5-2667v3 "Haswell" CPUs
8 cores per CPU, 16 cores total
64GB TruDDR4 2133MHz DRAM (4GB per core)
No local disk
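The per-core memory figures follow from simple arithmetic. Here is a minimal Python sketch that uses only the numbers on these slides. It counts just the 312 compute nodes listed; whether the RAM+ and GPU nodes are included in that count is left open.

```python
# Figures taken from the cluster and compute node slides.
COMPUTE_NODES = 312        # compute nodes listed on the configuration slide
SOCKETS_PER_NODE = 2       # 2x Intel Xeon E5-2667v3
CORES_PER_SOCKET = 8       # 8 cores per CPU
RAM_STANDARD_GB = 64       # standard compute node
RAM_PLUS_GB = 256          # RAM+ and GPU nodes

cores_per_node = SOCKETS_PER_NODE * CORES_PER_SOCKET
total_cores = COMPUTE_NODES * cores_per_node
gb_per_core = RAM_STANDARD_GB / cores_per_node
gb_per_core_ram_plus = RAM_PLUS_GB / cores_per_node

print(cores_per_node, total_cores, gb_per_core, gb_per_core_ram_plus)
# → 16 4992 4.0 16.0
```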

Compute node - RAM+

Regular compute nodes with 256GB memory
16GB per core
4 available
large-memory jobs get priority (but regular jobs can also run here)
request with --partition=ram_gpu

Compute node - GPU

Regular node + 256GB RAM + NVIDIA Tesla K40c GPU
2 available
1.66 TFLOPS (peak) per card, plus the node's 16 CPU cores
CUDA/OpenCL interfaces for programming
request with --partition=ram_gpu --gres=gpu:1

Environment Modules - motivation

Have a look at the search path: echo $PATH
when you run a command, the shell searches these directories in order
and runs the first matching program it finds
so what if I want multiple versions of the same program?
the same goes for library search paths (LD_LIBRARY_PATH), e.g. for NetCDF or MPI libraries
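The lookup rule can be sketched in a few lines of Python. This is a simplified model: it ignores shell aliases, functions, builtins and the shell's command hash table. The standard library's shutil.which already performs the same search.

```python
import os


def find_command(name, path=None):
    """Return the first executable called `name` on the search path, or None.

    Directories are tried left to right and the first match wins, so any
    other version further down the list is shadowed.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        # An empty PATH entry means the current directory.
        candidate = os.path.join(directory or ".", name)
        # A match must be a file with execute permission.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    print(find_command("python3"))
```

Swapping the order of two directories in PATH changes which version find_command (and the shell) picks. Environment modules manage this reordering for you.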

Environment Modules

list available software: "module avail"

-------------------- /p/system/modulefiles/compiler -----------------------------------
compiler/gnu/4.9.2 compiler/gnu/5.2.0 compiler/intel/15.0.3 compiler/intel/16.0.0

"module load compiler/intel/16.0.0" - load a specific version
"module load compiler/intel" - load the default version
15.0.3 or 16.0.0? Or 17.0.0, once installed? Careful: the default can change, so give the full version
"module list" - what’s loaded?
"module unload compiler/intel/16.0.0" - unload one module
"module purge" - unload all modules
"module show compiler/intel/16.0.0" - what does this module do?

modules - an example modulefile

module show nco/4.5.0


--------------------------------------------------------------
/p/system/modulefiles/tools/nco/4.5.0:

module-whatis Enable usage for nco version 4.5.0


setenv NCOROOT /p/system/packages/nco/4.5.0
prepend-path PATH /p/system/packages/nco/4.5.0/bin
prepend-path INCLUDE /p/system/packages/nco/4.5.0/include
prepend-path LD_LIBRARY_PATH /p/system/packages/nco/4.5.0/lib
prepend-path MANPATH /p/system/packages/nco/4.5.0/share/man
--------------------------------------------------------------
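To see what loading this module does, here is a simplified Python model of the two directives shown, setenv and prepend-path. It applies them to a dictionary standing in for the shell environment. This illustrates the effect only; the real module command also handles unloading, conflicts and many other directives.

```python
import os


def apply_modulefile(env, lines):
    """Return a copy of `env` with simplified modulefile lines applied.

    Only two directives are modelled:
      setenv VAR value      -> VAR = value
      prepend-path VAR dir  -> VAR = dir:<old value>
    Any other line (e.g. module-whatis) is ignored.
    """
    new_env = dict(env)
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue
        directive, var, value = parts[0], parts[1], parts[2]
        if directive == "setenv":
            new_env[var] = value
        elif directive == "prepend-path":
            old = new_env.get(var)
            new_env[var] = value + os.pathsep + old if old else value
    return new_env


nco = [
    "module-whatis Enable usage for nco version 4.5.0",
    "setenv NCOROOT /p/system/packages/nco/4.5.0",
    "prepend-path PATH /p/system/packages/nco/4.5.0/bin",
    "prepend-path LD_LIBRARY_PATH /p/system/packages/nco/4.5.0/lib",
]
env = apply_modulefile({"PATH": "/usr/bin:/bin"}, nco)
print(env["PATH"])  # → /p/system/packages/nco/4.5.0/bin:/usr/bin:/bin
```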

Environment Modules

infrastructure modules are loaded automatically
modulefiles live in /p/system/modulefiles, organised by category
add your own modulefile directories via $MODULEPATH:
"export MODULEPATH=$MODULEPATH:$HOME/modulefiles"
add this line to your ~/.bashrc to make it permanent

Exercise - Let’s add a custom module!

Download and build a library
Install it in our $HOME directory
Write a modulefile for it
recommendation: install in
/some/path/<compiler>/package_name/version/
e.g. /home/linstead/software/intel/gmp/6.1.0
Preparation (copy the exercise files):
cp -r /home/linstead/phd16/ $HOME

Hints - build

cd
mkdir -p software/gmp/6.1.0 && cd software/gmp/6.1.0
tar xvf $HOME/phd16/gmp-6.1.0.tar.xz
cd gmp-6.1.0
module load compiler/gnu/5.2.0
./configure --prefix=$HOME/software/gmp/6.1.0
make && make install

Hints - module

mkdir -p $HOME/modulefiles/gmp
cp /home/linstead/modulefiles/gmp/6.1.0 $HOME/modulefiles/gmp
edit $HOME/modulefiles/gmp/6.1.0 to match your installation
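If you prefer to generate the modulefile rather than copy and edit one, a small Python sketch can write it using the same directives as the nco example. Three details are assumptions:
- The GMPROOT variable name is hypothetical, chosen by analogy with NCOROOT.
- The include/lib entries assume the default configure layout under --prefix.
- The #%Module1.0 first line is the marker Environment Modules expects at the top of a modulefile.

```python
import os


def make_modulefile(prefix, name, version):
    """Return modulefile text for a library installed under `prefix`."""
    lines = [
        "#%Module1.0",
        f"module-whatis Enable usage for {name} version {version}",
        f"setenv {name.upper()}ROOT {prefix}",  # hypothetical, like NCOROOT
        f"prepend-path INCLUDE {prefix}/include",
        f"prepend-path LD_LIBRARY_PATH {prefix}/lib",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    home = os.path.expanduser("~")
    prefix = os.path.join(home, "software", "gmp", "6.1.0")
    print(make_modulefile(prefix, "gmp", "6.1.0"), end="")
```

Redirect the output into place, e.g. `python make_modulefile.py > $HOME/modulefiles/gmp/6.1.0` (the script name is illustrative).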

SLURM - the workload scheduler

Simple Linux Utility for Resource Management (SLURM)
SLURM manages and allocates cluster resources
programs are (typically) submitted as jobs to a queue:
"sbatch myscript.sh"
myscript.sh is a regular shell script, plus SLURM options in #SBATCH lines

SLURM - a simple submit script

#!/bin/bash

#SBATCH --partition=standard
#SBATCH --qos=short
#SBATCH --job-name=sumprimes
#SBATCH --output=sumprimes-%j.out
#SBATCH --error=sumprimes-%j.err
#SBATCH --account=its
#SBATCH --ntasks=1

$HOME/mycode/sumprimes 1 10000000001
submit with sbatch <filename>
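Since the submit script is plain text, it can also be generated programmatically, which is handy when preparing many similar jobs. This minimal Python sketch rebuilds the script above from keyword arguments; the function name make_submit_script is hypothetical.

```python
def make_submit_script(command, **options):
    """Return the text of a SLURM batch script.

    Each keyword argument becomes an "#SBATCH --key=value" line, with
    underscores turned into dashes (job_name -> --job-name).
    """
    lines = ["#!/bin/bash", ""]
    for key, value in options.items():
        lines.append(f"#SBATCH --{key.replace('_', '-')}={value}")
    lines += ["", command]
    return "\n".join(lines) + "\n"


script = make_submit_script(
    "$HOME/mycode/sumprimes 1 10000000001",
    partition="standard",
    qos="short",
    job_name="sumprimes",
    output="sumprimes-%j.out",
    error="sumprimes-%j.err",
    account="its",
    ntasks=1,
)
print(script, end="")
```

Save the printed text to a file and submit that file with sbatch as usual.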

SLURM - the workload scheduler

--partition - the set of physical nodes to run on ("standard" or "ram_gpu")
list partitions with "sinfo"
--qos - the job type ("quality of service"), which determines limits and thus priority
"sacctmgr show qos" (or see my alias ssq)
interesting fields: Name, Priority, MaxWall, GrpTRES, MaxTRES
--job-name
distinguishes between your jobs in the queue
%j (the job ID) in --output/--error gives each job unique filenames

SLURM - the workload scheduler

--output/--error
location of standard output and standard error (e.g. output of "print" statements)
omit --error and STDERR also goes to the --output file
--account
the project this job relates to
run the "groups" command to see the projects you belong to
--ntasks
number of tasks, i.e. copies of the program that srun/mpirun will start (e.g. MPI processes)
--ntasks=1 for serial (non-MPI, non-parallel) jobs

SLURM - parallel (MPI) submit script

as in the previous example, with these differences:

#SBATCH --ntasks=128

module purge
module load mpi/intel/5.1.3
# run parallel code with mpirun
mpirun -bootstrap slurm -n $SLURM_NTASKS $HOME/mycode/sumprimes 0 10000000000

SLURM - the workload scheduler

--ntasks=128
"give me 128 processor cores, I don’t care where"
SLURM will try to pack the tasks onto sockets and nodes
for performance, I may want to control the placement explicitly:
--nodes=8 --tasks-per-node=16
or
--nodes=16 --tasks-per-node=8
(but see next slide!)

SLURM - the workload scheduler

--nodes=16 and --tasks-per-node=8
uses half the cores on each node
the other half are available to other users' jobs
this has implications for memory/disk bandwidth, which is shared with those jobs
--exclusive gets you the whole node
8GB RAM per task with --nodes=16 --tasks-per-node=8
up to 64GB per task on standard nodes (one task per node)
use sparingly! (cores you don't use sit idle)
see also --cpus-per-task
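The trade-offs on this slide are simple arithmetic. This Python sketch assumes standard 16-core, 64GB nodes. For the memory figure it also assumes the job has the node to itself (--exclusive), with memory shared evenly between its tasks.

```python
CORES_PER_NODE = 16   # standard compute node
NODE_RAM_GB = 64      # standard compute node


def layout(nodes, tasks_per_node):
    """Return (total tasks, spare cores per node, GB per task if exclusive).

    Spare cores sit idle under --exclusive; otherwise other jobs may use them.
    """
    total_tasks = nodes * tasks_per_node
    spare_cores = CORES_PER_NODE - tasks_per_node
    gb_per_task = NODE_RAM_GB / tasks_per_node
    return total_tasks, spare_cores, gb_per_task


print(layout(8, 16))   # → (128, 0, 4.0)
print(layout(16, 8))   # → (128, 8, 8.0)
print(layout(1, 1))    # → (1, 15, 64.0)
```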

SLURM examples - threaded (OpenMP) submission


script OpenMP.sh:
#!/bin/bash
# (options omitted for brevity)
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH --cpus-per-task=16

export OMP_NUM_THREADS=16
# OR export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

$HOME/mydir/myprog.exe
# OR srun $HOME/mydir/myprog.exe
submit with sbatch OpenMP.sh
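The same idea works for a Python program that parallelises with a pool of workers instead of OpenMP threads. Size the pool from SLURM_CPUS_PER_TASK, which SLURM sets when --cpus-per-task is given. In this sketch the function names are illustrative. It uses processes rather than threads because CPU-bound pure-Python threads are serialised by the interpreter lock.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def slurm_cpus(default=1):
    """CPUs allotted per task by SLURM, or `default` outside a SLURM job."""
    value = os.environ.get("SLURM_CPUS_PER_TASK")
    return int(value) if value else default


def square(x):
    return x * x


if __name__ == "__main__":
    workers = slurm_cpus()
    with ProcessPoolExecutor(max_workers=workers) as pool:
        total = sum(pool.map(square, range(1000)))
    print(f"{workers} worker(s), sum of squares = {total}")
```

Run it from the OpenMP.sh script in place of myprog.exe; with --cpus-per-task=16 the pool gets 16 workers.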

SLURM - monitoring jobs

show queue: "squeue -u <username>"
alias sq='squeue -u <username>'
export SQUEUE_FORMAT="%.18i %.9P %.8j %.8u %.2t %.10M %.10L %.10l %.6D %.6C %.8q %R"
show details of one job: "scontrol show job <job_id>"

SLURM - monitoring jobs


scontrol show job
JobId=321608 JobName=sumprimes
UserId=linstead(405) GroupId=users(100)
Priority=6858 Nice=0 Account=its QOS=short
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=00:00:05 TimeLimit=1-00:00:00 TimeMin=N/A
SubmitTime=2016-05-04T14:47:13 EligibleTime=2016-05-04T14:47:13
StartTime=2016-05-04T14:47:13 EndTime=2016-05-05T14:47:13
Partition=standard AllocNode:Sid=login01:15215
NodeList=cs-e14c01b[02-05]
BatchHost=cs-e14c01b02
NumNodes=4 NumCPUs=64 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=64,mem=229376,node=4
Socks/Node=* NtasksPerN:B:S:C=16:0:*:* CoreSpec=*
MinCPUsNode=16 MinMemoryCPU=3.50G MinTmpDiskNode=0
Command=/home/linstead/cluster-examples/sumprimes/mpi/slurm.sh
WorkDir=/home/linstead/cluster-examples/sumprimes/mpi
StdErr=/home/linstead/cluster-examples/sumprimes/mpi/sumprimes-321608.err
StdOut=/home/linstead/cluster-examples/sumprimes/mpi/sumprimes-321608.out
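Because this output is a sequence of KEY=VALUE fields, it is easy to pick apart in a script. This minimal Python sketch splits each token at its first "=", so that TRES keeps its inner key=value list. Values containing spaces are not handled, and the sample is copied from the listing above. The mem=229376 figure, which appears to be in megabytes, is consistent with 64 CPUs × 3.50GB (MinMemoryCPU).

```python
def parse_scontrol(text):
    """Parse `scontrol show job` output into a dict of KEY -> VALUE strings.

    Fields are whitespace-separated KEY=VALUE tokens; only the first "="
    splits key from value, so TRES=cpu=64,... keeps its inner "=" signs.
    """
    fields = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


sample = """JobId=321608 JobName=sumprimes
Partition=standard AllocNode:Sid=login01:15215
NumNodes=4 NumCPUs=64 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=64,mem=229376,node=4
MinCPUsNode=16 MinMemoryCPU=3.50G MinTmpDiskNode=0"""

job = parse_scontrol(sample)
tres = dict(item.split("=", 1) for item in job["TRES"].split(","))
print(job["JobId"], job["NumCPUs"], tres["mem"])  # → 321608 64 229376
```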

SLURM - monitoring jobs


sview - a graphical monitoring tool

SLURM - monitoring jobs


sview - right-click on a job

SLURM - monitoring jobs

Did my job(s) finish yet?
check squeue -u <username> (finished jobs drop out of the queue)
check sacct (accounting records, including finished jobs)

SLURM - monitoring jobs

sacct
[15:47:13] linstead@login01:~$ sacct
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
321467        sumprimes   standard        its         32 CANCELLED+      0:0
321467.batch      batch                   its          8  CANCELLED     0:15
321467.0      pmi_proxy                   its          4     FAILED      7:0
321472        sumprimes   standard        its         32  COMPLETED      0:0
321472.batch      batch                   its          8  COMPLETED      0:0
321472.0      pmi_proxy                   its          4  COMPLETED      0:0
325608        sumprimes   standard        its         64     FAILED      9:0
325608.batch      batch                   its         16     FAILED      9:0
325608.0      pmi_proxy                   its          4  COMPLETED      0:0
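For scripting, sacct's --parsable2 (-P) option prints fields separated by "|" without padding, which is much easier to parse than the aligned table. Here is a minimal Python sketch. The sample text is hand-copied from the table above, not real --parsable2 output captured on the cluster, and run_sacct is a hypothetical helper that only works where sacct is installed.

```python
import csv
import io
import subprocess


def parse_sacct(text):
    """Parse `sacct --parsable2` output (header line first) into dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="|",
                            quoting=csv.QUOTE_NONE)
    return list(reader)


def run_sacct(fields="JobID,JobName,State,ExitCode"):
    """Run sacct and return parsed rows (requires SLURM on this machine)."""
    out = subprocess.run(["sacct", "--parsable2", "--format=" + fields],
                         capture_output=True, text=True, check=True).stdout
    return parse_sacct(out)


sample = """JobID|JobName|State|ExitCode
321467.batch|batch|CANCELLED|0:15
321467.0|pmi_proxy|FAILED|7:0
321472|sumprimes|COMPLETED|0:0
325608|sumprimes|FAILED|9:0
"""

for row in parse_sacct(sample):
    if row["State"] != "COMPLETED":
        print(row["JobID"], row["State"], row["ExitCode"])
```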

SLURM - monitoring jobs

ExitCode n:m
n : code returned by the job script
m : signal which caused the process to terminate (if signalled)
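A small Python sketch can decode the ExitCode column. It relies on the local platform's signal numbering (Linux is assumed). Note that "9:0" means the script returned 9; a job killed by SIGKILL would show "0:9" instead.

```python
import signal


def decode_exit_code(exit_code):
    """Split a sacct ExitCode "n:m" into (return code, signal name or None)."""
    code, sig = (int(part) for part in exit_code.split(":"))
    if sig == 0:
        return code, None
    try:
        return code, signal.Signals(sig).name
    except ValueError:  # signal number not known on this platform
        return code, f"signal {sig}"


for value in ["0:0", "0:15", "7:0", "9:0"]:
    print(value, decode_exit_code(value))
```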

SLURM - monitoring jobs

By default, sacct only shows jobs since midnight
all my jobs in April:
sacct -S04.01 -E05.01
add/view extra fields:
sacct --format=...
e.g. just ID and end time: sacct -ojobid,end
see man sacct
the output format of most SLURM commands is configurable, either via --format/-o or via $S???_FORMAT variables (e.g. SQUEUE_FORMAT, SACCT_FORMAT)

SLURM - monitoring jobs

Exercise: what time did jobs 325608 and 325611 start and end?
Did any fail? If so, what were the exit codes?
sacct --jobs 325608,325611 -a --format=jobid,start,end,state,exitcode

SLURM - job arrays

Submit and manage collections of similar jobs
--array=0-31
each job in the array takes the same settings from the submit script
each job has a unique index, available as
$SLURM_ARRAY_TASK_ID
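Besides simple ranges, --array also accepts several other forms:
- comma-separated lists
- a step (0-15:4 means 0,4,8,12)
- a %N suffix limiting how many tasks run at once

This Python sketch expands such a specification into the task IDs SLURM would create, which helps check which $SLURM_ARRAY_TASK_ID values to expect. The function name expand_array_spec is hypothetical.

```python
def expand_array_spec(spec):
    """Expand a SLURM --array value into the list of task IDs.

    Handles comma-separated items, ranges "a-b", stepped ranges "a-b:s",
    and strips an optional "%N" suffix (max tasks running at once).
    """
    spec = spec.split("%", 1)[0]
    ids = []
    for item in spec.split(","):
        rng, _, step = item.partition(":")
        start, _, end = rng.partition("-")
        stop = int(end) if end else int(start)
        ids.extend(range(int(start), stop + 1, int(step) if step else 1))
    return ids


print(len(expand_array_spec("0-31")))   # → 32
print(expand_array_spec("1,3,5-7"))     # → [1, 3, 5, 6, 7]
print(expand_array_spec("0-15:4%2"))    # → [0, 4, 8, 12]
```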

SLURM - job arrays

The task ID is available to my script:

#!/bin/bash

#SBATCH --qos=short
#SBATCH --partition=standard
#SBATCH --array=0-15
#SBATCH --output=jobarray-%A_%a.out

echo ${SLURM_ARRAY_TASK_ID}
(e.g. "./myprog inputfile_${SLURM_ARRAY_TASK_ID}")
(%A is the array's job ID and %a the task index, so each task writes its own output file)

SLURM - job arrays

Read $SLURM_ARRAY_TASK_ID in Python:


import os
import sys

try:
    # the variable is only set inside a job array task
    task = int(os.environ['SLURM_ARRAY_TASK_ID'])
except KeyError:
    print("Not running with SLURM job arrays")
    sys.exit(1)

print("task ID: %d" % task)

SLURM - job arrays

Read $SLURM_ARRAY_TASK_ID in C:
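#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* getenv returns NULL if the variable is not set */
    const char *task = getenv("SLURM_ARRAY_TASK_ID");
    if (task == NULL) {
        fprintf(stderr, "Not running with SLURM job arrays\n");
        return 1;
    }
    printf("task ID: %s\n", task);
    return 0;
}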

Python for scientific computing

Anaconda 2.3.0 (module load anaconda/2.3.0)
Python interpreter: /p/system/packages/anaconda/2.3.0/bin/python
list the installed packages: conda list

Python for scientific computing


Anaconda provides the conda package manager
I can install and maintain my own Python environment...
conda create -n testenv ipython
source activate testenv
conda install <packagename>
source deactivate
...and I can share it with colleagues:
source activate testenv
conda env export > environment.yml (export my environment)
conda env create -f environment.yml (a colleague recreates it)
(conda can also manage R packages; we can set this up if there’s interest)

Python for scientific computing

linstead@login01:~$ source activate testenv
discarding /p/system/packages/anaconda/2.3.0/bin from PATH
prepending /home/linstead/.conda/envs/testenv/bin to PATH
(testenv)linstead@login01:~$

Exercise

create a new conda environment
install the packages ipython and matplotlib into it

Documentation / Help

Cluster User Guides:
https://www.pik-potsdam.de/services/it/hpc/user-guides
Environment Modules: http://modules.sourceforge.net/
conda package manager: http://conda.pydata.org/docs/using/
SLURM: http://slurm.schedmd.com/
man pages (module, sbatch, srun, sinfo, squeue etc.)

Documentation / Help

Questions/Problems/Requests:
http://www.pik-potsdam.de/services/it/hpc
mailto:cluster-support@pik-potsdam.de

Questions

Any questions?
