SLURM: Workload Manager

SLURM (Simple Linux Utility for Resource Management) is a scalable open-source scheduler used on a number of world-class clusters.

The currently installed SLURM version on PlaFRIM is 19.05.2.

You will find below a brief description to help users launch jobs on the platform. More details are available in the official SLURM Quick Start User Guide and in the official SLURM documentation.

Interactive jobs

First, you need to allocate some resources with salloc.

salloc -N2 -t 00:30:00

salloc: Granted job allocation 17397

The command squeue can be used to have a look at the job state:

squeue --job 17397

JOBID PARTITION NAME USER     ST TIME NODES NODELIST(REASON)
17397 routage   bash bouchoui R  1:05     2 miriel[007-008]

squeue can also display more information, using a custom output format.

squeue -o "%.18i %.9P %.8j %.8u %.2t %.10M %.6D %.3C %.20R" --job 17397

To get information about a specific job, scontrol show job <JobId> can be used.
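
For example, for the allocation above (the exact fields displayed depend on the SLURM configuration):

scontrol show job 17397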

squeue header details:

JOBID: the job identifier.
PARTITION: the partition on which the job is running; use the sinfo command to display all partitions on the cluster (see the example after this list).
NAME: the name of the job; to define or change the name (in batch mode), use -J <name_of_job>.
USER: the login of the job owner.
ST: the state of the submitted job (PENDING, RUNNING, FAILED, COMPLETED, etc.).
TIME: the time used by the job so far (note: if the user does not define a time limit for the job, the default time limit of the partition will be used).
NODES: the number of nodes allocated to the job.
NODELIST: the list of nodes used.
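
For instance, a summarized view of the partitions and their nodes can be obtained with sinfo (using its standard -s/--summarize option):

sinfo -s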

Job states are displayed in a compact format (an example of filtering by state is given after the list):

  • PD (pending): Job is awaiting resource allocation,
  • R (running): Job currently has an allocation,
  • CA (cancelled): Job was explicitly cancelled by the user or system administrator,
  • CF (configuring): Job has been allocated resources, but is waiting for them to become ready,
  • CG (completing): Job is in the process of completing. Some processes on some nodes may still be active,
  • CD (completed): Job has terminated all processes on all nodes,
  • F (failed): Job terminated with non-zero exit code or other failure condition,
  • TO (timeout): Job terminated upon reaching its time limit,
  • NF (node failure): Job terminated due to failure of one or more allocated nodes.
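
These state codes can also be used to filter the queue; for instance, to list only your own pending jobs (-u and -t/--states are standard squeue options):

squeue -u $USER -t PD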

The job 17397 is in the running (R) state, on the default partition routage, and 2 nodes are used: miriel007 and miriel008.

In the same shell terminal, run srun your_executable (the command will use all the allocated resources):

srun hostname

miriel007
miriel008

You can also log in to one of the allocated nodes using ssh; however, the SLURM environment variables will not be set.

ssh miriel007

Once connected to a node with ssh, if you want to run a command on all allocated resources, you must run the srun command with the --jobid option followed by the id associated with your job.

@miriel007~$ srun --jobid=17397 hostname

miriel007
miriel008

From the salloc connection, or directly from a devel node, you can launch an interactive session with srun.

srun -N1 --exclusive --time=30:00 --pty bash -i

You are then directly connected to the first node.

hostname

miriel006

Non-interactive (Batch) jobs

cat script-slurm.sh

#!/usr/bin/env bash
# Job name
#SBATCH -J TEST_Slurm
# Asking for one node
#SBATCH -N 1
# Output results message
#SBATCH -o slurm.sh%j.out
# Output error message
#SBATCH -e slurm.sh%j.err

echo "=====my job information ==== "

echo "Node List: " $SLURM_NODELIST
echo "my jobID: " $SLURM_JOB_ID
echo "Partition: " $SLURM_JOB_PARTITION
echo "submit directory:" $SLURM_SUBMIT_DIR
echo "submit host:" $SLURM_SUBMIT_HOST
echo "In the directory: $PWD"
echo "As the user: $USER"

Launch the job using the sbatch command:

sbatch script-slurm.sh

Submitted batch job 7421

To get information about the running jobs:

squeue

To get more details about a specific job:

scontrol show job <jobid>

To delete a running job:

scancel <jobid>
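
scancel can also target jobs by name or cancel all of your jobs at once, using its standard -n/--name and -u/--user options:

scancel -n TEST_Slurm
scancel -u $USER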

To display the output of the job 7421:

cat slurm.sh7421.out

=====my job information ====
Node List: miriel003
my jobID: 7421
Partition: routage
submit directory: /home/bouchoui/Tests
submit host: devel02
In the directory: /home/bouchoui/Tests
As the user: bouchoui

Parallel Programming (MPI)

MPI usage depends on the MPI implementation being used (for more details, see http://slurm.schedmd.com/mpi_guide.html).

We describe below how to use OpenMPI and Intel MPI, which are installed on PlaFRIM.

OpenMPI

Load your environment using the appropriate modules.

Currently, we provide the following OpenMPI versions:

module avail mpi/openmpi

mpi/openmpi/2.0.4 mpi/openmpi/3.1.4 mpi/openmpi/4.0.1 mpi/openmpi/4.0.1-intel mpi/openmpi/4.0.2 mpi/openmpi/4.0.2-testing mpi/openmpi/4.0.3 mpi/openmpi/4.0.3-mlx

To use the 4.0.3 version:

module load mpi/openmpi/4.0.3
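
You can then check the loaded environment with:

module list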

To run an MPI program, you can use mpirun. With salloc -N, the allocation is expressed in nodes:

salloc -N 3
mpirun hostname

miriel040.plafrim.cluster
miriel041.plafrim.cluster
miriel042.plafrim.cluster

With -n, the allocation is expressed in tasks rather than nodes, so the three tasks may end up on the same node:

salloc -n 3
mpirun hostname

miriel023.plafrim.cluster
miriel023.plafrim.cluster
miriel023.plafrim.cluster

To compile an MPI application, you can use mpicc:

mpicc -o program program.c

To run the program (the --mca btl openib,self option restricts Open MPI to the openib (InfiniBand) and self byte transfer layers):

mpirun --mca btl openib,self ./program
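
The same steps can be combined in a batch script. The sketch below reuses the module and options shown above and requests 3 nodes; the job name, output file names and program name are illustrative and should be adapted:

#!/usr/bin/env bash
#SBATCH -J TEST_OpenMPI
#SBATCH -N 3
#SBATCH -o openmpi.%j.out
#SBATCH -e openmpi.%j.err

# Load the Open MPI environment
module load mpi/openmpi/4.0.3

# Run the program on all the allocated resources
mpirun --mca btl openib,self ./program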

Intel MPI

Load your environment using the appropriate modules.

module avail mpi/intel
module add compiler/gcc compiler/intel mpi/intel

Create a file with the names of the machines that you want to run your job on:

srun hostname -s | sort -u > mpd.hosts

To run your application on these nodes, use mpiexec.hydra, and choose the fabrics for intra-node and inter-node MPI communication:

export I_MPI_FABRICS=shm:tmi
mpiexec.hydra -f mpd.hosts -n $SLURM_NNODES ./a.out
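
As with OpenMPI, these steps can be wrapped in a batch script. Here is a minimal sketch reusing the commands above; the job name, output file names and number of nodes are illustrative and should be adapted:

#!/usr/bin/env bash
#SBATCH -J TEST_IntelMPI
#SBATCH -N 2
#SBATCH -o intelmpi.%j.out
#SBATCH -e intelmpi.%j.err

# Load the Intel MPI environment
module add compiler/gcc compiler/intel mpi/intel

# Build the machine file from the allocated nodes
srun hostname -s | sort -u > mpd.hosts

# Select the fabrics for intra-node and inter-node communication
export I_MPI_FABRICS=shm:tmi

# Run the application, one process per allocated node
mpiexec.hydra -f mpd.hosts -n $SLURM_NNODES ./a.out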