When you wish to use the Sphyrna research cluster, you must create a job and submit it to our job scheduler. The scheduler helps ensure fair access to the HPC cluster by allocating resources efficiently across all jobs running at the same time. CPU- or I/O-intensive jobs that are not submitted through the job scheduler may be terminated.
The job scheduler we use is called Slurm. This software enables us to provide large-but-finite compute resources to the NSU campus research community.
Depending on how you wish to use the cluster, there are two basic categories of jobs: batch jobs, which run a submitted script unattended once resources become available, and interactive jobs, which give you a shell on a compute node for hands-on work.
Getting ready to submit:
Before submitting your job to the Slurm scheduler, you need to do a bit of planning. This may involve trial and error, for which interactive jobs may be helpful (see the example below). The three most salient variables are how many nodes and CPU cores your job needs, how much memory it requires, and how long it will run; these correspond to the #SBATCH directives in the templates that follow.
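For the trial-and-error phase, you can request an interactive shell on a compute node with srun. A minimal sketch, reusing the example from the command reference table below (the task count and partition name are illustrative and should match the partitions available to you):

 srun --ntasks 4 --partition investor --pty bash   # opens a shell on a compute node; type exit to end the session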
Example Slurm scripts:
1. GPU Partition SLURM submission template
#!/bin/bash
#SBATCH --job-name=gpu_job_
#SBATCH --partition=gpu
#SBATCH --nodes=1                # node count
#SBATCH --ntasks-per-node=1      # total number of tasks per node
#SBATCH --cpus-per-task=16       # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem=256G               # total memory per node (4 GB per cpu-core is default)
#SBATCH --gres=gpu:2             # number of GPUs per node
#SBATCH --time=1-10:00:00        # total run time limit (D-HH:MM:SS)
#SBATCH --error=gpu_job.%J.err   # standard error log
#SBATCH --output=gpu_job.%J.out  # standard output log
2. CPU Partition SLURM submission template
#!/bin/bash
#SBATCH --job-name=cpu_job_
#SBATCH --partition=cpu
#SBATCH --nodes=4                # node count
#SBATCH --ntasks-per-node=1      # total number of tasks per node
#SBATCH --cpus-per-task=16       # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem=256G               # total memory per node (4 GB per cpu-core is default)
#SBATCH --time=1-10:00:00        # total run time limit (D-HH:MM:SS)
#SBATCH --error=cpu_job.%J.err   # standard error log
#SBATCH --output=cpu_job.%J.out  # standard output log
3. CPU+GPU Mix Partition SLURM submission template
#!/bin/bash
#SBATCH --job-name=mix_job_
#SBATCH --partition=mix
#SBATCH --nodes=5                # node count
#SBATCH --ntasks-per-node=1      # total number of tasks per node
#SBATCH --cpus-per-task=16       # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem=256G               # total memory per node (4 GB per cpu-core is default)
#SBATCH --gres=gpu:2             # number of GPUs per node
#SBATCH --time=1-10:00:00        # total run time limit (D-HH:MM:SS)
#SBATCH --error=mix_job.%J.err   # standard error log
#SBATCH --output=mix_job.%J.out  # standard output log
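Each template contains only the resource requests; add the commands you want to run after the last #SBATCH line, then submit the file with sbatch. A minimal sketch, assuming the GPU template was saved as gpu_job.sh and runs a program called my_program (both names are placeholders):

 # appended after the #SBATCH lines in gpu_job.sh:
 srun ./my_program               # launch the program on the allocated resources

 # from a login node, submit the script:
 sbatch gpu_job.sh               # Slurm prints the assigned job ID

Because the --output and --error patterns above include %J, the logs for the job are written to files such as gpu_job.<jobid>.out and gpu_job.<jobid>.err.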
Slurm Command Reference
| Command | Purpose | Example |
| ------- | ------- | ------- |
| sinfo   | View information about Slurm nodes and partitions | sinfo --partition investor |
| squeue  | View information about jobs | squeue -u myname |
| sbatch  | Submit a batch script to Slurm | sbatch myjob |
| scancel | Signal or cancel jobs, job arrays, or job steps | scancel jobID |
| srun    | Run an interactive job | srun --ntasks 4 --partition investor --pty bash |
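A typical session ties these commands together. A brief sketch (the script name my_job.sh, the partition name, and the job ID 12345 are placeholders):

 sinfo --partition cpu   # check the state of the nodes in the partition you plan to use
 sbatch my_job.sh        # submit the batch script; Slurm prints the job ID
 squeue -u myname        # monitor your pending and running jobs
 scancel 12345           # cancel a job by ID if it is no longer needed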