Batch jobs
Tip
Contact Secretariat System Administrators with HPC Questions!
vshanka@clemson.edu, jopoole@clemson.edu, madonay@clemson.edu
Contact IHG Research Cores with Research Questions!
ihgcores@clemson.edu
A common way to run an analysis on Secretariat is to place your command(s) or script(s) inside an "sbatch script." This lets you request specific resources within the script itself rather than typing them out on the command line. Because Secretariat's resources are managed by Slurm, these resource requests go in the script's sbatch header.
Attention
Please be careful when requesting resources on Secretariat. Requesting more than your job actually needs can leave other users waiting for resources to become available. If you are unsure how much to allocate for your job, please contact us (see the Tip above)!
Template
Here is a template for declaring Slurm parameters in the header of an sbatch script. Place this header at the top of the script you will submit to Secretariat, and save the script as [jobname].sh.
Note: Please see the FAQ page for information on creating / editing files on Secretariat.
```bash
#!/bin/bash
#
#SBATCH --job-name=[jobname]
#SBATCH --cpus-per-task=[cpus]
#SBATCH --partition=[partition]
#SBATCH --time=[time]
#SBATCH --mem=[mem]
#SBATCH --output=[/path/to/][jobname].%j.out
#SBATCH --error=[/path/to/][jobname].%j.err
```
Attention
At a minimum, please set every parameter from the #!/bin/bash line through --mem. The only line that is strictly required is the first, #!/bin/bash. The --output and --error lines are optional but help keep log files organized. Please see the Slurm documentation for a full list of sbatch header options.
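The header has to sit at the top because sbatch stops reading #SBATCH directives once it reaches the first executable line of the script; any directive after that point is treated as an ordinary comment. The helper below, effective_sbatch_directives, is hypothetical (not a Slurm tool) and is only a simplified sketch of that rule: it prints the directives sbatch would honor in a given script file.

```bash
#!/bin/bash
# Hypothetical helper (not a Slurm tool): print the #SBATCH lines that sbatch
# would honor in a script, i.e. those before the first executable line.
# Simplified sketch of the rule described in the sbatch manual page.
effective_sbatch_directives() {
  local line trimmed
  while IFS= read -r line || [[ -n "$line" ]]; do
    # Drop leading whitespace before classifying the line
    trimmed="${line#"${line%%[![:space:]]*}"}"
    if [[ -z "$trimmed" ]]; then
      continue                      # blank line: keep scanning
    elif [[ "$trimmed" == "#SBATCH"* ]]; then
      printf '%s\n' "$trimmed"      # directive before any command: honored
    elif [[ "$trimmed" == "#"* ]]; then
      continue                      # ordinary comment, including #!/bin/bash
    else
      break                         # first executable line: stop reading
    fi
  done < "$1"
}
```

For instance, if an #SBATCH --time line in a hypothetical fastp_ex.sh were accidentally placed after a module load command, running effective_sbatch_directives fastp_ex.sh would not list it, and Slurm would fall back to the partition's default time limit.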
Explanation:
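- [jobname]: name assigned to the job within Secretariat
- [cpus]: number of CPUs (positive integer) to allocate to each task; for a single-task job, this is the total number of CPUs for the job
- [partition]: name of the partition to which the job will be submitted
- [time]: maximum amount of wall-clock time to allocate to the job, in one of the following formats:
  - minutes
  - minutes:seconds
  - hours:minutes:seconds
  - days-hours
  - days-hours:minutes
  - days-hours:minutes:seconds
- [mem]: maximum amount of memory (positive integer) to allocate on each node, followed by a unit suffix (megabytes are assumed if no suffix is given):
  - kilobytes: K
  - megabytes: M
  - gigabytes: G
  - terabytes: T
- [/path/to/][jobname]: directory and file name to which standard output (--output) and standard error (--error) are written; Slurm replaces %j with the job ID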
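To make the time formats concrete, here is a sketch of a hypothetical helper, slurm_time_to_minutes (not part of Slurm), that converts any of the formats listed above into minutes. It assumes well-formed input and, following the note in the sbatch documentation that time resolution is one minute, rounds leftover seconds up to the next minute.

```bash
#!/bin/bash
# Hypothetical helper (not part of Slurm): convert an sbatch --time value
# into whole minutes. Assumes well-formed input in one of the six formats.
slurm_time_to_minutes() {
  local spec="$1" days=0 h=0 m=0 s=0
  local -a f
  if [[ "$spec" == *-* ]]; then
    # days-hours[:minutes[:seconds]]
    days="${spec%%-*}"
    IFS=: read -r -a f <<< "${spec#*-}"
    case "${#f[@]}" in
      1) h="${f[0]}" ;;
      2) h="${f[0]}"; m="${f[1]}" ;;
      3) h="${f[0]}"; m="${f[1]}"; s="${f[2]}" ;;
      *) return 1 ;;
    esac
  else
    # minutes | minutes:seconds | hours:minutes:seconds
    IFS=: read -r -a f <<< "$spec"
    case "${#f[@]}" in
      1) m="${f[0]}" ;;
      2) m="${f[0]}"; s="${f[1]}" ;;
      3) h="${f[0]}"; m="${f[1]}"; s="${f[2]}" ;;
      *) return 1 ;;
    esac
  fi
  # 10# forces base 10 so values like "08" are not read as octal;
  # leftover seconds are rounded up to the next whole minute.
  echo $(( (10#$days * 24 + 10#$h) * 60 + 10#$m + (10#$s + 59) / 60 ))
}

slurm_time_to_minutes 00:30:00    # 30
slurm_time_to_minutes 30:00       # 30 (minutes:seconds, not hours:minutes)
slurm_time_to_minutes 1-12        # 2160 (1 day + 12 hours)
```

The second call highlights a common pitfall: a two-field value such as 30:00 is read as minutes:seconds, so it requests 30 minutes, not 30 hours.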
Example
Here is an example of an sbatch header for a script to run fastp.
```bash
#!/bin/bash
#
#SBATCH --job-name=fastp_ex
#SBATCH --cpus-per-task=1
#SBATCH --partition=compute
#SBATCH --time=00:30:00
#SBATCH --mem=2G
#SBATCH --output=/opt/ohpc/pub/workshop/toyout/logs/fastp_ex.%j.out
#SBATCH --error=/opt/ohpc/pub/workshop/toyout/logs/fastp_ex.%j.err

# Load software
module load fastp/0.21.0

# Save directories as variables
export dir_in="/opt/ohpc/pub/workshop/toysets/fastq/dnaseq"
export dir_out="/opt/ohpc/pub/workshop/toyout/fastp"

# Prepare directories (stop if the input directory is missing)
mkdir -p "${dir_out}"
cd "${dir_in}" || exit 1

# Run fastp on each pair of FASTQ files
# Note: This example is for paired-end data
for r1 in *_R1_001.fastq.gz
do
  # Derive the R2 file name and a sample prefix from the R1 file name
  r2="$(echo "${r1}" | sed -e 's/R1/R2/')"
  prefix="$(echo "${r1}" | cut -f1-3 -d'_')"

  fastp \
    --in1 "${dir_in}/${r1}" \
    --in2 "${dir_in}/${r2}" \
    --out1 "${dir_out}/${prefix}_R1.out" \
    --out2 "${dir_out}/${prefix}_R2.out" \
    --json "${dir_out}/${prefix}.json" \
    --html "${dir_out}/${prefix}.html"
done
```
Explanation:
This script sets up a job named fastp_ex that runs fastp on each pair of paired-end FASTQ files. It requests 1 CPU on a compute node (compute[001-004]), up to 2 GB of memory, and no more than 30 minutes of runtime. Standard output and standard error are written to separate files in /opt/ohpc/pub/workshop/toyout/logs. Slurm does not create this log directory, so it must exist before the job is submitted.
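To check what the loop derives from each R1 file name, you can run the same two commands on a single name outside of Slurm. The sample name below is hypothetical and assumes the <sample>_<S#>_<lane>_R1_001.fastq.gz pattern that the glob expects; the cut command also assumes the sample name itself contains no underscores.

```bash
#!/bin/bash
# Hypothetical Illumina-style R1 file name
r1="sampleA_S1_L001_R1_001.fastq.gz"

# Same derivations as in the loop of the fastp example
r2="$(echo "${r1}" | sed -e 's/R1/R2/')"
prefix="$(echo "${r1}" | cut -f1-3 -d'_')"

echo "${r2}"      # sampleA_S1_L001_R2_001.fastq.gz
echo "${prefix}"  # sampleA_S1_L001

# Caveat: sed 's/R1/R2/' rewrites the FIRST "R1" anywhere in the name.
# For a (hypothetical) sample named TR1x, it edits the sample name instead:
echo "TR1x_S1_L001_R1_001.fastq.gz" | sed -e 's/R1/R2/'   # TR2x_S1_L001_R1_001.fastq.gz

# Matching the full read tag with bash parameter expansion avoids this:
r1_odd="TR1x_S1_L001_R1_001.fastq.gz"
echo "${r1_odd/_R1_001/_R2_001}"                            # TR1x_S1_L001_R2_001.fastq.gz
```

Running a quick check like this on your own file names before submitting can catch a mismatched R2 path that would otherwise only surface in the job's .err file.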
Attention
To actually submit this script to Secretariat, please refer to the Slurm commands tab.
Jobs and nodes and tasks, oh my!
When allocating resources to jobs, particularly nodes and CPUs, there is often more than one way to accomplish the same result because of the relationship between --nodes, --ntasks-per-node, --cpus-per-task, and --ntasks. In particular, the total number of CPUs allocated to a job is --ntasks multiplied by --cpus-per-task.
- --nodes: number of nodes to allocate to a job
- --ntasks-per-node: number of tasks to allocate per node
- --cpus-per-task: number of CPUs to allocate per task
- --ntasks: maximum number of tasks to allocate to a job
Attention
All of these values must be positive integers.
The examples below, amended from the Slurm FAQ page, assume you need to allocate 4 CPUs to a job. There are several ways to request 4 CPUs, and depending on the job, one method may be preferable.
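| Slurm parameters | Interpretation |
|---|---|
| --ntasks=4 | 4 independent processes |
| --ntasks=4 --ntasks-per-node=1 | 4 processes with 1 CPU each, spread across 4 distinct nodes |
| --ntasks=4 --ntasks-per-node=2 | 4 processes spread across 2 nodes |
| --ntasks=4 --ntasks-per-node=4 | 4 processes on the same node |
| --ntasks=1 --cpus-per-task=4 | 1 process with up to 4 CPUs for multithreading |
| --ntasks=2 --cpus-per-task=2 | 2 processes with up to 2 CPUs each for multithreading |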
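The arithmetic behind these rows can be sketched with a hypothetical helper, summarize_request (not a Slurm command). It assumes each node has at least 4 CPUs, uses the rule that total CPUs equal --ntasks times --cpus-per-task, and relies on the fact that a single task always runs on one node. Each call passes the --ntasks, --ntasks-per-node (empty if unset), and --cpus-per-task values listed in its comment.

```bash
#!/bin/bash
# Hypothetical helper: report total CPUs and the number of nodes a request
# can span. Arguments: ntasks, ntasks-per-node ("" if unset), cpus-per-task.
summarize_request() {
  local ntasks="$1" per_node="$2" cpt="$3"
  local cpus=$(( ntasks * cpt ))
  local nodes
  if [[ -n "$per_node" ]]; then
    # Fixed task density: ceil(ntasks / ntasks-per-node) nodes
    nodes=$(( (ntasks + per_node - 1) / per_node ))
  elif (( ntasks == 1 )); then
    nodes=1                  # one task never spans nodes
  else
    nodes="1-${ntasks}"      # scheduler may pack or spread the tasks
  fi
  echo "cpus=${cpus} nodes=${nodes}"
}

summarize_request 4 "" 1   # --ntasks=4                       -> cpus=4 nodes=1-4
summarize_request 4 1 1    # --ntasks=4 --ntasks-per-node=1   -> cpus=4 nodes=4
summarize_request 4 2 1    # --ntasks=4 --ntasks-per-node=2   -> cpus=4 nodes=2
summarize_request 4 4 1    # --ntasks=4 --ntasks-per-node=4   -> cpus=4 nodes=1
summarize_request 1 "" 4   # --ntasks=1 --cpus-per-task=4     -> cpus=4 nodes=1
summarize_request 2 "" 2   # --ntasks=2 --cpus-per-task=2     -> cpus=4 nodes=1-2
```

Every row yields the same 4 CPUs; what changes is how those CPUs are grouped into processes and spread across nodes.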
Attention
Know your software! Make sure that the software within your script supports multiple CPU usage before requesting resources that allow for multithreading.
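Once you have confirmed that a tool is multithreaded, one option is to derive its thread count from the allocation rather than hard-coding it, so the two cannot drift apart. The sketch below assumes the job requests --cpus-per-task, in which case Slurm sets SLURM_CPUS_PER_TASK inside the job; thread_count is a hypothetical helper name.

```bash
#!/bin/bash
# Hypothetical helper: number of CPUs Slurm allocated to each task.
# SLURM_CPUS_PER_TASK is set inside the job only when --cpus-per-task is
# requested; fall back to 1 otherwise (e.g. when testing outside Slurm).
thread_count() {
  echo "${SLURM_CPUS_PER_TASK:-1}"
}

threads="$(thread_count)"
echo "Running with ${threads} thread(s)"

# Example (not run here): fastp's --thread option sets its worker thread count.
# fastp --thread "${threads}" --in1 ... --in2 ... --out1 ... --out2 ...
```

With #SBATCH --cpus-per-task=4 in the header, threads would be 4 inside the job; run interactively without Slurm, it falls back to 1.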