It is recommended to submit your jobs using a batch script, which helps automate job submission and resource management. Sample scripts can be found in the /home/examples/ directory on the HPC system.
#!/bin/bash
#SBATCH --job-name="My job"
#SBATCH --partition=cpu-2g
#SBATCH --nodes=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=1G
###SBATCH --gres=gpu:1
#SBATCH --time=0-1:30
#SBATCH --chdir=.
#SBATCH --output=cout.txt
#SBATCH --error=cerr.txt
###SBATCH --test-only
sbatch_pre.sh
module load ...
mpiexec.gforker -n $SLURM_CPUS_PER_TASK MyProgram
sbatch_post.sh
Pay attention to the key parts of the example script above:
Job name: My job
Partition: Uses the cpu-2g partition (You can check available partitions using the sinfo command.)
CPU cores: Requests 4 cores (Note: More cores do not always mean faster or more efficient execution; it depends on your application.)
GPU: Requests 1 GPU card (disabled here: the line is prefixed with ### rather than #SBATCH, so Slurm ignores it)
Time limit: 0 days 1 hour 30 minutes
Standard output: Redirected to cout.txt
Error output: Redirected to cerr.txt
module load ... : Loads the required software environment. Refer to the Environment Modules section for details.
mpiexec.gforker ...: Launches your program; -n $SLURM_CPUS_PER_TASK starts one process per requested CPU core.
#SBATCH --test-only: Validates the script and reports when the job would be expected to start, without actually submitting it (also disabled here by the ### prefix).
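Slurm only honors directive lines that begin with exactly #SBATCH, so the extra ## in front of the GPU and test-only lines disables them. If your partition provides GPUs, the enabled form of those two lines would look like this (a sketch; adjust the GPU count to your application's needs):

```shell
#SBATCH --gres=gpu:1     # request 1 GPU card
#SBATCH --test-only      # dry run: validate the request without submitting
```

Remember to remove --test-only again before the real submission, since no job runs while it is present.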
Submit the job using the following command:
sbatch job.sh
Under normal conditions, the file cout.txt is generated immediately. Its first few lines will look like this:
============================ Messages from Goddess ============================
* Job starting from: Fri Dec 22 11:52:10 CST 2017
* Job ID : 118
* Job name : My job
* Job partition : cpu-2g
* Nodes : 1
* Cores : 4
* Working directory: ~/examples/sbatch
===============================================================================
If the program finishes successfully, the end of cout.txt will look like this:
============================ Messages from Goddess ============================
* Job ended at : Fri Dec 22 11:52:10 CST 2017
===============================================================================
In the previous example, the job ID is 118, and its working directory is ~/examples/sbatch.
You can monitor your job and system status with the following commands:
squeue — View the status of your submitted jobs.
sinfo — View the status of all system nodes.
To cancel a specific job (for example, job ID 118), use:
scancel 118
Some commercial applications must release software licenses as well as hardware resources when a job stops, so scancel alone may not clean up properly in those cases. LS-DYNA is one example: to stop a running job early, create a file named D3KIL in the job's working directory. The LS-DYNA solver detects this file, terminates the analysis, and releases both the hardware resources and the software licenses.
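Following the note above, requesting an early stop of an LS-DYNA job is just a matter of creating the stop file. A minimal sketch, run from the job's working directory (e.g. ~/examples/sbatch in the sample output above):

```shell
# Create the stop file in the job's working directory;
# the LS-DYNA solver detects it and shuts the analysis down cleanly,
# releasing both the hardware resources and the license.
touch D3KIL
```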