
hackyhour

Source for information on fortnightly bioinformatics help session

What’s going on: checking the status of projects and jobs on NeSI (and other places)

It is important to keep track of what you are doing on the server: how many CPU core hours you have used and how many you have left, and how much disk space remains in your project or home folder. Here are a few commands to help you keep track, along with links to more information on NeSI.

CPU usage

NeSI provides a command to track how many core hours you have used over the course of each project. Note that core hours are not just the number of hours a job runs: the run time is multiplied by the number of CPUs used for that job. The amount of working memory (RAM) may also figure in that calculation. To keep it straight, use the following command:

In this example, we are looking at logical CPU core use for the project nesi00420:

nn_corehour_usage -l nesi00420

To view the Fair Share adjusted CPU core use (see this link for details on Fair Share and here for job prioritisation), use -f instead of -l:

nn_corehour_usage -f nesi00420

And to see only two months of usage, use the -n option:

nn_corehour_usage -l nesi00420 -n 2
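As a rough illustration of the core-hour arithmetic described above, here is a minimal sketch that multiplies the number of CPUs by the run time in hours. The function name core_hours is hypothetical, and the calculation ignores any memory or Fair Share weighting that NeSI may apply, so treat it as an estimate rather than the figure nn_corehour_usage reports.

```shell
# Hypothetical helper: estimate core hours as CPUs x elapsed hours.
# Assumption: no memory or Fair Share weighting is applied.
core_hours() {
    # $1 = number of CPUs, $2 = run time in hours (may be fractional)
    awk -v cpus="$1" -v hours="$2" 'BEGIN { printf "%g\n", cpus * hours }'
}

# A job using 8 CPUs for 2.5 hours
core_hours 8 2.5   # prints 20
```

So a job that runs for only a couple of hours can still use a sizeable share of an allocation if it requests many CPUs.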

Storage quota

Your home folder and each project folder have storage limits, both for total size and for the number of files. This link gives an overview of NeSI system quotas. Running the following command will help you find where you are running out of space.

nn_storage_quota

(just like that, no other options)

Dini at NeSI reminded me about two other useful commands for your home folder (which has a low storage limit). The first shows the folders with the most files (in order):

find . -printf "%h\n" | cut -d/ -f-2 | sort | uniq -c | sort -rn

And the second lists the ten largest files and directories:

du -a | sort -n -r | head -n 10
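The du command above reports sizes as block counts and mixes files with directories. As one possible variation, the sketch below summarises only the top-level directories and prints human-readable sizes. It assumes GNU du and GNU sort (the find -printf command above already relies on GNU tools), and the function name largest_dirs is hypothetical.

```shell
# Hypothetical helper: show the largest top-level directories in a folder.
# Assumes GNU coreutils (du --max-depth, sort -h).
largest_dirs() {
    # $1 = folder to inspect (default: current folder)
    # $2 = number of lines to show (default: 10)
    du -h --max-depth=1 -- "${1:-.}" 2>/dev/null | sort -rh | head -n "${2:-10}"
}

# Usage: largest_dirs ~ 5
```

The output includes one line for the folder itself, which is the combined total of everything beneath it.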

Checking status of active or submitted jobs

This is likely one of the first Slurm commands you learn (after submitting a job with sbatch), but as a reminder, here is how to check your job’s status:

squeue -u user.name
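To avoid typing your user name each time, the squeue call can be wrapped in a small function. This is only a sketch: the function name my_jobs is hypothetical, it assumes your Slurm user name matches your login name, and the -o format string is just one choice of columns (job ID, name, state, time used, and reason or node list).

```shell
# Hypothetical helper: list jobs for the current user, or for the
# user given as $1. Assumes the Slurm user name is the login name.
my_jobs() {
    squeue -u "${1:-${USER:-$(id -un)}}" -o "%.10i %.20j %.10T %.10M %R"
}

# Usage: my_jobs            (your own jobs)
#        my_jobs user.name  (someone else's jobs)
```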

The other important command for active jobs is scancel, which cancels a job:

scancel 12345678

This will cancel the job with ID 12345678 (check the status with squeue to get the job ID).

You can also check on recently run jobs with the sacct command. Here is a link for all job related commands and other useful information.
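As a starting point for sacct, the sketch below lists your jobs from the last few days with a handful of commonly useful columns. The function name recent_jobs and the default of two days are hypothetical choices, and the columns listed are only a suggestion; sacct --helpformat shows every field available on your system.

```shell
# Hypothetical helper: show your jobs from the last N days (default 2).
# Assumes the Slurm user name is the login name.
recent_jobs() {
    sacct -u "${USER:-$(id -un)}" \
          --starttime "now-${1:-2}days" \
          --format=JobID,JobName%20,State,Elapsed,MaxRSS,ExitCode
}

# Usage: recent_jobs      (last 2 days)
#        recent_jobs 7    (last week)
```

The Elapsed and MaxRSS columns are handy for checking whether a finished job actually needed the time and memory it requested.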