# Run over your storage quota?
If you are experiencing:

- issues accessing or using Strudel, or
- error messages like `Disk quota exceeded`,

then this could be the result of:

- running over your storage quota in a key directory, or
- running out of disk space on a local disk like `/tmp/`.
## If a key directory is full...
If one of your key directories is full, then software tools may break. This is especially true for your home directory. To resolve this:

- Connect to a login node. Either connect via SSH, or connect to Strudel and open a terminal.
- Verify you are over quota using `user_info`. You will see quota usage over 100% if you are over quota.
- Identify exactly which files/directories are using up your quota using `ncdu` (a combined example follows this list).
- Remove or move large files. See *Common causes of disk filling up* below for advice.
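For example, the quota check and follow-up investigation from a login-node shell might look like this (a minimal sketch; `user_info` and `ncdu` are the tools referenced above):

```bash
# Check quota usage for all of your key directories.
user_info    # usage over 100% means you are over quota

# Interactively browse your home directory, largest items first.
ncdu ~
```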
Your home quota will not be increased, so please follow the instructions below to clean up your home directory.
## Common causes of disk filling up
Remember to first use `ncdu` to figure out if any of these examples apply to you! For solutions involving environment variables, consider placing the `export` lines in your `~/.bashrc` so these variables are automatically set every time you log in to M3. Note that cache files are generally safe to delete.
| Cause | Detail | Solution |
|---|---|---|
| Conda | By default, Conda will store your environments and their packages in your home directory, easily filling up your quota. | Configure Conda as in our guide. Then either move your old Conda environments to their new location, or simply delete all of your Conda environments and packages with `rm -rf ~/.conda/envs` and `rm -rf ~/.conda/pkgs`. |
| Pip cache | By default, Python's pip installer places cache files in `~/.cache/pip/`. | Set `PIP_CACHE_DIR`, e.g. `export PIP_CACHE_DIR="/scratch/$PROJECT_ID/$USER/pip-cache"` (with `PROJECT_ID=ab12` replaced by your own project ID). |
| Apptainer cache | By default, Apptainer places cache files in `~/.apptainer/cache/`. | Set `APPTAINER_CACHEDIR`, e.g. `export APPTAINER_CACHEDIR="/scratch/$PROJECT_ID/$USER/apptainer-cache"`, then run `mkdir -p "$APPTAINER_CACHEDIR"`. |
| VNC log files | Not super common, but sometimes Strudel usage results in a very large log file in `~/.vnc/`. | No nice solution sadly, but this is quite rare. Just delete the log file and move on. |
| Your own large files! | You may have put your own large data files in your home directory. | Move the files into a project or scratch directory instead. |
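For example, to make the pip and Apptainer cache relocations from the table permanent, you might append something like the following to your `~/.bashrc` (a sketch; `ab12` is the placeholder project ID from the examples above):

```bash
# Relocate caches from the home directory to scratch.
PROJECT_ID=ab12   # change this to your own project ID
export PIP_CACHE_DIR="/scratch/$PROJECT_ID/$USER/pip-cache"
export APPTAINER_CACHEDIR="/scratch/$PROJECT_ID/$USER/apptainer-cache"
mkdir -p "$PIP_CACHE_DIR" "$APPTAINER_CACHEDIR"
```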
### XDG_CACHE_HOME

Some software will obey the `XDG_CACHE_HOME` environment variable, which defaults to `~/.cache/`. If a tool that is filling your home directory is not listed above, try pointing this variable at a scratch directory instead.
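For example (a sketch, assuming the software in question honours the XDG Base Directory specification):

```bash
# Redirect XDG-compliant caches away from the home directory.
PROJECT_ID=ab12   # change this to your own project ID
export XDG_CACHE_HOME="/scratch/$PROJECT_ID/$USER/cache"
mkdir -p "$XDG_CACHE_HOME"
```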
## If a local disk directory like `/tmp/` is full...
Sometimes when you run a program, it will produce an error like:

```
OSError: [Errno 28] No space left on device
```

But when you check your storage quotas using `user_info`, you seem to be well under quota for all of your directories! In this scenario, it is almost always `/tmp/` filling up. `/tmp/` is unique to each node on M3 and is used for storing temporary files. The login nodes have relatively little space in `/tmp/`:

```
[lexg@m3-login3 ~]$ df -h /tmp
Filesystem             Size  Used Avail Use% Mounted on
/dev/mapper/vg00-root   46G   16G   28G  37% /
```
So if you are on a login node and have this issue, you should run your program in a Slurm job on a compute node. Then, `/tmp/` should be bound to a much larger local disk (> 1 TB):

```
[lexg@m3s101 ~]$ df -h /tmp
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme0n1p1  2.9T  7.1M  2.8T   1% /tmp
```
`/tmp/` is shared by every user on a node, but it's very rare that it ever fills up on compute nodes. A user's files in `/tmp/` are automatically deleted once their job terminates.
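If you do use `/tmp/` heavily inside a job, one common pattern (a sketch, not an M3 requirement) is to work in a job-scoped subdirectory so concurrent jobs on the same node cannot collide:

```bash
# Inside a Slurm job script: use a job-specific working directory in /tmp.
# $SLURM_JOB_ID is set by Slurm for every job.
WORKDIR="/tmp/${USER}_${SLURM_JOB_ID}"
mkdir -p "$WORKDIR"
cd "$WORKDIR"

# ... run your program here ...

# Clean up explicitly; files are also removed when the job terminates.
rm -rf "$WORKDIR"
```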
If somehow `/tmp/` is still filling up even inside of a Slurm job, then you can try setting the `TMPDIR` environment variable to a scratch directory in your job script (or in your shell, if using an interactive session). For example:

```bash
PROJECT_ID=ab12   # change this to your own project ID
export TMPDIR="/scratch/$PROJECT_ID/$USER/tmp"
mkdir -p "$TMPDIR"   # in case this directory doesn't already exist
```
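In context, a job script using this workaround might look like the following sketch (the `#SBATCH` resource requests and `my_program.py` are placeholders, not recommendations):

```bash
#!/bin/bash
#SBATCH --job-name=tmpdir-example   # placeholder resource requests
#SBATCH --time=01:00:00
#SBATCH --mem=8G

PROJECT_ID=ab12                     # change this to your own project ID
export TMPDIR="/scratch/$PROJECT_ID/$USER/tmp"
mkdir -p "$TMPDIR"

python my_program.py                # placeholder for your actual workload
```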