The most important part of disk usage monitoring is checking the utilization of each filesystem and mount point, which can reveal existing or impending storage-space issues in your infrastructure.
What Does Disk Usage (DU) Mean?
Disk usage (DU) refers to the portion or percentage of computer storage that is currently in use. It contrasts with disk space or capacity, which is the total amount of data that a given disk can store. Disk usage is a crucial metric for any computing system, as it informs not only storage planning but also software requirements and overall operation. Although it usually refers to a computer’s hard disk, it may also refer to external storage, such as a USB drive or compact disc (CD).
How to check the disk usage?
If you are using Linux (or another Unix-like OS), the most popular way to check disk usage is the "df -H" command. Despite its name ("disk free"), df is quite versatile: it reports the size, space used, space available, percentage used, and mount point of every filesystem mounted on your system. The -H option prints sizes in human-readable units based on powers of 1000:
$ df -H
Filesystem               Size  Used Avail Use% Mounted on
udev                     3.9G     0  3.9G   0% /dev
tmpfs                    796M  584K  796M   1% /run
/dev/mapper/cm--vg-root   63G   21G   39G  36% /
tmpfs                    3.9G  344K  3.9G   1% /dev/shm
tmpfs                    5.0M     0  5.0M   0% /run/lock
/dev/sda1                470M  122M  324M  28% /boot
tmpfs                    796M     0  796M   0% /run/user/1000
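As a rough illustration of turning this output into a quick check, the sketch below filters df output for filesystems above a given utilization. The function name flag_full and the 80% limit are made up for this example, and it assumes the POSIX output format (df -P) and mount points without spaces.

```shell
#!/bin/sh
# Sketch only: list mount points whose utilization exceeds a limit.
# flag_full is a hypothetical helper name, not part of df.
# It reads `df -P` output on stdin; column 5 is the percentage used
# and column 6 is the mount point (assumes no spaces in mount points).
flag_full() {
    limit=$1
    awk -v limit="$limit" 'NR > 1 {
        pct = $5
        sub(/%/, "", pct)            # strip the trailing % sign
        if (pct + 0 > limit + 0) print $6, $5
    }'
}

# Example: show every filesystem that is more than 80% full.
df -P | flag_full 80
```

With the sample output above nothing would be printed, since the fullest filesystem (/) is at 36%.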
Alternatively, you can use the du command, which reports disk usage for individual directories and so gives you a finer-grained view. Use the -h option to get human-readable output:
$ du -h /etc/netdata/
Optionally, add the -s option to display only the total disk usage of a directory:
$ du -hs /etc/netdata/
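When hunting for what is filling a disk, a common next step is to rank subdirectories by size. The following is a minimal sketch of that idea; top_dirs is a hypothetical helper name, it reports sizes in kilobytes (du -k) so that a plain numeric sort works, and it skips hidden directories because of the shell glob.

```shell
#!/bin/sh
# Sketch only: show the largest immediate subdirectories of a directory.
# top_dirs is a hypothetical helper; sizes are in kilobytes (du -k).
# Hidden directories are not matched by the */ glob.
top_dirs() {
    dir=$1
    count=${2:-5}                    # default to the top 5 entries
    du -sk "$dir"/*/ 2>/dev/null | sort -rn | head -n "$count"
}

# Example: the three largest subdirectories of /etc/netdata/.
top_dirs /etc/netdata 3
```

Each output line holds the size in kilobytes followed by the directory path, largest first.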
How to monitor your disk usage across your Infrastructure?
We now know how to check disk usage on an individual server, but checking it from time to time and monitoring it continuously are two completely different things. While the commands above are very useful, they are not built for 24x7 monitoring or for going back in time to find the root cause of an issue. And if your infrastructure comprises hundreds of nodes, logging into each one and running du or df is an almost impossible task.
A better approach is to use a distributed monitoring tool like Netdata, which collects a range of disk metrics, disk usage among them. The disk.space chart is located under the “Mount Points” section and, by default, shows the disk utilization of all mount points across all the nodes in your infrastructure.
It is a good idea to group the chart by nodes to get an account of the disk utilization per node.
You can also drill deeper by selecting specific nodes with the Global node filter in the top-right corner for further troubleshooting.
With the multiple Group by options available on the disk usage chart, you can focus on the specific aspect of disk usage that matters for the issue being inspected.
Because per-filesystem and per-mount-point utilization is what reveals existing or impending storage issues, Netdata ships default alerts that are triggered when utilization crosses set thresholds. You can modify the thresholds to suit your requirements.
# low disk space
# checking the latest collected values
# raise an alarm if the disk is low on
# available disk space
os: linux freebsd
families: !/dev !/dev/* !/run !/run/* *
calc: $used * 100 / ($avail + $used)
warn: $this > (($status >= $WARNING ) ? (80) : (90))
crit: $this > (($status == $CRITICAL) ? (90) : (98))
delay: up 1m down 15m multiplier 1.5 max 1h
info: disk $family space utilization
The alert above monitors the disk.space chart, excludes filesystems such as /dev, /dev/*, /run and /run/*, and applies a different threshold depending on the current alert state. A Warning is raised when disk utilization exceeds 90%; once the alert is in the Warning state or higher, it stays raised until utilization drops to 80% or below. Similarly, a Critical alert is raised when utilization exceeds 98%, and once Critical, it stays Critical until utilization drops to 90% or below. This gap between the raise and clear thresholds keeps the alert from flapping when utilization hovers around a limit. The delay line adds hysteresis to notifications as well: a notification is held back for 1 minute when the status escalates and for 15 minutes when it de-escalates, and if the status changes again while a notification is pending, the delay is multiplied by 1.5, up to a maximum of 1 hour.
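To make the threshold logic concrete, here is a minimal shell sketch that mirrors the calc, warn and crit lines of the alert. It is not Netdata's implementation: the function names (utilization, rank, next_status), the status ranks and the integer-only arithmetic are assumptions made purely for illustration.

```shell
#!/bin/sh
# Sketch only: mimics the calc/warn/crit expressions of the alert.
# Function names and status ranks are illustrative, not Netdata's own.

# calc: $used * 100 / ($avail + $used)   (integer arithmetic here)
utilization() {
    used=$1
    avail=$2
    echo $(( used * 100 / (used + avail) ))
}

# Map a status name to a rank so statuses can be compared.
rank() {
    case $1 in
        CLEAR)    echo 0 ;;
        WARNING)  echo 1 ;;
        CRITICAL) echo 2 ;;
    esac
}

# next_status READING CURRENT: print the status after one evaluation.
next_status() {
    reading=$1
    cur=$(rank "$2")
    # warn: $this > (($status >= $WARNING) ? (80) : (90))
    if [ "$cur" -ge 1 ]; then warn_limit=80; else warn_limit=90; fi
    # crit: $this > (($status == $CRITICAL) ? (90) : (98))
    if [ "$cur" -eq 2 ]; then crit_limit=90; else crit_limit=98; fi
    if [ "$reading" -gt "$crit_limit" ]; then
        echo CRITICAL
    elif [ "$reading" -gt "$warn_limit" ]; then
        echo WARNING
    else
        echo CLEAR
    fi
}

# Replay a series of utilization readings, starting from CLEAR.
status=CLEAR
for pct in 85 92 85 79 99 95 89; do
    status=$(next_status "$pct" "$status")
    echo "${pct}% -> ${status}"
done
```

In this replay, a reading of 85% leaves a clear alert clear, but the same 85% after the 92% reading keeps the alert in Warning, and only 79% clears it. Likewise, 95% keeps an already Critical alert Critical, while 89% steps it down to Warning.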
Note: Netdata currently does not monitor directories, but this functionality is coming "soon".