My company has a Linux cluster with a terabyte of attached storage. Over time we noticed the head node was becoming more overloaded. Inspection of the system showed that users were starting dozens of copies of the du utility to determine disk space usage. This was a natural thing for them to do, because they had a need to know how much disk space was available. A lack of disk space would cause their software builds and tests to fail. The problem was that it takes five to seven hours for a du of the entire shared filesystem. Thus, when the filesystem was nearly full (as it of course usually was), the number of du processes would increase almost exponentially.
To address this problem, we first set up automated nightly disk space reports, so that users could check the status without running du. This still did not solve the problem, as the amount of used space could fluctuate dramatically over the course of 24 hours. Users still wanted and needed to run their own du processes throughout the workday.
While adding more disk space would have solved the problem, we are using a large disk array that is already filled to maximum capacity. In general, users tend to fill up all available disk space anyway, no matter how much you give them.
We then developed a policy: users could run du on any directory they owned. In addition, user du processes would be allowed to run for a maximum of one hour of wall time. Users in the wheel group would be exempt from these restrictions.
I was given the task of developing a tool to implement this policy. Some sort of wrapper around the existing du seemed like an obvious choice: the script could validate the input, abort if an invalid path was given, and terminate the du process if it ran too long.
I wrote a basic bash script in perhaps an hour's time. Then I thought about how to run it, and that is where I ran into trouble. I had thought that I would make the script set user id (setuid) or set group id (setgid) root, i.e. when run by any user it would actually run in the root group. Then, I could change the permissions on the real du so that only root could run it. The result would be that normal users could only access the real du through the wrapper script.
Of course that would make a pretty boring article, and in reality it didn't turn out to be that simple:
Full Story .