Making Sense of the Linux Filesystem Hierarchy

Nov 14, 2024

What is the difference between /bin, /usr/bin, and /usr/local/bin? If I want to install software on a Unix-derived OS like Linux, where should it go? What about /var and /run? How are those different? And what is /opt for?

You can get deep into a career as a programmer without understanding what all these directories are supposed to contain. I have! Most software developers install software using a package manager, which decides where everything goes, so you never have to ask yourself whether /usr/bin, /usr/local/bin, or some other directory makes the most sense.

If you want to install your own software from source, or especially if you want to package your software and make it available to others, you need to understand what all these directories are for.

What makes this difficult (other than the abbreviated names) is that “what all these directories are for” is a shifting target determined by convention rather than by technical constraints. There’s nothing about how a computer works that requires us to have a /usr directory. The choice is arbitrary; why “usr” was chosen, and why it survives, comes down to history and culture. This makes the rules that govern where to put things on your system more like the rules of grammar and usage in English than like the cold, hard rules found elsewhere in computer science.

There is a standard explaining what all these directories are for called the Filesystem Hierarchy Standard. The FHS is a 43-page document prepared by a working group of the Linux Foundation that specifies, for Linux and other Unix-derived systems, the purpose and required contents of a few dozen different directories. On Linux, you can run man hier to view a man page summarizing the FHS.

The FHS is helpful, but it doesn’t tell you everything you need to know. Like a dictionary, it can tell you what words are supposed to mean but not necessarily how they are actually used.

I’ve recorded here what I’ve been able to figure out so far both about what the FHS says a directory should contain and about what that directory contains in practice. I haven’t covered every directory that appears in the FHS. Instead, I’ve focused on the many directories that seem to have overlapping purposes and tried to explain the distinctions between them.

A Quick Aside About Filesystems and Where They Appear

Most of us own a personal computer and do our computing on that machine. Our computer’s filesystem is used only by us, and all of its data lives on the same physical device.

Many of the distinctions drawn between directories in the FHS make little sense unless you remember that personal computing is not the only context in which filesystems are used. A single-user machine is the easy case. The harder case involves a “site,” meaning a group of computers under the control of a system administrator—imagine a computer lab in a computer science building. In the context of a site, computers might share filesystems between them (all mounted over the network), while also maintaining separate host-specific filesystems. The computers might also be used by hundreds of different users. The FHS tries to specify a directory layout that anticipates the needs of a site system administrator while still making sense for personal computing.

A site administrator cares about which directories can be mounted by multiple machines at once. A site administrator might also care about which directories can be mounted as read-only, which can have performance or security benefits. The FHS therefore categorizes files along two axes: files can be shareable or unshareable, and they can be variable or static. A major goal of the FHS is to group like files with like files, so that, for example, a directory containing shareable, static files can be stored on a read-only filesystem shared among all hosts at a site.

The following table (adapted from the FHS) shows some standard directories and how they would be categorized along these two axes:

|          | Shareable        | Unshareable       |
|----------|------------------|-------------------|
| Static   | `/usr`, `/opt`   | `/etc`, `/boot`   |
| Variable | `/var/mail`      | `/run`            |

Below, we’ll come back to exactly what some of these directories are supposed to store. But the main idea here is that directories like /usr aren’t supposed to change often and can be mounted by multiple machines, whereas directories like /run are specific to a single host and change all the time, so they should not be shared between machines.
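To make the two axes concrete, here is a toy Python encoding of the table above. The names `FHS_CATEGORIES`, `classify`, and `can_share_read_only` are hypothetical, and the mapping covers only the directories listed in the table, so anything else (including most of /var) comes back as `None`. Lookup uses the longest matching directory, so /usr/share/doc inherits the category of /usr.

```python
from __future__ import annotations

# Example directories from the FHS table, as (sharing, mutability) pairs.
FHS_CATEGORIES: dict[str, tuple[str, str]] = {
    "/usr": ("shareable", "static"),
    "/opt": ("shareable", "static"),
    "/etc": ("unshareable", "static"),
    "/boot": ("unshareable", "static"),
    "/var/mail": ("shareable", "variable"),
    "/run": ("unshareable", "variable"),
}


def classify(path: str) -> tuple[str, str] | None:
    """Return the category of the closest listed directory containing path, or None."""
    best = None
    for directory in FHS_CATEGORIES:
        # Match the directory itself or anything beneath it, but not e.g. "/usrx".
        if path == directory or path.startswith(directory + "/"):
            if best is None or len(directory) > len(best):
                best = directory
    return FHS_CATEGORIES[best] if best is not None else None


def can_share_read_only(path: str) -> bool:
    """Shareable, static data could live on a read-only filesystem shared by every host."""
    return classify(path) == ("shareable", "static")
```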

Drawing Distinctions in the Filesystem Hierarchy

`/home` vs. `/root`

These first two are easy. /home should contain all of the user home directories, with the exception of the root user, whose home directory should be /root. Storing the root user’s home directory outside of /home allows /home to live on a separate filesystem that may not always be mounted, while the root user still has a working home directory when it isn’t.
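If you want to check where each account’s home directory actually lives, it is recorded in the sixth field of /etc/passwd. The sketch below parses text in that format; `home_directories` is a hypothetical helper, and it assumes accounts are defined in a conventional local /etc/passwd.

```python
from __future__ import annotations


def home_directories(passwd_text: str) -> dict[str, str]:
    """Map each user name to its home directory.

    passwd_text is in /etc/passwd format: one account per line, with seven
    colon-separated fields (name:password:UID:GID:GECOS:home:shell).
    """
    homes = {}
    for line in passwd_text.splitlines():
        fields = line.strip().split(":")
        if len(fields) == 7:  # skip blank or malformed lines
            homes[fields[0]] = fields[5]  # the sixth field is the home directory
    return homes
```

Feeding it the contents of /etc/passwd on a typical Linux system should show /root for the root user and /home/<name> for everyone else. For accounts that come from a directory service rather than /etc/passwd, Python’s standard `pwd` module (for example, `pwd.getpwnam("root").pw_dir`) asks the system’s account database instead.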

`/mnt` vs. `/media`

According to the FHS, /mnt is a “mount point for a temporarily mounted filesystem” whereas /media is a “mount point for removable media.” These sound similar, but the FHS is actually quite helpful here: it explains that /mnt has historically been used to mount a single filesystem, whereas /media is intended to contain mount points for multiple filesystems.

On Ubuntu, and perhaps other Linux distributions, a USB drive you plug into your computer should automatically get mounted under /media. An icon might also appear for that drive in your file manager or on your desktop. On the other hand, a filesystem mounted to /mnt won’t create an icon.

In practice, it seems /mnt is reserved for manually mounting filesystems while /media is more under the control of the operating system.
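One way to compare the mounts the system manages with the ones you created by hand is to read the kernel’s list of mounted filesystems in /proc/mounts. The sketch below filters text in that format by mount point; `mounts_under` is a hypothetical helper, and it decodes only the common `\040` escape for spaces.

```python
from __future__ import annotations


def mounts_under(mounts_text: str, prefix: str) -> list[tuple[str, str]]:
    r"""Return (device, mount point) pairs mounted at or below prefix.

    mounts_text is in /proc/mounts format: space-separated fields for the device,
    mount point, filesystem type, options, and two numbers. Spaces inside a
    mount point are escaped as \040, so that escape is decoded here.
    """
    base = prefix.rstrip("/") + "/"
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # not a mount entry
        device = fields[0]
        mount_point = fields[1].replace("\\040", " ")
        if mount_point == prefix or mount_point.startswith(base):
            found.append((device, mount_point))
    return found
```

Passing it the contents of /proc/mounts with the prefix "/media" lists whatever the desktop has mounted there; on Ubuntu an automatically mounted USB drive typically appears as /media/<your user name>/<drive label>.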

`/var` vs. `/run`

Both /var and /run are supposed to contain variable data written by programs as they run. Distinguishing between them is tricky and made harder by the fact that on some systems the /var/run directory is also a thing. It helps to understand the history behind both directories.

Before /var existed, variable data was written to the /etc and /usr directories. This was a problem, because it made it impossible to mount those directories as read-only, shared filesystems. /var became a central location to store data written out by programs. Files that were previously written to /etc or /usr are now written under /var instead, allowing those directories to remain free of variable data.

/var has standard subdirectories for different kinds of variable data. There is /var/cache for cache data, /var/log for log data, and even /var/games for variable game data (like save files, presumably). Database systems might also store files under /var/lib/<name of db>. Postgres, on some Linux distributions, stores data under /var/lib/pgsql.

Originally, the /var directory also contained /var/run. According to an older version of the FHS published in 2004, /var/run was for “system information data describing the system since it was booted.” The property that most distinguished it from other directories in /var is that /var/run had to be wiped on each reboot of the system. Programs writing PID files (a file containing just the process ID of the program) and creating UNIX domain sockets were supposed to put them here.

/run was introduced later to solve some shortcomings of /var/run. The main problem with /var is that it can be mounted too late in the boot process to be useful to daemons like systemd and udev that are involved in booting. Those daemons needed somewhere to write out runtime data that was always available to them. /run, a root-level directory, can be mounted earlier than /var without difficulty, whereas trying to mount /var/run before /var was nontrivial. In the current version of the FHS, /run has replaced /var/run. It serves the same purpose but just lives at a different path. If /var/run is needed for backward-compatibility, it should be a symlink to /run.
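As an illustration of the kind of runtime data that belongs in /run, here is a sketch of a daemon recording its PID file there. `write_pid_file` and `read_pid_file` are hypothetical helpers; because /run is normally writable only by root, the `run_dir` parameter lets you point them somewhere else when experimenting as an ordinary user.

```python
import os


def write_pid_file(name: str, run_dir: str = "/run") -> str:
    """Record this process's PID in <run_dir>/<name>.pid and return the file's path."""
    path = os.path.join(run_dir, name + ".pid")
    tmp_path = path + ".tmp"
    # Write to a temporary name and rename it into place, so another process
    # never reads a half-written PID file.
    with open(tmp_path, "w") as f:
        f.write(f"{os.getpid()}\n")
    os.replace(tmp_path, path)
    return path


def read_pid_file(path: str) -> int:
    """Return the process ID stored in a PID file."""
    with open(path) as f:
        return int(f.read().strip())
```

Because /run is wiped at every boot, a PID file left behind by a crash cannot survive into the next boot. On systems where /var/run is the backward-compatibility symlink, `os.path.realpath("/var/run")` should return "/run", so older programs that still write to /var/run end up in the same place.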

`/tmp` vs. `/var/tmp` vs. `/run`

The /var directory also contains a standard subdirectory for temporary files called /var/tmp. This raises the question of what the root-level /tmp is for if there is already a directory for temporary files under /var.

The FHS explains the difference well. /var/tmp is for temporary files that need to be preserved between system reboots. /tmp on the other hand can safely be wiped at any time, and programs “must not assume that any files or directories in /tmp are preserved between invocations of the program.”
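Python’s standard `tempfile` module reflects the same distinction. The sketch below (`make_scratch_file` is a hypothetical helper) creates files in the default temporary directory, which is usually /tmp unless the TMPDIR environment variable points elsewhere, and uses /var/tmp only when the caller says the file must outlive a reboot.

```python
from __future__ import annotations

import os
import tempfile


def make_scratch_file(survive_reboot: bool = False, base_dir: str | None = None) -> str:
    """Create an empty temporary file and return its path.

    Files that must survive a reboot go in /var/tmp; everything else goes in
    tempfile.gettempdir(), which is usually /tmp unless TMPDIR says otherwise.
    An explicit base_dir overrides both choices.
    """
    if base_dir is None:
        base_dir = "/var/tmp" if survive_reboot else tempfile.gettempdir()
    fd, path = tempfile.mkstemp(dir=base_dir)
    os.close(fd)  # mkstemp returns an open descriptor; the file itself is kept
    return path
```

Note that `mkstemp` never deletes the file for you; cleaning up is still the caller’s job, even in /tmp.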

That distinction is pretty straightforward. But if /tmp is supposed to contain files wiped on reboot, how is it different from /run, which is also wiped on reboot?

A major difference between /tmp and /run is the permissions usually set on them. The FHS doesn’t cover this in much detail, but /run should be writable only by the root user. /tmp, on the other hand, is a utility provided to all user-space programs and so should be writable by all users. /tmp usually has the sticky bit set, which prevents users from deleting or renaming files they do not own, but otherwise it is a public dumping ground for temporary data.
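You can inspect these permission bits yourself with `os.stat` and the constants in Python’s `stat` module. The hypothetical helper below reports whether a directory is world-writable and whether its sticky bit is set.

```python
import os
import stat


def tmp_style_permissions(path: str) -> dict:
    """Report whether a directory is world-writable and whether its sticky bit is set."""
    mode = os.stat(path).st_mode
    return {
        "world_writable": bool(mode & stat.S_IWOTH),
        # On a directory with the sticky bit, only a file's owner, the directory's
        # owner, or root may delete or rename that file.
        "sticky": bool(mode & stat.S_ISVTX),
    }
```

On most Linux systems `ls -ld /tmp` shows drwxrwxrwt (mode 1777), so both flags come back true for /tmp, while /run is usually owned by root and not world-writable.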

`/bin` vs. `/sbin`

The /bin directory contains executable programs used by all users of a system, while the /sbin directory is supposed to contain programs used only by the system administrator.

The FHS states:

Deciding which things go into /sbin is simple: if a normal (not a system administrator) user will ever run it directly, then it must be placed in /bin. Ordinary users should not have to place /sbin in their path.

The FHS also explains that this distinction is made for user convenience and not for security. There are other security mechanisms in place that ensure that ordinary users cannot cause trouble even if they have access to an executable in /sbin, so the distinction is not made to “hide” system administration programs from ordinary users.

This distinction seems to be a case where the FHS is behind the times. Arch Linux has already merged /bin and /sbin, and Fedora is planning to do so in its next release. I would not be surprised if a future edition of the FHS removes /sbin entirely.

The rationale given for the change in Fedora is that in practice most users routinely invoke programs stored under /sbin and have both /bin and /sbin in their PATH.
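Since the split is about convenience rather than security, its only real effect is on PATH lookup. `shutil.which` from Python’s standard library makes this easy to see: a program in /sbin is simply not found when /sbin is missing from the search path, yet nothing stops you from running it by its full path. `find_program` is a hypothetical wrapper, and the directory list you pass it stands in for PATH.

```python
from __future__ import annotations

import os
import shutil


def find_program(name: str, search_dirs: list[str]) -> str | None:
    """Find an executable the way a shell would, using search_dirs in place of PATH."""
    return shutil.which(name, path=os.pathsep.join(search_dirs))
```

On a system that still separates the two directories, looking up an administrative tool with and without /sbin and /usr/sbin in the list may give different answers; on a merged system the shorter list may already succeed.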

`/bin` vs. `/usr/bin`

The /usr directory might have the least helpful, most confusing name of any directory in the standard filesystem hierarchy. It does not contain user-specific data (that’s what /home contains), so why is it called /usr?

According to Rob Landley, its name is a historical accident. When Ken Thompson and Dennis Ritchie were first developing Unix, they had two hard disks available to store the operating system on, and each held only 1.5 megabytes. Unix grew too large to fit on one disk, so it ended up split across both of them. /usr originally stored user-specific data, but it was commandeered to hold a mirror of root-level directories like /bin and /lib, just so some files under /bin and /lib could be moved to /usr/bin and /usr/lib and stored on the second disk.

The more important programs remained on the first disk because they would be needed to troubleshoot the system if something went wrong mounting the second disk. But the main motivation for splitting things up was simply that there was no more space on the hard disk used for the root filesystem.

Later, when Thompson and Ritchie acquired a third hard disk, they moved all the user-specific data out of /usr into /home, which was stored on the new hard disk. They kept the name /usr, probably because there was no replacement for it that wouldn’t be awkward. A much more accurate name for /usr would be something like /secondary.

Like the distinction between /bin and /sbin, the distinction between /bin and /usr/bin is increasingly an archaism. The FHS still specifies a split, but many Linux distributions, including Arch Linux, Fedora, and Ubuntu, have merged the two. They have merged /lib and /usr/lib as well. On your system you likely still have a /bin and a /lib directory, but if you look closely, these will probably be symlinks to /usr/bin and /usr/lib.
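A quick way to tell whether your system has merged these directories is to compare where the top-level and /usr paths resolve once symlinks are followed. `is_merged` is a hypothetical helper; its `root` parameter lets you try it against a scratch directory instead of the real /.

```python
import os


def is_merged(root: str, name: str) -> bool:
    """Return True if <root>/<name> and <root>/usr/<name> resolve to the same directory.

    On a merged-/usr system, /bin is a symlink to usr/bin (and /lib to usr/lib),
    so both paths end up at the same place after symlinks are resolved.
    """
    top_level = os.path.join(root, name)
    under_usr = os.path.join(root, "usr", name)
    return os.path.realpath(top_level) == os.path.realpath(under_usr)
```

On a merged system, `is_merged("/", "bin")` and `is_merged("/", "lib")` should both return True, and `os.readlink("/bin")` typically shows the relative target usr/bin.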

If you want to learn more about the justification for merging /bin and /usr/bin, there is a good document explaining it here.

`/opt` vs. `/usr/local/bin`

/usr/local is for software installed “locally” by the system administrator. What “local” means is not very clear, since the FHS says /usr/local can still be used for software shared between a group of hosts. “Local” in this case means software added by the system administrator on top of whatever software comes as part of the operating system distribution deployed at the site.

/usr/local contains various directories mirroring some of the directories that exist under / and /usr. Locally installed binaries should go under /usr/local/bin.

On a personal computer, /usr/local is probably a good place to put software you’ve compiled yourself, assuming you want it to be available to all users on your system. If you are the only user of your system, then it doesn’t really matter, and you might as well install your software under your home directory.
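Installing a program you built yourself mostly comes down to copying it into the right bin directory with the right permissions. The sketch below does that for a single file. `install_binary` and its `prefix` parameter are hypothetical; writing under /usr/local normally requires root, and a single user could instead pass a prefix in their home directory, such as ~/.local expanded with `os.path.expanduser` (a common convention, not something the FHS defines). Autoconf-generated configure scripts also default to /usr/local as their installation prefix.

```python
import os
import shutil


def install_binary(built_file: str, prefix: str = "/usr/local") -> str:
    """Copy a compiled program into <prefix>/bin and make it runnable by everyone."""
    bin_dir = os.path.join(prefix, "bin")
    os.makedirs(bin_dir, exist_ok=True)
    dest = os.path.join(bin_dir, os.path.basename(built_file))
    shutil.copy2(built_file, dest)  # copies contents plus metadata such as timestamps
    os.chmod(dest, 0o755)  # rwxr-xr-x: owner can write, everyone can read and execute
    return dest
```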

/opt serves a similar purpose to /usr/local. It is “reserved for the installation of add-on application software packages.” What makes /opt different from other places you might install software is that software under /opt is supposed to conform to a particular layout.

Elsewhere on a Unix system, the various components of installed software get split up among several different directories. Something installed by your operating system under /usr might have an executable living under /usr/bin, a supporting library living under /usr/lib, and documentation living under /usr/share/man. In the world of /opt, files are instead grouped first by the package name. So a package called my-amazing-software would be installed at /opt/my-amazing-software. The executables that are part of my-amazing-software would be installed at /opt/my-amazing-software/bin, the supporting libraries would be installed at /opt/my-amazing-software/lib, and so on.
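The difference between the two layouts is really just the order in which a path is built. `component_path` is a hypothetical helper that shows where one kind of file would go for a package like my-amazing-software under each scheme; it uses `posixpath` so the results are Unix paths on any platform.

```python
import posixpath


def component_path(package: str, component: str, layout: str) -> str:
    """Return the directory that holds one kind of file for a package.

    layout="usr": grouped by file type first, so every package shares /usr/bin, /usr/lib, ...
    layout="opt": grouped by package first, so each package gets /opt/<package>/bin, ...
    component is a relative path such as "bin", "lib", or "share/man".
    """
    if layout == "usr":
        return posixpath.join("/usr", component)
    if layout == "opt":
        return posixpath.join("/opt", package, component)
    raise ValueError(f"unknown layout: {layout!r}")
```

For example, `component_path("my-amazing-software", "bin", "opt")` gives /opt/my-amazing-software/bin, while the "usr" layout gives /usr/bin, a directory shared with every other package.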

Installing software this way has a few advantages. For one, it becomes very easy to uninstall a package—you just have to delete the entire directory. A package like this can also be trivially installed from a zip file or tarball.
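The tarball install and the one-directory uninstall can be sketched like this. `install_to_opt` and `uninstall_from_opt` are hypothetical helpers, the `opt_root` parameter lets you rehearse against a scratch directory, and real installs under /opt normally need root. The "data" extraction filter is applied only on Python versions that provide `tarfile.data_filter`.

```python
import os
import shutil
import tarfile


def install_to_opt(tarball: str, package: str, opt_root: str = "/opt") -> str:
    """Unpack a tarball into <opt_root>/<package> and return that directory."""
    dest = os.path.join(opt_root, package)
    os.makedirs(dest)  # raises FileExistsError if the package is already installed
    with tarfile.open(tarball) as tar:  # compression is detected automatically
        if hasattr(tarfile, "data_filter"):
            # The "data" filter refuses members that would land outside dest
            # and special files such as devices.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return dest


def uninstall_from_opt(package: str, opt_root: str = "/opt") -> None:
    """Remove a package by deleting its single directory."""
    shutil.rmtree(os.path.join(opt_root, package))
```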

This Stack Overflow commenter suggests that the primary difference between /usr/local and /opt is that the former is for additional software compiled and maintained in-house, while /opt is for additional software created and prepackaged by a third party. In both cases, the software is in addition to whatever software is provided by the operating system distribution.

There are many directories that I have not covered here, including directories important to Linux such as /dev and /proc. These directories have a clear and unique purpose, so if you’re curious about what they do, take a look at the man hier man page.

The most confusing part of the standard Unix directory layout is that so many directories seem to do the same thing. On a personal computer, many of the distinctions made between these directories don’t make sense! It’s important to remember that computing happens in other contexts and that personal computing wasn’t even a thing when Unix was first developed. In a context where multiple users share a computing system, there is a meaningful distinction between software installed under /usr/local and under your home directory, and rules governing the usage of /run vs. /tmp and so forth become more necessary.
