Making Sense of The Linux Filesystem Hierarchy
Nov 14, 2024
What is the difference between /bin, /usr/bin, and /usr/local/bin? If I want to install software on a Unix-derived OS like Linux, where should it go? What about /var and /run? How are those different? And what is /opt for?
You can get deep into a career as a programmer without understanding what all these directories are supposed to contain. I have! Most software developers install software using a package manager; a package manager decides where to install software for you, so you never have to ask yourself whether installing something under /usr/bin or /usr/local/bin or some other directory makes the most sense.
If you want to install your own software from source, or especially if you want to package your software and make it available to others, you need to understand what all these directories are for.
What makes this difficult (other than the abbreviated names) is that “what all these directories are for” is a shifting target determined by convention, not by technical constraints. There’s nothing about how a computer works that requires us to have a /usr directory. The choice is arbitrary; why “usr” was chosen and why it survives comes down to history and culture. This makes the rules that govern where to put things on your system more like the rules of grammar and usage in English than like the cold, hard rules found elsewhere in computer science.
There is a standard explaining what all these directories are for, called the Filesystem Hierarchy Standard. The FHS is a 43-page document prepared by a working group of the Linux Foundation that specifies, for Linux and other Unix-derived systems, the purpose and required contents of a few dozen different directories. On Linux, you can run man hier to view a man page summarizing the FHS.
The FHS is helpful, but it doesn’t tell you everything you need to know. Like a dictionary, it can tell you what words are supposed to mean but not necessarily how they are actually used.
I’ve recorded here what I’ve been able to figure out so far both about what the FHS says a directory should contain and about what that directory contains in practice. I haven’t covered every directory that appears in the FHS. Instead, I’ve focused on the many directories that seem to have overlapping purposes and tried to explain the distinctions between them.
A Quick Aside About Filesystems and Where They Appear
Most of us own a personal computer and do our computing on that machine. Our computer’s filesystem is used only by us and all of the data lives on the same physical device.
Many of the distinctions drawn between directories in the FHS make little sense unless you remember that personal computing is not the only context in which filesystems are used. A single-user machine is the easy case. The harder case involves a “site,” meaning a group of computers under the control of a system administrator—imagine a computer lab in a computer science building. In the context of a site, computers might share filesystems between them (all mounted over the network), while also maintaining separate host-specific filesystems. The computers might also be used by hundreds of different users. The FHS tries to specify a directory layout that anticipates the needs of a site system administrator while still making sense for personal computing.
A site administrator cares about which directories can be mounted by multiple machines at once. A site administrator might also care about which directories can be mounted as read-only, which can have performance or security benefits. The FHS therefore categorizes files along two axes: Files can be shareable or unshareable and they can be variable or static. A major goal of the FHS is to group like files with like files, so that, for example, a directory containing shareable, static files can be stored on a read-only filesystem shared among all hosts at a site.
The following table (adapted from the FHS) shows some standard directories and how they would be categorized along these two axes:
| | Shareable | Unshareable |
|---|---|---|
| Static | /usr, /opt | /etc, /boot |
| Variable | /var/mail | /run |
We’ll come back to exactly what some of these directories are supposed to store below. But the main idea here is that directories like /usr aren’t supposed to change often and can be mounted by multiple machines, whereas directories like /run are specific to a single host and change all the time, so should not be shared between machines.
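One way to picture the two axes is as a small lookup table. The Python sketch below encodes only the six directories from the table above; the names CATEGORIES and read_only_shared_candidates are made up for illustration and do not come from the FHS.

```python
# Each directory from the table, tagged along the two FHS axes.
CATEGORIES = {
    "/usr": {"shareable": True, "static": True},
    "/opt": {"shareable": True, "static": True},
    "/etc": {"shareable": False, "static": True},
    "/boot": {"shareable": False, "static": True},
    "/var/mail": {"shareable": True, "static": False},
    "/run": {"shareable": False, "static": False},
}


def read_only_shared_candidates(categories=CATEGORIES):
    """Return directories that are both shareable and static.

    These are the ones a site could, in principle, keep on a
    read-only filesystem mounted by every host.
    """
    return sorted(
        path
        for path, axes in categories.items()
        if axes["shareable"] and axes["static"]
    )


print(read_only_shared_candidates())  # ['/opt', '/usr']
```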
Drawing Distinctions in the Filesystem Hierarchy
/home vs. /root
These first two are easy. /home should contain all of the user home directories, with the exception of the root user, whose home directory should be /root. Storing the root user’s home directory outside of /home allows /home to live on a separate filesystem that may not always be mounted.
/mnt vs. /media
According to the FHS, /mnt is a “mount point for a temporarily mounted filesystem” whereas /media is a “mount point for removable media.” These things seem similar, but the FHS is actually quite helpful here and explains that /mnt has historically been used to mount a single filesystem, whereas /media is intended to contain mount points for multiple filesystems.
On Ubuntu, and perhaps other Linux distributions, a USB drive you plug into your computer should automatically get mounted under /media. An icon might also appear for that drive in your file manager or on your desktop. On the other hand, a filesystem mounted at /mnt won’t create an icon.
In practice, it seems /mnt is reserved for manually mounting filesystems while /media is more under the control of the operating system.
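To see how this plays out on a given machine, you can group the current mount points by which of the two directories they sit under. The sketch below assumes the whitespace-separated format Linux uses in /proc/mounts (device, mount point, filesystem type, options, and two numeric fields). The sample text and the function name are invented for illustration.

```python
def mounts_under_media_and_mnt(mounts_text):
    """Group mount points that sit at or below /media and /mnt."""
    groups = {"/media": [], "/mnt": []}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        for prefix in groups:
            if mount_point == prefix or mount_point.startswith(prefix + "/"):
                groups[prefix].append(mount_point)
    return groups


# Hypothetical sample in /proc/mounts format.
SAMPLE = """\
/dev/sda1 / ext4 rw 0 0
/dev/sdb1 /media/alice/USB vfat rw 0 0
/dev/sdc1 /mnt ext4 rw 0 0
"""

print(mounts_under_media_and_mnt(SAMPLE))
```

On a real Linux system you could pass in the contents of /proc/mounts instead of the sample string.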
/var vs. /run
Both /var and /run are supposed to contain variable data written by programs as they run. Distinguishing between them is tricky and made harder by the fact that on some systems the /var/run directory is also a thing. It helps to understand the history behind both directories.
Before /var existed, variable data was written to the /etc and /usr directories. This was a problem, because it made it impossible to mount those directories as read-only, shared filesystems. /var became a central location to store data written out by programs. Files previously written to /etc or /usr now were written under /var instead, allowing those directories to remain free of variable data.
/var has standard subdirectories for different kinds of variable data. There is /var/cache for cache data, /var/log for log data, and even /var/games for variable game data (like save files, presumably). Database systems might also store files under /var/lib/<name of db>. Postgres, on some Linux distributions, stores data under /var/lib/pgsql.
Originally, the /var directory also contained /var/run. According to an older version of the FHS published in 2004, /var/run was for “system information data describing the system since it was booted.” The property that most distinguished it from other directories in /var is that /var/run had to be wiped on each reboot of the system. Programs writing PID files (a file containing just the process ID of the program) and creating UNIX domain sockets were supposed to put them here.
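A PID file is simple enough to sketch in a few lines of Python. The function name write_pid_file and the daemon name are hypothetical. A real daemon would pass its runtime directory, while this example uses a throwaway directory so it can run without root privileges.

```python
import os
import tempfile
from pathlib import Path


def write_pid_file(runtime_dir, name):
    """Write this process's ID to <runtime_dir>/<name>.pid and return the path."""
    pid_path = Path(runtime_dir) / f"{name}.pid"
    pid_path.write_text(f"{os.getpid()}\n")
    return pid_path


# Demonstration in a temporary directory (a stand-in for the runtime directory).
with tempfile.TemporaryDirectory() as demo_dir:
    pid_file = write_pid_file(demo_dir, "my-daemon")
    print(pid_file.name, "contains", pid_file.read_text().strip())
```

Because the runtime directory is wiped at boot, a PID file left behind by a crashed daemon does not outlive a restart of the machine.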
/run was introduced later to solve some shortcomings of /var/run. The main problem with /var is that it can be mounted too late in the boot process to be useful to daemons like systemd and udev that are involved in booting. Those daemons needed somewhere to write out runtime data that was always available to them. /run, a root-level directory, can be mounted earlier than /var without difficulty, whereas trying to mount /var/run before /var was nontrivial. In the current version of the FHS, /run has replaced /var/run. It serves the same purpose but just lives at a different path. If /var/run is needed for backward-compatibility, it should be a symlink to /run.
/tmp vs. /var/tmp vs. /run
The /var directory also contains a standard subdirectory for temporary files called /var/tmp. This raises the question of what the root-level /tmp is for if there is already a directory for temporary files under /var.
The FHS explains the difference well. /var/tmp is for temporary files that need to be preserved between system reboots. /tmp on the other hand can safely be wiped at any time, and programs “must not assume that any files or directories in /tmp are preserved between invocations of the program.”
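In code, the choice comes down to which directory you hand to your temporary-file API. Here is a minimal Python sketch using the standard tempfile module. The helper names and the base_dir override are assumptions made for this example.

```python
import os
import tempfile


def pick_temp_dir(survive_reboot):
    """Choose /var/tmp for data that should outlive a reboot, else /tmp."""
    return "/var/tmp" if survive_reboot else "/tmp"


def make_scratch_file(survive_reboot=False, base_dir=None):
    """Create an empty temporary file and return its path.

    base_dir overrides the directory choice, which is handy for
    trying the function out without touching /tmp or /var/tmp.
    """
    directory = base_dir or pick_temp_dir(survive_reboot)
    fd, path = tempfile.mkstemp(prefix="scratch-", dir=directory)
    os.close(fd)  # mkstemp returns an open descriptor; close it here
    return path


print(pick_temp_dir(survive_reboot=True))
print(pick_temp_dir(survive_reboot=False))
```

The caller is still responsible for deleting the file when it is no longer needed; neither directory is a substitute for cleaning up after yourself.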
That distinction is pretty straightforward. But if /tmp is supposed to contain files wiped on reboot, how is it different from /run, which is also wiped on reboot?
A major difference between /tmp and /run is the permissions usually set on the directories. The FHS doesn’t cover this in too much detail, but /run should only be writable by the root user. /tmp on the other hand is a utility provided to all user-space programs and so should be writable by all users. /tmp usually has the sticky bit set, which stops users from deleting or renaming files they do not own and so provides some protection against programs interfering with each other, but otherwise it is a public dumping ground for temporary data.
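You can inspect these permission bits yourself. The sketch below uses Python's os.stat and the constants in the stat module. The function name is made up for this example, and the final check is guarded so the snippet still runs on systems without a /tmp directory.

```python
import os
import stat


def describe_dir_permissions(path):
    """Report whether a directory is world-writable and has the sticky bit."""
    mode = os.stat(path).st_mode
    return {
        "world_writable": bool(mode & stat.S_IWOTH),
        "sticky": bool(mode & stat.S_ISVTX),
    }


if os.path.isdir("/tmp"):
    print("/tmp:", describe_dir_permissions("/tmp"))
```

On a typical Linux system you would expect both values to be true for /tmp, but that is a convention, not something this code guarantees.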
/bin vs. /sbin
The /bin directory contains executable programs used by all users of a system, while the /sbin directory is supposed to contain programs used only by the system administrator.
The FHS states:
“Deciding which things go into /sbin is simple: if a normal (not a system administrator) user will ever run it directly, then it must be placed in /bin. Ordinary users should not have to place /sbin in their path.”
The FHS also explains that this distinction is made for user convenience and not for security. There are other security mechanisms in place that ensure that ordinary users cannot cause trouble even if they have access to an executable in /sbin, so the distinction is not made to “hide” system administration programs from ordinary users.
This distinction seems to be a case where the FHS is behind the times. Arch Linux has already merged /bin and /sbin, and Fedora is planning to do so in its next release. I would not be surprised if a future edition of the FHS removes /sbin entirely.
The rationale given for the change in Fedora is that in practice most users routinely invoke programs stored under /sbin and have both /bin and /sbin in their PATH.
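It is easy to check whether your own PATH already includes the sbin directories. This small Python sketch assumes a Unix-style, colon-separated PATH. The function name is hypothetical.

```python
import os
import posixpath


def sbin_entries(path_value):
    """Return the PATH entries whose final component is 'sbin'."""
    entries = [entry for entry in path_value.split(":") if entry]
    return [
        entry
        for entry in entries
        if posixpath.basename(entry.rstrip("/")) == "sbin"
    ]


print(sbin_entries(os.environ.get("PATH", "")))
```

An empty list means your shell is not searching any sbin directory; anything else suggests the split is not doing much for you in practice.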
/bin vs. /usr/bin
The /usr directory might have the least helpful, most confusing name of any directory in the standard filesystem hierarchy. It does not contain user-specific data (that’s what /home contains), so why is it called /usr?
According to Rob Landley, its name is a historical accident. When Ken Thompson and Dennis Ritchie were first developing Unix, they had two hard disks available to store the operating system on. The hard disks each held only 1.5 megabytes. Unix grew too large to fit on only one of the hard disks, so it ended up split across both of them. /usr originally stored user-specific data, but it was commandeered to hold a duplicate of the root-level directories like /bin and /lib, so that some files under /bin and /lib could be moved to /usr/bin and /usr/lib and stored on the second disk.
The more important programs remained on the first disk because they would be needed to troubleshoot the system if something went wrong mounting the second disk. But the main motivation for splitting things up was simply that there was no more space on the hard disk used for the root filesystem.
Later, when Thompson and Ritchie acquired a third hard disk, they moved all the user-specific data out of /usr into /home, which was stored on the new hard disk. They kept the name /usr, probably because there was no replacement for it that wouldn’t be awkward. A much more accurate name for /usr would be something like /secondary.
Like the distinction between /bin and /sbin, the distinction between /bin and /usr/bin is increasingly an archaism. The FHS still specifies a split, but many Linux distributions, including Arch Linux, Fedora, and Ubuntu, have merged the two. They have merged /lib and /usr/lib as well. On your system you likely still have a /bin and a /lib directory, but if you look closely these will probably be symlinks to /usr/bin and /usr/lib.
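If you want to look closely without typing ls -l three times, a few lines of Python will do it. This is a rough sketch; the function name and the choice of bin, sbin, and lib are my own.

```python
import os


def usr_merge_report(root="/"):
    """Map bin, sbin and lib under root to their symlink target.

    A value of None means the entry is not a symlink: it is either
    a real directory or does not exist at all.
    """
    report = {}
    for name in ("bin", "sbin", "lib"):
        path = os.path.join(root, name)
        report[name] = os.readlink(path) if os.path.islink(path) else None
    return report


print(usr_merge_report())
```

On a distribution that has merged these directories, you would expect targets such as usr/bin and usr/lib instead of None.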
If you want to learn more about the justification for merging /bin and /usr/bin, there is a good document explaining it here.
/opt vs. /usr/local/bin
/usr/local is for software installed “locally” by the system administrator. What “local” means is not very clear, since the FHS says /usr/local can still be used for software shared between a group of hosts. “Local” in this case means software added by the system administrator on top of whatever software comes as part of the operating system distribution deployed at the site.
/usr/local contains various directories mirroring some of the directories that exist under / and /usr. Locally-installed binaries should go under /usr/local/bin.
On a personal computer, /usr/local is probably a good place to put software you’ve compiled yourself, assuming you want it to be available to all users on your system. If you are the only user of your system, then it doesn’t really matter, and you might as well install your software under your home directory.
/opt serves a similar purpose to /usr/local. It is “reserved for the installation of add-on application software packages.” What makes /opt different from other places you might install software is that software under /opt is supposed to conform to a particular layout.
Elsewhere on a Unix system, the various components of installed software get split up among several different directories. Something installed by your operating system under /usr might have an executable living under /usr/bin, a supporting library living under /usr/lib, and documentation living under /usr/share/man. In the world of /opt, files are instead grouped first by the package name. So a package called my-amazing-software would be installed at /opt/my-amazing-software. The executables that are part of my-amazing-software would be installed at /opt/my-amazing-software/bin, the supporting libraries would be installed at /opt/my-amazing-software/lib, and so on.
Installing software this way has a few advantages. For one, it becomes very easy to uninstall a package—you just have to delete the entire directory. A package like this can also be trivially installed from a zip file or tarball.
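The difference between the two layouts, and the one-step uninstall, can be sketched in Python. The function names are invented for this example, and the demonstration runs against a temporary directory standing in for /opt so that it needs no special privileges.

```python
import shutil
import tempfile
from pathlib import Path


def opt_paths(package, root="/opt"):
    """Per-package layout: everything lives under <root>/<package>."""
    base = Path(root) / package
    return {"bin": base / "bin", "lib": base / "lib"}


def usr_local_paths(root="/usr/local"):
    """Shared layout: every package installs into the same bin and lib."""
    base = Path(root)
    return {"bin": base / "bin", "lib": base / "lib"}


def uninstall_opt_package(package, root="/opt"):
    """Remove a package by deleting its whole directory."""
    shutil.rmtree(Path(root) / package)


# Demonstration with a temporary stand-in for /opt.
with tempfile.TemporaryDirectory() as fake_opt:
    for directory in opt_paths("my-amazing-software", root=fake_opt).values():
        directory.mkdir(parents=True)
    uninstall_opt_package("my-amazing-software", root=fake_opt)
    print(list(Path(fake_opt).iterdir()))  # []
```

Doing the same for the shared layout would require knowing exactly which files a package had placed in each directory, which is the bookkeeping a package manager normally does for you.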
This Stack Overflow commenter suggests that the primary difference between /usr/local and /opt is that the former is for additional software compiled and maintained in-house, while /opt is for additional software created and prepackaged by a third party. In both cases, the software is in addition to whatever software is provided by the operating system distribution.
There are many directories that I have not covered here, including interesting directories important to Linux like /dev or /proc. These directories have a clear and unique purpose, so if you’re curious about what they do, you should take a look at the hier man page (run man hier).
The most confusing part of the standard Unix directory layout is the many directories that seem to do the same thing. On a personal computer, many of the distinctions made between these directories don’t make sense! It’s important to remember that computing happens in other contexts and that personal computing wasn’t even a thing when Unix was first developed. In a context where multiple users share a computing system, there is a meaningful distinction between software installed under /usr/local and under your home directory, while rules governing the usage of /run vs /tmp and so forth become more necessary.