The Elements of a File System

By definition, a file system needs to store files, and it also contains directories. The files are stored within the directories, and these directories can have subdirectories. Something, somewhere, has to record where all the files are located within the file system, what they’re called, which accounts they belong to, which permissions they have, and much more. This information is called metadata because it’s data that describes other data.

In the Linux ext4 file system, the inode and directory structures work together to provide an underpinning framework that stores all the metadata for every file and directory. They make the metadata available to anyone who requires it, whether that’s the kernel, user applications, or Linux utilities, such as ls, stat, and df.

Inodes and File System Size

While it’s true there’s a pair of structures, a file system requires many more than that. There are thousands and thousands of each structure. Every file and directory requires an inode, and because every file is in a directory, every file also requires a directory structure. Directory structures are also called directory entries, or “dentries.”

Each inode has an inode number, which is unique within a file system. The same inode number might appear in more than one file system. However, the file system ID and inode number combine to make a unique identifier, regardless of how many file systems are mounted on your Linux system.
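To see the two halves of that identifier for yourself, here is a minimal sketch. It assumes GNU stat and treats the device number that stat reports as the stand-in for the "file system ID"; the scratch file name is hypothetical.

```shell
# Print "device:inode" for a file. The inode number is only unique within
# one file system, so the device number is needed to tell file systems apart.
scratch=$(mktemp -d)
touch "$scratch/example.txt"

# %d = device number of the containing file system, %i = inode number
file_id=$(stat -c '%d:%i' "$scratch/example.txt")
echo "device:inode = $file_id"

rm -rf "$scratch"
```

Two files on different file systems can share the second number, but not both numbers at once.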

Remember, in Linux, you don’t mount a hard drive or partition. You mount the file system that’s on the partition, so it’s easy to have multiple file systems without realizing it. If you have multiple hard drives or partitions on a single drive, you have more than one file system. They might be the same type—all ext4, for example—but they’ll still be distinct file systems.

Conceptually, all inodes are held in one table (on disk, ext4 divides this table among its block groups). Using an inode number, the file system easily calculates the offset into the inode table at which that inode is located. You can see why the “i” in inode stands for index.

The variable that contains the inode number is declared in the source code as a 32-bit, unsigned long integer. This means the inode number is an integer value with a maximum of 2^32 − 1, which calculates out to 4,294,967,295, or well over 4 billion inodes.

That’s the theoretical maximum. In practice, the number of inodes in an ext4 file system is determined when the file system is created at a default ratio of one inode per 16 KB of file system capacity. Directory structures are created on the fly when the file system is in use, as files and directories are created within the file system.
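Both figures can be checked with a little shell arithmetic. This is only a sketch: the 100 GiB capacity is a hypothetical example, and the tool that creates the file system may round the result to fit its on-disk layout, so treat the second number as an estimate.

```shell
# Largest value a 32-bit unsigned integer can hold: 2^32 - 1
max_inode_number=$(( (1 << 32) - 1 ))
echo "$max_inode_number"    # prints 4294967295

# Estimated inode count for a hypothetical 100 GiB file system at the
# default ratio of one inode per 16 KB (16384 bytes) of capacity
capacity_bytes=$(( 100 * 1024 * 1024 * 1024 ))
bytes_per_inode=16384
inode_count=$(( capacity_bytes / bytes_per_inode ))
echo "$inode_count"         # prints 6553600
```

So a typical file system is created with a few million inodes, far below the theoretical ceiling.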

There’s a command you can use to see how many inodes are in a file system on your computer. The -i (inodes) option of the df command instructs it to display its output in numbers of inodes.

We’re going to look at the file system on the first partition on the first hard drive, so we type the following:
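The exact command depends on your device names. As a minimal sketch, assuming you want the file system that holds the current directory rather than a named partition such as /dev/sda1 (a hypothetical device name), it could look like this:

```shell
# Report inode usage for the file system that holds the current directory.
# LC_ALL=C keeps the column headings in English.
inode_report=$(LC_ALL=C df -i .)
echo "$inode_report"
```

Replace the dot with a device or mount point to report on a different file system.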

The output gives us:

File system: The file system being reported on.
Inodes: The total number of inodes in this file system.
IUsed: The number of inodes in use.
IFree: The number of remaining inodes available for use.
IUse%: The percentage of used inodes.
Mounted on: The mount point for this file system.

We’ve used 10 percent of the inodes in this file system. Files are stored on the hard drive in disk blocks. Each inode points to the disk blocks that store the contents of the file it represents. If you have millions of tiny files, you can run out of inodes before you run out of hard drive space. However, that’s a very difficult problem to run into.

In the past, some mail servers that stored email messages as discrete files (which rapidly led to large collections of small files) had this issue. The problem went away when those applications changed their back ends to databases. The average home system won’t run out of inodes, which is just as well because, with the ext4 file system, you can’t add more inodes without re-creating the file system.

To see the size of the disk blocks on your file system, you can use the blockdev command with the --getbsz (get block size) option:

The block size is 4096 bytes.
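The blockdev command usually needs root privileges and a device name. As an unprivileged cross-check, here is a sketch that swaps in GNU stat with its -f (file system) option instead. It reports on whichever file system holds the current directory, so the value you see may differ from the one above.

```shell
# %S is the fundamental block size of the file system that holds "."
block_size=$(stat -f -c '%S' .)
echo "Block size: $block_size bytes"
```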

Let’s use the -B (block size) option to specify a block size of 4096 bytes and check the regular disk usage:
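A minimal sketch of that command follows. It assumes the file system of interest is the one holding the current directory; substitute a device name or mount point as needed.

```shell
# Show disk usage counted in 4096-byte blocks.
# LC_ALL=C keeps the column headings in English.
block_report=$(LC_ALL=C df -B 4096 .)
echo "$block_report"
```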

This output shows us:

File system: The file system on which we’re reporting.
4K-blocks: The total number of 4 KB blocks in this file system.
Used: How many 4K blocks are in use.
Available: The number of remaining 4 KB blocks that are available for use.
Use%: The percentage of 4 KB blocks that have been used.
Mounted on: The mount point for this file system.

In our example, file storage (and storage of the inodes and directory structures) has used 28 percent of the space on this file system, at the cost of 10 percent of the inodes, so we’re in good shape.

Inode Metadata

To see the inode number of a file, we can use ls with the -i (inode) option:
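Here is a short sketch you can run anywhere. The scratch file is hypothetical, and the inode number you get will differ from the one discussed in this article. It also confirms that ls and stat agree on the number.

```shell
scratch=$(mktemp -d)
echo "hello" > "$scratch/example.txt"

# ls -i prints the inode number in front of the file name
ls_inode=$(ls -i "$scratch/example.txt" | awk '{print $1}')

# stat reports the same number through its %i format (GNU stat)
stat_inode=$(stat -c '%i' "$scratch/example.txt")

echo "ls -i says $ls_inode, stat says $stat_inode"
rm -rf "$scratch"
```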

The inode number for this file is 1441801, so this inode holds the metadata for this file and, traditionally, the pointers to the disk blocks where the file resides on the hard drive. If the file is fragmented, very large, or both, some of the blocks the inode points to might hold further pointers to other disk blocks. And some of those other disk blocks might also hold pointers to another set of disk blocks. This overcomes the problem of the inode being a fixed size and able to hold a finite number of pointers to disk blocks.

That method was superseded by a new scheme that makes use of “extents.” Each extent records the starting block and the length of a run of contiguous blocks used to store the file. If the file is unfragmented, a single extent is enough. If the file is fragmented, you need one extent for each part of the file. This is more efficient than storing a pointer to every individual block.

If you want to see whether your file system uses disk block pointers or extents, you can look inside an inode. To do so, we’ll use the debugfs command with the -R (request) option, and pass it the inode of the file of interest. This asks debugfs to use its internal “stat” command to display the contents of the inode. Because inode numbers are only unique within a file system, we must also tell debugfs the file system on which the inode resides.

Here’s what this example command would look like:

As shown below, the debugfs command extracts the information from the inode and presents it to us in less:

We’re shown the following information:

Inode: The number of the inode we’re looking at.
Type: This is a regular file, not a directory or symbolic link.
Mode: The file permissions in octal.
Flags: Indicators that represent different features or functionality. The 0x80000 is the “extents” flag (more on this below).
Generation: A Network File System (NFS) uses this when someone accesses remote file systems over a network connection as though they were mounted on the local machine. The inode and generation numbers are used as a form of file handle.
Version: The inode version.
User: The owner of the file.
Group: The group owner of the file.
Project: The project ID used for project quotas. It’s zero unless one has been assigned.
Size: The size of the file.
File ACL: The file access control list. These were designed to allow you to give controlled access to people who aren’t in the owner group.
Links: The number of hard links to the file.
Blockcount: The amount of hard drive space allocated to this file, given in 512-byte chunks. Our file has been allocated eight of these, which is 4,096 bytes. So, our 98-byte file sits within a single 4,096-byte disk block.
Fragment: This file is not fragmented. (This is an obsolete flag.)
Ctime: The time at which the inode was last changed, for example by a change of permissions or ownership.
Atime: The time at which this file was last accessed.
Mtime: The time at which this file was last modified.
Crtime: The time at which the file was created.
Size of extra inode fields: The ext4 file system introduced the ability to allocate a larger on-disk inode at format time. This value is the number of extra bytes the inode is using. This extra space can also be used to accommodate future requirements for new kernels or to store extended attributes.
Inode checksum: A checksum for this inode, which makes it possible to detect if the inode is corrupted.
Extents: If extents are being used (on ext4, they are, by default), the metadata regarding the disk block usage of files has two numbers that indicate the start and end blocks of each portion of a fragmented file. This is more efficient than storing every disk block taken up by each portion of a file. We have one extent because our small file sits in one disk block at this block offset.
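The Blockcount arithmetic can be reproduced without debugfs. This sketch creates a hypothetical 98-byte file and multiplies the allocated unit count by the unit size that GNU stat reports. The allocation you see depends on the file system in use, so it may not be exactly 4,096 bytes.

```shell
scratch=$(mktemp -d)
head -c 98 /dev/zero > "$scratch/small.bin"    # a 98-byte test file

file_size=$(stat -c '%s' "$scratch/small.bin")    # size in bytes
block_count=$(stat -c '%b' "$scratch/small.bin")  # number of allocated units
unit_size=$(stat -c '%B' "$scratch/small.bin")    # bytes per unit reported by %b

allocated_bytes=$(( block_count * unit_size ))
echo "File size: $file_size bytes, space allocated: $allocated_bytes bytes"

rm -rf "$scratch"
```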

Where’s the File Name?

We now have a lot of information about the file, but, as you might have noticed, we didn’t get the file name. This is where the directory structure comes into play. In Linux, just like a file, a directory has an inode. Rather than pointing to disk blocks that contain file data, though, a directory inode points to disk blocks that contain directory structures.

Compared to an inode, a directory structure contains a limited amount of information about a file. It only holds the file’s inode number, name, and the length of the name.

The inode and the directory structure contain everything you (or an application) need to know about a file or directory. The directory structure is in a directory disk block, so we know the directory the file is in. The directory structure gives us the file name and inode number. The inode tells us everything else about the file, including timestamps, permissions, and where to find the file data in the file system.

Directory Inodes

You can see the inode number of a directory just as easily as you can for a file.

In the following example, we’ll use ls with the -l (long format), -i (inode), and -d (directory) options, and look at the work directory:

Because we used the -d (directory) option, ls reports on the directory itself, not its contents. The inode for this directory is 1443016.

To repeat that for the home directory, we type the following:

The inode for the home directory is 1447510, and the work directory is in the home directory. Now, let’s look at the contents of the work directory. Instead of the -d (directory) option, we’ll use the -a (all) option. This will show us the directory entries that are usually hidden.

We type the following:

Because we used the -a (all) option, the single- (.) and double-dot (..) entries are displayed. These entries represent the directory itself (single-dot), and its parent directory (double-dot).

If you look at the inode number for the single-dot entry, you’ll see that it’s 1443016, the same inode number we got when we discovered the inode number for the work directory. Also, the inode number for the double-dot entry is the same as the inode number for the home directory.
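The same relationship can be checked in a scratch directory. This is a sketch with a hypothetical work directory, so the inode numbers will differ from those quoted in this article.

```shell
scratch=$(mktemp -d)
mkdir "$scratch/work"

dir_inode=$(stat -c '%i' "$scratch/work")        # the directory itself
dot_inode=$(stat -c '%i' "$scratch/work/.")      # its single-dot entry
parent_inode=$(stat -c '%i' "$scratch")          # the parent directory
dotdot_inode=$(stat -c '%i' "$scratch/work/..")  # the double-dot entry

echo "work: $dir_inode  work/.: $dot_inode"
echo "parent: $parent_inode  work/..: $dotdot_inode"

rm -rf "$scratch"
```

Each pair of numbers matches, because the dot entries are just additional names for inodes that already exist.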

That’s why you can use the cd .. command to move up a level in the directory tree. Likewise, when you precede an application or script name with ./, you let the shell know from where to launch the application or script.

As we’ve covered, three components are required to have a well-formed and accessible file in the file system: the file, the directory structure, and the inode. The file is the data stored on the hard drive, the directory structure contains the name of the file and its inode number, and the inode contains all the metadata for the file.

Symbolic links are file system entries that look like files, but they’re really shortcuts that point to an existing file or directory. Let’s see how they manage this, and how the three elements are used to achieve this.

Let’s say we’ve got a directory with two files in it: one is a script, and the other is an application, as shown below.

We can use the ln command and the -s (symbolic) option to create a soft link to the script file, like so:

We’ve created a link to my_script.sh called geek.sh. We can type the following and use ls to look at the two script files:

ls -li *.sh

The entry for geek.sh appears in blue. The first character of the permissions flags is an “l” for link, and the -> points to my_script.sh. All of this indicates that geek.sh is a link.

As you probably expect, the two script files have different inode numbers. What might be more surprising, though, is the soft link, geek.sh, doesn’t have the same user permissions as the original script file. In fact, the permissions for geek.sh are much more liberal—all users have full permissions.

The directory structure for geek.sh contains the name of the link and its inode. When you try to use the link, its inode is referenced, just like a regular file. The link inode will point to a disk block, but instead of containing file content data, the disk block contains the name of the original file (ext4 can store a short target name inside the inode itself, but the principle is the same). The file system redirects to the original file.

We’ll delete the original file, and see what happens when we type the following to view the contents of geek.sh:

The symbolic link is broken, and the redirect fails.
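The whole symbolic link life cycle can be reproduced in a scratch directory. This sketch reuses the file names from the example, but the script contents are hypothetical. Note that the size of the link equals the length of the name it stores.

```shell
scratch=$(mktemp -d)
printf '#!/bin/sh\necho "hello"\n' > "$scratch/my_script.sh"

# Create the soft link. The target is stored as text and resolved
# relative to the directory that holds the link.
ln -s my_script.sh "$scratch/geek.sh"

link_target=$(readlink "$scratch/geek.sh")            # the stored name
link_size=$(stat -c '%s' "$scratch/geek.sh")          # stat does not follow the link
original_inode=$(stat -c '%i' "$scratch/my_script.sh")
link_inode=$(stat -c '%i' "$scratch/geek.sh")
echo "target=$link_target size=$link_size inodes=$original_inode/$link_inode"

# Delete the original file. The link still holds the name, but nothing answers to it.
rm "$scratch/my_script.sh"
if cat "$scratch/geek.sh" >/dev/null 2>&1; then
    link_state="working"
else
    link_state="broken"
fi
echo "link is $link_state"

rm -rf "$scratch"
```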

We now type the following to create a hard link to the application file:

To look at the inodes for these two files, we type the following:

Both look like regular files. Nothing about geek-app indicates it’s a link in the way the ls listing for geek.sh did. Plus, geek-app has the same user permissions as the original file. However, what might be surprising is both applications have the same inode number: 1441797.

The directory entry for geek-app contains the name “geek-app” and an inode number, but it’s the same as the inode number of the original file. So, we have two file system entries with different names that both point to the same inode. In fact, any number of items can point to the same inode.

We’ll type the following and use the stat program to look at the target file:

We see that two hard links point to this file. This is stored in the inode.
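Here is a sketch of the same experiment in a scratch directory. The name original-app and its contents are hypothetical stand-ins for the application file; geek-app is the link name used in this example.

```shell
scratch=$(mktemp -d)
echo "payload" > "$scratch/original-app"

# Create a hard link: a second directory entry for the same inode
ln "$scratch/original-app" "$scratch/geek-app"

original_inode=$(stat -c '%i' "$scratch/original-app")
hardlink_inode=$(stat -c '%i' "$scratch/geek-app")
link_count=$(stat -c '%h' "$scratch/original-app")   # %h = number of hard links

echo "inodes: $original_inode and $hardlink_inode, link count: $link_count"
rm -rf "$scratch"
```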

In the following example, we delete the original file and try to use the link with a secret, secure password:

Surprisingly, the application runs as expected, but how? To see why, consider what normally happens when you delete a file: the directory structure is marked as having an inode number of zero, the inode is free to be reused, and the disk blocks are then available for another file to be stored in that space.

If the number of hard links to the inode is greater than one, however, the hard link count is reduced by one, and the inode number of the directory structure of the deleted file is set to zero. The file contents on the hard drive and inode are still available to the existing hard links.

We’ll type the following and use stat once more—this time on geek-app:

These details are pulled from the same inode (1441797) as the previous stat command. The link count was reduced by one.
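The effect of deleting one name can be sketched the same way. As before, original-app and its contents are hypothetical.

```shell
scratch=$(mktemp -d)
echo "payload" > "$scratch/original-app"
ln "$scratch/original-app" "$scratch/geek-app"

inode_before=$(stat -c '%i' "$scratch/geek-app")
links_before=$(stat -c '%h' "$scratch/geek-app")

# Remove the original name. Only that directory entry goes away.
rm "$scratch/original-app"

inode_after=$(stat -c '%i' "$scratch/geek-app")
links_after=$(stat -c '%h' "$scratch/geek-app")
surviving_content=$(cat "$scratch/geek-app")

echo "links: $links_before -> $links_after, inode unchanged: $inode_before = $inode_after"
echo "content still readable: $surviving_content"
rm -rf "$scratch"
```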

Because we’re down to one hard link to this inode, deleting geek-app would truly delete the file. The file system would free up the inode and mark the directory structure with an inode of zero. A new file could then overwrite the data storage on the hard drive.


Inode Overheads

It’s a neat system, but there are overheads. To read a file, the file system has to do all the following:

Find the right directory structure
Read the inode number
Find the right inode
Read the inode information
Follow either the inode links or the extents to the relevant disk blocks
Read the file data
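Those steps can be loosely mirrored from the command line. This is only a user-space analogy with a hypothetical file: each command below triggers its own lookup inside the kernel, so it illustrates the order of the steps rather than how the file system implements them.

```shell
scratch=$(mktemp -d)
echo "file data" > "$scratch/notes.txt"

# Directory structure: find the entry by name and read its inode number
inode_number=$(ls -i "$scratch" | awk '$2 == "notes.txt" {print $1}')

# Inode: read the metadata it holds (permissions, link count, size)
metadata=$(stat -c 'mode=%a links=%h size=%s' "$scratch/notes.txt")

# Disk blocks: read the file data itself
contents=$(cat "$scratch/notes.txt")

echo "inode $inode_number: $metadata"
echo "contents: $contents"
rm -rf "$scratch"
```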

A bit more jumping around is necessary if the data is noncontiguous.

Imagine the work that has to be done for ls to perform a long format file listing of many files. There’s a lot of back and forth just for ls to get the information it needs to generate its output.

Of course, speeding up file system access is why Linux tries to do as much preemptive file caching as possible. This helps greatly, but sometimes—as with any file system—the overheads can become apparent.

Now you’ll know why.