Linux Kernel Source Structure

Exploring The Linux Kernel - Part 3

The Linux kernel was originally created by Linus Torvalds in 1991.
Here is a short post on the history of the Linux kernel.

The Linux kernel is one of the largest open source kernels in existence today. It is constantly evolving with the help of thousands of contributors, ranging from hobbyists giving back to the open source community to developers employed by companies.

Like all open source projects, the Linux kernel source code is available to download in its entirety. It can be found on kernel.org, where you can download it, make changes to it, and experiment with it as you see fit. The source code can also be found on GitHub, although this is simply a mirror; development does not take place through GitHub.

In this post we'll discuss the structure of the Linux kernel source tree. As of the writing of this post, the top level of the source tree is broken up into 22 directories. Not all of them contain source code; many documents and tools are also stored in the source tree.

  • Documentation
  • arch
  • block
  • certs
  • crypto
  • drivers
  • firmware
  • fs
  • include
  • init
  • ipc
  • kernel
  • lib
  • mm
  • net
  • samples
  • scripts
  • security
  • sound
  • tools
  • usr
  • virt

Documentation
The Documentation directory contains, as you'd expect, documentation. In this folder you can find the file 00-INDEX, which contains an alphabetical list of all the documentation in this directory along with a one-line description of what each file documents. Most subdirectories also have their own 00-INDEX. The files cover everything from descriptions of how to build the kernel to instructions for anyone who wants to contribute to kernel development.

arch
The arch directory contains kernel code that is specific to a particular architecture. This directory is further subdivided into directories for each platform, such as arm and x86. The code here is generally non-portable and handles aspects of the kernel such as interrupt handlers and low-level memory access.

block
The block directory contains the generic block layer code. The block layer allows the kernel to use block devices, such as hard disk drives and other storage devices that are read and written in fixed-size blocks. Device-specific block driver code can be found in the drivers directory.

certs
This directory contains the certificate and key files needed for module signature verification. The kernel is not configured to build with module signature verification support by default; however, enabling it is as simple as changing a couple of configuration options, such as CONFIG_MODULE_SIG. You can also use your own key pairs if needed.

crypto
The crypto directory contains the implementation of the Linux kernel cryptographic API. This API provides data transformations such as ciphers and hashes, as well as compression, to the parts of the kernel that need them, such as disk encryption and IPsec networking.

drivers
This directory contains the code for specific device drivers, divided into subdirectories by type of driver. For example, all the Serial Peripheral Interface (SPI) drivers can be found in drivers/spi, which contains individual drivers for specific SPI controllers.

There is another very important directory within drivers called staging (the driver staging tree). It contains drivers that are not yet ready to be merged into the main kernel source tree. They are included anyway because it has historically been difficult to give users access to in-development drivers any other way.

firmware
The firmware directory historically contained device firmware needed to use certain drivers. Most firmware is now maintained separately in the linux-firmware repository, so in the source tree this directory only contains a single makefile used to generate some configuration files.

fs
The fs directory contains all the code related to handling different types of filesystems. This directory is further subdivided into directories for each supported filesystem type. It also contains generic filesystem code shared between the different filesystem implementations, such as file.c, which implements file descriptor management.

include
The include directory contains most of the header files (.h files) needed to build the kernel. Architecture-specific headers are kept under each architecture's directory (for example arch/x86/include/asm), while include/asm-generic holds generic versions used by architectures that don't need their own.

init
The init directory contains the initialization code for the kernel. There are also several miscellaneous files, such as version.c, which defines the kernel's version string.

ipc
The ipc (inter-process communication) directory contains code implementing inter-process communication mechanisms such as semaphores, message queues, and shared memory.

kernel
This directory contains core kernel code that doesn't belong to a more specific subsystem directory, such as process creation and scheduling. This includes printk(), the kernel's counterpart to printf() for writing messages to the kernel log, as well as pid.c, which handles the assignment of PIDs (process identifiers) to processes.

lib
The lib directory contains general-purpose helper routines used throughout the kernel, such as string handling, sorting, and checksum functions. Because the kernel cannot link against the standard C library, it carries its own versions of these utilities here. Architecture-specific versions of some routines live under arch (for example arch/x86/lib).

mm
This directory contains architecture-independent memory management code, implementing things such as virtual memory and kernel memory allocators like kmalloc(). Code for platform-specific memory management operations, such as setting up page tables, can be found in the arch directory (for example arch/x86/mm).

net
The net directory contains the high level networking code. This code receives packets from the network drivers to either use within the kernel or send to userspace applications. This directory is further divided into directories containing different network protocol implementations.

samples
The samples directory contains example kernel modules, along with code illustrating the use of functions from other parts of the kernel. This directory is not needed to compile the Linux kernel.

scripts
This directory contains configuration and build scripts used when building the kernel. Nothing in this directory is compiled into the kernel itself; instead, it holds helper programs and scripts that run during configuration and building, such as the Kconfig tools behind make menuconfig and the checkpatch.pl style checker.

security
This directory contains the implementation of different security models, including Linux Security Modules such as SELinux and AppArmor.

sound
This directory contains sound card drivers and the code for the kernel's sound subsystem (ALSA).

tools
This directory contains tools that are useful for testing and interacting with the kernel, such as a program for testing SPI drivers, as well as programs demonstrating kernel functionality, like leds/uledmon, a userspace program demonstrating LED use. These tools are not built with the kernel by default.

usr
This directory contains code that builds the initial root filesystem (initramfs) used in early userspace.

virt
This directory currently contains code related to KVM (Kernel-based Virtual Machine). KVM allows you to run multiple virtual machines running Linux or Windows, each with its own private virtualized hardware.

This is a blog series on the Linux kernel: why it's important, what it does, and how it does it.
The first post in this series can be found here: Part 1 - The Kernel
The next post in this series can be found here 

Linux Kernel History

Exploring the Linux Kernel - Part 3.5

Early History

The Linux kernel has a long (in computer terms) history. It started more than 25 years ago and is now one of the biggest open source projects that has ever existed.

It all began with a computer science student posting to the comp.os.minix Usenet newsgroup in August 1991.
The following is the post that started it all.

From: torvalds@klaava.Helsinki.FI (Linus Benedict Torvalds)
Newsgroups: comp.os.minix
Subject: What would you like to see most in minix?
Summary: small poll for my new operating system
Message-ID: <1991Aug25.205708.9541@klaava.Helsinki.FI>
Date: 25 Aug 91 20:57:08 GMT
Organization: University of Helsinki 
Hello everybody out there using minix – 
I’m doing a (free) operating system (just a hobby, won’t be big and
professional like gnu) for 386(486) AT clones. This has been brewing
since april, and is starting to get ready. I’d like any feedback on
things people like/dislike in minix, as my OS resembles it somewhat
(same physical layout of the file-system (due to practical reasons)
among other things). 
I’ve currently ported bash(1.08) and gcc(1.40), and things seem to work.
This implies that I’ll get something practical within a few months, and
I’d like to know what features most people would want. Any suggestions
are welcome, but I won’t promise I’ll implement them :-)
Linus (torvalds@kruuna.helsinki.fi) 
PS. Yes – it’s free of any minix code, and it has a multi-threaded fs.
It is NOT protable (uses 386 task switching etc), and it probably never
will support anything other than AT-harddisks, as that’s all I have :-(.

With this post, many people from the MINIX community started getting involved with the project and contributing code.

Only a month later, in September 1991, the first version of Linux, 0.01, was released. It was distributed through an FTP server on the Finnish University and Research Network (FUNET). This first version of Linux had a measly 10,000 lines of code.

Development of the kernel continued at a steady pace, with the next version, 0.02, coming out in October of the same year.

The next milestone release was version 0.11, the first version of Linux that was self-hosted, meaning Linux could be built on a machine running Linux. A self-hosted program is one that can be used to produce new versions of itself; a common example is a compiler that can compile its own source code.

Next was version 0.12, which was not particularly interesting as far as code improvements go; however, it was the first version released under the GNU General Public License. Before this, Linux had been under a license written by Linus himself, which restricted commercial use.

Later, in March 1992, the first version of Linux that could run the X Window System was released. This version was dubbed 0.95 because Linus felt that, with X Window System support complete, Linux was nearly feature complete for version 1.0.

However, version 1.0 would not be released until March 14, 1994. Version 1.0 of Linux contained 176,250 lines of code.

Linux Now

The current version of Linux as of the writing of this article is 4.13.9. The codebase has grown to over 19 million lines of code, and thousands of contributors have helped improve the kernel. It's used in everything from desktop computers and phones to things like cars and calculators.

Random Facts
  • Linux was originally going to be named Freax.
  • The Linux mascot is a penguin named Tux, an idea originating from when Linus mentioned being bitten by a little penguin at the zoo.



Types of Kernels

Exploring The Linux Kernel - Part 2


In the previous post we discussed very generally what a kernel is and what it does. In this post we will discuss different types of kernels and the different aspects of kernel design.

One of the biggest differentiating factors between kernels is how kernel tasks are divided and handled. There are two main classes of kernels, monolithic kernels and microkernels, and their differences come from differing philosophies on "the separation of mechanism and policy".

Here, a policy is what needs to be done for a particular task, and a mechanism is the way to do it. An example of a policy is an operating system deciding to give memory to the most important process first. An example of a mechanism is implementing this policy with a priority queue.

A microkernel applies the separation of mechanism and policy by providing only a few general, simple policies. It then allows users to create systems such as drivers or file system handlers outside of the kernel. In a microkernel these systems are contained within "servers", which use the kernel to communicate with one another and with other software running on the operating system.
An example of the micro Kernel architecture.


On the other hand, a monolithic kernel has a large, rich set of policies and requires that all the software running above the kernel use them. In this design, the kernel contains everything required to perform every kernel-related task, including components such as device drivers, the scheduler, memory management, file systems, and network stacks.


Some examples of monolithic kernels and microkernels:

Monolithic kernels
  • BSD
  • Linux
  • Solaris

Microkernels
  • MINIX
  • GNU Hurd
  • Hydra


While monolithic versus microkernel design is one of the biggest differentiators, there are many other attributes that define how a kernel behaves. Some kernels have integrated security features like firewalls. Some support virtualization, where the system takes a resource that can't be shared, such as memory, and makes virtual copies of it so that different parts of the system don't know they are sharing the same resource (more on this later).

A great resource comparing different types of kernels can be found on Wikipedia here.

Now that we know some basic information about what a kernel is and what defines a particular kernel, we will look at the Linux kernel in particular and examine what makes it so great.


The next post in this series can be found here Part 3 - The Linux Kernel Source Tree

The Kernel

Exploring The Linux Kernel - Part 1


This blog series is meant to give a general overview of the Linux kernel. Posts will range from general information, like "How big is the Linux kernel?", to more technical topics such as file system architecture. But before we get into the internals of the Linux kernel: what is a kernel, and what does it do exactly?

In general, a kernel is a piece of low-level software that controls basic aspects of the system. There are several main responsibilities found in almost all kernels.
These include:
  • Managing memory, including volatile memory such as RAM and non-volatile storage such as hard disk drives or SSDs
  • Managing requests for resources or functionality from higher levels of the operating system (also called system calls)
  • Accessing peripheral devices, such as USB, HDMI, and audio devices, and allowing the rest of the operating system to use them


The kernel is different from an operating system. An operating system is generally built on top of a kernel, but sometimes the lines are blurry. For example, the Linux distribution Ubuntu is built on top of the Linux kernel: the kernel handles interaction with the hardware, while the rest of the operating system handles components such as the desktop and its icons. The kernel is what allows Ubuntu to look and feel the same to both users and programmers regardless of whether it's running on a 64-bit x86 machine or a 32-bit ARM-based machine.

The kernel is important because it provides a layer of abstraction between programmers and the hardware they are programming on. When someone writes software for a particular operating system, it's important that the experience is identical, or very similar, regardless of the hardware it runs on. This makes code more portable and reusable.

There are many examples of kernels because nearly every operating system has one.  Some operating systems and their respective kernels are...
These kernels and their operating systems may seem very different from one another, yet most of them have similar features that define the kind of kernel they are. In the next blog post we'll talk about different types of kernels and the advantages of each one.



The next post in this series can be found here Part 2 - Types of Kernels