Os Notes
Purushottam Kulkarni
Department of Computer Science
Indian Institute of Technology Bombay
January 1, 2021
Chapter 1
1
Ring for Operating Systems
The world of systems, and more specifically operating systems, is the subject of this course. As
part of the first lecture we will set up the context for this sub-area of Computer Science and Engineering,
understand its need, requirements and services, discuss a few key building blocks in the design of a
modern operating system, and end with a set of examples of the above.
1.1.1 Abstractions
What is a machine without an operating system or any software? Cold stone, a paperweight. The
hardware system (microprocessors, memory, devices and all the paraphernalia) needs to be told
what to do. The interface to program all hardware is the instruction set architecture (ISA), an
interface specified via a set of instructions and a system model (consisting of registers and memory).
Examples include the x86 ISA (a CISC design) and the various RISC ISAs.
While this is the basis, we seem to be happily coding away to glory and developing programs,
applications, spam and what-not at a run-rate of 20 runs per over2 . . . what is the catch?
Is programming hardware with low-level instruction interfaces programmed into our DNA? Well,
obviously not. The catch is the design and engineering practice of abstractions, in fact layers of
abstractions. The idea of an abstraction is to hide details of implementation and to expose an
interface for specific functionality. In fact, the ISA itself is an abstraction. The system software
stack consists of several abstractions, on top of which sit the user and her applications. Some
common examples of abstractions,
1 This chapter title is a rip off of (inspired by) the book, Ring for Jeeves, by P.G. Wodehouse. A highly recommended
read.
2 Cricket is boring when this happens.
1.1. WHAT IS AN OPERATING SYSTEM? Ring for Operating Systems
1.2. OPERATING SYSTEM SERVICES Ring for Operating Systems
ISA interface (the instruction set, registers, memory addressing etc.) provided by the microprocessor
and associated devices.
A modern operating system provides a sizeable set of services and functionality to the higher
layers of the software stack. The primary interface to access operating system functionality
is the application binary interface (ABI). The ABI provides a mechanism for programs to
communicate with and invoke functionality provided by the operating system. The ABI of
UNIX-like operating systems is called the system call interface. A system call invokes a pre-defined
functionality in the operating system and requires a strict syntax and procedure for invocation. For
example, operating systems typically provide system calls for file management (open, close, etc.),
which are different from the C standard library calls (fopen, fclose, etc.). Since the functionality of
the operating system is already compiled into a binary, the interface is called the application binary interface.
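To make the distinction between system calls and library calls concrete, here is a minimal sketch. It uses Python for brevity rather than C, and assumes a UNIX-like system. The functions `os.open`, `os.write` and `os.close` are thin wrappers over the corresponding system calls, while the built-in `open` is a buffered library layer that issues system calls internally. The helper names and the file name are hypothetical.

```python
import os
import tempfile

def write_with_syscalls(path, data):
    """Write bytes using thin wrappers over the open/write/close system calls."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        return os.write(fd, data)   # number of bytes actually written
    finally:
        os.close(fd)

def read_with_library(path):
    """Read the file back through the buffered library interface."""
    with open(path, "rb") as f:     # library call; system calls happen underneath
        return f.read()

if __name__ == "__main__":
    # Hypothetical file name, used only for this demonstration.
    path = os.path.join(tempfile.gettempdir(), "syscall_demo.txt")
    written = write_with_syscalls(path, b"hello, OS\n")
    print(written, read_with_library(path))
    os.remove(path)
```

Both paths end up in the same kernel code; the library version adds buffering and a portable interface on top.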
Referring to Figure 1.1, programs typically need not invoke system calls (and hence OS services)
directly; the calls are routed through standard libraries. For example, libc is the standard C library
that translates calls from user programs and invokes the appropriate system calls. The standard libraries
decouple the program from the underlying OS interface: as long as the program-to-library interface is
standardized, the underlying OS and its interface can change. Standard libraries expose an application
programming interface (API) for programs to integrate and use with their source code. For example,
a typical binary of a C program is generated through a series of steps, one of which involves
linking the application code with libraries that provide the OS- and hardware-specific implementation
of the API. As long as the API is standardized and has a standard library implementation,
programs are portable across OS and hardware instances. For example, the same helloworld.c C
program runs correctly on Linux and Windows.
Hold on, but this requires the programs to be ? . Now you know why.
4. Communication
Similar to input-output, another vital requirement for programs is to communicate with each
other, e.g., here is the result you asked for, or here is the web page you asked for. Programs
may want to communicate with each other on the same machine or on different machines
across the network. An operating system enables all such modes of communication via
shared memory, the network stack, remote calls and other such services.
5. Resource management
This is the bread-and-butter service of an operating system. A system has resources and several
programs want (and compete for) them. Resource demands vary over time and in quantity. The
juggler (actually the scheduler, multiplexer, de-multiplexer, decision maker) who manages these
resources is the operating system. Example management decisions include: which process to
execute next? has this process run long enough? how do I order these requests to read data
off the disk? which process is binging on memory? etc.
6. Error detection
Packets that you receive may have errors, the disk may have failed (bad sectors), memory locations
may be misbehaving (what you write is NOT what you get), a device may malfunction,
disks can crash etc.
“Blistering barnacles (ten thousand or more), who worries about, detects and fixes all this?”
“Ahem! That would be me, the Operating System.”
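The resource management questions listed above ("which process to execute next?", "has this process run long enough?") can be illustrated with a toy round-robin scheduler. This is only a simulation under assumed names (`round_robin`, `quantum`); it is not how any particular operating system implements scheduling.

```python
from collections import deque

def round_robin(burst_times, quantum):
    """Return the sequence of (process, time_run) slices picked by the scheduler.

    burst_times maps a process name to the CPU time it still needs.
    """
    ready = deque(burst_times.items())      # queue of (name, remaining time)
    schedule = []
    while ready:
        name, remaining = ready.popleft()   # which process to execute next?
        run = min(quantum, remaining)       # has this process run long enough?
        schedule.append((name, run))
        if remaining > run:
            # Not finished: go to the back of the queue and wait for another turn.
            ready.append((name, remaining - run))
    return schedule

if __name__ == "__main__":
    print(round_robin({"A": 3, "B": 5}, quantum=2))
```

The quantum is the knob that trades responsiveness against switching overhead: a smaller value shares the CPU more evenly but produces more slices.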
1.3. A COMPUTER ARCHITECTURE INTERLUDE Ring for Operating Systems
controllers3 have compute elements and can execute in parallel, but contend for access to and use of
the system bus. Next, each of these components is described briefly.
1.3.1 Processors
The CPU is characterized by the ISA that it is architected for: it executes instructions of the ISA
and implements features specified by the ISA. The von Neumann architecture (which is widely
adhered to by general-purpose CPUs) specifies a computation model where the processing unit is
separated from the program and associated data. A general purpose compute mechanism interacts
with the memory of a system to fetch and execute instructions and access data in memory and on
external devices. The compute unit itself has no embedded logic of its own. All logic and state of
programs is loaded in memory, and is to be fetched and executed by the processor.
Instruction execution
The basic execution cycle of the CPU consists of the following sequence of operations: fetch, decode
and execute. The CPU contains a special register called the program counter (ip, or eip on
the x86 architecture) that stores the memory address of the next instruction to execute. Once an
instruction is fetched, the program counter is incremented to point to the next instruction, and the
loop continues. Advancements in CPU design have led to parallelizing the fetch, decode and execute
steps via pipelining, a multi-stage execution technique. The idea of pipelining is that
each stage works on a different instruction in parallel: while one instruction is
executed, another is decoded and yet another is fetched from memory. The benefits of pipelining
are reduced by branch instructions, i.e., (conditional) instructions that change the
instruction pointer to something other than the next sequential address. Branch instructions
can force flushing of the instruction pipeline and lose the benefits of parallel execution. An implication of
pipelining for OS design is code optimization: how should machine code be generated to exploit pipelining?
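The fetch-decode-execute loop and the effect of a branch on the program counter can be sketched with a toy machine. The instruction names (`LOAD`, `ADD`, `JNZ`, `HALT`) and the single accumulator register are assumptions made for this illustration; they do not correspond to any real ISA, and pipelining is not modelled.

```python
def run(program, max_steps=1000):
    """Execute a list of (opcode, operand) tuples on a one-register toy machine."""
    pc = 0      # program counter: index of the next instruction to fetch
    acc = 0     # a single accumulator register
    steps = 0
    while pc < len(program) and steps < max_steps:
        opcode, operand = program[pc]   # fetch
        pc += 1                         # default: next sequential instruction
        steps += 1
        if opcode == "LOAD":            # decode and execute
            acc = operand
        elif opcode == "ADD":
            acc += operand
        elif opcode == "JNZ":           # branch: overwrite pc if acc is non-zero
            if acc != 0:
                pc = operand
        elif opcode == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
    return acc, steps

# Count down from 3 to 0; the JNZ at index 2 jumps back to the ADD at index 1.
countdown = [("LOAD", 3), ("ADD", -1), ("JNZ", 1), ("HALT", None)]

if __name__ == "__main__":
    print(run(countdown))
```

Each taken `JNZ` is the kind of non-sequential change to the program counter that disturbs a pipeline in real hardware.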
3 Typically, controllers are examples of specialized hardware running custom software to communicate with and
Registers
Other than the program counter (register), the CPU also contains a set of registers to aid in quick
access and temporary storage of data, for memory management, to control behavior of the CPU, for
indexing, for debugging etc. Details of a few are as follows,
• The general purpose registers are primarily used to store intermediate results, temporary
variables and to exploit the fact that access to registers is orders of magnitude faster than
access to memory. Usually, a CPU will have instructions that copy data across registers, and
from register to a memory location and vice-versa.
• The current state of a CPU is stored in a register called the program status word (PSW), e.g., the
eflags register of the x86 architecture. This register is usually a collection of bit-values, each of
which reflects a condition or configuration of the CPU. For example, the zero flag bit is set to
one if an arithmetic operation by the CPU results in a zero. Similarly, the carry flag, the sign
flag, the overflow flag etc. are set for corresponding side-effects of operations. The interrupt
flag indicates whether the CPU is currently accepting interrupts. Updating some bits of the
status word, such as the interrupt flag, requires privileged access: not all bits can be read or
written from user space, and some can be read but not written.
• Another commonly required register of the CPU is the stack pointer, which points to the
top of the stack in current execution context. The stack pointer is used to store frames of
functions—return addresses, local variables, input arguments etc. during a nested function
calling sequence.
• Control registers control the behavior of the CPU. For example, bit 0 of the CR0 register
enables or disables protected mode of execution on x86 hardware4.
On an x86 CPU the following types of CPU registers exist: general purpose registers, segment
registers, index registers, pointer registers, flags register, control registers, debug registers, test
registers, descriptor table registers, performance monitoring registers etc.
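How the zero, carry, sign and overflow flags mentioned above follow from an arithmetic operation can be shown with a small sketch. It assumes an 8-bit register width for readability (x86 general purpose registers are wider), and the function name `add8` and the dictionary of flags are illustrative choices, not a model of the actual eflags bit layout.

```python
def add8(a, b):
    """Add two 8-bit values; return (result, flags) in the style of a status register."""
    a &= 0xFF
    b &= 0xFF
    full = a + b
    result = full & 0xFF                   # keep only the low 8 bits
    flags = {
        "zero": result == 0,               # result is zero
        "carry": full > 0xFF,              # unsigned overflow out of bit 7
        "sign": bool(result & 0x80),       # bit 7 set: negative in two's complement
        # Signed overflow: both operands share a sign that differs from the result's.
        "overflow": bool(~(a ^ b) & (a ^ result) & 0x80),
    }
    return result, flags

if __name__ == "__main__":
    print(add8(0xFF, 0x01))   # wraps around to zero
    print(add8(0x7F, 0x01))   # largest positive signed value plus one
```

The same addition can set different flags depending on whether the operands are read as signed or unsigned, which is why both carry and overflow exist.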
4 Protected mode of execution is not the same as user-mode vs. kernel-mode execution. We park this point till we
Multiprocessors
Another angle to push the limits of CPU capabilities was the design of multiprocessors and
hyper-threading (multi-threading) systems. With hyper-threading, a single CPU supports multiple
hyper-threads, process equivalents that can be switched on the order of nanoseconds. The
hyper-threads appear to the operating system as CPUs: a physical CPU with 2 cores and 2 hyper-threads
each appears as 4 CPUs to the operating system. With multiprocessor systems, the system has
multiple CPU cores and processes can execute on any of them. As we will see later, CPUs have local
optimizations related to memory etc. and also need to communicate with each other to share events.
An operating system needs to be aware of the impact of its process scheduling mechanism to exploit
the benefits of the parallel compute facility and simultaneously avoid blind spots in a multi-processing
situation. For example, co-scheduling two processes that communicate with each other on two cores
is far better than scheduling them one after the other.
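The count of CPUs as seen by the operating system can be checked with a short sketch. The helper `logical_cpus` is a hypothetical name that just restates the arithmetic from the text; `os.cpu_count()` reports the number of logical CPUs on the machine running the code and may return `None` if it cannot be determined.

```python
import os

def logical_cpus(cores, threads_per_core):
    """Number of CPUs the OS sees for a given core and hyper-thread layout."""
    return cores * threads_per_core

if __name__ == "__main__":
    print(logical_cpus(2, 2))   # the 2-core, 2-hyper-thread example from the text
    print(os.cpu_count())       # logical CPUs on this machine (may be None)
```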
Figure 1.4: The memory hierarchy with access latency and capacity details.
keeping state in memory that is never going to be used! With NVRAMs, an additional dimension
of what data should be persisted adds further complexity.
Other types of memory on a system are ROM (read-only memory) and EEPROM (electrically
erasable programmable ROM). EEPROMs are typically used to store the startup code of the machine,
which initializes the machine, probes and resets devices and looks for the boot-up procedure.
Figure 1.4 shows a list of different types of memory along with their typical access latencies and
capacities.
1.4. HOW DOES THE OS DO WHAT IT DOES? Ring for Operating Systems
For now, the OS trusts no one and requires absolute control of all resources at all times to
correctly multiplex/divide them across different execution entities. This control is established via
two modes of execution—the user mode and the kernel/privileged mode. The game plan is as follows,
• All actions to manipulate and in some cases access operating system state and hardware
resources are privileged and hence can be performed only in the privileged kernel mode. The
kernel mode of execution is the unfettered mode of execution, with access to all resources and
state. The operating system obviously sets itself up to execute in the kernel mode and uses its
ownership on all its state and resources for control.
• User-level programs execute in a less-privileged mode, the user mode of execution. In this
mode the programs are autonomous and execute without interference or arbitration by the
operating system. The caveat is that execution in user mode continues only until a privileged
action is required, in which case execution switches to the kernel mode and the operating system
takes over. The OS then does what needs to be done and reverts to the less privileged
user mode to continue progress of the user program.
1.5. OPERATING SYSTEM TYPES Ring for Operating Systems
Interrupt-driven execution
The second key of the operating system game plan is interrupt-driven execution. As mentioned
earlier (in Section 1.2), interesting stuff happens when programs interact with each other and with
the external world. This is accomplished via the hardware-assisted mechanism of interrupts, which
literally means current execution is interrupted to inform that an event awaits service.
The interrupt service is a tight coupling between the hardware and the operating system. A
hardware interrupt is raised by physically toggling the voltage level on a pin connected to the CPU.
Next, the CPU abandons all work (assuming interrupts are enabled) and switches to servicing
the interrupt. The hardware supports the execution pause and the switch to interrupt processing; the
interrupt handler (also known as the interrupt service routine, ISR) itself is all operating system. The
OS determines the nature of the interrupt and invokes the appropriate handler for further processing.
Typical hardware interrupts are keyboard, network packet arrival, disk block read completion, timer
interrupt, etc. All hardware interrupts are non-deterministic (except maybe the timer interrupt),
i.e., they can occur at any time.
Software interrupts or exceptions, on the other hand, are deterministic: they occur due to explicit
invocation or due to undesired effects during execution. A system call invocation depends on a
software interrupt mechanism.
All modern day hardware and operating systems rely on interrupt-driven execution as a basis
for computation. Also to add, as devices get faster (100 Gbps network cards) the gap between
processing and IO capacity is changing, and on some subsystems (like the network) polling-based IO
handling is fast gaining traction. The idea is not to receive and process an interrupt on each packet
arrival, but to poll for packets/events and consume several in one go. This mode can be treated as
a further building block for modern day efficient operating systems.
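The difference between taking one interrupt per packet and polling for a batch of packets can be sketched as a small simulation. It assumes all packets are already queued and ignores the cost of polls that find nothing; the function names and the `budget` parameter are illustrative, not taken from any real network stack.

```python
def interrupt_driven(packets):
    """One handler invocation per packet arrival."""
    invocations = 0
    processed = []
    for packet in packets:
        invocations += 1            # each arrival interrupts the CPU once
        processed.append(packet)
    return processed, invocations

def polling(packets, budget):
    """Poll the queue and consume up to `budget` packets per poll."""
    invocations = 0
    processed = []
    queue = list(packets)
    while queue:
        invocations += 1            # one poll handles a whole batch
        batch, queue = queue[:budget], queue[budget:]
        processed.extend(batch)
    return processed, invocations

if __name__ == "__main__":
    packets = list(range(10))
    print(interrupt_driven(packets)[1], polling(packets, budget=4)[1])
```

Both variants process the same packets in the same order; only the number of times the handling code is entered changes.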
1.7. FOOD FOR THOUGHT Ring for Operating Systems
10. What happens if the physical memory set up for a system is much smaller than the
addressable range?
Chapter 2
A matter of processes
We begin our discussion of the operating system's underbelly with the most tangible entity of the
software world: a process. Machines exist to serve, by executing stuff (or is it all the Matrix1?).
The execution entity is called a process, also referred to as a thread2, a job, or a task. Here, we
concern ourselves with how an operating system manages processes.
2.1. WHAT IS A PROCESS? Ring for Operating Systems
Figure 2.1: Memory regions and components of a process as shown on a Linux system.
• Processor registers
A process during execution uses the hardware registers available as part of the ISA. Values
stored in these registers are part of the process.
Note that most hardware provides a special instruction pointer register which is saved and
restored during a context switch. The CPU relies on the value in this register for instruction
fetch-decode-execute.
• Stack
All programs rely on some form of function-call abstraction, an abstraction that exposes a
functionality along with an interface to specify input arguments and expect return values. A
function calling sequence requires the return address and values of the local variables to be
stored before execution is transferred to the function being invoked. The stack data structure
is an appropriate fit to store function call state and roll it back in a last-in-first-out manner.
Each process has an associated stack to utilize for this purpose.
• Data section
Typically, programs are not just CPU-centric code; they work with data stored in variables. The
data section of a process is the memory region that stores static variables, i.e., variables declared
at compile time for which memory is pre-allocated. These include global variables and static local variables.
• Heap
Similar to static allocation of memory for variables, processes also depend on dynamic allocation—
a standard design practice to use memory only when needed. All dynamic allocations and
related memory requirements are part of the heap area associated with a process.
The context of a process is its execution state, which is captured by a pointer to the next instruction
for execution, the values of the CPU registers during execution, and memory regions that form the
different components of the process (stack, heap, data, text etc.).
2.2. PROCESS STATES Ring for Operating Systems
Figure 2.2: The different states of execution a process can be in during its lifetime.
Figure 2.1 shows a memory map of a process on a Linux system. The following are to be noted from the
figure: the memory region for each component of the process, and the different components
themselves (the text section, the data section, the heap, the stack and the text sections mapped
from several shared libraries). The format of each entry is as follows:
address range, permissions, offset, device, inode, pathname
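The entry format listed above (address range, permissions, offset, device, inode, pathname) matches the layout of `/proc/<pid>/maps` on Linux, and a single entry can be parsed as sketched here. The `MapEntry` type, the `parse_maps_line` helper and the sample line are all hypothetical; the sample is not taken from Figure 2.1.

```python
from typing import NamedTuple

class MapEntry(NamedTuple):
    start: int
    end: int
    perms: str
    offset: int
    device: str
    inode: int
    pathname: str

def parse_maps_line(line):
    """Parse one 'address range, permissions, offset, device, inode, pathname' entry."""
    fields = line.split(None, 5)    # pathname may be absent or contain spaces
    addr, perms, offset, device, inode = fields[:5]
    pathname = fields[5].strip() if len(fields) > 5 else ""
    start, end = (int(x, 16) for x in addr.split("-"))
    return MapEntry(start, end, perms, int(offset, 16), device, int(inode), pathname)

# Hypothetical sample entry for illustration only.
sample = "00400000-0040b000 r-xp 00000000 08:01 1234 /bin/cat"

if __name__ == "__main__":
    entry = parse_maps_line(sample)
    print(entry.pathname, entry.perms, hex(entry.end - entry.start))
```

On a Linux machine the same function can be applied to each line of `/proc/self/maps` to list the regions of the running process.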
2.3. PROCESS REPRESENTATION Ring for Operating Systems
Once a process finishes its task and exits or hits an error condition, the state of the process is set
to terminated, in which clean up and book-keeping tasks are completed, before the process vanishes
into thin air.
• Identifier
A number and name of the process. This information is often used by user-level tools to
communicate with processes. e.g., kill -9 <pid>.
• State
The state of execution of a process—ready, running, waiting, terminated/halted etc. For
example, a process which has been halted would not be selected by the OS to be scheduled on
the CPU.
• Execution context
During execution, a process occupies the CPU and uses its registers, which define the execution
context of the process. The PCB needs to store this context (registers, program counter,
stack pointer etc.), which needs to be preserved for correct pause-and-resume of processes in a
multi-processing setup.
• Memory layout information
As discussed in Section 2.1, a process consumes memory for different purposes: to load the
executable in memory, to allocate memory for its variables, the stack etc. The OS needs to
maintain state to know the memory regions allocated to each process, and it stores this
memory-related information on a per-process basis in the
PCB. Not only does the OS use this information to identify memory regions, it also stores and uses
information regarding attributes of each region: read only, read/write, execute etc. The story
of memory is much more interesting, virtual memory is coming!
• IO-related information
The primary IO-related information stored as part of the PCB is related to files: the list of
files opened by the process, the offset in each file at which the next file operation will execute,
a pointer to the cache where file content is cached etc. Linux also maintains a pointer to a
list of functions that can override the default file functions on a per-process basis, e.g., a system
programmer with appropriate rights can override the functions associated with file operations,
so that read can be replaced with myread.
• Scheduling and accounting information
This category includes information related to scheduling of processes: the time spent so far
on the CPU, priority of the process, time spent waiting, number of times switched out and
scheduled etc. Further, information related to time spent in user-mode and in kernel-mode for
this process is stored, along with the size of active memory, size of virtual memory and several
such parameters that characterize execution of the process.
3 When you say an OS maintains state, you mean it stores stuff in memory, right? Yes.
2.4. CONTEXT SWITCHING Ring for Operating Systems
• Event information
Events can be queued up for a process when it is not executing, e.g., signals in UNIX-like
operating systems, which are events that user-level processes send to each other. Information
about these events needs to be stored per process, so that the process can address them
as soon as it is scheduled.
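The categories of PCB information listed above can be gathered into a single record, as in this minimal sketch. The field names and the `runnable` helper are assumptions made for illustration; real kernels use far larger structures (on Linux, `struct task_struct`).

```python
from dataclasses import dataclass, field
from enum import Enum

class State(Enum):
    READY = "ready"
    RUNNING = "running"
    WAITING = "waiting"
    TERMINATED = "terminated"

@dataclass
class PCB:
    pid: int                                              # identifier
    name: str
    state: State = State.READY                            # state
    registers: dict = field(default_factory=dict)         # execution context
    program_counter: int = 0
    stack_pointer: int = 0
    memory_regions: list = field(default_factory=list)    # (start, end, perms) tuples
    open_files: dict = field(default_factory=dict)        # descriptor -> file offset
    priority: int = 0                                     # scheduling information
    cpu_time: float = 0.0                                 # accounting information
    pending_signals: list = field(default_factory=list)   # event information

def runnable(pcbs):
    """Only processes in the ready state are candidates for the CPU."""
    return [p for p in pcbs if p.state is State.READY]

if __name__ == "__main__":
    table = [PCB(1, "init"), PCB(2, "shell", state=State.WAITING)]
    print([p.name for p in runnable(table)])
```

Using `default_factory` gives each PCB its own lists and dictionaries, so queuing a signal for one process does not affect another.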
in good time.
2.6. SYSTEM CALLS REVISITED Ring for Operating Systems
once the boot process is complete, the operating system switches5 to execution of the first user-level
process. Once in user-land, Voila! everyone is back in business.
Operating systems typically provide an interface (a system call) that allows a (user) process to
create other user processes. Conceptually, the requirement is that of creating new processes and
configuring each process to execute the desired programs/applications. Each of these processes will
individually depend on OS services, and the purpose of operating systems to serve is served! The
process that creates a new process is called the parent process and the new process is a child process
of the parent. When a new process is created, it can execute in one of two ways:
1. The parent and the child process execute the same program and both execute concurrently.
For example, a web-server process may periodically create new child processes to serve incoming
web requests, while simultaneously serving requests itself.
2. The parent creates a child process, and the child process loads a new program. This is the
modus operandi of operating systems: set up a special first user-level process, which in turn
creates new (child) processes, each of which starts other user-level programs (the login screen,
the graphical user interface, initialization programs, etc.).
Unix-like operating systems provide two system calls (and their associated variants) for this
purpose—fork and exec.
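The second possibility above, a parent that creates a child which then loads a new program, can be sketched with Python's wrappers over these system calls. The sketch assumes a UNIX-like system; the helper name `spawn` and the exit code 127 used when exec fails are choices made for this illustration.

```python
import os
import sys

def spawn(argv):
    """Create a child with fork, load a new program with exec, and wait for it."""
    pid = os.fork()                     # duplicate the calling process
    if pid == 0:
        # Child: replace this process image with the requested program.
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)               # reached only if exec failed
    # Parent: fork returned the child's pid; wait for the child to finish.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1

if __name__ == "__main__":
    # Run a second interpreter as the child program and report its exit code.
    print(spawn([sys.executable, "-c", "raise SystemExit(7)"]))
```

The single call to `os.fork` returns twice, once in each process, and the return value is what tells the parent and the child apart.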
fork
The fork system call is responsible for the following tasks,
1. Duplicate a process by using the process state of the parent/calling process. Creates a new
process-control-block and in effect creates a new
exec
1. a
5 This first switch from the OS to a process is subtle: most switches to user-mode happen when a user-level
program has previously been executing and was paused for a switch to kernel-mode. This first switch is unsolicited!
2.7. PROCESS SCHEDULING Ring for Operating Systems
Further, the parameters required for system calls are passed using the general purpose registers
(eax, ebx, ...). The two examples show invocation of the exit and read system calls. The value in the
eax register is the system call number. This number is used by the generic system call handler of
the operating system to determine which system call is being requested. As Figure 2.3 shows,
the system call number can be used as an index into a list of function pointers to system calls. The
appropriate function is invoked and the kernel then returns to user-mode, with the return value stored
in the eax register.
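The dispatch step described above, where the system call number indexes a list of function pointers, can be sketched as follows. The registers are modelled as a dictionary, the handlers only report what they were asked to do, and the numbering (1 for exit, 3 for read) should be treated as an assumption of this sketch. Returning -1 for an unknown number is also a simplification.

```python
def sys_exit(ebx, ecx, edx):
    """Stand-in for the exit system call; ebx carries the exit status."""
    return ("exit", ebx)

def sys_read(ebx, ecx, edx):
    """Stand-in for the read system call; ebx, ecx, edx carry fd, buffer, count."""
    return ("read", ebx, ecx, edx)

# Index = system call number; None marks numbers this sketch does not implement.
SYSCALL_TABLE = [None, sys_exit, None, sys_read]

def syscall_handler(regs):
    """Generic handler: eax selects the call and receives the return value."""
    number = regs["eax"]
    if not 0 <= number < len(SYSCALL_TABLE) or SYSCALL_TABLE[number] is None:
        regs["eax"] = -1                # unknown system call
        return regs
    handler = SYSCALL_TABLE[number]     # look up the function pointer
    regs["eax"] = handler(regs["ebx"], regs["ecx"], regs["edx"])
    return regs

if __name__ == "__main__":
    print(syscall_handler({"eax": 1, "ebx": 0, "ecx": 0, "edx": 0})["eax"])
```

Adding a system call in this model means adding one function and one table entry; the generic handler itself does not change.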
2.8. FOOD FOR THOUGHT Ring for Operating Systems
Chapter 3
mmm . . . memory