Linux Kernel Memory Addressing – kernel 2.6.x

Linux’s efficient and robust memory management is a cornerstone of its success. While users rarely think about it, a lot of complex work happens behind the scenes to ensure that processes get the memory they need and don’t interfere with each other. This blog post explores how Linux leverages the x86 architecture’s memory addressing capabilities, focusing on logical, linear, and physical addresses, segmentation, and paging.

The Foundation: Logical, Linear, and Physical Addresses

At its core, memory addressing involves translating a logical address into a physical address. The 80×86 architecture, particularly in protected mode, introduces a crucial intermediary: the linear address.

Think of it as a three-step process:

Logical Address: This is the address used in machine-language instructions to specify the location of an operand or an instruction. On the 80×86 it consists of a segment selector and an offset, which specifies the distance from the start of the segment.
Segmentation: The segmentation unit translates the logical address into a linear address, a 32-bit unsigned integer that can address up to 4 GB. The segment selector identifies a segment descriptor that supplies the segment’s base address, and the offset is added to that base. However, Linux uses segmentation only in a very limited way, because of its complexity and limitations.
Paging: The paging unit then translates the linear address into a physical address. This involves page tables, walked by the Memory Management Unit (MMU), that map linear addresses to physical memory locations.

Why Segmentation Isn’t the Star in Linux

Historically, segmentation was integral to the 80×86 architecture. However, Linux prefers paging for several reasons:

Simplicity: Memory management is simpler when all processes use the same segment register values, that is, when they share the same set of linear addresses. Paging can then map those same linear addresses to different physical addresses for each process.
Portability: RISC architectures often have limited support for segmentation, so relying on paging keeps Linux portable across a wide range of architectures.

Intro Segment Descriptors

Segment descriptors are 8-byte data structures that describe the characteristics of a segment in memory. Each segment (code, data, stack, etc.) has a corresponding segment descriptor. Segment descriptors are stored either in the Global Descriptor Table (GDT) or in a Local Descriptor Table (LDT).

Segmentation Unit

The segmentation unit is the component of the 80×86 processor responsible for translating a logical address into a linear address.

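It performs the translation in three steps:

Examines the TI (Table Indicator) field of the Segment Selector: This field determines whether the Segment Descriptor is stored in the GDT or in the LDT, and therefore whether the table’s base address comes from the gdtr or the ldtr register.
Computes the address of the Segment Descriptor: It multiplies the 13-bit index field of the Segment Selector by 8 (the size of a Segment Descriptor) and adds the result to the table’s base address.
Adds the offset to the Base field: Finally, it adds the offset of the logical address to the Base field of the Segment Descriptor to obtain the linear address.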

Global Descriptor Table (GDT):

Purpose: The GDT is the central table containing the segment descriptors that define the memory segments accessible by the system. A uniprocessor system has a single GDT, while a multiprocessor system has one GDT per CPU.

Storage: All GDTs are stored in the cpu_gdt_table array, while their addresses and sizes (used to initialize the gdtr registers) are stored in the cpu_gdt_descr array.

ENTRY(cpu_gdt_table)
	.quad 0x0000000000000000	/* NULL descriptor */
	.quad 0x0000000000000000	/* 0x0b reserved */
	.quad 0x0000000000000000	/* 0x13 reserved */
	.quad 0x0000000000000000	/* 0x1b reserved */
	.quad 0x0000000000000000	/* 0x20 unused */
	.quad 0x0000000000000000	/* 0x28 unused */
	.quad 0x0000000000000000	/* 0x33 TLS entry 1 */
	.quad 0x0000000000000000	/* 0x3b TLS entry 2 */
	.quad 0x0000000000000000	/* 0x43 TLS entry 3 */
	.quad 0x0000000000000000	/* 0x4b reserved */
	.quad 0x0000000000000000	/* 0x53 reserved */
	.quad 0x0000000000000000	/* 0x5b reserved */

	.quad 0x00cf9a000000ffff	/* 0x60 kernel 4GB code at 0x00000000 */
	.quad 0x00cf92000000ffff	/* 0x68 kernel 4GB data at 0x00000000 */
	.quad 0x00cffa000000ffff	/* 0x73 user 4GB code at 0x00000000 */
	.quad 0x00cff2000000ffff	/* 0x7b user 4GB data at 0x00000000 */

	.quad 0x0000000000000000	/* 0x80 TSS descriptor */
	.quad 0x0000000000000000	/* 0x88 LDT descriptor */

	/* Segments used for calling PnP BIOS */
	.quad 0x00c09a0000000000	/* 0x90 32-bit code */
	.quad 0x00809a0000000000	/* 0x98 16-bit code */
	.quad 0x0080920000000000	/* 0xa0 16-bit data */
	.quad 0x0080920000000000	/* 0xa8 16-bit data */
	.quad 0x0080920000000000	/* 0xb0 16-bit data */
	/*
	 * The APM segments have byte granularity and their bases
	 * and limits are set at run time.
	 */
	.quad 0x00409a0000000000	/* 0xb8 APM CS    code */
	.quad 0x00009a0000000000	/* 0xc0 APM CS 16 code (16 bit) */
	.quad 0x0040920000000000	/* 0xc8 APM DS    data */

	.quad 0x0000000000000000	/* 0xd0 - unused */
	.quad 0x0000000000000000	/* 0xd8 - unused */
	.quad 0x0000000000000000	/* 0xe0 - unused */
	.quad 0x0000000000000000	/* 0xe8 - unused */
	.quad 0x0000000000000000	/* 0xf0 - unused */
	.quad 0x0000000000000000	/* 0xf8 - GDT entry 31: double-fault TSS */

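Layout: The GDT has 32 entries: 18 segment descriptors and 14 null, unused, or reserved entries. The unused entries are inserted on purpose, so that segment descriptors usually accessed together are kept in the same 32-byte line of the hardware cache.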

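Segments Defined:

Four user and kernel code and data segments
A Task State Segment (TSS), different for each processor
A segment containing the default LDT, usually shared by all processes
Three Thread-Local Storage (TLS) segments
Three segments related to Advanced Power Management (APM)
Five segments related to Plug and Play (PnP) BIOS services
A special TSS segment used by the kernel to handle "Double fault" exceptions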

Access: The CPU uses the gdtr register to locate the GDT.

Local Descriptor Table (LDT)

Purpose: An LDT is a per-process table of segment descriptors that allows a process to define its own custom segments. Most Linux User Mode applications don’t use an LDT, so the kernel defines a default LDT shared by most processes.

Custom LDTs: Processes can create their own custom LDTs using the modify_ldt() system call.

GDT update: When a process creates its own LDT, the LDT entry in the GDT of the CPU running it is updated accordingly.

Storage: The address and size of the LDT currently in use are held in the ldtr register.

Intro Memory Paging

Core Concepts: Paging and Page Tables

Paging: Linux uses paging to manage memory efficiently. Instead of handling memory as contiguous blocks, it groups linear addresses into fixed-size intervals called pages and partitions RAM into Page Frames of the same size (typically 4 KB). Processes don’t access physical memory locations directly; they use linear addresses, and paging translates these linear addresses into physical addresses.

Page Tables: The translation from linear to physical addresses is done through Page Tables, data structures stored in RAM that map linear addresses to physical page frames.

Regular Paging

The 32 bits of a linear address are divided into three fields:

Directory (10 bits)
Table (10 bits)
Offset (12 bits)

The control register cr3 stores the physical address of the Page Directory in use. The Directory field of the linear address selects an entry in that Page Directory, which points to the proper Page Table. The Table field then selects an entry in the Page Table, which holds the physical address of the page frame. Finally, the Offset field gives the position of the byte within that 4 KB page frame.

Paging In Linux

Linux adopts a common paging model that fits both 32-bit and 64-bit architectures.

The structure of Page Tables:

Page Global Directory (PGD): This is the top-level table; it contains pointers (addresses) to Page Upper Directories. Think of it as the “root” of the paging tree. Each process has its own PGD, and the kernel ensures that the right one is loaded whenever it switches processes: the cr3 control register holds the physical address of the current process’s PGD.
Page Upper Directory (PUD): Not always present; on 32-bit systems it is folded away. It contains pointers to Page Middle Directories.
Page Middle Directory (PMD): It contains pointers to Page Tables.
Page Table (PT): This is the bottom level. Each entry in a Page Table points to a specific page frame in physical memory.

Process Page Table (PPT)

These are the sets of page tables maintained for each process running on the system. They map the process’s linear addresses to physical memory addresses.

In User Mode, a process can address only linear addresses below 0xc0000000 (0x00000000 to 0xbfffffff).

In Kernel Mode, it can also address linear addresses greater than or equal to 0xc0000000, where the kernel is mapped.

Kernel Page Table (KPT)

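The kernel maintains its own set of page tables, rooted at a master kernel Page Global Directory. After system initialization, these tables are never used directly by any process or kernel thread; instead, the highest entries of the master kernel PGD serve as the model for the corresponding entries in every regular process’s PGD.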

The Key Differences & Relationship (PPT and KPT)

Process-specific vs. kernel-wide: Process page tables are unique to each process, while the kernel page tables are a central, shared resource.
Template/Reference: The kernel page tables act as a template for the kernel portion of each process’s page tables.
Dynamic updates: When the kernel changes its own mappings (for example, in the vmalloc area), it updates the master kernel page tables; the change is propagated to a process’s page tables lazily, when a page fault reveals the missing entry.

Summary

Core Concept: The Linux kernel uses a hierarchical page table system to translate the linear (virtual) addresses used by processes into physical addresses in RAM. This enables memory protection, virtual memory, and efficient memory management.

Page Global Directory (PGD)
The top-level structure in the paging hierarchy. It contains entries that point to Page Upper Directories (PUDs)
Each process has its own PGD; the kernel has a master PGD
- pgd_index(addr) Macro: Calculates the index of the PGD entry for a given linear address
- pgd_offset(mm, addr) Macro: Yields the linear address of the PGD entry for the address addr, using the PGD referenced by the memory descriptor mm
- pgd_offset_k(address) Macro: Yields the linear address of the entry for address in the master kernel PGD
- pgd_page(pgd): Gets the address of the page frame containing the PUD referenced by the PGD entry pgd
Page Upper Directory (PUD)
Points to Page Middle Directories (PMDs)
Each PGD entry points to a PUD
- pud_offset(pgd, addr): Yields the linear address of the PUD entry for a given address
- pud_page(pud): Gets the address of the page frame containing the PMD referenced by the PUD entry pud
Page Middle Directory (PMD)
Points to Page Tables (PTs)
Each PUD entry points to a PMD
- pmd_index(addr): Calculates the index of the PMD entry for a given address
- pmd_offset(pud, addr): Calculates the linear address of the PMD entry for a given address
- pmd_page(pmd): Gets the address of the page descriptor of the Page Table referenced by the PMD entry pmd
Page Table (PT)
The lowest level. Contains entries that directly map linear addresses to physical addresses (page frames)
Each PMD entry points to a PT
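- pte_offset_map(dir, addr): Yields the linear address of the PT entry for addr in the Page Table referenced by the PMD entry dir; if that Page Table lives in high memory, it is mapped temporarily and must be released with pte_unmap()
- pte_page(x): Gets the page descriptor address of the page referenced by the PT entry x
- pte_to_pgoff(pte): Extracts from a PTE the file offset of a page belonging to a non-linear file memory mapping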

Additional Important Concepts:

Page Descriptor: A data structure (struct page) associated with each physical page frame. It records the frame’s status flags, usage counters, and other metadata.
CR3 Register: A CPU register that holds the physical address of the current process’s PGD. Switching processes involves updating CR3.
TLB (Translation Lookaside Buffer): A cache that stores recent translations from linear to physical addresses to speed up memory access.

Paging Levels:

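32-bit system without PAE: two levels (PGD -> PT); the PUD and PMD are folded away
32-bit system with PAE (Physical Address Extension): three levels (PGD -> PMD -> PT)
64-bit system: three or four levels depending on the hardware (x86_64 uses all four: PGD -> PUD -> PMD -> PT)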

The Paging Process (simplified)

CPU requests memory: The CPU generates a linear address to access memory.
MMU lookup: The MMU checks the TLB for a translation of that linear address.
- TLB Hit: The MMU immediately retrieves the corresponding physical address.
- TLB Miss: The MMU must walk the page tables, using the bits of the linear address to index into the PGD, then the PUD, then the PMD, and finally the PT to find the physical address. This is slower, so the resulting translation is stored in the TLB for future use.
Memory Access: The MMU provides the physical address to the memory controller, which accesses the data in RAM.

In essence, the page tables form a tree-like structure that allows the kernel to map a process’s linear address space to the physical memory, providing memory management and protection.