The Virtual Address Space
This is part 1 of a multi-part series.
- Stack and Heap for Swift (and other) Programmers
- The Virtual Address Space (you are here)
- Introduction to the Stack
- Limitations of the Stack (coming soon)
- Introduction to the Heap
- Stack and Heap for Swift Programmers
- Stack and Heap for SwiftUI programmers
As we’ll eventually discuss in detail, the stack and the heap are two distinct regions in the memory address space of a running program. What is a program’s address space, and what else is in there?
Programs are instructions and data
Whether written in Swift or C, Kotlin or COBOL, ARM or 6502 assembly, or any other language, executable programs are made of CPU instructions1 and the data they manipulate. Initially, the instructions, and data which remain constant throughout the program’s execution, live in a file on disk. When a program is executed, the operating system creates a process with an associated memory address space, and loads the instructions and constant data from the file into that address space. Execution of the program code then begins on the process’s main thread. The program may spawn additional threads, but all threads in the program’s process have access to the same memory addresses in its address space.
The address space
From a program’s perspective, the memory that is available to it is a contiguous array of bytes. Each byte of a program’s memory space has a unique identifier, or address, which is an unsigned integer, starting at 0 and increasing by 1 for each subsequent address. The maximum amount of memory the program can address is 2^N bytes, where N is the number of bits the CPU and operating system use to store addresses.
Modern CPUs and operating systems use addresses that are 64 bits or 8 bytes in size. 64-bit addressing allows for 2^64 or 18,446,744,073,709,551,616 bytes. That’s 16 exabytes (EB) or about 17 billion gigabytes! These 16 EB are the maximum addressable memory space available to a program running on a 64-bit CPU.
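The arithmetic above is easy to check for ourselves. Here is a small Python sketch (Python only because its integers never overflow; the helper name address_space_bytes is made up for this illustration) that computes the size of an N-bit address space using the binary definitions of gigabyte and exabyte.

```python
# Size of an N-bit address space, using exact integer arithmetic.
# `address_space_bytes` is a name invented for this sketch.

GIB = 2 ** 30  # one gigabyte, binary definition
EIB = 2 ** 60  # one exabyte, binary definition


def address_space_bytes(bits: int) -> int:
    """Number of distinct byte addresses that fit in `bits` bits."""
    return 2 ** bits


size_64 = address_space_bytes(64)
size_32 = address_space_bytes(32)

print(f"{size_64:,} bytes")      # 18,446,744,073,709,551,616 bytes
print(size_64 // EIB, "EB")      # 16 EB
print(f"{size_64 // GIB:,} GB")  # 17,179,869,184 GB
print(size_32 // GIB, "GB")      # 4 GB
```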
The 64-bit address space is large
2^64 is a number that resists attempts to represent it in any sort of diagram. It’s literally astronomical! Let’s say we wanted to make a scaled diagram of the 64-bit address space, which is usually represented vertically, since we talk about “lower” and “higher” memory addresses. If we drew a column of 128 bytes that fit from bottom to top of a 27” 5K display, which is about 14” high2, then to show all 16 EB we would need a stack of displays more than 5 light-years tall. That’s about one-third further than the distance to Alpha Centauri, the nearest star system to our Sun3, or more than 2,000 times further than Voyager 1 has traveled in its ongoing, 45+ year mission4. If all the displays were Apple Studio Displays with the tilt- and height-adjustable stand, they would cost more than one quarter sextillion US dollars5.
Doing the same diagram for a 32-bit address space would require a stack of displays only as tall as the Earth is wide6, and much more affordable at only 67 billion dollars7.
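If you want to play with these numbers yourself, the following Python sketch reproduces the stack-of-displays estimate. The display height (36.2 cm) and price ($1,999) are the figures used in the footnotes; the length of a light-year is a standard rounded value that I am supplying, and the function names are invented for this example.

```python
# Back-of-the-envelope check of the "stack of displays" diagram.
BYTES_PER_DISPLAY = 128        # bytes drawn on each display (from the text)
DISPLAY_HEIGHT_CM = 36.2       # display height used in the footnotes
DISPLAY_PRICE_USD = 1_999      # display price used in the footnotes
KM_PER_LIGHT_YEAR = 9.4607e12  # standard value, rounded (my assumption)
CM_PER_KM = 100_000


def displays_needed(bits: int) -> int:
    """Displays required to draw every byte of a `bits`-bit address space."""
    return 2 ** bits // BYTES_PER_DISPLAY


def stack_height_km(bits: int) -> float:
    """Height of the stack of displays, in kilometers."""
    return displays_needed(bits) * DISPLAY_HEIGHT_CM / CM_PER_KM


def stack_cost_usd(bits: int) -> int:
    """Total price of the stack of displays, in US dollars."""
    return displays_needed(bits) * DISPLAY_PRICE_USD


print(stack_height_km(64) / KM_PER_LIGHT_YEAR)  # roughly 5.5 light-years
print(stack_height_km(32))                      # roughly 12,147 km
print(f"${stack_cost_usd(64):,}")               # $288,086,260,963,635,888,128
print(f"${stack_cost_usd(32):,}")               # $67,075,309,568
```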
Well, of course we’re going to need a lot of space if we want to be able to read each byte value without a magnifying glass. What if each byte was as small as a pixel? Impossible. If all 14,745,600 pixels on a 5K display represented the 64-bit address space, one byte would be represented by less than one trillionth of a pixel, and even 4 gigabytes, the entirety of a 32-bit address space, would occupy only a few thousandths of a pixel8. In fact each pixel would represent a little more than one terabyte9.
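The pixel version of the diagram is just three divisions. This sketch assumes the 5120 x 2880 resolution used in the footnotes; the variable names are mine.

```python
# How much of a 5K display does one byte (or one 32-bit address space) get?
PIXELS = 5120 * 2880  # 14,745,600 pixels on a 5K display

pixels_per_byte = PIXELS / 2 ** 64   # fraction of a pixel for a single byte
pixels_for_4_gb = PIXELS / 2 ** 32   # fraction of a pixel for a full 32-bit space
bytes_per_pixel = 2 ** 64 / PIXELS   # bytes that each pixel has to represent

print(f"{pixels_per_byte:.3e}")  # about 7.99e-13 of a pixel
print(f"{pixels_for_4_gb:.4f}")  # about 0.0034 of a pixel
print(f"{bytes_per_pixel:.3e}")  # about 1.25e12 bytes, a bit over one terabyte
```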
Well, that is all ridiculous. One-terabyte SD cards exist, and they are pretty small. Let’s talk about real-world sizes. Measuring an SD card, we see that its area minus the connector is roughly 4 square centimeters. Let’s assume all of that area is used to store the bits10. At the same bit density, a 16 EB flash memory chip would be the size of a football pitch11.
2^64 is a really big number.
Virtual vs. physical memory
Of course the CPUs we use day-to-day aren’t connected to 16 EB of real memory. Those are virtual addresses, not physical ones. Modern 64-bit operating systems like macOS or Linux take care of backing those virtual bytes with as much RAM and swap disk space as is available.
So, immediately after starting, a program’s address space contains its instructions and its constant data12. How those instructions and data are arranged in memory differs between operating systems. Let’s explore what the specific arrangement is on Apple platforms.
An Apple platform example
On Apple platforms, executable programs are stored in files using the Mach object file format, or Mach-O. In addition to a program’s instructions and constant data, Mach-O stores the list of steps necessary to load the program into memory.
We can see the definitions of those steps (and therefore learn exactly what lives in the file, and where in the address space the operating system puts it) using the otool program. Let’s try this with a familiar macOS app: Calculator.
A native GUI app on Apple platforms is packaged as an app bundle. In general, a bundle is a directory hierarchy which groups related files together so they can be manipulated as a unit. An app bundle in particular contains an executable file in Mach-O format, along with resources it uses, like images and language translation files. An app seen in the Finder, with the .app extension, is the root directory of an app bundle.
Here is the directory hierarchy of Calculator’s app bundle:
$ tree -L 3 /System/Applications/Calculator.app
/System/Applications/Calculator.app
└── Contents
    ├── Info.plist
    ├── MacOS
    │   └── Calculator
    ├── PkgInfo
    ├── PlugIns
    │   ├── BasicAndSci.calcview
    │   └── Hexadecimal.calcview
    ├── Resources
    │   ├── AppIcon.icns
    │   ├── Assets.car
    │   ├── Base.lproj
    │   ├── Calculator.loctable
    │   ├── ConversionCategories.plist
    │   [many more language localization files]
    │   └── zh_TW.lproj
    ├── _CodeSignature
    │   └── CodeResources
    └── version.plist
All of the bundle’s contents are in the aptly-named Contents directory at the root of the bundle. The Mach-O executable file is at Contents/MacOS/Calculator.
Here’s how we use otool to examine the Mach-O load commands in the executable of Calculator.app.
otool -l /System/Applications/Calculator.app/Contents/MacOS/Calculator | less
We’re piping the output through less, because there’s a lot of it, and we’re mostly concerned with the beginning.
__PAGEZERO: 4 GB of nothing
The first load command does something quite interesting!
/System/Applications/Calculator.app/Contents/MacOS/Calculator:
Load command 0
cmd LC_SEGMENT_64
cmdsize 72
segname __PAGEZERO
vmaddr 0x0000000000000000
vmsize 0x0000000100000000
fileoff 0
filesize 0
maxprot 0x00000000
initprot 0x00000000
nsects 0
flags 0x0
The command type is LC_SEGMENT_64, which is a command that usually loads data into a segment of the program process’s 64-bit virtual address space. A segment is a contiguous region of virtual memory granted certain properties, like whether it can be read from or written to, or whether instructions can be executed from it. For example, CPU instructions are loaded into a segment that allows reading and executing, but not writing, so the program can’t be modified at runtime.
The name of the segment being loaded in this first command is __PAGEZERO, and it starts at address zero, as indicated by vmaddr 0x0000000000000000, and is 0x100000000 or 4,294,967,296 bytes in size, as indicated by vmsize 0x0000000100000000. That’s the first 4 GB of the address space, equal to the maximum amount of memory that could possibly be available to a 32-bit program!
What’s being loaded in all that space? Nothing! filesize 0 means no bytes from the file are loaded into memory.
But note that maxprot and initprot are set to 0x00000000. None of those 32 bits is turned on (set to 1), which means this segment isn’t readable (0b001), writable (0b010), or executable (0b100). This segment load command tells macOS to make the first 4 GB of the 64-bit virtual address space illegal to access!
That gives us an idea of the vastness of a 64-bit virtual address space: a portion of it that is equal to all the memory that could possibly be available to a 32-bit app is simply ignored. Why? To help catch null dereference errors. Attempting to read or write the memory at any address from 0x0000000000000000 to 0x00000000FFFFFFFF will immediately crash the program, rather than causing a much harder to debug error—or a security violation!—perhaps a long time later when the incorrect—or maliciously inserted!—data read from those addresses is used.
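To double-check the range quoted above, we can feed the text of the load command to a few lines of Python. This is only a sketch for poking at otool's printed output: parse_load_command is a helper invented for this post, and it assumes every field is printed as one "name value" pair per line, as in the listing above.

```python
# Parse the text of one load command as printed by `otool -l` and compute
# the address range it covers. `parse_load_command` is a hypothetical helper.
PAGEZERO_TEXT = """\
      cmd LC_SEGMENT_64
  cmdsize 72
  segname __PAGEZERO
   vmaddr 0x0000000000000000
   vmsize 0x0000000100000000
  fileoff 0
 filesize 0
  maxprot 0x00000000
 initprot 0x00000000
   nsects 0
    flags 0x0
"""


def parse_load_command(text: str) -> dict:
    """Turn 'name value' lines into a dict, converting numbers to ints."""
    fields = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip anything that isn't a simple name/value pair
        name, value = parts
        try:
            fields[name] = int(value, 0)  # base 0 accepts decimal and 0x hex
        except ValueError:
            fields[name] = value          # keep non-numeric values as strings
    return fields


segment = parse_load_command(PAGEZERO_TEXT)
first = segment["vmaddr"]
last = segment["vmaddr"] + segment["vmsize"] - 1

print(segment["segname"])            # __PAGEZERO
print(f"{first:#018x}-{last:#018x}")  # 0x0000000000000000-0x00000000ffffffff
print(segment["vmsize"] == 2 ** 32)  # True
```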
__TEXT: CPU instructions and constant data
Next is another segment load command, but this one actually does load data from the executable file into the address space. The segment name is __TEXT, and its contents include the CPU instructions and some of the constant data for the program.
Load command 1
cmd LC_SEGMENT_64
cmdsize 952
segname __TEXT
vmaddr 0x0000000100000000
vmsize 0x0000000000028000
fileoff 0
filesize 163840
maxprot 0x00000005
initprot 0x00000005
nsects 11
flags 0x0
This command loads the first 160 KB of the executable file, as indicated by fileoff 0 and filesize 163840, into the first 160 KB of virtual memory after the __PAGEZERO segment, as indicated by vmaddr 0x0000000100000000 and vmsize 0x0000000000028000.
Note that the output I’m showing came from running otool on an Apple Silicon Mac, so it’s showing the arm64 portion of the Calculator universal binary. If you ran otool on an Intel Mac, the output you’re seeing is for the x86_64 portion and will have different sizes.
The choice of 160 KB—a multiple of 16 KB—is deliberate. Modern operating systems like macOS and Linux are highly optimized for loading pages of data from disk into memory, since that is the same operation needed to back virtual memory with swap space. On ARM-based Apple platforms, the page size is 16 KB.
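A quick sketch confirms that the __TEXT segment is a whole number of 16 KB pages, and that it begins exactly where __PAGEZERO ends. The numbers are copied from the otool output above; the variable names are mine.

```python
# Check that the __TEXT segment is page-aligned and a whole number of pages.
PAGE_SIZE = 16 * 1024             # 16 KB pages on ARM-based Apple platforms

pagezero_vmsize = 0x0000000100000000
text_vmaddr = 0x0000000100000000
text_vmsize = 0x0000000000028000
text_filesize = 163840

pages, remainder = divmod(text_filesize, PAGE_SIZE)

print(pages, remainder)                # 10 0
print(text_vmsize == text_filesize)    # True
print(text_vmaddr == pagezero_vmsize)  # True: __TEXT starts right after __PAGEZERO
print(text_vmaddr % PAGE_SIZE == 0)    # True: the segment starts on a page boundary
```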
maxprot and initprot are set to 0x00000005 or 0b101, which makes the __TEXT segment readable (0x1 or 0b001) and executable (0x4 or 0b100), but not writable (0x2 or 0b010).
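Decoding those protection bits is a matter of testing each one with a bitwise AND. The sketch below borrows the constant names VM_PROT_READ, VM_PROT_WRITE, and VM_PROT_EXECUTE from the Mach headers; describe_protection is a function I made up for illustration.

```python
# Decode a Mach-O maxprot/initprot value into human-readable permissions.
VM_PROT_READ = 0x1     # 0b001
VM_PROT_WRITE = 0x2    # 0b010
VM_PROT_EXECUTE = 0x4  # 0b100


def describe_protection(prot: int) -> str:
    """Return e.g. 'read+execute' for 0x5, or 'none' when no bit is set."""
    flags = [
        (VM_PROT_READ, "read"),
        (VM_PROT_WRITE, "write"),
        (VM_PROT_EXECUTE, "execute"),
    ]
    names = [name for bit, name in flags if prot & bit]
    return "+".join(names) if names else "none"


print(describe_protection(0x00000000))  # none          (__PAGEZERO)
print(describe_protection(0x00000005))  # read+execute  (__TEXT)
```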
A segment can be divided into multiple sections, and __TEXT has eleven, as indicated by nsects 11. Many of those sections are implementation details of the Objective-C language, which is the subject of a completely different blog post. But let’s look at two that are relevant to our current topic.
The first section in the __TEXT segment is __text, which contains the sequence of instructions that the CPU executes to run the Calculator app.
Section
sectname __text
segname __TEXT
addr 0x000000010000400c
size 0x0000000000016784
offset 16396
align 2^2 (4)
reloff 0
nreloc 0
flags 0x80000400
reserved1 0
reserved2 0
The instructions start at byte 16396 in the executable file, are loaded into virtual memory starting at address 0x10000400c, and there are 0x16784 or 92,036 bytes worth of them. As we’ll discuss in more detail in the next post, ARM64 CPU instructions are always 4 bytes long, so this section contains exactly 23,009 instructions.
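Here is that arithmetic as a short sketch, using the values from the otool output above (the variable names are mine). It also shows that the section sits at the same distance from the start of the segment in memory as it does from the start of the segment's data in the file.

```python
# Relate the __text section's file offset, virtual address, and size.
ARM64_INSTRUCTION_SIZE = 4    # bytes per ARM64 instruction

text_segment_vmaddr = 0x0000000100000000
text_segment_fileoff = 0

section_addr = 0x000000010000400c
section_size = 0x0000000000016784
section_offset = 16396

# Distance from the start of the segment, measured in memory and in the file.
offset_in_memory = section_addr - text_segment_vmaddr
offset_in_file = section_offset - text_segment_fileoff

print(offset_in_memory == offset_in_file)      # True (both are 0x400c)
print(section_size)                            # 92036
print(section_size // ARM64_INSTRUCTION_SIZE)  # 23009
print(section_size % ARM64_INSTRUCTION_SIZE)   # 0: a whole number of instructions
```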
We said the __TEXT segment also contains some of a program’s constant data. We can see one example of that in its __cstring section, which is where a program’s C string literals live. If we scroll ahead in the otool output a bit, we’ll see:
Section
sectname __cstring
segname __TEXT
addr 0x0000000100025fc0
size 0x0000000000001aa9
offset 155584
align 2^0 (1)
reloff 0
nreloc 0
flags 0x00000002
reserved1 0
reserved2 0
We can examine the contents of a section in a segment with otool -s <segment name> <section name> -V. Run the following command in a separate Terminal tab or window.
otool -s __TEXT __cstring -V /System/Applications/Calculator.app/Contents/MacOS/Calculator | less
We see all of Calculator’s C string literals. Here are the first few dozen, including some containing conversion specifiers like %lu.
/System/Applications/Calculator.app/Contents/MacOS/Calculator:
Contents of (__TEXT,__cstring) section
0000000100025fc0 0
0000000100025fc2 ,0.
0000000100025fc6 e
0000000100025fc8 .
0000000100025fca -
0000000100025fcc
0000000100025fcd M
0000000100025fcf touchBar
0000000100025fd8 currentView.touchBar
0000000100025fed Deg
0000000100025ff1 using degrees input mode
000000010002600a Rad
000000010002600e using radians input mode
0000000100026027 ( )
000000010002602b PAREN_ERROR
0000000100026037 %.16lf
000000010002603e %llu
0000000100026043 DivByZero
000000010002604d Inf
0000000100026051 -Inf
0000000100026056 NaN
000000010002605a Over
000000010002605f Under
0000000100026065 Error
000000010002606b Unknown error with value: %lu
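The addresses in the left column follow directly from how C strings are stored: each literal occupies its characters plus one terminating NUL byte, and the next literal starts immediately afterwards. As a sketch, we can recompute a few of the addresses above from the strings alone. The function name layout is invented for this example, and the starting address is simply taken from the listing.

```python
# Recompute the addresses of consecutive C string literals in __cstring.
START = 0x0000000100025fcf  # address of "touchBar" in the listing above

LITERALS = [
    "touchBar",
    "currentView.touchBar",
    "Deg",
    "using degrees input mode",
    "Rad",
    "using radians input mode",
]


def layout(start: int, strings: list) -> list:
    """Return (address, string) pairs for NUL-terminated strings packed back to back."""
    address = start
    placed = []
    for s in strings:
        placed.append((address, s))
        address += len(s.encode("utf-8")) + 1  # +1 for the terminating NUL byte
    return placed


for address, s in layout(START, LITERALS):
    print(f"{address:016x} {s}")
# 0000000100025fcf touchBar
# 0000000100025fd8 currentView.touchBar
# 0000000100025fed Deg
# 0000000100025ff1 using degrees input mode
# 000000010002600a Rad
# 000000010002600e using radians input mode
```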
The big picture
We’re just trying to get a sense of what kind of data—and how much—the operating system loads into memory when it launches a program, so let’s skip to the last segment load command.
Load command 4
cmd LC_SEGMENT_64
cmdsize 72
segname __LINKEDIT
vmaddr 0x0000000100038000
vmsize 0x000000000000c000
fileoff 229376
filesize 32992
maxprot 0x00000001
initprot 0x00000001
nsects 0
flags 0x0
The __LINKEDIT segment starts at virtual address 0x100038000 and is 0xc000 bytes long, so it ends at virtual address 0x100044000. That’s 278,528 bytes of CPU instructions and constant data beyond the initial 4 GB illegal zone.
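The same numbers as a sketch, with values copied from the otool output (variable names are mine). As a bonus, it shows that the vmsize of 0xc000 is consistent with rounding the 32,992 bytes read from the file up to whole 16 KB pages; that rounding is my reading of the numbers rather than something otool reports.

```python
# Where does the last segment end, and how much has been mapped in total?
PAGE_SIZE = 16 * 1024
PAGEZERO_END = 0x0000000100000000

linkedit_vmaddr = 0x0000000100038000
linkedit_vmsize = 0x000000000000c000
linkedit_filesize = 32992

linkedit_end = linkedit_vmaddr + linkedit_vmsize
mapped_bytes = linkedit_end - PAGEZERO_END

# Round the file size up to the next multiple of the page size.
rounded_filesize = -(-linkedit_filesize // PAGE_SIZE) * PAGE_SIZE

print(hex(linkedit_end))                    # 0x100044000
print(f"{mapped_bytes:,}")                  # 278,528
print(rounded_filesize == linkedit_vmsize)  # True (3 pages of 16 KB)
```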
We can now recognize two different classes of memory addresses when we see them in the wild:
0x0000000000000000-0x00000000FFFFFFFF
: ☠️No admittance! ☠️Reading or writing these addresses crashes our program!0x0000000100000000-0x00000001________
: Program instructions and constant data. The upper end of this range varies with the amount of instructions and constant data, but it will most likely be less than 0x0000000140000000, since most programs will have less than a gigabyte of code and data13. This region of virtual memory contains only the code and data compiled directly (statically) into the program executable—dynamic libraries used by a program live in their own separate Mach-O files.
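To make those two classes concrete, here is a toy classifier. It is a sketch under stated assumptions: the classify function and its image_end parameter are invented for this post, and the default upper bound is the end of Calculator's __LINKEDIT segment from the otool output, so it would differ for any other program.

```python
# Classify a 64-bit virtual address into the regions we know about so far.
PAGEZERO_END = 0x0000000100000000          # first address after __PAGEZERO
CALCULATOR_IMAGE_END = 0x0000000100044000  # end of Calculator's __LINKEDIT segment


def classify(address: int, image_end: int = CALCULATOR_IMAGE_END) -> str:
    """Name the region an address falls in (hypothetical helper)."""
    if not 0 <= address < 2 ** 64:
        raise ValueError("not a 64-bit address")
    if address < PAGEZERO_END:
        return "__PAGEZERO (illegal to access)"
    if address < image_end:
        return "program instructions and constant data"
    return "somewhere else in the address space"


print(classify(0x0))                 # __PAGEZERO (illegal to access)
print(classify(0x00000000FFFFFFFF))  # __PAGEZERO (illegal to access)
print(classify(0x000000010000400c))  # program instructions and constant data
print(classify(0x0000000100044000))  # somewhere else in the address space
```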
There is still a vast amount of the 64-bit address space remaining to explore.
Runtime memory allocation
And we need it! Programs that only work with constant data aren’t that useful. Programs need the ability to load new data into memory as they run. This runtime-allocated data can be stored in the space that’s left over after the program’s instructions and constant data have been loaded. But where specifically should it go?
Where to store it?
Where to store runtime data is a decision that faced the very first computer programmers who were writing directly in machine code, with full control over the memory space of their programs. If their runtime memory needs were simple and predictable, they could perhaps get away with establishing known memory locations where specific kinds of runtime data would be stored. If, for example, they were writing a program that could sort up to 1,000 four-byte integers, they might have reserved 4,000 bytes of memory for that purpose.
But what if sorting 1,000 four-byte integers was only one of several problems a program needed to solve, and only about 4,000 bytes of memory were available? The first magnetic core memory stored 1 KB, and the Apple II shipped with as little as 4 KB of RAM.
Or what if 1,000 integers was an edge case, and most of the time you only needed to sort 100 integers? Then most of the time 90% of that space was wasted!
For much of the history of programming, memory was an expensive and therefore limited resource. Programmers needed the ability to use the same precious memory for different data throughout the course of the program.
Take a moment to think about what features you might want in a runtime memory allocation system, then join me in the next post!
Revision History
- 2023-04-04: Corrected and expanded discussion about the size of the 64-bit address space.
1. The machine might be virtual, in the case of programs written for the JVM, for example.
2. Each byte value would have 2880 / 128 = 22.5 pixels of height at 2x resolution (let’s say 9 points with a bit of padding), so they would be quite legible. If you doubled the number of bytes displayed, they’d be a lot harder to read, but you would need a stack of displays reaching only about two-thirds of the way to Alpha Centauri.
3. 2^57 * 36.2 centimeters ≅ 5.22 * 10^13 kilometers
4. (2^57 * 36.2 centimeters) / (23.807 * 10^9 kilometers) ≅ 2190
5. 2^57 * $1,999 = $288,086,260,963,635,888,128
6. 2^25 * 36.2 centimeters ≅ 12,150 kilometers
7. 2^25 * $1,999 = $67,075,309,568
8. (5120 * 2880) / 2^64 ≅ 7.993 * 10^-13; (5120 * 2880) / 2^32 ≅ 0.003433.
9. 2^64 / (5120 * 2880) ≅ 1.251 * 10^12
10. This is probably a significant overestimate, given that there are 1 TB micro SD cards, but I haven’t yet figured out a reasonable estimate of the NAND flash chip area needed to store 1 bit.
11. When talking about storage, 1 TB is usually 10^12 = 1,000,000,000,000 bytes, not 2^40 = (2^10)^4 = 1024 * 1024 * 1024 * 1024 = 1,099,511,627,776 bytes, but for this rough calculation it doesn’t matter, so we’ll use 2^40 to keep the math simple. 2^24 * 4.000 square centimeters ≅ 6711 square meters.
12. It may also contain other things we won’t discuss right now, like instructions and constant data from shared libraries used by your program, I/O ports for communication with the kernel and devices, and the contents of memory-mapped files.
13. The largest executable of an app currently in my /Applications folder belongs to the game The Pathless. It is 390 MB in size, and otool -l reports that its __LINKEDIT segment ends at address 0x000000010CEC4000, or about 207 MB past the illegal first 4 GB of the address space.