Hello world from a bootloader

In my day, registers were 16 bits, computers ran one program at a time, and the highest known number was 65,535. Programs had direct access to memory, so if you messed up your pointer math you could crash the whole computer, which was the style at the time.

Now, modern computers run hundreds or thousands of programs at once and use newfangled nonsense like virtual memory to provide isolation between programs. CPU register size doubled to 32 bits and then again to a ridiculous 64 bits.

Despite these supposed advancements, x86 computers have remained remarkably backwards-compatible: 16-bit real mode still exists on every modern x86 CPU.

Even today, when an x86 computer boots, it starts in 16-bit real mode for compatibility with chips going back to the late '70s. It executes a tiny chunk of 16-bit code from the beginning of the boot disk, just enough to load the rest of the system and switch the CPU into protected mode and then 64-bit long mode.

This tiny piece of 16-bit code is called a bootloader, and as it turns out, it's pretty straightforward to make one that does almost nothing!

BIOS

When an x86 computer boots, the BIOS starts first. Its job is to initialize hardware and begin the boot process.

Note: UEFI is a newer replacement for the BIOS but since I don't know anything about UEFI, we'll go with the BIOS.

BIOS interrupts

The BIOS provides a set of services which can be used by our bootloader to do useful things like:

  • Print text to the screen.
  • Manipulate the cursor.
  • Switch to certain graphics modes.
  • Inspect the computer's hardware configuration.

These services are accessed by setting registers to specific values and issuing a BIOS interrupt with the appropriate code.

For example, this snippet prints the character A to the screen:

    mov ah, 0x0E
    mov al, 'A'
    int 0x10
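
ah and al are the high and low bytes of the 16-bit ax register, so the two mov instructions above fill in ax one half at a time. As a rough illustration in Python (the helper name pack_ax is made up for this sketch), here's how the two halves combine:

```python
def pack_ax(ah: int, al: int) -> int:
    """Combine the high byte (ah) and low byte (al) into the 16-bit ax value."""
    if not (0 <= ah <= 0xFF and 0 <= al <= 0xFF):
        raise ValueError("ah and al must each fit in one byte")
    return (ah << 8) | al


# ah = 0x0E selects the BIOS teletype-output service, al = 'A' is the character.
ax = pack_ax(0x0E, ord("A"))
print(hex(ax))  # 0xe41
```

The BIOS looks at ah to decide which service was requested and at al for that service's argument, which is why the same interrupt number (0x10) can do many different things.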

Back in the days of DOS it was common for everyday programs to use this BIOS-provided functionality, but these days it's mostly just used by bootloaders.

With these services in place, the BIOS checks the configured boot device for a master boot record containing a bootloader which can be run.

Master boot record

The master boot record (MBR) is a tiny 512-byte structure at the very beginning of a disk. This structure will house our custom bootloader code.

Here's a sample layout of a master boot record:

    Offset   Description                    Size (bytes)
    +0       Bootloader executable code     446
    +446     Partition entry #1             16
    +462     Partition entry #2             16
    +478     Partition entry #3             16
    +494     Partition entry #4             16
    +510     Boot signature (0x55, 0xAA)    2

Note: There are a lot of variations on this structure. Most involve using the end of the code section to store additional metadata about the disk.

When it's time to boot into an operating system, the BIOS checks the configured boot disk for the boot signature: a magic number sequence of [0x55, 0xAA] at byte offsets 510 and 511. If found, the BIOS assumes this is a valid boot sector.
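
To make the layout and the signature check concrete, here's a minimal Python sketch. It assumes the simple layout from the table above (not one of the variations), and the function names split_mbr and has_boot_signature are invented for this example; a real BIOS does this check in firmware, not Python.

```python
SECTOR_SIZE = 512
CODE_SIZE = 446
PARTITION_ENTRY_SIZE = 16
SIGNATURE_OFFSET = 510
SIGNATURE = bytes([0x55, 0xAA])


def split_mbr(sector: bytes) -> dict:
    """Slice a 512-byte sector into the fields from the layout table."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("an MBR is exactly 512 bytes")
    partitions = []
    for i in range(4):
        start = CODE_SIZE + i * PARTITION_ENTRY_SIZE
        partitions.append(sector[start:start + PARTITION_ENTRY_SIZE])
    return {
        "code": sector[:CODE_SIZE],
        "partitions": partitions,
        "signature": sector[SIGNATURE_OFFSET:],
    }


def has_boot_signature(sector: bytes) -> bool:
    """Mimic the BIOS check: bytes 510 and 511 must be 0x55, 0xAA."""
    return len(sector) == SECTOR_SIZE and sector[SIGNATURE_OFFSET:] == SIGNATURE


# A blank sector with only the signature filled in.
sector = bytes(SIGNATURE_OFFSET) + SIGNATURE
fields = split_mbr(sector)
print(len(fields["code"]), len(fields["partitions"]), has_boot_signature(sector))
# 446 4 True
```

To try it on a real disk image, read the first 512 bytes of the file in binary mode and pass them in.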

The BIOS loads the 512-byte MBR into memory starting at address 0x7C00 and executes the code at this address, passing control over to the bootloader.

Note: The GUID Partition Table or GPT is a newer replacement format for partition tables.

A custom bootloader

A typical bootloader is responsible for bootstrapping the rest of the system. It may look up hardware configuration data from the BIOS, implement a simple filesystem driver, and use that driver to load the rest of the system into memory from disk.

Our simple bootloader won't be quite so fancy: we'll just write a message to the screen.

Here's the full assembly file (line-by-line breakdown to follow):

%define NULL 0

org 0x7C00

    mov si, message

print_string:
    lodsb

    ; Check for the NULL-termination character. If found, exit the loop.
    cmp al, NULL
    je infinite_loop

    ; Write the byte in `al` as an ASCII character to the screen.
    mov ah, 0x0E
    int 0x10

    jmp print_string

infinite_loop:
    jmp infinite_loop

message: db "Hi, I'm a bootloader who doesn't load anything.", NULL

; Pad out the file to the 510th byte with zeroes.
times 510-($-$$) db 0

; MBR boot signature.
db 0x55, 0xAA

To run it, save the above code to a file named hello.asm and use NASM to assemble it:

$ nasm hello.asm -f bin -o hello.bin

Now use QEMU to emulate an x86 computer and boot from the custom bootloader:

$ qemu-system-x86_64 hello.bin

When I run this, the emulator window shows the message from the bootloader.

Line-by-line breakdown

If you're interested in a line-by-line breakdown, read on!

%define NULL 0

This is a NASM macro. No code is emitted to the binary as a result of this directive; it's just a convenience for the programmer to make it a little clearer that the 0 is the NULL-termination character.

org 0x7C00

When the virtual machine boots, it will load the bootloader binary into memory starting at address 0x7C00. NASM uses the org directive to figure out what offsets to use for things like jump labels and the message string.
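
To make the effect of org concrete, here's a small Python sketch of the arithmetic. The label offset used here (0x10) is a made-up number for illustration, not the real offset of message in the assembled file, and resolve_label is a name invented for this example.

```python
import struct

ORIGIN = 0x7C00  # where the BIOS loads the boot sector


def resolve_label(offset_in_file: int, origin: int = ORIGIN) -> int:
    """Return the memory address of a label given its offset in the binary."""
    return origin + offset_in_file


address = resolve_label(0x10)
print(hex(address))  # 0x7c10

# x86 is little-endian, so a 16-bit address is stored low byte first.
print(struct.pack("<H", address).hex())  # 107c
```

Without the org directive, NASM's flat binary output assumes the code starts at address 0, so mov si, message would load the bare file offset rather than the address the string actually ends up at in memory.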

    mov si, message

si is the source index register. It's commonly used for reading source data into an algorithm, and that's exactly what we'll use it for!

print_string:

This begins the loop to print the message to the screen. Each time the loop runs, it prints one character.

    lodsb

This instruction is short for something like load string byte. It loads one byte into the al register from the address si is pointing to (in this case, the message string).

Additionally, it increments the address in si so we'll see the next character in the following loop iteration instead of processing the same character over and over.

    cmp al, NULL

The message string is NULL-terminated, which means the end of the string is marked by the NULL value: 0 in ASCII. We compare the current character to the NULL-termination character to see if we've reached the end of the string yet.

    je infinite_loop

If the current character is NULL, we've reached the end of the string, so we jump out of the loop to stop processing.

    mov ah, 0x0E
    int 0x10

If there are still characters left in the string, we ask the BIOS to print one! When we set ah to 0x0E and raise interrupt 0x10, the BIOS reads the ASCII character in al and prints it on the screen.

    jmp print_string

Jump back to the print_string: label. This starts the loop over again and outputs the next character to the screen.
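
Putting the loop together, here's a rough Python model of what happens on each pass. The names memory, si, al, and the screen list are stand-ins for this sketch; the real BIOS call draws to the display rather than appending to a list.

```python
NULL = 0


def print_string(memory: bytes, si: int) -> str:
    """Walk memory from si, collecting characters until a NULL byte."""
    screen = []
    while True:
        al = memory[si]  # lodsb: load the byte at si into al...
        si += 1          # ...and advance si to the next byte
        if al == NULL:   # cmp al, NULL / je infinite_loop
            break
        screen.append(chr(al))  # mov ah, 0x0E / int 0x10
    return "".join(screen)


message = b"Hi, I'm a bootloader who doesn't load anything." + bytes([NULL])
print(print_string(message, 0))
```

If the terminator is missing, this sketch raises an IndexError when it runs off the end of the buffer. The real loop has no such guard and would simply keep reading whatever bytes come next in memory.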

infinite_loop:
    jmp infinite_loop

Once the entire string has been printed, we'll jump here to stop printing characters.

This is an infinite loop at the end of the bootloader. If this weren't here, execution would continue and the computer would try to execute the contents of message as if it were x86 code.

message: db "Hi, I'm a bootloader who doesn't load anything.", NULL

This is the message we print to screen, in ASCII encoding. The last character is the NULL-termination character, which marks the end of the string so the loop knows when to stop.

times 510-($-$$) db 0

This inscrutable-looking line tells the assembler to pad out the generated binary file to the 510th byte with 0s.

Roughly translated:

  • times repeats a directive a given number of times.
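  • 510-($-$$) is an expression: $ is the current position and $$ is the start of the current section, so $-$$ is the number of bytes emitted so far, and the whole expression is the number of bytes still needed to reach offset 510.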
  • db generates a byte in the binary file.
  • 0 indicates the generated byte should have a value of, you guessed it, 0.

So all together, this writes 0s up to and including the 510th byte of the binary file.

db 0x55, 0xAA

This writes the 2-byte boot signature to the end of the file. For a bootloader to be recognized as such, its 511th and 512th bytes (offsets 510 and 511) must be set to 0x55 and 0xAA. If they aren't, some BIOS implementations may not recognize this as a bootloader and will fail to load it.
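
The padding and signature steps together are what make the output exactly 512 bytes. Here's a hedged Python sketch of the same arithmetic; pad_boot_sector is a name invented for this example, and the two-byte "program" is just a jump-to-self instruction standing in for real bootloader code.

```python
SIGNATURE_OFFSET = 510


def pad_boot_sector(code: bytes) -> bytes:
    """Zero-pad code to 510 bytes, then append the 0x55, 0xAA signature."""
    padding = SIGNATURE_OFFSET - len(code)  # same idea as 510-($-$$)
    if padding < 0:
        raise ValueError("code and data don't fit in 510 bytes")
    return code + bytes(padding) + bytes([0x55, 0xAA])


image = pad_boot_sector(b"\xEB\xFE")  # two bytes: a short jump to itself
print(len(image), image[510:].hex())  # 512 55aa
```

The length check mirrors a limit the assembly version has too: if the code and message grow past 510 bytes, the repeat count goes negative and there's no room left for the signature.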