User Input
In a previous section, we wrote text out to the user, but have had no way to get input back from the user. In this section, we will introduce a new system call which allows us to read a line of text from the console.
sys_read is the opposite of sys_write. While sys_write writes data from memory to the console, sys_read reads data from the console and saves that data into memory for later use by the program. Making a sys_read system call is very similar to using sys_write: all we have to do is set the registers to the appropriate values and tell the operating system when we're ready.
To make a sys_write call:
- rax must be set to 1, indicating sys_write.
- rdi must be set to 1, indicating stdout (console output).
- rsi must be set to an address in memory where the string to be printed can be found.
- rdx must be set to the number of characters to write from memory to the console.
Compare the above to sys_read, which is pretty similar:
- rax must be set to 0, indicating sys_read.
- rdi must be set to 0, indicating stdin (console input).
- rsi must be set to an address in memory where the input string can be saved.
- rdx must be set to the maximum number of characters to accept from the user.
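For readers who know C, here is a sketch of the same two calls made through glibc's `syscall()` function, which loads the system call number into rax and the next three arguments into rdi, rsi, and rdx before executing the syscall instruction, matching the lists above. The wrapper names `raw_read` and `raw_write` are hypothetical, invented for this illustration; `SYS_read` and `SYS_write` come from `<sys/syscall.h>` (on x86-64 Linux they are 0 and 1).

```c
#define _GNU_SOURCE          /* makes <unistd.h> declare syscall() */
#include <assert.h>
#include <string.h>
#include <sys/syscall.h>     /* SYS_read, SYS_write */
#include <unistd.h>

/* Hypothetical helper: rax = SYS_read, rdi = fd, rsi = buf, rdx = max_len.
   Returns the number of bytes read, or -1 on error (errno is set). */
long raw_read(int fd, void *buf, size_t max_len) {
    return syscall(SYS_read, (long)fd, buf, max_len);
}

/* Hypothetical helper: rax = SYS_write, rdi = fd, rsi = buf, rdx = count.
   Returns the number of bytes written, or -1 on error (errno is set). */
long raw_write(int fd, const void *buf, size_t count) {
    return syscall(SYS_write, (long)fd, buf, count);
}
```

A call such as `raw_read(0, buf, 64)` corresponds to the sys_read register settings above, and `raw_write(1, buf, n)` to the sys_write ones. One difference from the bare instruction: on failure, `syscall()` returns -1 and sets `errno`, whereas the syscall instruction itself leaves a negative error number in rax.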
The first program we'll make that uses sys_read will be very simple. It will accept input from the user and then print that same string right back out. Create a new file called "repeat.asm" and type the following program into it:
%define sys_exit 60
%define sys_read 0
%define sys_write 1
%define stdin 0
%define stdout 1
%define success 0
section .bss
%define buffer_len 64
buffer: resb buffer_len
section .text
global _start
_start:
; Read input from the user
mov rax, sys_read
mov rdi, stdin
mov rsi, buffer
mov rdx, buffer_len
syscall
; Write whatever the user entered back out
mov rdx, rax
mov rax, sys_write
mov rdi, stdout
mov rsi, buffer
syscall
; End the program
mov rax, sys_exit
mov rdi, success
syscall
There are three high-level operations here:
- Read a line of input from the user and save that input into memory.
- Write the input string from memory back out to the console.
- Exit the program.
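The same three high-level operations can be sketched in C for comparison. This is an illustration of the logic, not a replacement for the assembly. `repeat_once` is a hypothetical name, and unlike the assembly (which hard-codes stdin and stdout), it takes the file descriptors as parameters so it can be exercised with a pipe.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

#define BUFFER_LEN 64

/* Reads up to BUFFER_LEN bytes from in_fd, then writes exactly the bytes
   that were read to out_fd. Returns read()'s count, or -1 on error. */
ssize_t repeat_once(int in_fd, int out_fd) {
    static char buffer[BUFFER_LEN];               /* like `buffer: resb buffer_len` */
    ssize_t n = read(in_fd, buffer, BUFFER_LEN);  /* sys_read: count comes back, like rax */
    if (n > 0 && write(out_fd, buffer, (size_t)n) != n) {  /* sys_write with rdx = count */
        return -1;
    }
    return n;
}
```

A complete program would call `repeat_once(0, 1)` from `main` and then `return 0`, which becomes the exit status, just like `mov rdi, success`.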
Let's go through the source file in detail:
%define sys_exit 60
%define sys_read 0
%define sys_write 1
%define stdin 0
%define stdout 1
%define success 0
These are the constants we'll be using to make system calls. sys_exit is 60, stdin is 0, etc. This is just like previous programs, but we've added some new definitions because of the new system call being made.
section .bss
%define buffer_len 64
buffer: resb buffer_len
This is a new type of section, called bss. Previously, we have worked with the text and data sections. Take a look at how these three sections compare:
- The text section is where code (instructions) goes.
- The data section is for initialized data. This is memory for which we have an initial value when the program starts. In the "Hello, world!" section, we printed a string out to the user whose value we knew ahead of time.
- The bss section is for uninitialized data. This is memory which will be set dynamically by the program as it runs. Since the value of this memory will be set to whatever the user enters, we don't know what it will be ahead of time.
We could use the data section for this if we really wanted to, by giving buffer some garbage initial value that we expect to be overwritten, but it's wasteful to include that garbage data in the executable file. The bss section allows us to say we need a region of memory reserved, without actually taking up that number of bytes on disk. The operating system will reserve the requested number of bytes in memory, filled with zeros, each time the program runs.
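C compilers draw the same line between initialized and uninitialized data. In the sketch below, the names are illustrative, and section placement is ultimately up to the compiler: GCC and Clang on Linux typically put a static array with no initializer in .bss and one with a non-zero initializer in .data (running `nm` on the compiled object shows `b`/`B` for .bss symbols and `d`/`D` for .data). C also guarantees that static storage with no initializer starts out as zeros, which is what the loader-provided .bss memory contains.

```c
#include <assert.h>
#include <stddef.h>

/* No initializer: typically placed in .bss, so it takes no space in the file.
   C guarantees it starts out as all zero bytes. */
static char input_buffer[64];

/* Non-zero initializer: typically placed in .data, so these bytes are
   stored in the executable file. */
static char greeting[] = "Hello, world!\n";

/* Returns 1 if every byte of the .bss-style buffer is still zero. */
int buffer_is_zeroed(void) {
    for (size_t i = 0; i < sizeof input_buffer; i++) {
        if (input_buffer[i] != 0) return 0;
    }
    return 1;
}

/* Length of the .data-style string, excluding C's trailing '\0'. */
size_t greeting_len(void) {
    return sizeof greeting - 1;
}
```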
So the purpose of this section is to make a region of memory which the user's input can be written to. Let's break it up into pieces and check out each line individually:
section .bss
This defines the beginning of the bss section, where any uninitialized memory is declared.
%define buffer_len 64
This creates a constant called buffer_len, which will be the total number of bytes of memory reserved for storing user input. In this case, anywhere we use the text "buffer_len" in the code, it will be replaced with the number 64. This value can be basically whatever you want, but 64 is a reasonable number in this case.
Note: this line is not actually part of the bss section. %define is an example of a preprocessor directive: an instruction to the assembler itself, which does not translate directly to machine code. It's a convenience offered by the assembler which allows us to define the size of the buffer once and then refer to it elsewhere, so if we ever want to change the size of the buffer, we only have to change it in this one place.
buffer: resb buffer_len
This is where the magic happens. This line declares the area in memory where the user's input will be stored. It has three parts:
- buffer is the name of the area in memory we're declaring. Anywhere in the code that we use the name "buffer", it will be replaced with the memory address of the beginning of this region in memory.
- resb stands for "reserve bytes". This tells the assembler we're reserving some number of bytes of memory.
- buffer_len gives the number of bytes we want to reserve. In this case we're using the constant buffer_len, which is 64. We could alternatively just type the number 64 here.
All together, this reserves a 64-byte area in memory which we can refer to by the name "buffer". When the program runs, this memory will be reserved for the program and we'll be able to read and write to it.
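A rough C counterpart of these lines, for comparison: `#define` is a textual substitution made before compilation, like NASM's `%define`, and an array name used in an expression evaluates to the address of its first byte, much as the label buffer does. The helper names `buffer_start` and `buffer_end` are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define BUFFER_LEN 64                     /* textual substitution, like %define */

static unsigned char buffer[BUFFER_LEN];  /* reserve 64 bytes, like resb */

/* The name `buffer` stands for the address of the first reserved byte. */
uintptr_t buffer_start(void) { return (uintptr_t)buffer; }

/* One past the last reserved byte: the start address plus 64. */
uintptr_t buffer_end(void) { return (uintptr_t)(buffer + BUFFER_LEN); }
```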
section .text
Now we're getting into more familiar territory. This is where the code begins.
global _start
_start:
global _start makes the _start label visible to the linker, and _start marks the entry point of the program: the first instruction that will be executed when the program is run.
; Read input from the user
mov rax, sys_read
mov rdi, stdin
mov rsi, buffer
mov rdx, buffer_len
syscall
The first thing the program does is read input from the user by making a sys_read system call. Like other system calls, we set up the registers with the details of the operation we want carried out and then issue the syscall instruction, which notifies the operating system to do our bidding.
In this case, we're telling the operating system to read characters from the console and store them in memory at the location given by buffer. The operating system will let the user type until they hit the enter key, and then up to 64 characters of text will be saved to memory. After the syscall instruction executes, the number of bytes that were actually read (at most 64) will be available to us in the rax register. Whatever text the user entered will be stored in memory, and we'll be able to access it through the label buffer.
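A quick experiment in C makes the return value concrete. This sketch uses a pipe in place of the console (an assumption of the test setup, not something the tutorial does) and a hypothetical helper, `read_with_marker`, that pre-fills the buffer with a marker byte. It shows that read() reports how many bytes it stored, never stores more than the maximum it was given, and does not append a terminating zero byte, so the count in rax is the only record of where the input ends.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

#define BUFFER_LEN 64

/* Fills buf with a marker byte, then reads at most BUFFER_LEN bytes into it.
   Returns read()'s count: the same number sys_read leaves in rax.
   Bytes past the count keep the marker, because read() appends no '\0'. */
ssize_t read_with_marker(int fd, char buf[BUFFER_LEN], char marker) {
    memset(buf, marker, BUFFER_LEN);
    return read(fd, buf, BUFFER_LEN);
}
```

On a terminal, the read normally returns when the user presses enter, and anything typed beyond the 64-byte limit stays queued for the next reader of the terminal, often the shell once the program exits.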
; Write whatever the user entered back out
mov rdx, rax
mov rax, sys_write
mov rdi, stdout
mov rsi, buffer
syscall
Now that the user's input is stored in memory and we can reference that region of memory with the name buffer, we just print whatever the user typed right back out to them.
This is very similar to previous sys_write calls, with one major difference. Previously, we printed a static string "Hello, world!", meaning that we knew what the string would be ahead of time, as well as how many characters it would be. This time around, we don't actually know how many characters the user may have entered. We know sys_read won't have stored more than 64 characters, but other than that, we have no idea. Luckily, sys_read returns the number of characters the user entered in the register rax, and sys_write expects the number of characters to write to be in the register rdx. So we copy the value left by sys_read in rax to rdx, where sys_write expects it. The copy has to happen first, before mov rax, sys_write overwrites rax.
Altogether, this system call tells the operating system to write the same number of bytes that sys_read stored, starting at the address buffer, out to the console.
; End the program
mov rax, sys_exit
mov rdi, success
syscall
Finally, we make a third system call to exit the program successfully.
Make sure the program is typed correctly as listed above, save it as "repeat.asm", and run it using the "run" script from previous sections:
./run repeat
The program should appear to pause and do nothing, waiting for input from you. Type some text (like "Greetings!") and press enter. The program should repeat whatever you typed and exit. The total output should look something like this:
Greetings!
Greetings!
0
Remember that 0 is the program status code, indicating that the program exited successfully.
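The status code is whatever value was in rdi when sys_exit was called, and a parent process such as the shell collects it with the wait family of calls. This sketch (the helper name `exit_status_of` is hypothetical) forks a child that ends immediately with `_exit(code)` and reads the code back the way a shell does for `$?`.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs a child process that exits with `code` and returns the status
   the parent observes, or -1 if something went wrong. */
int exit_status_of(int code) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) _exit(code);      /* child: like mov rdi, code / sys_exit */
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)) return -1;
    return WEXITSTATUS(status);     /* the value `echo $?` would print */
}
```

Only the low 8 bits of the code survive, so a status of 261 is reported as 5. Reporting a small count as the status, as this section does later to debug the name program, therefore works fine with a 64-byte buffer.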
Prompting
The "repeat.asm" program doesn't tell the user what to do: it just hangs until the user presses enter. We can mix and match sys_write and sys_read calls to provide some instructions to the user and some formatting to the output.
For this next program, we're going to ask the user for their name and then greet them. This can be broken down into the following system calls:
- sys_write - print "Please enter your name: "
- sys_read - input the user's name
- sys_write - print "Hello, "
- sys_write - print the user's name
- sys_write - print "!"
This will produce final output that looks a bit like this (depending on what you enter):
Please enter your name: Brian
Hello, Brian!
To get started, create a new file called "helloname.asm" and type the following program in:
%define sys_exit 60
%define sys_read 0
%define sys_write 1
%define stdin 0
%define stdout 1
%define success 0
%define newline 10
section .bss
%define name_max_len 64
name: resb name_max_len
name_len: resq 1
section .data
prompt: db "Please enter your name: "
prompt_len: equ $-prompt
response_start: db "Hello, "
response_start_len: equ $-response_start
response_end: db "!", newline
response_end_len: equ $-response_end
section .text
global _start
_start:
; Write the prompt out to the console
mov rax, sys_write
mov rdi, stdout
mov rsi, prompt
mov rdx, prompt_len
syscall
; Read the user's name from the console
mov rax, sys_read
mov rdi, stdin
mov rsi, name
mov rdx, name_max_len
syscall
; Store the number of characters entered by the user
mov [name_len], rax
; Write the start of the response
mov rax, sys_write
mov rdi, stdout
mov rsi, response_start
mov rdx, response_start_len
syscall
; Write the user's name
mov rax, sys_write
mov rdi, stdout
mov rsi, name
mov rdx, [name_len]
syscall
; Write the end of the response
mov rax, sys_write
mov rdi, stdout
mov rsi, response_end
mov rdx, response_end_len
syscall
; End the program
mov rax, sys_exit
mov rdi, success
syscall
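For comparison, here is the listing above sketched in C, one library call per system call. As before, the file descriptors are parameters (an assumption made so the sketch can be driven by a pipe), and the names `greet` and `put` are hypothetical. It deliberately mirrors the assembly, including echoing every byte that read() returned.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define NAME_MAX_LEN 64

static char name[NAME_MAX_LEN];      /* .bss: name: resb name_max_len */
static uint64_t name_len;            /* .bss: name_len: resq 1 */

static const char prompt[] = "Please enter your name: ";
static const char response_start[] = "Hello, ";
static const char response_end[] = "!\n";

/* Writes len bytes of s; returns 0 on success, -1 on error or short write. */
static int put(int fd, const char *s, size_t len) {
    return write(fd, s, len) == (ssize_t)len ? 0 : -1;
}

int greet(int in_fd, int out_fd) {
    if (put(out_fd, prompt, sizeof prompt - 1) < 0) return -1;   /* minus C's '\0' */
    ssize_t n = read(in_fd, name, NAME_MAX_LEN);
    if (n < 0) return -1;
    name_len = (uint64_t)n;                                       /* mov [name_len], rax */
    if (put(out_fd, response_start, sizeof response_start - 1) < 0) return -1;
    if (put(out_fd, name, (size_t)name_len) < 0) return -1;      /* mov rdx, [name_len] */
    if (put(out_fd, response_end, sizeof response_end - 1) < 0) return -1;
    return 0;
}
```

Fed the input `Brian\n`, it writes `Please enter your name: Hello, Brian\n!\n`, newline after the name included. Keep that detail in mind when you run the assembly version.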
This is a much longer program than the previous one, but it mostly just reuses the same concepts. There are only a couple of new things here. Let's step through it in detail:
%define sys_exit 60
%define sys_read 0
%define sys_write 1
%define stdin 0
%define stdout 1
%define success 0
%define newline 10
These are the same constants we defined before. The only difference is the inclusion of newline, with a value of 10. This is the newline character (produced when you press enter). We'll use this for formatting purposes.
section .bss
%define name_max_len 64
name: resb name_max_len
name_len: resq 1
Here we declare our uninitialized data. Like before, we reserve a 64-byte area in memory for user input. This time we call it name since this is where the user's name will be stored.
We also declare a new value called name_len. This is where we'll store the number of characters the user inputs (the length of name), so we can use it later. The declaration follows the same structure as the name declaration:
- name_len names the memory we're reserving so we can refer to it in the code.
- resq means to reserve a quad-word. This is 8 bytes, or 64 bits. On a 64-bit processor, the registers are 64 bits each. This makes 64 bits a natural size for an integer, since it requires no conversion to move it around between registers and memory.
- 1 means we only need one quad-word reserved. This is not a series of bytes like the string, it's only one piece of data: the number of characters typed by the user.
Altogether, the bss section defines two regions of memory:
- name, which is 64 bytes and will be used to store up to 64 characters entered by the user.
- name_len, which is 8 bytes and will be used to store a single integer indicating the total number of characters entered by the user.
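In C terms, name_len is a single `uint64_t`, the same width as rax. This sketch uses ordinary variables named after the registers purely for illustration: it stores a count to memory, lets the "register" be reused, and loads the count back, which is the pattern of `mov [name_len], rax` followed later by `mov rdx, [name_len]`. The function name `save_then_reload` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t name_len;   /* like `name_len: resq 1`: one 8-byte slot */

/* Saves a count the way `mov [name_len], rax` does, lets the "register"
   be overwritten, and then reloads it like `mov rdx, [name_len]`. */
uint64_t save_then_reload(uint64_t rax) {
    name_len = rax;           /* store to memory */
    rax = 1;                  /* rax reused, e.g. for sys_write's number */
    uint64_t rdx = name_len;  /* load from memory */
    return rdx;
}
```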
section .data
prompt: db "Please enter your name: "
prompt_len: equ $-prompt
response_start: db "Hello, "
response_start_len: equ $-response_start
response_end: db "!", newline
response_end_len: equ $-response_end
Here is the data section, where we declare some initialized data. This is memory for which we have values ahead of time. We're declaring 3 static strings, plus a length count for each:
- prompt will be shown to the user first, telling them what to do.
- response_start will be printed before the user's name is repeated back to them.
- response_end will be printed after the user's name, giving punctuation and formatting to the response: an exclamation point and a newline character.
Each of these also has an accompanying _len value so we know how many characters each string contains.
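In NASM, `$` stands for the current address, so `prompt_len: equ $-prompt` subtracts the string's start address from the address just past its end, giving its length in bytes when the program is assembled; `equ` itself reserves no memory. C can compute the same constants at compile time with `sizeof`. The `LEN` macro below is a hypothetical helper, and it subtracts 1 because a C string literal, unlike a `db` string, ends with an extra zero byte.

```c
#include <assert.h>

#define NEWLINE 10
/* Length of a string array, excluding C's trailing '\0'. */
#define LEN(s) (sizeof(s) - 1)

static const char prompt[] = "Please enter your name: ";
static const char response_start[] = "Hello, ";
static const char response_end[] = { '!', NEWLINE, '\0' };  /* db "!", newline */

/* Compile-time constants, like the equ lines. */
enum {
    prompt_len = LEN(prompt),
    response_start_len = LEN(response_start),
    response_end_len = LEN(response_end),
};
```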
section .text
global _start
_start:
Now we get to the code!
; Write the prompt out to the console
mov rax, sys_write
mov rdi, stdout
mov rsi, prompt
mov rdx, prompt_len
syscall
The first thing we do is make a sys_write call to print out "Please enter your name: " when the program starts.
; Read the user's name from the console
mov rax, sys_read
mov rdi, stdin
mov rsi, name
mov rdx, name_max_len
syscall
Next up, we read some input from the user. Whatever they type is stored in memory starting at the address indicated by name.
; Store the number of characters entered by the user
mov [name_len], rax
After the sys_read call returns, the number of characters entered by the user will be provided in the rax register. We're going to need this later, but unlike in the previous program, we won't be using it immediately. We're going to print the static string "Hello, " first, which will involve overwriting both rax and rdx. By the time we get around to writing the user's name back out, the information we need (the number of characters in the user's name) will be lost.
In order to get around this, we need a place to temporarily save the number of characters in the user's name.
The instruction above copies the value from rax into memory at the address indicated by name_len. Notice the phrasing there. name_len is a memory address: information about where we can store this data. This is unlike dealing with registers, which are storage locations themselves. You can copy a value directly to a register, but when dealing with a memory address you have to indicate that you want to copy the value to memory at the given address.
This is where the square brackets come in. They're necessary because name_len refers to an address in memory where data can be stored. The actual value of name_len might be something like 0x6001b4, or wherever the linker and operating system place it. We want the value of rax to be copied into memory at that location.
You may be wondering why the square brackets aren't always required. For example, when we read the user's input into memory, the instruction has no square brackets:
mov rsi, name
In the code above, name is a memory address just like name_len. The difference is that the sys_read system call expects an address: it expects rsi to contain an address in memory where it can write the input data. If we put name in square brackets, that would copy the data stored at that address (the first 8 bytes of the buffer) into rsi instead of the address itself. sys_read would then treat those bytes as an address and try to write the input to the wrong place in memory.
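C pointers express the same distinction between an address and the value stored there. In this sketch (all names are illustrative), returning `name` yields an address, as `mov rsi, name` does; `*name_len_ptr = rax` stores through an address, as `mov [name_len], rax` does; and `*name_len_ptr` in an expression loads the value, as `[name_len]` does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static char name[64];        /* the label `name` */
static uint64_t name_len;    /* the label `name_len` */

/* mov rsi, name: the address itself, no brackets. */
char *address_of_name(void) { return name; }

/* mov [name_len], rax: store a value into memory at the address. */
void store_len(uint64_t *name_len_ptr, uint64_t rax) { *name_len_ptr = rax; }

/* mov rdx, [name_len]: load the value found at the address. */
uint64_t load_len(const uint64_t *name_len_ptr) { return *name_len_ptr; }
```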
Let's take a short digression to explain this better. Here is a table showing some (made up) locations in memory:
| Label | Address | Value | ASCII |
|---|---|---|---|
| string | 0x6001b0 | 71 | 'G' |
| | 0x6001b1 | 114 | 'r' |
| | 0x6001b2 | 101 | 'e' |
| | 0x6001b3 | 101 | 'e' |
| | 0x6001b4 | 116 | 't' |
| | 0x6001b5 | 105 | 'i' |
| | 0x6001b6 | 110 | 'n' |
| | 0x6001b7 | 103 | 'g' |
| | 0x6001b8 | 115 | 's' |
The table above shows 9 bytes in memory, containing the string "Greetings". Each byte has its own unique address ranging from 0x6001b0 to 0x6001b8. The first byte has a label: string.
If we refer to string directly, we're talking about the memory address. For example:
mov rax, string
The above instruction would set rax to the value 0x6001b0, which is the address of the beginning of the string.
However, if we refer to string with square brackets, we're referring to the value stored in memory at the address 0x6001b0:
mov rax, [string]
This instruction would set rax to the value of the first 8 characters in the string: "Greeting". (Since x86-64 is little-endian, the first byte in memory, 'G', becomes the lowest byte of rax.) We can also refer to individual characters:
mov al, [string]
mov bl, [string + 4]
These instructions would load the character 'G' into the register al and the character 't' into the register bl. No size keyword is needed here: al and bl are 8-bit registers, so the assembler knows to load a single byte.
Data labels like name and name_len are just addresses which point to locations in memory which contain data. Adding square brackets indicates that you're interested in the data at that location in memory, not the address itself.
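You can reproduce the loads from the table in C. This sketch copies the first 8 bytes of "Greetings" into a 64-bit integer, as `mov rax, [string]` does, and reads single bytes, as `mov al, [string]` and `mov bl, [string + 4]` do. The function names are made up for the example. On a little-endian machine such as x86-64, the first byte in memory ('G') lands in the lowest byte of the integer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static const char string[] = "Greetings";   /* 9 bytes, plus C's '\0' */

/* mov rax, [string]: load 8 consecutive bytes as one 64-bit value. */
uint64_t load_qword(const char *addr) {
    uint64_t value;
    memcpy(&value, addr, sizeof value);      /* safe regardless of alignment */
    return value;
}

/* mov al, [string + offset]: load a single byte. */
uint8_t load_byte(const char *addr, size_t offset) {
    return (uint8_t)addr[offset];
}
```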
; Write the start of the response
mov rax, sys_write
mov rdi, stdout
mov rsi, response_start
mov rdx, response_start_len
syscall
Now that the user has entered their name, we begin to respond. This system call prints out the string response_start, which is "Hello, ".
; Write the user's name
mov rax, sys_write
mov rdi, stdout
mov rsi, name
mov rdx, [name_len]
syscall
Next, we print the name the user entered. Again, notice the square brackets: [name_len].
name_len is an address in memory. It might be something like 0x6001b4 (or wherever the linker and operating system place it). We don't want to print 0x6001b4 bytes to the console, since the name buffer holds nowhere near that many. Instead, we want to look up the value stored at the address 0x6001b4 and print that number of characters. This should be a more reasonable number like 5 or 8, depending on the length of the user's name. So we use the square brackets to indicate this.
The total output so far will look something like this (if your name happens to be Brian):
Hello, Brian
Now we finish up the output:
; Write the end of the response
mov rax, sys_write
mov rdi, stdout
mov rsi, response_end
mov rdx, response_end_len
syscall
To finish off the sentence and apply some formatting, we write the string response_end ("!\n") to the console. The exclamation point is added to the end of the name, and the newline character \n is for formatting purposes.
; End the program
mov rax, sys_exit
mov rdi, success
syscall
Finally, we end the program here. Type it all into a file called "helloname.asm" and run it with the "run" script:
./run helloname
Enter your name when it prompts you, and you should see something like the following:
Please enter your name: Brian
Hello, Brian
!
0
Okay, not quite what we were going for. Why is the exclamation point on its own line? To troubleshoot the problem, try returning the number of characters entered by the user as the program status code to see how many characters the OS thinks we entered. Change the following:
mov rdi, success
To this:
mov rdi, [name_len]
This will report the number of characters we enter as the program status code so we can get some feedback. Make the change, save the file, and rerun it. You should see something more like this:
Please enter your name: Brian
Hello, Brian
!
6
6?! I only typed 5 letters! The thing is, the operating system includes the newline produced by the enter key pressed after typing the name. So for the name "Brian", the actual string we get back is "Brian\n". That extra newline isn't part of the name; it's just the keystroke that ended the input. We can prevent the newline from being written by subtracting 1 from the value of name_len. Even though the string in memory will still have a newline after it (we can't stop the operating system from including it), we can ignore it by only paying attention to the first 5 characters.
Change the following section:
; Store the number of characters entered by the user
mov [name_len], rax
To this:
; Store the number of characters entered by the user
dec rax
mov [name_len], rax
rax contains the number of characters entered by the user. Before saving that value to name_len for later use, we now decrement it, which means subtracting 1 from it. The instruction dec rax subtracts 1 from whatever value happens to be in rax. If you entered 6 characters including the enter key, this will change it to 5. If you entered 8, this will change it to 7.
By subtracting 1 from the number of characters we write out, we effectively ignore the last character in the string by printing only the part of the string we care about.
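Here is the same adjustment sketched in C, with one deliberate difference from `dec rax`: this hypothetical helper, `trimmed_len`, drops the last byte only when it really is a newline. If the user types 64 or more characters, the buffer fills before the enter key's newline is stored, and if input ends with nothing typed at all (for example, pressing Ctrl-D at the prompt), the count is 0. An unconditional decrement would cut off a real character in the first case and turn 0 into -1, an enormous unsigned length, in the second.

```c
#include <assert.h>
#include <stddef.h>

/* Returns how many bytes of buf to print: the read() count, minus one
   if the final byte is the newline produced by the enter key. */
size_t trimmed_len(const char *buf, size_t count) {
    if (count > 0 && buf[count - 1] == '\n') {
        return count - 1;   /* like dec rax, but only when it is safe */
    }
    return count;
}
```

The assembly version could make the same check by first testing that rax is not zero and then comparing the byte at [name + rax - 1] with newline before decrementing; for short names ended with the enter key, the simpler dec rax gives the same result.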
Make the change, save, and rerun. You should now get something like this:
Please enter your name: Brian
Hello, Brian!
5
The formatting is no longer messed up. We're ignoring the last character in the string by printing one fewer than the number of characters the operating system returned. The trailing newline is not printed, so our exclamation point appears on the same line as the name.