Entering user mode in long mode
by Technocoder
Note: This post does not discuss context switching, only the actual method of getting into user mode.
What?
There are four privilege rings on x86 processors. Ring 0 is called kernel mode and Ring 3 is called user mode. Code running in user mode cannot execute privileged instructions and can only access the areas of memory the kernel has made available to it.
Why?
A rogue program running in Ring 0 can destroy the entire system, as it has almost no bounds on what it can do. This is why programs are run in Ring 3: protection mechanisms can be put in place that, at the very least, limit the amount of damage a virus can do. It also prevents buggy code from affecting the rest of the system.
How?
The easiest way to enter Ring 3 is by hijacking an interrupt and pretending we were in Ring 3 to begin with, before we entered the interrupt. In exampleOS, we use the timer interrupt for this very purpose in this file.
The Global Descriptor Table
Before paging was introduced, the main memory protection mechanism was “segmentation”. Essentially, regions of memory would be assigned a descriptor which describes the properties of that region, such as the privilege level needed to access it, whether it is executable, and whether it is writable. In long mode, we no longer use segmentation for protection, but the descriptors are still required. Instead, we just create two segments, one for code and one for data, and have them span the entire address space.
Descriptors can be marked as either Ring 0 or Ring 3. More importantly, they tell the processor what privilege mode we were in before entering an interrupt. We will use this later to jump into user mode.
The Interrupt Stack Frame
When an interrupt is received by the processor, two important things happen.
- The processor saves some data about its current state onto the stack
- The processor jumps into the appropriate interrupt handler
The data that is saved is called the Interrupt or Exception Stack Frame. Here’s what it looks like, from the highest address to the lowest:
- The stack segment selector
- The stack pointer
- The FLAGS register
- The code segment selector
- The instruction pointer
The two important elements here are the stack segment and code segment selectors. If their descriptors’ privilege level is Ring 3, then the processor thinks that we were in Ring 3 before the interrupt occurred. More importantly, when we return from the interrupt, the processor will place us into Ring 3.
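As a rough sketch of this layout, the frame can be modelled as a Rust struct. The struct, field, and function names below are my own illustration and are not taken from exampleOS; the field order assumes the long mode layout with the instruction pointer at the lowest address.

```rust
use std::mem::size_of;

/// Hypothetical model of the frame the processor pushes when an interrupt
/// occurs in long mode. Fields are listed from the lowest address (the top
/// of the stack) to the highest.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct InterruptStackFrame {
    pub instruction_pointer: u64,
    pub code_segment: u64,
    pub flags: u64,
    pub stack_pointer: u64,
    pub stack_segment: u64,
}

/// Size of the frame in bytes: five 64-bit values.
pub const FRAME_SIZE: usize = size_of::<InterruptStackFrame>();

/// The ring the processor returns to is taken from the low two bits of the
/// saved code segment selector.
pub fn return_ring(frame: &InterruptStackFrame) -> u8 {
    (frame.code_segment & 0b11) as u8
}
```

A frame whose saved code segment selector has its low two bits set to 3 is therefore one that returns to Ring 3.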
Creating the descriptors
You’ll need to add two new descriptors to your GDT with these flags:
User Code Segment: USER_SEGMENT | PRESENT | EXECUTABLE | LONG_MODE | RING_USER
User Data Segment: USER_SEGMENT | PRESENT | WRITABLE | LONG_MODE | RING_USER
The hexadecimal versions of these descriptors are:
User Code Segment: 0x0020_f800_0000_0000
User Data Segment: 0x0020_f200_0000_0000
These descriptors are almost the same as the kernel code and data descriptors except they have the RING_USER flag enabled.
See exampleOS’s construction of these descriptors here.
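To see where the hexadecimal values come from, here is a minimal sketch that builds both descriptors from individual flag bits. The constant names mirror the flag names used above, but the definitions are my own and are not copied from exampleOS; the bit positions are those of the x86-64 segment descriptor format.

```rust
// Bit positions of each flag within a 64-bit GDT entry.
pub const WRITABLE: u64 = 1 << 41; // data segments: writable
pub const EXECUTABLE: u64 = 1 << 43; // set for code segments
pub const USER_SEGMENT: u64 = 1 << 44; // code/data segment, not a system segment
pub const RING_USER: u64 = 3 << 45; // descriptor privilege level = 3
pub const PRESENT: u64 = 1 << 47;
pub const LONG_MODE: u64 = 1 << 53;

pub const USER_CODE_SEGMENT: u64 =
    USER_SEGMENT | PRESENT | EXECUTABLE | LONG_MODE | RING_USER;
pub const USER_DATA_SEGMENT: u64 =
    USER_SEGMENT | PRESENT | WRITABLE | LONG_MODE | RING_USER;

/// Extracts the descriptor privilege level (bits 45 and 46).
pub const fn privilege_level(descriptor: u64) -> u8 {
    ((descriptor >> 45) & 0b11) as u8
}
```

Dropping RING_USER from either expression yields a Ring 0 descriptor, which is the only difference from the kernel versions.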
Referencing the descriptors
Once you’ve added the new descriptors to the table, you’ll need their segment selectors as well.
Segment selectors act as an index into the Global Descriptor Table, as they point to a specific descriptor. They are 16 bits long. Bits 3 to 15 store the actual index into the GDT, bit 2 selects between the GDT (0) and a local descriptor table (1), and bits 0 to 1 store the “requested privilege level” (or RPL for short). When a data segment register is loaded, the processor checks that neither the current privilege level nor the RPL is less privileged (numerically greater) than the descriptor’s privilege level. For our purposes, the RPL of both selectors must be 3.
The index of a descriptor is its byte offset into the table divided by eight (because descriptors have a size of eight bytes). Indices start at zero, so if you already had three descriptors in the GDT (including the null descriptor), the fourth descriptor would sit at byte offset 24 and have an index of 3.
exampleOS’s segment selectors can be found here.
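The selector arithmetic can be sketched as follows. The function names and the GDT layout in the comments (null, kernel code, kernel data, user code, user data) are assumptions made for illustration; your own table may order its descriptors differently, which changes the resulting values.

```rust
/// Builds a GDT segment selector: index in bits 3..=15, table indicator
/// (bit 2) left at 0 for the GDT, requested privilege level in bits 0..=1.
pub const fn segment_selector(index: u16, rpl: u16) -> u16 {
    (index << 3) | (rpl & 0b11)
}

/// Converts a descriptor's byte offset within the GDT into its index.
pub const fn index_from_byte_offset(offset: u16) -> u16 {
    offset / 8
}

// Assumed layout: 0 = null, 1 = kernel code, 2 = kernel data,
// 3 = user code, 4 = user data.
pub const USER_CODE_SEGMENT_SELECTOR: u16 = segment_selector(3, 3);
pub const USER_DATA_SEGMENT_SELECTOR: u16 = segment_selector(4, 3);
```

With this assumed layout the user code selector is 0x1b and the user data selector is 0x23.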
The FLAGS register
The FLAGS register stores information about the state of the processor. For now, we don’t need to worry about it, but we do need to create a valid FLAGS value. The smallest valid FLAGS register value is 0x2, which only has the RESERVED flag set.
Faking the stack
Now we can try to get into user mode. First you need to set up an interrupt stack frame:
pushq USER_DATA_SEGMENT_SELECTOR
pushq rsp
pushq 0x2
pushq USER_CODE_SEGMENT_SELECTOR
pushq ADDRESS_OF_FUNCTION
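Replace USER_DATA_SEGMENT_SELECTOR and USER_CODE_SEGMENT_SELECTOR with their numerical values. Replace ADDRESS_OF_FUNCTION with the address of a function you want executed in user mode; if the address does not fit in a sign-extended 32-bit immediate, load it into a register first and push the register instead. See exampleOS’s faked stack here.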
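Finally, add this instruction at the end:
iretq
This is the 64-bit form of the instruction for returning from an interrupt. Many assemblers encode a plain iret with a 32-bit operand size in long mode, which would pop the wrong amount of data. The processor will pop off all the pushed values, and hopefully bring us into user mode.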
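To make the push and pop order concrete, here is a small software model of the faked frame. It does not execute any privileged instruction; it only mimics the order in which the values are pushed above and then popped by the interrupt return. All names and the example values used with it are hypothetical.

```rust
/// Values popped by the interrupt return, in the order they come off the stack.
#[derive(Debug, PartialEq, Eq)]
pub struct ReturnState {
    pub instruction_pointer: u64,
    pub code_segment: u64,
    pub flags: u64,
    pub stack_pointer: u64,
    pub stack_segment: u64,
}

/// Pushes the five values in the same order as the assembly above.
/// The end of the Vec plays the role of the top of the stack.
pub fn fake_frame(user_ss: u64, rsp: u64, flags: u64, user_cs: u64, entry: u64) -> Vec<u64> {
    let mut stack = Vec::new();
    stack.push(user_ss);
    stack.push(rsp);
    stack.push(flags);
    stack.push(user_cs);
    stack.push(entry);
    stack
}

/// Models the interrupt return: pops the values in reverse push order.
/// Returns None if fewer than five values are on the stack.
pub fn model_interrupt_return(stack: &mut Vec<u64>) -> Option<ReturnState> {
    if stack.len() < 5 {
        return None;
    }
    let instruction_pointer = stack.pop()?;
    let code_segment = stack.pop()?;
    let flags = stack.pop()?;
    let stack_pointer = stack.pop()?;
    let stack_segment = stack.pop()?;
    Some(ReturnState { instruction_pointer, code_segment, flags, stack_pointer, stack_segment })
}
```

The last value pushed (the function address) is the first one popped, which is why the frame is built in reverse order.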
Now what?
If you’re lucky, you will have either gotten a page fault or seen nothing happen at all. If you get a General Protection Fault, double check that all your descriptors are loaded properly and that the selectors are all correct. The address of the function should not be zero either, as executing code at address zero will cause a fault.
Interrupts are not working anymore
If you’ve had a timer or keyboard interrupt set up, you may notice that it’s no longer firing. This is because for interrupts to work, the INTERRUPT_ENABLE flag needs to be set in the FLAGS register. If you want interrupts, this is the FLAGS value to push: 0x202
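Both FLAGS values used in this post can be derived from two bits. In this sketch the constant names follow the flag names above, but the definitions are my own; bit 1 is the always-set reserved bit and bit 9 is the interrupt enable flag.

```rust
pub const RESERVED: u64 = 1 << 1; // must always be set
pub const INTERRUPT_ENABLE: u64 = 1 << 9; // maskable interrupts are delivered when set

/// The smallest valid FLAGS value: interrupts stay disabled in user mode.
pub const MINIMAL_FLAGS: u64 = RESERVED;

/// FLAGS value that also lets timer and keyboard interrupts fire.
pub const FLAGS_WITH_INTERRUPTS: u64 = RESERVED | INTERRUPT_ENABLE;

/// Reports whether a FLAGS value has interrupts enabled.
pub const fn interrupts_enabled(flags: u64) -> bool {
    flags & INTERRUPT_ENABLE != 0
}
```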
Fixing the page fault
The page that the function resides in must be marked as USER_ACCESSIBLE and EXECUTABLE (on x86-64, executable means the no-execute bit of the page table entry is clear).
Additionally, every table on the path to the page (the page directories and the higher-level tables above them) must also be marked as USER_ACCESSIBLE. This is because when the processor resolves a virtual address, it walks through all of these tables to reach the page and checks the flag at each level. exampleOS always adds the USER_ACCESSIBLE flag when creating a page directory.
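The rule that every level must allow user access can be sketched as a check over the entries visited during a page walk. The function name and the idea of passing the entries as a slice are my own simplification; the bit positions are the standard x86-64 page table entry flags, and the no-execute check assumes that feature is enabled on the processor.

```rust
pub const PRESENT: u64 = 1 << 0;
pub const USER_ACCESSIBLE: u64 = 1 << 2;
pub const NO_EXECUTE: u64 = 1 << 63;

/// `entries` holds the entry consulted at each level of the walk, from the
/// top-level table down to the final page entry. User mode code can execute
/// from the page only if every entry is present and user accessible, and no
/// entry has the no-execute bit set.
pub fn user_can_execute(entries: &[u64]) -> bool {
    !entries.is_empty()
        && entries.iter().all(|&entry| {
            (entry & PRESENT) != 0
                && (entry & USER_ACCESSIBLE) != 0
                && (entry & NO_EXECUTE) == 0
        })
}
```

A single table entry without USER_ACCESSIBLE anywhere on the path is enough to cause the page fault described above, even if the final page entry itself is marked correctly.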
Caveat
We push the rsp register onto the exception stack frame. This is bad for two reasons:
- The stack is located in a kernel page
- The user mode program shares the same stack with the kernel
If the user mode code calls any function, a page fault will occur because user mode code can’t write to the kernel’s stack page. Typically, you should have a separate stack for every user mode program. In a later post, we will fix this issue and look at how a user mode program is organised in memory.
tags: user-mode - descriptors - global-descriptor-table - interrupts - interrupt-stack-frame