Operating Systems
Notes on CMU 213/513 Introduction to Computer Systems.
Course link: https://www.cs.cmu.edu/~213/.
X86-64 Registers
| Register | Description | 32-bit Equivalent |
|---|---|---|
| %rax | Accumulator for operands and results data | %eax |
| %rbx | Pointer to data in the data segment | %ebx |
| %rcx | Counter for string and loop operations | %ecx |
| %rdx | I/O pointer | %edx |
| %rdi | Pointer to first data in the segment (memory) | %edi |
| %rsi | Pointer to last data in the segment (memory) | %esi |
| %rbp | Pointer to the base of the stack | %ebp |
| %rsp | Pointer to the top of the stack | %esp |
| %r8-%r15 | Additional general-purpose registers | %r8d-%r15d |
| %rip | Instruction pointer (Programm counter) | %eip |
Common Instructions
-
Move Data
mov dest, src: Move data fromsrctodest.- Operand Types
- Immediate: Constant value (e.g.,
$5,$0xFF), start with$. - Register: Value stored in a register (e.g.,
%rax,%r13). - Memory: Value stored in memory (e.g.,
[RAX],[RSP+8]). - Rules 1: No way to move directly from memory to memory.
- Rules 2: No way to move anything to immediate.
- Immediate: Constant value (e.g.,
mov (%rcx),%rax; Move the 8 bytes at address in rcx to raxmov $0x12345678,%rax; Move the immediate value 0x12345678 to raxmov %rax,%rbx; Move the value in rax to rbxmov %rax,(%rcx); Move the value in rax to the address in rcx
-
leaq: Load effective addressleaq dest, src: Load the address ofsrcintodest.- Example:
leaq 8(%rcx), %rax; Load the address (rcx + 8) into rax
-
Arithmetic and Logic Instructions
-
add, sub, imul, xor, and, oretc. -
Example:
add %rbx, %rax; rax = rax + rbx -
Shift and logical shift:
salorshl: Shift left (multiply by 2)sar: Shift right (divide by 2, signed, arithmetic)shr: Logical shift right (divide by 2, unsigned)- Example:
shl $3, %rax; rax = rax * 8
- Jumping
jmp label: Unconditional jump tolabel.- Conditional jumps based on flags:
je(jump if equal, ZF=1)jne(jump if not equal, ZF=0)js(jump if sign, SF=1)jns(jump if not sign, SF=0)jl(jump if less, SF != OF)jle(jump if less or equal, ZF=1 or SF != OF)jg(jump if greater, ZF=0 and SF == OF)jge(jump if greater or equal, SF == OF)
- Example:
je label; Jump to label if the zero flag is set
Address computation
Let say %rdx = 0xf000 and %rcx = 0x0100
Expressions:
0x8(%rdx): Address = 0xf000 + 0x8 = 0xf008(%rdx,%rcx): Address = 0xf000 + 0x0100 = 0xf100(%rdx,%rcx,4): Address = 0xf000 + 0x0100 * 4 = 0xf4000x10(,%rdx, 2): Address = 0x10 + 0xf000 * 2 = 0x1e0100x20(%rdx,%rcx,8): Address = 0x20 + 0xf000 + 0x0100 * 8 = 0xf820
Processor states
- Temporary values (%rax, %rbx, etc.)
- Location of runtime stack (i.e., %rsp)
- Location of current code (i.e., %rip)
- Stateus of recent tests (CF, ZF, SF, OF, etc.)
- Carry Flag (CF): Set if an operation generates a carry.
- Zero Flag (ZF): Set if the result of an operation is zero.
- Sign Flag (SF): Set if the result of an operation is negative.
- Overflow Flag (OF): Set if an operation results in a signed overflow.
- Implicitly set by arithmetic and logical instructions. (e.g.,
add,sub,cmp) - Not affected by data movement instructions (e.g.,
mov,lea). - Explicity set by
cmpinstruction.- Let say
cmpq b, a, likea-bwithout setting destination- CF is set if a < b, ZF is set if a == b, SF is set if a < b (signed), OF is set if overflow occurs (signed).
- Let say
Conditional Move Instructions
cmoveX dest, src: Move src to dest if condition X is met.
-
X can be:
e: equal (ZF=1)ne: not equal (ZF=0)l: less (SF != OF)le: less or equal (ZF=1 or SF != OF)g: greater (ZF=0 and SF == OF)ge: greater or equal (SF == OF)s: sign (SF=1)ns: not sign (SF=0)
-
Example:
absdiff:
movq %rdi, %rax # rax = a
subq %rsi, %rax # rax = a - b
movq %rsi, %rdx # rdx = b
subq %rdi, %rdx # rdx = b - a
cmovle %rdx, %rax # if b - a > 0, rax = a
ret
-
Branching can be expensive due to pipeline stalls.
-
Conditional move instructions can help avoid branches.
-
40 cycles for a mispredicted branch vs. 1 cycle for a conditional move.
-
Bad cases:
- Expensive calculation, e.g.,
val = t()? hard1(): hard2();. - Risky computations, e.g.,
val = p? *p: 0;, as both values are computed. - With side effects, e.g.,
val = (p > 0) ? p *= 7 : p += 3;, as both values are computed andpis incremented.
- Expensive calculation, e.g.,
Switch
-
Implemented with a jump table for dense cases.
let say
switch(x) {
case val_0:
block_0;
...etc
}The assembly code:
cmpl $val_n, %eax
ja default_case # if x > val_n, jump to default_case (jump above)
jmp *.L4(,%rax,8) # jump to the address in jump_table + x * 8
# jump_table:
.section .rodata
.align 8
.L4:
.quad case_0
.quad case_1
...
.quad case_n -
Jump table is an array of addresses of the case blocks!!
Procedures
Mechanisms:
- Passing control.
- Passing Data.
- Memory management.
- Mechanism all implemented with machine instructions.
-
- x86-64 implementation of a procedure uses only those mechanisms required.
x86-64 Stack
-
Region of memory managed with stack discipline (LIFO).
-
Grows toward lower addresses.
-
Register %rsp points to the top of the stack.
-
pushq src: Decrement %rsp by 8, then storesrcat the address in %rsp. -
popq dest: Load the value at the address in %rsp intodest, then increment %rsp by 8. (The original value is still in memory but not in the stack!)
Control Flow
-
Call instruction
call label: Push the address of the next instruction (return address) onto the stack, then jump tolabel.- Example:
call foo; Call the procedurefoo.
-
Return instruction
ret: Pop the return address from the stack and jump to that address.- Example:
ret; Return from the current procedure.
-
Stack Frame
-
Every new function call creates a new stack frame.
-
Stack frame contains:
- Return address
- Base register (%rbp)
- Stack register (%rsp)
-
Recursive functions create multiple stack frames, probably leading to stack overflow!
-
Caller frame vs. Callee frame (Current stack frame)
- Caller frame: The stack frame of the function that calls another function.
- Callee frame: The stack frame of the function that is called.
Register Saving Convention
- Caller-saved registers: %rax, %rcx, %rdx, %rsi, %rdi, %r8-%r11
- Caller must save and restore if it wants to preserve their values across a function call.
- Callee-saved registers: %rbx, %rbp, %r12-%r15
- Callee must save and restore if it uses them.
x86-64 Linux Memory Layout
-
Stack
- Runtime stack (8MB limit), check by
limitcommand!v - E.g. local variables
- Runtime stack (8MB limit), check by
-
Heap
- Dynamically allocated memory (e.g.,
malloc,new)
- Dynamically allocated memory (e.g.,
-
Data
- Statically allocated data
- E.g. global variables,
staticvariables
- Text / Shared Library
- Executable machine instructions
- Readonly.
graph TD
subgraph Stack
direction TB
StackTop[High Address]
StackBottom[Low Address]
end
subgraph Heap
direction TB
HeapTop[High Address]
HeapBottom[Low Address]
end
subgraph Data
direction TB
DataTop[High Address]
DataBottom[Low Address]
end
subgraph Text
direction TB
TextTop[High Address]
TextBottom[Low Address]
end
StackTop --> StackBottom
StackBottom --> HeapTop
HeapTop --> HeapBottom
HeapBottom --> DataTop
DataTop --> DataBottom
DataBottom --> TextTop
TextTop --> TextBottom
Performance Optimization
-
Minimize procedure calls
- Inline small functions.
- Use macros for very small functions.
-
Minimize register saving
- Avoid using callee-saved registers in leaf functions.
-
Shift instead of multiply or divide
- Multiply usually need 3 cycles, shift only 1 cycle.
- Addition is also 1 cycle.
- AVX2 / SIMD (Vectorization)
- Single Instruction Multiple Data
- Process multiple data with a single instruction.
- E.g., process 8 integers (32-bit) or 4 doubles (64-bit) in parallel.
- Branches
- Avoid branches when possible.
- Use conditional move instructions to avoid branches.
- Branch prediction: CPU predicts the outcome of a branch to keep the pipeline full.
- Branch Misprediction Recover Penalty: 20-40 cycles.
- Can be a major performance bottleneck.
Summary
Assembler Control
- Conditional jump
- Conditional move
- Indirect jump (jump table)
Memory
- MMU (Mmory-management unit), a hardware device that maps virtual addresses to physical addresses.
- PTE (Page Table Entry), a data structure used by the MMU to store the mapping between virtual and physical addresses. (L1, L2 cache in the MMU).
- PTA (Page Table Address), the address of the page table in memory.
- PPN (Physical Page Number), the number of a physical page.
- VPN (Virtual Page Number), the number of a virtual page.
- VPO (Virtual Page Offset), the offset within a virtual page.
- PPO (Physical Page Offset), the offset within a physical page.
Simple Memory System example, TLB(Translation Lookaside Buffer):
Addressing
- 16 bits vertual address space
- 12 bits physical address space
- 64 kB vertual memory
Memory Mapping
VM areas initialized by associating them with disk objects.
- Virtual Address = VPN * Page Size + VPO
- Physical Address = PPN * Page Size + PPO
- Page Size = 2^n bytes, n is the number of bits for the page offset
Sometime different process will access the same physical memory, e.g., shared libraries.
Copy-on-write (COW) Objects: Steps
- Create a new VM area for the new process.
- Map the new VM area to the same disk object as the original process.
- Mark the new VM area as read-only.
- When either process tries to write to the shared page, a page fault occurs.
- The OS allocates a new physical page, copies the contents of the original page to the new page, and updates the page table to map the virtual address to the new physical page.
- The write operation is then retried, and it succeeds on the new page.
Dynamic Memory Allocation
Keep Track of Free Blocks
- Implicit free list
- link all blocks by length
- for each block, need both size and allocation status.
- Could store this information in two words: wasteful.
- standard trick, use the low-order bit of the size word to indicate allocation status.
- Explicit free list
- Link all "free" blocks
- Segregated free list
- Different free lists for different size classes
- Blocks sorted by size
- Used balanced tree (red-black tree).
