16-Byte Alignment

Created on Mar 5, 2026

Updated on Apr 26, 2026

By Vesper Vei


In Pwn, 16-byte stack alignment is a beginner’s “number one killer.” You may have built a ROP chain with flawless logic, yet when you run it locally or against the remote target, the program crashes with SIGSEGV (segmentation fault) inside system or printf.

It feels like you prepared a master key, only to find the lock won’t open because the core is off by 1 millimeter.

1. Why does this rule exist? (Hardware’s “OCD”)

This rule comes from the x86-64 SSE (Streaming SIMD Extensions) instruction set. To move 128-bit (16-byte) values quickly, such as packed floating-point data, compiled code uses instructions like movaps (Move Aligned Packed Single-Precision Floating-Point Values).

Rule: movaps requires the address of its memory operand to be divisible by 16 (that is, the last hexadecimal digit of the address must be 0).
Consequence: If the address is not aligned, the CPU raises a general-protection exception, which Linux delivers to the process as SIGSEGV. Modern glibc builds of functions such as printf and system commonly contain these instructions, which the compiler emits as an optimization.
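The divisibility rule can be checked with one line of arithmetic. The sketch below is only an illustration; the two addresses are made-up stack values, not taken from any real process.

```python
def is_16_byte_aligned(address: int) -> bool:
    """Return True when the address is a multiple of 16."""
    # Equivalent to (address & 0xF) == 0: the last hex digit must be 0.
    return address % 16 == 0


# Hypothetical stack addresses, chosen only for illustration.
for addr in (0x7ffd3a10, 0x7ffd3a18):
    print(hex(addr), is_16_byte_aligned(addr))
# 0x7ffd3a10 True
# 0x7ffd3a18 False
```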

2. The “dynamic changes” of 16-byte alignment

This is the part that most easily confuses people. According to the System V ABI standard:

Before executing the call instruction, the stack pointer RSP must be 16-byte aligned.

Let’s track how the stack changes:

Before the call: RSP = 0x...00 (16-byte aligned, address ending in 0).
Execute call: The CPU automatically pushes the return address (8 bytes) onto the stack.
- At this point, RSP becomes 0x...F8 (no longer a multiple of 16, offset by 8 bytes).
Enter the function body: The first instruction of the function is usually push rbp.
- At this point, RSP becomes 0x...F0 (hey! back to being a multiple of 16).

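Conclusion: The compiler generates the function body on the assumption that RSP ended in 8 at function entry, as it does after a real call, so that it is 16-byte aligned again once the prologue has run. But if you jump directly to the start of the function by other means (such as a ret in a ROP chain), no return address is pushed, and that assumption may no longer hold.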
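The walk-through above can be replayed with plain integer arithmetic. This is a minimal sketch, not a CPU emulator: the starting RSP value is hypothetical, and the helper names are invented for this example.

```python
WORD = 8  # bytes moved by call, ret, push and pop on x86-64


def rsp_at_callee_entry(rsp_before_call: int) -> int:
    """RSP at the callee's first instruction: call pushed the return address."""
    return rsp_before_call - WORD


def rsp_after_push_rbp(rsp_at_entry: int) -> int:
    """RSP after the usual `push rbp` at the top of the function."""
    return rsp_at_entry - WORD


rsp = 0x7ffd3a00  # hypothetical value, 16-byte aligned before the call
entry = rsp_at_callee_entry(rsp)
body = rsp_after_push_rbp(entry)
print(hex(entry), entry % 16)  # 0x7ffd39f8 8
print(hex(body), body % 16)    # 0x7ffd39f0 0
```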

3. A Pwn player’s nightmare: the alignment pit in a ROP chain

When you build a ROP chain, you are effectively manually “stitching together” return addresses.

Typical failure case:

You found the address of system and prepared pop rdi; ret for argument passing. Your payload looks like this:

[padding] + [pop rdi; ret] + [/bin/sh_addr] + [system_addr]

In this process, you skip the normal call flow and jump directly into system via ret, so no return address is pushed on the way in. After a real call, RSP at function entry ends in 8; if your chain enters system with RSP ending in 0 instead, every stack slot inside the function is shifted by 8 bytes, and as soon as system executes movaps on a stack address (which now ends in 8 instead of 0), it crashes.
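A rough model of where RSP lands when a chain like the one above returns into system. It assumes the vulnerable function was itself reached by a normal call (so the slot holding its saved return address ends in 8) and that the chain starts exactly at that slot; the address and helper names are hypothetical.

```python
WORD = 8  # each chain entry is one 8-byte stack slot


def rsp_at_target_entry(saved_ret_slot: int, chain_words: int) -> int:
    """RSP when the last chain entry (the target function) starts executing.

    Every entry, including the target's own address, is popped exactly once,
    either by a `ret` or by `pop rdi`.
    """
    return saved_ret_slot + WORD * chain_words


def entry_matches_abi(rsp_at_entry: int) -> bool:
    """After a real call, RSP at function entry is 8 past a multiple of 16."""
    return rsp_at_entry % 16 == 8


saved_ret_slot = 0x7ffd39f8  # hypothetical; ends in 8 because a call pushed it
chain = ["pop rdi; ret", "/bin/sh_addr", "system_addr"]
entry = rsp_at_target_entry(saved_ret_slot, len(chain))
print(hex(entry), entry_matches_abi(entry))  # 0x7ffd3a10 False
```

Under these assumptions the three-entry chain enters system with RSP ending in 0, which is exactly the state a real call never produces.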

4. The ultimate remedy: add one more `ret`

If you find that your ROP chain crashes when calling system, and debugging shows it dies on movaps, the simplest black magic is: insert the address of a bare ret instruction (a ret gadget) into the chain before the target function is reached.

Fixed payload:

[padding] + [ret_gadget] + [pop rdi; ret] + [/bin/sh_addr] + [system_addr]

Principle: ret is effectively pop rip. Executing one extra ret moves RSP by 8 more bytes before the target function starts, which flips the stack “from unaligned to aligned.”
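As a sketch of how the two payload layouts differ in bytes, the snippet below packs both chains with the standard library. Every address and the padding length are hypothetical placeholders; real values come from the target binary and its libc. The `p64` helper defined here does the same little-endian packing that pwntools' function of the same name performs.

```python
import struct


def p64(value: int) -> bytes:
    """Pack an integer as an 8-byte little-endian word."""
    return struct.pack("<Q", value)


# Hypothetical placeholders, for illustration only.
OFFSET_TO_RET = 40      # bytes of padding before the saved return address
RET_GADGET = 0x40101A   # address of a bare `ret`
POP_RDI_RET = 0x401233  # address of `pop rdi; ret`
BIN_SH_ADDR = 0x404060  # address of the "/bin/sh" string
SYSTEM_ADDR = 0x401050  # address of system


def build_payload(align: bool) -> bytes:
    """Build the chain, optionally prefixed with one alignment `ret`."""
    chain = []
    if align:
        chain.append(RET_GADGET)  # consumes one extra 8-byte slot
    chain += [POP_RDI_RET, BIN_SH_ADDR, SYSTEM_ADDR]
    return b"A" * OFFSET_TO_RET + b"".join(p64(word) for word in chain)


plain = build_payload(align=False)
fixed = build_payload(align=True)
print(len(fixed) - len(plain))  # 8
```

Keeping the alignment gadget behind a flag makes it easy to try the chain both ways, since whether the extra ret is needed depends on where RSP sits when the chain starts.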

5. How do you identify it while debugging?

In GDB (for example, using the pwndbg plugin), when you crash:

Look at the instruction where it crashed and see whether it is something like movaps XMMWORD PTR [rsp+...], xmm....
Check the value of RSP and see whether the last digit of the address is something other than 0 (for example, 8).
If both are true, case closed: this is a 16-byte alignment problem.
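The two checks above can be written down as a small heuristic. This is only a sketch: the function name is invented, it takes the faulting instruction text and RSP as you would read them off the debugger, and it assumes the memory operand is RSP-based with an offset that is a multiple of 16.

```python
# SSE moves that fault when their memory operand is not 16-byte aligned.
ALIGNED_SSE_MNEMONICS = {"movaps", "movapd", "movdqa"}


def looks_like_alignment_crash(faulting_insn: str, rsp: int) -> bool:
    """Heuristic: aligned SSE move at the crash site and RSP not a multiple of 16."""
    parts = faulting_insn.split()
    mnemonic = parts[0].lower() if parts else ""
    return mnemonic in ALIGNED_SSE_MNEMONICS and rsp % 16 != 0


# Hypothetical crash state, for illustration only.
print(looks_like_alignment_crash("movaps XMMWORD PTR [rsp+0x50], xmm0", 0x7ffd3a18))  # True
print(looks_like_alignment_crash("mov rax, QWORD PTR [rdi]", 0x7ffd3a18))             # False
```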

Summary Notes

Situation	RSP	Notes
Just before a standard call	Must be aligned (ends in `0`)	Mandatory ABI requirement
Right after the call executes (function entry)	Ends in `8`	Because an 8-byte return address was pushed
Crash after a ROP jump	Usually ends in `8` at the faulting instruction	Causes aligned SSE instructions such as movaps inside `system` to fault
Alignment fix	Add one `ret` gadget to the chain	Shifts RSP by 8 bytes to cancel the offset

Now you understand why sometimes your payload logic is completely correct, yet you still need to mysteriously add one more ret to make it work, right?
