Memory Layout

Why Field Order Matters

You might expect a struct to take up exactly the sum of its fields’ sizes, with the fields stored one after another. Go doesn’t do that. It leaves gaps between fields so the CPU can read each one quickly. Those gaps are called padding, and how many you get depends on the order you list the fields in.

For a struct you create once, the wasted space doesn’t matter. But if you keep a slice of a few million of them, the gaps turn into real memory, extra cache misses, and more work for the garbage collector. Reordering the fields costs nothing and usually wins the space back.

What Alignment Is

The gaps exist because of alignment. A CPU doesn’t read memory one byte at a time; it reads it in fixed-size chunks called words (8 bytes on a 64-bit machine). To keep those reads fast, the hardware expects a value to sit at an address that’s a multiple of its size. A value that crosses a word boundary needs an extra read, and on some architectures it’s an outright fault.

So every type has an alignment. For Go’s basic types it’s equal to the type’s size, capped at the word size; a composite type takes the largest alignment of its parts. On a 64-bit machine:

Type                                Size (bytes)   Alignment (bytes)
bool, int8, uint8, byte             1              1
int16, uint16                       2              2
int32, uint32, float32, rune        4              4
int64, uint64, float64, int         8              8
pointer, uintptr, unsafe.Pointer    8              8
string (ptr + len)                  16             8
slice (ptr + len + cap)             24             8
interface{} / any (two words)       16             8

From that, the compiler lays out a struct using three rules:

  1. Each field starts at an offset that’s a multiple of its own alignment. If the next free offset isn’t aligned, the compiler inserts padding until it is.
  2. The struct’s own alignment is the largest alignment of any field it holds.
  3. The struct’s total size is rounded up to a multiple of that alignment. The leftover bytes at the end are called tail padding, and they’re there so that every element of a []T stays aligned.
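
The three rules are mechanical enough to write down as code. Here’s a minimal sketch that applies them to a list of sizes and alignments; the field type and the layout and alignUp helpers are made up for illustration, and the sketch ignores special cases such as zero-sized fields, so treat it as a model of the rules rather than the compiler’s actual implementation.

```go
package main

import "fmt"

// field describes one struct field by its size and alignment in bytes.
// The type and its names are hypothetical, for illustration only.
type field struct {
	name        string
	size, align uintptr
}

// alignUp rounds n up to the next multiple of align (align must be > 0).
func alignUp(n, align uintptr) uintptr {
	return (n + align - 1) / align * align
}

// layout applies the three rules and returns each field's offset,
// plus the struct's total size and alignment.
func layout(fields []field) (offsets []uintptr, size, align uintptr) {
	align = 1
	var next uintptr // next free offset
	for _, f := range fields {
		next = alignUp(next, f.align) // rule 1: pad up to the field's alignment
		offsets = append(offsets, next)
		next += f.size
		if f.align > align { // rule 2: struct alignment = largest field alignment
			align = f.align
		}
	}
	size = alignUp(next, align) // rule 3: round the size up (tail padding)
	return offsets, size, align
}

func main() {
	fields := []field{
		{"a", 1, 1}, // bool
		{"b", 8, 8}, // int64
		{"c", 2, 2}, // int16
	}
	offsets, size, align := layout(fields)
	fmt.Println(offsets, size, align) // [0 8 16] 24 8
}
```

Eleven bytes of data end up in a 24-byte struct: seven bytes of padding after a, and six bytes of tail padding after c.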

A Worked Example

Here’s a struct whose fields hold 15 bytes of actual data (1 + 8 + 1 + 4 + 1):

type BadLayout struct {
    a bool  // 1 byte
    b int64 // 8 bytes
    c bool  // 1 byte
    d int32 // 4 bytes
    e bool  // 1 byte
}

Let’s place the fields one at a time, tracking the next free offset as we go.

a goes at offset 0. b is an int64, so it needs an offset divisible by 8; the next one is 8, which wastes the seven bytes after a. c goes at 16. d is an int32 and needs an offset divisible by 4, so it lands at 20, wasting three more bytes. e goes at 24. The struct’s alignment is 8 (because of b), so its size rounds up from 25 to 32, adding seven bytes of tail padding.

That’s 32 bytes to hold 15 bytes of data. Over half the struct is padding.

Here’s that block drawn as a grid, eight bytes per row (one word). Teal cells hold real data, red cells are wasted padding:

  block-beta
  columns 8
  a["a"] p0["padding"]:7
  b["b (int64)"]:8
  c["c"] p1["pad"]:3 d["d (int32)"]:4
  e["e"] p2["tail padding"]:7

  classDef data fill:#118098,stroke:#333,color:#fff;
  classDef pad fill:#a23b3b,stroke:#333,color:#fff;

  class a,b,c,d,e data
  class p0,p1,p2 pad

Now move the bigger fields to the front:

type GoodLayout struct {
    b int64 // 8 bytes
    d int32 // 4 bytes
    a bool  // 1 byte
    c bool  // 1 byte
    e bool  // 1 byte
}

This time d (align 4) sits at offset 8, and the three one-byte fields fill the gap right after it. Only one byte of tail padding is left over. Same fields, same types, 16 bytes instead of 32:

  block-beta
  columns 8
  b["b (int64)"]:8
  d["d (int32)"]:4 a["a"] c["c"] e["e"] p0["pad"]

  classDef data fill:#118098,stroke:#333,color:#fff;
  classDef pad fill:#a23b3b,stroke:#333,color:#fff;

  class b,d,a,c,e data
  class p0 pad
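
The reordering can be sketched as a sort. The snippet below models each field as a size and an alignment, computes the padded size for a given order, and then sorts by descending alignment. The field type and the structSize and byAlignDesc helpers are hypothetical names for this sketch; it models the layout rules on a 64-bit machine and isn’t what the compiler or any tool literally runs.

```go
package main

import (
	"fmt"
	"sort"
)

// field is a hypothetical description of a struct field (sizes in bytes).
type field struct {
	name        string
	size, align uintptr
}

// structSize lays the fields out in the given order and returns the
// total size, including padding between fields and at the tail.
func structSize(fields []field) uintptr {
	var next, maxAlign uintptr = 0, 1
	for _, f := range fields {
		next = (next+f.align-1)/f.align*f.align + f.size // align, then place
		if f.align > maxAlign {
			maxAlign = f.align
		}
	}
	return (next + maxAlign - 1) / maxAlign * maxAlign
}

// byAlignDesc returns a copy of fields sorted by descending alignment,
// keeping the original order among fields that tie.
func byAlignDesc(fields []field) []field {
	out := append([]field(nil), fields...)
	sort.SliceStable(out, func(i, j int) bool { return out[i].align > out[j].align })
	return out
}

func main() {
	// The fields of BadLayout, in their original order.
	bad := []field{
		{"a", 1, 1}, {"b", 8, 8}, {"c", 1, 1}, {"d", 4, 4}, {"e", 1, 1},
	}
	fmt.Println(structSize(bad))              // 32
	fmt.Println(structSize(byAlignDesc(bad))) // 16
}
```

Sorting by alignment works because every Go type’s size is a multiple of its alignment, so each field ends exactly where the next, less strictly aligned one is allowed to start. The only padding left is at the tail.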

Check It Yourself

You don’t have to take my word for it. The unsafe package reports the real numbers:

import (
    "fmt"
    "unsafe"
)

func main() {
    fmt.Println(unsafe.Sizeof(BadLayout{}))  // 32
    fmt.Println(unsafe.Sizeof(GoodLayout{})) // 16

    // Where each field actually lives:
    var g GoodLayout
    fmt.Println(unsafe.Offsetof(g.b)) // 0
    fmt.Println(unsafe.Offsetof(g.d)) // 8
    fmt.Println(unsafe.Offsetof(g.a)) // 12

    // Alignment requirement of a type:
    fmt.Println(unsafe.Alignof(int64(0))) // 8
}

Sizeof counts the padding, and Offsetof shows you where the gaps are.

It’s Not Just About Memory

The obvious win is saving bytes, but a tighter layout helps in a couple of other ways too.

First, the cache. A CPU cache line is usually 64 bytes. Four GoodLayout values fit in one line, but only two BadLayout values do. When you iterate over a large slice, the smaller struct touches roughly half as many cache lines, and in a tight loop that’s often where the time actually goes.

Second, the garbage collector. Fewer bytes per object means less to allocate, and less memory for the collector to scan and track.

And it all scales with the count. A slice of ten million BadLayout values needs about 320 MB; the same data as GoodLayout needs 160 MB.

When You Actually Want Padding

Reordering is usually about getting rid of padding, but now and then you add it on purpose. If two goroutines write to two different fields that happen to land on the same cache line, every write invalidates the other core’s copy of that line. The cores end up fighting over it, and performance quietly tanks. This is called false sharing. Padding the fields apart so each gets its own cache line fixes it:

type Counters struct {
    a uint64
    _ [56]byte // 8 + 56 = 64, so b starts a full cache line after a
    b uint64
}
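
Everything here assumes a 64-bit platform (amd64, arm64). On a 32-bit build the word is 4 bytes, so an int64 or float64 may only be 4-byte aligned. That matters for sync/atomic: on those platforms a 64-bit value you pass to the atomic functions must be 8-byte aligned, or the call can panic. The first word of an allocated struct is guaranteed to be aligned that way, so the usual fix is to make the field the first one in the struct, or to use atomic.Int64 or atomic.Uint64, which align themselves.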
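
Here’s a minimal sketch of the typed option. atomic.Int64 is in the standard library from Go 1.19 on; the Stats struct and its field names are invented for the example. The counter deliberately isn’t the first field, which would be risky on a 32-bit build with a plain int64 and the atomic.AddInt64 function.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Stats is a hypothetical struct. The counter is not the first field,
// but atomic.Int64 guarantees its own 8-byte alignment on every platform.
type Stats struct {
	ready bool
	hits  atomic.Int64
}

func main() {
	var s Stats
	s.hits.Add(1)
	s.hits.Add(41)
	fmt.Println(s.hits.Load()) // 42
}
```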
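
Gotcha: a zero-sized field at the end. A struct{} or [0]T takes up no space, unless it’s the last field of a non-empty struct. There the compiler lays it out as if it took one byte, and the struct’s size then rounds up to its alignment, which for an 8-byte-aligned struct means a whole extra word. The padding is there so that taking the field’s address can’t produce a pointer past the end of the allocation (which would trip up the garbage collector). If you need a zero-sized field, put it first.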

Let the Tooling Do It

You rarely need to reorder fields by hand. The fieldalignment analyzer finds wasteful structs and can rewrite them for you:

go install golang.org/x/tools/go/analysis/passes/fieldalignment/cmd/fieldalignment@latest

fieldalignment ./...        # report structs that waste space
fieldalignment -fix ./...   # rewrite them in place

Only bother with this where it pays off: structs you allocate by the million, hot loops, large caches. For something you build once, like a config struct, order the fields however reads best. Don’t trade clarity for bytes nobody will miss.

Summary

  • The compiler inserts padding between fields so each one lands at an offset that matches its alignment.
  • A struct’s alignment is the largest alignment among its fields, and its total size is rounded up to a multiple of it.
  • Listing fields from the largest alignment to the smallest packs them tightly and leaves the least padding.
  • Smaller structs use less memory, sit better in cache, and give the garbage collector less to do, and that adds up fast across big slices.
  • Use unsafe.Sizeof, Offsetof, and Alignof to inspect a layout, and fieldalignment to fix it.
