Mach-O Binary Format
Mach-O (Mach Object) is the binary format used for all executables, dynamic libraries (.dylib), frameworks, and kernel extensions on Apple platforms.
Why You Need to Understand Mach-O
- RE tools (IDA, Ghidra) parse Mach-O to display code, so you need to know what they are parsing
- Exploit payloads often need to craft fake Mach-O headers (fake objects)
- Code signing signatures reside within the Mach-O structure
- The dyld shared cache packs most system frameworks, each a Mach-O image, into one large container file
- The kernelcache is also in Mach-O format
Overall Structure
┌─────────────────────────────┐
│ Mach-O Header               │ ← magic, cputype, filetype, ncmds
├─────────────────────────────┤
│ Load Commands               │ ← LC_SEGMENT_64, LC_SYMTAB, LC_DYSYMTAB, ...
│ (array of command structs)  │   Describes the binary's layout in memory
├─────────────────────────────┤
│                             │
│ Segments                    │
│  ┌───────────────────────┐  │
│  │ __TEXT segment        │  │ ← Code, read-only data, string constants
│  │ ├── __text section    │  │   Machine code
│  │ ├── __stubs           │  │   PLT-equivalent for lazy binding
│  │ ├── __stub_helper     │  │   Helper code for lazy binding
│  │ ├── __cstring         │  │   C string literals
│  │ └── __const           │  │   Read-only constants
│  ├───────────────────────┤  │
│  │ __DATA segment        │  │ ← Writable data
│  │ ├── __data            │  │   Initialized global variables
│  │ ├── __bss             │  │   Uninitialized globals (zero-filled)
│  │ ├── __objc_classlist  │  │   ObjC class definitions
│  │ ├── __objc_selrefs    │  │   ObjC selector references
│  │ ├── __got             │  │   Global Offset Table
│  │ └── __la_symbol_ptr   │  │   Lazy symbol pointers
│  ├───────────────────────┤  │
│  │ __DATA_CONST segment  │  │ ← Data writable only at load time
│  │ └── __const           │  │   vtables, method tables (locked after dyld)
│  ├───────────────────────┤  │
│  │ __LINKEDIT segment    │  │ ← Metadata for linker
│  │ (symbol table, string │  │
│  │ table, code signature,│  │
│  │ relocation info)      │  │
│  └───────────────────────┘  │
└─────────────────────────────┘
Detailed Components
1. Mach-O Header
struct mach_header_64 {
    uint32_t      magic;       // 0xFEEDFACF (64-bit) or 0xFEEDFACE (32-bit)
    cpu_type_t    cputype;     // CPU_TYPE_ARM64 = 0x0100000C
    cpu_subtype_t cpusubtype;  // CPU_SUBTYPE_ARM64E = 0x02 (PAC-enabled)
    uint32_t      filetype;    // MH_EXECUTE, MH_DYLIB, MH_KEXT_BUNDLE, ...
    uint32_t      ncmds;       // Number of load commands
    uint32_t      sizeofcmds;  // Total size of load commands
    uint32_t      flags;       // MH_PIE, MH_NO_HEAP_EXECUTION, ...
    uint32_t      reserved;    // Padding (64-bit only)
};
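Before trusting any header field, it helps to classify the file by its first four bytes. The sketch below is a minimal, hypothetical helper (`classify_magic` is not an Apple API). It uses the magic values from `<mach-o/loader.h>` and `<mach-o/fat.h>`, and it assumes you only care about thin and fat Mach-O files. Note that 0xCAFEBABE is also the Java class-file magic, so a real tool would need a further check.

```python
import struct

# Magic values as defined in <mach-o/loader.h> and <mach-o/fat.h>.
MAGICS = {
    0xFEEDFACE: "32-bit Mach-O",
    0xFEEDFACF: "64-bit Mach-O",
    0xCAFEBABE: "fat (universal) binary",
    0xCAFEBABF: "fat (universal) binary, 64-bit offsets",
}

def classify_magic(first4: bytes) -> str:
    """Classify a file from its first four bytes (illustrative helper)."""
    if len(first4) < 4:
        return "too short"
    le = struct.unpack("<I", first4[:4])[0]  # value if the file is little-endian
    be = struct.unpack(">I", first4[:4])[0]  # value if the file is big-endian
    if le in (0xFEEDFACE, 0xFEEDFACF):
        return MAGICS[le] + ", little-endian"
    if be in (0xFEEDFACE, 0xFEEDFACF):
        return MAGICS[be] + ", big-endian"
    if be in (0xCAFEBABE, 0xCAFEBABF):
        return MAGICS[be]  # fat headers are always stored big-endian
    return "not Mach-O"
```

For example, an arm64 executable starts with the bytes `cf fa ed fe`, which this helper reports as a little-endian 64-bit Mach-O.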
Important file types:
| Constant | Value | Meaning |
|---|---|---|
| MH_EXECUTE | 0x2 | Executable (apps, daemons) |
| MH_DYLIB | 0x6 | Dynamic library (.dylib) |
| MH_DYLINKER | 0x7 | Dynamic linker (dyld itself) |
| MH_BUNDLE | 0x8 | Loadable bundle (.bundle) |
| MH_KEXT_BUNDLE | 0xB | Kernel extension |
| MH_FILESET | 0xC | Kernelcache / kernel collection (kernel + kexts in one file; the constant first appeared in the macOS 11 SDK) |
Important CPU subtypes:
- CPU_SUBTYPE_ARM64_ALL (0x0): standard arm64
- CPU_SUBTYPE_ARM64E (0x2): arm64e with PAC support (A12+)
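In real binaries the subtype field is not always the bare value from the list above: the top byte of `cpusubtype` carries capability bits (arm64e slices commonly show values such as 0x80000002). The sketch below masks those bits off before comparing. `describe_cpu` is a hypothetical helper, and it only names the three architectures mentioned in this document.

```python
CPU_ARCH_ABI64 = 0x01000000    # flag OR-ed into cputype for 64-bit ABIs
CPU_TYPE_X86 = 7
CPU_TYPE_ARM = 12
CPU_SUBTYPE_MASK = 0xFF000000  # top byte of cpusubtype holds capability bits

def describe_cpu(cputype: int, cpusubtype: int) -> str:
    """Return a short architecture name for a (cputype, cpusubtype) pair."""
    cputype &= 0xFFFFFFFF        # tolerate values read as signed ints
    cpusubtype &= 0xFFFFFFFF
    base = cputype & ~CPU_ARCH_ABI64
    is64 = bool(cputype & CPU_ARCH_ABI64)
    subtype = cpusubtype & ~CPU_SUBTYPE_MASK
    if base == CPU_TYPE_ARM and is64 and subtype == 0:
        return "arm64"
    if base == CPU_TYPE_ARM and is64 and subtype == 2:
        return "arm64e"
    if base == CPU_TYPE_X86 and is64:
        return "x86_64"
    return f"cputype=0x{cputype:x} subtype=0x{subtype:x}"
```

Comparing the raw `cpusubtype` against 0x2 without masking would misclassify such arm64e slices, which is the main reason for the mask.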
2. Load Commands
Load commands are an array of structs placed sequentially after the header. Each command starts with:
struct load_command {
    uint32_t cmd;      // LC_SEGMENT_64, LC_SYMTAB, ...
    uint32_t cmdsize;  // Size of this command (including this header)
};
Most important load commands:
LC_SEGMENT_64: Defines memory segments
struct segment_command_64 {
    uint32_t  cmd;          // LC_SEGMENT_64
    uint32_t  cmdsize;
    char      segname[16];  // "__TEXT", "__DATA", ...
    uint64_t  vmaddr;       // Virtual address when loaded
    uint64_t  vmsize;       // Size in virtual memory
    uint64_t  fileoff;      // Offset in file
    uint64_t  filesize;     // Size in file
    vm_prot_t maxprot;      // Maximum protection (r/w/x)
    vm_prot_t initprot;     // Initial protection
    uint32_t  nsects;       // Number of sections in segment
    uint32_t  flags;
};
Protection bits:
- VM_PROT_READ (0x1): __TEXT has read
- VM_PROT_WRITE (0x2): __DATA has write
- VM_PROT_EXECUTE (0x4): __TEXT has execute
- __TEXT: r-x (read + execute, no write; W^X policy)
- __DATA: rw- (read + write, no execute)
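To make the struct layout and the protection bits concrete, here is a minimal sketch that unpacks one `segment_command_64` from raw bytes and renders its protections in `r-x` form. It assumes a little-endian file; `parse_segment_64` and `prot_string` are illustrative names, not part of any library.

```python
import struct

# Field order follows segment_command_64; the fixed part is 72 bytes.
SEGMENT_64 = struct.Struct("<II16sQQQQiiII")
LC_SEGMENT_64 = 0x19

def prot_string(prot: int) -> str:
    """Render VM_PROT_* bits as an 'rwx'-style string."""
    return (("r" if prot & 0x1 else "-") +
            ("w" if prot & 0x2 else "-") +
            ("x" if prot & 0x4 else "-"))

def parse_segment_64(buf: bytes, offset: int = 0) -> dict:
    """Decode the fixed part of an LC_SEGMENT_64 command at `offset`."""
    (cmd, cmdsize, segname, vmaddr, vmsize, fileoff, filesize,
     maxprot, initprot, nsects, flags) = SEGMENT_64.unpack_from(buf, offset)
    if cmd != LC_SEGMENT_64:
        raise ValueError(f"not LC_SEGMENT_64: cmd=0x{cmd:x}")
    return {
        "segname": segname.split(b"\0", 1)[0].decode("ascii", "replace"),
        "vmaddr": vmaddr, "vmsize": vmsize,
        "fileoff": fileoff, "filesize": filesize,
        "maxprot": prot_string(maxprot), "initprot": prot_string(initprot),
        "nsects": nsects,
    }
```

The `nsects` section headers follow immediately after these 72 bytes, inside the same load command.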
LC_CODE_SIGNATURE: Code signing data
Points to the code signature blob in __LINKEDIT. Contains:
- Code Directory: hash of each page
- CMS Signature: PKCS#7 certificate chain + signature
- Entitlements blob: plist XML/binary
- Requirements blob: code requirements
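The blobs listed above are wrapped in a "SuperBlob" index whose fields are big-endian regardless of the binary's architecture. The sketch below lists the blobs it finds. It assumes `sig` holds the bytes that the load command's `dataoff`/`datasize` fields point at, uses the magic constants from Apple's code-signing headers, and does no signature validation. `list_signature_blobs` is a hypothetical name.

```python
import struct

CSMAGIC_EMBEDDED_SIGNATURE = 0xFADE0CC0  # SuperBlob wrapping all other blobs
BLOB_MAGICS = {
    0xFADE0C02: "CodeDirectory",
    0xFADE0C01: "Requirements",
    0xFADE7171: "Entitlements (XML)",
    0xFADE0B01: "CMS signature wrapper",
}

def list_signature_blobs(sig: bytes) -> list:
    """Return (slot, blob kind, blob length) for each entry in the SuperBlob."""
    magic, length, count = struct.unpack_from(">III", sig, 0)
    if magic != CSMAGIC_EMBEDDED_SIGNATURE:
        raise ValueError(f"unexpected magic 0x{magic:08x}")
    blobs = []
    for i in range(count):
        # Index entries (slot type, offset from SuperBlob start) follow the header.
        slot, offset = struct.unpack_from(">II", sig, 12 + 8 * i)
        blob_magic, blob_len = struct.unpack_from(">II", sig, offset)
        blobs.append((slot, BLOB_MAGICS.get(blob_magic, hex(blob_magic)), blob_len))
    return blobs
```

Unknown blob magics are reported as hex rather than rejected, since newer OS releases add blob types this table does not cover.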
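LC_ENCRYPTION_INFO_64: App Store encryption
iOS App Store apps ship with part of the __TEXT segment encrypted. The command's cryptoff and cryptsize fields give the encrypted file range, and a nonzero cryptid means that range is still encrypted. It must be decrypted before static analysis.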
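A quick way to tell whether a binary still needs decrypting is to read this command's `cryptid` field. The sketch below decodes one `encryption_info_command_64` (six 32-bit fields) from raw bytes, assuming a little-endian file; `encryption_state` is an illustrative helper name.

```python
import struct

LC_ENCRYPTION_INFO_64 = 0x2C

def encryption_state(cmd_bytes: bytes) -> dict:
    """Decode an LC_ENCRYPTION_INFO_64 command and report its encrypted range."""
    cmd, cmdsize, cryptoff, cryptsize, cryptid, _pad = \
        struct.unpack_from("<IIIIII", cmd_bytes, 0)
    if cmd != LC_ENCRYPTION_INFO_64:
        raise ValueError(f"not LC_ENCRYPTION_INFO_64: cmd=0x{cmd:x}")
    return {
        "cryptoff": cryptoff,       # file offset where the encrypted range starts
        "cryptsize": cryptsize,     # length of the encrypted range in bytes
        "encrypted": cryptid != 0,  # 0 means the range is plaintext
    }
```

A disassembler pointed at a range with `encrypted` set will show garbage instructions, which is a common first symptom of analyzing an undecrypted app.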
Others
- LC_SYMTAB: Symbol table location
- LC_DYSYMTAB: Dynamic symbol table
- LC_LOAD_DYLIB: Dependency declarations
- LC_MAIN: Entry point (offset from __TEXT)
- LC_UUID: Unique build identifier
- LC_SOURCE_VERSION: Source version info
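Three of the commands above have small fixed layouts and are easy to decode by hand. The sketch below handles LC_MAIN (a 64-bit `entryoff` and `stacksize`), LC_UUID (16 raw bytes) and LC_SOURCE_VERSION (a 64-bit value packed as A.B.C.D.E with 24/10/10/10/10 bits). It assumes a little-endian file, and `decode_simple_command` is a hypothetical helper.

```python
import struct
import uuid

LC_UUID = 0x1B
LC_SOURCE_VERSION = 0x2A
LC_MAIN = 0x80000028  # 0x28 with the LC_REQ_DYLD bit set

def decode_simple_command(buf: bytes):
    """Return (name, decoded value) for a few fixed-size load commands."""
    cmd, cmdsize = struct.unpack_from("<II", buf, 0)
    if cmd == LC_UUID:
        return "LC_UUID", str(uuid.UUID(bytes=bytes(buf[8:24])))
    if cmd == LC_SOURCE_VERSION:
        (v,) = struct.unpack_from("<Q", buf, 8)
        parts = [v >> 40, (v >> 30) & 0x3FF, (v >> 20) & 0x3FF,
                 (v >> 10) & 0x3FF, v & 0x3FF]
        return "LC_SOURCE_VERSION", ".".join(str(p) for p in parts)
    if cmd == LC_MAIN:
        entryoff, _stacksize = struct.unpack_from("<QQ", buf, 8)
        return "LC_MAIN", entryoff  # file offset of the entry point
    return hex(cmd), None           # command not handled by this sketch
```

Because __TEXT normally maps file offset 0, adding `entryoff` to the __TEXT segment's `vmaddr` gives the entry point's virtual address.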
3. Sections
Each segment contains multiple sections:
struct section_64 {
    char     sectname[16];  // "__text", "__cstring", ...
    char     segname[16];   // Parent segment name
    uint64_t addr;          // Virtual address
    uint64_t size;
    uint32_t offset;        // File offset
    uint32_t align;         // Alignment (power of 2)
    uint32_t reloff;        // Relocation entries offset
    uint32_t nreloc;        // Number of relocations
    uint32_t flags;         // Section type + attributes
    uint32_t reserved1;     // Indirect symbol table index (for stubs)
    uint32_t reserved2;     // Stub size (for stubs)
    uint32_t reserved3;
};
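As a sketch of how these fields are read in practice, the snippet below unpacks one `section_64` (80 bytes) and pulls out the section type from the low byte of `flags`. For stub sections it also reports `reserved1`/`reserved2` as described in the struct comments. It assumes a little-endian file; the helper name and the returned dictionary keys are illustrative, and only a handful of section types from `<mach-o/loader.h>` are named.

```python
import struct

SECTION_64 = struct.Struct("<16s16sQQIIIIIIII")  # 80 bytes
SECTION_TYPE_MASK = 0x000000FF                   # low byte of flags = type
SECTION_TYPES = {
    0x0: "S_REGULAR",
    0x1: "S_ZEROFILL",
    0x2: "S_CSTRING_LITERALS",
    0x6: "S_NON_LAZY_SYMBOL_POINTERS",
    0x7: "S_LAZY_SYMBOL_POINTERS",
    0x8: "S_SYMBOL_STUBS",
}

def _name(raw: bytes) -> str:
    """Strip NUL padding from a 16-byte name field."""
    return raw.split(b"\0", 1)[0].decode("ascii", "replace")

def parse_section_64(buf: bytes, offset: int = 0) -> dict:
    """Decode one section_64 header found at `offset` in `buf`."""
    (sectname, segname, addr, size, fileoff, align, reloff, nreloc,
     flags, reserved1, reserved2, _reserved3) = SECTION_64.unpack_from(buf, offset)
    stype = flags & SECTION_TYPE_MASK
    info = {
        "name": f"{_name(segname)},{_name(sectname)}",
        "addr": addr, "size": size, "fileoff": fileoff,
        "alignment": 1 << align,  # align is stored as a power of two
        "type": SECTION_TYPES.get(stype, hex(stype)),
    }
    if stype == 0x8:              # S_SYMBOL_STUBS
        info["indirect_index"] = reserved1
        info["stub_size"] = reserved2
    return info
```

Dividing a stub section's `size` by `stub_size` gives the number of stubs, which is handy when matching stubs to imported symbols.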
Sections important for exploitation:
| Section | Segment | Content | Exploitation relevance |
|---|---|---|---|
| __text | __TEXT | Machine code | Gadget hunting, code analysis |
| __stubs | __TEXT | Lazy binding stubs | Indirect calls, hooking points |
| __const | __TEXT | Read-only constants | vtable locations, method tables |
| __const | __DATA_CONST | Writable-at-load constants | vtable overwrite (before locked) |
| __got | __DATA | Global Offset Table | GOT overwrite attacks |
| __la_symbol_ptr | __DATA | Lazy symbol pointers | Symbol pointer overwrite |
| __objc_classlist | __DATA | ObjC class list | Class method swizzling |
| __bss | __DATA | Zero-initialized data | Uninitialized variable exploitation |
4. Fat / Universal Binaries
struct fat_header {
    uint32_t magic;      // 0xCAFEBABE (big-endian!)
    uint32_t nfat_arch;  // Number of architectures
};
struct fat_arch {
    cpu_type_t    cputype;
    cpu_subtype_t cpusubtype;
    uint32_t      offset;  // Offset to Mach-O for this arch
    uint32_t      size;
    uint32_t      align;
};
A fat binary is a container holding multiple Mach-Os (arm64, arm64e, x86_64, ...). The loader selects the matching architecture slice at load time (the kernel for executables, dyld for libraries).
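Listing the slices is a small exercise in big-endian parsing. The sketch below reads the `fat_header` and each 20-byte `fat_arch` entry from a byte string. It handles only the 32-bit-offset variant (magic 0xCAFEBABE), and `list_fat_slices` is a hypothetical helper name.

```python
import struct

FAT_MAGIC = 0xCAFEBABE

def list_fat_slices(data: bytes) -> list:
    """Return one dict per architecture slice in a fat binary."""
    magic, nfat_arch = struct.unpack_from(">II", data, 0)  # always big-endian
    if magic != FAT_MAGIC:
        raise ValueError(f"not a fat binary: magic=0x{magic:08x}")
    slices = []
    for i in range(nfat_arch):
        cputype, cpusubtype, offset, size, align = \
            struct.unpack_from(">IIIII", data, 8 + 20 * i)
        slices.append({
            "cputype": cputype,
            "cpusubtype": cpusubtype,
            "offset": offset,         # where this slice's Mach-O header starts
            "size": size,
            "alignment": 1 << align,  # align is stored as a power of two
        })
    return slices
```

Each slice at `offset` is a complete thin Mach-O, so `data[offset:offset + size]` can be fed to any thin-file parser.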
5. Kernelcache (MH_FILESET)
Recent kernelcaches use the MH_FILESET type (the constant first appeared in the macOS 11 SDK's loader.h), which packs the kernel and all kernel extensions into a single file:
Kernelcache (MH_FILESET)
├── kernel (com.apple.kernel)
├── com.apple.iokit.IOSurface
├── com.apple.iokit.IOGraphicsFamily
├── com.apple.driver.AppleARMPlatform
├── com.apple.security.sandbox
└── ... (hundreds of kexts)
Each kext entry is an embedded Mach-O with its own segments/sections.
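The embedded images are located through LC_FILESET_ENTRY load commands in the outer header, a detail not covered in the text above. The sketch below decodes one such command following the `fileset_entry_command` layout in `<mach-o/loader.h>`: a fixed 32-byte part followed by the NUL-terminated entry name. It assumes a little-endian file, and `parse_fileset_entry` is an illustrative name.

```python
import struct

LC_FILESET_ENTRY = 0x80000035  # 0x35 with the LC_REQ_DYLD bit set

def parse_fileset_entry(cmd_bytes: bytes) -> dict:
    """Decode one LC_FILESET_ENTRY command from its raw bytes."""
    cmd, cmdsize, vmaddr, fileoff, name_off, _reserved = \
        struct.unpack_from("<IIQQII", cmd_bytes, 0)
    if cmd != LC_FILESET_ENTRY:
        raise ValueError(f"not LC_FILESET_ENTRY: cmd=0x{cmd:x}")
    # name_off is measured from the start of this load command.
    raw_name = cmd_bytes[name_off:cmdsize].split(b"\0", 1)[0]
    return {
        "entry_id": raw_name.decode("ascii", "replace"),
        "vmaddr": vmaddr,
        "fileoff": fileoff,  # file offset of the embedded Mach-O header
    }
```

Seeking to `fileoff` in the kernelcache and parsing from there as an ordinary Mach-O header is the starting point for extracting a single kext.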
Practical Analysis
Dump Mach-O header
# View header
otool -hV /usr/bin/ls
# Output:
# Mach header
# magic cputype cpusubtype caps filetype ncmds sizeofcmds flags
# MH_MAGIC_64 ARM64 ALL 0x00 EXECUTE 19 1496 NOUNDEFS DYLDLINK TWOLEVEL PIE
# View load commands
otool -lV /usr/bin/ls
# View sections
otool -l /usr/bin/ls | grep -A5 "sectname"
Parse with Python
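import struct

MH_MAGIC_64 = 0xFEEDFACF  # little-endian 64-bit Mach-O

def parse_macho(path):
    """Print the load commands of a thin, little-endian 64-bit Mach-O file."""
    with open(path, 'rb') as f:
        magic = struct.unpack('<I', f.read(4))[0]
        if magic != MH_MAGIC_64:
            print(f"Unsupported magic 0x{magic:08x} (fat, 32-bit, or not Mach-O)")
            return
        # The remaining 28 bytes of mach_header_64
        cputype, cpusubtype, filetype, ncmds, sizeofcmds, flags, reserved = \
            struct.unpack('<iiIIIII', f.read(28))
        print(f"64-bit Mach-O, {ncmds} load commands")
        for i in range(ncmds):
            cmd, cmdsize = struct.unpack('<II', f.read(8))
            print(f"  LC #{i}: cmd=0x{cmd:x}, size={cmdsize}")
            if cmdsize < 8:
                raise ValueError("corrupt load command: cmdsize < 8")
            f.seek(cmdsize - 8, 1)  # skip the rest of this command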
Resources
- Apple <mach-o/loader.h>: definitive struct definitions
- LIEF Project: library for parsing Mach-O (Python/C++)
- Jonathan Levin, macOS and iOS Internals Vol. I (Binary Format chapter)
- Mach-O Wikipedia
Exercises
- Hex dump analysis: Open a Mach-O binary in a hex editor and identify the header, the first load command, and the __TEXT segment by hand
- Write a Mach-O parser: Write a Python script to parse the header + all load commands
- Compare arm64 vs arm64e: Extract the same binary for both architectures, compare load commands
- Extract embedded Mach-Os: From an MH_FILESET kernelcache, extract one kext Mach-O
- Modify a load command: Change an LC value with a hex editor, observe the behavior