Mach-O Binary Format
Mach-O (Mach Object) is the binary format used for every executable, dynamic library (.dylib), framework, and kernel extension on Apple platforms.
Why You Need to Understand Mach-O
- RE tools (IDA, Ghidra) parse Mach-O to display code, so you need to know what they are parsing
- Exploit payloads often need to craft fake Mach-O headers (fake objects)
- The code signing signature lives inside the Mach-O structure
- The dyld shared cache is effectively one giant container of Mach-O images holding most system frameworks
- The kernelcache is also in Mach-O format
Structure Overview
┌─────────────────────────────┐
│ Mach-O Header │ ← magic, cputype, filetype, ncmds
├─────────────────────────────┤
│ Load Commands │ ← LC_SEGMENT_64, LC_SYMTAB, LC_DYSYMTAB, ...
│ (array of command structs) │ Describes the binary's layout in memory
├─────────────────────────────┤
│ │
│ Segments │
│ ┌───────────────────────┐ │
│ │ __TEXT segment │ │ ← Code, read-only data, string constants
│ │ ├── __text section │ │ Machine code
│ │ ├── __stubs │ │ PLT-equivalent for lazy binding
│ │ ├── __stub_helper │ │ Helper code for lazy binding
│ │ ├── __cstring │ │ C string literals
│ │ └── __const │ │ Read-only constants
│ ├───────────────────────┤ │
│ │ __DATA segment │ │ ← Writable data
│ │ ├── __data │ │ Initialized global variables
│ │ ├── __bss │ │ Uninitialized globals (zero-filled)
│ │ ├── __objc_classlist │ │ ObjC class definitions
│ │ ├── __objc_selrefs │ │ ObjC selector references
│ │ ├── __got │ │ Global Offset Table
│ │ └── __la_symbol_ptr │ │ Lazy symbol pointers
│ ├───────────────────────┤ │
│ │ __DATA_CONST segment │ │ ← Data writable only at load time
│ │ └── __const │ │ vtables, method tables (locked after dyld)
│ ├───────────────────────┤ │
│ │ __LINKEDIT segment │ │ ← Metadata for linker
│ │ (symbol table, string │ │
│ │ table, code signature│ │
│ │ relocation info) │ │
│ └───────────────────────┘ │
└─────────────────────────────┘
Component Details
1. Mach-O Header
struct mach_header_64 {
    uint32_t      magic;       // 0xFEEDFACF (64-bit) or 0xFEEDFACE (32-bit)
    cpu_type_t    cputype;     // CPU_TYPE_ARM64 = 0x0100000C
    cpu_subtype_t cpusubtype;  // CPU_SUBTYPE_ARM64E = 0x02 (PAC-enabled)
    uint32_t      filetype;    // MH_EXECUTE, MH_DYLIB, MH_KEXT_BUNDLE, ...
    uint32_t      ncmds;       // Number of load commands
    uint32_t      sizeofcmds;  // Total size of all load commands, in bytes
    uint32_t      flags;       // MH_PIE, MH_NO_HEAP_EXECUTION, ...
    uint32_t      reserved;    // Padding (64-bit only)
};
Important file types:
| Constant | Value | Meaning |
|---|---|---|
| MH_EXECUTE | 0x2 | Executable (apps, daemons) |
| MH_DYLIB | 0x6 | Dynamic library (.dylib) |
| MH_BUNDLE | 0x8 | Loadable bundle (.bundle) |
| MH_DYLINKER | 0x7 | Dynamic linker (dyld itself) |
| MH_KEXT_BUNDLE | 0xB | Kernel extension |
| MH_FILESET | 0xC | Fileset kernelcache (kernel + kexts in one file); used by newer OS releases, while older kernelcaches are MH_EXECUTE |
Important CPU subtypes:
- CPU_SUBTYPE_ARM64_ALL (0x0): standard arm64
- CPU_SUBTYPE_ARM64E (0x2): arm64e with PAC support (A12+)
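The header fields and constants above can be decoded with a few lines of Python. This is a minimal sketch, not a full parser: the function name `decode_header` and the sample values are made up for illustration, and it assumes a thin little-endian 64-bit image. Note that the high byte of cpusubtype carries capability bits, so it is masked off before comparing against the subtype values.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF
CPU_TYPE_ARM64 = 0x0100000C
CPU_SUBTYPE_MASK = 0xFF000000  # high byte holds capability bits, not the subtype
FILETYPES = {0x2: "MH_EXECUTE", 0x6: "MH_DYLIB", 0x7: "MH_DYLINKER",
             0x8: "MH_BUNDLE", 0xB: "MH_KEXT_BUNDLE", 0xC: "MH_FILESET"}
ARM64_SUBTYPES = {0x0: "arm64", 0x2: "arm64e"}

def decode_header(data: bytes) -> dict:
    """Decode a little-endian mach_header_64 (32 bytes) from raw bytes."""
    magic, cputype, cpusubtype, filetype, ncmds, sizeofcmds, flags, _reserved = \
        struct.unpack_from("<IIIIIIII", data, 0)
    if magic != MH_MAGIC_64:
        raise ValueError(f"not a 64-bit little-endian Mach-O: 0x{magic:08x}")
    subtype = cpusubtype & ~CPU_SUBTYPE_MASK & 0xFFFFFFFF
    if cputype == CPU_TYPE_ARM64:
        arch = ARM64_SUBTYPES.get(subtype, "unknown arm64 subtype")
    else:
        arch = f"cputype 0x{cputype:x}"
    return {"arch": arch, "filetype": FILETYPES.get(filetype, hex(filetype)),
            "ncmds": ncmds, "sizeofcmds": sizeofcmds, "flags": flags}

# Synthetic arm64e executable header (values chosen for illustration only)
sample = struct.pack("<IIIIIIII", MH_MAGIC_64, CPU_TYPE_ARM64, 0x80000002,
                     0x2, 20, 1800, 0x00200085, 0)
print(decode_header(sample))
```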
2. Load Commands
Load commands are an array of structs laid out back to back right after the header. Every command starts with:
struct load_command {
    uint32_t cmd;      // LC_SEGMENT_64, LC_SYMTAB, ...
    uint32_t cmdsize;  // Size of this command (including this header)
};
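Because every command carries its own cmdsize, walking the array is a matter of hopping from one prefix to the next. The sketch below does that over an in-memory buffer and rejects sizes that would run past sizeofcmds, which matters when the file is untrusted. The helper name `iter_load_commands` and the synthetic image are illustrative assumptions; only thin 64-bit little-endian images are handled.

```python
import struct

HEADER_SIZE = 32  # sizeof(struct mach_header_64)

def iter_load_commands(data: bytes):
    """Yield (cmd, cmdsize, offset) for each load command in a thin 64-bit Mach-O."""
    ncmds, sizeofcmds = struct.unpack_from("<II", data, 16)
    offset, end = HEADER_SIZE, HEADER_SIZE + sizeofcmds
    for _ in range(ncmds):
        if offset + 8 > end:
            raise ValueError("load command runs past sizeofcmds")
        cmd, cmdsize = struct.unpack_from("<II", data, offset)
        # cmdsize covers the 8-byte prefix and must be a multiple of 8 on 64-bit
        if cmdsize < 8 or cmdsize % 8 or offset + cmdsize > end:
            raise ValueError(f"bad cmdsize {cmdsize} at offset {offset}")
        yield cmd, cmdsize, offset
        offset += cmdsize

# Synthetic image: header + LC_UUID (0x1B, 24 bytes) + LC_SOURCE_VERSION (0x2A, 16 bytes)
cmds = struct.pack("<II16s", 0x1B, 24, bytes(16)) + struct.pack("<IIQ", 0x2A, 16, 0)
header = struct.pack("<IIIIIIII", 0xFEEDFACF, 0x0100000C, 0, 0x2, 2, len(cmds), 0, 0)
for cmd, size, off in iter_load_commands(header + cmds):
    print(f"cmd=0x{cmd:x} size={size} offset={off}")
```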
The most important load commands:
LC_SEGMENT_64: Defines memory segments
struct segment_command_64 {
    uint32_t  cmd;          // LC_SEGMENT_64
    uint32_t  cmdsize;
    char      segname[16];  // "__TEXT", "__DATA", ...
    uint64_t  vmaddr;       // Virtual address when loaded
    uint64_t  vmsize;       // Size in virtual memory
    uint64_t  fileoff;      // Offset in the file
    uint64_t  filesize;     // Size in the file
    vm_prot_t maxprot;      // Maximum protection (r/w/x)
    vm_prot_t initprot;     // Initial protection
    uint32_t  nsects;       // Number of sections in the segment
    uint32_t  flags;
};
Protection bits:
- VM_PROT_READ (0x1): __TEXT has read
- VM_PROT_WRITE (0x2): __DATA has write
- VM_PROT_EXECUTE (0x4): __TEXT has execute
- __TEXT: r-x (read + execute, no write, in line with the W^X policy)
- __DATA: rw- (read + write, no execute)
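To make the segment layout and the protection bits concrete, here is a small sketch that unpacks one segment_command_64 and renders maxprot/initprot as an r/w/x string. The names `parse_segment` and `prot_string` and the sample addresses are illustrative assumptions; the struct format string mirrors the 72-byte layout shown above.

```python
import struct

VM_PROT_READ, VM_PROT_WRITE, VM_PROT_EXECUTE = 0x1, 0x2, 0x4
LC_SEGMENT_64 = 0x19
SEGMENT_FMT = "<II16sQQQQiiII"  # 72 bytes, matches struct segment_command_64

def prot_string(prot: int) -> str:
    """Render a vm_prot_t value as a three-character string such as 'r-x'."""
    pairs = (("r", VM_PROT_READ), ("w", VM_PROT_WRITE), ("x", VM_PROT_EXECUTE))
    return "".join(ch if prot & bit else "-" for ch, bit in pairs)

def parse_segment(data: bytes, offset: int = 0) -> dict:
    (cmd, cmdsize, segname, vmaddr, vmsize, fileoff, filesize,
     maxprot, initprot, nsects, flags) = struct.unpack_from(SEGMENT_FMT, data, offset)
    if cmd != LC_SEGMENT_64:
        raise ValueError(f"not LC_SEGMENT_64: 0x{cmd:x}")
    return {"name": segname.rstrip(b"\0").decode("ascii"),
            "vmaddr": vmaddr, "vmsize": vmsize, "nsects": nsects,
            "maxprot": prot_string(maxprot), "initprot": prot_string(initprot)}

# Synthetic __TEXT segment command (addresses are illustrative)
text_cmd = struct.pack(SEGMENT_FMT, LC_SEGMENT_64, 72, b"__TEXT",
                       0x100000000, 0x4000, 0, 0x4000, 5, 5, 0, 0)
print(parse_segment(text_cmd))
print(prot_string(VM_PROT_READ | VM_PROT_WRITE))
```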
LC_CODE_SIGNATURE: Code signing data
Points to the code signature blob inside __LINKEDIT. The blob contains:
- Code Directory: a hash of each page
- CMS Signature: PKCS#7 certificate chain + signature
- Entitlements blob: plist (XML/binary)
- Requirements blob: code requirements
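The blobs listed above are wrapped in a SuperBlob: a big-endian header (magic, length, count) followed by (type, offset) index entries. The sketch below lists the slots of such a blob. The function name `list_signature_slots` and the synthetic blob are assumptions for illustration; the magic and slot numbers are the commonly documented ones, and real signatures contain additional slot types not named here.

```python
import struct

CSMAGIC_EMBEDDED_SIGNATURE = 0xFADE0CC0
SLOT_NAMES = {0: "CodeDirectory", 2: "Requirements",
              5: "Entitlements", 0x10000: "CMS signature"}

def list_signature_slots(blob: bytes):
    """Return [(slot_name, offset, blob_magic)] from an embedded-signature SuperBlob."""
    # Code signature structures are big-endian regardless of the CPU architecture
    magic, _length, count = struct.unpack_from(">III", blob, 0)
    if magic != CSMAGIC_EMBEDDED_SIGNATURE:
        raise ValueError(f"unexpected SuperBlob magic 0x{magic:08x}")
    slots = []
    for i in range(count):
        slot_type, offset = struct.unpack_from(">II", blob, 12 + 8 * i)
        (blob_magic,) = struct.unpack_from(">I", blob, offset)
        slots.append((SLOT_NAMES.get(slot_type, hex(slot_type)), offset, blob_magic))
    return slots

# Synthetic SuperBlob with two empty sub-blobs (magic + length only)
code_dir = struct.pack(">II", 0xFADE0C02, 8)
entitlements = struct.pack(">II", 0xFADE7171, 8)
index = struct.pack(">IIII", 0, 28, 5, 36)  # two (type, offset) pairs
super_blob = (struct.pack(">III", CSMAGIC_EMBEDDED_SIGNATURE, 44, 2)
              + index + code_dir + entitlements)
for name, off, magic in list_signature_slots(super_blob):
    print(f"{name:14} offset={off} magic=0x{magic:08x}")
```

On a real binary the input would be the dataoff/datasize range named by the LC_CODE_SIGNATURE command.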
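LC_ENCRYPTION_INFO_64: App Store encryption
App Store apps ship with most of their __TEXT segment encrypted. The command's cryptoff and cryptsize fields give the encrypted file range, and cryptid is nonzero while that range is still encrypted, so the binary has to be decrypted before analysis.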
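A quick way to tell whether a binary still needs decrypting is to read the cryptid field of this command. The following is a minimal sketch under the assumption that the 24 bytes of an encryption_info_command_64 (cmd, cmdsize, cryptoff, cryptsize, cryptid, pad) have already been located; `encryption_state` and the sample values are hypothetical.

```python
import struct

LC_ENCRYPTION_INFO_64 = 0x2C

def encryption_state(cmd_bytes: bytes) -> dict:
    """Decode an encryption_info_command_64 and report the encrypted file range."""
    cmd, _cmdsize, cryptoff, cryptsize, cryptid, _pad = \
        struct.unpack_from("<IIIIII", cmd_bytes, 0)
    if cmd != LC_ENCRYPTION_INFO_64:
        raise ValueError(f"not LC_ENCRYPTION_INFO_64: 0x{cmd:x}")
    # cryptid == 0 means the range is plaintext (for example after a dump)
    return {"encrypted": cryptid != 0,
            "file_range": (cryptoff, cryptoff + cryptsize)}

# Synthetic command: 0x8000 encrypted bytes starting at file offset 0x4000
enc = struct.pack("<IIIIII", LC_ENCRYPTION_INFO_64, 24, 0x4000, 0x8000, 1, 0)
print(encryption_state(enc))
```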
Others
- LC_SYMTAB: symbol table location
- LC_DYSYMTAB: dynamic symbol table
- LC_LOAD_DYLIB: dependency declarations
- LC_MAIN: entry point (offset from the start of __TEXT)
- LC_UUID: unique build identifier
- LC_SOURCE_VERSION: source version info
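As an example of decoding one of these commands, the sketch below reads an LC_LOAD_DYLIB entry: a fixed 24-byte dylib_command followed by the NUL-padded install path, with versions packed as X.Y.Z in 16.8.8 bits. The helper names and the sample path and version numbers are illustrative assumptions.

```python
import struct

LC_LOAD_DYLIB = 0xC

def version_string(v: int) -> str:
    """Unpack the X.Y.Z version encoding (16 bits . 8 bits . 8 bits)."""
    return f"{v >> 16}.{(v >> 8) & 0xFF}.{v & 0xFF}"

def parse_load_dylib(cmd_bytes: bytes) -> dict:
    cmd, cmdsize, name_off, _timestamp, current, compat = \
        struct.unpack_from("<IIIIII", cmd_bytes, 0)
    if cmd != LC_LOAD_DYLIB:
        raise ValueError(f"not LC_LOAD_DYLIB: 0x{cmd:x}")
    # name_off is relative to the start of the command; the string is NUL-padded
    name = cmd_bytes[name_off:cmdsize].split(b"\0", 1)[0].decode("utf-8")
    return {"name": name, "current": version_string(current),
            "compat": version_string(compat)}

# Synthetic command; the path is padded so cmdsize stays a multiple of 8
path = b"/usr/lib/libSystem.B.dylib"
padded = path + b"\0" * (8 - len(path) % 8)
dylib_cmd = struct.pack("<IIIIII", LC_LOAD_DYLIB, 24 + len(padded), 24, 2,
                        1319 << 16, 1 << 16) + padded
print(parse_load_dylib(dylib_cmd))
```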
3. Sections
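Each segment contains zero or more sections. Their section_64 structs follow the segment_command_64 directly, inside the same load command: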
struct section_64 {
    char     sectname[16];  // "__text", "__cstring", ...
    char     segname[16];   // Parent segment name
    uint64_t addr;          // Virtual address
    uint64_t size;
    uint32_t offset;        // File offset
    uint32_t align;         // Alignment (power of 2)
    uint32_t reloff;        // Relocation entries offset
    uint32_t nreloc;        // Number of relocations
    uint32_t flags;         // Section type + attributes
    uint32_t reserved1;     // Indirect symbol table index (for stubs)
    uint32_t reserved2;     // Stub size (for stubs)
    uint32_t reserved3;
};
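A short sketch of reading one section_64 record, showing how the align exponent and the reserved fields are interpreted for a stubs section. `parse_section` and the sample values (a 12-byte stub size, six stubs) are assumptions for illustration; the format string mirrors the 80-byte struct above.

```python
import struct

SECTION_FMT = "<16s16sQQIIIIIIII"  # 80 bytes, matches struct section_64
SECTION_TYPE_MASK = 0x000000FF     # low byte of flags is the section type
S_SYMBOL_STUBS = 0x8

def parse_section(data: bytes, offset: int = 0) -> dict:
    (sectname, segname, addr, size, _fileoff, align, _reloff, _nreloc,
     flags, reserved1, reserved2, _reserved3) = struct.unpack_from(SECTION_FMT, data, offset)
    info = {"name": sectname.rstrip(b"\0").decode("ascii"),
            "segment": segname.rstrip(b"\0").decode("ascii"),
            "addr": addr, "size": size,
            "alignment": 1 << align,  # align is stored as a power-of-two exponent
            "type": flags & SECTION_TYPE_MASK}
    if info["type"] == S_SYMBOL_STUBS and reserved2:
        info["stub_size"] = reserved2          # bytes per stub
        info["stub_count"] = size // reserved2
        info["indirect_index"] = reserved1     # first index into the indirect symbol table
    return info

# Synthetic __stubs section: six 12-byte stubs
stubs = struct.pack(SECTION_FMT, b"__stubs", b"__TEXT", 0x100003F00, 0x48,
                    0x3F00, 2, 0, 0, 0x80000408, 10, 12, 0)
print(parse_section(stubs))
```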
Sections that matter for exploitation:
| Section | Segment | Contents | Exploitation relevance |
|---|---|---|---|
| __text | __TEXT | Machine code | Gadget hunting, code analysis |
| __stubs | __TEXT | Lazy binding stubs | Indirect calls, hooking points |
| __const | __TEXT | Read-only constants | Constant tables without pointers (vtables need fixups, so they live in __DATA_CONST) |
| __const | __DATA_CONST | Writable-at-load constants | vtable overwrite (before the segment is locked) |
| __got | __DATA | Global Offset Table | GOT overwrite attacks |
| __la_symbol_ptr | __DATA | Lazy symbol pointers | Symbol pointer overwrite |
| __objc_classlist | __DATA | ObjC class list | Class method swizzling |
| __bss | __DATA | Zero-initialized data | Uninitialized variable exploitation |
4. Fat / Universal Binaries
struct fat_header {
    uint32_t magic;      // 0xCAFEBABE (big-endian!)
    uint32_t nfat_arch;  // Number of architectures
};
struct fat_arch {
    cpu_type_t    cputype;
    cpu_subtype_t cpusubtype;
    uint32_t      offset;  // Offset to Mach-O for this arch
    uint32_t      size;
    uint32_t      align;
};
A fat binary is a container holding several Mach-O slices (arm64, arm64e, x86_64, …). The loader picks the matching architecture at load time: the kernel for the main executable, dyld for libraries.
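Listing the slices of a fat file only needs the two structs above; the one trap is that the fat header and fat_arch entries are big-endian even when the slices themselves are little-endian. A minimal sketch, with `list_slices`, `arch_name`, and the sample offsets as illustrative assumptions:

```python
import struct

FAT_MAGIC = 0xCAFEBABE
CPU_TYPE_X86_64, CPU_TYPE_ARM64 = 0x01000007, 0x0100000C

def arch_name(cputype: int, cpusubtype: int) -> str:
    subtype = cpusubtype & 0x00FFFFFF  # drop capability bits in the high byte
    if cputype == CPU_TYPE_ARM64:
        return "arm64e" if subtype == 2 else "arm64"
    return "x86_64" if cputype == CPU_TYPE_X86_64 else f"cputype 0x{cputype:x}"

def list_slices(data: bytes):
    """Return [(arch, offset, size)] for each fat_arch entry (all fields big-endian)."""
    magic, nfat_arch = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError(f"not a fat binary: 0x{magic:08x}")
    slices = []
    for i in range(nfat_arch):
        cputype, cpusubtype, offset, size, _align = \
            struct.unpack_from(">IIIII", data, 8 + 20 * i)
        slices.append((arch_name(cputype, cpusubtype), offset, size))
    return slices

# Synthetic fat header describing an x86_64 slice and an arm64e slice
fat = struct.pack(">II", FAT_MAGIC, 2)
fat += struct.pack(">IIIII", CPU_TYPE_X86_64, 3, 0x4000, 0x1000, 14)
fat += struct.pack(">IIIII", CPU_TYPE_ARM64, 0x80000002, 0x8000, 0x1000, 14)
print(list_slices(fat))
```

Each (offset, size) pair delimits a complete thin Mach-O that can be sliced out and parsed like any other image.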
5. Kernelcache (MH_FILESET)
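Newer kernelcaches use the MH_FILESET file type, which packs the kernel and all kernel extensions into a single file (the constant first appeared in the macOS 11 generation of <mach-o/loader.h>; older kernelcaches, including iOS 12-era ones, are MH_EXECUTE images):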
Kernelcache (MH_FILESET)
├── kernel (com.apple.kernel)
├── com.apple.iokit.IOSurface
├── com.apple.iokit.IOGraphicsFamily
├── com.apple.driver.AppleARMPlatform
├── com.apple.security.sandbox
├── ... (hundreds of kexts)
Each entry is described by an LC_FILESET_ENTRY load command and is an embedded Mach-O with its own segments and sections.
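The entries of a fileset are announced by LC_FILESET_ENTRY load commands in the outer header, each giving the bundle identifier plus the address and file offset of the embedded Mach-O. The sketch below decodes one such command; `parse_fileset_entry` and the sample address and offset are hypothetical, and extracting the kext would still require parsing the embedded header found at fileoff.

```python
import struct

LC_FILESET_ENTRY = 0x80000035  # 0x35 | LC_REQ_DYLD

def parse_fileset_entry(cmd_bytes: bytes) -> dict:
    """Decode a fileset_entry_command: cmd, cmdsize, vmaddr, fileoff, entry_id, reserved."""
    cmd, cmdsize, vmaddr, fileoff, id_off, _reserved = \
        struct.unpack_from("<IIQQII", cmd_bytes, 0)
    if cmd != LC_FILESET_ENTRY:
        raise ValueError(f"not LC_FILESET_ENTRY: 0x{cmd:x}")
    # entry_id is an lc_str: an offset from the start of the command to a C string
    entry_id = cmd_bytes[id_off:cmdsize].split(b"\0", 1)[0].decode("utf-8")
    return {"entry_id": entry_id, "vmaddr": vmaddr, "fileoff": fileoff}

# Synthetic entry for the kernel itself (address and offset are illustrative)
ident = b"com.apple.kernel" + b"\0" * 8
entry = struct.pack("<IIQQII", LC_FILESET_ENTRY, 32 + len(ident),
                    0xFFFFFE0007004000, 0x10000, 32, 0) + ident
print(parse_fileset_entry(entry))
```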
Hands-On Analysis
Dump Mach-O header
# Show the header
otool -hV /bin/ls
# Show the load commands
otool -lV /bin/ls
# Show the sections
otool -l /bin/ls | grep -A5 "sectname"
Parsing with Python
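import struct

MH_MAGIC_64 = 0xFEEDFACF  # little-endian 64-bit Mach-O

def parse_macho(path):
    with open(path, 'rb') as f:
        magic = struct.unpack('<I', f.read(4))[0]
        if magic != MH_MAGIC_64:
            # Fat (0xCAFEBABE, big-endian) and 32-bit files are not handled here
            print(f"Not a thin 64-bit Mach-O (magic=0x{magic:08x})")
            return
        # Remaining 28 bytes of mach_header_64; cputype/cpusubtype are signed ints
        cputype, cpusubtype, filetype, ncmds, sizeofcmds, flags, reserved = \
            struct.unpack('<iiIIIII', f.read(28))
        print(f"64-bit Mach-O, {ncmds} load commands")
        for i in range(ncmds):
            cmd, cmdsize = struct.unpack('<II', f.read(8))
            print(f"  LC #{i}: cmd=0x{cmd:x}, size={cmdsize}")
            f.seek(cmdsize - 8, 1)  # skip the rest of this command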
Resources
- Apple <mach-o/loader.h>: definitive struct definitions
- LIEF Project: library for parsing Mach-O (Python/C++)
- Jonathan Levin, macOS and iOS Internals Vol. I (Binary Format chapter)
- Mach-O article on Wikipedia
Exercises
- Hex dump analysis: open a Mach-O binary in a hex editor and identify the header, the first load command, and the __TEXT segment by hand
- Write a Mach-O parser: write a Python script that parses the header and all load commands
- Compare arm64 vs arm64e: extract the same binary for both architectures and compare the load commands
- Extract embedded Mach-Os: from an MH_FILESET kernelcache, extract one kext Mach-O
- Modify a load command: change one LC value with a hex editor and observe the behavior