Mach-O (Mach Object) is the binary format used for every executable, dynamic library (.dylib), framework, and kernel extension on Apple platforms.


Why You Need to Understand Mach-O

  • RE tools (IDA, Ghidra) parse Mach-O to display code, so you need to know what they are parsing
  • Exploit payloads often have to craft fake Mach-O headers (fake objects)
  • The code signature lives inside the Mach-O structure
  • The dyld shared cache is a mega-Mach-O that holds most system frameworks
  • The kernelcache is also a Mach-O file

Structure Overview

┌─────────────────────────────┐
│       Mach-O Header         │  ← magic, cputype, filetype, ncmds
├─────────────────────────────┤
│      Load Commands          │  ← LC_SEGMENT_64, LC_SYMTAB, LC_DYSYMTAB, ...
│  (array of command structs) │     Describe the binary's layout in memory
├─────────────────────────────┤
│                             │
│         Segments            │
│  ┌───────────────────────┐  │
│  │ __TEXT segment        │  │  ← Code, read-only data, string constants
│  │  ├── __text section   │  │     Machine code
│  │  ├── __stubs          │  │     PLT-equivalent for lazy binding
│  │  ├── __stub_helper    │  │     Helper code for lazy binding
│  │  ├── __cstring        │  │     C string literals
│  │  └── __const          │  │     Read-only constants
│  ├───────────────────────┤  │
│  │ __DATA segment        │  │  ← Writable data
│  │  ├── __data           │  │     Initialized global variables
│  │  ├── __bss            │  │     Uninitialized globals (zero-filled)
│  │  ├── __objc_classlist │  │     ObjC class definitions
│  │  ├── __objc_selrefs   │  │     ObjC selector references
│  │  ├── __got            │  │     Global Offset Table
│  │  └── __la_symbol_ptr  │  │     Lazy symbol pointers
│  ├───────────────────────┤  │
│  │ __DATA_CONST segment  │  │  ← Data writable only at load time
│  │  └── __const          │  │     vtables, method tables (locked after dyld)
│  ├───────────────────────┤  │
│  │ __LINKEDIT segment    │  │  ← Metadata for the linker
│  │  (symbol table, string│  │
│  │   table, relocations, │  │
│  │   code signature)     │  │
│  └───────────────────────┘  │
└─────────────────────────────┘

Component Details

1. Mach-O Header

struct mach_header_64 {
    uint32_t magic;       // 0xFEEDFACF (64-bit); the 32-bit mach_header uses 0xFEEDFACE
    cpu_type_t cputype;   // CPU_TYPE_ARM64 = 0x0100000C
    cpu_subtype_t cpusubtype;  // CPU_SUBTYPE_ARM64E = 0x02 (PAC-enabled)
    uint32_t filetype;    // MH_EXECUTE, MH_DYLIB, MH_KEXT_BUNDLE, ...
    uint32_t ncmds;       // Number of load commands
    uint32_t sizeofcmds;  // Total size of all load commands, in bytes
    uint32_t flags;       // MH_PIE, MH_NO_HEAP_EXECUTION, ...
    uint32_t reserved;    // Padding (64-bit only)
};

Important file types:

Constant         Value   Meaning
MH_EXECUTE       0x2     Executable (apps, daemons)
MH_DYLIB         0x6     Dynamic library (.dylib)
MH_DYLINKER      0x7     Dynamic linker (dyld itself)
MH_BUNDLE        0x8     Loadable bundle (.bundle)
MH_KEXT_BUNDLE   0xB     Kernel extension
MH_FILESET       0xC     Kernelcache file set (kernel + kexts); introduced alongside macOS 11 / iOS 14

Important CPU subtypes:

  • CPU_SUBTYPE_ARM64_ALL (0x0): standard arm64
  • CPU_SUBTYPE_ARM64E (0x2): arm64e with PAC support (A12+)
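
The header fields and constants above can be decoded with a few lines of Python. This is a minimal sketch that assumes a thin, little-endian 64-bit file; `describe_header` and `SUBTYPE_VALUE_MASK` are illustrative names, not part of any Apple API. The mask reflects the fact that the top byte of `cpusubtype` carries capability bits rather than the subtype itself.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF
CPU_TYPE_ARM64 = 0x0100000C
SUBTYPE_VALUE_MASK = 0x00FFFFFF  # hypothetical name: drops the capability bits in the top byte
FILETYPES = {0x2: "MH_EXECUTE", 0x6: "MH_DYLIB", 0x7: "MH_DYLINKER",
             0x8: "MH_BUNDLE", 0xB: "MH_KEXT_BUNDLE", 0xC: "MH_FILESET"}
ARM64_SUBTYPES = {0x0: "arm64", 0x2: "arm64e"}


def describe_header(data: bytes) -> dict:
    """Decode a little-endian mach_header_64 from the first 32 bytes of data."""
    (magic, cputype, cpusubtype, filetype,
     ncmds, sizeofcmds, flags, _reserved) = struct.unpack_from("<8I", data, 0)
    if magic != MH_MAGIC_64:
        raise ValueError(f"not a 64-bit Mach-O: magic=0x{magic:08x}")
    subtype = cpusubtype & SUBTYPE_VALUE_MASK
    if cputype == CPU_TYPE_ARM64:
        arch = ARM64_SUBTYPES.get(subtype, f"arm64 subtype 0x{subtype:x}")
    else:
        arch = f"cputype 0x{cputype:x}"
    return {
        "arch": arch,
        "filetype": FILETYPES.get(filetype, hex(filetype)),
        "ncmds": ncmds,
        "sizeofcmds": sizeofcmds,
        "flags": flags,
    }
```

Calling `describe_header(open(path, "rb").read(32))` on a thin binary returns the architecture name, the file type, and the load-command count.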

2. Load Commands

Load commands are an array of variable-size structs laid out back to back right after the header. Every command starts with:

struct load_command {
    uint32_t cmd;       // LC_SEGMENT_64, LC_SYMTAB, ...
    uint32_t cmdsize;   // Size of this command (including this header)
};

The most important load commands:

LC_SEGMENT_64: defines memory segments

struct segment_command_64 {
    uint32_t  cmd;            // LC_SEGMENT_64
    uint32_t  cmdsize;
    char      segname[16];    // "__TEXT", "__DATA", ...
    uint64_t  vmaddr;         // Virtual address when loaded
    uint64_t  vmsize;         // Size in virtual memory
    uint64_t  fileoff;        // Offset in the file
    uint64_t  filesize;       // Size in the file
    vm_prot_t maxprot;        // Maximum protection (r/w/x)
    vm_prot_t initprot;       // Initial protection
    uint32_t  nsects;         // Number of sections in the segment
    uint32_t  flags;
};
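
As a rough illustration of how this struct maps onto bytes, the sketch below unpacks one `segment_command_64` with Python's `struct` module. It assumes a little-endian file and that `offset` already points at the start of the command; `parse_segment_64` is a hypothetical helper name.

```python
import struct

LC_SEGMENT_64 = 0x19
# cmd, cmdsize, segname[16], vmaddr, vmsize, fileoff, filesize,
# maxprot, initprot, nsects, flags  -> 72 bytes in total
SEGMENT_64_FMT = "<II16sQQQQiiII"


def parse_segment_64(buf: bytes, offset: int = 0) -> dict:
    """Decode one segment_command_64 located at offset in buf."""
    (cmd, cmdsize, segname, vmaddr, vmsize, fileoff, filesize,
     maxprot, initprot, nsects, flags) = struct.unpack_from(SEGMENT_64_FMT, buf, offset)
    if cmd != LC_SEGMENT_64:
        raise ValueError(f"not LC_SEGMENT_64: cmd=0x{cmd:x}")
    return {
        # segname is NUL-padded, and not NUL-terminated when it is 16 chars long
        "segname": segname.split(b"\0", 1)[0].decode("ascii"),
        "vmaddr": vmaddr, "vmsize": vmsize,
        "fileoff": fileoff, "filesize": filesize,
        "maxprot": maxprot, "initprot": initprot,
        "nsects": nsects, "flags": flags, "cmdsize": cmdsize,
    }
```

A `vmsize` larger than `filesize` is normal: the remainder is zero-filled at load time, which is how `__bss` gets its storage.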

Protection bits:

  • VM_PROT_READ (0x1): __TEXT has read
  • VM_PROT_WRITE (0x2): __DATA has write
  • VM_PROT_EXECUTE (0x4): __TEXT has execute
  • __TEXT: r-x (read + execute, no write → W^X policy)
  • __DATA: rw- (read + write, no execute)
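
The protection bits combine as a plain bitmask. The following sketch renders a `maxprot`/`initprot` value in the `r-x` notation used above and flags any mapping that is both writable and executable. The helper names are illustrative only.

```python
VM_PROT_READ = 0x1
VM_PROT_WRITE = 0x2
VM_PROT_EXECUTE = 0x4


def prot_to_str(prot: int) -> str:
    """Render a vm_prot_t bitmask as an 'rwx'-style string."""
    pairs = ((VM_PROT_READ, "r"), (VM_PROT_WRITE, "w"), (VM_PROT_EXECUTE, "x"))
    return "".join(ch if prot & bit else "-" for bit, ch in pairs)


def violates_wx(prot: int) -> bool:
    """True when a mapping is writable and executable at the same time."""
    return bool(prot & VM_PROT_WRITE) and bool(prot & VM_PROT_EXECUTE)
```

For example, `prot_to_str(0x5)` gives `"r-x"` for `__TEXT` and `prot_to_str(0x3)` gives `"rw-"` for `__DATA`.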

LC_CODE_SIGNATURE — Code signing data

Points to the code signature blob inside __LINKEDIT, which contains:

  • Code Directory: a hash of every page
  • CMS Signature: PKCS#7 certificate chain + signature
  • Entitlements blob: plist XML/binary
  • Requirements blob: code requirements
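
The blobs listed above are wrapped in a "SuperBlob" index. Below is a minimal sketch for enumerating them, assuming `sig` holds the bytes that LC_CODE_SIGNATURE's `dataoff`/`datasize` point at. The magic and slot constants are quoted from xnu's `cs_blobs.h` as I recall them and should be checked against the header. Unlike the rest of the Mach-O, these fields are big-endian.

```python
import struct

CSMAGIC_EMBEDDED_SIGNATURE = 0xFADE0CC0  # SuperBlob magic (from cs_blobs.h)
SLOT_NAMES = {                           # slot type -> blob role
    0x0: "CodeDirectory",
    0x2: "Requirements",
    0x5: "Entitlements",
    0x10000: "CMS signature",
}


def list_signature_blobs(sig: bytes) -> list:
    """Return (slot name, blob magic, blob length) for each indexed blob."""
    magic, _length, count = struct.unpack_from(">III", sig, 0)
    if magic != CSMAGIC_EMBEDDED_SIGNATURE:
        raise ValueError(f"not an embedded signature: magic=0x{magic:08x}")
    blobs = []
    for i in range(count):
        # Index entries (type, offset) follow the 12-byte SuperBlob header.
        slot, offset = struct.unpack_from(">II", sig, 12 + 8 * i)
        blob_magic, blob_len = struct.unpack_from(">II", sig, offset)
        blobs.append((SLOT_NAMES.get(slot, hex(slot)), blob_magic, blob_len))
    return blobs
```

Slots not present in the table are reported by their hex number, so unknown blob types remain visible.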

LC_ENCRYPTION_INFO_64 — App Store encryption

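App Store apps ship with an encrypted range inside the __TEXT segment (FairPlay). The command's cryptoff/cryptsize give that range, and cryptid is non-zero while it is still encrypted. The binary must be decrypted before static analysis.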
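
A quick way to check whether a binary still needs decrypting is to read `cryptid` from this command. The sketch below assumes the `encryption_info_command_64` layout from `<mach-o/loader.h>` (six little-endian 32-bit fields); `parse_encryption_info_64` is a hypothetical helper name.

```python
import struct

LC_ENCRYPTION_INFO_64 = 0x2C


def parse_encryption_info_64(buf: bytes, offset: int = 0) -> dict:
    """Decode cmd, cmdsize, cryptoff, cryptsize, cryptid, pad."""
    cmd, _cmdsize, cryptoff, cryptsize, cryptid, _pad = \
        struct.unpack_from("<6I", buf, offset)
    if cmd != LC_ENCRYPTION_INFO_64:
        raise ValueError(f"not LC_ENCRYPTION_INFO_64: cmd=0x{cmd:x}")
    return {
        "cryptoff": cryptoff,        # file offset of the encrypted range
        "cryptsize": cryptsize,      # size of the encrypted range
        "encrypted": cryptid != 0,   # 0 means the range is plaintext
    }
```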

Others

  • LC_SYMTAB — Symbol table location
  • LC_DYSYMTAB — Dynamic symbol table
  • LC_LOAD_DYLIB — Dependency declarations
  • LC_MAIN — Entry point (offset from __TEXT)
  • LC_UUID — Unique build identifier
  • LC_SOURCE_VERSION — Source version info
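
To make the numeric `cmd` values readable while walking load commands, a small lookup table helps. The sketch below names the commands mentioned in this section and decodes two of the simpler payloads, LC_UUID and LC_MAIN. The command numbers come from `<mach-o/loader.h>`; `describe_command` is an illustrative name, and the input is assumed to be little-endian.

```python
import struct
import uuid

LC_REQ_DYLD = 0x80000000  # flag bit OR-ed into commands dyld must understand
LC_NAMES = {
    0x2: "LC_SYMTAB", 0xB: "LC_DYSYMTAB", 0xC: "LC_LOAD_DYLIB",
    0x19: "LC_SEGMENT_64", 0x1B: "LC_UUID", 0x1D: "LC_CODE_SIGNATURE",
    0x2A: "LC_SOURCE_VERSION", 0x2C: "LC_ENCRYPTION_INFO_64",
    0x28 | LC_REQ_DYLD: "LC_MAIN",
}


def describe_command(buf: bytes, offset: int = 0):
    """Return (name, decoded payload or None) for the command at offset."""
    cmd, _cmdsize = struct.unpack_from("<II", buf, offset)
    name = LC_NAMES.get(cmd, f"0x{cmd:x}")
    if name == "LC_UUID":
        (raw,) = struct.unpack_from("<16s", buf, offset + 8)
        return name, str(uuid.UUID(bytes=raw)).upper()
    if name == "LC_MAIN":
        entryoff, stacksize = struct.unpack_from("<QQ", buf, offset + 8)
        return name, {"entryoff": entryoff, "stacksize": stacksize}
    return name, None
```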

3. Sections

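Each segment contains zero or more sections. Their section_64 headers follow the segment_command_64 directly, inside the same load command: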

struct section_64 {
    char      sectname[16];   // "__text", "__cstring", ...
    char      segname[16];    // Parent segment name
    uint64_t  addr;           // Virtual address
    uint64_t  size;
    uint32_t  offset;         // File offset
    uint32_t  align;          // Alignment (power of 2)
    uint32_t  reloff;         // Relocation entries offset
    uint32_t  nreloc;         // Number of relocations
    uint32_t  flags;          // Section type + attributes
    uint32_t  reserved1;      // Indirect symbol table index (for stubs)
    uint32_t  reserved2;      // Stub size (for stubs)
    uint32_t  reserved3;
};
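
Because the section headers sit immediately after their segment command, they can be read with simple offset arithmetic. The sketch below assumes a little-endian file and takes the offset of the owning `segment_command_64` plus its `nsects`. It also shows how `reserved2` yields the stub count for a stub section. `parse_sections` is a hypothetical helper name.

```python
import struct

SEGMENT_64_SIZE = 72                       # sizeof(struct segment_command_64)
SECTION_64_FMT = "<16s16sQQ8I"             # two names, addr, size, 8 x uint32
SECTION_64_SIZE = struct.calcsize(SECTION_64_FMT)  # 80 bytes
SECTION_TYPE_MASK = 0xFF                   # low byte of flags is the section type
S_SYMBOL_STUBS = 0x8


def _name(raw: bytes) -> str:
    return raw.split(b"\0", 1)[0].decode("ascii")


def parse_sections(buf: bytes, seg_offset: int, nsects: int) -> list:
    """Decode the nsects section_64 headers that follow a segment command."""
    sections = []
    for i in range(nsects):
        off = seg_offset + SEGMENT_64_SIZE + i * SECTION_64_SIZE
        (sectname, segname, addr, size, fileoff, align, _reloff, _nreloc,
         flags, reserved1, reserved2, _reserved3) = \
            struct.unpack_from(SECTION_64_FMT, buf, off)
        sec = {"sectname": _name(sectname), "segname": _name(segname),
               "addr": addr, "size": size, "offset": fileoff,
               "alignment": 1 << align,    # align is stored as a power of 2
               "flags": flags}
        if (flags & SECTION_TYPE_MASK) == S_SYMBOL_STUBS and reserved2:
            sec["indirect_index"] = reserved1
            sec["stub_count"] = size // reserved2
        sections.append(sec)
    return sections
```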

Sections that matter for exploitation:

Section            Segment        Contents                     Exploitation relevance
__text             __TEXT         Machine code                 Gadget hunting, code analysis
__stubs            __TEXT         Lazy binding stubs           Indirect calls, hooking points
__const            __TEXT         Read-only constants          vtable locations, method tables
__const            __DATA_CONST   Writable-at-load constants   vtable overwrite (before the segment is locked)
__got              __DATA         Global Offset Table          GOT overwrite attacks
__la_symbol_ptr    __DATA         Lazy symbol pointers         Symbol pointer overwrite
__objc_classlist   __DATA         ObjC class list              Class method swizzling
__bss              __DATA         Zero-initialized data        Uninitialized variable exploitation

4. Fat / Universal Binaries

struct fat_header {
    uint32_t magic;       // 0xCAFEBABE (big-endian!)
    uint32_t nfat_arch;   // Number of architectures
};

struct fat_arch {
    cpu_type_t cputype;
    cpu_subtype_t cpusubtype;
    uint32_t offset;      // Offset to Mach-O for this arch
    uint32_t size;
    uint32_t align;
};

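A fat binary is a container holding several Mach-O slices (arm64, arm64e, x86_64, ...). The matching architecture is picked at load time: by the kernel for the main executable, and by dyld for libraries. The fat header fields are stored big-endian regardless of the byte order of the slices.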
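
Listing the slices of a fat file only requires the two structs above. This is a minimal sketch assuming the 32-bit `fat_arch` layout shown here (a 64-bit variant with a different magic also exists and is not handled); `list_slices` is an illustrative name. Note the `>` format prefix: these fields are big-endian.

```python
import struct

FAT_MAGIC = 0xCAFEBABE
FAT_HEADER_SIZE = 8
FAT_ARCH_SIZE = 20


def list_slices(data: bytes) -> list:
    """Return one dict per architecture slice in a fat binary."""
    magic, nfat_arch = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError(f"not a fat binary: magic=0x{magic:08x}")
    slices = []
    for i in range(nfat_arch):
        cputype, cpusubtype, offset, size, align = struct.unpack_from(
            ">5I", data, FAT_HEADER_SIZE + i * FAT_ARCH_SIZE)
        slices.append({"cputype": cputype, "cpusubtype": cpusubtype,
                       "offset": offset, "size": size,
                       "alignment": 1 << align})  # align is a power of 2
    return slices
```

Each slice can then be carved out as `data[offset:offset + size]` and parsed as an ordinary thin Mach-O.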

5. Kernelcache (MH_FILESET)

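Recent kernelcaches use the MH_FILESET file type, which packs the kernel and all kernel extensions into a single file (the type was introduced alongside macOS 11 / iOS 14; older kernelcaches are MH_EXECUTE images with prelinked kexts):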

Kernelcache (MH_FILESET)
├── kernel (com.apple.kernel)
├── com.apple.iokit.IOSurface
├── com.apple.iokit.IOGraphicsFamily
├── com.apple.driver.AppleARMPlatform
├── com.apple.security.sandbox
├── ... (hundreds of kexts)

Each kext entry is an embedded Mach-O with its own segments and sections.
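
In a file set, each embedded Mach-O is announced by an LC_FILESET_ENTRY load command in the outer header. The sketch below assumes the `fileset_entry_command` layout as declared in recent `<mach-o/loader.h>` (cmd, cmdsize, vmaddr, fileoff, a string offset for the entry id, and a reserved word). Treat both the layout and the command number as assumptions to verify against your SDK. `parse_fileset_entry` is a hypothetical helper name.

```python
import struct

LC_REQ_DYLD = 0x80000000
LC_FILESET_ENTRY = 0x35 | LC_REQ_DYLD  # assumed value, check loader.h


def parse_fileset_entry(buf: bytes, offset: int = 0) -> dict:
    """Decode one fileset_entry_command located at offset in buf."""
    cmd, cmdsize, vmaddr, fileoff, id_off, _reserved = \
        struct.unpack_from("<IIQQII", buf, offset)
    if cmd != LC_FILESET_ENTRY:
        raise ValueError(f"not LC_FILESET_ENTRY: cmd=0x{cmd:x}")
    # The entry id is a NUL-terminated string stored inside the command,
    # at id_off bytes from the start of the command.
    raw = buf[offset + id_off: offset + cmdsize]
    return {"entry_id": raw.split(b"\0", 1)[0].decode("ascii"),
            "vmaddr": vmaddr,
            "fileoff": fileoff}  # file offset of the embedded Mach-O header
```

Seeking to `fileoff` in the kernelcache should land on another `mach_header_64`, which can be parsed like any other Mach-O.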


Hands-On Analysis

Dump Mach-O header

# Show the header
otool -hV /bin/ls

# Show the load commands
otool -lV /bin/ls

# Show the sections
otool -l /bin/ls | grep -A5 "sectname"

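Parsing with Python

import struct
import sys

MH_MAGIC_64 = 0xFEEDFACF  # little-endian 64-bit Mach-O

def parse_macho(path):
    """Print the load commands of a thin 64-bit Mach-O file."""
    with open(path, 'rb') as f:
        magic = struct.unpack('<I', f.read(4))[0]
        if magic != MH_MAGIC_64:
            # Fat (0xCAFEBABE) and 32-bit files need separate handling.
            raise ValueError(f"unsupported magic 0x{magic:08x}")
        # Remaining 28 bytes of mach_header_64; cputype/cpusubtype are signed.
        cputype, cpusubtype, filetype, ncmds, sizeofcmds, flags, reserved = \
            struct.unpack('<iiIIIII', f.read(28))
        print(f"64-bit Mach-O, {ncmds} load commands")
        for i in range(ncmds):
            cmd, cmdsize = struct.unpack('<II', f.read(8))
            print(f"  LC #{i}: cmd=0x{cmd:x}, size={cmdsize}")
            f.seek(cmdsize - 8, 1)  # skip the rest of this command

if __name__ == '__main__':
    parse_macho(sys.argv[1])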

Resources

  • Apple <mach-o/loader.h> — definitive struct definitions
  • LIEF Project — Library cho parsing Mach-O (Python/C++)
  • Jonathan Levin — macOS and iOS Internals Vol. I (Binary Format chapter)
  • Mach-O Wikipedia

Exercises

  1. Hex dump analysis: Open a Mach-O binary in a hex editor and identify the header, the first load command, and the __TEXT segment by hand
  2. Write a Mach-O parser: Write a Python script that parses the header and all load commands
  3. Compare arm64 vs arm64e: Extract the same binary for both architectures and compare the load commands
  4. Extract embedded Mach-Os: Extract one kext Mach-O from an MH_FILESET kernelcache
  5. Modify a load command: Change one LC value with a hex editor and observe the behavior