Assembly Codes and x86 Instruction Set: A Beginner’s Guide

Understanding x86 assembly and its instruction set is essential for grasping how programs interact with hardware. This guide provides an overview of the structure of assembly programs, the x86 instruction set, and how various elements like registers, memory, and the stack operate at runtime. By focusing on the essentials, this article helps you decode disassembled programs and uncover the logic behind them.


Assembly Programs: Structure and Components

Assembly programs consist of four main components:

  1. Instructions: These are the actual operations executed by the CPU (e.g., MOV EAX, 1).
  2. Directives: Commands that guide the assembler but don’t translate directly into machine code (e.g., .text, .data).
  3. Labels: Symbolic names that reference specific locations in the program (e.g., label1:).
  4. Comments: Human-readable annotations ignored by the assembler. NASM and MASM start a comment with a semicolon (e.g., ; This is a comment), while GNU as on x86 uses #.

Key Sections in an Assembly Program

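  1. Data Sections (.data and .rodata): .data stores initialized, writable data, while .rodata stores read-only constants such as string literals.
    Example:
      .section .rodata
      .LC0:
          .string "Hello, World!"
  2. Text Section (.text): Contains the executable code of the program.
    Example (x86-64, GNU as with .intel_syntax noprefix, non-PIE build; # starts a comment):
      .text
      .global main
      main:
          SUB  RSP, 8                  # keep the stack 16-byte aligned for the call
          MOV  EDI, OFFSET FLAT:.LC0   # first argument (EDI): address of the string
          CALL puts                    # print the string
          XOR  EAX, EAX                # return 0 from main
          ADD  RSP, 8
          RET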

x86 Instruction Set: Basics

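In Intel syntax, a two-operand x86 instruction follows the format:
Mnemonic Destination, Source

  • Mnemonic: Represents the operation (e.g., MOV, ADD, CMP).
  • Operands: Specify the data the instruction operates on (registers, memory addresses, or immediate constants). The result is written to the destination operand. Not every instruction takes two operands: RET takes none, and PUSH EAX takes one.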
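Because the format is so regular, splitting a line of disassembly into its parts takes only a few lines of code. The Python sketch below is illustrative only: parse_intel is a hypothetical helper, and it assumes one instruction per line, ;-style comments, and no labels or string literals on the line.

```python
def parse_intel(line: str) -> tuple[str, list[str]]:
    """Split one Intel-syntax instruction into (mnemonic, operands).

    Hypothetical helper: assumes ';' comments and no label on the line.
    """
    code = line.split(";", 1)[0].strip()   # drop any trailing comment
    if not code:
        return "", []                       # blank or comment-only line
    parts = code.split(None, 1)             # mnemonic, then the operand text
    mnemonic = parts[0].upper()
    if len(parts) == 1:
        return mnemonic, []                 # e.g. RET has no operands
    operands = [op.strip() for op in parts[1].split(",")]
    return mnemonic, operands               # destination comes first


print(parse_intel("MOV EAX, [EBX+4*ECX+8]  ; load from memory"))
# ('MOV', ['EAX', '[EBX+4*ECX+8]'])
```

Intel-syntax memory operands contain no commas, so a plain comma split is enough here; AT&T operands such as 8(%ebx,%ecx,4) would need a smarter split.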
Instruction Types
  1. Data Movement: Copies data between registers and memory, or loads constants into them.
    • MOV EAX, 10 – Moves 10 into the EAX register.
    • LEA EAX, [EBX+4] – Computes the address EBX+4 and stores it in EAX without reading memory.
  2. Arithmetic: Performs mathematical operations.
    • ADD EAX, 5 – Adds 5 to the value in EAX.
    • SUB EBX, EAX – Subtracts EAX from EBX.
  3. Logical: Executes bitwise operations.
    • AND EAX, EBX – Performs a bitwise AND between EAX and EBX.
    • OR EAX, 0x1 – Sets the least significant bit in EAX.
  4. Control Flow: Directs the program execution path.
    • JMP label – Unconditionally jumps to a label.
    • JE label – Jumps to a label if the zero flag is set.
  5. Stack Operations: Manages data on the stack.
    • PUSH EAX – Places the value of EAX onto the stack.
    • POP EAX – Removes the top value from the stack into EAX.
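The following Python sketch models a few of these instructions to make their effect on registers concrete. It is a hypothetical toy, not an emulator: run accepts only MOV, ADD, SUB, AND, and OR, operands must be one of four 32-bit registers or an integer constant, and flags, memory, and jumps are ignored. Masking with MASK32 mimics how a 32-bit register wraps around.

```python
MASK32 = 0xFFFFFFFF  # 32-bit registers wrap around modulo 2**32

# Each supported mnemonic maps (dest value, source value) -> new dest value.
OPS = {
    "MOV": lambda d, s: s,
    "ADD": lambda d, s: d + s,
    "SUB": lambda d, s: d - s,
    "AND": lambda d, s: d & s,
    "OR": lambda d, s: d | s,
}


def run(program: list[tuple[str, str, str]]) -> dict[str, int]:
    """Execute (mnemonic, dest, src) triples on a toy register file."""
    regs = dict.fromkeys(["EAX", "EBX", "ECX", "EDX"], 0)

    def value(operand: str) -> int:
        # A register name reads that register; anything else is a constant.
        if operand in regs:
            return regs[operand]
        return int(operand, 0) & MASK32

    for mnemonic, dest, src in program:
        regs[dest] = OPS[mnemonic](regs[dest], value(src)) & MASK32
    return regs


regs = run([
    ("MOV", "EAX", "10"),    # MOV EAX, 10
    ("OR", "EAX", "0x1"),    # OR EAX, 0x1   -> 11
    ("ADD", "EAX", "5"),     # ADD EAX, 5    -> 16
    ("MOV", "EBX", "3"),     # MOV EBX, 3
    ("SUB", "EBX", "EAX"),   # SUB EBX, EAX  -> 3 - 16 wraps around
])
print(regs["EAX"], hex(regs["EBX"]))  # 16 0xfffffff3
```

SUB EBX, EAX with EBX = 3 and EAX = 16 leaves 0xFFFFFFF3 in EBX: that bit pattern is -13 in two's complement.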

Registers in x86 Architecture

  1. General-Purpose Registers:
    • EAX, EBX, ECX, EDX – Used for arithmetic, data storage, and loop control.
    • ESI, EDI – Source and destination indexes, often for string operations.
    • ESP, EBP – Stack pointer and base pointer, used for stack management.
  2. Special-Purpose Registers:
    • Instruction Pointer (EIP): Points to the next instruction to execute.
    • Flags Register (EFLAGS): Tracks the results of operations (e.g., zero flag, sign flag).
  3. Segment Registers:
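    • CS, DS, SS, ES, FS, GS – Select memory segments: CS for code, DS for data, SS for the stack, and ES, FS, GS as extra segments. In modern flat-memory operating systems, FS and GS are mainly used to locate thread-specific or per-CPU data.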
  4. Control and Debug Registers:
    • CR0, CR2-CR4 – Control processor operating modes and paging (CR1 is reserved).
    • DR0-DR7 – Provide hardware support for breakpoints and debugging.

Memory Operands in x86

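x86 memory operands compute an address with the formula:
Base + (Index * Scale) + Displacement

where Scale can be 1, 2, 4, or 8. Any of the parts may be omitted.

Example:

MOV EAX, [EBX+4*ECX+8]
  • Base: EBX
  • Index: ECX (multiplied by a scale factor of 4)
  • Displacement: 8

The instruction loads the 32-bit value stored at address EBX + 4*ECX + 8 into EAX.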
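The address arithmetic is easy to reproduce in code. This Python sketch (with a hypothetical effective_address helper and made-up register values) computes the address used by MOV EAX, [EBX+4*ECX+8]. LEA performs the same calculation but keeps the address instead of loading from it.

```python
VALID_SCALES = (1, 2, 4, 8)  # the only scale factors x86 can encode


def effective_address(base: int = 0, index: int = 0, scale: int = 1,
                      displacement: int = 0) -> int:
    """Return Base + Index*Scale + Displacement, wrapped to 32 bits."""
    if scale not in VALID_SCALES:
        raise ValueError(f"scale must be one of {VALID_SCALES}, got {scale}")
    return (base + index * scale + displacement) & 0xFFFFFFFF


# MOV EAX, [EBX+4*ECX+8] with made-up values EBX = 0x1000, ECX = 3
addr = effective_address(base=0x1000, index=3, scale=4, displacement=8)
print(hex(addr))  # 0x1014
```

One common reading of this pattern is array indexing: if an array of 4-byte integers starts at EBX+8, the operand selects element number ECX.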

The Stack in x86

The stack is a Last-In-First-Out (LIFO) data structure used for function calls, local variables, and return addresses.

  • PUSH: Adds data to the top of the stack.
  • POP: Removes data from the top of the stack.
Stack Example
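PUSH 5      ; Pushes 5 onto the stack
PUSH 10     ; Pushes 10 onto the stack
POP EAX     ; Pops the top value (10) into EAX

Because the stack grows toward lower addresses, each PUSH in 32-bit code subtracts 4 from ESP and writes the value at [ESP], and each POP reads [ESP] and then adds 4 to ESP. After this sequence, EAX holds 10 and the value 5 is once again on top of the stack.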
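To see the downward growth in action, here is a Python model of a 32-bit stack. Stack32 and its starting ESP value are hypothetical, and memory is a plain dictionary from address to value rather than real RAM.

```python
class Stack32:
    """Toy model of the x86 stack in 32-bit code.

    ESP points at the top item; PUSH moves it down 4 bytes, POP moves it up.
    """

    def __init__(self, esp: int = 0x0010_0000) -> None:  # arbitrary start
        self.esp = esp
        self.memory: dict[int, int] = {}

    def push(self, value: int) -> None:
        self.esp -= 4                               # grow toward lower addresses
        self.memory[self.esp] = value & 0xFFFFFFFF  # store at the new top

    def pop(self) -> int:
        value = self.memory[self.esp]               # read the current top
        self.esp += 4                               # release the slot
        return value


stack = Stack32()
stack.push(5)        # PUSH 5
stack.push(10)       # PUSH 10
eax = stack.pop()    # POP EAX
print(eax, hex(stack.esp))  # 10 0xffffc
```

Like the real instruction, pop does not erase the old slot; it only moves ESP, so the value 10 stays in memory until a later PUSH overwrites it.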


AT&T vs. Intel Syntax

x86 assembly is written in two common syntaxes:

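  1. Intel Syntax (used in this guide):
    • Destination operand appears first (MOV EAX, 10).
    • Operand size is inferred from the registers involved (MOV AL, 1 is a byte move, MOV EAX, 1 a 32-bit move). When a memory operand leaves the size ambiguous, a size keyword states it: DWORD PTR in MASM and GNU as, DWORD in NASM.
  2. AT&T Syntax (the default for GNU tools such as objdump and gdb):
    • Source operand appears first, registers take a % prefix, and constants take a $ prefix (movl $10, %eax).
    • Uses suffixes to indicate operand size (b for byte, w for word, l for long/32-bit, q for quad/64-bit).
    • Memory operands are written displacement(base, index, scale), so [EBX+4*ECX+8] becomes 8(%ebx,%ecx,4).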
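The operand reversal and prefixes can be captured in a short Python converter. It is a sketch under narrow assumptions: intel_to_att is hypothetical, handles only two-operand instructions whose destination is a 32-bit register and whose source is a register or integer constant, and always adds the l suffix.

```python
REGS32 = {"EAX", "EBX", "ECX", "EDX", "ESI", "EDI", "ESP", "EBP"}


def _att_operand(op: str) -> str:
    """Prefix a register with % or an integer constant with $."""
    op = op.strip()
    if op.upper() in REGS32:
        return "%" + op.lower()
    int(op, 0)                       # raises ValueError if not an integer
    return "$" + op


def intel_to_att(mnemonic: str, dest: str, src: str) -> str:
    """Rewrite a two-operand Intel instruction in AT&T syntax."""
    if dest.strip().upper() not in REGS32:
        raise ValueError("this sketch only supports a 32-bit register destination")
    # AT&T puts the source first and adds a size suffix ('l' = 32 bits).
    return f"{mnemonic.lower()}l {_att_operand(src)}, {_att_operand(dest)}"


print(intel_to_att("MOV", "EAX", "10"))   # movl $10, %eax
print(intel_to_att("SUB", "EBX", "EAX"))  # subl %eax, %ebx
```

Disassemblers such as objdump usually omit the suffix when a register operand already fixes the size, printing mov $0xa,%eax rather than movl.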

Conditional Branching and Status Flags

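CMP subtracts the source operand from the destination, discards the result, and updates the flags in the EFLAGS register. Conditional jumps then test those flags:

  • Zero Flag (ZF): Set if the result is zero (after CMP, the operands were equal).
  • Sign Flag (SF): Set if the result is negative (its most significant bit is 1).
  • Carry Flag (CF): Set if an unsigned operation produces a carry or borrow (after CMP, the destination was below the source as an unsigned number).
  • Overflow Flag (OF): Set if the result overflows the signed range of the operand size.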

Example:

CMP EAX, EBX     ; Compare EAX with EBX
JE label         ; Jump to label if EAX == EBX
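The flag logic behind CMP can be modeled in Python. This sketch (hypothetical helper names, 32-bit operands only) computes ZF, SF, CF, and OF the way CMP does, with CF marking an unsigned borrow. Besides JE, it evaluates JL (signed less-than, taken when SF differs from OF) and JB (unsigned below, taken when CF is set), two real conditional jumps not covered above.

```python
MASK32 = 0xFFFFFFFF
SIGN_BIT = 0x80000000


def cmp_flags(dest: int, src: int) -> dict[str, bool]:
    """Model the flags CMP dest, src sets on 32-bit operands.

    CMP computes dest - src, throws the result away, and keeps the flags.
    """
    a = dest & MASK32                 # reinterpret as 32-bit bit patterns
    b = src & MASK32
    result = (a - b) & MASK32
    return {
        "ZF": result == 0,                        # operands were equal
        "SF": bool(result & SIGN_BIT),            # result's top bit is 1
        "CF": a < b,                              # unsigned borrow
        # Signed overflow: the operands had different signs and the
        # result's sign differs from the destination's sign.
        "OF": bool((a ^ b) & (a ^ result) & SIGN_BIT),
    }


def je_taken(flags: dict[str, bool]) -> bool:
    return flags["ZF"]                    # equal


def jl_taken(flags: dict[str, bool]) -> bool:
    return flags["SF"] != flags["OF"]     # signed less than


def jb_taken(flags: dict[str, bool]) -> bool:
    return flags["CF"]                    # unsigned below


flags = cmp_flags(-1, 1)    # CMP EAX, EBX with EAX = -1 (0xFFFFFFFF), EBX = 1
print(je_taken(flags), jl_taken(flags), jb_taken(flags))  # False True False
```

The same bit pattern compares differently depending on the jump: 0xFFFFFFFF is -1 to JL but 4,294,967,295 to JB, which is why compilers emit JL-family jumps for signed comparisons and JB-family jumps for unsigned ones.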

Conclusion

Understanding x86 assembly and its instruction set is invaluable for reverse engineering, malware analysis, and low-level programming. By breaking down programs into their components, mastering registers, and analyzing memory operations, you can decode the behavior of even complex binaries.
