The 80x86 Opcode Summary

Prepared by Tom Novelli, August 2003
Revised, August 2009


Caveat Emptor - Missing Pieces

The reason I wrote this was to show how the core 80x86 instruction set is organized, as concisely as possible, to help me write compilers and assemblers for use under Linux and other "modern" systems.

The non-core instructions are relatively straightforward, but you won't find anything about them here. What's missing:

  • FPU instructions
  • MMX, SSE, SSE2/3/4+, and other extensions
  • Supervisor mode instructions
  • Segmentation cruft
  • 16-bit "Real Mode" cruft

Notation & Terminology

All numbers are in OCTAL unless otherwise indicated. Viewed in the proper radix, the x86 is a thing of beauty, almost :-)

Here's the general format of machine code instructions, i.e. the instruction encoding:

prefix(es)  opcode  [xrm  [sib]]  [disp]  [imm]

prefixes
Zero or more prefix bytes that affect the following opcode.
opcode
One or two opcode bytes encode the instruction, e.g. MOV, ADD, SUB, etc. Some simple instructions (e.g. PUSH, POP) encode their register operand in the opcode itself.
xrm
Also called the ModRM or Mod-Reg-R/M byte: more on this in the next section.
sib
Scale-Index-Base byte: an "overflow byte" for XRM; more later.
disp
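Displacement: a constant 8-bit or 32-bit offset added to the address computed from XRM/SIB, or a 32-bit absolute memory address when there's no base register. In relative CALL/JMP/Jcc instructions it's instead a short (8-bit) or near (32-bit) offset from the next instruction.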
imm
Immediate: i.e., a constant, a literal value. Used when the source operand is hard-coded in the instruction. Can be 8, 16, or 32 bits depending on prefixes and opcode.
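To make the field order concrete, here is a minimal Python sketch that glues the fields together and prints the result in octal, the way this summary writes opcodes. The helper names `assemble` and `octal` are hypothetical, made up for illustration. The example bytes encode MOV DWORD [EBX+8], 5 using the 306+w x0m form listed under MOV.

```python
def assemble(prefixes=b"", opcode=b"", xrm=None, sib=None, disp=b"", imm=b""):
    """Concatenate fields in order: prefix(es) opcode [xrm [sib]] [disp] [imm]."""
    out = bytearray(prefixes) + bytearray(opcode)
    if xrm is not None:
        out.append(xrm)
        if sib is not None:  # a SIB byte only ever follows an XRM byte
            out.append(sib)
    out += disp
    out += imm
    return bytes(out)


def octal(code):
    """Show machine code as in this summary: 3 octal digits per byte."""
    return " ".join(f"{b:03o}" for b in code)


# MOV DWORD [EBX+8], 5: opcode 307, XRM 103 (x=1: disp8 follows, m=3: EBX),
# then disp8 = 8 and imm32 = 5 (little-endian).
code = assemble(opcode=bytes([0o307]), xrm=0o103,
                disp=bytes([8]), imm=(5).to_bytes(4, "little"))
print(octal(code))  # 307 103 010 005 000 000 000
```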

A few more terms and abbreviations used below:

word
The machine's word size: 32 bits in 32-bit mode, 16 bits in 16-bit mode. Default for most instructions; the OPSIZ prefix overrides the operand size, and ADRSIZ overrides the address size.
reg
Register operand
r/m
Register-or-Memory operand

Opcode Encodings

Here's an example:

Encoding Mnemonic Notes
210+dw xrm MOV r/m, reg  
  • 210 is the base opcode, in octal.
  • +dw means you may add a direction flag to bit 1 of the opcode, and you may add a word-size flag to bit 0.
  • xrm means an XRM byte must follow the opcode.
  • Anywhere there's a letter in place of an octal digit, substitute a 3-bit number (2-bit for the most significant digit, of course).

Most opcodes include one or two of the following flags, almost always in these bits:

+w bit 0 Word size: 0=byte, 1=word
+d bit 1 Direction: reverse src,dest (Applies to MOV, ALU)
+s bit 1 Sign-extend imm8 to word (Applies to PUSH, ALU, IMUL3)
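A minimal sketch of how the flags combine with a base opcode, assuming the bit positions in the table above; `opcode` is a hypothetical helper name. Since +d and +s share bit 1, the sketch rejects setting both.

```python
def opcode(base, d=0, w=0, s=0):
    """Add the optional flags to a base opcode.

    +d and +s both live in bit 1; which one applies depends on the instruction.
    """
    if d and s:
        raise ValueError("an opcode takes a +d or a +s flag, never both")
    return base + ((d | s) << 1) + w


print(f"{opcode(0o210, w=1):03o}")       # 211: MOV r/m32, reg32 (hex 89)
print(f"{opcode(0o210, d=1, w=1):03o}")  # 213: MOV reg32, r/m32 (hex 8B)
print(f"{opcode(0o200, s=1, w=1):03o}")  # 203: ALU r/m32, imm8 (hex 83)
```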

Operand Encodings

Most of the instruction set's apparent complexity can be factored out as follows.

Register Encodings

Reg No. 0 1 2 3 4 5 6 7
byte AL CL DL BL AH CH DH BH
word AX CX DX BX SP BP SI DI
dword EAX ECX EDX EBX ESP EBP ESI EDI
sreg ES CS SS DS FS GS    

These registers all have names corresponding to their special purposes. Notice that they're sorted by their encoding, not alphabetically as in all the manuals. (As if machine code hacking weren't confusing enough :-)

  • AX = Accumulator (for arithmetic, logic, etc.)
  • CX = Counter (for LOOP, REP, etc.)
  • DX = Data (e.g. in IN/OUT) or Double (e.g. in MUL/DIV)
  • BX = Base (for base+displacement addressing)
  • SP = Stack Pointer
  • BP = Base Pointer (aka Stack Frame Pointer)
  • SI = Source Index (for string operations)
  • DI = Destination Index (for string operations)

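The segment registers are nearly obsolete. Under any Unix-like OS, the CS (Code), SS (Stack), DS (Data), and ES segments all have a base address of 0 and span the whole address space (the "flat" memory model); FS or GS is typically used only for thread-local storage. This makes it a lot easier to do nifty things like JIT, "Cheney on the MTA" GC, etc.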

XRM (ModRM) Encoding

When an instruction has two operands, they are usually encoded in the byte right after the opcode, called the "XRM" byte (officially it's "ModRM", but that doesn't fit neatly into our little ASCII diagrams!). If a memory operand includes a constant address or offset (whether absolute or relative to a register), that constant is called the displacement; it can be a full 32 bits, shown as disp32 below, or a "short" 8 bits (sign-extended), shown as disp8.

xxrrrmmm (binary), best viewed as 3 octal digits:
x: modifier flag (indicates one of four cases)
r: register (normally the source operand)
m: reg/mem (normally the destination operand)
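Encoding  Follows  Reg/Mem Addressing Mode        Notes
0rm       -        DS:[base]
0r5       disp32   DS:[disp32]                    Can't use EBP as Base with no disp (use 1r5 with disp8 = 0)
1rm       disp8    DS:[base + disp8]
2rm       disp32   DS:[base + disp32]
xr4       sib      SIB byte follows (see below)   Can't use ESP as Base (x = 0, 1, 2 only)
3rm       -        reg                            Register operand, no memory access

Note: The 0r5 and xr4 rows are special cases that override the general 0rm, 1rm, and 2rm rows.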
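Here's a minimal sketch of how an assembler might pick the XRM form for a [base + disp] operand, assuming 32-bit mode; `xrm` and `mem_operand` are hypothetical helper names. It applies both special cases from the table: EBP as a base always gets a displacement, and ESP as a base needs a SIB byte (044: no index, base ESP).

```python
EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI = range(8)


def xrm(x, r, m):
    """Pack the three fields; in octal the byte reads simply as the digits x r m."""
    return (x << 6) | (r << 3) | m


def mem_operand(reg, base, disp=0):
    """XRM byte (plus SIB if needed) and displacement for [base + disp]."""
    if disp == 0 and base != EBP:      # 0rm: no displacement
        x, tail = 0, b""
    elif -128 <= disp <= 127:          # 1rm: sign-extended disp8
        x, tail = 1, disp.to_bytes(1, "little", signed=True)
    else:                              # 2rm: disp32
        x, tail = 2, disp.to_bytes(4, "little", signed=True)
    if base == ESP:
        # m=4 means "SIB follows"; SIB 044 = no index, base ESP.
        return bytes([xrm(x, reg, 4), 0o044]) + tail
    return bytes([xrm(x, reg, base)]) + tail


# MOV [EBP], EAX: EBP can't use 0r5, so it gets x=1 with disp8 = 0.
print((bytes([0o211]) + mem_operand(EAX, EBP)).hex(" "))  # 89 45 00
```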

SIB (Scale*Index+Base) Encoding

More complex memory references are encoded in a second "SIB" byte, which always follows an "XRM" byte.

ssiiibbb (binary), again, best viewed as 3 octal digits:
s: Scale (the Index register is multiplied by 1, 2, 4, or 8 for s = 0..3)
i: Index register
b: Base register
Encoding  Follows     Reg/Mem Addressing Mode            Notes
0r4 sib   -           DS:[base + scale*index]
0r4 si5   disp32      DS:[scale*index + disp32]          Can't use EBP as Base w/o disp
1r4 sib   disp8       DS:[base + scale*index + disp8]
2r4 sib   disp32      DS:[base + scale*index + disp32]
xr4 04b   (per x)     DS:[base (+ disp)]                 Index 4 = none, so can't use ESP as Index
xr4 s4b   (per x)     DS:[base (+ disp)]                 Scale is ignored when there's no index; use s = 0
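A minimal sketch of SIB encoding under the same assumptions (32-bit mode, hypothetical helper names `sib` and `indexed`). The displacement choice mirrors the XRM case: with x=0, a base of 5 would mean "no base, disp32", so EBP always gets at least a disp8.

```python
EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI = range(8)
SCALE_BITS = {1: 0, 2: 1, 4: 2, 8: 3}  # s = log2(scale)


def sib(scale, index, base):
    if index == ESP:
        raise ValueError("index 4 means 'no index', so ESP can't be an index")
    return (SCALE_BITS[scale] << 6) | (index << 3) | base


def indexed(reg, base, index, scale, disp=0):
    """XRM + SIB + displacement for [base + scale*index + disp]."""
    if disp == 0 and base != EBP:      # with x=0, base 5 would mean "no base"
        x, tail = 0, b""
    elif -128 <= disp <= 127:
        x, tail = 1, disp.to_bytes(1, "little", signed=True)
    else:
        x, tail = 2, disp.to_bytes(4, "little", signed=True)
    return bytes([(x << 6) | (reg << 3) | 4, sib(scale, index, base)]) + tail


# MOV EAX, [EBX + ESI*4 + 8]
print(" ".join(f"{b:03o}" for b in bytes([0o213]) + indexed(EAX, EBX, ESI, 4, 8)))
# 213 104 263 010
```

The same helper covers LEA: LEA EAX, [EBX + ECX*2] is 215 followed by `indexed(EAX, EBX, ECX, 2)`, i.e. 215 004 113.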

Instruction Encodings

MOV

MOV dest, src

Encoding Mnemonic Notes
210+dw xrm MOV r/m, reg  
214+d xsm MOV r/m, sreg Segments not used in Unix
240+dw disp MOV acc, mem  
26r imm8 MOV reg, imm8  
27r imm32 MOV reg, imm32 bit 3 = 'W' bit
306+w x0m imm MOV r/m, imm  
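The 26r/27r forms need no XRM byte at all, since the register sits in the opcode's low octal digit. A minimal sketch, assuming 32-bit mode; `mov_reg_imm` is a hypothetical helper name.

```python
def mov_reg_imm(reg, value, byte=False):
    """MOV reg, imm using the short forms with the register in the opcode."""
    if byte:                                   # 26r imm8: AL, CL, ..., BH
        return bytes([0o260 + reg, value & 0xFF])
    # 27r imm32: EAX, ECX, ..., EDI (negative values stored as two's complement)
    return bytes([0o270 + reg]) + (value & 0xFFFFFFFF).to_bytes(4, "little")


print(mov_reg_imm(0, 1).hex(" "))  # b8 01 00 00 00  (MOV EAX, 1)
```

Note that register 4 means AH in the byte form but ESP in the word form, per the register table.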

LEA

LEA dest, src -- Load effective address (store address of src in dest)

215 xrm LEA reg, r/m

XCHG

206+w xrm XCHG reg, r/m  
22r XCHG EAX, reg (XCHG EAX,EAX = 220 = NOP :-)

Arithmetic and Logic (ALU)

??? dest, src

Eight instructions following a pattern, with three different forms.

0p0+dw xrm ??? r/m, reg  
0p4+w imm ??? acc, imm  
200+sw xpm imm ??? r/m, imm 202 (s=1, w=0) is a redundant alias of 200; invalid in 64-bit mode
... p=0 ADD  
... p=1 OR  
... p=2 ADC  
... p=3 SBB  
... p=4 AND  
... p=5 SUB  
... p=6 XOR  
... p=7 CMP Read-only SUB; sets FLAGS only.
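Because p is just the middle octal digit, one table drives all eight operations. A minimal sketch for register operands in 32-bit mode; `alu_reg_reg` and `alu_reg_imm` are hypothetical helper names. It chooses 203 (s=1, imm8) when the constant fits a signed byte.

```python
EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI = range(8)
ALU = ["ADD", "OR", "ADC", "SBB", "AND", "SUB", "XOR", "CMP"]  # index = p


def alu_reg_reg(op, dest, src):
    """0p0+dw with d=0, w=1 (opcode 0p1); XRM 3rm: both operands are registers."""
    p = ALU.index(op)
    return bytes([(p << 3) | 1, 0o300 | (src << 3) | dest])


def alu_reg_imm(op, dest, imm):
    """200+sw with w=1: s=1 (203, imm8) if imm fits a signed byte, else 201 imm32."""
    xpm = 0o300 | (ALU.index(op) << 3) | dest
    if -128 <= imm <= 127:
        return bytes([0o203, xpm]) + imm.to_bytes(1, "little", signed=True)
    return bytes([0o201, xpm]) + (imm & 0xFFFFFFFF).to_bytes(4, "little")


print(alu_reg_imm("SUB", ESP, 16).hex(" "))  # 83 ec 10
```

This sketch never picks the shorter 0p4+w acc, imm form for EAX; a real assembler might.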

TEST dest, src

Read-only AND; sets FLAGS but discards its result.

204+w xrm TEST reg, r/m
250+w imm TEST acc, imm
366+w x0m imm TEST r/m, imm

??? src

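NOT and NEG modify r/m in place. MUL, IMUL, DIV, and IDIV use the accumulator as an implicit operand: with word operands, MUL computes EDX:EAX = EAX * src, while DIV divides EDX:EAX by src, leaving the quotient in EAX and the remainder in EDX.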

366+w x2m NOT r/m
366+w x3m NEG r/m
366+w x4m MUL r/m
366+w x5m IMUL r/m
366+w x6m DIV r/m
366+w x7m IDIV r/m
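In this group the middle digit of the XRM byte is an opcode extension rather than a register. A minimal sketch of decoding it, plus a model of unsigned 32-bit DIV's implicit EDX:EAX operand; `decode_group3` and `div32` are hypothetical names, and the sketch treats /1 as undefined.

```python
GROUP3 = ["TEST", None, "NOT", "NEG", "MUL", "IMUL", "DIV", "IDIV"]  # /1 undocumented


def decode_group3(opcode, xrm):
    """The middle octal digit of the XRM byte picks the operation."""
    if opcode not in (0o366, 0o367):
        raise ValueError("not a 366+w opcode")
    size = "word" if opcode & 1 else "byte"
    return GROUP3[(xrm >> 3) & 7], size


def div32(edx, eax, divisor):
    """Unsigned DIV r/m32: EDX:EAX / divisor -> (new EAX, new EDX)."""
    if divisor == 0:
        raise ArithmeticError("#DE: divide by zero")
    quotient, remainder = divmod((edx << 32) | eax, divisor)
    if quotient > 0xFFFFFFFF:
        raise ArithmeticError("#DE: quotient doesn't fit in EAX")
    return quotient, remainder


print(decode_group3(0o367, 0o363))  # ('DIV', 'word'), i.e. DIV EBX
```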

IMUL dest, src (Integer Multiply)   IMUL dest, src, imm (Strange RISC-like form :-)

017 257 xrm IMUL reg, r/m (reg * r/m -> reg)
151+s xrm imm IMUL reg, r/m, imm (r/m * imm -> reg)
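A minimal sketch of both IMUL forms with register operands, assuming 32-bit mode; `imul2` and `imul3` are hypothetical helper names. In both, the XRM r field holds the destination.

```python
EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI = range(8)


def imul2(dest, src):
    """IMUL reg, r/m (017 257 xrm) with a register source."""
    return bytes([0o017, 0o257, 0o300 | (dest << 3) | src])


def imul3(dest, src, imm):
    """IMUL reg, r/m, imm: 153 with imm8 (s=1) or 151 with imm32 (s=0)."""
    xrm = 0o300 | (dest << 3) | src  # r field = destination
    if -128 <= imm <= 127:
        return bytes([0o153, xrm]) + imm.to_bytes(1, "little", signed=True)
    return bytes([0o151, xrm]) + (imm & 0xFFFFFFFF).to_bytes(4, "little")


print(imul3(EAX, ECX, 10).hex(" "))  # 6b c1 0a
```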

INC/DEC

10r INC reg32
11r DEC reg32
376+w x0m INC r/m
376+w x1m DEC r/m

PUSH/POP

Same pattern as INC/DEC.

12r PUSH reg32
13r POP reg32
150+s imm PUSH imm
377 x6m PUSH r/m
217 x0m POP r/m
140 PUSHA
141 POPA
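A minimal sketch assembling a conventional 32-bit stack-frame prologue and epilogue from the 12r/13r forms plus MOV (211); the helper names are hypothetical.

```python
EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI = range(8)


def push(reg):
    return bytes([0o120 + reg])  # 12r


def pop(reg):
    return bytes([0o130 + reg])  # 13r


def push_imm(value):
    """150+s: 152 imm8 (sign-extended) when it fits, else 150 imm32."""
    if -128 <= value <= 127:
        return bytes([0o152]) + value.to_bytes(1, "little", signed=True)
    return bytes([0o150]) + (value & 0xFFFFFFFF).to_bytes(4, "little")


MOV_RM_REG = 0o211  # MOV r/m32, reg32
prologue = push(EBP) + bytes([MOV_RM_REG, 0o345])                  # PUSH EBP; MOV EBP, ESP
epilogue = bytes([MOV_RM_REG, 0o354]) + pop(EBP) + bytes([0o303])  # MOV ESP, EBP; POP EBP; RET
print(prologue.hex(" "), "/", epilogue.hex(" "))  # 55 89 e5 / 89 ec 5d c3
```

The first two instructions of the epilogue are what LEAVE (311) does in a single byte.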

Shift/Rotate

Seven documented instructions following a pattern, with three different forms (p=6 is undocumented).

300+w xpm imm8 ??? r/m, imm8 Shift/rotate by a constant (count masked to 5 bits)
320+w xpm ??? r/m, 1 Shift/rotate by one
322+w xpm ??? r/m, CL Shift/rotate by value in CL register (masked the same way)
... p=0 ROL  
... p=1 ROR  
... p=2 RCL  
... p=3 RCR  
... p=4 SHL/SAL  
... p=5 SHR  
... p=7 SAR  
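A minimal sketch for 32-bit register operands, choosing the shorter "by one" form when the count is 1; `shift_imm` and `shift_cl` are hypothetical helper names.

```python
EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI = range(8)
SHIFT_P = {"ROL": 0, "ROR": 1, "RCL": 2, "RCR": 3,
           "SHL": 4, "SAL": 4, "SHR": 5, "SAR": 7}


def shift_imm(op, reg, count):
    """Shift/rotate a 32-bit register by a constant: 321 (by 1) or 301 imm8."""
    xpm = 0o300 | (SHIFT_P[op] << 3) | reg
    if count == 1:
        return bytes([0o321, xpm])
    return bytes([0o301, xpm, count & 0xFF])


def shift_cl(op, reg):
    """Shift/rotate a 32-bit register by CL: 323 xpm."""
    return bytes([0o323, 0o300 | (SHIFT_P[op] << 3) | reg])


print(shift_imm("SHL", EAX, 3).hex(" "))  # c1 e0 03
```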

BCD Conversion

047 DAA  
057 DAS  
067 AAA  
077 AAS  
324 012 AAM 012 specifies base 10. Some 80x86 chips accept others.
325 012 AAD  

Zero/Sign Extend

Load a byte or 16-bit value into a full-width register, filling the upper bits.

017 266+w xrm MOVZX reg, r/m Zero-extend r/m8 (w=0) or r/m16 (w=1)
017 276+w xrm MOVSX reg, r/m Sign-extend r/m8 (w=0) or r/m16 (w=1)

Sign-extend the accumulator; typically used before IDIV.

230 CBW / CWDE Half-width to full-width (AX -> EAX)
231 CWD / CDQ Full-width to double-width (EAX -> EDX:EAX)
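A minimal sketch modelling what these instructions compute on 32-bit values (the function names are hypothetical, not real APIs). For reference, MOVZX EAX, BL assembles to 017 266 303.

```python
def movzx8(value):
    """MOVZX r32, r/m8: keep the low byte, clear the upper 24 bits."""
    return value & 0xFF


def movsx8(value):
    """MOVSX r32, r/m8: copy bit 7 of the byte into the upper 24 bits."""
    value &= 0xFF
    return (value | 0xFFFFFF00) if value & 0x80 else value


def cdq(eax):
    """CDQ: fill EDX with copies of EAX's sign bit; returns (EDX, EAX)."""
    return (0xFFFFFFFF if eax & 0x80000000 else 0), eax


print(hex(movsx8(0xFF)), hex(movzx8(0xFF)))  # 0xffffffff 0xff
```

CDQ before IDIV makes EDX:EAX a correctly signed 64-bit dividend; before an unsigned DIV you would zero EDX instead.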

Control Transfer

160+cc disp8 Jcc (short)  
017 200+cc disp32 Jcc (near)  
017 220+cc x0m SETcc r/m8  
340 disp8 LOOPNE  
341 disp8 LOOPE  
342 disp8 LOOP  
343 disp8 JECXZ Jump if ECX = 0 (JCXZ in 16-bit mode)
350 disp CALL disp Relative displacement
377 x2m CALL r/m Absolute address
351 disp JMP disp Relative (near)
353 disp8 JMP disp8 Relative (short)
377 x4m JMP r/m Absolute
303 RET
302 imm16 RET imm Pops return address, then drops imm16 bytes (e.g. arguments) from stack
310 imm16 imm8 ENTER locals, nesting Considered obsolete
311 LEAVE Same as MOV ESP, EBP; POP EBP
313 RET FAR Pops EIP, then CS
312 imm16 RET FAR imm ... and drops imm16 bytes
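Relative displacements are measured from the end of the jump or call, i.e. the address of the next instruction. A minimal sketch, assuming the instruction and target addresses are already known; `rel32`, `call`, and `jcc` are hypothetical helper names, and out-of-range targets aren't checked.

```python
def rel32(insn_addr, insn_len, target):
    """Relative offsets count from the END of the instruction (next EIP)."""
    offset = target - (insn_addr + insn_len)
    return (offset & 0xFFFFFFFF).to_bytes(4, "little")


def call(at, target):
    return bytes([0o350]) + rel32(at, 5, target)  # 350 disp32: 5 bytes total


def jcc(cc, at, target):
    """Prefer the 2-byte short form 160+cc; fall back to 017 200+cc disp32."""
    short = target - (at + 2)
    if -128 <= short <= 127:
        return bytes([0o160 + cc, short & 0xFF])
    return bytes([0o017, 0o200 + cc]) + rel32(at, 6, target)


print(call(0x1000, 0x2000).hex(" "))  # e8 fb 0f 00 00
```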

Condition Codes (note that most Jcc instructions have several aliases):

cc  Mnemonics  Flags           Operation     Long-Winded Name
00  o          OF=1                          Overflow
01  no         OF=0                          Not Overflow
02  c b nae    CF=1            < unsigned    Carry / Below / Not Above/Equal
03  nc nb ae   CF=0            >= unsigned   Not Carry / Not Below / Above/Equal
04  z e        ZF=1            ==            Zero / Equal
05  nz ne      ZF=0            !=            Not Zero / Not Equal
06  be na      CF=1 | ZF=1     <= unsigned   Below/Equal / Not Above
07  nbe a      CF=0 & ZF=0     > unsigned    Not Below/Equal / Above
10  s          SF=1            < 0           Sign bit (Negative)
11  ns         SF=0            >= 0          Not Sign (Non-negative)
12  p pe       PF=1                          Parity (Even)
13  np po      PF=0                          No Parity (Odd)
14  l nge      SF<>OF          <             Less / Not Greater/Equal
15  nl ge      SF==OF          >=            Not Less / Greater/Equal
16  le ng      ZF=1 | SF<>OF   <=            Less/Equal / Not Greater
17  nle g      ZF=0 & SF==OF   >             Not Less/Equal / Greater
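The table has a neat structure: cc >> 1 picks one of eight flag tests, and an odd cc negates it. A minimal sketch that computes the flags CMP a, b would set on 32-bit values and then evaluates a condition code; `cmp_flags` and `condition` are hypothetical names, and AF is left out because no Jcc reads it.

```python
def cmp_flags(a, b, bits=32):
    """Flags that CMP a, b (i.e. a - b) would set; only the ones Jcc reads."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a, b = a & mask, b & mask
    r = (a - b) & mask
    return {
        "CF": int(a < b),                              # unsigned borrow
        "ZF": int(r == 0),
        "SF": int(bool(r & sign)),
        "OF": int(bool((a ^ b) & (a ^ r) & sign)),     # signed overflow
        "PF": int(bin(r & 0xFF).count("1") % 2 == 0),  # even parity of low byte
    }


def condition(cc, f):
    """Evaluate condition code cc (0o00..0o17); an odd cc negates cc - 1."""
    tests = [
        f["OF"],                        # 00/01: o, no
        f["CF"],                        # 02/03: b, ae
        f["ZF"],                        # 04/05: e, ne
        f["CF"] | f["ZF"],              # 06/07: be, a
        f["SF"],                        # 10/11: s, ns
        f["PF"],                        # 12/13: p, np
        f["SF"] ^ f["OF"],              # 14/15: l, ge
        f["ZF"] | (f["SF"] ^ f["OF"]),  # 16/17: le, g
    ]
    return bool(tests[cc >> 1] ^ (cc & 1))


f = cmp_flags(0xFFFFFFFF, 1)  # -1 vs 1 signed, 4294967295 vs 1 unsigned
print(condition(0o14, f), condition(0o02, f))  # True False
```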

Flags

234 PUSHF Push full FLAGS register
235 POPF Pop to FLAGS; certain flags protected for security
236 SAHF Store AH -> FLAGS (only affects SF,ZF,AF,PF,CF)
237 LAHF Load low byte of FLAGS -> AH
365 CMC Complement CF (carry flag)
370 CLC Clear CF
371 STC Set CF
372 CLI Clear IF (disable hardware interrupts)
373 STI Set IF (enable hardware interrupts)
374 CLD Clear DF (string operations go forward)
375 STD Set DF (string operations go backward)

Strings

244+w MOVS  
246+w CMPS  
252+w STOS  
254+w LODS  
256+w SCAS  
154+w INS Operands ES:[EDI], DX implied (input from port DX)
156+w OUTS Operands DX, DS:[ESI] implied (output to port DX)

IN/OUT

IN value, port OUT port, value

Notice the redundant assembler mnemonics. Your options are really very limited with IN/OUT :-)

344+w imm8 IN acc, port
346+w imm8 OUT port, acc
354+w IN acc, DX
356+w OUT DX, acc

Prefixes

I'll just list the opcodes. Refer to a processor manual for usage details.

146 OPSIZ  
147 ADRSIZ  
360 LOCK  
363 REP (same as REPE, REPZ)
362 REPNE (also REPNZ)

Miscellaneous

017 31r BSWAP reg  
364 HLT  
315 imm8 INT imm8  
316 INTO  
314 INT3  
317 IRET  
360 LOCK  
220 NOP  
233 WAIT  
327 XLAT Equivalent to AL = [EBX+AL]

64-bit Extensions

This is known variously as x86-64, x64, AMD64, Intel 64, IA-32e, and EM64T. They're all the same, with minor exceptions. (Not to be confused with Itanium/IA-64.)

Main changes to the core opcodes:

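  • RIP-relative addressing for easy position-independent code (PIC)

    • The ModRM encoding 0r5 (Mod=00, R/M=101), which means [disp32] in 32-bit mode, means [RIP + disp32] in 64-bit mode, where RIP is the address of the next instruction. REX.B doesn't change this: 0r5 with REX.B=1 is still RIP-relative, not [R13].
    • SIB encodings are never RIP-relative: 0r4 045 disp32 still gives an absolute [disp32].
  • 64-bit registers: RAX, RBX, etc. (expanded versions of EAX, etc.) These can still be accessed in 32-, 16-, or 8-bit slices as before. They're still special-purpose.

  • Eight new general-purpose registers: R8..R15. These generally behave the same as the first eight, but aren't implicitly used by any instructions (I don't think they are). R12 and R13 have quirks corresponding to ESP and EBP: as a base, R12 needs a SIB byte and R13 needs a displacement, because those ModRM special cases ignore the REX bit.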
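A minimal sketch of RIP-relative addressing, assuming the instruction's own address is known: LEA RAX..RDI, [RIP + disp32] is REX.W (110), 215, XRM 0r5, disp32, seven bytes in all. `lea_rip` is a hypothetical helper name, and it only handles destinations 0..7 (R8..R15 would also need REX.R).

```python
def lea_rip(reg, insn_addr, target):
    """LEA reg64, [RIP + disp32] for reg 0..7: REX.W (110), 215, XRM 0r5, disp32."""
    length = 7  # 1 REX + 1 opcode + 1 XRM + 4 disp32
    disp = target - (insn_addr + length)  # RIP = address of the next instruction
    if not -2**31 <= disp < 2**31:
        raise ValueError("target out of +/-2 GiB range")
    return bytes([0o110, 0o215, (reg << 3) | 5]) + disp.to_bytes(4, "little", signed=True)


print(lea_rip(0, 0x1000, 0x2000).hex(" "))  # 48 8d 05 f9 0f 00 00
```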

Use the REX prefix byte to access the new registers.

REX prefix encoding: 0100wrxb (binary)

w: Sets 64-bit operand size (default is still 32-bit)
r: Bit 3 of Reg in ModRM
x: Bit 3 of Index in SIB
b: Bit 3 of R/M in ModRM, of Base in SIB, or of the register number in the opcode (e.g. 12r PUSH)

The REX prefix replaces the single-byte INC/DEC opcodes (10r/11r).
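A minimal sketch of building a REX prefix for a 64-bit register-to-register MOV (211), with registers numbered 0..15 so that bit 3 goes into REX and the low three bits go into the XRM byte; `rex` and `mov_r64_r64` are hypothetical helper names.

```python
def rex(w=0, r=0, x=0, b=0):
    """REX = 0100wrxb (octal 100..117)."""
    return 0o100 | (w << 3) | (r << 2) | (x << 1) | b


def mov_r64_r64(dest, src):
    """MOV dest, src (211 xrm) between any two of the 16 registers, numbered 0..15."""
    prefix = rex(w=1, r=src >> 3, b=dest >> 3)  # src is in r, dest in m
    return bytes([prefix, 0o211, 0o300 | ((src & 7) << 3) | (dest & 7)])


print(mov_r64_r64(0, 9).hex(" "))  # 4c 89 c8  (MOV RAX, R9)
```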

Other changes:

  • Sixteen 128-bit SIMD registers: XMM0..XMM15

  • Eight 80-bit FPU registers: FPR0..FPR7, aka ST(0)..ST(7), where ST(i) is numbered relative to the top of the FPU stack. The MMX registers MM0..MM7 share these same physical registers.

  • NX (No eXecute) page bit

  • Cruft removed (it's still there, but there are no opcodes for it in 64-bit mode):

    • Segment registers have no effect (except FS and GS which were retained out of pity for Microsoft)
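    • Hardware task switching via the TSS (Task State Segment), intended for multitasking but not actually needed. A TSS still exists, but only to hold stack pointers for interrupts and privilege changes.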
    • V86 mode (for virtualizing ancient 16-bit programs... use an emulator now.)