Looking at assembly code for a "hello world"-ish program.

This is similar to the example at
https://www.cs.cmu.edu/afs/cs/academic/class/15213-f15/www/lectures/05-machine-basics.pdf

Files

    hello.c     C code
    hello.i     C code with included files (#include) added
    hello.s     assembly code
    hello.o     machine code before linking
    hello       linked executable machine code

Compile with

    $ gcc -m64 -Og --save-temps hello.c -o hello

Those compile options are

    -m64            64-bit machine architecture
    -Og             optimize for debugging
    --save-temps    keep all intermediate (temporary) files
    -o hello        create output executable named "hello"

That creates hello.i, hello.s, hello.o, and ./hello .

Running the executable gives :

    $ ./hello
    Hello! Did you know that 2 + 3 is 5?

We can disassemble that executable with

    $ objdump -d hello > hello.objdump

which creates a hello.objdump file, similar to hello.s.

And we can get into the debugger with

    $ gdb hello                  # <=== you type this
    ...
    (gdb) start                  # <=== and this.
    Temporary breakpoint 1 at 0x6eb
    Starting program: /home/jupyter-jimmahoney/fall2020/systems/c/hello_world/hello
    Temporary breakpoint 1, 0x00005555555546eb in main ()

where you can do things like

    (gdb) info frame
    Stack level 0, frame at 0x7fffffffe7d0:
     rip = 0x5555555546eb in main; saved rip = 0x7ffff7a05b97
     Arglist at 0x7fffffffe7c0, args:
     Locals at 0x7fffffffe7c0, Previous frame's sp is 0x7fffffffe7d0
     Saved registers:
      rip at 0x7fffffffe7c8

and this

    (gdb) info registers
    rax            0x5555555546eb   93824992233195
    rbx            0x0              0
    rcx            0x555555554760   93824992233312
    rdx            0x7fffffffe8b8   140737488349368
    rsi            0x7fffffffe8a8   140737488349352
    rdi            0x1              1
    rbp            0x555555554760   0x555555554760 <__libc_csu_init>
    rsp            0x7fffffffe7c8   0x7fffffffe7c8
    r8             0x7ffff7dd0d80   140737351847296
    r9             0x7ffff7dd0d80   140737351847296
    r10            0x1              1
    r11            0x0              0
    r12            0x5555555545d0   93824992232912
    r13            0x7fffffffe8a0   140737488349344
    r14            0x0              0
    r15            0x0              0
    rip            0x5555555546eb   0x5555555546eb <main>
    eflags         0x246            [ PF ZF IF ]
    cs             0x33             51
    ss             0x2b             43
    ds             0x0              0
    es             0x0              0
    fs             0x0              0
    gs             0x0              0

and this

    (gdb) disass main
    Dump of assembler code for function main:
    => 0x00005555555546eb <+0>:     sub    $0x18,%rsp
       0x00005555555546ef <+4>:     mov    %fs:0x28,%rax
       0x00005555555546f8 <+13>:    mov    %rax,0x8(%rsp)
       0x00005555555546fd <+18>:    xor    %eax,%eax
       0x00005555555546ff <+20>:    lea    0x4(%rsp),%rdx
       0x0000555555554704 <+25>:    mov    $0x3,%esi
       0x0000555555554709 <+30>:    mov    $0x2,%edi
       0x000055555555470e <+35>:    callq  0x5555555546de <sumstore>
       0x0000555555554713 <+40>:    mov    0x4(%rsp),%r8d
       0x0000555555554718 <+45>:    mov    $0x3,%ecx
       0x000055555555471d <+50>:    mov    $0x2,%edx
       0x0000555555554722 <+55>:    lea    0xbf(%rip),%rsi        # 0x5555555547e8
       0x0000555555554729 <+62>:    mov    $0x1,%edi
       0x000055555555472e <+67>:    mov    $0x0,%eax
       0x0000555555554733 <+72>:    callq  0x5555555545b0 <__printf_chk@plt>
       0x0000555555554738 <+77>:    mov    0x8(%rsp),%rcx
       0x000055555555473d <+82>:    xor    %fs:0x28,%rcx
       0x0000555555554746 <+91>:    jne    0x555555554752 <main+103>
       0x0000555555554748 <+93>:    mov    $0x0,%eax
       0x000055555555474d <+98>:    add    $0x18,%rsp
       0x0000555555554751 <+102>:   retq
       0x0000555555554752 <+103>:   callq  0x5555555545a0 <__stack_chk_fail@plt>
    End of assembler dump.

... and lots of other cool stuff. ;)

Notes :

* There are two conventions for representing x86 assembly code,
  AT&T and Intel. See for example
  https://en.wikipedia.org/wiki/X86_assembly_language#Syntax
  Our textbook, gdb, and the other unix tools use AT&T, and so will we.

* The stuff in hello.s that starts with a period, like ".cfi_startproc",
  are "assembler directives", i.e. instructions for the assembler
  which don't directly generate machine code. ("CFI" is
  "Call Frame Information", which adds debugging information
  to the machine code file.)

* The assembly instructions may come in several variations which
  mean the same thing, even looking slightly different in
  hello.objdump and hello.s.
  For example (ignoring the .cfi_* stuff) :

    ; -- hello.s --
    sumstore:
        pushq   %rbx
        movq    %rdx, %rbx
        call    plus
        movl    %eax, (%rbx)
        popq    %rbx
        ret

    ; -- hello.objdump --
    6de <sumstore>:
     6de:   53                      push   %rbx
     6df:   48 89 d3                mov    %rdx,%rbx
     6e2:   e8 f3 ff ff ff          callq  6da <plus>
     6e7:   89 03                   mov    %eax,(%rbx)
     6e9:   5b                      pop    %rbx
     6ea:   c3                      retq

  The "q" at the end of "pushq" or "retq" means "quad", i.e. 64 bits,
  which is the default operand size for -m64. So here "callq" and
  "call" are the same instruction.
  See for example https://en.wikibooks.org/wiki/X86_Assembly/GAS_Syntax

Your mission is to wrap your head around what's going on here,
so that you understand the connection between the C code, the
assembly code, and what's going on in the x86 registers and memory.

The topics to study include

* bits, bytes, hex, ascii, ints, and pointers
* x86 registers, addressing and moving data around
* x86 logical and arithmetic operations
* x86 "call" and the stack : how functions work
* x86 control flow : "jump if" instructions
* using gdb (gnu debugger) to examine a program as it runs

Are we having fun yet?

Jim Mahoney | cs.bennington.college | Aug 2020 | MIT License