第三章 程序的机器级表示
第三章 程序的机器级表示
3.1 从历史角度看
Intel 的处理器产品线,通常又被叫做 x86 处理器,已经经历了一个很长时间的发展和进步
原书这里介绍了 Intel 历史上的经典处理器型号,我把它省略了吧,因为很像 Intel 的发家史
AMD 的全称是 Advanced Micro Devices
obsolete adj. 废弃的;老式的
3.2 程序的编码
假设我们编写了两个C文件 p1.c 和 p2.c
现在用下面的命令行进行编译
1 | gcc -Og -o p p1.c p2.c |
-Og
要求编译器进行能够保留原始C代码结构的优化,不要过度优化
执行这条命令后,gcc 编译器会调用一系列程序将 源代码 变成 可执行文件
首先,预处理器(preprocessor)会处理所有 预编译代码 和 宏
然后编译器(compiler)将C语言翻译成对应的 汇编语言 版本
接着,汇编器(assembler)将 汇编语言 翻译成 全是二进制机器语言 的目标文件
最后,链接器(linker)出马,将之前的 目标文件 与 库代码等连接到一起,生成最后的可执行文件
3.2.1 机器级代码
机器级代码的格式和行为全部由 指令集架构(instruction set architecture,ISA)决定
elaborate adj. 复杂的
机器级的代码使用的是虚拟地址,操作系统负责 虚拟地址 和 真实的物理地址 之间的转换
3.2.2 一个代码例子
1 | long mult2(long, long); |
我们把文件命名为 mstore.c 并执行以下命令
1 | gcc -Og -S mstore.c |
-S
选项要求 gcc 生成 mstore.s 后停止,不再继续后面的工作
如果执行
1 | gcc -Og -c mstore.c |
-c
选项会使 gcc 在生成 mstore.o 后停止
如果想查看目标文件里的内容(.o 文件),可以使用 objdump 反汇编工具(disassemblers)
1 | objdump -d mstore.o |
在虚拟机中执行完上面这行命令后的输出是
1 | mstore.o: 文件格式 elf64-x86-64 |
现在再编写一个文件 main.c 来生成真正的可执行文件
1 |
|
执行下面的命令生成可执行文件
1 | gcc -Og -o prog main.c mstore.c |
3.3 数据格式
在 x86 系列发展的过程中,先是把 16-bit 数据称为 word
随着该系列处理器的继续发展,把 32-bit 数据称为 double words ,64-bit 数据称为 quad words
下表列出了与C语言中一些类型对应的 Intel 数据类型

3.4 访问信息
一个 x86-64 CPU 拥有 16 个 64 位通用寄存器(general purpose register)

3.4.1 操作数指定符 operand specifiers
操作数分为三类:直接数、寄存器 和 内存地址
有很多寻址模式 addressing modes ,它们的通式可以表示为 $Imm(r_b,r_i,s)$ ,它表示的地址是 $Imm+R[r_b]+R[r_i*s]$

Practice Problem 3.1
Assume the following values are stored at the indicated memory addresses and registers:
address | value | register | value |
---|---|---|---|
0x100 | 0xFF | %rax | 0x100 |
0x104 | 0xAB | %rcx | 0x1 |
0x108 | 0x13 | %rdx | 0x3 |
0x10c | 0x11 |
Fill in the following table showing the values for the indicated operands:
operand | value |
---|---|
%rax | 0x100 |
0x104 | 0xAB |
$0x108 | 0x108 |
(%rax) | 0xFF |
4(%rax) | 0xAB |
9(%rax,%rdx) | 0x11 |
260(%rcx,%rdx) | 0x13 |
0xFC(,%rcx,4) | 0xFF |
(%rax,%rdx,4) | 0x11 |
3.4.2 移动数据的指令
MOV
类指令

通常,MOV
类指令只会改变目标寄存器的指定位。唯一的例外是:当指令 movl
的目标操作数是一个寄存器时,该指令会把目标寄存器的高32位(高4字节)置零,这与 x86-64 架构有关
1 | movl $0x4050,%eax Immediate--Register, 4 bytes |
有两组专门将小数据拷贝到大寄存器里的指令
movz
类:目标寄存器的高位用 0 填充(zero)

movs
类:目标寄存器的高位用 符号位 填充(sign)

Practice Problem 3.2
For each of the following lines of assembly language, determine the appropriate instruction suffix based on the operands. (For example, mov
can be rewritten as movb
, movw
, movl
, or movq
.)
1 | movl %eax, (%rsp) |
Practice Problem 3.3
Each of the following lines of code generates an error message when we invoke the assembler. Explain what is wrong with each line.
1 | movb $0xF, (%ebx) ebx (32位)不能用作地址寄存器 |
3.4.3 移动数据举例
一个C语言函数:
1 | long exchange(long *xp, long y){ |
它对应的汇编语言版本:
1 | ;long exchange(long *xp, long y) |
Practice Problem 3.4
Assume variables sp
and dp
are declared with types
1 | src_t *sp; |
where src_t
and dest_t
are data types declared with typedef
. We wish to use the appropriate pair of data movement instructions to implement the operation
1 | *dp = (dest_t) *sp; |
Assume that the values of sp
and dp
are stored in registers %rdi
and %rsi
, respectively. For each entry in the table, show the two instructions that implement the specified data movement. The first instruction in the sequence should read from memory, do the appropriate conversion, and set the appropriate portion of register %rax
. The second instruction should then write the appropriate portion of %rax
to memory. In both cases, the portions may be %rax
, %eax
, %ax
, or %al
, and they may differ from one another.
Recall that when performing a cast that involves both a size change and a change of “signedness” in C, the operation should change the size first (Section 2.2.6).
src_t |
dest_t |
Instruction |
---|---|---|
long |
long |
movq (%rdi),%rax |
movq %rax,(%rsi) |
||
char |
int |
movsbl (%rdi),%eax |
movl %eax,(%rsi) |
||
char |
unsigned |
movsbl (%rdi),%eax |
movl %eax,(%rsi) |
||
unsigned char |
long |
movzbq (%rdi),%rax |
movq %rax,(%rsi) |
||
int |
char |
movl (%rdi),%eax |
movb %al,(%rsi) |
||
unsigned |
unsigned char |
movl (%rdi),%eax |
movb %al,(%rsi) |
||
char |
short |
movsbw (%rdi),%ax |
movw %ax,(%rsi) |
Practice Problem 3.5
You are given the following information. A function with prototype
1 | void decode1(long *xp, long *yp, long *zp); |
is compiled into assembly code, yielding the following:
1 | ;void decode1(long *xp, long *yp, long *zp) |
Parameters xp
, yp
, and zp
are stored in registers %rdi
, %rsi
, and %rdx
, respectively.
Write C code for decode1
that will have an effect equivalent to the assembly code shown.
1 | void decode1(long *xp, long *yp, long *zp){ |
3.4.4 数据的进栈和出栈 pushing and popping
push and pop instructions:

%rsp
寄存器存放当前栈顶的地址
3.5 算术和逻辑运算
3.5.1 加载有效地址

Practice Problem 3.6
Suppose register %rbx
holds value p and %rdx
holds value q. Fill in the table below with formulas indicating the value that will be stored in register %rax
for each of the given assembly-code instructions:
Instruction | Result |
---|---|
leaq 9(%rdx), %rax |
(%rdx)+9 |
leaq (%rdx,%rbx), %rax |
(%rdx)+(%rbx) |
leaq (%rdx,%rbx,3), %rax |
(%rdx)+3*(%rbx) |
leaq 2(%rbx,%rbx,7) |
8*(%rbx)+2 |
leaq 0xE(,%rdx,3), %rax |
3*(%rdx)+14 |
leaq 6(%rbx,%rdx,7), %rax |
(%rbx)+7*(%rdx)+6 |
Practice Problem 3.7
Consider the following code, in which we have omitted the expression being computed:
1 | short scale3(short x, short y, short z) { |
Compiling the actual function with gcc yields the following assembly code:
1 | ;short scale3(short x, short y, short z) |
Fill in the missing expression in the C code.
1 | short scale3(short x, short y, short z) { |
3.5.2 一元和二元操作
Practice Problem 3.8
Assume the following values are stored at the indicated memory addresses and registers:
address | value | register | value |
---|---|---|---|
0x100 | 0xFF | %rax | 0x100 |
0x108 | 0xAB | %rcx | 0x1 |
0x110 | 0x13 | %rdx | 0x3 |
0x118 | 0x11 |
Fill in the following table showing the effects of the following instructions, in terms of both the register or memory location that will be updated and the resulting value:
Instruction | Destination | Value |
---|---|---|
addq %rcx,(%rax) |
0x100 | 0xff+0x1=0x100 |
subq %rdx,8(%rax) |
0x108 | 0xab-0x3=0xa8 |
imulq $16,(%rax,%rdx,8) |
0x118 | 0x11*0x10=0x110 |
incq 16(%rax) |
0x110 | 0x14 |
decq %rcx |
%rcx | 0 |
subq %rdx,%rax |
%rax | 0xFD |
3.5.3 移位操作
Practice Problem 3.9
Suppose we want to generate assembly code for the following C function:
1 | long shift_left4_rightn(long x, long n){ |
The code that follows is a portion of the assembly code that performs the actual shifts and leaves the final value in register %rax
. Two key instructions have been omitted. Parameters x and n are stored in registers %rdi
and %rsi
, respectively.
1 | ;long shift_left4_rightn(long x, long n) |
Fill in the missing instructions, following the annotations on the right. The right shift should be performed arithmetically.
Practice Problem 3.10
Consider the following code, in which we have omitted the expression being computed:
1 | short arith3(short x, short y, short z){ |
The portion of the generated assembly code implementing these expressions is as follows:
书本上给的汇编代码有些错误,我按照自己编译出来的汇编代码进行了修改
1 | ;short arith3(short x, short y, short z) |
1 | short arith3(short x, short y, short z){ |
Practice Problem 3.11
It is common to find assembly-code lines of the form
1 | xorq %rcx,%rcx |
in code that was generated from C where no exclusive-or operations were present.
A. Explain the effect of this particular exclusive-or instruction and what useful operation it implements.
将寄存器内容清零
B. What would be the more straightforward way to express this operation in assembly code?
1 | movq $0,%rcx |
C. Compare the number of bytes to encode any two of these three different implementations of the same operation.
事实证明,使用异或置零生成的代码体积更小
3.5.5 特殊算术操作
3.6 控制
3.6.1 标志寄存器 Condition Codes
CF: Carry flag. The most recent operation generated a carry out of the most significant bit. Used to detect overflow for unsigned operations.
ZF: Zero flag. The most recent operation yielded zero.
SF: Sign flag. The most recent operation yielded a negative value.
OF: Overflow flag. The most recent operation caused a two’s-complement overflow—either negative or positive.

3.6.2 获取标志的值 Accessing the Condition Codes

Practice Problem 3.13
The C code
1 | int comp(data_t a, data_t b) { |
shows a general comparison between arguments a and b, where data_t
, the data type of the arguments, is defined (via typedef
) to be one of the integer data types listed in Figure 3.1 and either signed or unsigned. The comparison COMP
is defined via #define
.
Suppose a is in some portion of %rdx
while b is in some portion of %rsi
. For each of the following instruction sequences, determine which data types data_t
and which comparisons COMP
could cause the compiler to generate this code. (There can be multiple correct answers; you should list them all.)
A.
1 | cmpl %esi, %edi |
number | data type | operation |
---|---|---|
1 | int | a<b |
B.
1 | cmpw %si, %di |
number | data type | operation |
---|---|---|
1 | short | a >= b |
C.
1 | cmpb %sil, %dil |
number | data type | operation |
---|---|---|
1 | unsigned char | a <= b |
D.
1 | cmpq %rsi, %rdi |
number | data type | operation |
---|---|---|
1 | long | a != b |
2 | unsigned long | a != b |
3 | some form of pointer | a != b |
Practice Problem 3.14
The C code
1 | int test(data_t a) { |
shows a general comparison between argument a and 0, where we can set the data type of the argument by declaring data_t
with a typedef
, and the nature of the comparison by declaring TEST with a #define
declaration. The following instruction sequences implement the comparison, where a
is held in some portion of register %rdi
. For each sequence, determine which data types data_t
and which comparisons TEST could cause the compiler to generate this code. (There can be multiple correct answers; list all correct ones.)
A.
1 | testq %rdi, %rdi |
number | data type | operation |
---|---|---|
1 | long | a >= 0 |
B.
1 | testw %di, %di |
number | data type | operation |
---|---|---|
1 | short | a == 0 |
2 | unsigned short | a == 0 |
C.
1 | testb %dil, %dil |
number | data type | operation |
---|---|---|
1 | unsigned char | a > 0 |
D.
1 | testl %edi, %edi |
number | data type | operation |
---|---|---|
1 | int | a <= 0 |
3.6.3 Jump 指令
contrived adj. 人为的;做作的;不自然的

3.6.4 Jump 指令的编码
跳转指令的目标地址编码方式有两种:一种是 PC relative ,即给出目标地址相对于即将要执行的指令(当前指令的下一条)的相对位移;另一种是直接给出目标地址的绝对地址 absolute address
Practice Problem 3.15
In the following excerpts from a disassembled binary, some of the information has been replaced by X’s. Answer the following questions about these instructions.
A. What is the target of the je
instruction below? (You do not need to know anything about the callq
instruction here.)
1 | 4003fa: 74 02 je XXXXXX |
0x4003fc+0x02=0x4003fe
B. What is the target of the je
instruction below?
1 | 40042f: 74 f4 je XXXXXX |
0x400431-0xc=0x400425
C. What is the address of the ja
and pop
instructions?
1 | 400543: 77 02 ja 400547 |
D. In the code that follows, the jump target is encoded in PC-relative form as a 4- byte two’s-complement number. The bytes are listed from least significant to most, reflecting the little-endian byte ordering of x86-64.What is the address of the jump target?
1 | 4005e8: e9 73 ff ff ff jmpq XXXXXXX |
0xffff ff73= -141
0x4005ed - 141 = 0x400560
3.6.5 使用条件控制指令实现条件分支
cmp
和 jmp
语句的结合
Practice Problem 3.16
When given the C code
1 | void cond(short a, short *p){ |
gcc generates the following assembly code:
1 | ;void cond(short a, short *p) |
A. Write a goto
version in C that performs the same computation and mimics the control flow of the assembly code, in the style shown in Figure 3.16(b). You might find it helpful to first annotate the assembly code as we have done in our examples.
1 | void cond(short a,short *p){ |
Practice Problem 3.17
An alternate rule for translating if
statements into goto code is as follows:
1 | t = test-expr; |
Practice Problem 3.18
Starting with C code of the form
1 | short test(short x, short y, short z) { |
gcc generates the following assembly code:
1 | ;short test(short x, short y, short z) |
Fill in the missing expressions in the C code.
3.6.6 使用条件移动(Conditional Moves)实现条件分支(Conditional Branches)
使用现代处理器时,Conditional Moves 方法通常会比使用 Conditional Control 更加高效
Practice Problem 3.19
Running on a new processor model, our code required around 45 cycles when the branching pattern was random, and around 25 cycles when the pattern was highly predictable.
A. What is the approximate miss penalty?
50 个机器周期
B. How many cycles would the function require when the branch is mispredicted?
70 个机器周期
条件移动指令:

Practice Problem 3.20
In the following C function, we have left the definition of operation OP
incomplete:
1 |
|
When compiled, gcc generates the following assembly code:
1 | ;short arith(short x) |
这是除法操作的汇编代码,所以题目中缺失的运算符是 /
注意当 x<0 时,由于使用2的补码表示的整型除以2的整数次幂的时候要加一个 bias (具体看看 2.3.7),所以才有第一句 leaq 15(%rdi), %rbx ;temp=x+15
因为读第二章和第三章两章时间间隔有点久,这里被这个 +15 的操作难住了,记忆力不好
Practice Problem 3.21
Starting with C code of the form
1 | short test(short x, short y) { |
gcc generates the following assembly code:
1 | ;short test(short x, short y) |
Fill in the missing expressions in the C code.
1 | short test(short x, short y) { |
3.6.7 循环
do-while
循环
Practice Problem 3.23
For the C code
1 | short dw_loop(short x) { |
gcc generates the following assembly code:
1 | ;short dw_loop(short x) |
A. Which registers are used to hold program values x, y, and n?
rdi and rcx
rcx
rdx
B. How has the compiler eliminated the need for pointer variable p
and the pointer dereferencing implied by the expression (*p)+=5
?
C. Add annotations to the assembly code describing the operation of the program.
while
loops
Practice Problem 3.24
For C code having the general form
1 | short loop_while(short a, short b){ |
gcc, run with command-line option -Og
, produces the following code:
1 | ;short loop_while(short a, short b) |
We can see that the compiler used a jump-to-middle translation, using the jmp
instruction on line 3 to jump to the test starting with label .L2
. Fill in the missing parts of the C code.
1 | short loop_while(short a, short b){ |
Practice Problem 3.25
For C code having the general form
1 | long loop_while2(long a, long b){ |
gcc, run with command-line option -O1
, produces the following code:
1 | ;a in %rdi, b in %rsi |
We can see that the compiler used a guarded-do translation, using the jle
instruction on line 3 to skip over the loop code when the initial test fails. Fill in the missing parts of the C code. Note that the control structure in the assembly code does not exactly match what would be obtained by a direct translation of the C code according to our translation rules. In particular, it has two different ret
instructions (lines 10 and 13). However, you can fill out the missing portions of the C code in a way that it will have equivalent behavior to the assembly code.
1 | long loop_while2(long a, long b){ |
Practice Problem 3.26
A function test_one
has the following overall structure:
1 | short test_one(unsigned short x) { |
The gcc C compiler generates the following assembly code:
1 | ;short test_one(unsigned short x) |
Reverse engineer the operation of this code and then do the following:
A. Determine what loop translation method was used.
jump-to-middle
B. Use the assembly-code version to fill in the missing parts of the C code.
C. Describe in English what this function computes.
经测试,书中给出的代码有错误。如果运行书中的代码,无论如何,返回值都是0。
把原汇编代码改成
1 | ;short test_one(unsigned short x) |
相应的C代码改成
1 | short test_one(unsigned short x) { |
这样的代码就实现了答案中所说的计算 x 中1出现次数奇偶性的功能,出现奇数个1返回0,偶数个1返回1。
测试结果:
1 | x=0x01,val=0 |
for
loops
for
循环在本质上和 while
循环是相同的。自然地,for
循环也有两种翻译成机器语言的方法 jump-to-middle & guarded-do
下面是一个计算阶乘的C语言函数
1 | long fact_for(long n){ |
对应的 while
版本
1 | long fact_for_while(long n){ |
jump-to-middle 版本机器语言对应的C语言(使用 goto)
1 | long fact_for_jm_goto(long n){ |
jump-to-middle 版本汇编语言
1 | fact_for: |
Practice Problem 3.27
Write goto
code for a function called fibonacci
to print fibonacci numbers using a while
loop. Apply the guarded-do transformation.
1 | void fibonacci(int n){ |
Practice Problem 3.28
A function test_two
has the following overall structure:
1 | short test_two(unsigned short x) { |
The gcc C compiler generates the following assembly code:
1 | ;test fun_b(unsigned test x) |
Reverse engineer the operation of this code and then do the following:
A. Use the assembly-code version to fill in the missing parts of the C code.
结合答案来看这个汇编代码,错误很多啊,为啥第三章代码的错误这么多,真的耽误时间
以下是答案给出的C源代码:
1 | long fun_b(unsigned long x) { |
使用以下命令编译
1 | gcc -O1 -S .\p328.c |
得到的汇编代码(简化后):
1 | fun_b: |
这样才能对应上,题目给的汇编代码错误很多
B. Explain why there is neither an initial test before the loop nor an initial jump to the test portion of the loop.
C. Describe in English what this function computes.
bit-wise inverse
3.6.8 switch
语句
Practice Problem 3.30
In the C function that follows, we have omitted the body of the switch
statement. In the C code, the case labels did not span a contiguous range, and some cases had multiple labels.
1 | void switch2(short x, short *dest) { |
In compiling the function, gcc generates the assembly code that follows for the initial part of the procedure, with variable x
in %rdi
:
1 | ;void switch2(short x, short *dest) |
1 | .L4: |
Based on this information, answer the following questions:
A. What were the values of the case labels in the switch
statement?
-2,-1,0,1,3,4,6
B. What cases had multiple labels in the C code?
1和3,-1和6
Practice Problem 3.31
For a C function switcher
with the general structure
1 | void switcher(long a, long b, long c, long *dest){ |
1 | ;void switcher(long a, long b, long c, long *dest) |
Jump table
1 | .L4: |
3.7 过程 procedures
这里的过程是一种抽象概念,在不同的编程语言中,过程的具体形式有:函数、方法、子例程和处理函数等等
要提供对过程的机器级支持,必须处理好以下三件事:控制权的传递、数据(参数)的传递、内存的分配和释放
3.7.1 运行时栈 run-time stack
x86-64 的栈向低地址方向增长,所以减小栈指针实际上是入栈操作
x86-64 上栈的通用结构

3.7.2 控制权的转移
Practice Problem 3.32
The disassembled code for two functions first
and last
is shown below, along with the code for a call of first
by function main
:
1 | ;Disassembly of last(long u, long v) |
1 | ;Disassembly of last(long x) |
Label | PC | Instruction | %rdi | %rsi | %rax | %rsp | *%rsp | Description |
---|---|---|---|---|---|---|---|---|
M1 | 0x400560 | callq | 10 | - | - | 0x7fffffffe820 | - | call first(10) |
F1 | 0x400548 | lea | 10 | - | - | 0x7fffffffe818 | 0x400565 | lea 0x1(%rdi),%rsi |
F2 | 0x40054c | sub | 10 | 11 | - | 0x7fffffffe818 | 0x400565 | sub $0x1,%rdi |
F3 | 0x400550 | callq | 9 | 11 | - | 0x7fffffffe818 | 0x400565 | call last(9,11) |
L1 | 0x400540 | mov | 9 | 11 | - | 0x7fffffffe810 | 0x400555 | mov %rdi,%rax |
L2 | 0x400543 | imul | 9 | 11 | 9 | 0x7fffffffe810 | 0x400555 | imul %rsi,%rax |
L3 | 0x400547 | retq | 9 | 11 | 99 | 0x7fffffffe810 | 0x400555 | retq |
F4 | 0x400555 | repz retq | 9 | 11 | 99 | 0x7fffffffe818 | 0x400565 | repz retq |
M2 | 0x400565 | mov | 9 | 11 | 99 | 0x7fffffffe820 | - | mov %rax,%rdx |
3.7.3 数据(参数)传递
x86-64中,可以通过寄存器最多传递6个整型参数
寄存器的使用是有特殊顺序的,寄存器使用的名字取决于要传递的数据类型的大小

如果函数要传递的参数超过6个,则超过的部分就要用栈来传递
Practice Problem 3.33
A C function procprob
has four arguments u, a, v, and b. Each is either a signed number or a pointer to a signed number, where the numbers have different sizes. The function has the following body:
1 | *u += a; |
It compiles to the following x86-64 code:
1 | procprob: |
Determine a valid ordering and types of the four parameters. There are two correct answers.
1 | procprob(int a,short b,long *u,char *v) |
1 | procprob(int b,short a,long *v,char *u) |
3.7.4 栈上的局部存储
一般的函数不会使用栈存储,但是以下几种情况则需要使用栈作为存储区域:
- 函数参数过多,寄存器不足以存放
- 对一个局部变量使用 ‘&’ 取地址运算符,因为需要获得地址,而寄存器没有地址,所以需要使用栈
- 某些局部变量是结构或者数组,因此必须能够通过数组或结构被引用到,也需要存放在栈中
看一个例子:
1 | long call_proc(){ |
对应的汇编代码
1 | long call_proc() |
可以看到汇编代码的大部分都在为调用proc这个函数做准备
它先是将 stack pointer 减去32来开辟一块32字节大小的栈内存,然后依次存储x1,x2,x3,x4
接着将参数1到参数6保存到规定的寄存器,参数7和参数8则放在栈上(相对栈指针的偏移地址分别为0和8)
这个函数的栈帧如下图所示:

3.7.5 寄存器中的局部存储空间
x86-64中的寄存器分为两种:被调用者保存寄存器和调用者保存寄存器
3.7.6 递归过程
递归函数的一个经典例子:阶乘函数
1 | long rfact(long n){ |
对应的汇编代码
1 | ;long rfact(long n) |
Practice Problem 3.35
For a C function having the general structure
1 | long rfun(unsigned long x) { |
gcc generates the following assembly code:
1 | ;long rfun(unsigned long x) |
A. What value does rfun
store in the callee-saved register %rbx
?
rv x
B. Fill in the missing expressions in the C code shown above.
3.8 数组分配和访问
3.8.1 基本原则
1 | T A[N]; |
起始位置为 $x_A$ ,这个声明会分配长为 $L*N$ 的连续内存,$L$ 是类型 T 的大小
第一个元素被存放在 $x_A$ 地址处,第 $i$ 个元素存放在 $x_A+L*i$ 地址处
Practice Problem 3.36
1 | int P[5]; |
Fill in the following table describing the element size, the total size, and the address of element i
for each of these arrays.
Array | Element size | Total size | Start address | Element i |
---|---|---|---|---|
P | 4 | 4*5=20 | $x_P$ | $x_P+4*i$ |
Q | 2 | 4 | $x_Q$ | $x_Q+2*i$ |
R | 8 | 72 | $x_R$ | $x_R+8*i$ |
S | 8 | 80 | $x_S$ | $x_S+8*i$ |
T | 8 | 16 | $x_T$ | $x_T+8*i$ |
3.8.2 指针运算
C语言允许对指针进行运算,运算结果会根据指针指向的类型进行伸缩。比如指针 $p$ 指向一个类型长度为 $L$ 的类型,那么 $p+i$ 的运算结果就是 $x_p+L*i$
现在假设我们有一个整型数组 E
和整型索引 i
分别存放在寄存器 %rdx 和 %rcx 中

注意图中两个高亮部分的区别,获取地址用的是 leaq
指令,获取地址处的内容用 movl
指令
Practice Problem 3.37
Suppose $x_P$ , the address of short integer array $P$, and long integer index $i$ are stored in registers %rdx
and %rcx
, respectively. For each of the following expressions, give its type, a formula for its value, and an assembly-code implementation. The result should be stored in register %rax
if it is a pointer and register element %ax
if it has data type short.
Expression | Type | Value | Assembly code |
---|---|---|---|
P[1] |
short |
M[$x_P+2*1$] | movw 2(%rdx),%ax |
P+3+i |
long short * |
$x_P+6+2*i$ | leaq 6(%rdx,%rcx,2),%rax |
P[i*6-5] |
short |
M[$x_P+12*i-10$] | movw -10(%rdx,%rcx,12),%ax |
P[2] |
short |
M[$x_P+2*2$] | movw 4(%rdx),%ax |
&P[i+2] |
long short * |
$x_P+2*i+4$ | leaq 4(%rdx,%rcx,2),%rax |
3.8.3 嵌套的数组
就是数组的数组
嵌套数组的引用和一维数组的没什么不同
Practice Problem 3.38
Consider the following source code, where M
and N
are constants declared with #define
:
1 | long P[M][N]; |
1 | ;long sum_element(long i, long j) |
Use your reverse engineering skills to determine the values of M
and N
based on this assembly code.
M=5,N=7
3.8.4 定长数组
对于定长数组的使用,编译器会自动进行很多优化,例如下面这个函数
1 |
|
优化后的C代码
1 | /* Compute i,k of fixed matrix product */ |
对应的汇编代码
1 | ;int fix_prod_ele_opt(fix_matrix A, fix_matrix B, long i, long k) |
Practice Problem 3.40
The following C code sets the diagonal elements of one of our fixed-size arrays to val
:
1 | /* Set all diagonal elements to val */ |
When compiled with optimization level -O1
, gcc generates the following assembly code:
1 | fix_set_diag: |
Create a C code program fix_set_diag_opt
that uses optimizations similar to those in the assembly code, in the same style as the optimized code in 3.8.4. Use expressions involving the parameter N
rather than integer constants, so that your code will work correctly if N
is redefined.
1 |
|
3.9 异质(Heterogeneous)的数据结构
将不同类型对象组合到一起创建的数据类型
C语言中有两种这样的数据结构:struct 和 union
3.9.1 结构 strcut
几个重要的点:
- 结构的所有组成部分都存放在内存中一段连续的区域内
- 指向结构的指针就是结构第一个字节的地址(和数组类似)
- 编译器维护关于每个结构的信息,记录每个字段的字节偏移,从而可以使用结构指针和偏移引用结构元素
Practice Problem 3.41
Consider the following structure declaration:
1 | struct test { |
The following procedure (with some expressions omitted) operates on this structure:
1 | void st_init(struct test *st) { |
A. What are the offsets (in bytes) of the following fields?
p : 0
s.x : 8
s.y : 10
next : 12
B. How many total bytes does the structure require?
8+2*2+8=20
C. The compiler generates the following assembly code for st_init
:
1 | ;void st_init(struct test *st) |
On the basis of this information, fill in the missing expressions in the code for st_init
.
3.9.2 unions
联合提供了一种规避C语言类型系统的方法,允许使用多种类型来引用一个对象
联合还可以用来访问不同数据类型的位模式。例如,假设我们简单地将一个 double
类型的值 d
转换为 unsigned long
类型的 u
1 | unsigned long u = (unsigned long) d; |
u
将会是 d
的整数部分。除了 d=0.0 之外,u
和 d
的位表示会很不一样
再看下面的代码
1 | unsigned long double2bits(double d) { |
以 double
类型存储 d
,以 unsigned long
类型返回它,就实现了不改变位模式的类型转换
Practice Problem 3.43
Suppose you are given the job of checking that a C compiler generates the proper code for structure and union access. You write the following structure declaration:
1 | typedef union { |
You write a series of functions of the form
1 | void get(u_type *up, type *dest) { |
with different access expressions expr
and with destination data type type set according to type associated with expr
. You then examine the code generated when compiling the functions to see if they match your expectations.
Suppose in these functions that up
and dest
are loaded into registers %rdi
and %rsi
, respectively. Fill in the following table with data type type
and sequences of one to three instructions to compute the expression and store the result at dest
.
expr | type | Code |
---|---|---|
up->t1.u | long | movq (%rdi),%rax movq %rax,(%rsi) |
up->t1.v | short | movw 8(%rdi),%ax movw %ax,(%rsi) |
&up->t1.w | char * | leaq 10(%rdi),(%rsi) |
up->t2.a[up->t1.u] | int | movq (%rdi),%rax movl 11(%rdi,%rax,4),%eax mov %eax,(%rsi) |
3.9.3 数据对齐
数据对齐是提高内存操作性能的一种方法。
举个例子,一个处理器每次只能从内存地址为8的倍数的地方一次性读取8个字节,如果我们能保证所有double型的数据都保存在以8的倍数地址为起始地址的地方,那么处理器只需一个内存读取操作就能获得这个double型数据,否则需要两次。
数据对齐的规则:
$K$ 字节大小的数据的起始地址必须是 $K$ 的倍数

Practice Problem 3.44
For each of the following structure declarations, determine the offset of each field, the total size of the structure, and its alignment requirement for x86-64:
A. struct P1 { short i; int c; int *j; short *d; };
最小22字节,偏移分别是0,2,6,14
要求数据对齐:大小24字节,偏移分别是0,4,8,16,结构的起始地址需要是4的倍数
C. struct P3 { long w[2]; int *c[2] };
最小32字节,偏移分别是0,8,16,24
要求数据对齐:大小32字节,起始地址需要是8的倍数
D. struct P4 { char w[16]; char *c[2] };
最小32字节,偏移地址为0,16
对齐要求:起始地址是8的倍数
E.struct P5 { struct P4 a[2]; struct P1 t };
最小86字节
对齐要求:大小88字节,起始地址是8的倍数
3.10 在机器级程序中将控制与数据结合起来
3.10.1 理解指针
一些指针和机器码映射的关键原则
- 每个指针都对应一个类型
每个指针都有一个值
指针可以指向函数
3.10.2 使用GDB调试器

3.10.3 内存越界引用和缓冲区溢出
大名鼎鼎的蠕虫就是利用缓冲区溢出进行攻击的
Practice Problem 3.46
The code below shows a (low-quality) implementation of a function that reads a line from standard input, copies the string to newly allocated storage, and returns a pointer to the result.
1 | /* This is very low-quality code. |
Disassembly up through call to gets
1 | ;char *get_line() |
Consider the following scenario. Procedure get_line
is called with the return address equal to 0x400076 and register %rbx
equal to 0x0123456789ABCDEF. You type in the string
1 | 0123456789012345678901234 |
The program terminates with a segmentation fault. You run gdb and determine that the error occurs during the execution of the ret
instruction of get_line
.
A. Fill in the diagram that follows, indicating as much as you can about the stack just after executing the instruction at line 4 in the disassembly. Label the quantities stored on the stack (e.g., “Return address”) on the right, and their hexadecimal values (if known) within the box. Each box represents 8 bytes. Indicate the position of %rsp
. Recall that the ASCII codes for characters 0–9 are 0x30–0x39.
00 00 00 00 00 40 00 76 | Return address |
01 23 45 67 89 AB CD EF | %rbx |
- | |
- | %rsp |
B. Modify your diagram to show the effect of the call to gets (line 7).
00 00 00 00 00 40 00 34 | Return address |
33 32 31 30 39 38 37 36 | %rbx |
35 34 33 32 31 30 39 38 | |
37 36 35 34 33 32 31 30 | %rsp |
C. To what address does the program attempt to return?
0x40 00 34
D. What register(s) have corrupted value(s) when get_line returns?
%rbx
E. Besides the potential for buffer overflow, what two other things are wrong with the code for get_line
?
1 | result = malloc(strlen(buf)+1); |
应该检查 malloc
返回 NULL
的情况
3.10.4 对抗缓冲区溢出攻击
栈随机化
依然可以使用 nop sled 空操作雪橇攻击
Practice Problem 3.47
Running our stack-checking code 10,000 times on a system running Linux version 2.6.16, we obtained addresses ranging from a minimum of 0xffffb754 to a maximum of 0xffffd754.
A. What is the approximate range of addresses?
$2^{13}$
B. If we attempted a buffer overrun with a 128-byte nop sled, about how many attempts would it take to test all starting addresses?
$2^6=64$
栈破坏检测
利用缓冲区溢出漏洞进行攻击的方法要改变栈上的内容,那么我们可以先在栈上存放一个值(金丝雀值 canary value),程序返回后再检查这个值是不是改变了,如果改变了,说明程序对栈上缓冲区以外的内存进行了更改,就可以马上终止程序。金丝雀值和栈上其他内容的排列方式如下图
Practice Problem 3.48
The functions intlen
, len
, and iptoa
provide a very convoluted way to compute the number of decimal digits required to represent an integer. We will use this as a way to study some aspects of the gcc stack protector facility.
1 | int len(char *s) { |
The following show portions of the code for intlen
, compiled both with and without stack protector:
1 | ;(a) Without protector |
A. For both versions: What are the positions in the stack frame for buf
, v
, and (when present) the canary value?
(a) buf :(%rsp) ,v:24(%rsp)
(b) buf :16(%rsp) ,v: 8(%rsp) , canary value: 40(%rsp)
- 限制可执行代码区域
3.10.5 支持变长栈帧
有一些函数需要使用变长数组,因此编译器不能提前计算出该函数需要的栈帧的大小
看下面这个函数
1 | long vframe(long n, long idx, long *q) { |
在这个函数内部声明了一个长为 n
的 long *
数组,这需要 $8*n$ 个字节长度的栈内存
函数内部对 i
进行了取地址的操作,所以 i
也需要放在栈上
一起看看这个函数对应的汇编代码
1 | ;long vframe(long n, long idx, long *q) |
3.11 浮点代码
为了处理浮点数,intel 和 AMD 都专门增加了一些指令和寄存器,这些寄存器称为媒体寄存器(media registers)

3.11.1 浮点传送和转换操作
浮点数移动指令:

双操作数浮点转换指令:

三操作数浮点转换指令:

C函数
1 | double fcvt(int i, float *fp, double *dp, long *lp){ |
对应的汇编代码:
1 | ;double fcvt(int i, float *fp, double *dp, long *lp) |
Practice Problem 3.50
For the following C code, the expressions val1
–val4
all map to the program values i
, f
, d
, and l
:
1 | double fcvt2(int *ip, float *fp, double *dp, long l){ |
Determine the mapping, based on the following x86-64 code for the function:
1 | ;double fcvt2(int *ip, float *fp, double *dp, long l) |
val1->d
val2->i
val3->l
val4->f
Practice Problem 3.51
The following C function converts an argument of type src_t
to a return value of type dst_t
, where these two types are defined using typedef
:
1 | dest_t cvt(src_t x){ |
For execution on x86-64, assume that argument x
is either in %xmm0
or in the appropriately named portion of register %rdi
(i.e., %rdi or %edi). One or two instructions are to be used to perform the type conversion and to copy the value to the appropriately named portion of register %rax
(integer result) or %xmm0
(floating-point result). Show the instruction(s), including the source and destination registers.
$T_x$ | $T_y$ | Instruction(s) |
---|---|---|
long | double | vcvtsi2sdq %rdi, %xmm0,%xmm0 |
double | int | vcvttsd2si %xmm0,%eax |
double | float | vmovddup %xmm0,%xmm0 vcvtpd2psx %xmm0,%xmm0 |
long | float | vcvtsi2ssq %rdi,%xmm0,%xmm0 |
float | long | vcvttss2siq %xmm0,%rax |
3.11.2 过程(函数)中的浮点代码
x86-64中使用XMM寄存器向函数传递浮点参数,以及从函数返回浮点值

从上图中看到,
- 可以使用 xmm0 ~ xmm7 传递8个浮点参数(超过8个的浮点参数使用栈传递)
- xmm0 返回浮点值
- 所有 XMM 寄存器都是调用者保存
3.11.3 浮点运算操作

这些指令需要一个或两个操作数,第一个操作数可以是一个 XMM 寄存器或者内存位置,第二个操作数和目的操作数都必须是 XMM 寄存器。
1 | double funct(double a, float x, double b, int i){ |
对应的汇编代码
1 | ;double funct(double a, float x, double b, int i) |
Practice Problem 3.54
Function funct2
has the following prototype:
1 | double funct2(double w, int x, float y, long z); |
Gcc generates the following code for the function:
1 | ;double funct2(double w, int x, float y, long z) |
Write a C version of funct2
.
1 | double funct2(double w, int x, float y, long z){ |
3.11.4 定义和使用浮点常数
和整数操作不同的是,AVX浮点操作不能以立即数作为操作数,所以编译器必须为所有的浮点常数分配并初始化内存。
1 | double cel2fahr(double temp){ |
1 | ;double cel2fahr(double temp) |
1.8 的表示:
高4字节 1073532108 (0x3ffccccc)
指数部分
$e=0x3ff = 1023$
$bias=2^{k-1}-1=2^{10}-1=1023$
$E=e-bias=0$
低4字节 3435973837 (0xcccccccd)
将高4字节剩下的部分和低4字节组合起来 0xccccccccccccd,二进制小数为$f=0.8$
所以最后的浮点值为 $(1+0.8)*2^{0}=1.8$
Practice Problem 3.55
Show how the numbers declared at label .LC3
encode the number 32.0.
低4字节 0
高4字节 1077936128 (0x4040 0000)
$e=0x404=1028$
$E=e-1023=5$
小数部分为0
所以最后的浮点值为 $(1+0.0)*2^5=32.0$
3.11.5 在浮点代码中使用位级操作

3.11.6 浮点比较操作
