【C++】How the C++ Compiler Works

First, the compiler pre-processes our code, which means that every pre-processor statement gets evaluated. Once the code has been pre-processed, the compiler moves on to tokenizing and parsing: sorting this English-like C++ text into a format that the compiler can actually understand and reason about.

This results in something called an abstract syntax tree, which is a structured representation of our code. At the end of the day, the compiler’s job is to convert all of our code into either constant data or instructions. Once the compiler has created the abstract syntax tree, it can begin generating code: the actual machine code that our CPU will execute. We also wind up with various other data, such as a place to store all of our constants, and that’s essentially all the compiler does.
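To make the "constant data or instructions" split a bit more concrete, here is a minimal sketch. The function name Greeting is made up for illustration, and where exactly a compiler places the string is an assumption about typical toolchains rather than something the language guarantees.

```cpp
// Constant data: the characters of this string literal are typically stored
// in a read-only data section of the object file (toolchain-dependent).
const char* Greeting()   // hypothetical name, not from the original post
{
    return "Hello";
}

// Instructions: the body of this function becomes machine code
// (loads, a multiply, and a return).
int Multiply(int a, int b)
{
    return a * b;
}
```

Calling Multiply(3, 4) runs the generated instructions, while Greeting() simply hands back the address of the stored constant data.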

What the compiler has done is generate an object file for each of our translation units. Every CPP file in our project that we tell the compiler to compile will result in an object file. These CPP files are essentially what are called translation units.

**You have to realize that C++ doesn’t care about files.** Files are not something that exists in C++.

In Java, by contrast, your class name has to be tied to your file name, and your folder hierarchy has to be tied to your package.

This also reminds me of the Linux saying that “everything is a file”.

In C++, a file is just a way to feed the compiler source code. You’re responsible for telling the compiler what kind of file this is and how the compiler should treat it. Of course, if you create a file with the extension .cpp, the compiler is going to treat it as a C++ file. Similarly, if you make a file with the extension .c or .h, the compiler is going to treat the .c file as a C file, not a C++ file, and it’s going to treat the .h file as a header file.

These are just default conventions that are in place, and you can override any of them. I could go around making .cherno files, as long as I tell the compiler: hey, this file is a C++ file, please compile it like a C++ file. So just remember: files have no meaning.
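As a rough sketch of what overriding the convention could look like, the code below could be saved under the made-up name Math.cherno. The command lines in the comment are assumptions to check against your own compiler’s documentation: GCC and Clang accept -x c++ to force C++ treatment, and MSVC accepts /TP to treat the source files on the command line as C++.

```cpp
// Hypothetical file name: Math.cherno
//
// Possible ways to compile it as C++ (verify against your compiler's docs):
//   g++ -x c++ -c Math.cherno
//   cl /c /TP Math.cherno
//
// The contents are ordinary C++; only the extension is unusual.
int Multiply(int a, int b)
{
    return a * b;
}
```

The point is only that the extension picks a default; the compiler compiles whatever text it is told to treat as C++.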

That being said, every C++ file that we feed into the compiler and tell it to compile as C++ will be compiled as a translation unit, and a translation unit will result in an object file.

It’s actually fairly common to include CPP files in other CPP files and create what is basically one big CPP file with a lot of files in it. If you do something like that and then compile only that one CPP file, you end up with one translation unit and thus one object file.

That’s why there’s a terminology split between what a translation unit is and what a CPP file actually is: a CPP file doesn’t necessarily have to equal a translation unit. However, if you just make a project with individual CPP files and never include them in each other, then yes, every CPP file will be a translation unit and every CPP file will generate an object file.

How does “hash include” (#include) work?

You specify which file you want to include, and the pre-processor opens that file, reads all of its contents, and pastes them into the file at the point where you wrote your include statement.

Let’s prove it.

Create a new file, Math.cpp:

int Multiply(int a, int b)
{
	int result = a * b;
	return result;
}

It compiles successfully!

Create a new file, EndBrace.h, whose only content is a single closing brace, }.

Then change Math.cpp to:

int Multiply(int a, int b)
{
	int result = a * b;
	return result;
#include "EndBrace.h"

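It compiles successfully!

There is actually a way to tell the compiler to output a file containing the result of all the pre-processor evaluation, which helps us understand pre-processing better. Before we start, we need to change a setting in Visual Studio (the Preprocess to a File option in the project’s C/C++ preprocessor properties).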

Compiler01

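As shown in the figure, change No to Yes. The side effect of this change is highlighted in the red box, so remember to change it back once we are done.

Then compile Math.cpp. A Math.i file is generated in the Debug folder; opening it shows: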

#line 1 "E:\\SavingProject_C++\\HelloWorld\\HelloWorld\\Math.cpp"
int Multiply(int a, int b)
{
	int result = a * b;
	return result;
#line 1 "E:\\SavingProject_C++\\HelloWorld\\HelloWorld\\EndBrace.h"
}
#line 6 "E:\\SavingProject_C++\\HelloWorld\\HelloWorld\\Math.cpp"

Next, we will keep changing Math.cpp, compiling it, and looking at the contents of Math.i.

Math.cpp

#define INTEGER int

INTEGER Multiply(int a, int b)
{
	INTEGER result = a * b;
	return result;
}

Math.i

#line 1 "E:\\SavingProject_C++\\HelloWorld\\HelloWorld\\Math.cpp"


int Multiply(int a, int b)
{
	int result = a * b;
	return result;
}

Math.cpp

#define INTEGER Cherno

INTEGER Multiply(int a, int b)
{
	INTEGER result = a * b;
	return result;
}

Math.i

#line 1 "E:\\SavingProject_C++\\HelloWorld\\HelloWorld\\Math.cpp"


Cherno Multiply(int a, int b)
{
	Cherno result = a * b;
	return result;
}

Math.cpp

#if 1
int Multiply(int a, int b)
{
	int result = a * b;
	return result;
}
#endif

Math.i

#line 1 "E:\\SavingProject_C++\\HelloWorld\\HelloWorld\\Math.cpp"

int Multiply(int a, int b)
{
	int result = a * b;
	return result;
}
#line 8 "E:\\SavingProject_C++\\HelloWorld\\HelloWorld\\Math.cpp"

Math.cpp

#if 0
int Multiply(int a, int b)
{
	int result = a * b;
	return result;
}
#endif

Math.i

#line 1 "E:\\SavingProject_C++\\HelloWorld\\HelloWorld\\Math.cpp"






#line 8 "E:\\SavingProject_C++\\HelloWorld\\HelloWorld\\Math.cpp"
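Building on the #if 1 and #if 0 experiments above, here is a small sketch of the same mechanism with an #else branch. The macro USE_SHORT_VERSION is made up for illustration; only the branch whose condition holds survives pre-processing, so the compiler proper sees exactly one definition of Multiply.

```cpp
// USE_SHORT_VERSION is a hypothetical switch invented for this sketch.
#define USE_SHORT_VERSION 1

#if USE_SHORT_VERSION
// Kept by the pre-processor because the condition is non-zero.
int Multiply(int a, int b)
{
    return a * b;
}
#else
// Removed by the pre-processor; the compiler never sees this version.
int Multiply(int a, int b)
{
    int result = a * b;
    return result;
}
#endif
```

Changing the macro to 0 and looking at Math.i again should show the other version in the output instead.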

Math.cpp

#include<iostream>

int Multiply(int a, int b)
{
	int result = a * b;
	return result;
}

Math.i

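You can see that the file now contains a huge amount of content; scroll to the bottom and there is our own function.

At this point you can change the setting back.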

What’s actually inside our obj file?

You only need a general understanding of what follows.

If you open the obj file in a text editor, you will find that it’s binary.

It’s actually the machine code that our CPU will run when we call this Multiply function.

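Let’s change another setting (the Assembler Output option in the project’s C/C++ output file properties) to make it readable:

Compiler02

After compiling, you can see a Math.asm file in the output directory (Debug).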

Compiler03

There is a bunch of assembly here; these are the actual instructions that the CPU will execute.

Compiler04

Our multiplication actually happens here:

Compiler05

We load the a variable into our eax register, and then we perform an imul instruction, which multiplies it by the b variable. We then store the result in a variable called result and move it back into eax to return it. This double move happens because I actually made a variable called result and then returned it, instead of just returning a * b.

That’s why we get this move of eax into result followed by a move of result back into eax, which is completely redundant. It’s also a great example of why, if you set your compiler not to optimize, you’re going to wind up with slow code: it does extra work like this for no reason.

Change the code and compile:

int Multiply(int a, int b)
{
	return a * b;
}

There are now fewer operations:

Compiler06

We’re just doing an imul with b, and eax is going to contain our return value.

All of this may look like a lot of code. That’s because we’re compiling in debug mode, which doesn’t do any optimization and does extra things to make sure that our code is as verbose and as easy to debug as possible.

Let’s change the settings again.

Optimization:

Compiler07

Code Generation:

Compiler08

Then compile.

That looks a lot smaller.

Compiler09

We’ve basically just got our variables being loaded into a register, followed by the multiplication.

Change the code and compile:

int Multiply()
{
	return 5 * 2;
}

What it’s done is actually really simple.

Compiler10

It has simply moved 10 into our eax register, which is the register that stores our return value. So if we take a look at our code again, the compiler has just simplified our 5 * 2 to 10, because of course there’s no need to multiply two constant values at runtime. This is called constant folding: anything that is constant can be worked out at compile time.
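As a small illustration that the compiler can evaluate 5 * 2 without running the program, the sketch below uses constexpr and static_assert (C++11 or later). This is not the optimizer’s constant folding itself, just a related language feature that shows the same compile-time evaluation; the name kProduct is made up for the example.

```cpp
// Both operands are constants, so the product is a constant expression.
constexpr int kProduct = 5 * 2;   // kProduct is a hypothetical name

// static_assert is checked during compilation: the build fails if it is false.
static_assert(kProduct == 10, "5 * 2 should be evaluated at compile time");

int Multiply()
{
    return 5 * 2;   // an optimizing compiler can emit this as "return 10"
}
```

If the static_assert compiles, the value 10 was known before any machine code ran.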

Now let’s involve another function:

const char* Log(const char* message) {
	return message;
}

int Multiply(int a, int b)
{
	Log("Multiply");
	return a * b;
}

You can see that it moves our message pointer into eax:

Compiler11

Before we do the multiplication with imul, we actually call this Log function. You might be wondering why the Log function is decorated with what seem like random characters and at signs.

This is actually the function signature, in its decorated form. It needs to uniquely identify your function.

Essentially, when we have multiple obj files and our functions are defined across them, it’s the linker’s job to link all of them together, and the way it does that is by looking up this function signature. All you need to know here is that we’re calling this Log function. When you call a function, the compiler generates a call instruction.
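One way to see why the decorated name has to encode more than just "Log" is overloading. In the sketch below, the second Log overload and LogPlain are made up for illustration. The exact decorated names depend on the compiler; as an example, GCC and Clang on most platforms would be expected to emit _Z3LogPKc and _Z3Logi for the two overloads, while MSVC uses its own scheme with the at signs seen in the listing.

```cpp
// Two overloads share the name Log, so each needs a distinct decorated
// symbol for the linker to tell them apart.
const char* Log(const char* message)
{
    return message;
}

int Log(int code)   // hypothetical second overload, not in the original post
{
    return code;
}

// extern "C" requests C linkage: the parameter types are not encoded in the
// symbol name, which is also why such functions cannot be overloaded.
extern "C" int LogPlain(int code)   // hypothetical name
{
    return code;
}
```

Generating the assembly listing for this file and comparing the symbol names of the three functions should make the difference visible.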

In this case it might look a little bit silly, because you can see that we’re simply calling Log without even storing the return value. This could be optimized quite a bit.

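Set the optimization to “Maximum Optimization (Favor Speed) (/O2)” and compile, and the call instruction should disappear. With the version I used (Microsoft Visual Studio 2022 Community), however, it did not.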

You should now understand the gist of how the compiler works. It takes our source files and outputs an object file for each translation unit, containing machine code and any constant data that we’ve defined. Once we have these object files, we can link them into one executable that contains all of the machine code we actually need to run.

Video: https://www.youtube.com/watch?v=3tIqpEmWMLI