Compilation in “C” language with gcc

Smith Flores
7 min read · Jun 21, 2021

The GNU Compiler Collection (GCC) is a set of compilers created by the GNU project. GCC originally stood for GNU C Compiler, because it compiled only the C language. Later it was extended to compile C++, Fortran, Ada, and other languages.

Compilation is the process of converting a program from the textual source code of a programming language such as C or C++ into machine code, the sequence of ones and zeros that controls the computer's central processing unit (CPU). This machine code is stored in an executable file, sometimes also called a binary file.

After we finish writing the code, the first thing we do is compile it. It normally takes the compiler only a few seconds to translate the code into machine language, but during that time the code goes through a series of steps on its way to becoming an executable file. These compilation phases run in sequence, so they're often called a pipeline.

The Components of the C Program Compilation Pipeline are:

  1. Preprocessing
  2. Compilation
  3. Assembly
  4. Linking

PREPROCESSING:

When writing a C program, we include header files, define macros, and sometimes use conditional compilation. All of these are done with preprocessor directives.

During the preprocessing step of the C compilation pipeline, these directives are resolved: each one is replaced by the text it stands for.

Let’s make that clear!

Header Files

A C source file usually includes a number of header files, and it is the preprocessor's task to pull in their contents.

Example:

If the program contains a directive such as #include <stdio.h>, this line is replaced by the contents of that header file when the source file is preprocessed.

Macros

In C, macros are defined with the #define directive. During the preprocessing stage, each use of a macro is replaced by its definition.

Conditional Compilation

Often we want our compiled code to be minimal, so we use conditional compilation. The preprocessor evaluates each condition and keeps only the lines whose condition holds; the rest never reach the compiler.

Some preprocessor directives used with conditional compilation are:

  • #undef
  • #ifdef
  • #ifndef
  • #if
  • #else
  • #elif
  • #endif

Extract Preprocessed Code From GCC

We can also look at the output of each stage of the compilation pipeline. Let's ask the C compiler driver to dump the translation unit (the preprocessed source) without going any further.

To extract the translation unit from the source code, we can use GCC's -E option.

Example:

C Code:

// Header file
#include <stdio.h>

#define Max 10

int main()
{
    printf("Hello World"); // Print Hello World
    printf("%d", Max);
    return 0;
}

Dump Translation Unit:

$ gcc -E cprogram.c


# 1 "cprogram.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "cprogram.c"
.......
.......
.......
typedef __time64_t time_t;
# 435 "C:/msys64/mingw64/x86_64-w64-mingw32/include/corecrt.h" 3

typedef struct localeinfo_struct {
pthreadlocinfo locinfo;
pthreadmbcinfo mbcinfo;
} _locale_tstruct,*_locale_t;

typedef struct tagLC_ID {
unsigned short wLanguage;
unsigned short wCountry;
unsigned short wCodePage;
} LC_ID,*LPLC_ID;

.......
.......
.......
# 1582 "C:/msys64/mingw64/x86_64-w64-mingw32/include/stdio.h" 2 3
# 3 "cprogram.c" 2


# 5 "cprogram.c"
int main()
{
printf("Hello World");
printf("%d", 10);
return 0;
}

This example illustrates how the preprocessor works. It performs only simple text operations: inclusion, by copying the contents of a file, and macro expansion, by text substitution. Note that in the output the comments are gone and Max has been replaced by 10.

By convention, a preprocessed C file has the .i extension (gcc -E prints to standard output; add -o cprogram.i to save the result). If you pass such a file to the C compiler driver, the preprocessing stage is skipped, because a file with the .i extension is assumed to be already preprocessed and is sent directly to the compilation stage.

COMPILATION:

In the previous stage, we dumped the translation unit with the -E option. Here, we pass the -S option to GCC to obtain the assembly code. This creates a file with the .s extension, whose contents we can view with the cat command.

Extract Assembly Code from GCC:

$ gcc -S cprogram.c

$ cat cprogram.s

.file "cprogram.c"
.text
.def printf; .scl 3; .type 32; .endef
.seh_proc printf
printf:
pushq %rbp
.seh_pushreg %rbp
pushq %rbx
.seh_pushreg %rbx
subq $56, %rsp
.seh_stackalloc 56
leaq 128(%rsp), %rbp
.seh_setframe %rbp, 128
.seh_endprologue
movq %rcx, -48(%rbp)
movq %rdx, -40(%rbp)
movq %r8, -32(%rbp)
movq %r9, -24(%rbp)
leaq -40(%rbp), %rax
movq %rax, -96(%rbp)
movq -96(%rbp), %rbx
movl $1, %ecx
movq __imp___acrt_iob_func(%rip), %rax
call *%rax
movq %rbx, %r8
movq -48(%rbp), %rdx
movq %rax, %rcx
call __mingw_vfprintf
movl %eax, -84(%rbp)
movl -84(%rbp), %eax
addq $56, %rsp
popq %rbx
popq %rbp
ret
.seh_endproc
.def __main; .scl 2; .type 32; .endef
.section .rdata,"dr"
.LC0:
.ascii "Hello World\0"
.LC1:
.ascii "%d\0"
.text
.globl main
.def main; .scl 2; .type 32; .endef
.seh_proc main
main:
pushq %rbp
.seh_pushreg %rbp
movq %rsp, %rbp
.seh_setframe %rbp, 0
subq $32, %rsp
.seh_stackalloc 32
.seh_endprologue
call __main
leaq .LC0(%rip), %rcx
call printf
movl $10, %edx
leaq .LC1(%rip), %rcx
call printf
movl $0, %eax
addq $32, %rsp
popq %rbp
ret
.seh_endproc
.ident "GCC: (Rev3, Built by MSYS2 project) 10.2.0"
.def __mingw_vfprintf; .scl 2; .type 32; .endef

The compilation phase gives us assembly code that is specific to the target architecture.

Even with the same C compiler and the same C program, two machines with different processors will get different assembly code.

Generating assembly code from C code is one of the most critical stages in the C compilation pipeline, because assembly is the low-level language that an assembler can translate into an object file.

ASSEMBLY:

The Compilation stage gives us the assembly code that is the input for the next pipeline component, i.e. assembly.

In this stage, the actual instructions on the machine level are generated from the assembly code. Each architecture has its own assembler, which converts its own assembly code into its own machine code.

The assembler generates a relocatable object file from the assembly code.

  • Create an object file from the assembly file:

We can use the GNU assembler, called as, to translate the assembly file into a relocatable object file.

Create object file from assembly file:

$ as cprogram.s -o cprogram.o

The assembler (as) gives us a new file with a .o extension (.obj with Microsoft's toolchain on Windows), which is the relocatable object file.

If you want to translate a C program directly to an object file, you can pass the -c option to the GCC compiler driver.

Using -c runs the first three stages of the pipeline in one go, i.e. preprocessing, compilation, and assembly, and stops before linking.

Create object file:

$ gcc -c cprogram.c

This is really helpful when you work with many source files, where repeating each of the above steps by hand would be tedious.

The object file contains low-level machine code and is therefore not readable by humans. Tools such as objdump or nm can display its contents.

Now that we know how to build object files from both an assembly file and a C program, it's time to learn about the linking stage of the C compilation process.

LINKING:

This is one of the most critical steps in the C compilation pipeline: the generated relocatable object files are combined (linked) to create another object file, this time an executable one.

Let’s take a look at the situation.

Suppose we have a custom header file htd.h, which contains the prototype of a printHTD function, and a source file htd.c, which contains the definition of that function.

htd.c:

#include <stdio.h>
#include "htd.h"

void printHTD()
{
    printf("Hack The Developer");
}

htd.h:

void printHTD();

The htd.h header file is included in the cprogram.c source file.

cprogram.c:

#include "htd.h"

int main()
{
    printHTD();
    return 0;
}

Since there are two source files, we have to generate two separate object files, which the linker will later combine into an executable file.

$ gcc -c cprogram.c
$ gcc -c htd.c

The above commands create two separate relocatable object files, cprogram.o and htd.o. Now let's link them.

We can use the ld tool, which is the default linker on Unix-like systems, to link the relocatable object files.

But invoking ld directly gives us undefined reference errors, because we did not tell it where to find the C runtime and standard library that the object files depend on.

$ ld htd.o cprogram.o -o cprogram.exe
C:\msys64\mingw64\bin\ld.exe: cprogram.o:cprogram.c:(.text+0x32): undefined reference to `__imp___acrt_iob_func'
C:\msys64\mingw64\bin\ld.exe: cprogram.o:cprogram.c:(.text+0x43): undefined reference to `__mingw_vfprintf'
C:\msys64\mingw64\bin\ld.exe: cprogram.o:cprogram.c:(.text+0x78): undefined reference to `__main'

So we use gcc for the linking step instead. The gcc driver invokes the linker for us and automatically adds the C runtime startup files and standard libraries, so these references are resolved.

$ gcc htd.o cprogram.o -o cprogram.exe
$ ./cprogram.exe
Hack The Developer

The linking was successful as we got the required output.

Summary in one image:

source — https://nerdyelectronics.com/

Tip:

Don’t worry if you don’t understand much about programming yet. We were all newbies once. Don’t get frustrated: keep going, be patient, and persevere in your dreams!

Hope You Like It!))

Picture from anime One Piece
