Building Large Scale C/C++ Projects
1. General building process
The general building process can be divided into two tasks: compilation and linking. On Linux, the compiler translates each source file into an object file, and the linker then combines multiple object files into the executable. The following figure depicts the whole process.
Fig. 1   Compilation of object modules and linking of the objects to build the executable from a source file hw.c or hw.cpp. The name hw is short for the legendary "hello world". (Image modified from Computer Systems: A Programmer's Perspective by Randal E. Bryant and David R. O'Hallaron.)
Compilation command for a simple source file:
$ gcc -o code code.c
$ ./code
The binary executable is code; execute it as $ ./code. The elements of the compilation pipeline have the following roles:
  1. Preprocessor: modifies the original C program according to directives that begin with the ‘#’ character.
  2. Compiler: translates the preprocessed source into assembly language.
  3. Assembler: translates the assembly into machine code, packaged as a relocatable object file.
  4. Linker: merges the object file with precompiled object files such as `printf.o` to produce the executable.
2. Handling the header files

So what's the difference between header files and source files? Basically, header files are #included and not compiled on their own, whereas source files are compiled and not #included. One should never #include a source file.

C/C++ programs are built in a two-stage process. First, each source file is compiled on its own. The compiler generates an intermediate file for each compiled source file. These intermediate files are often called object files -- but they are not to be confused with objects in your code. Once all the files have been individually compiled, the linker combines all the object files, which generates the final binary (the program).

This means that each source file is compiled separately from other source files.

Inputs: plusvec.h, plusvec.c, minusvec.h, minusvec.c, algebra.c.
/* algebra.c */
#include <iostream>
#include "plusvec.h"
#include "minusvec.h"
using namespace std;
int main() 
{
    int u[2] = {100, 200};
    int v[2] = {10, 20};
    int res[2] = {0, 0};

    plusvec(u, v, res, 2);      /* res = u + v */
    cout << res[0] << "\n";
    cout << res[1] << "\n";

    minusvec(u, v, res, 2);     /* res = u - v */
    cout << res[0] << "\n";
    cout << res[1] << "\n";
}
/* plusvec.h */

void plusvec(int* x, int* y, 
             int* res, int d);

/* plusvec.c */

#include "plusvec.h"

/* Element-wise sum of two vectors of length d. */
void plusvec(int* x, int* y, 
             int* res, int d) 
{
    for(int i = 0; i < d; i++) 
        res[i] = x[i] + y[i];
}
/* minusvec.h */

void minusvec(int* x, int* y, 
              int* res, int d);

/* minusvec.c */

#include "minusvec.h"

/* Element-wise difference of two vectors of length d. */
void minusvec(int* x, int* y, 
              int* res, int d) 
{
    for(int i = 0; i < d; i++) 
        res[i] = x[i] - y[i];
}

The simplest way to build this project is to pass all the source files to the compiler in a single command. However, this approach is not recommended because it does not scale. A much better and more scalable solution is to compile the project in parts: create the object files (.o on Linux, .obj on Windows) separately, and then link all the object files to create the final executable. Because algebra.c uses <iostream>, the commands below use g++, which treats .c files as C++ sources. The two sets of commands that follow demonstrate both approaches.

# Not preferred approach
$ g++ algebra.c plusvec.c minusvec.c -o code
$ ./code
The preferred approach is documented below:
# Preferred approach
$ g++ -c plusvec.c
$ g++ -c minusvec.c
$ g++ -c algebra.c
$ g++ -o code plusvec.o minusvec.o algebra.o
$ ./code
Note that each of the three commands of the form g++ -c filename.c creates the corresponding object file filename.o. The last command performs the linking: it links all the .o files to create the executable code.
Why is this a better approach? Suppose you need to modify minusvec.c for some reason. There is no point in recompiling all of the sources: only minusvec.c has to be recompiled, and the objects relinked. In this case we have just three small .c files, but in real-life projects the files may number in the hundreds and each may be large, so recompiling everything after every change takes a long time. Hence compilation in parts is convenient for large-scale projects.
3. Include guard: including the header files the right way
Suppose we have the following sources available as .c files and .h header files.
/* getvec.h */

#include <stdlib.h>   /* calloc, free */
int* getvec(int d, int v);

/* plusvec.h */

#include "getvec.h"
int* plusvec(int* x, int* y, int d);

/* minusvec.h */

#include "getvec.h"
int* minusvec(int* x, int* y, int d);

/* getvec.c */
#include "getvec.h"
/* Allocate a vector of d ints, each set to initval. The caller frees it. */
int* getvec(int d, int initval) {
    int *v = (int *) calloc(d, sizeof(int));
    for(int i = 0; i < d; ++i) v[i] = initval;
    return v;
}
/* plusvec.c */
#include "plusvec.h"
int* plusvec(int* x, int* y, int d) 
{
    int* res = getvec(d, 0);   /* result has the same length as the inputs */
    for(int i = 0; i < d; i++) 
        res[i] = x[i] + y[i];
    return res;
}
/* minusvec.c */
#include "minusvec.h"
int* minusvec(int* x, int* y, int d) 
{
    int* res = getvec(d, 0);   /* result has the same length as the inputs */
    for(int i = 0; i < d; i++) 
        res[i] = x[i] - y[i];
    return res;
}
/* algebra.c */

#include <iostream>
#include "plusvec.h"
#include "minusvec.h"

using namespace std;

int main() {
    int* res;

    int u[2] = {100, 200};
    int v[2] = {10, 20};

    res = plusvec(u, v, 2);
    cout << res[0] << "\n";
    cout << res[1] << "\n";
    free(res);                 /* release the vector returned by plusvec */

    res = minusvec(u, v, 2);
    cout << res[0] << "\n";
    cout << res[1] << "\n";
    free(res);                 /* release the vector returned by minusvec */

}

Note that getvec.h is included twice in algebra.c: once via plusvec.h and once via minusvec.h. Repeating a plain function declaration is harmless, but as soon as a header contains definitions (a struct, a class, an inline function), including it twice in the same source file causes redefinition errors that can be hard to debug. We stop multiple inclusion by employing a technique called the include guard.

Only the two headers plusvec.h and minusvec.h change. The source files plusvec.c and minusvec.c, as well as getvec.h, getvec.c, and algebra.c, are kept unchanged.

/* plusvec.h */
    
    #ifndef GETVEC_H
    #define GETVEC_H
    #include "getvec.h"
    #endif
    int* plusvec(int* x, int* y, int d);
    
/* minusvec.h */
    
    #ifndef GETVEC_H
    #define GETVEC_H
    #include "getvec.h"
    #endif
    int* minusvec(int* x, int* y, int d);
    

When algebra.c includes plusvec.h, GETVEC_H is not yet defined, so it gets defined and getvec.h is included. When minusvec.h is processed next, the preprocessor finds GETVEC_H already defined and skips the second inclusion of getvec.h.

4. Automating builds with Makefile

Compiling a big project that involves multiple source files and their headers is a tedious job. Compiling them repeatedly, especially during debugging, is an extra headache. Automation tools help. make is such a tool: it reads the build rules from a file named Makefile and can take much of the load off your head.

Consider the previous example, whose sources can be compiled into an executable like this:

$ g++ plusvec.c minusvec.c getvec.c algebra.c -o compute
Let's see how we can use a Makefile to build the executable compute. A simple makefile consists of “rules” with the following shape:
target: prerequisites
	recipe
A target is usually either (i) the name of a file that is generated by a program (an executable or an object file) or (ii) the name of an action to carry out, such as ‘clean’. A prerequisite is a file that is used as input to create the target; a target may have several. A recipe is an action, consisting of one or more commands, that make carries out. The commands may be on the same line or each on its own line. Remember to put a tab character at the beginning of every recipe line! The following simple makefile illustrates one of our build processes mentioned above.

all: compute
compute: plusvec.c minusvec.c getvec.c algebra.c
	g++ plusvec.c minusvec.c getvec.c algebra.c -o compute
clean:
	rm -f compute
Our Makefile has a target 'all' with a prerequisite but no recipe, a target 'compute' with prerequisites and a recipe, and a target 'clean' with a recipe but no prerequisites. But how does make process the Makefile? Read the passage below for details.
4.i. How does make process a Makefile?
When you execute the command $ make, make reads the Makefile in the current directory and starts with the first target (the default target). In the example, the first target is 'all', whose prerequisite 'compute' is itself a target, so make processes 'compute' next. The prerequisites of 'compute' are just the source files, which are already available in the current directory. make then runs the recipe of 'compute', which creates the executable, if the executable is missing or older than any of the sources.
When you type make clean, the recipe of the 'clean' target is executed. The Makefile above recompiles every source whenever any one of them changes. How can it be written so that only the changed sources are recompiled before relinking?
all: compute

compute: algebra.o plusvec.o minusvec.o getvec.o
	g++ algebra.o plusvec.o minusvec.o getvec.o -o compute

algebra.o: algebra.c
	g++ -c algebra.c

plusvec.o: plusvec.c
	g++ -c plusvec.c

minusvec.o: minusvec.c
	g++ -c minusvec.c

getvec.o: getvec.c
	g++ -c getvec.c

clean:
	rm -f *.o compute
This kind of Makefile has an important advantage. Each object file is its own target, so if any source in the project is modified, you don’t have to recompile everything, only what you modified; make then relinks the executable.
4.ii. Variables Make Makefiles Simpler
One can also use variables when writing Makefiles. It comes in handy in situations where you want to change the compiler, or the compiler options.
# variable CXX will be the C++ compiler to use.
CXX=g++

# CXXFLAGS will be the options to pass to the compiler.
CXXFLAGS=-c -Wall

all: compute

compute: algebra.o plusvec.o minusvec.o getvec.o
	$(CXX) algebra.o plusvec.o minusvec.o getvec.o -o compute

algebra.o: algebra.c
	$(CXX) $(CXXFLAGS) algebra.c

plusvec.o: plusvec.c
	$(CXX) $(CXXFLAGS) plusvec.c

minusvec.o: minusvec.c
	$(CXX) $(CXXFLAGS) minusvec.c

getvec.o: getvec.c
	$(CXX) $(CXXFLAGS) getvec.c

clean:
	rm -f *.o compute