Structure Padding and Binary Files

Jan 3, 2021

I spent quite some time reading in depth about how to use #pragma, and why we would use it when writing binary files. The rabbit hole went deep enough that we ended up at the C preprocessor. Below I share what I discovered, along with some references.

Structure Padding

Consider the following code and result.

int has a size of 4
char has a size of 1
double has a size of 8
Assumed struct 'Student' size = 62
Actual struct 'Student' size = 64

After measuring the size of each data type, we add the values up to estimate the size of the struct Student that we declared. However, the actual size is 2 bytes larger than expected. Why does this happen? To answer that, we need to look at structure padding.

Padding Concept

https://www.javatpoint.com/structure-padding-in-c#:~:text=Structure%20padding%20is%20a%20concept,align%20the%20data%20in%20memory

Structure padding is a concept in C (and C++): the compiler inserts one or more empty bytes between members so that each member is aligned in memory. To see why, we first need to understand how a processor reads memory.

A processor does not read 1 byte at a time; it reads 1 word at a time.
A 32-bit processor reads 4 bytes at a time (1 word = 4 bytes).
A 64-bit processor reads 8 bytes at a time (1 word = 8 bytes).

Let us consider the following simpler structure.

struct A {   
    char x;   
    char y;   
    int z; 
};

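The order of variables in the struct matters. Variables x and y fit in the first 2 bytes of a 4-byte word without any problem. Without padding, however, z would start at offset 2 and its 4 bytes would span two words, so it could no longer be read in 1 CPU cycle. The figure from the reference below illustrates this situation.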

Image reference: https://www.javatpoint.com/structure-padding-in-c#:~:text=Structure%20padding%20is%20a%20concept,align%20the%20data%20in%20memory

Accessing z would then require two CPU cycles. That is an unnecessary waste, since the variable could be read in a single cycle. This is why the compiler performs structure padding automatically.

Padding inserts 2 empty bytes after y, so z starts at offset 4, at the beginning of the next word, and can be accessed in a single CPU cycle. The struct now takes more memory than its members actually use (8 bytes instead of 6), but each variable can still be accessed within one CPU cycle. If the members of a struct or class are accessed many times in the program, it is better to keep the default structure padding. This is also why the order in which members are declared inside a struct or class really matters.

For more, please check out the following website; it helped me understand structure padding much more deeply.

Pragma

There are ways to avoid structure padding in C. One is the GCC/Clang attribute __attribute__((packed)); another is the #pragma pack directive. We turn packing on with a pragma before the declaration of a class or struct and turn it off again after the declaration. The following code and result show how to stop my program above from padding the struct. Now the size we calculated by hand and the actual size of struct Student are the same.

age has a size of 4
1 block of name has a size of 1
weight has a size of 8
Assumed struct 'Student' size = 62
Actual struct 'Student' size = 62

Now the actual object has exactly the size we expected.

Reading and Writing Binary Files

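First and foremost, a binary file is not made of lines of text, so no newline translation should be applied to it. In text mode, some platforms (notably Windows) translate '\n' to "\r\n" when writing and back when reading, which would corrupt raw bytes; opening the stream with ios::binary prevents C++ from altering the data. Let us use what we learned about pragma and modify the example class to see whether using pragma makes a difference. Consider the following code.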

With or without the pragma, we read back the same data, which is consistent with what we discussed above about structure padding. The only difference is the size of the binary file: when I compile and run the program, the output file is 120 bytes without the pragma and 113 bytes with it. So the question is: why would we want to use the pragma?

I am still reading about this question online, but in the meantime a friend who works as a C++ engineer gave me an answer. When dealing with certain low-level network protocols, we cast an object to a character buffer and send it to the destination. On the receiving side, the API design may require senders to transmit binary data of an exact length. That is why we pack the object to a smaller size: its byte layout then matches what the receiver expects, and no space is wasted on padding.

Conclusion

I have illustrated how structure padding is applied automatically by the compiler, and why such a setting exists. I have also shown how to use a pragma to avoid padding in a struct, and the general reason to use a pragma when writing binary files: ensuring that the blocks of memory sent to the receiver follow the agreed standard layout.

Right now I feel there is much more to learn in C++ (or C) when I dig this deep. It took a long day to write all these paragraphs and make sure I really understand and remember what I have learned. When I first learned that structure padding exists, I thought C++ was the most beautiful language, because it lets the programmer decide whether to use padding or not. I guess in other languages such as Java, C# or Python, the layout is predefined and users somehow cannot access memory blocks directly. All in all, C++ is a beautiful language that is so “free”. I really like this language :’)