A simple guide and example code for creating portable binary files in C, addressing architecture-specific challenges like endianness and data alignment.
Get full source code here.
In systems programming, it is often necessary to store data in binary files for efficient storage and retrieval. However, binary data can be architecture-specific, which poses challenges when transferring files between different systems. oOne simple example of this happens when writing a file using a program compiled using x64 compiler and reading that data using a program compiled using x86 compiler.
This short article explores the methods for creating portable binary files in C, ensuring data consistency across diverse architectures.
Binary files store data in a raw, unencoded format that is directly mapped from memory. This direct mapping leads to potential issues:
- Endianness: Different systems use different byte orders for multi-byte data. Big-endian systems store the most significant byte first, while little-endian systems store the least significant byte first.
- Data Alignment: Compilers may introduce padding between structure members to align data in memory, which varies across systems.
- Data Type Sizes: The size of basic data types like
intandlongcan differ between architectures.
To achieve portability, we must standardize the binary format, explicitly controlling byte order and structure padding. This involves manually serializing and deserializing each data member, converting them to a fixed byte order, and avoiding direct memory writes.
Below is a simple example demonstrating how to serialize and deserialize a simple structure in a portable manner:
#include <stdio.h>
#include <stdint.h>
// Function to write a 32-bit integer in a portable way
void write_int32(FILE *file, int32_t value) {
uint8_t bytes[4];
bytes[0] = (value >> 24) & 0xFF;
bytes[1] = (value >> 16) & 0xFF;
bytes[2] = (value >> 8) & 0xFF;
bytes[3] = value & 0xFF;
fwrite(bytes, sizeof(bytes), 1, file);
}
// Function to write a float in a portable way
void write_float(FILE *file, float value) {
uint8_t bytes[4];
uint32_t as_int = *((uint32_t*)&value);
bytes[0] = (as_int >> 24) & 0xFF;
bytes[1] = (as_int >> 16) & 0xFF;
bytes[2] = (as_int >> 8) & 0xFF;
bytes[3] = as_int & 0xFF;
fwrite(bytes, sizeof(bytes), 1, file);
}
// Function to read a 32-bit integer in a portable way
int32_t read_int32(FILE *file) {
uint8_t bytes[4];
fread(bytes, sizeof(bytes), 1, file);
return (bytes[0] << 24) | (bytes[1] << 16) | (bytes[2] << 8) | bytes[3];
}
// Function to read a float in a portable way
float read_float(FILE *file) {
uint8_t bytes[4];
fread(bytes, sizeof(bytes), 1, file);
uint32_t as_int = (bytes[0] << 24) | (bytes[1] << 16) | (bytes[2] << 8) | bytes[3];
return *((float*)&as_int);
}
typedef struct {
int32_t id;
float value;
} MyStruct;
void write_struct(FILE *file, const MyStruct *s) {
write_int32(file, s->id);
write_float(file, s->value);
}
void read_struct(FILE *file, MyStruct *s) {
s->id = read_int32(file);
s->value = read_float(file);
}
int main() {
MyStruct s = {123, 456.789};
FILE *file = fopen("data.bin", "wb");
if (file != NULL) {
write_struct(file, &s);
fclose(file);
}
MyStruct read_s;
file = fopen("data.bin", "rb");
if (file != NULL) {
read_struct(file, &read_s);
fclose(file);
}
printf("Read id: %d, value: %f\n", read_s.id, read_s.value);
return 0;
}By manually handling the serialization of each data member and ensuring consistent byte order, we can create binary files that are portable across different architectures. This approach mitigates issues related to endianness, data alignment, and data type size discrepancies. This example serves as a foundation for developing more complex portable binary file handling in C, paving the way for robust and architecture-agnostic data storage solutions.
Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) License.
Please refer to LICENSE file.