Portable Binary File Serialization in C: Ensuring Cross-Platform Data Integrity

A simple guide and example code for creating portable binary files in C, addressing architecture-specific challenges like endianness and data alignment.

Get full source code here.

Introduction

In systems programming, it is often necessary to store data in binary files for efficient storage and retrieval. However, binary data can be architecture-specific, which poses challenges when transferring files between different systems. oOne simple example of this happens when writing a file using a program compiled using x64 compiler and reading that data using a program compiled using x86 compiler.

This short article explores the methods for creating portable binary files in C, ensuring data consistency across diverse architectures.

The Challenge of Portability

Binary files store data in a raw, unencoded format that is directly mapped from memory. This direct mapping leads to potential issues:

Endianness: Different systems use different byte orders for multi-byte data. Big-endian systems store the most significant byte first, while little-endian systems store the least significant byte first.
Data Alignment: Compilers may introduce padding between structure members to align data in memory, which varies across systems.
Data Type Sizes: The size of basic data types like int and long can differ between architectures.

Ensuring Portability

To achieve portability, we must standardize the binary format, explicitly controlling byte order and structure padding. This involves manually serializing and deserializing each data member, converting them to a fixed byte order, and avoiding direct memory writes.

Implementation

Below is a simple example demonstrating how to serialize and deserialize a simple structure in a portable manner:

#include <stdio.h>
#include <stdint.h>

// Function to write a 32-bit integer in a portable way
void write_int32(FILE *file, int32_t value) {
    uint8_t bytes[4];
    bytes[0] = (value >> 24) & 0xFF;
    bytes[1] = (value >> 16) & 0xFF;
    bytes[2] = (value >> 8) & 0xFF;
    bytes[3] = value & 0xFF;
    fwrite(bytes, sizeof(bytes), 1, file);
}

// Function to write a float in a portable way
void write_float(FILE *file, float value) {
    uint8_t bytes[4];
    uint32_t as_int = *((uint32_t*)&value);
    bytes[0] = (as_int >> 24) & 0xFF;
    bytes[1] = (as_int >> 16) & 0xFF;
    bytes[2] = (as_int >> 8) & 0xFF;
    bytes[3] = as_int & 0xFF;
    fwrite(bytes, sizeof(bytes), 1, file);
}

// Function to read a 32-bit integer in a portable way
int32_t read_int32(FILE *file) {
    uint8_t bytes[4];
    fread(bytes, sizeof(bytes), 1, file);
    return (bytes[0] << 24) | (bytes[1] << 16) | (bytes[2] << 8) | bytes[3];
}

// Function to read a float in a portable way
float read_float(FILE *file) {
    uint8_t bytes[4];
    fread(bytes, sizeof(bytes), 1, file);
    uint32_t as_int = (bytes[0] << 24) | (bytes[1] << 16) | (bytes[2] << 8) | bytes[3];
    return *((float*)&as_int);
}

typedef struct {
    int32_t id;
    float value;
} MyStruct;

void write_struct(FILE *file, const MyStruct *s) {
    write_int32(file, s->id);
    write_float(file, s->value);
}

void read_struct(FILE *file, MyStruct *s) {
    s->id = read_int32(file);
    s->value = read_float(file);
}

int main() {
    MyStruct s = {123, 456.789};
    
    FILE *file = fopen("data.bin", "wb");
    if (file != NULL) {
        write_struct(file, &s);
        fclose(file);
    }
    
    MyStruct read_s;
    file = fopen("data.bin", "rb");
    if (file != NULL) {
        read_struct(file, &read_s);
        fclose(file);
    }
    
    printf("Read id: %d, value: %f\n", read_s.id, read_s.value);
    
    return 0;
}

Conclusion

By manually handling the serialization of each data member and ensuring consistent byte order, we can create binary files that are portable across different architectures. This approach mitigates issues related to endianness, data alignment, and data type size discrepancies. This example serves as a foundation for developing more complex portable binary file handling in C, paving the way for robust and architecture-agnostic data storage solutions.

License

Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) License.

Please refer to LICENSE file.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
portable_binary.c		portable_binary.c

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Portable Binary File Serialization in C: Ensuring Cross-Platform Data Integrity

Introduction

The Challenge of Portability

Ensuring Portability

Implementation

Conclusion

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

License

dezashibi-c/a-writing_portable_binary

Folders and files

Latest commit

History

Repository files navigation

Portable Binary File Serialization in C: Ensuring Cross-Platform Data Integrity

Introduction

The Challenge of Portability

Ensuring Portability

Implementation

Conclusion

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages