logoCndocs
Socket Fundamentals

Network Byte Order and Data Serialization

When exchanging data between different systems over a network, it's essential to ensure that both sides interpret the data correctly. This section covers network byte order, endianness, and various techniques for serializing data in socket programming.

Endianness and Byte Order

Endianness refers to the order in which bytes are stored in memory for multi-byte data types (like integers and floats). There are two main types of endianness:

  1. Big-Endian: The most significant byte is stored at the lowest memory address
  2. Little-Endian: The least significant byte is stored at the lowest memory address
Big-Endian vs Little-Endian

Different computer architectures use different byte orders:

  • x86 and x86-64 (Intel, AMD) use little-endian
  • ARM can use either, but often uses little-endian
  • SPARC, PowerPC, and MIPS traditionally use big-endian

Network Byte Order

To ensure consistent interpretation of data across different systems, network protocols define a standard byte order called "network byte order," which is big-endian.

When sending multi-byte values over a network, you should convert them from the host's byte order to network byte order. When receiving, you should convert from network byte order to the host's byte order.

Byte Order Conversion Functions

The Socket API provides functions to convert between host and network byte order:

#include <arpa/inet.h>
 
// Host to Network
uint16_t htons(uint16_t hostshort);   // Host to Network Short (16-bit)
uint32_t htonl(uint32_t hostlong);    // Host to Network Long (32-bit)
 
// Network to Host
uint16_t ntohs(uint16_t netshort);    // Network to Host Short (16-bit)
uint32_t ntohl(uint32_t netlong);     // Network to Host Long (32-bit)

Example: Using Byte Order Conversion Functions

#include <stdio.h>
#include <arpa/inet.h>
 
int main() {
    // Original values in host byte order
    uint16_t host_port = 8080;
    uint32_t host_addr = 0x0A000001;  // 10.0.0.1
    
    // Convert to network byte order
    uint16_t net_port = htons(host_port);
    uint32_t net_addr = htonl(host_addr);
    
    printf("Host port: %u, Network port: %u\n", host_port, net_port);
    printf("Host addr: 0x%X, Network addr: 0x%X\n", host_addr, net_addr);
    
    // Convert back to host byte order
    uint16_t new_host_port = ntohs(net_port);
    uint32_t new_host_addr = ntohl(net_addr);
    
    printf("Converted back - Port: %u, Addr: 0x%X\n", new_host_port, new_host_addr);
    
    return 0;
}

On a little-endian system, this might output:

Host port: 8080, Network port: 36895
Host addr: 0xA000001, Network addr: 0x100000A
Converted back - Port: 8080, Addr: 0xA000001

On a big-endian system, the values would remain the same since network byte order is big-endian.

IP Address Conversion Functions

For IP addresses, the Socket API provides functions to convert between string and binary representations:

#include <arpa/inet.h>
 
// IPv4 conversions
int inet_pton(int af, const char *src, void *dst);
const char *inet_ntop(int af, const void *src, char *dst, socklen_t size);
 
// Older functions (deprecated)
in_addr_t inet_addr(const char *cp);
char *inet_ntoa(struct in_addr in);

Example: Using IP Address Conversion Functions

#include <stdio.h>
#include <arpa/inet.h>
#include <string.h>
 
int main() {
    // Convert string IP to binary
    struct in_addr addr;
    if (inet_pton(AF_INET, "192.168.1.1", &addr) <= 0) {
        perror("inet_pton failed");
        return 1;
    }
    
    // The binary address is now in network byte order
    printf("Binary address: 0x%X\n", addr.s_addr);
    
    // Convert binary IP back to string
    char ip_str[INET_ADDRSTRLEN];
    if (inet_ntop(AF_INET, &addr, ip_str, INET_ADDRSTRLEN) == NULL) {
        perror("inet_ntop failed");
        return 1;
    }
    
    printf("String address: %s\n", ip_str);
    
    return 0;
}

Output:

Binary address: 0x0101A8C0
String address: 192.168.1.1

Data Serialization

Data serialization is the process of converting structured data into a format that can be transmitted over a network and reconstructed on the receiving end. There are several approaches to data serialization in socket programming:

1. Manual Serialization

The simplest approach is to manually pack and unpack data using byte order conversion functions:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <arpa/inet.h>
 
// Structure to be serialized
struct person {
    uint32_t id;
    uint16_t age;
    char name[32];
};
 
// Serialize a person structure into a buffer
size_t serialize_person(const struct person *p, void *buffer) {
    char *ptr = (char *)buffer;
    
    // Write id in network byte order
    uint32_t net_id = htonl(p->id);
    memcpy(ptr, &net_id, sizeof(net_id));
    ptr += sizeof(net_id);
    
    // Write age in network byte order
    uint16_t net_age = htons(p->age);
    memcpy(ptr, &net_age, sizeof(net_age));
    ptr += sizeof(net_age);
    
    // Write name (no byte order conversion needed for chars)
    memcpy(ptr, p->name, sizeof(p->name));
    ptr += sizeof(p->name);
    
    // Return the total size
    return ptr - (char *)buffer;
}
 
// Deserialize a buffer into a person structure
void deserialize_person(const void *buffer, struct person *p) {
    const char *ptr = (const char *)buffer;
    
    // Read id in network byte order
    uint32_t net_id;
    memcpy(&net_id, ptr, sizeof(net_id));
    p->id = ntohl(net_id);
    ptr += sizeof(net_id);
    
    // Read age in network byte order
    uint16_t net_age;
    memcpy(&net_age, ptr, sizeof(net_age));
    p->age = ntohs(net_age);
    ptr += sizeof(net_age);
    
    // Read name
    memcpy(p->name, ptr, sizeof(p->name));
}
 
int main() {
    // Create a person
    struct person alice = {
        .id = 12345,
        .age = 30,
        .name = "Alice"
    };
    
    // Serialize
    char buffer[sizeof(struct person)];
    size_t size = serialize_person(&alice, buffer);
    
    printf("Serialized %zu bytes\n", size);
    
    // Deserialize
    struct person received;
    deserialize_person(buffer, &received);
    
    printf("Received: id=%u, age=%u, name=%s\n",
           received.id, received.age, received.name);
    
    return 0;
}

2. Using Packed Structures

Another approach is to use packed structures with fixed-size fields:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <arpa/inet.h>
 
// Packed structure to ensure no padding
struct __attribute__((packed)) message {
    uint32_t type;
    uint32_t length;
    char data[128];
};
 
// Convert message to network byte order
void message_to_network(struct message *msg) {
    msg->type = htonl(msg->type);
    msg->length = htonl(msg->length);
    // No conversion needed for data (chars)
}
 
// Convert message from network to host byte order
void message_from_network(struct message *msg) {
    msg->type = ntohl(msg->type);
    msg->length = ntohl(msg->length);
    // No conversion needed for data (chars)
}
 
int main() {
    // Create a message
    struct message msg = {
        .type = 1,
        .length = 13,
        .data = "Hello, World!"
    };
    
    // Convert to network byte order
    message_to_network(&msg);
    
    // At this point, msg could be sent over the network
    
    // Convert back to host byte order
    message_from_network(&msg);
    
    printf("Message: type=%u, length=%u, data=%s\n",
           msg.type, msg.length, msg.data);
    
    return 0;
}

Note: The __attribute__((packed)) directive is compiler-specific (GCC and Clang support it) and ensures that the structure has no padding between fields.

3. Using Protocol Buffers

For more complex data structures, consider using a library like Protocol Buffers:

// Example .proto file
syntax = "proto3";
 
message Person {
    uint32 id = 1;
    uint32 age = 2;
    string name = 3;
}

Using Protocol Buffers requires additional setup and libraries, but provides benefits like versioning, backward compatibility, and cross-language support.

Variable-Length Data

Handling variable-length data requires special attention. Here's an approach using length prefixing:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <arpa/inet.h>
 
// Serialize a string with length prefix
size_t serialize_string(const char *str, void *buffer) {
    char *ptr = (char *)buffer;
    
    // Get string length
    uint32_t length = strlen(str);
    
    // Write length in network byte order
    uint32_t net_length = htonl(length);
    memcpy(ptr, &net_length, sizeof(net_length));
    ptr += sizeof(net_length);
    
    // Write string data
    memcpy(ptr, str, length);
    ptr += length;
    
    // Return total size
    return ptr - (char *)buffer;
}
 
// Deserialize a string with length prefix
char *deserialize_string(const void *buffer, size_t *bytes_read) {
    const char *ptr = (const char *)buffer;
    
    // Read length
    uint32_t net_length;
    memcpy(&net_length, ptr, sizeof(net_length));
    uint32_t length = ntohl(net_length);
    ptr += sizeof(net_length);
    
    // Allocate memory for the string
    char *str = malloc(length + 1);
    if (str == NULL) {
        return NULL;
    }
    
    // Copy string data
    memcpy(str, ptr, length);
    str[length] = '\0';  // Null-terminate
    
    // Update bytes read
    if (bytes_read != NULL) {
        *bytes_read = sizeof(net_length) + length;
    }
    
    return str;
}
 
int main() {
    const char *message = "Hello, this is a variable-length string!";
    
    // Allocate buffer (size = 4 bytes for length + string length)
    size_t buffer_size = sizeof(uint32_t) + strlen(message);
    void *buffer = malloc(buffer_size);
    
    // Serialize
    size_t bytes_written = serialize_string(message, buffer);
    printf("Serialized %zu bytes\n", bytes_written);
    
    // Deserialize
    size_t bytes_read;
    char *received = deserialize_string(buffer, &bytes_read);
    
    printf("Deserialized %zu bytes: %s\n", bytes_read, received);
    
    // Clean up
    free(buffer);
    free(received);
    
    return 0;
}

Handling Floating-Point Numbers

Floating-point numbers don't have a standard network representation. One approach is to convert them to a fixed-point representation or serialize them as strings:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <arpa/inet.h>
 
// Serialize a float as a fixed-point number
size_t serialize_float(float value, void *buffer) {
    // Convert to fixed-point with 3 decimal places
    int32_t fixed_point = (int32_t)(value * 1000);
    
    // Convert to network byte order
    int32_t net_value = htonl(fixed_point);
    
    // Write to buffer
    memcpy(buffer, &net_value, sizeof(net_value));
    
    return sizeof(net_value);
}
 
// Deserialize a fixed-point number to float
float deserialize_float(const void *buffer) {
    // Read from buffer
    int32_t net_value;
    memcpy(&net_value, buffer, sizeof(net_value));
    
    // Convert from network byte order
    int32_t fixed_point = ntohl(net_value);
    
    // Convert to float
    return (float)fixed_point / 1000.0f;
}
 
int main() {
    float original = 123.456f;
    char buffer[sizeof(int32_t)];
    
    // Serialize
    serialize_float(original, buffer);
    
    // Deserialize
    float result = deserialize_float(buffer);
    
    printf("Original: %f, Result: %f\n", original, result);
    
    return 0;
}

Text-Based Serialization

For human-readable formats, consider using text-based serialization like JSON:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
 
// Simple JSON serialization for a person
char *person_to_json(const char *name, int age, int id) {
    // Allocate buffer (estimate size)
    size_t buffer_size = strlen(name) + 50;
    char *json = malloc(buffer_size);
    
    if (json == NULL) {
        return NULL;
    }
    
    // Format as JSON
    snprintf(json, buffer_size,
             "{\"name\":\"%s\",\"age\":%d,\"id\":%d}",
             name, age, id);
    
    return json;
}
 
// Note: Proper JSON parsing would require a library
 
int main() {
    char *json = person_to_json("Alice", 30, 12345);
    
    if (json != NULL) {
        printf("JSON: %s\n", json);
        free(json);
    }
    
    return 0;
}

For real applications, use a proper JSON library like cJSON, json-c, or jansson.

Best Practices for Data Serialization

  1. Always Use Network Byte Order: Convert multi-byte values to network byte order before sending
  2. Define Clear Message Formats: Document the exact format of your messages
  3. Include Version Information: Add a version field to support protocol evolution
  4. Handle Variable-Length Data: Use length prefixes or delimiters for variable-length fields
  5. Validate Input: Check that received data conforms to your protocol
  6. Consider Using Established Formats: JSON, Protocol Buffers, or MessagePack for complex data
  7. Test with Different Endianness: Ensure your code works on both big-endian and little-endian systems
  8. Implement Error Checking: Add checksums or CRCs for data integrity
  9. Document Byte Order Assumptions: Make it clear which fields require byte order conversion
  10. Keep It Simple: Use the simplest serialization that meets your needs

Conclusion

Proper handling of byte order and data serialization is essential for reliable network communication. By understanding endianness, using network byte order, and implementing appropriate serialization techniques, you can ensure that data is correctly interpreted by all systems in your network.

In the next section, we'll explore multicast and broadcast communication, which allow sending data to multiple recipients simultaneously.

Test Your Knowledge

Take a quiz to reinforce what you've learned

Exam Preparation

Access short and long answer questions for written exams

Share this page