Socket Error Handling

Proper error handling is essential for building robust and reliable socket applications. Network programming is particularly prone to errors due to the unpredictable nature of networks, system resource limitations, and security restrictions. This section covers common socket errors, how to detect and interpret them, and strategies for error recovery.

Error Reporting in Socket Programming

In C socket programming, errors are typically reported in two ways:

Return Values: Socket functions return -1 (or sometimes other negative values) to indicate an error
errno Variable: The global errno variable is set to an error code that provides more specific information about what went wrong

Accessing errno

#include <errno.h>
#include <string.h>
 
// After a socket function call
if (result < 0) {
    int error_code = errno;
    printf("Error: %s (errno=%d)\n", strerror(error_code), error_code);
}

Error Reporting Functions

Several functions are available to convert error codes into human-readable messages:

char *strerror(int errnum);                 // Returns a string describing the error code
void perror(const char *s);                 // Prints a message followed by the error description

Example usage:

int sockfd = socket(AF_INET, SOCK_STREAM, 0);
if (sockfd < 0) {
    perror("socket creation failed");
    exit(EXIT_FAILURE);
}

Common Socket Error Codes

Here are some of the most common error codes you'll encounter in socket programming:

Connection Establishment Errors

Error Code	Description	Possible Causes	Recovery Strategy
ECONNREFUSED	Connection refused	Server not running, wrong port	Retry with backoff, check server status
ETIMEDOUT	Connection timed out	Network issues, firewall	Retry with longer timeout, check network
ENETUNREACH	Network unreachable	Routing problem, wrong address	Check network configuration
EHOSTUNREACH	Host unreachable	Host down, wrong address	Verify host address, try alternative
EADDRNOTAVAIL	Address not available	Invalid local address	Use a valid local address
EADDRINUSE	Address already in use	Port already bound	Use SO_REUSEADDR, choose different port

I/O Operation Errors

Error Code	Description	Possible Causes	Recovery Strategy
EWOULDBLOCK / EAGAIN	Operation would block	Non-blocking socket, no data available	Retry later, use select/poll/epoll
EINTR	Interrupted system call	Signal received	Retry the operation
EPIPE	Broken pipe	Connection closed by peer	Reconnect if needed
ECONNRESET	Connection reset by peer	Abrupt connection termination	Reconnect if needed
EMSGSIZE	Message too long	Datagram too large for buffer	Use smaller messages or TCP
ENOMEM	Out of memory	System resource exhaustion	Free resources, limit connections

Protocol and Parameter Errors

Error Code	Description	Possible Causes	Recovery Strategy
EPROTONOSUPPORT	Protocol not supported	Invalid protocol specified	Use a supported protocol
EAFNOSUPPORT	Address family not supported	Invalid address family	Use a supported address family
EINVAL	Invalid argument	Invalid parameter	Check function parameters
EACCES	Permission denied	Insufficient privileges	Run with appropriate permissions
ENOTSOCK	Not a socket	File descriptor is not a socket	Verify file descriptor
EOPNOTSUPP	Operation not supported	Unsupported socket operation	Use alternative approach

Handling Specific Error Scenarios

Non-blocking Socket Errors

When using non-blocking sockets, you need to handle EWOULDBLOCK and EAGAIN (which are typically the same value) differently from other errors:

ssize_t bytes_sent = send(sockfd, buffer, length, 0);
if (bytes_sent < 0) {
    if (errno == EAGAIN || errno == EWOULDBLOCK) {
        // Not an error, just try again later
        printf("Would block, try again later\n");
        return 0;
    } else if (errno == EINTR) {
        // Interrupted by signal, retry immediately
        printf("Interrupted, retrying\n");
        return send_data(sockfd, buffer, length);
    } else {
        // Real error
        perror("send failed");
        return -1;
    }
}

Connection Errors

When establishing a connection, you might want to retry with an exponential backoff:

int connect_with_retry(int sockfd, const struct sockaddr *addr, socklen_t addrlen,
                       int max_retries, int initial_delay_ms) {
    int retries = 0;
    int delay_ms = initial_delay_ms;
    
    while (retries < max_retries) {
        if (connect(sockfd, addr, addrlen) == 0) {
            return 0;  // Success
        }
        
        if (errno == ECONNREFUSED || errno == ETIMEDOUT || errno == ENETUNREACH) {
            printf("Connection attempt %d failed: %s. Retrying in %d ms...\n",
                   retries + 1, strerror(errno), delay_ms);
            
            // Sleep with millisecond precision
            struct timespec ts;
            ts.tv_sec = delay_ms / 1000;
            ts.tv_nsec = (delay_ms % 1000) * 1000000;
            nanosleep(&ts, NULL);
            
            // Exponential backoff
            delay_ms *= 2;
            retries++;
        } else {
            // Other errors are not retryable
            return -1;
        }
    }
    
    // Max retries exceeded
    errno = ETIMEDOUT;
    return -1;
}

Handling Partial Writes

TCP sockets may perform partial writes, especially with non-blocking sockets. You need to track how much data has been sent and retry for the remaining data:

ssize_t send_all(int sockfd, const void *buf, size_t len, int flags) {
    const char *ptr = (const char *)buf;
    size_t remaining = len;
    ssize_t sent;
    
    while (remaining > 0) {
        sent = send(sockfd, ptr, remaining, flags);
        
        if (sent < 0) {
            if (errno == EINTR) {
                // Interrupted by signal, retry immediately
                continue;
            } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
                // Socket would block, return partial success
                return len - remaining;
            } else {
                // Real error
                return -1;
            }
        } else if (sent == 0) {
            // Unexpected, but treat as an error
            errno = EPIPE;
            return -1;
        }
        
        ptr += sent;
        remaining -= sent;
    }
    
    return len;
}

Handling Connection Resets

When a connection is reset by the peer, you might want to reconnect:

ssize_t send_with_reconnect(int *sockfd, const struct sockaddr *addr, socklen_t addrlen,
                           const void *buf, size_t len) {
    ssize_t result = send(*sockfd, buf, len, 0);
    
    if (result < 0 && (errno == ECONNRESET || errno == EPIPE)) {
        printf("Connection reset, attempting to reconnect...\n");
        
        // Close the old socket
        close(*sockfd);
        
        // Create a new socket
        *sockfd = socket(addr->sa_family, SOCK_STREAM, 0);
        if (*sockfd < 0) {
            perror("socket creation failed");
            return -1;
        }
        
        // Reconnect
        if (connect(*sockfd, addr, addrlen) < 0) {
            perror("reconnect failed");
            return -1;
        }
        
        // Retry the send
        result = send(*sockfd, buf, len, 0);
    }
    
    return result;
}

Error Logging and Reporting

Proper error logging is crucial for diagnosing issues in production environments. Here's a simple error logging function:

#include <stdarg.h>
#include <time.h>
 
void log_error(const char *format, ...) {
    va_list args;
    time_t now;
    char time_str[64];
    FILE *log_file;
    
    // Get current time
    time(&now);
    strftime(time_str, sizeof(time_str), "%Y-%m-%d %H:%M:%S", localtime(&now));
    
    // Open log file
    log_file = fopen("socket_errors.log", "a");
    if (log_file == NULL) {
        perror("Failed to open log file");
        return;
    }
    
    // Write timestamp
    fprintf(log_file, "[%s] ", time_str);
    
    // Write error message
    va_start(args, format);
    vfprintf(log_file, format, args);
    va_end(args);
    
    // Add errno information if available
    if (errno != 0) {
        fprintf(log_file, ": %s (errno=%d)", strerror(errno), errno);
    }
    
    fprintf(log_file, "\n");
    fclose(log_file);
}
 
// Usage example
if (connect(sockfd, (struct sockaddr *)&server_addr, sizeof(server_addr)) < 0) {
    log_error("Failed to connect to %s:%d", inet_ntoa(server_addr.sin_addr),
              ntohs(server_addr.sin_port));
    // Handle error...
}

Error Recovery Strategies

Reconnection with Exponential Backoff

When a connection fails, it's often a good idea to retry with an exponential backoff to avoid overwhelming the server:

int connect_with_exponential_backoff(int sockfd, const struct sockaddr *addr,
                                    socklen_t addrlen, int max_retries) {
    int retries = 0;
    int delay_ms = 100;  // Start with 100ms delay
    
    while (retries < max_retries) {
        if (connect(sockfd, addr, addrlen) == 0) {
            return 0;  // Success
        }
        
        if (errno == ECONNREFUSED || errno == ETIMEDOUT) {
            printf("Connection attempt %d failed: %s. Retrying in %d ms...\n",
                   retries + 1, strerror(errno), delay_ms);
            
            // Sleep with millisecond precision
            struct timespec ts;
            ts.tv_sec = delay_ms / 1000;
            ts.tv_nsec = (delay_ms % 1000) * 1000000;
            nanosleep(&ts, NULL);
            
            // Exponential backoff with jitter
            delay_ms = delay_ms * 2 + (rand() % 100);
            retries++;
        } else {
            // Other errors are not retryable
            return -1;
        }
    }
    
    // Max retries exceeded
    errno = ETIMEDOUT;
    return -1;
}

Circuit Breaker Pattern

The circuit breaker pattern prevents repeated attempts to an unavailable service, allowing it time to recover:

struct circuit_breaker {
    int state;              // 0: closed, 1: open, 2: half-open
    time_t last_failure;    // Time of last failure
    int failure_count;      // Number of consecutive failures
    int failure_threshold;  // Threshold to open the circuit
    int reset_timeout;      // Timeout in seconds to try half-open
};
 
enum { CLOSED, OPEN, HALF_OPEN };
 
void init_circuit_breaker(struct circuit_breaker *cb, int threshold, int timeout) {
    cb->state = CLOSED;
    cb->last_failure = 0;
    cb->failure_count = 0;
    cb->failure_threshold = threshold;
    cb->reset_timeout = timeout;
}
 
int circuit_breaker_allow_request(struct circuit_breaker *cb) {
    time_t now = time(NULL);
    
    switch (cb->state) {
        case CLOSED:
            return 1;  // Allow request
            
        case OPEN:
            // Check if timeout has elapsed
            if (now - cb->last_failure >= cb->reset_timeout) {
                cb->state = HALF_OPEN;
                return 1;  // Allow one test request
            }
            return 0;  // Reject request
            
        case HALF_OPEN:
            return 0;  // Allow only one request in half-open state
            
        default:
            return 0;
    }
}
 
void circuit_breaker_on_success(struct circuit_breaker *cb) {
    if (cb->state == HALF_OPEN) {
        cb->state = CLOSED;
        cb->failure_count = 0;
    } else if (cb->state == CLOSED) {
        cb->failure_count = 0;
    }
}
 
void circuit_breaker_on_failure(struct circuit_breaker *cb) {
    time_t now = time(NULL);
    cb->last_failure = now;
    
    if (cb->state == HALF_OPEN) {
        cb->state = OPEN;
    } else if (cb->state == CLOSED) {
        cb->failure_count++;
        if (cb->failure_count >= cb->failure_threshold) {
            cb->state = OPEN;
        }
    }
}
 
// Usage example
int send_with_circuit_breaker(struct circuit_breaker *cb, int sockfd,
                             const void *buf, size_t len, int flags) {
    if (!circuit_breaker_allow_request(cb)) {
        errno = ECONNREFUSED;  // Set an appropriate errno
        return -1;
    }
    
    ssize_t result = send(sockfd, buf, len, flags);
    
    if (result < 0) {
        circuit_breaker_on_failure(cb);
    } else {
        circuit_breaker_on_success(cb);
    }
    
    return result;
}

Graceful Degradation

When certain operations fail, you can fall back to alternative methods or reduced functionality:

ssize_t send_with_fallback(int primary_sockfd, int backup_sockfd,
                          const void *buf, size_t len, int flags) {
    // Try primary connection
    ssize_t result = send(primary_sockfd, buf, len, flags);
    
    if (result < 0) {
        printf("Primary connection failed: %s. Trying backup...\n", strerror(errno));
        
        // Fall back to backup connection
        result = send(backup_sockfd, buf, len, flags);
        
        if (result < 0) {
            printf("Backup connection also failed: %s\n", strerror(errno));
        }
    }
    
    return result;
}

Defensive Programming Techniques

Input Validation

Always validate input parameters before passing them to socket functions:

int create_and_bind(const char *ip, int port) {
    // Validate parameters
    if (ip == NULL || port <= 0 || port > 65535) {
        errno = EINVAL;
        return -1;
    }
    
    // Create socket
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0) {
        return -1;
    }
    
    // Initialize address structure
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    
    // Convert IP address
    if (inet_pton(AF_INET, ip, &addr.sin_addr) <= 0) {
        close(sockfd);
        errno = EINVAL;
        return -1;
    }
    
    // Bind socket
    if (bind(sockfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        int bind_errno = errno;
        close(sockfd);
        errno = bind_errno;
        return -1;
    }
    
    return sockfd;
}

Resource Cleanup

Always clean up resources, even in error cases:

int connect_to_server(const char *ip, int port) {
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0) {
        return -1;
    }
    
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    
    if (inet_pton(AF_INET, ip, &addr.sin_addr) <= 0) {
        close(sockfd);
        errno = EINVAL;
        return -1;
    }
    
    if (connect(sockfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        int connect_errno = errno;
        close(sockfd);
        errno = connect_errno;
        return -1;
    }
    
    return sockfd;
}

Timeouts

Set timeouts to prevent operations from blocking indefinitely:

int set_socket_timeout(int sockfd, int recv_timeout_sec, int send_timeout_sec) {
    struct timeval recv_tv, send_tv;
    
    if (recv_timeout_sec > 0) {
        recv_tv.tv_sec = recv_timeout_sec;
        recv_tv.tv_usec = 0;
        if (setsockopt(sockfd, SOL_SOCKET, SO_RCVTIMEO, &recv_tv, sizeof(recv_tv)) < 0) {
            return -1;
        }
    }
    
    if (send_timeout_sec > 0) {
        send_tv.tv_sec = send_timeout_sec;
        send_tv.tv_usec = 0;
        if (setsockopt(sockfd, SOL_SOCKET, SO_SNDTIMEO, &send_tv, sizeof(send_tv)) < 0) {
            return -1;
        }
    }
    
    return 0;
}

Best Practices for Socket Error Handling

Check Return Values: Always check the return values of socket functions
Use perror() or strerror(): Provide meaningful error messages
Implement Proper Logging: Log errors with context information
Clean Up Resources: Close sockets and free memory, even in error cases
Handle Partial Operations: Account for partial sends and receives
Set Timeouts: Prevent operations from blocking indefinitely
Implement Retry Logic: Retry operations that might succeed on a subsequent attempt
Use Circuit Breakers: Prevent repeated attempts to unavailable services
Validate Input: Check parameters before passing them to socket functions
Provide Fallbacks: Implement alternative methods when primary ones fail

Conclusion

Proper error handling is essential for building robust and reliable socket applications. By understanding common error codes, implementing appropriate recovery strategies, and following best practices, you can create applications that gracefully handle network issues, resource limitations, and other potential problems.

In the next section, we'll explore network byte order and data serialization, which are crucial for ensuring that data is correctly interpreted when transmitted between different systems.

Test Your Knowledge

Take a quiz to reinforce what you've learned

Exam Preparation

Access short and long answer questions for written exams

Share this page

WhatsApp Twitter/X

Test Your Knowledge

Exam Preparation

On this page