
Socket Error Handling

Proper error handling is essential for building robust and reliable socket applications. Network programming is particularly prone to errors due to the unpredictable nature of networks, system resource limitations, and security restrictions. This section covers common socket errors, how to detect and interpret them, and strategies for error recovery.

Error Reporting in Socket Programming

In C socket programming, errors are typically reported in two ways:

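  1. Return Values: Most socket functions return -1 to indicate an error. A few, such as getaddrinfo(), return a nonzero error code instead
  2. errno Variable: The errno variable (thread-local on modern systems) is set to an error code that provides more specific information about what went wrong. Its value is only meaningful immediately after a call has reported failure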

Accessing errno

#include <errno.h>
#include <stdio.h>
#include <string.h>
 
// After a socket function call
if (result < 0) {
    int error_code = errno;  // Copy errno before any other call can overwrite it
    printf("Error: %s (errno=%d)\n", strerror(error_code), error_code);
}
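
Because almost any later library call may overwrite errno, it helps to copy it right after the failing call and restore it before returning. A minimal sketch, assuming a hypothetical helper named checked_close() that is not part of the original text:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

// Hypothetical helper: closes fd and reports a failure without losing errno.
// Returns 0 on success, -1 on failure with errno preserved for the caller.
int checked_close(int fd) {
    if (close(fd) < 0) {
        int saved_errno = errno;  // copy before fprintf() can change it
        fprintf(stderr, "close(%d) failed: %s\n", fd, strerror(saved_errno));
        errno = saved_errno;      // restore so the caller sees the real cause
        return -1;
    }
    return 0;
}
```

Calling checked_close(-1) fails with EBADF, and that value is still in errno after the message has been printed.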

Error Reporting Functions

Several functions are available to convert error codes into human-readable messages:

char *strerror(int errnum);                 // Returns a string describing the error code
void perror(const char *s);                 // Prints a message followed by the error description

Example usage:

int sockfd = socket(AF_INET, SOCK_STREAM, 0);
if (sockfd < 0) {
    perror("socket creation failed");
    exit(EXIT_FAILURE);
}
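
Not every networking call reports errors through errno. getaddrinfo() returns a nonzero EAI_* code on failure, and gai_strerror() converts that code to text. The sketch below assumes a hypothetical helper named check_numeric_address() and uses AI_NUMERICHOST so that no DNS lookup takes place:

```c
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical helper: returns 0 if text is a numeric IPv4 or IPv6 address,
// otherwise the nonzero EAI_* code returned by getaddrinfo().
int check_numeric_address(const char *text) {
    struct addrinfo hints;
    struct addrinfo *res = NULL;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_NUMERICHOST;  // parse only, never query DNS

    int rc = getaddrinfo(text, NULL, &hints, &res);
    if (rc != 0) {
        // The failure is in the return value, so use gai_strerror(), not strerror()
        fprintf(stderr, "getaddrinfo(%s): %s\n", text, gai_strerror(rc));
        return rc;
    }

    freeaddrinfo(res);
    return 0;
}
```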

Common Socket Error Codes

Here are some of the most common error codes you'll encounter in socket programming:

Connection Establishment Errors

| Error Code | Description | Possible Causes | Recovery Strategy |
|---|---|---|---|
| ECONNREFUSED | Connection refused | Server not running, wrong port | Retry with backoff, check server status |
| ETIMEDOUT | Connection timed out | Network issues, firewall | Retry with longer timeout, check network |
| ENETUNREACH | Network unreachable | Routing problem, wrong address | Check network configuration |
| EHOSTUNREACH | Host unreachable | Host down, wrong address | Verify host address, try alternative |
| EADDRNOTAVAIL | Address not available | Invalid local address | Use a valid local address |
| EADDRINUSE | Address already in use | Port already bound | Use SO_REUSEADDR, choose different port |
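
The recovery column suggests two groups: errors that may clear up on their own, and errors that need a configuration change. One way to encode that split is a small predicate. This is only a sketch of one possible policy, and the name connect_error_is_transient() is hypothetical:

```c
#include <errno.h>
#include <stdbool.h>

// Hypothetical policy helper: true if a failed connect() is worth retrying.
bool connect_error_is_transient(int err) {
    switch (err) {
        case ECONNREFUSED:  // server may not be listening yet
        case ETIMEDOUT:     // packets lost or peer slow to answer
        case ENETUNREACH:   // route may come back
        case EHOSTUNREACH:  // host may come back
            return true;
        default:            // e.g. EADDRNOTAVAIL, EADDRINUSE: fix the setup instead
            return false;
    }
}
```

Keeping the policy in one function means the retry loop only needs to ask whether another attempt makes sense.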

I/O Operation Errors

| Error Code | Description | Possible Causes | Recovery Strategy |
|---|---|---|---|
| EWOULDBLOCK / EAGAIN | Operation would block | Non-blocking socket, no data available | Retry later, use select/poll/epoll |
| EINTR | Interrupted system call | Signal received | Retry the operation |
| EPIPE | Broken pipe | Connection closed by peer | Reconnect if needed |
| ECONNRESET | Connection reset by peer | Abrupt connection termination | Reconnect if needed |
| EMSGSIZE | Message too long | Datagram too large for buffer | Use smaller messages or TCP |
| ENOMEM | Out of memory | System resource exhaustion | Free resources, limit connections |

Protocol and Parameter Errors

| Error Code | Description | Possible Causes | Recovery Strategy |
|---|---|---|---|
| EPROTONOSUPPORT | Protocol not supported | Invalid protocol specified | Use a supported protocol |
| EAFNOSUPPORT | Address family not supported | Invalid address family | Use a supported address family |
| EINVAL | Invalid argument | Invalid parameter | Check function parameters |
| EACCES | Permission denied | Insufficient privileges | Run with appropriate permissions |
| ENOTSOCK | Not a socket | File descriptor is not a socket | Verify file descriptor |
| EOPNOTSUPP | Operation not supported | Unsupported socket operation | Use alternative approach |

Handling Specific Error Scenarios

Non-blocking Socket Errors

When using non-blocking sockets, you need to handle EWOULDBLOCK and EAGAIN (which are typically the same value) differently from other errors:

// Sends once on a non-blocking socket.
// Returns the number of bytes sent, 0 if the socket would block, or -1 on error.
ssize_t send_data(int sockfd, const void *buffer, size_t length) {
    ssize_t bytes_sent;

    // Interrupted by a signal: retry immediately (a loop avoids unbounded recursion)
    do {
        bytes_sent = send(sockfd, buffer, length, 0);
    } while (bytes_sent < 0 && errno == EINTR);

    if (bytes_sent < 0) {
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            // Not an error, just try again later
            printf("Would block, try again later\n");
            return 0;
        }
        // Real error
        perror("send failed");
        return -1;
    }

    return bytes_sent;
}

Connection Errors

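When establishing a connection, you might want to retry with an exponential backoff. Note that POSIX leaves the state of a socket unspecified after a failed connect(), so portable code should close the socket and create a new one before each retry. The example below reuses the descriptor for brevity: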

int connect_with_retry(int sockfd, const struct sockaddr *addr, socklen_t addrlen,
                       int max_retries, int initial_delay_ms) {
    int retries = 0;
    int delay_ms = initial_delay_ms;
    
    while (retries < max_retries) {
        if (connect(sockfd, addr, addrlen) == 0) {
            return 0;  // Success
        }
        
        if (errno == ECONNREFUSED || errno == ETIMEDOUT || errno == ENETUNREACH) {
            printf("Connection attempt %d failed: %s. Retrying in %d ms...\n",
                   retries + 1, strerror(errno), delay_ms);
            
            // Sleep with millisecond precision
            struct timespec ts;
            ts.tv_sec = delay_ms / 1000;
            ts.tv_nsec = (delay_ms % 1000) * 1000000;
            nanosleep(&ts, NULL);
            
            // Exponential backoff
            delay_ms *= 2;
            retries++;
        } else {
            // Other errors are not retryable
            return -1;
        }
    }
    
    // Max retries exceeded
    errno = ETIMEDOUT;
    return -1;
}

Handling Partial Writes

TCP sockets may perform partial writes, especially with non-blocking sockets. You need to track how much data has been sent and retry for the remaining data:

ssize_t send_all(int sockfd, const void *buf, size_t len, int flags) {
    const char *ptr = (const char *)buf;
    size_t remaining = len;
    ssize_t sent;
    
    while (remaining > 0) {
        sent = send(sockfd, ptr, remaining, flags);
        
        if (sent < 0) {
            if (errno == EINTR) {
                // Interrupted by signal, retry immediately
                continue;
            } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
                // Socket would block, return partial success
                return len - remaining;
            } else {
                // Real error
                return -1;
            }
        } else if (sent == 0) {
            // Unexpected, but treat as an error
            errno = EPIPE;
            return -1;
        }
        
        ptr += sent;
        remaining -= sent;
    }
    
    return len;
}
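
Receives can be partial as well: recv() may return fewer bytes than requested, and a return value of 0 means the peer closed the connection. A receiving counterpart to send_all() might look like the sketch below. The name recv_all() and the choice to return a short count on early close are assumptions:

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical counterpart to send_all(): tries to read exactly len bytes.
// Returns len on success, a short count if the peer closed early or a
// non-blocking socket ran out of data, or -1 on error.
ssize_t recv_all(int sockfd, void *buf, size_t len) {
    char *ptr = (char *)buf;
    size_t received = 0;

    while (received < len) {
        ssize_t n = recv(sockfd, ptr + received, len - received, 0);
        if (n > 0) {
            received += (size_t)n;
        } else if (n == 0) {
            break;      // orderly shutdown by the peer
        } else if (errno == EINTR) {
            continue;   // interrupted by a signal, retry
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            break;      // non-blocking socket has no more data right now
        } else {
            return -1;  // real error
        }
    }

    return (ssize_t)received;
}
```

The caller compares the result with len to tell a complete message from a truncated one.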

Handling Connection Resets

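When a connection is reset by the peer, you might want to reconnect. By default, writing to a connection that the peer has closed raises SIGPIPE, which terminates the process before send() can return EPIPE, so the example below assumes SIGPIPE is ignored or blocked: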

ssize_t send_with_reconnect(int *sockfd, const struct sockaddr *addr, socklen_t addrlen,
                           const void *buf, size_t len) {
    ssize_t result = send(*sockfd, buf, len, 0);
    
    if (result < 0 && (errno == ECONNRESET || errno == EPIPE)) {
        printf("Connection reset, attempting to reconnect...\n");
        
        // Close the old socket
        close(*sockfd);
        
        // Create a new socket
        *sockfd = socket(addr->sa_family, SOCK_STREAM, 0);
        if (*sockfd < 0) {
            perror("socket creation failed");
            return -1;
        }
        
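        // Reconnect
        if (connect(*sockfd, addr, addrlen) < 0) {
            int connect_errno = errno;
            perror("reconnect failed");
            close(*sockfd);  // Release the unusable socket instead of leaking it
            *sockfd = -1;
            errno = connect_errno;
            return -1;
        }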
        
        // Retry the send
        result = send(*sockfd, buf, len, 0);
    }
    
    return result;
}

Error Logging and Reporting

Proper error logging is crucial for diagnosing issues in production environments. Here's a simple error logging function:

#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
 
void log_error(const char *format, ...) {
    int saved_errno = errno;  // Capture first: the calls below may overwrite errno
    va_list args;
    time_t now;
    char time_str[64];
    FILE *log_file;
    
    // Get current time
    time(&now);
    strftime(time_str, sizeof(time_str), "%Y-%m-%d %H:%M:%S", localtime(&now));
    
    // Open log file
    log_file = fopen("socket_errors.log", "a");
    if (log_file == NULL) {
        perror("Failed to open log file");
        errno = saved_errno;
        return;
    }
    
    // Write timestamp
    fprintf(log_file, "[%s] ", time_str);
    
    // Write error message
    va_start(args, format);
    vfprintf(log_file, format, args);
    va_end(args);
    
    // Add errno information if available
    if (saved_errno != 0) {
        fprintf(log_file, ": %s (errno=%d)", strerror(saved_errno), saved_errno);
    }
    
    fprintf(log_file, "\n");
    fclose(log_file);
    errno = saved_errno;  // Leave errno as the caller saw it
}
 
// Usage example
if (connect(sockfd, (struct sockaddr *)&server_addr, sizeof(server_addr)) < 0) {
    log_error("Failed to connect to %s:%d", inet_ntoa(server_addr.sin_addr),
              ntohs(server_addr.sin_port));
    // Handle error...
}

Error Recovery Strategies

Reconnection with Exponential Backoff

When a connection fails, it's often a good idea to retry with an exponential backoff to avoid overwhelming the server:

int connect_with_exponential_backoff(int sockfd, const struct sockaddr *addr,
                                    socklen_t addrlen, int max_retries) {
    int retries = 0;
    int delay_ms = 100;  // Start with 100ms delay
    
    while (retries < max_retries) {
        if (connect(sockfd, addr, addrlen) == 0) {
            return 0;  // Success
        }
        
        if (errno == ECONNREFUSED || errno == ETIMEDOUT) {
            printf("Connection attempt %d failed: %s. Retrying in %d ms...\n",
                   retries + 1, strerror(errno), delay_ms);
            
            // Sleep with millisecond precision
            struct timespec ts;
            ts.tv_sec = delay_ms / 1000;
            ts.tv_nsec = (delay_ms % 1000) * 1000000;
            nanosleep(&ts, NULL);
            
            // Exponential backoff with jitter
            delay_ms = delay_ms * 2 + (rand() % 100);
            retries++;
        } else {
            // Other errors are not retryable
            return -1;
        }
    }
    
    // Max retries exceeded
    errno = ETIMEDOUT;
    return -1;
}
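
Doubling without an upper bound can produce very long waits after a few failures, so retry loops often cap the delay. The sketch below separates the delay calculation from the sleeping so that it can be tested on its own. The helper names and the 25% jitter range are assumptions chosen for illustration:

```c
#include <stdlib.h>

// Hypothetical helper: delay before retry number attempt (0-based),
// doubling from base_ms and never exceeding cap_ms.
int backoff_delay_ms(int attempt, int base_ms, int cap_ms) {
    int delay = base_ms;
    for (int i = 0; i < attempt && delay < cap_ms; i++) {
        // Jump straight to the cap when doubling would pass it (avoids overflow)
        delay = (delay > cap_ms / 2) ? cap_ms : delay * 2;
    }
    return delay < cap_ms ? delay : cap_ms;
}

// Adds up to 25% random jitter on top of the capped delay so that many
// clients do not retry at the same instant.
int backoff_delay_with_jitter_ms(int attempt, int base_ms, int cap_ms) {
    int delay = backoff_delay_ms(attempt, base_ms, cap_ms);
    int spread = delay / 4;
    return delay + (spread > 0 ? rand() % (spread + 1) : 0);
}
```

With a base of 100 ms and a cap of 5000 ms, the delays run 100, 200, 400, 800, 1600, 3200 and then stay at 5000. Seeding rand() once at startup, for example with srand(), keeps separate processes from producing the same jitter sequence.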

Circuit Breaker Pattern

The circuit breaker pattern prevents repeated attempts to an unavailable service, allowing it time to recover:

struct circuit_breaker {
    int state;              // 0: closed, 1: open, 2: half-open
    time_t last_failure;    // Time of last failure
    int failure_count;      // Number of consecutive failures
    int failure_threshold;  // Threshold to open the circuit
    int reset_timeout;      // Timeout in seconds to try half-open
};
 
enum { CLOSED, OPEN, HALF_OPEN };
 
void init_circuit_breaker(struct circuit_breaker *cb, int threshold, int timeout) {
    cb->state = CLOSED;
    cb->last_failure = 0;
    cb->failure_count = 0;
    cb->failure_threshold = threshold;
    cb->reset_timeout = timeout;
}
 
int circuit_breaker_allow_request(struct circuit_breaker *cb) {
    time_t now = time(NULL);
    
    switch (cb->state) {
        case CLOSED:
            return 1;  // Allow request
            
        case OPEN:
            // Check if timeout has elapsed
            if (now - cb->last_failure >= cb->reset_timeout) {
                cb->state = HALF_OPEN;
                return 1;  // Allow one test request
            }
            return 0;  // Reject request
            
        case HALF_OPEN:
            return 0;  // Allow only one request in half-open state
            
        default:
            return 0;
    }
}
 
void circuit_breaker_on_success(struct circuit_breaker *cb) {
    if (cb->state == HALF_OPEN) {
        cb->state = CLOSED;
        cb->failure_count = 0;
    } else if (cb->state == CLOSED) {
        cb->failure_count = 0;
    }
}
 
void circuit_breaker_on_failure(struct circuit_breaker *cb) {
    time_t now = time(NULL);
    cb->last_failure = now;
    
    if (cb->state == HALF_OPEN) {
        cb->state = OPEN;
    } else if (cb->state == CLOSED) {
        cb->failure_count++;
        if (cb->failure_count >= cb->failure_threshold) {
            cb->state = OPEN;
        }
    }
}
 
// Usage example (returns ssize_t so the byte count from send() is not truncated)
ssize_t send_with_circuit_breaker(struct circuit_breaker *cb, int sockfd,
                                 const void *buf, size_t len, int flags) {
    if (!circuit_breaker_allow_request(cb)) {
        errno = ECONNREFUSED;  // Set an appropriate errno
        return -1;
    }
    
    ssize_t result = send(sockfd, buf, len, flags);
    
    if (result < 0) {
        circuit_breaker_on_failure(cb);
    } else {
        circuit_breaker_on_success(cb);
    }
    
    return result;
}

Graceful Degradation

When certain operations fail, you can fall back to alternative methods or reduced functionality:

ssize_t send_with_fallback(int primary_sockfd, int backup_sockfd,
                          const void *buf, size_t len, int flags) {
    // Try primary connection
    ssize_t result = send(primary_sockfd, buf, len, flags);
    
    if (result < 0) {
        printf("Primary connection failed: %s. Trying backup...\n", strerror(errno));
        
        // Fall back to backup connection
        result = send(backup_sockfd, buf, len, flags);
        
        if (result < 0) {
            printf("Backup connection also failed: %s\n", strerror(errno));
        }
    }
    
    return result;
}

Defensive Programming Techniques

Input Validation

Always validate input parameters before passing them to socket functions:

int create_and_bind(const char *ip, int port) {
    // Validate parameters
    if (ip == NULL || port <= 0 || port > 65535) {
        errno = EINVAL;
        return -1;
    }
    
    // Create socket
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0) {
        return -1;
    }
    
    // Initialize address structure
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    
    // Convert IP address
    if (inet_pton(AF_INET, ip, &addr.sin_addr) <= 0) {
        close(sockfd);
        errno = EINVAL;
        return -1;
    }
    
    // Bind socket
    if (bind(sockfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        int bind_errno = errno;
        close(sockfd);
        errno = bind_errno;
        return -1;
    }
    
    return sockfd;
}

Resource Cleanup

Always clean up resources, even in error cases:

int connect_to_server(const char *ip, int port) {
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0) {
        return -1;
    }
    
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    
    if (inet_pton(AF_INET, ip, &addr.sin_addr) <= 0) {
        close(sockfd);
        errno = EINVAL;
        return -1;
    }
    
    if (connect(sockfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        int connect_errno = errno;
        close(sockfd);
        errno = connect_errno;
        return -1;
    }
    
    return sockfd;
}

Timeouts

Set timeouts to prevent operations from blocking indefinitely:

int set_socket_timeout(int sockfd, int recv_timeout_sec, int send_timeout_sec) {
    struct timeval recv_tv, send_tv;
    
    if (recv_timeout_sec > 0) {
        recv_tv.tv_sec = recv_timeout_sec;
        recv_tv.tv_usec = 0;
        if (setsockopt(sockfd, SOL_SOCKET, SO_RCVTIMEO, &recv_tv, sizeof(recv_tv)) < 0) {
            return -1;
        }
    }
    
    if (send_timeout_sec > 0) {
        send_tv.tv_sec = send_timeout_sec;
        send_tv.tv_usec = 0;
        if (setsockopt(sockfd, SOL_SOCKET, SO_SNDTIMEO, &send_tv, sizeof(send_tv)) < 0) {
            return -1;
        }
    }
    
    return 0;
}

Best Practices for Socket Error Handling

  1. Check Return Values: Always check the return values of socket functions
  2. Use perror() or strerror(): Provide meaningful error messages
  3. Implement Proper Logging: Log errors with context information
  4. Clean Up Resources: Close sockets and free memory, even in error cases
  5. Handle Partial Operations: Account for partial sends and receives
  6. Set Timeouts: Prevent operations from blocking indefinitely
  7. Implement Retry Logic: Retry operations that might succeed on a subsequent attempt
  8. Use Circuit Breakers: Prevent repeated attempts to unavailable services
  9. Validate Input: Check parameters before passing them to socket functions
  10. Provide Fallbacks: Implement alternative methods when primary ones fail

Conclusion

Proper error handling is essential for building robust and reliable socket applications. By understanding common error codes, implementing appropriate recovery strategies, and following best practices, you can create applications that gracefully handle network issues, resource limitations, and other potential problems.

In the next section, we'll explore network byte order and data serialization, which are crucial for ensuring that data is correctly interpreted when transmitted between different systems.
