TCP Reliability and Congestion Control
Pillars of Reliable Transmission
TCP transforms the "best-effort" delivery of the IP layer into a reliable, ordered byte stream through four key mechanisms:
- Sequence and Acknowledgment: Tracks every byte for order and confirmation.
- Retransmission: Automatically re-sends data if no ACK arrives.
- Flow Control: Prevents the sender from overwhelming the receiver.
- Congestion Control: Prevents the sender from overwhelming the network.
Sequence Numbers and Cumulative ACK
Every byte in a TCP stream is numbered. When a receiver sends ACK=N, it tells the sender: "I have received every byte up to and including N-1. I am now waiting for byte N." This is known as Cumulative Acknowledgment: a single ACK confirms all data before it, so a lost ACK is covered by any later one.
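The rule above can be sketched in a few lines. This is an illustrative model, not a real TCP stack: the function name `cumulative_ack` and the `(seq, length)` tuple representation are assumptions made for the example, and sequence-number wraparound is ignored.

```python
def cumulative_ack(next_expected, received):
    """Return the ACK number a receiver would send.

    next_expected: first byte number not yet received in order.
    received: iterable of (seq, length) segments buffered so far, in any order.
    Hypothetical helper for illustration; ignores sequence-number wraparound.
    """
    # Walk the buffered segments in sequence order and advance over every
    # segment that is contiguous with the data already received in order.
    for seq, length in sorted(received):
        if seq > next_expected:
            break  # gap found: later segments cannot move the ACK forward
        next_expected = max(next_expected, seq + length)
    return next_expected
```

With segments covering bytes 1000-1499 and 2000-2499 buffered, the receiver keeps answering ACK=1500 until the missing middle segment arrives, at which point the ACK jumps straight to 2500.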
Adaptive Retransmission (RTO)
TCP keeps a retransmission timer running while data is unacknowledged. If no ACK arrives before the timer expires, the oldest unacknowledged segment is retransmitted. The RTO (Retransmission Timeout) is not static; it is recalculated continuously from samples of the RTT (Round Trip Time) and its variance, so the timeout tracks the path's actual delay and jitter.
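One common way to compute the RTO is the smoothed estimator standardized in RFC 6298. The sketch below follows that scheme under simplifying assumptions: the class name `RtoEstimator` is made up for the example, clock granularity is ignored, and the lower bound `min_rto` is a parameter because real stacks choose different values.

```python
class RtoEstimator:
    """Simplified RTO estimator in the style of RFC 6298 (times in seconds)."""

    ALPHA = 1 / 8      # weight of a new sample in the smoothed RTT
    BETA = 1 / 4       # weight of a new sample in the RTT variation
    K = 4              # how many "variations" of safety margin to add
    INITIAL_RTO = 1.0  # used before any RTT sample exists

    def __init__(self, min_rto=1.0):
        self.srtt = None    # smoothed RTT
        self.rttvar = None  # smoothed RTT variation
        self.min_rto = min_rto

    def sample(self, rtt):
        """Feed one RTT measurement and return the updated RTO."""
        if self.srtt is None:
            # First measurement initializes both estimators.
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # The variation is updated first, using the previous SRTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        return self.rto

    @property
    def rto(self):
        if self.srtt is None:
            return self.INITIAL_RTO
        return max(self.min_rto, self.srtt + self.K * self.rttvar)
```

On a steady 100 ms path the variance term shrinks with every sample, so the RTO converges toward the RTT itself (or the configured minimum); a single delayed sample inflates the variance and pushes the timeout back up.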
The Sliding Window
The sliding window allows a sender to have multiple segments "in flight" (sent but not yet acknowledged) instead of waiting for each ACK before sending the next one, which keeps the link busy and maximizes bandwidth utilization.
Sender's Window:
┌────────────┐┌──────────────────┐┌────────────┐┌────────────┐
│ Confirmed  ││  Sent, Unacked   ││  Can Send  ││ Cannot Send│
└────────────┘└──────────────────┘└────────────┘└────────────┘
              └───────── Window Size ──────────┘
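The four regions in the diagram can be tracked with just two pointers. The sketch below is a toy model, assuming a fixed window size and byte-granular sends; the class name `SlidingWindowSender` is invented for the example, while `snd_una` and `snd_nxt` mirror the SND.UNA and SND.NXT variables of the TCP specification.

```python
class SlidingWindowSender:
    """Toy sender that tracks the regions of the sliding window."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.snd_una = 0  # oldest unacknowledged byte (end of "Confirmed")
        self.snd_nxt = 0  # next byte to send (end of "Sent, Unacked")

    def in_flight(self):
        """Bytes sent but not yet acknowledged."""
        return self.snd_nxt - self.snd_una

    def can_send(self):
        """Bytes that may still be sent before the window is full."""
        return self.window_size - self.in_flight()

    def send(self, nbytes):
        """Send up to nbytes, limited by the window; return bytes actually sent."""
        n = min(nbytes, self.can_send())
        self.snd_nxt += n
        return n

    def on_ack(self, ack):
        """A cumulative ACK slides the left edge of the window forward."""
        if self.snd_una < ack <= self.snd_nxt:
            self.snd_una = ack
```

Each ACK moves `snd_una` to the right, which frees the same number of bytes at the right edge: this is the "sliding" that gives the mechanism its name.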
Flow Control: The rwnd
The Receiver Window (rwnd) is the amount of free space left in the receiver's buffer. The receiver advertises this value to the sender in the Window field of the TCP header. If rwnd drops to 0, the sender must stop sending new data to avoid overflowing the buffer; it then sends periodic window probes so that it notices when space becomes available again.
Congestion Control: The cwnd
While flow control protects the end-node, congestion control protects the network. The Congestion Window (cwnd) limits the amount of data in flight based on the sender's estimate of network congestion, which classic TCP infers mainly from packet loss.
Actual Transmission Window = min(rwnd, cwnd)
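As a small worked example of this formula, the helper below computes how many more bytes a sender may put on the wire. The function name `usable_window` and its parameters are illustrative assumptions, with all values in bytes.

```python
def usable_window(rwnd, cwnd, in_flight):
    """Bytes the sender may still transmit right now.

    rwnd: window advertised by the receiver (flow control).
    cwnd: congestion window maintained by the sender (congestion control).
    in_flight: bytes already sent but not yet acknowledged.
    """
    effective = min(rwnd, cwnd)  # the stricter of the two limits wins
    return max(0, effective - in_flight)
```

For instance, with rwnd = 65535, cwnd = 14600 and 10000 bytes in flight, the network is the bottleneck and only 4600 more bytes may be sent; with rwnd = 0 nothing may be sent regardless of cwnd.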
The Four Phases of Congestion Control
- Slow Start: Start with cwnd = 1 MSS and double it every RTT (Exponential Growth). Despite the name, this grows very quickly: 1, 2, 4, 8, ...
- Congestion Avoidance: Once a threshold (ssthresh) is hit, switch to Linear Growth (+1 MSS per RTT) to probe the bandwidth limit safely.
- Fast Retransmit: If the sender receives 3 duplicate ACKs for the same segment, it assumes that segment is lost (and others after it were received) and retransmits immediately without waiting for a timeout.
- Fast Recovery: After a Fast Retransmit, ssthresh is set to half the current window, but the window isn't reset to 1; it resumes growing linearly from the new ssthresh.
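The four phases can be traced with a small state machine. The sketch below is a simplified Reno-style model, not a faithful implementation: it counts in whole segments (MSS units), updates once per RTT rather than per ACK, caps slow start at ssthresh, and omits the temporary window inflation that real Fast Recovery performs. The class name `RenoCwnd` and its method names are hypothetical.

```python
class RenoCwnd:
    """Simplified Reno-style congestion window, measured in MSS units."""

    MIN_SSTHRESH = 2.0  # lower bound on ssthresh after a loss

    def __init__(self, ssthresh=64.0):
        self.cwnd = 1.0
        self.ssthresh = ssthresh

    def on_rtt_without_loss(self):
        """Apply one round trip in which every segment was acknowledged."""
        if self.cwnd < self.ssthresh:
            # Slow Start: exponential growth (capped at ssthresh here).
            self.cwnd = min(self.cwnd * 2, self.ssthresh)
        else:
            # Congestion Avoidance: linear growth.
            self.cwnd += 1

    def on_triple_dup_ack(self):
        """Fast Retransmit / Fast Recovery: halve the window, skip slow start."""
        self.ssthresh = max(self.cwnd / 2, self.MIN_SSTHRESH)
        self.cwnd = self.ssthresh

    def on_timeout(self):
        """Timeout: remember half the window, then restart from slow start."""
        self.ssthresh = max(self.cwnd / 2, self.MIN_SSTHRESH)
        self.cwnd = 1.0
```

Starting with ssthresh = 8, five loss-free round trips take cwnd through 2, 4, 8, 9, 10; a triple duplicate ACK then drops it to 5, whereas a timeout would send it back to 1.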
Deep Technical Insights
Fast Retransmit vs. Timeout
A Timeout is a sign of "Heavy Congestion" (nothing is getting through), so TCP reacts drastically: it sets ssthresh to half the current window and resets cwnd back to 1. Three Duplicate ACKs are a sign of "Light Congestion" or "Reordering" (the network is still delivering subsequent packets), so TCP only halves the window. This distinction lets TCP back off hard when the path is saturated while recovering quickly from isolated losses.
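To illustrate how the sender tells the two signals apart, the sketch below scans a stream of incoming ACK numbers and reports where Fast Retransmit would fire. It is a simplification: the function name `fast_retransmit_points` is made up for the example, and a real stack also checks that a duplicate ACK carries no data and no window update.

```python
DUP_ACK_THRESHOLD = 3  # duplicates needed before retransmitting early


def fast_retransmit_points(acks):
    """Return the ACK numbers at which Fast Retransmit would be triggered.

    acks: sequence of cumulative ACK numbers in arrival order.
    """
    triggers = []
    last_ack = None
    dup_count = 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            # Fire exactly once, on the third duplicate of the same ACK.
            if dup_count == DUP_ACK_THRESHOLD:
                triggers.append(ack)
        else:
            # A new ACK number means progress; start counting again.
            last_ack = ack
            dup_count = 0
    return triggers
```

The ACK stream 1000, 2000, 2000, 2000, 2000 contains the original ACK for 2000 followed by three duplicates, so the segment starting at byte 2000 would be retransmitted immediately; if no ACKs arrive at all, only the retransmission timer can detect the loss.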
The Silly Window Syndrome
If a receiver is very slow and its buffer only frees up 1 byte at a time, a naive TCP would send 1-byte segments, each carrying at least 40 bytes of TCP/IP header overhead. This is "Silly Window Syndrome." Modern TCP uses Nagle’s Algorithm (on the sender side) to hold back small outgoing chunks while earlier data is still unacknowledged, and Clark’s Solution (on the receiver side) to announce window updates only once a significant amount of space is free.
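Both remedies boil down to a simple decision rule. The sketch below states them as two standalone functions under simplifying assumptions: the names `nagle_should_send` and `clark_advertised_window` are hypothetical, the Clark threshold of min(MSS, half the buffer) follows the usual textbook formulation, and the receiver is modelled as starting from a closed window.

```python
def nagle_should_send(queued_bytes, mss, unacked_bytes):
    """Sender side (Nagle): decide whether to transmit the queued data now."""
    if queued_bytes >= mss:
        return True  # a full-sized segment is always worth sending
    # A small segment goes out only if nothing is waiting for an ACK;
    # otherwise it is held back so more data can accumulate.
    return queued_bytes > 0 and unacked_bytes == 0


def clark_advertised_window(free_space, buffer_size, mss):
    """Receiver side (Clark): window to advertise after the buffer was full."""
    threshold = min(mss, buffer_size // 2)
    # Keep advertising zero until a worthwhile amount of space is free.
    return free_space if free_space >= threshold else 0
```

With an MSS of 1460 and an 8192-byte buffer, a receiver that has freed a single byte still advertises 0, and a sender with 1 byte queued and 500 bytes unacknowledged waits; both sides therefore avoid the tiny segments that cause the syndrome.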
Bufferbloat
Excessively large buffers in routers can cause high latency (long queues) without dropping packets. Because loss-based TCP relies on packet drops as its congestion signal, it keeps increasing its sending rate into a "bloated" buffer, resulting in "lag" despite high bandwidth. Newer algorithms like BBR (Bottleneck Bandwidth and Round-trip propagation time), developed by Google, estimate the bottleneck bandwidth and the minimum RTT instead of just reacting to loss.
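A back-of-the-envelope calculation shows how large the effect can be. The helpers below are illustrative only; the function names and the example figures (a 1 MB buffer, a 10 Mbit/s link, a 50 ms RTT) are assumptions chosen for the arithmetic, not measurements.

```python
def queuing_delay_seconds(buffer_bytes, link_bits_per_second):
    """Worst-case delay added by a full buffer draining at the link rate."""
    return buffer_bytes * 8 / link_bits_per_second


def bdp_bytes(link_bits_per_second, rtt_seconds):
    """Bandwidth-delay product: data needed in flight to fill the path."""
    return link_bits_per_second * rtt_seconds / 8
```

A full 1 MB buffer in front of a 10 Mbit/s link adds 1,000,000 × 8 / 10,000,000 = 0.8 seconds of queuing delay, while the same link with a 50 ms RTT needs only about 62,500 bytes in flight to stay fully utilized. Everything a loss-based sender pushes beyond that amount sits in the queue as added latency.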