Data Races, Atomics, and Memory Order: Why C/C++ Concurrency Can't Rely on Luck
The core boundary of C/C++ concurrency is the data race. A data race occurs when two threads access the same object concurrently, at least one of the accesses is a write, and nothing synchronizes them. A data race in C/C++ is undefined behavior. This doesn't just mean "occasionally reading an old value"; it means the compiler and the CPU are no longer obligated to maintain the execution order you imagined.
Concurrency is Not a Thread API Problem
A thread is merely an execution vehicle. The real problem is how multiple execution streams observe shared objects. If a shared object is not protected by synchronization, the source code order cannot represent the actual execution order.
int ready = 0;
int data = 0;
void producer() {
data = 42;
ready = 1;
}
void consumer() {
while (ready == 0) {}
use(data);
}
This code looks clear.
However, ready and data are ordinary objects.
There is no synchronization between the two threads.
The compiler can keep ready cached in a register.
The CPU can reorder when the writes become visible.
The result is a data race and undefined behavior.
Happens-Before is the Causal Chain of Concurrency
The C/C++ memory model uses happens-before to describe visibility and ordering relationships. Only if a write happens-before a read can the read reliably observe that write. Mechanisms like locks, atomic release/acquire, and thread joins can establish this relationship.
Thread A:
write data
release store ready
Thread B:
acquire load ready
read data
release/acquire establishes synchronization
data write becomes visible to Thread B
Without happens-before, you cannot reason about shared state based on chronological timing.
Mutex is the Most Direct Synchronization Boundary
A mutex protects a critical section. Writes that occur between locking and unlocking are visible to any thread that subsequently acquires the same lock.
std::mutex mutex;
int counter = 0;
void inc() {
std::lock_guard<std::mutex> lock(mutex);
++counter;
}
A lock doesn't just prevent simultaneous writes.
It also establishes memory synchronization.
lock_guard uses RAII to ensure the lock is released on exception paths.
This relates directly to resource release rules.
Atomic Objects Eliminate Data Races
Accessing a std::atomic<T> is, by definition, atomic.
Multiple threads reading and writing the same atomic object will not produce a data race.
std::atomic<int> ready{0};
int data = 0;
void producer() {
data = 42;
ready.store(1, std::memory_order_release);
}
void consumer() {
while (ready.load(std::memory_order_acquire) == 0) {}
use(data);
}
release guarantees that prior writes will not be reordered past the publishing point.
acquire guarantees that after observing the publication, subsequent reads will observe all writes made before the publication.
This makes the ordinary data write visible.
memory_order_relaxed Only Guarantees Atomicity
Relaxed atomics do not establish cross-variable synchronization. They are suitable for counters, statistics, and scenarios where data is not being published.
std::atomic<uint64_t> requests{0};
void record() {
requests.fetch_add(1, std::memory_order_relaxed);
}
This safely counts requests. But you cannot use a relaxed flag to publish another ordinary object. Otherwise, the reader might see the flag but is not guaranteed to see the data.
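To make the distinction concrete, here is a small sketch (the function name and counts are invented for illustration): relaxed increments from several threads still sum exactly, because each fetch_add is atomic, and thread::join supplies the happens-before edge that makes the final total visible to the reader.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

std::atomic<std::uint64_t> requests{0};

// Relaxed is enough here: we only need atomicity of each increment,
// not ordering against other variables.
std::uint64_t count_in_parallel(int threads, int per_thread) {
    std::vector<std::thread> pool;
    for (int i = 0; i < threads; ++i)
        pool.emplace_back([per_thread] {
            for (int j = 0; j < per_thread; ++j)
                requests.fetch_add(1, std::memory_order_relaxed);
        });
    for (auto& t : pool) t.join(); // join makes all increments visible
    return requests.load(std::memory_order_relaxed);
}
```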
seq_cst is Simple but Not Free
The default atomic memory order is sequentially consistent (seq_cst).
It provides the strongest intuition: all threads observe a single, globally consistent total order of atomic operations.
This is easy to reason about, but it may restrict compiler optimizations and hardware execution.
std::atomic<bool> flag{false};
flag.store(true); // Defaults to seq_cst
flag.load();      // Defaults to seq_cst
When first learning concurrency, using seq_cst is safe.
Before lowering the memory order on performance-sensitive paths, you must have tests, profiling, and auditing.
Memory order optimization is not something to be tweaked by "gut feeling".
volatile is Not a Thread Synchronization Tool
In C/C++, volatile is primarily used for special memory accesses, such as memory-mapped I/O or signal handling scenarios.
It does not establish inter-thread synchronization.
It cannot replace atomic or mutex.
volatile int ready = 0; // Unsuitable as a thread synchronization flag
volatile affects how the compiler treats each individual access.
It provides neither atomicity nor inter-thread ordering.
And it does not eliminate data races.
Double-Checked Locking Requires Coordination Between Atomics and Lifetime
A common mistake in lazy-loaded singletons is checking the pointer without establishing a publication order.
std::atomic<Service*> instance{nullptr};
std::mutex mutex;
Service* get() {
Service* p = instance.load(std::memory_order_acquire);
if (p == nullptr) {
std::lock_guard<std::mutex> lock(mutex);
p = instance.load(std::memory_order_relaxed);
if (p == nullptr) {
p = new Service();
instance.store(p, std::memory_order_release);
}
}
return p;
}
Before publishing the pointer, you must ensure the object is fully constructed.
After reading the pointer, you must observe the construction writes via acquire.
However, this code still needs to address destruction order and leak policies.
Often, a local static object (Meyers Singleton) is much simpler.
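For comparison, a minimal Meyers Singleton sketch (the Service type here is a stand-in): since C++11, the runtime guarantees that initialization of a function-local static runs exactly once, with concurrent first callers blocking until construction completes.

```cpp
struct Service {
    int answer() const { return 42; }
};

// Initialization of a local static is thread-safe since C++11:
// no explicit atomic, mutex, or publication order is needed.
Service& get_service() {
    static Service instance;
    return instance;
}
```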
Condition Variables Require a Predicate
Condition variables are subject to spurious wakeups.
The wait call must be placed inside a predicate loop.
std::mutex mutex;
std::condition_variable cv;
bool ready = false;
void wait_ready() {
std::unique_lock<std::mutex> lock(mutex);
cv.wait(lock, [] { return ready; });
}
The predicate is the state. The notification is merely a hint. Treating the notification as the state itself will lead to lost signals.
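The notifying side of the same pattern follows directly (a sketch reusing the variables above): change the predicate state under the lock, then notify. Notifying without updating ready is exactly the lost-signal mistake, because a waiter checks the predicate, not the notification.

```cpp
#include <condition_variable>
#include <mutex>

std::mutex m;
std::condition_variable cv;
bool ready = false;

// Waiter: the predicate loop absorbs spurious wakeups.
void wait_ready() {
    std::unique_lock<std::mutex> lock(m);
    cv.wait(lock, [] { return ready; });
}

// Notifier: update the state under the lock, then signal.
void set_ready() {
    {
        std::lock_guard<std::mutex> lock(m);
        ready = true;   // the state change is what the waiter checks
    }
    cv.notify_one();    // the notification is merely a hint
}
```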
Lifetime is Half of Concurrency Safety
Synchronization only resolves access order. Whether the object is still alive is a separate issue.
Service* ptr = new Service();
std::thread t([ptr] {
ptr->run();
});
delete ptr; // may run while the thread still uses ptr
t.join();
This code might delete the object while the thread is still using it.
The correct order should be: request a stop, then join, then release resources.
Thread lifetimes must be tied to object lifetimes.
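A corrected sketch of that order (the Worker type and its stop flag are invented for illustration): request the stop, join the thread, and only then destroy the object the thread was using.

```cpp
#include <atomic>
#include <memory>
#include <thread>

struct Worker {
    std::atomic<bool> stop{false};
    std::atomic<long> work_done{0};
    void run() {
        while (!stop.load(std::memory_order_acquire))
            work_done.fetch_add(1, std::memory_order_relaxed); // placeholder work
    }
};

bool run_and_shutdown() {
    auto worker = std::make_unique<Worker>();
    std::thread t([&] { worker->run(); });
    worker->stop.store(true, std::memory_order_release); // 1. request a stop
    t.join();                                            // 2. join the thread
    worker.reset();                                      // 3. release the object
    return true;
}
```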
False Sharing is a Cache-Level Performance Issue
When multiple threads update different variables that happen to fall onto the same cache line, they interfere with each other. This is not a data race, but it causes severe slowdowns.
cache line
├── counter_a written by Thread A
└── counter_b written by Thread B
The two variables are independent, yet they share a cache line. The CPU cache coherency protocol will repeatedly bounce ownership back and forth. For high-frequency counters, consider padding or per-thread aggregation.
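A padding sketch for the layout above. The constant 64 is an assumption (a common cache line size on x86-64); where the toolchain provides it, std::hardware_destructive_interference_size from <new> reports the real value.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// 64 bytes is a common cache line size; adjust per target, or use
// std::hardware_destructive_interference_size where available.
constexpr std::size_t kCacheLine = 64;

// alignas pushes each counter onto its own cache line, so Thread A's
// writes no longer invalidate the line Thread B is writing.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};

PaddedCounter counter_a; // written by Thread A
PaddedCounter counter_b; // written by Thread B
```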
Lock-Free Does Not Automatically Mean Faster
Lock-free structures reduce blocking but introduce complexities around memory order, the ABA problem, memory reclamation, and starvation. For many lock-free queues, the hardest part is not the CAS operation, but knowing when it is safe to free a node.
Common risks:
- The ABA problem.
- Delayed memory reclamation.
- Busy-waiting consuming CPU cycles.
- Ordering errors under weak memory models.
- Lack of timeouts and fallback strategies.
Without mature requirements and thorough verification, do not rewrite locks just to sound "advanced."
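For contrast, here is about the smallest useful lock-free pattern, a CAS retry loop maintaining a running maximum (record_max is a name invented for this sketch). It sidesteps the reclamation problem entirely because nothing is ever freed; that is precisely the part that becomes hard for lock-free queues and stacks.

```cpp
#include <atomic>
#include <cstdint>

std::atomic<std::uint64_t> max_seen{0};

// Retry until we install v as the new maximum, or observe that some
// other thread already stored a value at least as large.
void record_max(std::uint64_t v) {
    std::uint64_t cur = max_seen.load(std::memory_order_relaxed);
    while (cur < v &&
           !max_seen.compare_exchange_weak(cur, v, std::memory_order_relaxed)) {
        // on failure, compare_exchange_weak reloads cur for us
    }
}
```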
TSan is the Observability Tool for Data Races
ThreadSanitizer (TSan) can discover many data races.
c++ -std=c++23 -g -O1 \
-fsanitize=thread \
concurrent_test.cpp
TSan adds runtime overhead. It is suitable for testing and CI. It cannot cover unexecuted paths. For low-level synchronization primitives and custom atomic algorithms, code auditing is still required.
Concurrency Design Must Support Graceful Shutdown
Threads in production systems cannot only know how to start. They must also be cancellable, timeout-aware, joinable, and able to degrade gracefully.
start workers
-> process queue
-> request stop
-> wake blocked workers
-> drain or discard tasks
-> join threads
-> release resources
Concurrent code without a shutdown protocol will eventually expose resource release issues during deployments, rollbacks, or process exits.
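A compact sketch of that protocol (the class and member names are invented here): the destructor flips a stopping flag under the lock, wakes every worker, lets them drain the queue, and joins before any member is destroyed.

```cpp
#include <atomic>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

std::atomic<int> processed{0}; // observable after the pool is gone

class Pool {
    std::mutex m;
    std::condition_variable cv;
    std::deque<int> tasks;
    bool stopping = false;
    std::vector<std::thread> workers;

    void worker() {
        for (;;) {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [&] { return stopping || !tasks.empty(); });
            if (tasks.empty()) return;           // stop requested, queue drained
            int task = tasks.front();
            tasks.pop_front();
            lock.unlock();
            (void)task;                          // process the task here
            processed.fetch_add(1, std::memory_order_relaxed);
        }
    }

public:
    explicit Pool(int n) {
        for (int i = 0; i < n; ++i)
            workers.emplace_back(&Pool::worker, this);
    }
    void submit(int task) {
        { std::lock_guard<std::mutex> lock(m); tasks.push_back(task); }
        cv.notify_one();                         // wake one blocked worker
    }
    ~Pool() {                                    // the shutdown protocol
        { std::lock_guard<std::mutex> lock(m); stopping = true; }
        cv.notify_all();                         // wake all blocked workers
        for (auto& w : workers) w.join();        // join before members destruct
    }
};
```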
Engineering Checklist
- Shared mutable state must be protected by a mutex or atomic.
- Do not use volatile as a thread synchronization tool.
- Use release/acquire when using an atomic flag to publish ordinary data.
- Profiling and auditing are mandatory before lowering memory order.
- Condition variable waits must use predicates.
- Before a thread exits, stop it, wake it up, join it, and then release objects.
- Check high-frequency counters for false sharing.
- Lock-free algorithms must have a designed memory reclamation strategy.
- Incorporate TSan into the testing matrix.
- Concurrent modules must have timeouts, fallbacks, and logging.
Summary
The reliability of C/C++ concurrency is built upon the data race boundary. Unsynchronized ordinary reads and writes are not "occasional inconsistencies"—they are undefined behavior. Locks provide clear synchronization. Atomics provide fine-grained synchronization. Memory orders provide visibility guarantees. Only by combining these mechanisms with lifetime management, graceful shutdown protocols, and observability tools can you build truly operable concurrent engineering systems.