JMM: Synchronized and Volatile Deep Dive
The Java Memory Model (JMM) is the cornerstone of concurrent programming. To master the "Two Titans"—synchronized and volatile—one must look beyond the syntax and into the hardware-level pipelines, CPU buffers, and the intricate contract between the JVM and the hardware.
1. The JMM Abstraction: "The Courier System"
While JVM Runtime Data Areas (Stack, Heap, Metaspace) define where data resides, the JMM defines how threads safely interact with that data.
1.1 Main Memory vs. Working Memory
The JMM abstracts the hardware (Registers, L1/L2/L3 caches) into two logical tiers:
- Main Memory: Shared storage (Heap and Static fields).
- Working Memory: Thread-private storage (CPU registers/cache).
1.2 The 8 Atomic Operations
To ensure data flows correctly between these tiers, the JMM enforces 8 atomic primitives:
- read/load: Fetching from Main to Working memory.
- use/assign: Interacting with the CPU's execution engine.
- store/write: Flushing from Working back to Main memory.
- lock/unlock: The ultimate authority for atomic exclusivity, which also flushes buffers upon release.
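As a sketch, the journey of a plain increment through these primitives can be annotated like this (a conceptual mapping for illustration, not actual JVM output; `lock`/`unlock` only come into play under `synchronized`):

```java
public class AtomicOpsWalkthrough {
    static int counter = 0; // lives in Main Memory (static field)

    public static void main(String[] args) {
        // counter = counter + 1 decomposes (conceptually) into:
        // 1. read/load:   copy counter from Main Memory into Working Memory
        // 2. use:         hand the value to the execution engine (for the +1)
        // 3. assign:      write the result back into Working Memory
        // 4. store/write: flush the new value from Working Memory to Main Memory
        counter = counter + 1;
        System.out.println(counter); // prints 1
    }
}
```

Note that because this is four-plus separate steps, two threads can interleave between them, which is exactly why `count++` is not atomic.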
2. Hardware Reality: The Root of Corruption
Why is concurrency difficult? Because hardware engineers prioritized speed over simplicity.
2.1 The Speed Gap & Store Buffers
CPUs are rockets; RAM is a bicycle. To bridge this, CPUs use Store Buffers (outbox) and Invalidate Queues (drafts). A CPU writes to its Store Buffer and continues executing without waiting for the RAM to acknowledge. This causes Visibility Lag: Core A updated a value, but Core B still sees the old one from its own cache.
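A minimal sketch of how visibility lag bites in practice: without `volatile`, the worker below may never observe the flag flip (the JIT is free to hoist the read out of the loop). The version shown here uses `volatile`, so the writer's store becomes visible to the reader:

```java
public class VisibilityDemo {
    // volatile: the writer's store must become visible to the reader's loads
    static volatile boolean stopRequested = false;

    // Returns true if the worker observed the flag and terminated.
    static boolean runDemo() throws InterruptedException {
        Thread worker = new Thread(() -> {
            while (!stopRequested) { /* spin until the volatile write is seen */ }
        });
        worker.start();
        Thread.sleep(100);    // let the worker enter its loop
        stopRequested = true; // volatile write: visible to the worker
        worker.join(1000);    // with volatile, returns well before the timeout
        return !worker.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runDemo() ? "stopped" : "stuck");
    }
}
```

Remove the `volatile` keyword and, on many JITs, the worker spins forever on a stale cached value.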
2.2 Reordering: as-if-serial
The JVM and CPU will reorder instructions to maximize pipeline throughput, as long as the result in a single-threaded context remains the same. In multi-threaded environments, this reordering is a catastrophe (e.g., partial object exposure).
3. The Defense: Memory Barriers
To fight hardware lag, the JMM utilizes Memory Barriers (Fences):
- LoadLoad / StoreStore: Ensures previous ops complete before subsequent ones.
- StoreLoad: The "Heavy Hammer." It forces a full flush of all store buffers and waits for all invalidation signals to be processed. This is usually implemented via the `lock` prefix in x86 assembly.
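Since Java 9, these barriers are exposed directly as static methods on `VarHandle` (the fence names are the JDK's; the barrier mapping in the comments is the conventional reading, sketched here for illustration):

```java
import java.lang.invoke.VarHandle;

public class FenceDemo {
    static int data = 0;
    static int ready = 0;

    // Publish data, then a ready flag, with explicit fences between the stores.
    static int publish() {
        data = 42;
        VarHandle.releaseFence(); // StoreStore|LoadStore: data cannot sink below this line
        ready = 1;
        VarHandle.fullFence();    // the StoreLoad "Heavy Hammer": drains store buffers
        return data + ready;
    }

    public static void main(String[] args) {
        System.out.println(publish()); // prints 43
    }
}
```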
4. Volatile: The Lightweight Shield
4.1 Bytecode vs. Hardware
At the bytecode level, volatile fields are marked with ACC_VOLATILE. When the JIT compiler sees this, it injects memory barriers around the read/write instructions.
4.2 The Magic Grid (Visibility & Ordering)
volatile provides two guarantees:
- Visibility: A write is flushed to main memory immediately; a read always observes the most recent write (conceptually reading from main memory rather than a stale local copy).
- Ordering: It prevents reordering of instructions across the volatile "fence."
4.3 The DCL Singleton Pitfall
Without volatile, the `instance = new Singleton()` line can be reordered into:
1. Allocate memory
2. Assign the reference
3. Initialize the object

A second thread might see a non-null reference (Step 2) and try to use an object that hasn't been initialized yet (Step 3 hasn't happened), reading default or garbage field values and typically crashing with a NullPointerException downstream. Adding `volatile` forces initialization to complete before the reference is published.
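The standard fix is the double-checked locking idiom with a `volatile` instance field:

```java
public class Singleton {
    // volatile forbids the assign-before-initialize reordering described above
    private static volatile Singleton instance;

    private Singleton() { }

    public static Singleton getInstance() {
        if (instance == null) {                  // first check: no lock on the hot path
            synchronized (Singleton.class) {
                if (instance == null) {          // second check: only one thread constructs
                    instance = new Singleton();  // volatile write: safe publication
                }
            }
        }
        return instance;
    }
}
```

The outer check keeps the fast path lock-free; the inner check ensures only one thread ever runs the constructor.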
5. Synchronized: The Heavy Artillery
While volatile handles visibility and ordering, synchronized additionally guarantees Atomicity: the critical section executes as one serialized, indivisible unit.
5.1 The Mark Word: The Battle for the Crown
Every Java object carries a Mark Word in its header (64 bits on a 64-bit JVM). This tag tracks the lock's state:
| State | Bit Tag | Description |
|---|---|---|
| Biased | 01 | Optimistically assigned to the first thread (Solo runner). |
| Lightweight | 00 | Threads "Spin" (Adaptive Spinning) and update a stack-based Lock Record. |
| Heavyweight | 10 | The "Inflation." Threads enter a kernel-level sleep managed by ObjectMonitor. |
5.2 The Lock Inflation Lifecycle
- Biased Lock: The JVM records the `Thread ID` in the Mark Word. Subsequent entries by the same thread cost nearly zero.
- Lightweight Lock: When a second thread arrives, the biased lock is revoked. Both threads attempt to swap the Mark Word with their own stack's Lock Record via CAS.
- Spinning: A failed CAS doesn't cause sleep immediately. The thread "spins" in a `while(true)` loop, hoping the holder finishes soon.
- Heavyweight Lock: If spinning fails (the holder is doing heavy work), the lock inflates. A C++ `ObjectMonitor` is created, and waiting threads are put into a kernel-mode sleep (context switch cost: ~10,000 ns).
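A minimal sketch of the atomicity payoff: two threads hammer a shared counter, and `synchronized` (whatever state the lock inflates to under contention) serializes each read-modify-write, so no increment is lost:

```java
public class SyncCounter {
    private long count = 0;

    // synchronized makes read -> +1 -> write one indivisible unit
    public synchronized void increment() { count++; }
    public synchronized long get() { return count; }

    public static void main(String[] args) throws InterruptedException {
        SyncCounter c = new SyncCounter();
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) c.increment();
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(c.get()); // prints 200000: no lost updates
    }
}
```

With the `synchronized` keywords removed, the same run typically prints something well under 200000, because interleaved increments overwrite each other.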
5.3 The Brutal Unfairness
Synchronized is Unfair. When a lock is released, a newly arrived thread might "barge in" and steal the lock via CAS before a sleeping thread can even wake up. This is a deliberate design to maximize System Throughput by avoiding unnecessary context switches.
Summary Decision Matrix
| Feature | volatile | synchronized |
|---|---|---|
| Scope | Variables | Blocks and Methods |
| Visibility | Yes | Yes |
| Ordering | Yes | Yes |
| Atomicity | No | Yes |
| Mechanism | Memory Barriers | Monitor / Lock Inflation |
| Overhead | Minimal | Medium to High (at inflation) |
Golden Rule: Use volatile for status flags and read-heavy indicators. Use synchronized for complex state transitions where multiple steps must appear as a single atomic unit.