Thread Underlying Principles & Source Code Analysis
In the previous article, we covered the basic usage and lifecycle of threads. However, in industrial-grade concurrent programming, merely staying at the API level is far from enough. Java's thread mechanism does not exist in isolation; it is a JVM-level encapsulation of the operating system's thread mechanism.
This article dives deep from the hardware and operating system perspective straight down to the JVM source code. We will thoroughly understand what exactly happens when you "create a thread", why thread context switching is so expensive, and how the Virtual Threads introduced in JDK 21 fundamentally break this bottleneck.
Threads from an OS Perspective
At the operating system level, a thread is the smallest unit of CPU scheduling.
The Essential Difference Between Processes and Threads
In the Linux operating system, there is actually no strict concept of a "thread." The Linux kernel only recognizes Tasks, corresponding to the task_struct kernel data structure.
- Process: Has independent memory address space (page tables), file descriptor tables, and other resources.
- Thread: Essentially, a Lightweight Process (LWP) that shares the same address space and resources with other threads.
We can use an analogy: a process is an independent office building with its own plumbing and electrical systems (memory and resources). A thread is a single cubicle within the building. Cubicles share the building's water dispenser and restrooms, but each cubicle has its own computer and documents (its own program counter (PC), stack space, and register state).
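The analogy can be made concrete with a minimal sketch (the class and method names are illustrative, not from any library). The counter object lives on the heap, the "shared building", so every thread updates the same instance, while each worker's local variable lives on that thread's own stack, its "cubicle".

```java
import java.util.concurrent.atomic.AtomicInteger;

final class SharedVsPrivateDemo {
    // Heap object: one instance, visible to every thread in the process.
    static final AtomicInteger shared = new AtomicInteger();

    // Starts `threads` workers; each increments its private counter and the shared one.
    static int runWorkers(int threads, int incrementsPerThread) {
        shared.set(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                int local = 0; // lives on this worker's own stack, invisible to other threads
                for (int j = 0; j < incrementsPerThread; j++) {
                    local++;
                    shared.incrementAndGet();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return shared.get();
    }

    public static void main(String[] args) {
        System.out.println(runWorkers(4, 1000)); // prints 4000
    }
}
```

Each worker's local counter only ever reaches 1000, but the shared counter sees the increments of all four threads.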
Why is Context Switching Expensive?
When the CPU switches from Thread A to Thread B, a Context Switch must occur. This is an extremely resource-intensive process, primarily consisting of:
- Saving State: Saving Thread A's CPU register state, Program Counter (PC), etc., into memory (usually its kernel stack).
- Restoring State: Loading Thread B's context from memory into the CPU registers.
- Cache Misses: More costly in practice, Thread B usually works on different data, so the L1/L2 cache lines warmed up by Thread A are of little use and the hit rate can drop sharply until the caches refill. If the switch crosses a process boundary, the address space changes as well, and the TLB (Translation Lookaside Buffer) entries of the old process can no longer be used (they are flushed unless the CPU tags entries with an address-space ID such as PCID). Subsequent memory accesses then miss the TLB and must walk the page tables, which is far slower than a TLB hit.
This is why we often say "don't create too many threads." Too many threads not only consume memory (on 64-bit Linux, each Java thread reserves 1MB of stack address space by default, tunable with -Xss) but can also cause the CPU to spend a large share of its time on context switching rather than executing real business logic.
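To get a feel for the cost, the following rough sketch bounces a token between two platform threads and reports the average time per handoff. It is an estimate under assumptions, not a precise benchmark: the figure includes SynchronousQueue overhead, the queue may spin briefly before parking, and results vary with hardware, OS, and JIT warm-up. The class name and method name are illustrative.

```java
import java.util.concurrent.SynchronousQueue;

final class HandoffCost {
    // Returns the average nanoseconds per thread-to-thread handoff over `rounds` round trips.
    static long averageHandoffNanos(int rounds) {
        SynchronousQueue<Integer> ping = new SynchronousQueue<>();
        SynchronousQueue<Integer> pong = new SynchronousQueue<>();
        Thread echo = new Thread(() -> {
            try {
                for (int i = 0; i < rounds; i++) {
                    pong.put(ping.take()); // wait for the token, hand it straight back
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        echo.start();
        long start = System.nanoTime();
        try {
            for (int i = 0; i < rounds; i++) {
                ping.put(i);  // hand the token to the echo thread
                pong.take();  // block until it comes back
            }
            echo.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // Each round trip contains two handoffs.
        return (System.nanoTime() - start) / (2L * rounds);
    }

    public static void main(String[] args) {
        averageHandoffNanos(20_000); // warm-up run so the JIT has compiled the hot paths
        System.out.println("avg handoff: " + averageHandoffNanos(100_000) + " ns");
    }
}
```

Comparing the printed figure with the cost of a plain method call gives an intuition for why a workload that constantly hands work between threads spends so little time on business logic.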
The Java Thread Mapping Model
How does java.lang.Thread map to operating system threads?
In modern HotSpot JVM implementations, traditional Java threads (now called platform threads) use a 1:1 kernel-level thread model. This means that every such thread, once started, maps strictly to one native kernel thread in the operating system. Java-level thread scheduling, blocking, and waking rely entirely on the OS scheduler (such as CFS in Linux, succeeded by EEVDF from kernel 6.6).
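On Linux, the 1:1 mapping can be observed directly. The sketch below is Linux-specific by assumption: it counts the entries under /proc/self/task, one per kernel task of the current process, before and while some extra Java threads are alive. On other platforms the probe returns -1. The JVM may also start or stop its own internal threads in between, so the difference is not guaranteed to be exact. All names are illustrative.

```java
import java.io.File;
import java.util.concurrent.CountDownLatch;

final class OsThreadCount {
    // Linux-only probe: each entry under /proc/self/task is one kernel task of this JVM.
    // Returns -1 where that directory does not exist (e.g., macOS or Windows).
    static int osThreads() {
        String[] tasks = new File("/proc/self/task").list();
        return tasks == null ? -1 : tasks.length;
    }

    // Returns {count before, count while n extra Java threads are alive}.
    static int[] countBeforeAndDuring(int n) {
        int before = osThreads();
        CountDownLatch release = new CountDownLatch(1);
        Thread[] threads = new Thread[n];
        for (int i = 0; i < n; i++) {
            threads[i] = new Thread(() -> {
                try {
                    release.await(); // keep the thread alive until we have counted
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        int during = osThreads();
        release.countDown();
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return new int[] {before, during};
    }

    public static void main(String[] args) {
        int[] counts = countBeforeAndDuring(8);
        System.out.println("before=" + counts[0] + ", during=" + counts[1]);
    }
}
```

On a typical Linux JVM the second number is larger than the first by about the number of threads started, which is the 1:1 model made visible.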
Why Choose the 1:1 Model?
Early JVMs (like Green Threads on Solaris) attempted a many-to-one (M:1) user-level thread model. Switching user-space threads is extremely lightweight and requires no kernel involvement. However, the M:1 model has a fatal flaw: once a user thread issues a blocking I/O system call (like reading a file), the entire kernel thread is blocked, which in turn halts all user threads mapped to that kernel thread.
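The stall can be imitated in modern Java as an analogy, not a reconstruction of Green Threads: a single-threaded executor plays the role of the lone kernel thread, and the tasks submitted to it play the user threads. As soon as one task blocks, nothing else mapped to that thread can make progress. The names are illustrative.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class ManyToOneStall {
    // Runs two "user threads" (tasks) on one "kernel thread" and records the event order.
    static List<String> run() {
        List<String> log = new CopyOnWriteArrayList<>();
        ExecutorService loneKernelThread = Executors.newSingleThreadExecutor();
        loneKernelThread.execute(() -> {
            log.add("A: blocking call starts");
            try {
                Thread.sleep(200); // stands in for a blocking read()
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            log.add("A: blocking call returns");
        });
        // Task B is ready immediately, but its only carrier is stuck inside task A.
        loneKernelThread.execute(() -> {
            log.add("B: runs");
        });
        loneKernelThread.shutdown();
        try {
            loneKernelThread.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return log;
    }

    public static void main(String[] args) {
        // prints [A: blocking call starts, A: blocking call returns, B: runs]
        System.out.println(run());
    }
}
```

Task B needs no I/O at all, yet it cannot start until A's blocking call returns.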
To simplify the implementation and fully utilize the parallel capabilities of multi-core CPUs, HotSpot ultimately moved to the 1:1 model. This delegates all the complex scheduling work to the operating system, but it also dictates that Java threads are inherently "heavy."
The True Face of Thread.start() via Source Code
When we call new Thread().start(), what exactly happens under the hood? Let's follow the OpenJDK source code to find out.
1. The Java Layer: start0()
The core logic of the start() method in Thread.java is very short (shown here as in JDK 8-era sources, abridged):
public synchronized void start() {
    if (threadStatus != 0)
        throw new IllegalThreadStateException();
    group.add(this);
    boolean started = false;
    try {
        start0(); // <--- This native method is the core
        started = true;
    } finally {
        if (!started) group.threadStartFailed(this);
    }
}
private native void start0();
The mystery lies within the native method start0().
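Before following start0() downward, the status guard at the top of start() is easy to observe from plain Java. A small sketch (the class name is illustrative): a second call to start() on the same Thread object is rejected, whether or not the thread has already finished.

```java
final class StartTwice {
    // Returns true if the second start() call throws IllegalThreadStateException.
    static boolean secondStartFails() {
        Thread t = new Thread(() -> { });
        t.start(); // first call: status leaves NEW and a native thread is created
        try {
            t.start(); // second call: the status is no longer NEW, so the guard fires
            return false;
        } catch (IllegalThreadStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(secondStartFails()); // prints true
    }
}
```

A Thread object is therefore a one-shot handle: one Java object corresponds to at most one native thread over its lifetime.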
2. The JNI Mapping Layer: JVM_StartThread
start0 is registered as a native method in the JDK's Thread.c, which maps it to the JVM_StartThread function implemented in HotSpot's jvm.cpp:
// hotspot/src/share/vm/prims/jvm.cpp (abridged)
JVM_ENTRY(void, JVM_StartThread(JNIEnv* env, jobject jthread))
    // 1. Will hold the C++-level thread object for the Java thread `jthread`
    JavaThread *native_thread = NULL;
    // 2. Core: Create the JavaThread object at the C++ level
    //    (sz is the requested stack size, read from the java.lang.Thread object in elided code)
    native_thread = new JavaThread(&thread_entry, sz);
    // 3. Start the OS thread
    Thread::start(native_thread);
JVM_END
3. Creating the OS Thread: os::create_thread
Continuing down into new JavaThread, it calls the OS-specific implementation os::create_thread (taking Linux as an example, located in os_linux.cpp):
// hotspot/src/os/linux/vm/os_linux.cpp (abridged)
bool os::create_thread(Thread* thread, ThreadType thr_type, size_t stack_size) {
    // Prepare native OS thread attributes (e.g., the stack size)
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setstacksize(&attr, stack_size);
    pthread_t tid;
    // Core call: glibc's pthread_create, which asks the kernel (via clone) for a new thread
    int ret = pthread_create(&tid, &attr, (void* (*)(void*)) thread_native_entry, thread);
    pthread_attr_destroy(&attr);
    // Creation can fail, e.g., when thread or memory limits are hit
    if (ret != 0) return false;
    return true;
}
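See that? On Linux, the foundation of a Java thread is the hardcore pthread_create. Strictly speaking, pthread_create is a glibc library function rather than a system call: it allocates the thread's user-space stack and then issues the clone system call, which asks the kernel to create a completely new execution flow (a task_struct with its own kernel stack) and add it to the OS scheduling queue.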
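The stack_size that ends up in pthread_attr_setstacksize can be influenced from Java through the four-argument Thread constructor. The sketch below assumes a JVM that honors the value (the Javadoc describes stackSize only as a hint that some platforms may ignore) and uses recursion depth as a rough proxy for the stack actually granted. Class and method names are illustrative.

```java
final class StackSizeProbe {
    // Recurses until the thread's stack is exhausted, counting the frames.
    private static void recurse(int[] depth) {
        depth[0]++;
        recurse(depth);
    }

    // Runs the probe on a fresh thread created with the requested stack size in bytes.
    static int maxDepth(long stackBytes) {
        int[] depth = new int[1];
        Thread probe = new Thread(null, () -> {
            try {
                recurse(depth);
            } catch (StackOverflowError expected) {
                // End of this thread's stack reached; depth[0] holds the frame count.
            }
        }, "stack-probe", stackBytes);
        probe.start();
        try {
            probe.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return depth[0];
    }

    public static void main(String[] args) {
        System.out.println("512 KB stack: depth " + maxDepth(512L << 10));
        System.out.println("16 MB stack:  depth " + maxDepth(16L << 20));
    }
}
```

The absolute depths depend on frame size and JIT state, but on HotSpot the thread with the larger stack should recurse far deeper, which shows that each platform thread really does carry its own OS-level stack reservation.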
4. Thread Startup: thread_native_entry
When the kernel thread is created and scheduled, it executes the callback function thread_native_entry:
static void *thread_native_entry(Thread *thread) {
    // ... various initialization operations
    // Call back into the run method of JavaThread
    thread->run();
    return NULL;
}
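At this point, JavaThread::run() invokes the thread_entry function that was passed to the JavaThread constructor in step 2, and thread_entry calls back into Java to execute the run() method of the java.lang.Thread object. The Java code finally begins execution within the brand new OS kernel thread. This completes the full loop of Java thread startup.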
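The whole chain exists only behind start(). A short sketch (names are illustrative) makes the difference visible: calling run() directly is an ordinary method call on the caller's thread, while start() goes through start0() and runs the same code on a newly created thread.

```java
final class RunVsStart {
    // Returns {thread name seen by a direct run() call, thread name seen after start()}.
    static String[] threadNames() {
        String[] names = new String[2];

        Thread direct = new Thread(
                () -> names[0] = Thread.currentThread().getName(), "worker-direct");
        direct.run(); // plain method call: no native thread is created

        Thread started = new Thread(
                () -> names[1] = Thread.currentThread().getName(), "worker-started");
        started.start(); // start0() -> JVM_StartThread -> pthread_create
        try {
            started.join(); // join also makes the write to names[1] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return names;
    }

    public static void main(String[] args) {
        String[] names = threadNames();
        System.out.println("run():   " + names[0]); // the caller's thread name
        System.out.println("start(): " + names[1]); // prints start(): worker-started
    }
}
```

The thread named worker-direct never executes anything, because no OS thread was ever created for it.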
The Salvation of Virtual Threads
Understanding the underlying 1:1 model gives us a profound appreciation for the pain points of traditional Java concurrency: threads are just too expensive. In a microservices architecture, when faced with massive concurrent network requests (allocating one thread per request), systems often crash due to thread exhaustion or OutOfMemory (OOM) errors before the CPU is even fully utilized.
To solve this problem, JDK 21 officially introduced Virtual Threads (JEP 444), after two preview rounds in JDK 19 and JDK 20.
Principle Analysis: The Return of the M:N Model
Virtual Threads abandon the 1:1 model and re-embrace M:N scheduling: M virtual threads are multiplexed onto N platform threads, which themselves remain 1:1 kernel threads. However, this isn't the JVM reverting to the Solaris era; instead, it implements an exceptionally elegant user-space scheduler at the JVM level.
- Carrier Thread: The underlying platform thread (OS thread) acting as the "physical worker." By default, carriers are the workers of a dedicated ForkJoinPool whose parallelism equals the number of available CPU cores.
- Virtual Thread: A lightweight object managed entirely by the JVM, acting as a "virtual task" that wraps the business logic. The quantity can be in the millions.
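Both kinds of thread are created through the same java.lang.Thread API. A minimal sketch, assuming JDK 21 or later (the class name is illustrative): Thread.ofPlatform() builds a classic 1:1 thread, Thread.ofVirtual() builds a JVM-managed one, and isVirtual() tells them apart.

```java
final class VirtualVsPlatform {
    // Returns {isVirtual() seen inside a platform thread, isVirtual() seen inside a virtual thread}.
    static boolean[] isVirtualFlags() {
        boolean[] flags = new boolean[2];
        Thread platform = Thread.ofPlatform()
                .unstarted(() -> flags[0] = Thread.currentThread().isVirtual());
        Thread virtual = Thread.ofVirtual()
                .unstarted(() -> flags[1] = Thread.currentThread().isVirtual());
        platform.start();
        virtual.start();
        try {
            platform.join();
            virtual.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return flags;
    }

    public static void main(String[] args) {
        boolean[] flags = isVirtualFlags();
        System.out.println("platform thread virtual? " + flags[0]); // false
        System.out.println("virtual thread virtual?  " + flags[1]); // true
    }
}
```

Existing code that works with Thread objects keeps working, which is a large part of why adoption requires so few changes.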
The core principle of Virtual Threads can be summarized as "Continuation" (Suspend and Resume):
- When a virtual thread performs pure memory computation, it is Mounted onto a carrier thread to run.
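- When a virtual thread initiates blocking I/O (like waiting for a database response over a socket), the JVM does not need to intercept a system call. The JDK's own I/O libraries detect that they are running on a virtual thread and, instead of blocking in the kernel, register interest in the socket and park the virtual thread, which Unmounts it from the carrier thread.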
- At this moment, the carrier thread is not blocked; it immediately proceeds to execute the next virtual thread in the queue.
- Once the network I/O is ready, underlying network events (like epoll) notify the JVM, and the JVM re-mounts the suspended virtual thread onto an available carrier thread to continue execution.
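The mount/unmount cycle above can be exercised with a small sketch, assuming JDK 21 or later. Thread.sleep stands in for blocking I/O here, since it parks a virtual thread in the same way, and the class and method names are illustrative. Ten thousand blocking tasks run on a handful of carrier threads because every sleeping task is unmounted while it waits.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

final class VirtualThreadFanOut {
    // Runs `tasks` blocking tasks, one virtual thread each, and returns how many completed.
    static int runBlockingTasks(int tasks) {
        AtomicInteger completed = new AtomicInteger();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    // Parks the virtual thread; its carrier is free to run other tasks.
                    Thread.sleep(Duration.ofMillis(100));
                    completed.incrementAndGet();
                    return null;
                });
            }
        } // close() waits for all submitted tasks to finish
        return completed.get();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int done = runBlockingTasks(10_000);
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println(done + " tasks finished in " + millis + " ms");
    }
}
```

If each task held its carrier for the full 100 ms, the run would take minutes on a machine with a few cores. In practice it should finish in a small fraction of that, though the exact time depends on the machine.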
Disruptive Changes Brought By Virtual Threads
Virtual threads do not make a single computation faster; their significance lies in dramatically increasing system throughput. Developers can continue using the synchronous, blocking programming model (which is the easiest to write and understand) while enjoying the performance dividends similar to Node.js's asynchronous, non-blocking Event Loop.
In the underlying implementation, a virtual thread does not own a fixed-size stack allocated by the OS. While mounted, its frames run on the carrier thread's stack; when suspension occurs, the JVM copies those stack frames into stack chunk objects in the Java heap for safekeeping and copies them back on resumption. This cost is very low (typically well under a microsecond) and stays entirely in user space, which puts it in a different league from OS thread switches that must trap into the kernel and usually cost microseconds.
Armed with these underlying principles, when we subsequently explore Thread Pool design, AQS source code, and concurrent utilities, we can truly achieve the state of "knowing not just the what, but the why."