JVM Tuning and Troubleshooting: Production Guide
JVM tuning is a data-driven discipline, not a guessing game. This article establishes the toolchain and methodology required to handle production crises—from OOM errors and frequent GCs to CPU spikes—without panic.
1. Common Production Anomalies
| Symptom | Potential Root Cause |
|---|---|
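| CPU 100% | Infinite loops, busy-waiting (spinning) threads, back-to-back Full GCs. Deadlocked threads are blocked, so a true deadlock usually shows up as a hang rather than as high CPU. |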
| OOM: Java heap space | Memory leaks or insufficient heap allocation. |
| OOM: Metaspace | Excess class loading (Dynamic proxies, hot-swapping). |
| OOM: Direct buffer memory | Unreleased NIO Direct Memory. |
| Application Jitter | Excessive STW (Stop-The-World) duration, lock contention. |
| API Timeouts | GC pauses, thread pool saturation, blocking I/O. |
2. The Diagnostic Toolchain
2.1 Essential CLI Tools
- jps: Lists Java processes. (jps -lvm)
- jstat: Monitors GC statistics. (jstat -gcutil <pid> 1000 10)
- jinfo: Views or modifies JVM flags at runtime. (jinfo -flags <pid>)
- jmap: Generates Heap Dumps for memory analysis. (jmap -dump:format=b,file=heap.hprof <pid>)
- jstack: Dumps thread stacks to find deadlocks or CPU-hungry code. (jstack <pid>)
- jcmd: A multi-purpose Swiss Army knife for JVM diagnostics.
3. High-CPU Troubleshooting Workflow
If a Java process is consuming 100% CPU, follow these steps:
- Find the process: top (find the <pid>).
- Find the thread: top -H -p <pid> (identify the specific <tid> consuming CPU).
- Convert the ID: printf '%x\n' <tid> (convert the tid to hexadecimal).
- Pinpoint the code: search for the hexadecimal ID in the jstack output: jstack <pid> | grep -A 30 '<hex_tid>'
This allows you to link a high-CPU thread directly to a specific line of Java code.
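The convert-and-search steps above can also be done on a saved dump. The following is a minimal sketch, not a standard tool: the class and method names are hypothetical, and it assumes that thread blocks are separated by blank lines and that each header carries the native thread id as a token of the form nid=0x<hex> (some newer JDK releases appear to print the id in decimal, so the sketch accepts both).

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: locate one thread's stack block in saved jstack text by its OS thread id. */
final class ThreadDumpSearch {

    /** Decimal OS thread id (as shown by top -H) to lowercase hex, like printf '%x'. */
    static String toHex(long tid) {
        return Long.toHexString(tid);
    }

    /**
     * Returns the lines of the first thread block whose header carries the given
     * native id, or an empty list if none matches.
     * Assumption: blocks end at a blank line, and the header contains the token
     * "nid=0x<hex>" or "nid=<decimal>".
     */
    static List<String> findThreadBlock(String dump, long tid) {
        String hexToken = "nid=0x" + toHex(tid);
        String decToken = "nid=" + tid;
        List<String> block = new ArrayList<>();
        boolean inBlock = false;
        for (String line : dump.split("\\R")) {
            if (!inBlock) {
                // Compare whole tokens so that 0x30 does not match 0x3039.
                for (String token : line.trim().split("\\s+")) {
                    if (token.equals(hexToken) || token.equals(decToken)) {
                        inBlock = true;
                        break;
                    }
                }
            } else if (line.trim().isEmpty()) {
                break; // a blank line ends the matched thread's block
            }
            if (inBlock) {
                block.add(line);
            }
        }
        return block;
    }
}
```

A typical use would be to save the output of jstack <pid> to a file, read it into a string, and pass it together with the tid reported by top -H. Matching whole tokens avoids the false positives that a bare substring grep can produce when one hex id is a prefix of another.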
4. Memory Leak Analysis
- Monitor Growth: jstat -gcutil <pid> 1000. If the O (Old Gen) column grows continuously despite GCs, a leak is likely.
- Capture the Dump: jmap -dump:live,format=b,file=leak.hprof <pid>.
- Analyze: Use MAT (Eclipse Memory Analyzer) or VisualVM. MAT's "Leak Suspects" report can automatically identify the objects and GC Roots responsible for the leak.
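The "Old Gen keeps growing despite GCs" signal can also be sampled from inside the process through the standard java.lang.management API. This is a rough sketch under assumptions: the class name and helper methods are hypothetical, and old-generation pools are found by a name heuristic ("Old" or "Tenured"), since pool names vary by collector.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

/** Sketch: read old-generation occupancy as measured after the most recent GC. */
final class OldGenProbe {

    /** Heap pools whose name suggests the old generation (heuristic, collector-dependent). */
    static List<MemoryPoolMXBean> oldGenPools() {
        List<MemoryPoolMXBean> pools = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            if (pool.getType() == MemoryType.HEAP
                    && (name.contains("Old") || name.contains("Tenured"))) {
                pools.add(pool);
            }
        }
        return pools;
    }

    /** Used bytes as a percentage of capacity; 0 when capacity is not positive. */
    static double percentUsed(long used, long capacity) {
        return capacity > 0 ? 100.0 * used / capacity : 0.0;
    }

    /** True when there are at least two samples and each is higher than the previous one. */
    static boolean growsMonotonically(double[] samples) {
        for (int i = 1; i < samples.length; i++) {
            if (samples[i] <= samples[i - 1]) {
                return false;
            }
        }
        return samples.length > 1;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : oldGenPools()) {
            MemoryUsage afterGc = pool.getCollectionUsage(); // null if the pool does not support it
            if (afterGc == null) {
                continue;
            }
            System.out.printf("%s after last GC: %.1f%%%n", pool.getName(),
                    percentUsed(afterGc.getUsed(), afterGc.getCommitted()));
        }
    }
}
```

Collecting these post-GC percentages on a schedule and feeding them to growsMonotonically gives a crude leak alarm. A strictly rising series is only a hint, so the heap dump and MAT analysis remain the confirming step.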
5. Standard OOM Scenarios and Solutions
5.1 Java Heap Space
- Cause: Excess object creation or collection leaks (e.g., static Map caches).
- Solution: Analyze Heap Dump with MAT. Check for unclosed resources or static collections that grow without bounds.
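To make the static-cache leak concrete, here is one possible way to put a ceiling on such a collection, using LinkedHashMap's removeEldestEntry hook. It is an illustrative sketch: BoundedCache and the limit passed to it are hypothetical, and a production cache would more likely come from a dedicated caching library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch: an LRU map capped at maxEntries, as a replacement for an unbounded
 * static Map cache. Not thread-safe; synchronize externally for shared use.
 */
final class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true: iteration order is least recently used first
        this.maxEntries = maxEntries;
    }

    /** Called by LinkedHashMap after each insertion; returning true evicts the eldest entry. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

With a cap of 100, inserting 1,000 distinct keys leaves only the 100 most recently used entries reachable, so the rest become eligible for collection instead of accumulating in the old generation.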
5.2 Metaspace
- Cause: Too many dynamically generated classes (CGLIB, Groovy, Reflection).
- Solution: Set -XX:MaxMetaspaceSize to prevent system memory exhaustion. Inspect third-party libraries for proxy caching behavior.
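Before blaming a particular library, it can help to confirm that the loaded-class count is actually climbing. The sketch below reads the standard ClassLoadingMXBean counters; the class name is hypothetical, and looking up a memory pool literally named "Metaspace" is an assumption that holds on HotSpot but may not hold on other JVMs.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

/** Sketch: snapshot class-loading counters to spot runaway class generation. */
final class MetaspaceWatch {

    /** Bytes used by the pool named "Metaspace", or -1 if no such pool is found. */
    static long metaspaceUsedBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                MemoryUsage usage = pool.getUsage();
                return usage != null ? usage.getUsed() : -1;
            }
        }
        return -1;
    }

    /** One-line summary suitable for periodic logging. */
    static String snapshot() {
        ClassLoadingMXBean cl = ManagementFactory.getClassLoadingMXBean();
        return String.format("loaded=%d totalLoaded=%d unloaded=%d metaspaceUsedBytes=%d",
                cl.getLoadedClassCount(),      // classes currently loaded
                cl.getTotalLoadedClassCount(), // classes loaded since JVM start
                cl.getUnloadedClassCount(),    // classes unloaded since JVM start
                metaspaceUsedBytes());
    }

    public static void main(String[] args) {
        System.out.println(snapshot());
    }
}
```

If loaded keeps rising between snapshots while unloaded stays flat, generated classes are probably being retained, which points toward proxy or script-class caching rather than an undersized Metaspace.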
5.3 Unable to create new native thread
- Cause: Thread count exceeds OS limits or thread pools are misconfigured.
- Solution: Check jstack for thread states. Review thread pool coreSize and maxSize. Check Linux ulimit -u limits.
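As an illustration of the thread pool review, the sketch below builds a pool whose thread count and queue length are both capped, so a burst of work cannot translate into unlimited native threads. The class name, sizes, and choice of CallerRunsPolicy are assumptions for the example, not recommendations for any particular workload.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch: a thread pool with hard limits on both threads and queued tasks. */
final class BoundedPoolDemo {

    static ThreadPoolExecutor newBoundedPool(int core, int max, int queueCapacity) {
        return new ThreadPoolExecutor(
                core, max,
                60L, TimeUnit.SECONDS,                      // idle threads above core are reclaimed
                new ArrayBlockingQueue<>(queueCapacity),    // bounded queue: no unbounded backlog
                new ThreadPoolExecutor.CallerRunsPolicy()); // overflow runs on the submitting thread
    }

    /** Runs n trivial tasks and returns {completedTasks, largestPoolSize}. */
    static int[] runTasks(int n, int core, int max, int queueCapacity) {
        ThreadPoolExecutor pool = newBoundedPool(core, max, queueCapacity);
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < n; i++) {
            pool.execute(done::incrementAndGet);
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return new int[] {done.get(), pool.getLargestPoolSize()};
    }

    public static void main(String[] args) {
        int[] result = runTasks(200, 2, 4, 8);
        System.out.println("completed=" + result[0] + " largestPoolSize=" + result[1]);
    }
}
```

However many tasks are submitted, getLargestPoolSize() never exceeds the configured maximum. The contrast is a pool with no upper bound on threads, which can create a new native thread per pending task until the OS limit is reached.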
6. Recommended Production Parameters
Memory Limits
-Xms4g -Xmx4g # Keep initial and max heap identical to prevent resizing jitter
-Xss512k # Thread stack size (Decrease to support more threads)
GC Selection
-XX:+UseG1GC # Recommended for JDK 8/11+ (already the default collector since JDK 9)
-XX:+UseZGC # Alternative for ultra-low latency on JDK 17+; enable only one collector, not both
Diagnostics
-XX:+HeapDumpOnOutOfMemoryError # Auto-dump on OOM
-XX:HeapDumpPath=/var/logs/java_heap.hprof # Path for dump
-Xlog:gc*:file=gc.log:time,level,tags # Modern GC logging (JDK 9+)
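It is easy for a flag to get lost between a startup script and the running process, so a quick check of what the JVM actually received can be worthwhile. The sketch below uses the standard RuntimeMXBean; the class and helper names are hypothetical, and the heap comparison is purely textual (it would treat -Xms4g and -Xmx4096m as different).

```java
import java.lang.management.ManagementFactory;
import java.util.List;

/** Sketch: verify at startup that the expected JVM flags were actually passed. */
final class StartupFlagCheck {

    /** Value of the last argument that starts with prefix (e.g. "-Xmx"), or null if absent. */
    static String flagValue(List<String> args, String prefix) {
        String value = null;
        for (String arg : args) {
            if (arg.startsWith(prefix)) {
                value = arg.substring(prefix.length());
            }
        }
        return value;
    }

    /** True when -Xms and -Xmx are both present with the same textual value. */
    static boolean hasFixedHeap(List<String> args) {
        String xms = flagValue(args, "-Xms");
        String xmx = flagValue(args, "-Xmx");
        return xms != null && xms.equalsIgnoreCase(xmx);
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);
        System.out.println("Fixed heap (-Xms == -Xmx): " + hasFixedHeap(jvmArgs));
        System.out.println("Heap dump on OOM: "
                + jvmArgs.contains("-XX:+HeapDumpOnOutOfMemoryError"));
        System.out.println("Max heap seen by the runtime (bytes): "
                + Runtime.getRuntime().maxMemory());
    }
}
```

Logging this once at application startup leaves a record of the effective settings next to the GC log, which helps when comparing behavior before and after a tuning change.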
7. Strategic Methodology
- Code First: Most "JVM problems" are actually application code problems. Optimize your logic and eliminate leaks before touching VM flags.
- Stability First: Keep -Xms = -Xmx to ensure a steady environment.
- Measure, Don't Guess: Never apply a tuning parameter without observing its effect on a GC log analyzer like GCEasy.io.
- Last Resort: JVM tuning is the last step of performance optimization, not the first.