A backend engineer's journey of learning and growth.
by kan01234
Garbage Collection (GC) isn’t just about freeing memory — it’s a fundamental part of how the JVM manages object lifecycle and reachability. To truly understand memory performance in Java applications, especially when handling large datasets (like our CSV export pipeline), we need to go beyond “heap is full = GC runs”.
Let’s break down how GC really works — from object references to pointer marking, and color-based marking phases used internally by collectors like G1GC and Shenandoah.
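GC begins with a set of known GC roots: references that are always treated as reachable, such as local variables on live thread stacks, static fields of loaded classes, and JNI references.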
From these roots, GC walks the object reference graph, recursively following pointers from root objects to others.
If an object isn’t reachable from any root, it is garbage and can be collected.
Most modern GCs (including G1GC) use a tri-color marking algorithm to track which objects are live.
Because each object is scanned at most once, marking terminates even when objects reference each other in cycles, and deeply nested object graphs are scanned safely.
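The marking walk described above can be sketched as a small simulation. This is a toy model, not JVM code: `Node`, `TriColorMarker`, and the single `refs` list are hypothetical names invented for the illustration, and real collectors mark in parallel using bitmaps rather than per-object fields.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy heap object; hypothetical, not a JVM data structure. */
final class Node {
    enum Color { WHITE, GRAY, BLACK }

    final String name;
    final List<Node> refs = new ArrayList<>();
    Color color = Color.WHITE; // every object starts out unvisited

    Node(String name) { this.name = name; }
}

final class TriColorMarker {
    /** Marks everything reachable from the roots; objects still WHITE afterwards are garbage. */
    static void mark(List<Node> roots) {
        Deque<Node> gray = new ArrayDeque<>(); // explicit worklist, so no deep recursion
        for (Node root : roots) {
            if (root.color == Node.Color.WHITE) {
                root.color = Node.Color.GRAY;
                gray.push(root);
            }
        }
        while (!gray.isEmpty()) {
            Node current = gray.pop();
            for (Node child : current.refs) {
                // Only WHITE objects are queued, so a cycle is never walked twice.
                if (child.color == Node.Color.WHITE) {
                    child.color = Node.Color.GRAY;
                    gray.push(child);
                }
            }
            current.color = Node.Color.BLACK; // all outgoing references scanned
        }
    }
}
```

If `a` and `b` reference each other and `a` is a root, both end up black. A second cycle `c` and `d` that no root points to stays white, which is why mutual references alone never keep objects alive.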
Java has multiple reference levels, and GC behavior depends on them:
Reference Type | Collected When Memory Needed? | Use Case |
---|---|---|
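Strong Reference | ❌ Never while strongly reachable | Default (new Object()) |
Soft Reference | ✅ When memory is low | Caches |
Weak Reference | ✅ At the next GC once no strong or soft reference remains | Maps (e.g., WeakHashMap), metadata |
Phantom Reference | ✅ After finalization, once only phantom reachable | Cleanup hooks |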
If your batch job holds strong references to large lists or maps (e.g., List<Entity>), those will not be collected even under pressure.
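The reference levels in the table map onto classes in `java.lang.ref`. The sketch below is a minimal illustration; `ReferenceLevels` and its method names are made up for this example. Note that `System.gc()` is only a hint, so the last check reports what usually happens on HotSpot rather than a guarantee.

```java
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

final class ReferenceLevels {
    // Soft-referenced cache slot: the JVM may clear it when memory runs low.
    private static SoftReference<byte[]> cache = new SoftReference<>(null);

    /** Returns the cached buffer, rebuilding it if the GC has cleared the soft reference. */
    static byte[] cachedBuffer() {
        byte[] buffer = cache.get();
        if (buffer == null) {
            buffer = new byte[1024];
            cache = new SoftReference<>(buffer);
        }
        return buffer;
    }

    /** A weak referent stays retrievable for as long as a strong reference also exists. */
    static boolean survivesWhileStronglyHeld() {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);
        System.gc(); // only a hint; the JVM may ignore it
        return weak.get() == strong; // 'strong' is still in use here, so the referent is alive
    }

    /** Usually true after a GC, but the timing of clearing is not guaranteed. */
    static boolean clearedOnceUnreferenced() {
        WeakReference<Object> weak = new WeakReference<>(new Object());
        System.gc();
        return weak.get() == null;
    }

    public static void main(String[] args) {
        System.out.println("buffer length: " + cachedBuffer().length);
        System.out.println("weak survives while strongly held: " + survivesWhileStronglyHeld());
        System.out.println("weak cleared once unreferenced: " + clearedOnceUnreferenced());
    }
}
```

Callers of a soft or weak reference always need the null check shown in `cachedBuffer()`, because the referent can disappear between any two calls.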
GCs that run concurrently with your app (like G1GC, Shenandoah, ZGC) must handle pointer updates during GC.
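They use read (load) and write (store) barriers: small pieces of code the JIT inserts around reference loads and stores, so the collector learns about changes the application makes while marking or relocation is in progress.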
These barriers are critical for concurrent and low-pause collectors to work correctly while your app continues to allocate and modify memory.
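To make the barrier idea concrete, here is a toy model of a write barrier in the snapshot-at-the-beginning (SATB) style, which is the style G1 and Shenandoah are generally described as using for marking. Everything here is an assumption made for illustration: `Obj`, `ConcurrentMarkSketch`, the single `field` per object, and the hand-written interleaving. Real barriers are machine code emitted by the JIT, not method calls.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy heap object with one reference field; hypothetical. */
final class Obj {
    enum Color { WHITE, GRAY, BLACK }

    final String name;
    Obj field;
    Color color = Color.WHITE;

    Obj(String name) { this.name = name; }
}

final class ConcurrentMarkSketch {
    private final Deque<Obj> grayQueue = new ArrayDeque<>();
    private final boolean barrierEnabled;
    private boolean marking;

    ConcurrentMarkSketch(boolean barrierEnabled) { this.barrierEnabled = barrierEnabled; }

    void start(Obj root) {
        marking = true;
        shade(root);
    }

    /** One increment of marking work: scan a single gray object. */
    boolean step() {
        Obj current = grayQueue.poll();
        if (current == null) {
            marking = false;
            return false;
        }
        shade(current.field);
        current.color = Obj.Color.BLACK;
        return true;
    }

    /** Every application store goes through here, standing in for the JIT-inserted barrier. */
    void store(Obj holder, Obj newValue) {
        if (marking && barrierEnabled) {
            shade(holder.field); // SATB style: keep the overwritten referent alive this cycle
        }
        holder.field = newValue;
    }

    private void shade(Obj o) {
        if (o != null && o.color == Obj.Color.WHITE) {
            o.color = Obj.Color.GRAY;
            grayQueue.add(o);
        }
    }

    /** Replays one fixed interleaving and reports whether the live object C was marked. */
    static boolean liveObjectSurvives(boolean barrierEnabled) {
        Obj a = new Obj("A");
        Obj b = new Obj("B");
        Obj c = new Obj("C");
        a.field = b;
        b.field = c;
        ConcurrentMarkSketch gc = new ConcurrentMarkSketch(barrierEnabled);
        gc.start(a);        // A is gray
        gc.step();          // A scanned: A black, B gray
        // The application runs between marking steps and moves C from B to A.
        gc.store(a, c);     // A is already black, so the marker will not revisit it
        gc.store(b, null);  // the only path the marker would have followed to C is gone
        while (gc.step()) { /* finish marking */ }
        return c.color != Obj.Color.WHITE;
    }
}
```

With the barrier disabled, `C` ends the cycle white even though `A` still points to it, which is the lost-object bug. With the barrier enabled, `C` is shaded when `B.field` is overwritten. The cost is some floating garbage: `B` is marked live in this cycle even though nothing references it anymore.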
G1GC divides the heap into equal-sized regions (typically 1MB–32MB each) instead of laying out Young and Old generations as two contiguous blocks.
Each region is tagged as Eden, Survivor, or Old, but G1 is flexible and can reassign a region's role after it has been emptied.
G1GC prioritizes the regions with the highest garbage density (reclaimable bytes divided by region size), hence the name Garbage-First.
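The "most garbage first" idea can be sketched as a greedy selection. This is a simplification under stated assumptions: `Region`, `GarbageFirstSketch`, and the `maxLiveBytesToCopy` budget are hypothetical, and the budget is only a stand-in for G1's real pause-time prediction, which weighs more than live bytes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical per-region summary; real G1 tracks far more state per region. */
final class Region {
    final int id;
    final long sizeBytes;
    final long liveBytes;

    Region(int id, long sizeBytes, long liveBytes) {
        this.id = id;
        this.sizeBytes = sizeBytes;
        this.liveBytes = liveBytes;
    }

    /** Fraction of the region that is reclaimable. */
    double garbageDensity() {
        return (double) (sizeBytes - liveBytes) / sizeBytes;
    }
}

final class GarbageFirstSketch {
    /**
     * Picks regions in order of garbage density until the assumed copy budget is used up.
     * Live bytes approximate evacuation cost, since live objects must be copied out.
     */
    static List<Region> chooseCollectionSet(List<Region> regions, long maxLiveBytesToCopy) {
        List<Region> sorted = new ArrayList<>(regions);
        Comparator<Region> byDensity = Comparator.comparingDouble(Region::garbageDensity);
        sorted.sort(byDensity.reversed()); // densest garbage first

        List<Region> chosen = new ArrayList<>();
        long toCopy = 0;
        for (Region region : sorted) {
            if (toCopy + region.liveBytes > maxLiveBytesToCopy) {
                continue; // too expensive for this pause; leave it for a later cycle
            }
            chosen.add(region);
            toCopy += region.liveBytes;
        }
        return chosen;
    }
}
```

In a real JVM the corresponding knobs are `-XX:G1HeapRegionSize` for the region size and `-XX:MaxGCPauseMillis` for the pause goal that bounds how many regions fit into one collection.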
Tri-color marking is central to modern concurrent GCs — it’s how they track object reachability without stopping your application. But not all collectors use it.
GC | Uses Tri-Color? | Notes |
---|---|---|
G1GC | ✅ Yes | During concurrent marking, uses tri-color to identify reachable objects. |
Shenandoah | ✅ Yes | Employs concurrent tri-color marking with barriers to minimize pause time. |
ZGC | ✅ Yes | Uses a color-in-pointer scheme; conceptually tri-color with advanced pointer tagging. |
CMS (deprecated) | ✅ Yes | Legacy concurrent GC using tri-color for live object discovery. |
Serial GC | ❌ No | Performs full STW (stop-the-world) marking; doesn’t need tri-color. |
Parallel GC | ❌ No | Optimized for throughput with STW collection; skips incremental marking logic. |
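Tri-color is especially useful when the GC runs concurrently with application threads. It prevents the classic lost-object problem, where the application stores a reference to a not-yet-visited object into an already-scanned object and the collector never sees it.
Objects are colored white (not yet visited, and garbage if still white when marking ends), gray (reachable but not fully scanned), and black (fully scanned). As long as barriers keep a black object from hiding the only path to a white one, the GC guarantees memory safety during concurrent traversal.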
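If you load 10M records from the DB into a single List<Entity>, every record stays strongly reachable until the export finishes, so no GC cycle can reclaim any of them, however often it runs.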
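One way to shorten object lifetimes in an export like this is to stream rows instead of collecting them. The sketch below is an assumption about how such a pipeline might look, not the pipeline from this post: `Row`, `StreamingCsvExport`, and `fakeCursor` are hypothetical, the fake cursor stands in for a real database cursor, and CSV escaping is left out.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Iterator;
import java.util.stream.IntStream;

/** Hypothetical row type standing in for the post's Entity. */
final class Row {
    final int id;
    final String name;

    Row(int id, String name) {
        this.id = id;
        this.name = name;
    }
}

final class StreamingCsvExport {
    /** Writes rows one at a time; each Row is unreachable once its line has been written. */
    static long export(Iterator<Row> rows, PrintWriter out) {
        long written = 0;
        out.print("id,name\n");
        while (rows.hasNext()) {
            Row row = rows.next(); // the only strong reference to this row
            out.print(row.id + "," + row.name + "\n");
            written++;
        }
        out.flush();
        return written;
    }

    /** Lazy source standing in for a database cursor; rows are created on demand. */
    static Iterator<Row> fakeCursor(int count) {
        return IntStream.range(0, count)
                .mapToObj(i -> new Row(i, "user" + i))
                .iterator();
    }

    public static void main(String[] args) {
        StringWriter buffer = new StringWriter();
        long rows = export(fakeCursor(3), new PrintWriter(buffer));
        System.out.print(buffer);
        System.out.println("rows written: " + rows);
    }
}
```

Because at most one `Row` is strongly reachable at a time, rows die young and can be reclaimed cheaply, while the list-based version keeps all 10M alive at once.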
Knowing the GC is not just about tuning flags; it is about knowing which objects your code keeps reachable, and for how long.
You can’t fix memory problems just by increasing heap size. The solution often lies in rethinking object lifecycles, reducing reference retention, and allowing the GC to do its job effectively.
tags: memory-optimization - java - performance