kan01234 - Software Engineer Notes

Logo

A backend engineer's journey of learning and growth.

View the Project on GitHub kan01234/post

10 May 2025

Inside the JVM: How Java Garbage Collectors Track, Mark, and Sweep Objects

by kan01234

🧬 Inside the JVM: How Java Garbage Collectors Track, Mark, and Sweep Objects

Garbage Collection (GC) isn’t just about freeing memory — it’s a fundamental part of how the JVM manages object lifecycle and reachability. To truly understand memory performance in Java applications, especially when handling large datasets (like our CSV export pipeline), we need to go beyond “heap is full = GC runs”.

Let’s break down how GC really works — from object references to pointer marking, and color-based marking phases used internally by collectors like G1GC and Shenandoah.


🧭 GC Roots & Object Graph Traversal

GC begins with a set of known GC roots — objects that are always reachable:

From these roots, GC walks the object reference graph, recursively following pointers from root objects to others.

If an object isn’t reachable from any root, it is garbage and can be collected.


🎨 Object Coloring: White, Gray, Black

Most modern GCs (including G1GC) use a tri-color marking algorithm to track which objects are live.

✅ Marking Phase (Tri-color Marking)

  1. All objects start as white.
  2. GC starts with GC roots → marks them gray (reachable but not yet fully scanned).
  3. For each gray object:
    • Mark its references (children) gray
    • Mark the object itself black (fully scanned)
  4. At the end, only white objects remain unreachable → eligible for collection.

This avoids cycles, and ensures even deeply nested object graphs are scanned safely.


📌 Reference Types and Reachability

Java has multiple reference levels, and GC behavior depends on them:

Reference Type Collected When Memory Needed? Use Case
Strong Reference ❌ Never Default (new Object())
Soft Reference ✅ When memory is low Caches
Weak Reference ✅ On next GC Maps, metadata
Phantom Reference ✅ After finalization Cleanup hooks

If your batch job holds strong references to large lists or maps (e.g., List<Entity>), those will not be collected even under pressure.


🪄 Barriers & Pointer Traversal

GCs that run concurrently with your app (like G1GC, Shenandoah, ZGC) must handle pointer updates during GC.

They use read and write barriers:

These barriers are critical for concurrent and low-pause collectors to work correctly while your app continues to allocate and modify memory.


⚙️ G1GC Behavior: Region-Based Heap and Incremental GC

G1GC divides the heap into equal-sized regions (1MB–32MB each) instead of strict Young/Old generations.

Key Phases:

Each region is tagged as Eden, Survivor, or Old — but G1 is flexible and can reassign region roles.

Key Feature:

G1GC chooses which regions to collect based on garbage density (garbage/region size) → hence the name Garbage-First.


🌈 Which GCs Use Tri-Color Marking?

Tri-color marking is central to modern concurrent GCs — it’s how they track object reachability without stopping your application. But not all collectors use it.

✅ GCs that use tri-color marking:

GC Uses Tri-Color? Notes
G1GC ✅ Yes During concurrent marking, uses tri-color to identify reachable objects.
Shenandoah ✅ Yes Employs concurrent tri-color marking with barriers to minimize pause time.
ZGC ✅ Yes Uses a color-in-pointer scheme; conceptually tri-color with advanced pointer tagging.
CMS (deprecated) ✅ Yes Legacy concurrent GC using tri-color for live object discovery.
Serial GC ❌ No Performs full STW (stop-the-world) marking; doesn’t need tri-color.
Parallel GC ❌ No Optimized for throughput with STW collection; skips incremental marking logic.

Tri-color is especially useful when the GC runs concurrently with application threads. The algorithm prevents issues like:

By coloring objects white (unreachable), gray (reachable but not fully scanned), and black (fully scanned), the GC guarantees memory safety during concurrent traversals.


📉 Example: Why You Might Have Long GCs in a Batch Job

If you load 10M records from DB:


🧠 Final Thought: GC Is a Tool, Not Magic

Knowing the GC is not just about tuning flags — it’s about knowing:

You can’t fix memory problems just by increasing heap size. The solution often lies in rethinking object lifecycles, reducing reference retention, and allowing the GC to do its job effectively.

tags: memory-optimization - java - performance