The definitive merged guide — JVM memory internals, API semantics, concurrency primitives, Java 8–21 features, Spring/Kafka/Hibernate patterns, and 90+ senior-level Q&As. All duplicates removed. New advanced questions added throughout.
Memory layout · Internals · API semantics · Type system · Concurrent collections
ArrayList uses a single contiguous Object[] elementData block on the heap. Every element slot is a 4-byte (compressed OOP) or 8-byte reference. On resize, a new array of size × 1.5 is allocated and System.arraycopy runs — O(n) amortized but O(1) per append. The contiguous layout is CPU cache-friendly: sequential access loads a full 64-byte cache line at a time, so prefetching works perfectly.
LinkedList allocates each Node object separately on the heap. Node layout: 16B object header + 8B item ref + 8B next + 8B prev = ~40B per element. Nodes are scattered across the heap, causing cache misses on every pointer hop. Iteration over 1M elements: ArrayList ~5–10× faster due to cache locality.
```java
// ArrayList: single contiguous block
transient Object[] elementData;  // heap: [ref0, ref1, ref2, ...]
// grow: Arrays.copyOf(data, newCapacity = size * 3/2 + 1)

// LinkedList: scattered nodes
private static class Node<E> {
    E item;        // 8 bytes ref
    Node<E> next;  // 8 bytes ref — points ANYWHERE in heap
    Node<E> prev;  // 8 bytes ref
    // + 16 bytes object header = ~40 bytes per element
}

// Benchmark: iterating 10M ints
// ArrayList:  ~8ms  (prefetcher loads 8 refs per cache line)
// LinkedList: ~80ms (cache miss on every node → main memory)
```
Step 1 – Hash spreading: h ^ (h >>> 16) XORs the high 16 bits into the low 16 bits, spreading entropy across all bits. Without this, objects whose hashCodes differ only in high bits would all land in bucket 0.
Step 2 – Index: (n-1) & hash — bitwise AND, not modulo. Works because capacity is always a power of 2. Cost: O(1).
Step 3 – Collision: Linked list in the bucket. equals() scans for key match. Complexity degrades to O(n) with pathological hash functions.
Step 4 – Treeify: When bucket list length ≥ 8 AND total table size ≥ 64 → convert to red-black TreeNode (O(log n)). Untreeified when the bin shrinks back to ≤ 6. Table < 64 → resize instead of treeifying.
Step 5 – Resize: Triggered when size > capacity × 0.75. New table = 2×. In Java 8, each bucket splits into high/low bit sub-lists without rehashing full hash — very efficient.
equals/hashCode contract: Equal objects MUST have equal hash codes. Violating this means put() and get() use different buckets → silent data loss.
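The collision and treeify behavior above can be sanity-checked with a deliberately pathological key (BadKey and TreeifyDemo are illustrative names, not from the source): every instance hashes to the same bucket, the bin treeifies past the thresholds, and lookups stay correct because equals() still discriminates. Implementing Comparable lets the treeified bin order the colliding keys.

```java
import java.util.HashMap;
import java.util.Map;

// BadKey: every instance hashes to the same bucket — worst case for HashMap.
// With >= 64 entries the bin treeifies; lookups stay correct via equals().
final class BadKey implements Comparable<BadKey> {
    final int id;
    BadKey(int id) { this.id = id; }
    @Override public int hashCode() { return 42; }  // pathological: all collide
    @Override public boolean equals(Object o) {
        return o instanceof BadKey && ((BadKey) o).id == id;
    }
    @Override public int compareTo(BadKey o) { return Integer.compare(id, o.id); }
}

public class TreeifyDemo {
    static Map<BadKey, Integer> build(int n) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) map.put(new BadKey(i), i);  // one giant bucket
        return map;
    }
    public static void main(String[] args) {
        Map<BadKey, Integer> map = build(100);
        // Correctness survives the pathological hash — only performance degrades
        System.out.println(map.get(new BadKey(77)));  // 77
    }
}
```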
```java
// Hash spreading — prevents high-bit clustering
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

// putVal: plain writes — NOT thread-safe!
if ((p = tab[i = (n - 1) & hash]) == null)
    tab[i] = newNode(hash, key, value, null);  // plain store
else {
    for (int binCount = 0;; ++binCount) {
        if ((e = p.next) == null) {
            p.next = newNode(hash, key, value, null);
            if (binCount >= TREEIFY_THRESHOLD - 1)  // 8
                treeifyBin(tab, hash);              // convert to TreeNode
            break;
        }
    }
}
if (++size > threshold) resize();  // load factor 0.75

// Correct equals + hashCode (Java 16+ records do this automatically)
public boolean equals(Object o) {
    if (!(o instanceof User u)) return false;
    return age == u.age && Objects.equals(name, u.name);
}
public int hashCode() { return Objects.hash(name, age); }
```
Java 8+ dropped segments (Java 7's 16-segment approach) in favour of per-bucket locking:
- Node<K,V>[] table is volatile — every read sees the latest table pointer after a write.
- casTabAt(tab, i, null, newNode) uses Unsafe.compareAndSwapObject — atomic, no lock needed.
- size() sums all counter cells — not an atomic snapshot; may lag briefly.

```java
final V putVal(K key, V value, boolean onlyIfAbsent) {
    for (Node<K,V>[] tab = table;;) {  // volatile read each iteration
        Node<K,V> f; int n, i, fh;
        if (tab == null)
            tab = initTable();
        else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
            // EMPTY: lock-free CAS — no-contention path
            if (casTabAt(tab, i, null, new Node<>(hash, key, value)))
                break;
        } else {
            // COLLISION: lock on first node only
            synchronized (f) {
                // traverse list or tree, update or insert
            }
        }
    }
    addCount(1L, binCount);  // LongAdder-style striped counter
}

// CounterCell — padded to avoid false sharing between CPUs
@jdk.internal.vm.annotation.Contended
static final class CounterCell {
    volatile long value;  // each on its own cache line
}
```
Prefer mappingCount() over size() on large maps — it returns long and avoids int overflow. size() is not atomic — it is the sum of CounterCell values at one moment in time.

TreeMap: Red-black BST — each Entry node holds key, value, left, right, parent, color (boolean). No array — just a root reference. O(log n) for all ops. NavigableMap methods: floorKey(k) (largest ≤ k), ceilingKey(k) (smallest ≥ k), headMap(k), tailMap(k), subMap(lo, hi), pollFirstEntry(). Each node ≈ 48 bytes.
LinkedHashMap LRU: Extends HashMap, adds doubly-linked list through all entries. Constructor param accessOrder=true triggers afterNodeAccess() on every get/put — moves accessed entry to tail. Override removeEldestEntry() to auto-evict head.
PriorityQueue: Binary min-heap stored in array. Parent of index i is at (i-1)/2. Children at 2i+1, 2i+2. peek()/element() return root O(1) — peek returns null, element throws. poll()/remove() remove root, siftDown O(log n). remove(Object) is O(n) linear scan.
```java
// TreeMap — range queries (e.g. Snowflake partition metadata)
TreeMap<Long, Partition> partitions = new TreeMap<>();
NavigableMap<Long, Partition> range =
    partitions.subMap(startTs, true, endTs, true);  // O(log n) view

// LinkedHashMap LRU Cache
class LRUCache<K,V> extends LinkedHashMap<K,V> {
    private final int cap;
    LRUCache(int cap) { super(cap, 0.75f, true); this.cap = cap; }
    protected boolean removeEldestEntry(Map.Entry<K,V> e) {
        return size() > cap;  // evict LRU when over capacity
    }
}

// PriorityQueue — rate limiter sorted by expiry window
PriorityQueue<Request> pq =
    new PriorityQueue<>(Comparator.comparingLong(Request::getExpiry));
pq.peek();     // earliest expiry, null if empty
pq.element();  // earliest expiry, throws NoSuchElementException
pq.poll();     // removes+returns earliest, null if empty
pq.remove();   // removes+returns earliest, throws if empty
```
Each key is wrapped in WeakReference<K> registered with a shared ReferenceQueue. When GC collects a key (no strong reference exists), the WeakReference is enqueued on the queue. On subsequent map operations (get, put, size), expungeStaleEntries() polls the queue and removes matching bucket entries.
Why keys weak, values strong: Values should live as long as keys. Making values weak too would evict entries even when the key is still live — defeating the purpose.
Production use: Spring's ReflectionUtils method caches, ClassLoader metadata caches — entries should not prevent GC of class objects.
```java
// Simplified stale entry cleanup (runs on map operations)
private void expungeStaleEntries() {
    for (Object x; (x = queue.poll()) != null; ) {
        synchronized (queue) {
            Entry<K,V> e = (Entry<K,V>) x;
            int i = indexFor(e.hash, table.length);
            // unlink entry from bucket chain
        }
    }
}

// ⚠ Not thread-safe! Wrap with synchronizedMap if needed
Map<Key, Metadata> cache = Collections.synchronizedMap(new WeakHashMap<>());

// Alternative: Guava Cache or Caffeine with weakKeys()
Cache<Key, Value> c = Caffeine.newBuilder()
    .weakKeys().softValues().build();
```
Every mutative operation (add, set, remove) acquires a lock, copies the full underlying array, applies the change, then performs a volatile write of the array reference — establishing happens-before for subsequent readers.
Iterators capture the array reference at creation time. They always iterate the snapshot array, ignoring subsequent changes. No modCount check → never throws CME. But may return stale data.
Use when: reads vastly outnumber writes (listener lists, Spring event multicaster, read-heavy configuration maps). Avoid when: high write frequency — every add() is O(n) array copy.
```java
public boolean add(E e) {
    synchronized (lock) {
        Object[] es = getArray();                     // volatile read
        int len = es.length;
        Object[] newEs = Arrays.copyOf(es, len + 1);  // O(n) copy
        newEs[len] = e;
        setArray(newEs);  // volatile write → HB guarantee
        return true;
    }
}

// Spring ApplicationEventMulticaster uses it internally:
// private final Set<ApplicationListener<?>> applicationListeners
//     = new CopyOnWriteArraySet<>();  (backed by COWAL)
```
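The snapshot-iterator semantics described above can be verified directly (a minimal sketch; SnapshotIteratorDemo is an illustrative name):

```java
import java.util.Iterator;
import java.util.concurrent.CopyOnWriteArrayList;

public class SnapshotIteratorDemo {
    // An iterator captured before add("b") still walks the old array
    static String iterateAcrossMutation() {
        CopyOnWriteArrayList<String> list = new CopyOnWriteArrayList<>();
        list.add("a");
        Iterator<String> it = list.iterator();  // pins the current array snapshot
        list.add("b");                          // writes a NEW array; snapshot untouched
        StringBuilder seen = new StringBuilder();
        while (it.hasNext()) seen.append(it.next());
        return seen + "/" + list.size();
    }

    public static void main(String[] args) {
        System.out.println(iterateAcrossMutation());  // a/2
    }
}
```

No ConcurrentModificationException is possible here: the iterator never consults the live array again.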
| Property | ArrayBlockingQueue | LinkedBlockingQueue |
|---|---|---|
| Backing store | Fixed Object[] preallocated | Linked Node objects (dynamic) |
| Capacity | Bounded (required at construction) | Optional (Integer.MAX_VALUE default) |
| Locking | Single ReentrantLock (put + take share) | Two locks: putLock + takeLock |
| Throughput | Lower — single lock contention | Higher — producers/consumers don't block each other |
| Memory | Predictable, preallocated | GC pressure from node allocation |
| Fairness option | Yes (fair=true constructor) | No |
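As a complement to the table, a minimal sketch of ArrayBlockingQueue's bounded + fair behavior (capacity 2 is arbitrary, chosen for illustration):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class AbqDemo {
    public static void main(String[] args) {
        // Capacity is mandatory; fair=true grants the lock to the longest-waiting thread
        BlockingQueue<String> q = new ArrayBlockingQueue<>(2, true);
        System.out.println(q.offer("a"));           // true
        System.out.println(q.offer("b"));           // true
        System.out.println(q.offer("c"));           // false — full; non-blocking offer rejects
        System.out.println(q.poll());               // a — FIFO order
        System.out.println(q.remainingCapacity());  // 1
    }
}
```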
DelayQueue: Unbounded BlockingQueue backed by PriorityQueue. Elements must implement Delayed.getDelay(). take() blocks until head element's delay ≤ 0. Used for: session expiry, scheduled retries, cache TTL eviction.
```java
// Producer-consumer with LinkedBlockingQueue
BlockingQueue<Task> queue = new LinkedBlockingQueue<>(1000);
queue.put(task);                         // blocks if full
Task t = queue.take();                   // blocks until available
queue.offer(task, 1, TimeUnit.SECONDS);  // timeout version

// DelayQueue for session expiry
class SessionEvent implements Delayed {
    private final long expiryNanos;
    SessionEvent(long expiryNanos) { this.expiryNanos = expiryNanos; }
    public long getDelay(TimeUnit u) {
        return u.convert(expiryNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
    }
    public int compareTo(Delayed o) {
        return Long.compare(expiryNanos, ((SessionEvent) o).expiryNanos);
    }
}
DelayQueue<SessionEvent> expiry = new DelayQueue<>();
SessionEvent expired = expiry.take();  // blocks until delay expires
```
Every AbstractList has int modCount incremented on every structural modification (add, remove, clear, sort). When an Iterator is created it snapshots expectedModCount = modCount. Each next() call invokes checkForComodification().
Not guaranteed: Another thread modifying without synchronization means the JMM doesn't guarantee visibility of the updated modCount. The iterator may or may not see the change — CME is best-effort, not a thread-safety mechanism.
```java
// ✅ Safe: iterator.remove() (syncs expectedModCount)
Iterator<String> it = list.iterator();
while (it.hasNext())
    if (shouldRemove(it.next())) it.remove();

// ✅ Safe: removeIf (Java 8, uses modCount internally)
list.removeIf(s -> s.startsWith("X"));

// ✅ Safe: CopyOnWriteArrayList (snapshot iterator)
new CopyOnWriteArrayList<>(list).iterator();

// ✅ Safe: collect to new list then remove
List<String> toRemove = list.stream()
    .filter(this::shouldRemove).toList();
list.removeAll(toRemove);

// ❌ Broken: modifying list during forEach
list.forEach(s -> { if (shouldRemove(s)) list.remove(s); });  // CME!
```
A skip list is a probabilistic multilevel linked list. The base layer is a sorted linked list; each higher layer is a "fast lane" — each node promoted with probability 0.25. Average height: O(log n).
Thread safety: node next pointers are volatile. Insert uses CAS to update links. Reads are fully lock-free — they traverse volatile next references and always see a consistent state (may briefly see stale due to in-progress insertions).
vs synchronizedSortedMap(TreeMap): synchronizedSortedMap uses a single mutex — one thread blocks all others. ConcurrentSkipListMap allows concurrent reads and fine-grained write concurrency. Use ConcurrentSkipListMap when you need a concurrent NavigableMap.
```java
// Concurrent event timeline (used in Kafka consumer internals)
ConcurrentSkipListMap<Long, Event> timeline = new ConcurrentSkipListMap<>();
timeline.put(System.currentTimeMillis(), event);

// Thread-safe range query — no external lock needed
ConcurrentNavigableMap<Long, Event> lastMinute =
    timeline.tailMap(System.currentTimeMillis() - 60_000);

// ConcurrentSkipListSet for sorted deduplication
ConcurrentSkipListSet<String> tags = new ConcurrentSkipListSet<>();
tags.add("java");
tags.add("kafka");
tags.higher("j");  // "java" — lock-free O(log n)
```
Type erasure: Generic parameters are erased at compile time. List<String> and List<Integer> are the same class at runtime. Bounds become their erasure (T extends Comparable → Comparable). Compiler inserts casts at use sites.
Bridge methods: When a generic subclass overrides a method, the compiler generates a synthetic bridge method with the erased signature that delegates to the concrete method — maintains polymorphism.
PECS — Producer Extends, Consumer Super: ? extends T = you can READ (produces T), cannot WRITE. ? super T = you can WRITE T values, cannot read specific type (only Object).
Heap pollution: Variable of parameterized type holds object of wrong type — ClassCastException fires at unexpected location, not at assignment.
```java
// Bridge method — compiler generated
class Base<T> { void set(T t) {} }
class Sub extends Base<String> {
    void set(String s) { /* override */ }
    // compiler adds: void set(Object o) { set((String) o); } — BRIDGE
}

// PECS: Collections.copy uses both wildcards
public static <T> void copy(
    List<? super T> dest,    // consumer: write T into it
    List<? extends T> src)   // producer: read T from it
{ ... }

// Why can't you add to ? extends Number?
List<? extends Number> nums = new ArrayList<Integer>();
nums.add(1.5);  // COMPILE ERROR — could be List<Integer>, adding Double is wrong

// Heap pollution
List[] raw = new List[1];
List<String>[] polluted = raw;  // unchecked
raw[0] = List.of(42);           // Integer list in String[] slot
String s = polluted[0].get(0);  // CCE here, not at assignment!
```
EnumMap: Keys must be enum values. Internally a plain Object[] indexed by key.ordinal() — zero hashing, zero collision, zero resizing. O(1) by array index. Faster and more memory-efficient than HashMap for enum keys.
IdentityHashMap: Uses == (reference equality) and System.identityHashCode() instead of equals()/hashCode(). Open addressing with linear probing (not separate chaining). Two distinct objects with the same equals() are treated as different keys.
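The identity semantics are easy to observe side by side with HashMap (a minimal sketch; IdentityDemo is an illustrative name):

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

public class IdentityDemo {
    public static void main(String[] args) {
        String k1 = new String("key");  // two distinct objects:
        String k2 = new String("key");  // equals() true, == false

        Map<String, Integer> byEquals = new HashMap<>();
        byEquals.put(k1, 1);
        byEquals.put(k2, 2);
        System.out.println(byEquals.size());  // 1 — second put overwrote (equals match)

        Map<String, Integer> byIdentity = new IdentityHashMap<>();
        byIdentity.put(k1, 1);
        byIdentity.put(k2, 2);
        System.out.println(byIdentity.size());  // 2 — distinct keys by reference
    }
}
```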
```java
// EnumMap — state machine or permission flags
enum State { PENDING, PROCESSING, DONE, FAILED }
EnumMap<State, Color> stateColors = new EnumMap<>(State.class);
stateColors.put(State.DONE, Color.GREEN);
// Internal: Object[4] indexed by State.ordinal() — fastest possible Map

// IdentityHashMap — serialization cycle detection
// (e.g. Java ObjectOutputStream uses IdentityHashMap internally)
IdentityHashMap<Object, Integer> visited = new IdentityHashMap<>();
void serialize(Object obj) {
    if (visited.containsKey(obj)) return;  // cycle detected by identity
    visited.put(obj, nextId++);
    // serialize obj...
}

// Spring AOP uses IdentityHashMap for proxy caching:
// two proxied objects wrapping same target should be distinct
```
ArrayDeque uses a resizable circular array with head and tail int pointers. Capacity is always a power of 2 — enables bitwise modulo. addFirst: elements[head = (head - 1) & (length - 1)] = e. addLast: elements[tail] = e; tail = (tail + 1) & (length - 1). No per-element allocation. Cache-friendly. Null elements prohibited (used as sentinel).
vs LinkedList as Stack/Queue: No per-element Node allocation = zero GC pressure per operation. Better cache locality. Java's own Deque javadoc states: "This class is likely to be faster than Stack when used as a stack, and faster than LinkedList when used as a queue."
```java
// Circular buffer — head/tail wrap around (length 8, mask 7)
// addFirst: head moves left (wrapping), addLast: tail moves right
// [_, _, 3, 2, 1, _, _, _]  head=2, tail=5
// addFirst(4): head = (2-1) & 7 = 1 → [_, 4, 3, 2, 1, _, _, _]

// Use as Stack (LIFO)
Deque<Integer> stack = new ArrayDeque<>();
stack.push(1); stack.push(2);
stack.pop();   // 2 (LIFO)
stack.peek();  // 1 (no remove)

// Use as Queue (FIFO)
Deque<Integer> queue = new ArrayDeque<>();
queue.offer(1); queue.offer(2);
queue.poll();  // 1 (FIFO)

// BFS/DFS algorithm template
Deque<Node> frontier = new ArrayDeque<>();
frontier.offerLast(root);    // BFS: add to back, poll from front
// frontier.offerFirst(root);  // DFS: add to front, poll from front
```
JMM · volatile · synchronized · AQS · Locks · Atomics · CompletableFuture · Thread pool
volatile guarantees:
- Visibility: a write to a volatile field is visible to every subsequent read of that field by any thread.
- Ordering: a volatile write acts as a release barrier, a volatile read as an acquire barrier; the compiler and CPU cannot reorder accesses across them.
- No atomicity: compound read-modify-write actions on a volatile field are still racy.
Happens-before (JMM §17.4.5) — complete list:
- Program order: each action in a thread happens-before every subsequent action in that thread.
- Monitor lock: unlocking a monitor happens-before every subsequent lock of that same monitor.
- Volatile: a write to a volatile field happens-before every subsequent read of that field.
- Thread start: Thread.start() happens-before any action in the started thread.
- Thread termination: any action in a thread happens-before another thread returns from join() on it (or observes isAlive() == false).
- Interruption: calling interrupt() happens-before the interrupted thread detects it.
- Finalizer: the end of an object's constructor happens-before the start of its finalizer.
- Transitivity: if A happens-before B and B happens-before C, then A happens-before C.
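The thread start and termination edges can be exercised directly: a plain, non-volatile field is safely visible across start()/join() with no other synchronization (a minimal sketch; StartJoinHb is an illustrative name):

```java
public class StartJoinHb {
    static int plainData = 0;  // deliberately NOT volatile

    // The start() and join() happens-before edges alone make plainData visible
    static int runDemo() throws InterruptedException {
        plainData = 1;  // write before start()
        Thread t = new Thread(() -> {
            // Thread.start() happens-before any action in t: guaranteed to see 1
            plainData = plainData + 41;
        });
        t.start();
        t.join();  // every write in t happens-before join() returns
        return plainData;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runDemo());  // 42 — guaranteed, no volatile needed
    }
}
```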
volatile not atomic: counter++ is READ→INCREMENT→WRITE. Two threads reading the same value before either writes will both compute +1 → lost update. Use AtomicInteger.incrementAndGet().
```java
// Volatile flag with non-volatile data — safe publication pattern
class Publisher {
    int data = 0;  // plain field
    volatile boolean ready = false;

    // Thread A
    void publish() {
        data = 42;     // plain write, but...
        ready = true;  // volatile write = StoreStore barrier above
        // data=42 happens-before ready=true happens-before Thread B read
    }

    // Thread B
    void consume() {
        if (ready) {            // volatile read
            assert data == 42;  // guaranteed: HB chain intact
        }
    }
}

// NOT thread-safe — not atomic
volatile int counter = 0;
counter++;  // READ(0) ... WRITE(1) — race condition window!
// Fix: AtomicInteger.incrementAndGet() — single CAS instruction
```
Every Java object has a mark word (64-bit header field). Lock state encoded in mark word bits:
- Unlocked: mark word holds the identity hash code and GC age bits.
- Biased (pre-Java 15): mark word stores the owning thread ID; re-entry by that thread needs no atomic operation.
- Lightweight (thin): mark word points to a lock record on the owning thread's stack, installed via CAS. Used while there is no contention.
- Heavyweight (inflated): mark word points to an ObjectMonitor; contending threads park() and block. Wake on notify.

Java 15+: Biased locking deprecated and disabled by default (JEP 374), removed in later releases. Lock path: unlocked → lightweight → heavyweight. Simpler JVM, no STW revocation pauses.
Object vs Class lock: Instance method synchronized locks on this. Static method synchronized locks on the Class object. These are completely independent monitors — two threads can hold one each simultaneously.
```java
class Foo {
    synchronized void a() {}         // lock = this (Foo instance)
    static synchronized void b() {}  // lock = Foo.class
    void c() {
        synchronized (this) {}       // same as a()
        synchronized (Foo.class) {}  // same as b()
        synchronized (lockObj) {}    // private lock object — best practice
    }
}

// Thread states under synchronized:
// Waiting for monitor entry → BLOCKED
// In Object.wait()          → WAITING
// In Object.wait(timeout)   → TIMED_WAITING
```
AQS is the backbone of all java.util.concurrent locks. It maintains:
- volatile int state — represents lock state (0=free, 1=held, N=held N times for reentrant).
- A FIFO wait queue (CLH variant) of Node objects. Each Node has: waitStatus (CANCELLED, SIGNAL, CONDITION, PROPAGATE), thread, prev, next.

Unfair acquisition: CAS state 0→1 immediately, without checking the queue. Barging — a new thread may jump ahead of waiting threads. Higher throughput, possible starvation.
Fair acquisition: Checks hasQueuedPredecessors() first. If any thread is waiting, enqueue instead of barging. FIFO ordering. Lower throughput but starvation-free.
Condition: lock.newCondition() creates a ConditionObject with its own linked list of waiting threads. await() releases the lock and moves thread to condition queue. signal() moves thread back to AQS CLH queue.
```java
protected final boolean tryAcquire(int acquires) {
    final Thread current = Thread.currentThread();
    int c = getState();  // volatile read
    if (c == 0) {
        // UNFAIR: no queue check — barge in
        if (compareAndSetState(0, acquires)) {
            setExclusiveOwnerThread(current);
            return true;
        }
    } else if (current == getExclusiveOwnerThread()) {
        // REENTRANT: same thread holds lock
        setState(c + acquires);  // plain write — already own the lock
        return true;
    }
    return false;  // enqueue and park
}

// Condition example — bounded buffer
ReentrantLock lock = new ReentrantLock();
Condition notFull = lock.newCondition();
Condition notEmpty = lock.newCondition();

// producer
lock.lock();
try {
    while (isFull()) notFull.await();  // releases lock, waits
    buf.add(item);
    notEmpty.signal();                 // wake one consumer
} finally {
    lock.unlock();
}
```
3 modes: (1) Write lock — exclusive, blocks all. (2) Read lock — shared, multiple concurrent readers. (3) Optimistic read — tryOptimisticRead() returns a non-zero stamp without acquiring any lock.
Optimistic read pattern: Get stamp → read data into locals → validate(stamp). If valid (no write occurred), proceed. If invalid, fall back to full read lock. The validate() is a cheap volatile read of the internal state version counter.
Not reentrant: Attempting to acquire the same lock twice from the same thread will deadlock. Design around this — extract methods that don't re-acquire. If reentrance needed, use ReentrantReadWriteLock.
When to use over ReadWriteLock: Read-mostly data (99% reads). Optimistic read has ~zero overhead for the common case vs acquiring a read lock.
```java
class Point {
    double x, y;
    final StampedLock sl = new StampedLock();

    double distanceFromOrigin() {            // called very frequently
        long stamp = sl.tryOptimisticRead(); // no lock — just a version
        double cx = x, cy = y;               // read into locals
        if (!sl.validate(stamp)) {           // write happened? re-read
            stamp = sl.readLock();           // now hold shared lock
            try { cx = x; cy = y; }
            finally { sl.unlockRead(stamp); }
        }
        return Math.hypot(cx, cy);
    }

    void move(double dx, double dy) {        // rare
        long stamp = sl.writeLock();
        try { x += dx; y += dy; }
        finally { sl.unlockWrite(stamp); }
    }
}
```
Each Thread object holds a ThreadLocalMap. Map entries use WeakReference<ThreadLocal> as the key but a strong reference as the value. If the ThreadLocal variable goes out of scope (no strong reference), the key becomes null at GC. But the value remains strongly reachable through Thread → ThreadLocalMap → Entry.value. In thread pools (threads reused), this accumulates.
Fix: Always call tl.remove() in a finally block.
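A minimal sketch of the remove-in-finally pattern; render() simulates two tasks reusing one pooled thread (ThreadLocalHygiene and the message strings are illustrative):

```java
public class ThreadLocalHygiene {
    static final ThreadLocal<StringBuilder> BUF =
        ThreadLocal.withInitial(StringBuilder::new);

    static String render(String msg) {
        try {
            StringBuilder sb = BUF.get();
            sb.append("[").append(msg).append("]");
            return sb.toString();
        } finally {
            BUF.remove();  // always remove — a pooled thread would otherwise keep the value
        }
    }

    public static void main(String[] args) {
        // The same thread simulates two pooled tasks; remove() prevents state bleeding over
        System.out.println(render("req1"));  // [req1]
        System.out.println(render("req2"));  // [req2] — not [req1][req2]
    }
}
```

Without the remove() call, the second task would observe the first task's buffer: exactly the leak described above.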
MDC (Mapped Diagnostic Context): Logback/SLF4J uses ThreadLocal<Map<String,String>> to store per-request context (correlation ID, user ID). Every log call reads from this map to append structured fields.
InheritableThreadLocal: Child threads inherit parent's value at creation time (copy). Limitation: thread pool threads are created once — new tasks don't inherit the submitting thread's values. Use TransmittableThreadLocal from Alibaba or use MDC.getCopyOfContextMap() + explicit propagation.
```java
@Component
public class CorrelationFilter implements Filter {
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        String id = Optional
            .ofNullable(((HttpServletRequest) req).getHeader("X-Correlation-ID"))
            .orElse(UUID.randomUUID().toString());
        try {
            MDC.put("correlationId", id);
            chain.doFilter(req, res);
        } finally {
            MDC.clear();  // MUST clear — thread returns to pool!
        }
    }
}

// Propagating MDC to async tasks (thread pool)
Map<String, String> ctx = MDC.getCopyOfContextMap();
executor.submit(() -> {
    try {
        if (ctx != null) MDC.setContextMap(ctx);  // restore in new thread
        doWork();
    } finally {
        MDC.clear();
    }
});
```
4 Coffman conditions (ALL must hold simultaneously):
1. Mutual exclusion: resources cannot be shared.
2. Hold and wait: a thread holds one resource while waiting for another.
3. No preemption: a resource cannot be forcibly taken from its holder.
4. Circular wait: a cycle of threads exists, each waiting for a resource held by the next.
```java
// Detection: ThreadMXBean
ThreadMXBean tmx = ManagementFactory.getThreadMXBean();
long[] ids = tmx.findDeadlockedThreads();  // monitors + synchronizers
if (ids != null) {
    ThreadInfo[] infos = tmx.getThreadInfo(ids, true, true);
    for (ThreadInfo ti : infos)
        log.error("DEADLOCK: {} blocked on {}",
                  ti.getThreadName(), ti.getLockName());
}

// Prevention 1: consistent lock ordering (break circular wait)
void transfer(Account a, Account b) {
    Account first  = a.getId() < b.getId() ? a : b;  // always lower id first
    Account second = a.getId() < b.getId() ? b : a;
    synchronized (first) {
        synchronized (second) { /* transfer */ }
    }
}

// Prevention 2: tryLock with timeout (break hold-and-wait)
if (lockA.tryLock(100, TimeUnit.MILLISECONDS)) {
    try {
        if (lockB.tryLock(100, TimeUnit.MILLISECONDS)) {
            try { /* critical section */ }
            finally { lockB.unlock(); }
        }
    } finally {
        lockA.unlock();
    }
}
```
CAS (Compare-And-Swap): Single x86 instruction CMPXCHG — atomically: if memory[addr] == expected, write newVal, return true. Else return false. No lock held. Java wraps via Unsafe.compareAndSwapInt (Java 8) or VarHandle.compareAndSet (Java 9+).
ABA problem: Thread reads value A. Another thread changes A→B→A. Thread's CAS sees A and succeeds — but state was transiently changed. For counters: harmless. For pointer-based lock-free structures (stacks, queues): dangerous corruption. Fix: AtomicStampedReference<V> — pairs value with monotonic stamp; CAS must match both.
LongAdder vs AtomicLong: AtomicLong has one volatile long — all threads CAS-retry on the same memory location. Under high contention: many retries, CPU waste. LongAdder uses Striped64 — a Cell[] array where each thread maps to its own cell (padded with @Contended to avoid false sharing). sum() adds all cells. 5–20× faster under contention. Trade-off: sum() is not atomic — use AtomicLong when you need read-modify-write atomicity.
```java
// AtomicInteger CAS loop (incrementAndGet internals)
public final int incrementAndGet() {
    for (;;) {
        int cur = get();  // volatile read
        int next = cur + 1;
        if (compareAndSet(cur, next))  // CAS: atomic read+write
            return next;
        // retry if CAS failed (another thread incremented)
    }
}

// ABA fix: AtomicStampedReference
AtomicStampedReference<Node> head = new AtomicStampedReference<>(node, 0);
int[] stampHolder = new int[1];
Node curr = head.get(stampHolder);
int stamp = stampHolder[0];
head.compareAndSet(curr, newNode, stamp, stamp + 1);
// Now A→B→A is detected: stamp incremented on each change

// LongAdder for high-throughput counters (metrics, rate limiters)
LongAdder hitCount = new LongAdder();
hitCount.increment();         // writes to thread-local Cell — no contention
long total = hitCount.sum();  // aggregate — NOT atomic snapshot
```
Task submission decision tree:
1. Running threads < corePoolSize: create a new core thread, even if others are idle.
2. Core threads all busy: offer the task to the work queue.
3. Queue full and running threads < maximumPoolSize: create a non-core thread.
4. Queue full and pool at max: invoke the RejectedExecutionHandler.
4 Rejection policies: AbortPolicy (default, throws RejectedExecutionException), CallerRunsPolicy (calling thread executes — backpressure), DiscardPolicy (silently drop), DiscardOldestPolicy (drop queue head, retry submit).
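The saturation path can be provoked deterministically with a one-thread pool and a one-slot queue (sizes chosen purely for illustration; RejectionDemo is an illustrative name):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class RejectionDemo {
    // Returns true iff the third submission is rejected — deterministic with these sizes
    static boolean provokeRejection() {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            1, 1, 0L, TimeUnit.SECONDS,
            new ArrayBlockingQueue<>(1),
            new ThreadPoolExecutor.AbortPolicy());  // default policy
        boolean rejected = false;
        try {
            pool.execute(() -> {  // occupies the single worker
                try { release.await(); } catch (InterruptedException ignored) {}
            });
            pool.execute(() -> {});      // fills the single queue slot
            try {
                pool.execute(() -> {});  // worker busy + queue full + pool at max → reject
            } catch (RejectedExecutionException e) {
                rejected = true;
            }
        } finally {
            release.countDown();
            pool.shutdown();
        }
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println(provokeRejection() ? "rejected" : "accepted");
    }
}
```

Swapping in CallerRunsPolicy instead would make the third task run on the submitting thread: built-in backpressure.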
Spring @Async: Default SimpleAsyncTaskExecutor creates a new thread per task — terrible for production. Configure a proper ThreadPoolTaskExecutor.
```java
// Raw ThreadPoolExecutor with bounded queue
ThreadPoolExecutor executor = new ThreadPoolExecutor(
    10,                             // corePoolSize
    50,                             // maximumPoolSize
    60L, TimeUnit.SECONDS,          // keepAlive (non-core threads)
    new ArrayBlockingQueue<>(200),  // bounded queue (backpressure)
    new ThreadPoolExecutor.CallerRunsPolicy()  // backpressure via caller
);

// Spring @Async production configuration
@Configuration
@EnableAsync
public class AsyncConfig {
    @Bean
    public Executor asyncExecutor() {
        ThreadPoolTaskExecutor ex = new ThreadPoolTaskExecutor();
        ex.setCorePoolSize(10);
        ex.setMaxPoolSize(50);
        ex.setQueueCapacity(200);
        ex.setThreadNamePrefix("async-worker-");
        ex.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
        ex.setWaitForTasksToCompleteOnShutdown(true);
        ex.initialize();
        return ex;
    }
}
```
thenApply(fn): synchronous transform T→U. Runs in completing thread (or caller if already done). If fn itself returns a CF, you get CF<CF<U>> — nested, wrong.
thenCompose(fn): flatMap equivalent. fn returns CompletionStage<U>, result is CF<U> — flattened. Use when the next step is itself async.
thenApplyAsync: same as thenApply but forces execution on ForkJoinPool.commonPool() (or provided executor) — frees the completing thread.
Memory visibility: CF has a volatile Object result. CAS sets it on completion — establishes happens-before. Any thread that reads result via subsequent stage is guaranteed to see all writes made before complete() was called.
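The executor-hopping behavior of the *Async variants can be observed with a named single-thread pool (a sketch; the "cf-worker" thread name is an assumption made for the demo):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncHopDemo {
    // Runs both stages on a named single-thread executor; returns the stage-2 thread name
    static String hop() {
        ExecutorService pool = Executors.newSingleThreadExecutor(
            r -> new Thread(r, "cf-worker"));  // illustrative thread name
        try {
            return CompletableFuture
                .supplyAsync(() -> "payload", pool)
                .thenApplyAsync(s -> Thread.currentThread().getName(), pool)  // forced hop
                .join();  // caller observes the result via CF's volatile result field
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(hop());  // cf-worker
    }
}
```

With plain thenApply, the transform could instead run on whichever thread completed the previous stage (or the caller, if already done).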
```java
// CF = CompletableFuture (abbreviated below)
// thenApply vs thenCompose
CF<String> step1 = CF.supplyAsync(() -> "hello");

// thenApply — sync transform (no new async step)
CF<Integer> r1 = step1.thenApply(String::length);  // CF<Integer> ✓

// thenCompose — flatten when next step is async
CF<User> r2 = step1.thenCompose(name -> userService.lookupAsync(name));
// CF<User> ✓, not CF<CF<User>>

// allOf — 3 parallel calls, aggregate
CF<User>    cfU   = getUserAsync(id);
CF<Orders>  cfO   = getOrdersAsync(id);
CF<Account> cfAcc = getAccountAsync(id);
CF.allOf(cfU, cfO, cfAcc)
  .orTimeout(2, TimeUnit.SECONDS)  // Java 9+
  .thenApply(v -> new Dashboard(cfU.join(), cfO.join(), cfAcc.join()))  // safe, done
  .exceptionally(ex -> Dashboard.empty())  // on timeout/error
  .handle((res, ex) -> {                   // always runs (try-finally)
      metrics.record(res, ex);
      return res != null ? res : Dashboard.empty();
  });

// completeOnTimeout — return default instead of throwing
cfU.completeOnTimeout(User.anonymous(), 500, TimeUnit.MILLISECONDS);
```
Each worker thread owns a double-ended deque. fork() pushes task to deque tail (LIFO). Worker pops from its own tail (LIFO — better cache locality for recursive tasks). Idle workers steal from other workers' heads (FIFO — reduces contention with owner).
Why compute-right-inline + fork-left pattern: By computing right subtask directly (not forking), you avoid creating an extra queued task for work you're about to do anyway. You only fork left — which another idle thread can steal. This halves task creation overhead.
Custom FJP for parallel streams: Stream.parallel() uses ForkJoinPool.commonPool() (shared JVM-wide). If your parallel stream does blocking IO, it starves common pool for other tasks. Wrap in custom FJP to isolate.
```java
class SumTask extends RecursiveTask<Long> {
    static final int THRESHOLD = 1_000;
    final int[] arr;
    final int lo, hi;
    SumTask(int[] arr, int lo, int hi) { this.arr = arr; this.lo = lo; this.hi = hi; }

    @Override protected Long compute() {
        if (hi - lo <= THRESHOLD) {
            long sum = 0;
            for (int i = lo; i < hi; i++) sum += arr[i];
            return sum;
        }
        int mid = (lo + hi) / 2;
        SumTask left  = new SumTask(arr, lo, mid);
        SumTask right = new SumTask(arr, mid, hi);
        left.fork();                           // push left to deque tail
        return right.compute() + left.join();  // compute right inline
    }
}

// Custom FJP to isolate parallel stream from commonPool
ForkJoinPool custom = new ForkJoinPool(8);
long result = custom.submit(() ->
    bigList.parallelStream().mapToLong(Item::getValue).sum()
).get();
custom.shutdown();
```
AQS state is split: high 16 bits = shared (read) count, low 16 bits = exclusive (write) count. Multiple readers can hold simultaneously; writer needs all read counts to reach 0.
Writer starvation (unfair mode): While readers continuously arrive and acquire the lock, a waiting writer never gets in. Unfair mode allows new read acquisitions to proceed as long as no writer currently holds.
Fair mode: A FIFO queue is maintained. Once a writer is waiting, subsequent read requests are queued behind it. Eliminates starvation but reduces throughput.
Lock downgrade: Holding write lock, acquire read lock, then release write lock — you atomically move from exclusive to shared without a gap where another writer could sneak in and modify data you already computed.
```java
class CachedData {
    volatile boolean cacheValid = false;
    final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
    Object[] data;

    void processCachedData() {
        rwl.readLock().lock();
        try {
            if (!cacheValid) {
                rwl.readLock().unlock();  // drop read before acquiring write
                rwl.writeLock().lock();
                try {
                    if (!cacheValid) {    // double-check after write lock
                        data = loadData();
                        cacheValid = true;
                    }
                    rwl.readLock().lock();  // DOWNGRADE: acquire read before releasing write
                } finally {
                    rwl.writeLock().unlock();  // release write
                }
                // now holding read lock only — safe to proceed
            }
            use(data);  // process under read lock
        } finally {
            rwl.readLock().unlock();
        }
    }
}
```
| Feature | CountDownLatch | CyclicBarrier | Phaser |
|---|---|---|---|
| Reusable | No — one-shot | Yes — auto-resets | Yes — multiple phases |
| Party count | Fixed | Fixed | Dynamic register/deregister |
| Barrier action | None | Optional Runnable | Override onAdvance() |
| HB guarantee | countDown() HB await() return | await() HB barrier action HB subsequent awaiters | arrive() HB phase advance HB next phase start |
Semaphore: AQS state = permit count. acquire() decrements (CAS). release() increments (CAS + unpark waiters). Use for connection pools, rate limiters, bounded resources.
```java
// CountDownLatch — await all services to start
CountDownLatch ready = new CountDownLatch(3);
// each service calls ready.countDown() after init
ready.await();  // main waits — all writes before countDown() are visible

// CyclicBarrier — parallel batch processing
CyclicBarrier barrier = new CyclicBarrier(4,
    () -> log.info("Batch complete, merging results"));
// each worker calls barrier.await() when done with its shard
// barrier.reset() or it auto-resets for next batch round

// Semaphore — connection pool rate limiter
Semaphore semaphore = new Semaphore(10, true);  // 10 permits, fair
semaphore.acquire();  // blocks if 10 already acquired
try { useResource(); }
finally { semaphore.release(); }

// Phaser — dynamic multi-phase simulation
Phaser ph = new Phaser(1);   // 1 initial party (orchestrator)
ph.register();               // add party dynamically
ph.arriveAndAwaitAdvance();  // participate and wait for phase N
ph.arriveAndDeregister();    // leave phaser
```
VarHandle is the safe, typed replacement for sun.misc.Unsafe. Created via MethodHandles.lookup().findVarHandle(...). Provides fine-grained memory ordering:
Acquire/Release is cheaper than Volatile — only one-sided fence. Sufficient for lock implementations (lock acquire = getAcquire, lock release = setRelease).
JDK uses VarHandle in: ConcurrentHashMap (tabAt, casTabAt), AbstractQueuedSynchronizer, CompletableFuture, ForkJoinPool, StampedLock.
class MyClass {
    volatile int state = 0;
    static final VarHandle STATE;
    static {
        try {
            STATE = MethodHandles.lookup()
                    .findVarHandle(MyClass.class, "state", int.class);
        } catch (ReflectiveOperationException e) { // findVarHandle throws checked exceptions
            throw new ExceptionInInitializerError(e);
        }
    }
    void demonstrate() {
        STATE.set(this, 1);                          // plain write
        int v = (int) STATE.getAcquire(this);        // acquire read
        STATE.setRelease(this, 2);                   // release write
        STATE.compareAndSet(this, 0, 1);             // CAS volatile
        STATE.getAndAdd(this, 1);                    // atomic increment
        STATE.compareAndExchangeAcquire(this, 0, 1); // CAS + acquire fence
    }
}
CPUs cache data in 64-byte cache lines. If thread A updates variable X and thread B updates variable Y, but X and Y occupy the same cache line, every update by A invalidates B's copy of the line (and vice versa) — even though they're logically independent. This "false sharing" can cut throughput by 10–40× through constant cache-coherency traffic.
@jdk.internal.vm.annotation.Contended pads the annotated field with ~128 bytes, ensuring it occupies its own cache line. The JVM ignores the annotation for application code unless started with -XX:-RestrictContended (on Java 8 the annotation was @sun.misc.Contended).
Diagnosis: Linux perf stat -e cache-misses,L1-dcache-load-misses. Java Flight Recorder CPU sample stacks. A high rate of LOCK-prefixed instructions under perf. Async-profiler with --event cache-misses.
// ❌ False sharing: a and b may share same 64-byte cache line
class Counters {
    volatile long a = 0; // Thread 1 writes
    volatile long b = 0; // Thread 2 writes — same cache line!
}

// ✅ Fix with @Contended (adds ~128 bytes padding around each)
class PaddedCounters {
    @jdk.internal.vm.annotation.Contended
    volatile long a = 0; // own 128-byte padded cache line
    @jdk.internal.vm.annotation.Contended
    volatile long b = 0; // own 128-byte padded cache line
}

// JDK ConcurrentHashMap uses this for its striped counter
@Contended
static final class CounterCell {
    volatile long value; // each CounterCell on its own cache line
    // Without @Contended: 16 threads → 16 writes → all invalidate same line
    // With @Contended: each thread hits its own cache line → linear throughput
}

// Manual padding (pre-Java 9, ugly but works)
long p1, p2, p3, p4, p5, p6, p7; // 56 bytes prefix
volatile long value;             // 8 bytes on its own line
long q1, q2, q3, q4, q5, q6, q7; // 56 bytes suffix
LockSupport provides per-thread permit-based blocking (max 1 permit per thread). park() blocks if permit unavailable; consumes permit if available. unpark(t) grants permit to thread t; if t is parked, it wakes immediately.
Why unpark-before-park is safe: The permit is stored. When park() is called later, it sees the permit and returns immediately. This is unlike Object.notify() which is lost if no thread is waiting — LockSupport has no such lost-wakeup problem.
Memory visibility: unpark(t) happens-before park() returns in thread t. Everything written before unpark is visible to the parked thread after it wakes.
// park/unpark — AQS uses this for all lock implementations
Thread waiter = new Thread(() -> {
    System.out.println("parking...");
    LockSupport.park(); // blocks — may have spurious wakeups
    System.out.println("unparked");
});
waiter.start();
Thread.sleep(100);
// data write before unpark — waiter will see it
LockSupport.unpark(waiter); // wake waiter — HB established

// unpark BEFORE park — still works: the permit is stored.
LockSupport.unpark(waiter); // permit granted now
// When waiter later calls LockSupport.park(), it returns immediately,
// consuming the stored permit. Note: park() always parks the CURRENT
// thread — there is no park(thread) overload.

// Spurious wakeup guard (always check condition in loop)
while (!condition) {
    LockSupport.park(this); // 'this' = blocker object for jstack
    if (Thread.interrupted()) throw new InterruptedException();
}
Without volatile: new Singleton() is 3 steps: allocate memory, call constructor, write reference. JIT/CPU can reorder to: allocate, write reference, call constructor. Thread B sees non-null reference before constructor finishes → reads partially constructed object.
volatile on instance prevents that reorder via StoreStore barrier. Constructor completion happens-before volatile write, which happens-before every subsequent read.
// Pattern 1: DCL with volatile (acceptable)
public class Singleton {
    private static volatile Singleton instance; // volatile REQUIRED
    public static Singleton get() {
        if (instance == null) {                 // fast path (no lock)
            synchronized (Singleton.class) {
                if (instance == null)           // re-check under lock
                    instance = new Singleton();
            }
        }
        return instance;
    }
}

// Pattern 2: Initialization-on-Demand Holder (preferred)
// Class loading is thread-safe by JVM spec (§12.4.2)
public class Singleton2 {
    private static class Holder {
        static final Singleton2 INSTANCE = new Singleton2(); // lazy+safe
    }
    public static Singleton2 get() { return Holder.INSTANCE; }
}

// Pattern 3: Enum singleton (simplest, serialization-safe)
public enum Config {
    INSTANCE;
    public void doWork() { ... } // guaranteed single instance
}
Config.INSTANCE.doWork();
Exchanger<V>: Rendezvous point for exactly two threads — each calls exchange(v) and receives the other's value. Thread A blocks until thread B arrives (or timeout). Internally: CAS on a slot node with volatile item and waiter thread. Supports arenas for high-concurrency pair matching.
ConcurrentLinkedQueue: Michael-Scott algorithm. head and tail are volatile. Enqueue: CAS tail.next from null to newNode; update tail lazily. Dequeue: CAS head to head.next. Lock-free but not wait-free (CAS can retry). size() is O(n) — avoid.
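The enqueue/dequeue CAS loops can be sketched with AtomicReference. This is a simplified Michael-Scott queue, not the real ConcurrentLinkedQueue (which adds batched lazy tail updates and self-linking of dequeued nodes for GC); class and method names here are illustrative:

```java
import java.util.concurrent.atomic.AtomicReference;

// Simplified Michael-Scott queue: CAS tail.next to enqueue, CAS head to dequeue.
// Lock-free: a failed CAS means another thread made progress — just retry.
class MsQueue<E> {
    static final class Node<E> {
        final E item;
        final AtomicReference<Node<E>> next = new AtomicReference<>();
        Node(E item) { this.item = item; }
    }
    private final AtomicReference<Node<E>> head, tail;

    MsQueue() {
        Node<E> dummy = new Node<>(null);      // sentinel — head always points here
        head = new AtomicReference<>(dummy);
        tail = new AtomicReference<>(dummy);
    }
    void offer(E e) {
        Node<E> n = new Node<>(e);
        while (true) {
            Node<E> t = tail.get();
            Node<E> next = t.next.get();
            if (next == null) {                        // tail is really last
                if (t.next.compareAndSet(null, n)) {   // link new node
                    tail.compareAndSet(t, n);          // swing tail (failure is fine)
                    return;
                }
            } else {
                tail.compareAndSet(t, next);           // help a lagging tail forward
            }
        }
    }
    E poll() {
        while (true) {
            Node<E> h = head.get();
            Node<E> first = h.next.get();
            if (first == null) return null;            // empty
            if (head.compareAndSet(h, first)) return first.item;
        }
    }
}
```

The "helping" branch in offer() is the key lock-free idea: a thread that finds tail lagging advances it for the thread that fell behind, so no thread's stall can block the queue.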
Flow API (Java 9): Reactive Streams standard — Publisher, Subscriber, Subscription, Processor. Backpressure: Subscriber calls subscription.request(n) to pull n items. Publisher must not emit more than requested. Prevents fast producer from overwhelming slow consumer. Used by Spring WebFlux (Project Reactor) and Kafka Streams.
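A minimal Flow example using the JDK's built-in SubmissionPublisher: the subscriber pulls one item at a time via subscription.request(1), which is the backpressure mechanism described above (the item values and buffer size are arbitrary):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

// Pull-based backpressure: the subscriber requests ONE item at a time,
// so the publisher can never out-run it.
List<Integer> received = new ArrayList<>();
CountDownLatch done = new CountDownLatch(1);
try (SubmissionPublisher<Integer> pub = new SubmissionPublisher<>()) {
    pub.subscribe(new Flow.Subscriber<>() {
        private Flow.Subscription sub;
        @Override public void onSubscribe(Flow.Subscription s) { sub = s; s.request(1); }
        @Override public void onNext(Integer item) { received.add(item); sub.request(1); }
        @Override public void onError(Throwable t) { done.countDown(); }
        @Override public void onComplete() { done.countDown(); }
    });
    for (int i = 1; i <= 3; i++) pub.submit(i); // submit() blocks when buffer is full
} // close() signals onComplete after buffered items are delivered
done.await(); // wait for async delivery to finish
// received == [1, 2, 3], delivered serially to the single subscriber
```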
// Exchanger: zero-copy double buffer pattern
Exchanger<List<Event>> ex = new Exchanger<>();

// Producer
new Thread(() -> {
    List<Event> buf = new ArrayList<>();
    while (true) {
        buf.add(nextEvent());
        if (buf.size() == 100) {
            buf = ex.exchange(buf); // swap full for empty list
            buf.clear();
        }
    }
}).start();

// Consumer
new Thread(() -> {
    List<Event> buf = new ArrayList<>();
    while (true) {
        buf = ex.exchange(buf);     // get full, return empty
        buf.forEach(this::process);
        buf.clear();
    }
}).start();
AtomicReference<V> is a separate heap object (~24+ bytes per instance). In a data structure with millions of nodes, using one AtomicReference per node can add gigabytes of overhead.
AtomicReferenceFieldUpdater performs CAS directly on a volatile field using a single shared updater instance (reflection/VarHandle). Zero additional allocation per node — only the volatile field cost.
JDK usage: ConcurrentHashMap.Node (next field), FutureTask (waiters), AbstractQueuedSynchronizer (head, tail, state), SynchronousQueue.
// ❌ AtomicReference: 1 extra heap object per node
class Node<T> {
    T value;
    AtomicReference<Node<T>> next = new AtomicReference<>();
    // 24+ bytes per node just for the AtomicReference wrapper
}

// ✅ ARFU: one shared updater, zero per-node overhead
class LeanNode<T> {
    T value;
    volatile LeanNode<T> next; // just a volatile reference
    static final AtomicReferenceFieldUpdater<LeanNode, LeanNode> NEXT =
            AtomicReferenceFieldUpdater.newUpdater(LeanNode.class, LeanNode.class, "next");
    boolean casNext(LeanNode<T> expect, LeanNode<T> update) {
        return NEXT.compareAndSet(this, expect, update);
    }
}

// VarHandle alternative (Java 9+, preferred)
static final VarHandle NEXT_VH = MethodHandles.lookup()
        .findVarHandle(LeanNode.class, "next", LeanNode.class);
Virtual threads are lightweight JVM-managed threads. Thousands of virtual threads are multiplexed onto a small pool of OS platform threads ("carrier threads"). When a virtual thread blocks (IO, sleep, lock), the JVM unmounts it from its carrier thread — the carrier is freed to run another virtual thread.
Key differences from platform threads: Stack is heap-allocated (grows/shrinks, starts tiny ~1KB). Creating millions is cheap. Ideal for blocking IO workloads — no need for reactive/callback style.
Carrier thread pinning: When a virtual thread is blocked inside a synchronized block, it cannot unmount — the carrier thread is pinned. This can exhaust the carrier pool. Fix: replace synchronized with ReentrantLock inside VT-heavy code.
Spring Boot 3.2+: spring.threads.virtual.enabled=true switches Tomcat/Jetty/Undertow to use virtual thread per request — replaces thread-per-request pool model. Each HTTP request gets a fresh virtual thread; blocking DB/IO calls are cheap.
// Create virtual threads
Thread vt = Thread.ofVirtual().name("vt-1").start(() -> {
    // Blocking IO here is fine — carrier freed while blocked
    var result = dbService.queryBlocking(id); // unmounts, doesn't block carrier
});

// ExecutorService with virtual threads — no pooling needed!
try (var exec = Executors.newVirtualThreadPerTaskExecutor()) {
    for (var req : requests)
        exec.submit(() -> handleRequest(req)); // 1M concurrent = fine
}

// ⚠ PINNING: synchronized + blocking = carrier pinned!
synchronized (this) {
    Thread.sleep(1000); // carrier thread blocked — pinning!
}

// ✅ Fix: use ReentrantLock
lock.lock();
try { Thread.sleep(1000); } // carrier freed — no pinning
finally { lock.unlock(); }

// Spring Boot 3.2+ application.yml
// spring.threads.virtual.enabled: true
Structured Concurrency: Groups related virtual threads into a "scope" with clear owner lifetime semantics. When the scope closes, all forked threads are joined. If one subtask fails, the scope cancels others. Solves the problem of leaked threads and scattered error handling in CompletableFuture chains.
vs CompletableFuture: CF chains are not tied to a single owner thread — completion can happen on any thread, making cancellation/shutdown complex. StructuredTaskScope enforces parent-child lifetime hierarchy.
ScopedValues (JEP 446): Immutable per-scope binding — one value per scope, shared (read-only) with all child virtual threads. Unlike ThreadLocal: immutable, never leaks into pooled worker threads, and needs no remove() — so no memory-leak risk. Perfect for request context in virtual-thread servers.
// StructuredTaskScope: ShutdownOnFailure pattern
try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
    Subtask<User> userTask = scope.fork(() -> fetchUser(id));
    Subtask<Orders> orderTask = scope.fork(() -> fetchOrders(id));
    scope.join();          // wait for all tasks
    scope.throwIfFailed(); // propagate first failure, cancels others
    return new Dashboard(userTask.get(), orderTask.get());
}
// scope closes: all forked threads guaranteed complete or cancelled

// ScopedValue: immutable per-request context
static final ScopedValue<RequestContext> CTX = ScopedValue.newInstance();

// Set at request entry point
ScopedValue.where(CTX, new RequestContext(correlationId))
        .run(() -> handleRequest());

// Read anywhere in the call tree (including child virtual threads)
void deepMethod() {
    String id = CTX.get().correlationId(); // no ThreadLocal leak possible
}
Streams · Collectors · Optional · Records · Sealed classes · Pattern matching
Lazy evaluation: Intermediate operations (filter, map, flatMap, sorted, peek) build a pipeline but execute nothing. The pipeline fires only when a terminal operation (collect, forEach, findFirst, count, reduce) is invoked. This means the pipeline iterates the source once — all operations fused per element.
Short-circuit: findFirst(), findAny(), anyMatch(), allMatch(), noneMatch(), limit(n) — can stop processing before all elements are consumed.
Spliterator: The strategy for traversal and splitting. trySplit() splits data for parallel processing. Characteristics: ORDERED, DISTINCT, SORTED, SIZED, NONNULL, IMMUTABLE, CONCURRENT, SUBSIZED. Parallel streams automatically split via Spliterator and submit to ForkJoinPool.commonPool().
CME in streams: Structurally modifying the source collection during stream pipeline execution causes CME (same modCount mechanism as iterator).
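A minimal reproduction of that CME: the add() inside forEach structurally modifies the ArrayList source while the terminal operation is still traversing it, so the spliterator's modCount check fires at the end of traversal:

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

List<Integer> src = new ArrayList<>(List.of(1, 2, 3));
boolean threwCme = false;
try {
    // structural modification of the source mid-pipeline
    src.stream().forEach(n -> { if (n == 1) src.add(99); });
} catch (ConcurrentModificationException e) {
    threwCme = true; // spliterator detected the modCount change
}
// threwCme == true — same fail-fast mechanism as Iterator
```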
// Only 4 filter calls — stops at first match ≥ 10
List.of(1, 3, 5, 10, 20).stream()
    .filter(n -> { System.out.print(n + " "); return n >= 10; })
    .findFirst(); // prints: 1 3 5 10 (not 20!)

// Intermediate ops are STATELESS vs STATEFUL:
// Stateless: filter, map, flatMap, peek — no ordering dependency
// Stateful: sorted, distinct, limit — need to see data before outputting

// Custom Spliterator for custom data source
class RangeSpliterator implements Spliterator.OfInt {
    int current, end;
    RangeSpliterator(int start, int end) { this.current = start; this.end = end; }

    public boolean tryAdvance(IntConsumer action) {
        if (current < end) { action.accept(current++); return true; }
        return false;
    }
    public Spliterator.OfInt trySplit() {
        int mid = (current + end) / 2;
        if (mid == current) return null;       // too small to split
        int lo = current;
        current = mid;
        return new RangeSpliterator(lo, mid);  // left half to new thread
    }
    // required by the Spliterator contract (missing above would not compile)
    public long estimateSize() { return end - current; }
    public int characteristics() {
        return ORDERED | SIZED | SUBSIZED | NONNULL | IMMUTABLE;
    }
}
Custom Collector requires implementing Collector<T, A, R>: supplier (create mutable accumulator), accumulator (fold one element), combiner (merge parallel accumulators), finisher (transform A to R), characteristics (IDENTITY_FINISH, CONCURRENT, UNORDERED).
Avoid parallel streams when: the source is small (<10K elements), operations have side effects or shared mutable state, the pipeline is order-sensitive on an ordered source (limit, findFirst), the work is IO-bound (use virtual threads instead), or tasks are unequal in size (work stealing won't fully compensate).
// groupingBy with downstream: dept → avg salary
Map<String, Double> avgSalary = employees.stream()
    .collect(Collectors.groupingBy(
        Employee::getDept,
        Collectors.averagingDouble(Employee::getSalary)));

// toMap with merge (avoid key collision exception)
Map<String, Long> wordCount = words.stream()
    .collect(Collectors.toMap(w -> w, w -> 1L, Long::sum));

// teeing (Java 12): two collectors in one pass
record Stats(double avg, Optional<Employee> highest) {}
Stats stats = employees.stream().collect(Collectors.teeing(
    Collectors.averagingDouble(Employee::getSalary),
    Collectors.maxBy(Comparator.comparingDouble(Employee::getSalary)),
    Stats::new));

// Custom Collector: running sum and average in one pass
Collector<Integer, ?, Map<String, Object>> custom = Collector.of(
    () -> new int[]{0, 0},                          // [sum, count]
    (arr, n) -> { arr[0] += n; arr[1]++; },         // accumulate
    (a, b) -> new int[]{a[0] + b[0], a[1] + b[1]},  // combine (parallel)
    arr -> Map.of("sum", arr[0],
                  "avg", arr[1] > 0 ? arr[0] / arr[1] : 0) // finisher
);
// String methods
" hello ".strip();          // "hello" — Unicode-aware (vs trim)
"   ".isBlank();            // true — whitespace check
"a\nb\nc".lines().toList(); // ["a","b","c"]
"ab".repeat(3);             // "ababab"
" x ".stripLeading();       // "x " — stripTrailing() for the right side

// var in lambda parameters (Java 11)
(var s, var t) -> s + t // allows @NotNull var s — for annotations

// Files.readString / writeString
String content = Files.readString(Path.of("config.json"));
Files.writeString(Path.of("out.txt"), "hello", StandardOpenOption.CREATE);

// New HttpClient (non-blocking)
HttpClient client = HttpClient.newBuilder()
    .connectTimeout(Duration.ofSeconds(5))
    .build();
HttpRequest req = HttpRequest.newBuilder()
    .uri(URI.create("https://api.example.com/data"))
    .header("Authorization", "Bearer " + token)
    .GET().build();

// Async HTTP call
CompletableFuture<String> body = client
    .sendAsync(req, HttpResponse.BodyHandlers.ofString())
    .thenApply(HttpResponse::body);

// Collection.toArray with type
String[] arr = list.toArray(String[]::new); // no reflection needed
Records (Java 16): Immutable data carriers. Compiler auto-generates: constructor, accessors (no get prefix), equals, hashCode, toString. Cannot extend other classes (implicitly extends Record). Can implement interfaces.
Sealed classes (Java 17): Restrict which classes can extend/implement. Paired with pattern matching switch — compiler can verify exhaustiveness.
// Record — immutable data carrier
record Point(double x, double y) {
    // compact canonical constructor for validation
    Point {
        if (x < 0) throw new IllegalArgumentException();
    }
    double distance() { return Math.hypot(x, y); } // extra methods OK
}
Point p = new Point(3, 4);
p.x(); p.y(); p.distance(); // accessors — no getX()

// Sealed class + exhaustive pattern matching switch
sealed interface Shape permits Circle, Rectangle, Triangle {}
record Circle(double r) implements Shape {}
record Rectangle(double w, double h) implements Shape {}
record Triangle(double base, double h) implements Shape {}

double area(Shape s) {
    return switch (s) { // Java 21: exhaustive — no default needed
        case Circle(var r) -> Math.PI * r * r;
        case Rectangle(var w, var h) -> w * h;
        case Triangle(var b, var h) -> 0.5 * b * h;
    };
}

// Text blocks (Java 15)
String json = """
    {
      "name": "Bheemesh",
      "role": "Senior Java Developer"
    }
    """; // indentation stripped automatically
// Pattern Matching for switch (JEP 441 — finalized)
Object obj = getResult();
String desc = switch (obj) {
    case Integer i -> "int: " + i;
    case String s -> "str: " + s;
    case Long l when l > 1000 -> "big long"; // guard clause
    case null -> "null";
    default -> "unknown";
};

// SequencedCollection (JEP 431 — finalized)
// New interface for ordered collections with defined first/last
SequencedCollection<String> seq = new ArrayList<>(List.of("a", "b", "c"));
seq.getFirst();    // "a" — unified API across List/Deque/LinkedHashSet
seq.getLast();     // "c"
seq.addFirst("z"); // ["z","a","b","c"]
seq.reversed();    // reversed view ["c","b","a","z"]

// SequencedMap — firstEntry / lastEntry
// (build with put — Map.of has no defined iteration order)
SequencedMap<String, Integer> sm = new LinkedHashMap<>();
sm.put("a", 1); sm.put("b", 2);
sm.firstEntry(); sm.lastEntry();

// Record Patterns in switch (JEP 440 — finalized)
if (shape instanceof Circle(var r) && r > 10) {
    // r is in scope here — pattern match + destructuring
}

// Unnamed Classes and Instance Main (Java 21 preview)
// void main() { System.out.println("Hello"); } — no class needed
Rule 1: Never use Optional as a field, parameter, or in collections — only as a method return type to signal "this method may not return a value."
map vs flatMap: map(fn) wraps result: if fn returns T, you get Optional<T>. flatMap(fn) flattens: fn must return Optional<T>, result is Optional<T> (no nesting).
orElse vs orElseGet: orElse(default) always evaluates its argument, even when the Optional holds a value. orElseGet(() -> expensive()) is lazy — the supplier runs only if the Optional is empty.
// ✅ Correct — return type only
public Optional<User> findById(Long id) { ... }

// map vs flatMap
Optional<String> name = findUser(id)
    .map(User::getName);             // Optional<String>
Optional<Address> addr = findUser(id)
    .flatMap(User::getOptionalAddr); // flattens — avoids Optional<Optional<Address>>

// orElse: ALWAYS evaluates even if present
String s = opt.orElse(expensiveDefault()); // called unconditionally!

// orElseGet: LAZY — only calls if empty
String s2 = opt.orElseGet(() -> expensiveDefault()); // only if empty

// Java 9+: ifPresentOrElse
opt.ifPresentOrElse(
    v -> log.info("found {}", v),
    () -> log.warn("not found"));

// Java 9+: or — supply alternative Optional
Optional<User> user = findInDB(id)
    .or(() -> findInCache(id)); // fallback chain

// Java 9+: stream — bridge to Stream pipeline
List<User> users = ids.stream()
    .map(this::findById)       // Stream<Optional<User>>
    .flatMap(Optional::stream) // Java 9: unwrap non-empty
    .toList();
Tricky interview gotchas · Memory corruption · Concurrent bugs
Java 7 — infinite loop bug: Resize transferred entries using a reverse-order loop. Thread A starts resize, reverses entries in a bucket. Thread B starts resize concurrently. When both complete, one bucket can contain a circular link: Node A → Node B → Node A. Any subsequent get() on that bucket loops forever at 100% CPU.
Java 8+ — data loss: Java 8 fixed the circular link but introduced a different race. Both threads allocate new tables independently. One thread's table = newTable write overwrites the other's → all entries inserted by the losing thread vanish silently. No exception, no log — entries just gone.
// Java 7 circular list (simplified transfer logic)
void transfer(Entry[] newTable) {
    for (Entry<K,V> e : table) {
        while (e != null) {
            Entry<K,V> next = e.next;
            e.next = newTable[i]; // Thread A reverses here
            newTable[i] = e;      // Thread B does same → A.next=B, B.next=A
            e = next;             // → CIRCULAR → infinite loop on get()
        }
    }
}

// Java 8: table reference overwrite data loss
// Thread A: newTab = new Node[32]; ... table = newTab; ✓
// Thread B: newTab = new Node[32]; ... table = newTab; ← overwrites A!
// → A's entries inserted before B's resize are LOST

// ✅ Always use ConcurrentHashMap
Map<K,V> safe = new ConcurrentHashMap<>();
// CAS + per-bucket sync: zero data loss, no infinite loops
class BrokenSingleton {
    private static BrokenSingleton instance; // NOT volatile
    private int value = 42;

    public static BrokenSingleton get() {
        if (instance == null) {
            synchronized (BrokenSingleton.class) {
                if (instance == null)
                    instance = new BrokenSingleton(); // BUG 1
            }
        }
        return instance; // BUG 2
    }

    void use() {
        BrokenSingleton s = get();
        assert s.value == 42; // Can FAIL!
    }
}
// BUG 1: new BrokenSingleton() may be reordered by JIT:
//   a) allocate memory
//   b) write reference to instance ← Thread B sees non-null here
//   c) initialize fields (value=42) ← NOT YET DONE
// Thread B reads instance != null, skips sync, reads value = 0!
// BUG 2: Without volatile, Thread B may read stale null
// from its CPU cache even after Thread A wrote it — visibility!
// FIX: private static volatile BrokenSingleton instance;
// volatile write establishes HB: constructor end HB volatile write HB every read
CompletableFuture<Integer> idFuture = CompletableFuture.supplyAsync(() -> 42);

// Q1: What is the type of r1?
var r1 = idFuture.thenApply(id ->
        CompletableFuture.supplyAsync(() -> "User:" + id));
// A: CompletableFuture<CompletableFuture<String>> — NESTED! Wrong.
// You'd need r1.join().join() — bad.

// Q2: What is the type of r2?
var r2 = idFuture.thenCompose(id ->
        CompletableFuture.supplyAsync(() -> "User:" + id));
// A: CompletableFuture<String> — FLATTENED. Correct!
// thenCompose = flatMap in Stream / Mono.flatMap in Reactor

// Rule: thenApply   when fn returns T (not CF)
//       thenCompose when fn returns CompletionStage<T>

// Q3: What thread executes thenApply?
var r3 = idFuture.thenApply(id -> id * 2);
// A: The thread that completed idFuture (ForkJoinPool worker)
//    OR the calling thread if already complete at registration time
// thenApplyAsync: ALWAYS dispatched to the executor
// (default: ForkJoinPool.commonPool())
// ❌ BUG: ArrayList not thread-safe — concurrent add causes corruption
List<Integer> results = new ArrayList<>();
IntStream.range(0, 1000).parallel()
    .forEach(results::add);
// results.size() may be 983, 995, or throw ArrayIndexOutOfBoundsException
// Two threads expand array simultaneously → corrupted state

// ❌ BUG: shared counter — race condition
int[] count = {0};
IntStream.range(0, 1000).parallel()
    .forEach(i -> count[0]++); // lost updates!

// ✅ FIX 1: use collect() — thread-safe combiner
List<Integer> safe = IntStream.range(0, 1000).parallel()
    .boxed()
    .collect(Collectors.toList()); // always correct — thread-safe merge

// ✅ FIX 2: thread-safe collection
List<Integer> safe2 = new CopyOnWriteArrayList<>();
// or Vector, or ConcurrentLinkedQueue

// ✅ FIX 3: LongAdder for counting
LongAdder cnt = new LongAdder();
IntStream.range(0, 1000).parallel().forEach(i -> cnt.increment());
cnt.sum(); // always 1000
// Puzzle 1: Stream reuse — throws IllegalStateException
Stream<String> s = Stream.of("a", "b", "c");
s.forEach(System.out::println);
s.forEach(System.out::println);
// IllegalStateException: stream has already been operated upon
// Streams are single-use pipelines

// Puzzle 2: stateful lambda in parallel (non-deterministic)
List<Integer> seen = new ArrayList<>();
IntStream.range(0, 10).parallel()
    .filter(i -> { seen.add(i); return i % 2 == 0; }) // STATEFUL SIDE EFFECT
    .count();
// seen.size() is non-deterministic (and the list may be corrupted)
// filter may be called in any order, possibly concurrently

// Puzzle 3: sorted + parallel order
List.of(3, 1, 4, 1, 5).parallelStream()
    .sorted()
    .forEachOrdered(System.out::println); // 1 1 3 4 5 — ordered despite parallel
// sorted() is stateful — must see all before outputting
// forEachOrdered preserves encounter order in parallel

// Puzzle 4: peek() + count()
long count = List.of(1, 2, 3).parallelStream()
    .peek(System.out::println) // Java 9+: may print NOTHING — count() can
    .count();                  // skip traversal when size is known from source
// count = 3, always correct
putIfAbsent(k, v): atomically puts only if key absent. Returns old value (or null). Does not compute lazily.
compute(k, fn): atomically applies fn to key+current value. The entire read-compute-write is done under the bucket lock — no race condition. Returns new value. If fn returns null, removes the key.
merge(k, v, fn): if key absent, puts v; if present, fn(oldVal, v) determines new value. Same atomic guarantee as compute. Perfect for aggregation.
// Thread-safe word counter via merge
ConcurrentHashMap<String, Long> wordCount = new ConcurrentHashMap<>();
// merge: if absent insert 1L; if present add 1L (atomic)
words.parallelStream()
    .forEach(w -> wordCount.merge(w, 1L, Long::sum));

// compute — atomic read-modify-write for complex state
wordCount.compute("hello", (k, v) -> v == null ? 1L : v + 1L);

// computeIfAbsent — lazy initialization pattern
ConcurrentHashMap<String, List<Event>> groups = new ConcurrentHashMap<>();
groups.computeIfAbsent(key, k -> new CopyOnWriteArrayList<>())
    .add(event); // thread-safe group creation

// ⚠ DON'T do this — not atomic, race between get + put
if (!map.containsKey(k)) map.put(k, new ArrayList<>()); // RACE!

// frequency — Java 8 reduce + CHM
Map<String, Long> freq = words.stream()
    .collect(Collectors.groupingByConcurrent(
        w -> w, Collectors.counting())); // parallel-safe
Spring · Hibernate · Kafka · Snowflake · Real collection choices in production
Spring:
- Singleton bean cache (DefaultSingletonBeanRegistry): ConcurrentHashMap<String, Object> singletonObjects — concurrent reads during initialization
- Event listeners: CopyOnWriteArraySet via AbstractApplicationEventMulticaster — read-heavy, rarely written after context start
- AOP proxy cache: ConcurrentHashMap mapping target class → proxy class
- Handler mappings: LinkedHashMap to preserve registration order of URL patterns

Hibernate:

- First-level cache (Session persistence context): HashMap<EntityKey, Object> — per-session, not shared, no concurrency needed
- WeakHashMap for ClassLoader-sensitive metadata

Kafka:

- Consumer offset tracking: HashMap<TopicPartition, OffsetAndMetadata> — single consumer thread per partition, no sync needed
- Broker partition state: ConcurrentHashMap<TopicPartition, Partition>
- ConcurrentSkipListMap for sorted partition/topic navigation

Snowflake (JDBC / Spring): Partition metadata range queries use TreeMap<Long, Partition> for timestamp-based subMap() / floorKey() lookups when pruning micro-partitions.
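The subMap()/floorKey() pruning style named above can be sketched with plain TreeMap calls. The partition names, timestamps, and ranges here are made up for illustration:

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical partition index keyed by partition start-timestamp (ms)
TreeMap<Long, String> partitions = new TreeMap<>();
partitions.put(1_000L, "p1"); // covers [1000, 2000)
partitions.put(2_000L, "p2"); // covers [2000, 3000)
partitions.put(3_000L, "p3"); // covers [3000, ...)

// floorKey: which partition contains timestamp 2500?
Long key = partitions.floorKey(2_500L); // greatest key ≤ 2500 → 2000 ("p2")

// subMap: prune down to partitions overlapping the query range [1500, 3000)
NavigableMap<Long, String> pruned =
        partitions.subMap(partitions.floorKey(1_500L), true, 3_000L, false);
// pruned = {1000=p1, 2000=p2} — O(log n) navigation, sorted view, no copying
```

The subMap result is a live view over the backing tree, which is why this pattern is cheap for repeated range pruning.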
// Rate limiter with Semaphore (N requests/sec)
class SemaphoreRateLimiter {
    private final Semaphore sem;
    SemaphoreRateLimiter(int rps) {
        sem = new Semaphore(0);
        Executors.newSingleThreadScheduledExecutor()
            .scheduleAtFixedRate(() -> sem.release(rps), 0, 1, TimeUnit.SECONDS);
    }
    void acquire() throws InterruptedException { sem.acquire(); }
}

// Concurrent cache with computeIfAbsent + expiry
class TimedCache<K, V> {
    record Entry<V>(V value, long expiryMs) {}
    private final ConcurrentHashMap<K, Entry<V>> map = new ConcurrentHashMap<>();

    V get(K key, Supplier<V> loader, long ttlMs) {
        long now = System.currentTimeMillis();
        return map.compute(key, (k, e) -> {
            if (e != null && e.expiryMs() > now) return e;  // fresh
            return new Entry<>(loader.get(), now + ttlMs);  // load + refresh
        }).value();
    }
}

// Token bucket rate limiter (AtomicLong based)
class TokenBucket {
    private final int capacity;
    private final AtomicLong tokens;
    private volatile long lastRefill = System.nanoTime();

    TokenBucket(int capacity) {       // initialization was missing
        this.capacity = capacity;
        this.tokens = new AtomicLong(capacity); // start full
    }
    synchronized boolean tryConsume() {
        refill();
        if (tokens.get() > 0) { tokens.decrementAndGet(); return true; }
        return false;
    }
    private void refill() {
        long now = System.nanoTime();
        long add = (now - lastRefill) / 1_000_000; // 1 token per elapsed ms
        if (add > 0) {
            tokens.set(Math.min(capacity, tokens.get() + add));
            lastRefill = now;
        }
    }
}
// MDC-aware TaskDecorator for @Async propagation
@Bean
public Executor asyncExecutor() {
    ThreadPoolTaskExecutor ex = new ThreadPoolTaskExecutor();
    ex.setTaskDecorator(runnable -> {
        Map<String, String> ctx = MDC.getCopyOfContextMap(); // capture
        return () -> {
            try {
                if (ctx != null) MDC.setContextMap(ctx); // restore in new thread
                runnable.run();
            } finally {
                MDC.clear(); // always clean up
            }
        };
    });
    ex.setCorePoolSize(10);
    ex.initialize();
    return ex;
}

// Spring Boot 3.2+: enable virtual threads for Tomcat
// application.yml:
//   spring:
//     threads:
//       virtual:
//         enabled: true
// This switches Tomcat's request handler from platform thread pool
// to Executors.newVirtualThreadPerTaskExecutor()

// Spring Boot 3.2+: virtual thread bean config
@Bean
public Executor virtualThreadExecutor() {
    return Executors.newVirtualThreadPerTaskExecutor();
}

// ⚠ ThreadLocal caveat with virtual threads:
// Each VT has its own ThreadLocalMap — value not shared, but
// with 100k concurrent VTs, 100k copies of ThreadLocal data in heap
// Use ScopedValue instead for read-only per-request context
KafkaConsumer thread-safety: Kafka's KafkaConsumer is NOT thread-safe — all calls must happen from the same polling thread. Spring Kafka wraps it in a KafkaMessageListenerContainer with a dedicated polling loop thread per partition (or per consumer instance).
Concurrency in Spring: ConcurrentKafkaListenerContainerFactory with concurrency=3 creates 3 KafkaMessageListenerContainer instances, each with their own thread, consuming from assigned partitions.
// Factory configuration
@Bean
public ConcurrentKafkaListenerContainerFactory<String, Order> factory() {
    var factory = new ConcurrentKafkaListenerContainerFactory<String, Order>();
    factory.setConsumerFactory(consumerFactory);
    factory.setConcurrency(3); // 3 threads = 3 partition consumers
    factory.getContainerProperties()
        .setAckMode(ContainerProperties.AckMode.MANUAL_IMMEDIATE);
    return factory;
}

// Listener with manual offset commit
@KafkaListener(topics = "orders", concurrency = "3")
public void onOrder(ConsumerRecord<String, Order> record, Acknowledgment ack) {
    try {
        processOrder(record.value());
        ack.acknowledge(); // manual commit after success
    } catch (Exception e) {
        log.error("Processing failed: {}", record.offset(), e);
        // don't ack → message retried on next poll
    }
}

// Offset tracking: HashMap<TopicPartition, OffsetAndMetadata>
// Per-thread (per container) — no synchronization needed
// ConcurrentHashMap used at the KafkaConsumer coordinator level