01 CI/CD & DevOps Foundational
MED What is CI/CD, and how did you implement it in your projects?

CI (Continuous Integration) is the practice of automatically building and testing code on every push. CD (Continuous Delivery/Deployment) automates the release to staging or production.

In my project: We used GitLab CI/CD. Every PR triggered a pipeline with stages: build → test → SonarQube scan → Docker build → deploy to staging. On merge to main, it auto-deployed to production via Kubernetes rolling update.

# .gitlab-ci.yml example
stages:
  - build
  - test
  - quality
  - docker
  - deploy

build:
  stage: build
  script: mvn clean package -DskipTests

test:
  stage: test
  script: mvn test

sonar:
  stage: quality
  script: mvn sonar:sonar -Dsonar.host.url=$SONAR_URL
Key talking point: Mention Blue-Green or Canary deployments for zero-downtime, Resilience4j for circuit breaking, and Outbox pattern for reliability.
EASY What's the difference between Docker and a Virtual Machine?

VM: Runs a full OS on top of a hypervisor. Heavy — each VM has its own kernel, GBs of disk, slow startup.

Docker: Shares the host OS kernel. Containers are lightweight processes — start in seconds, MBs of overhead.

  • Isolation: VMs = hardware-level; Docker = process-level (namespaces + cgroups)
  • Portability: Docker images run identically on dev/staging/prod
  • Use case: Microservices → Docker. Legacy monolith requiring OS isolation → VM
# Dockerfile for Spring Boot
FROM eclipse-temurin:21-jre-alpine
WORKDIR /app
COPY target/*.jar app.jar
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "app.jar"]
HARD How would you achieve zero-downtime deployment?

Three strategies depending on risk tolerance:

  • Blue-Green: Two identical environments (blue=live, green=new). Switch load balancer after green is healthy. Instant rollback by flipping back.
  • Canary: Route 5% of traffic to new version. Monitor error rates. Gradually increase to 100%.
  • Rolling (Kubernetes default): Replace pods one at a time. New pod must pass readiness probe before old one terminates.
# Kubernetes rolling update strategy
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1        # 1 extra pod during update
    maxUnavailable: 0  # never take pods down first
Also mention: readinessProbe/livenessProbe, DB migrations with backward-compatible changes (Liquibase), feature flags via LaunchDarkly.
MED How do you ensure application logging and monitoring in production?

Logging Stack: SLF4J + Logback → structured JSON logs → shipped to ELK (Elasticsearch + Logstash + Kibana) or Loki + Grafana.

Monitoring Stack: Micrometer (metrics) → Prometheus (scrape) → Grafana (dashboards). Spring Boot Actuator exposes /actuator/prometheus endpoint.

Distributed Tracing: Micrometer Tracing (Spring Boot 3.x) with Zipkin or Tempo. Every request gets a traceId propagated across services via HTTP headers.

# application.yml
management:
  endpoints:
    web:
      exposure:
        include: health,info,prometheus,metrics
  tracing:
    sampling:
      probability: 1.0
MED Have you deployed applications using Kubernetes?

Yes. Key Kubernetes concepts used in production:

  • Deployment: Manages pod replicas and rolling updates
  • Service: ClusterIP for internal, LoadBalancer for external traffic
  • ConfigMap/Secret: Externalize config and credentials
  • HPA (Horizontal Pod Autoscaler): Scale pods based on CPU/memory
  • Ingress: Route external HTTP/S traffic to services (NGINX Ingress)
  • Namespace: Logical isolation per team/environment
Key: Always mention readinessProbe and livenessProbe — these are what prevent traffic hitting an unhealthy pod.
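The probes are plain HTTP checks; this minimal plain-JDK sketch (hypothetical /readyz path — a Spring Boot app would expose /actuator/health/readiness instead) shows how a readiness flag gates whether the pod receives traffic:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.atomic.AtomicBoolean;

public class ReadinessDemo {
  // flips to true once dependencies (DB, caches) are confirmed up
  static final AtomicBoolean ready = new AtomicBoolean(false);

  public static void main(String[] args) throws Exception {
    HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
    server.createContext("/readyz", exchange -> {
      // 503 → kubelet keeps the pod out of the Service endpoints
      int code = ready.get() ? 200 : 503;
      byte[] body = (code == 200 ? "ok" : "warming up").getBytes();
      exchange.sendResponseHeaders(code, body.length);
      try (OutputStream os = exchange.getResponseBody()) { os.write(body); }
    });
    server.start();
    int port = server.getAddress().getPort();

    HttpClient client = HttpClient.newHttpClient();
    HttpRequest probe = HttpRequest.newBuilder(
        URI.create("http://localhost:" + port + "/readyz")).build();
    System.out.println(client.send(probe, HttpResponse.BodyHandlers.ofString()).statusCode());
    ready.set(true);  // e.g. after connection pools initialize
    System.out.println(client.send(probe, HttpResponse.BodyHandlers.ofString()).statusCode());
    server.stop(0);
  }
}
```

The probe flips from 503 to 200 once the flag is set — exactly the signal Kubernetes uses before routing traffic to a new pod.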
HARD How do you write a multi-stage Dockerfile for a Spring Boot app and why does it matter?

Multi-stage builds separate the build environment from the runtime image. The final image only has what's needed to run — no JDK, no Maven, no source code. Result: smaller, safer image.

# Stage 1: Build
FROM maven:3.9-eclipse-temurin-21 AS builder
WORKDIR /build
COPY pom.xml .
RUN mvn dependency:go-offline  # cache deps layer
COPY src ./src
RUN mvn package -DskipTests

# Stage 2: Runtime only
FROM eclipse-temurin:21-jre-alpine
RUN addgroup -S app && adduser -S app -G app  # non-root user
USER app
WORKDIR /app
COPY --from=builder /build/target/*.jar app.jar
EXPOSE 8080
ENTRYPOINT ["java", "-XX:+UseContainerSupport", "-jar", "app.jar"]
Senior tip: Layer ordering matters for caching. Copy pom.xml and fetch deps BEFORE copying source — deps rarely change, source changes often. This makes rebuilds fast.
HARD How do you handle environment-specific configuration across dev/staging/prod in CI/CD?

Multiple layers — outer to inner:

  • Spring Profiles: application-dev.yml, application-prod.yml. Activated via SPRING_PROFILES_ACTIVE env var.
  • Config Server: Central Git-backed config repo. Services fetch their config at startup. Supports per-environment directories.
  • Secrets: Never in Git. Injected via Kubernetes Secrets, AWS Secrets Manager, or HashiCorp Vault. Referenced as env vars in pod spec.
  • CI/CD pipeline variables: GitLab/Jenkins masked variables per environment. Pipeline passes them at deploy time.
# Kubernetes Secret injected as env var
env:
  - name: DB_PASSWORD
    valueFrom:
      secretKeyRef:
        name: db-secret
        key: password
Golden rule: Code is the same artifact across all environments. Only config and secrets differ. Never bake env-specific values into the Docker image.
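To illustrate the layering, a small sketch of env-var-driven config resolution at startup (the variable names DEMO_PROFILE and DEMO_DB_PASSWORD are invented for the demo; in a pod they would arrive via secretKeyRef as shown above):

```java
import java.util.Optional;

public class ConfigDemo {
  // Required config comes from the environment (how K8s injects Secrets);
  // fail fast at startup rather than at the first DB call.
  static String required(String name) {
    return Optional.ofNullable(System.getenv(name))
        .orElseThrow(() -> new IllegalStateException("Missing required env var: " + name));
  }

  static String withDefault(String name, String fallback) {
    return Optional.ofNullable(System.getenv(name)).orElse(fallback);
  }

  public static void main(String[] args) {
    // same artifact everywhere; only this env var differs per environment
    String profile = withDefault("DEMO_PROFILE", "dev");
    System.out.println("profile=" + profile);
    try {
      required("DEMO_DB_PASSWORD");
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```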
HARD What is a Helm chart and how does it fit into your deployment workflow?

Helm is the package manager for Kubernetes. A Helm chart is a collection of YAML templates for all K8s resources (Deployment, Service, Ingress, ConfigMap, etc.) parameterized with a values.yaml.

  • Why Helm: Without it, you maintain dozens of separate YAML files per service per environment. Helm lets you template once, override values per environment.
  • Workflow: Developer pushes code → CI builds Docker image, tags with git SHA → CD updates values.yaml image tag → helm upgrade --install deploys to cluster.
  • Rollback: helm rollback <release> <revision> — instant, auditable.
# Helm deploy in CI/CD pipeline
helm upgrade --install order-service ./charts/order-service \
  --set image.tag=$CI_COMMIT_SHA \
  --set replicaCount=3 \
  -f values-prod.yaml \
  --namespace production
HARD How do you do JVM tuning for a containerized Spring Boot app in production?

Containers have fixed CPU/memory. JVM must be aware of container limits — not host machine resources.

  • -XX:+UseContainerSupport (default Java 10+): JVM reads cgroup limits, not host RAM. Critical in containers.
  • Heap sizing: -XX:MaxRAMPercentage=75.0 — use 75% of container memory for heap, leave rest for off-heap (Metaspace, thread stacks, NIO buffers).
  • GC choice: G1GC (default Java 9+) is good for most. ZGC for low-latency requirements (<1ms pauses). Shenandoah for consistent pause time.
  • Thread pools: Runtime.availableProcessors() is cgroup-aware on modern JVMs, so the default ForkJoinPool sizes itself to the container's CPU limit; override with -XX:ActiveProcessorCount=N if needed.
ENTRYPOINT ["java",
  "-XX:+UseContainerSupport",
  "-XX:MaxRAMPercentage=75.0",
  "-XX:+UseZGC",
  "-Xlog:gc:file=/logs/gc.log:time,uptime:filecount=5,filesize=20m",
  "-jar", "app.jar"]
Common mistake: Not setting MaxRAMPercentage. The JVM default is 25%, so a container with a 2GB limit gets only ~512MB of heap — memory sits unused while the app throws heap OOM. And on older JVMs without container support, the heap is sized from host RAM, blows past the cgroup limit, and the container gets OOMKilled.
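A quick way to verify what the JVM actually sees inside the container — run something like this (e.g. via kubectl exec, or docker run with -m/--cpus) and compare against the pod's resource limits:

```java
public class JvmMemoryCheck {
  public static void main(String[] args) {
    Runtime rt = Runtime.getRuntime();
    // maxMemory() reflects -XX:MaxRAMPercentage applied to the cgroup limit
    long maxHeapMb = rt.maxMemory() / (1024 * 1024);
    // cgroup-aware on Java 10+ (honors --cpus / CPU quota)
    int cpus = rt.availableProcessors();
    System.out.println("max heap MB: " + maxHeapMb);
    System.out.println("cpus: " + cpus);
  }
}
```

If the printed heap is far below the container limit, MaxRAMPercentage is almost certainly unset.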
HARD How do you implement infrastructure as code (IaC) and what tools have you used?

IaC treats infrastructure (VPCs, ECS clusters, RDS, S3) as versioned code — same Git workflow, code review, and rollback as application code.

  • Terraform: Cloud-agnostic HCL. Plan → Apply workflow. State file tracks actual infrastructure. Used for AWS VPC, EKS cluster, RDS, ElastiCache provisioning.
  • AWS CDK: Define infra in Java/TypeScript. Compiles to CloudFormation. Good if team is Java-heavy.
  • Ansible: Configuration management. Used for server bootstrapping, installing agents, config drift remediation.
# Terraform RDS example
resource "aws_db_instance" "orders_db" {
  engine         = "postgres"
  engine_version = "15"
  instance_class = "db.t3.medium"
  multi_az       = true
  storage_encrypted = true
}
HARD How do you handle a failed production deployment — what's your rollback strategy?

Rollback strategy must be decided BEFORE deployment, not during incident.

  • Kubernetes: kubectl rollout undo deployment/order-service — rolls back to previous ReplicaSet in seconds. Kubernetes keeps last 10 revisions by default.
  • Blue-Green: Flip load balancer back to blue. Zero downtime, instant.
  • Canary (Argo Rollouts): Abort rollout — traffic automatically reverts to stable version.
  • Database rollback: This is the hard part. Liquibase rollback must be pre-scripted. Use expand-contract migrations to avoid needing DB rollback.
  • Feature flags: Kill switch in LaunchDarkly/Unleash — disable the new feature without redeployment.
# Kubernetes instant rollback
kubectl rollout undo deployment/order-service -n production
kubectl rollout status deployment/order-service  # verify
Senior answer: Distinguish between app rollback (easy, automated) vs DB rollback (hard, must plan forward). Design all migrations to be backward-compatible with previous app version.
HARD What is GitOps and how does ArgoCD or Flux fit in?

GitOps: Git is the single source of truth for both application code AND infrastructure/deployment state. Any cluster change must go through Git — no manual kubectl apply in production.

ArgoCD: Watches a Git repo (Helm charts/manifests). If the cluster drifts from the Git state, ArgoCD detects it and auto-syncs (or alerts). Pull-based model — cluster pulls from Git, not CI pushing to cluster.

  • Benefits: Full audit trail (every deployment is a Git commit), easy rollback (git revert), self-healing clusters.
  • Workflow: CI builds image + updates image tag in Git → ArgoCD detects change → syncs to cluster.
  • vs Push-based: Jenkins/GitLab pushing to cluster requires cluster credentials in CI. ArgoCD only needs cluster-internal credentials.
02 Git & Jenkins Tooling
MED Gitflow vs Trunk-based Development — which do you prefer?

Gitflow: Long-lived branches (feature, develop, release, hotfix, main). Good for scheduled release cycles but creates merge hell in large teams.

Trunk-based: Everyone commits to main/trunk frequently (at least daily). Feature flags hide incomplete features. Preferred in CI/CD-heavy shops.

I prefer trunk-based for microservices — it eliminates integration debt, forces small commits, and keeps CI feedback fast. We used feature flags (LaunchDarkly) to safely deploy incomplete features to prod.
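A minimal sketch of the feature-flag pattern that makes trunk-based development safe — here an in-memory map stands in for the LaunchDarkly SDK, and the flag name is invented:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class FeatureFlags {
  // In production this lookup would hit the flag provider's SDK;
  // an in-memory map stands in to show the pattern.
  private final Map<String, Boolean> flags = new ConcurrentHashMap<>();

  public boolean isEnabled(String flag) {
    // default OFF: half-built code merged to trunk stays dark in prod
    return flags.getOrDefault(flag, false);
  }

  public void set(String flag, boolean on) { flags.put(flag, on); }

  public static void main(String[] args) {
    FeatureFlags ff = new FeatureFlags();
    // merged to main today, still incomplete — ships disabled
    System.out.println(ff.isEnabled("new-checkout-flow") ? "new checkout" : "legacy checkout");
    ff.set("new-checkout-flow", true);  // flipped at runtime, no redeploy
    System.out.println(ff.isEnabled("new-checkout-flow") ? "new checkout" : "legacy checkout");
  }
}
```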

MED Webhook vs Poll SCM in Jenkins — what's the difference?

Poll SCM: Jenkins periodically checks the repo on a cron schedule (e.g., every 5 minutes). Wasteful — mostly does nothing.

Webhook: GitHub/GitLab pushes an HTTP event to Jenkins the moment a commit/PR happens. Instant trigger, zero wasted polling.

Always prefer Webhooks in production. Poll SCM is only used when Jenkins is behind a firewall and can't receive inbound requests.

MED How do you integrate SonarQube with Jenkins?

Steps: Install SonarQube Scanner plugin in Jenkins → Add SonarQube server URL in Jenkins global config → Store token as Jenkins credential → Add sonar stage to Jenkinsfile.

stage('SonarQube Analysis') {
  steps {
    withSonarQubeEnv('SonarQube-Server') {
      sh 'mvn sonar:sonar'
    }
  }
}
stage('Quality Gate') {
  steps {
    timeout(time: 2, unit: 'MINUTES') {
      waitForQualityGate abortPipeline: true
    }
  }
}
Quality Gate blocks the pipeline if code coverage drops below threshold or critical bugs are found.
HARD How do you manage credentials securely in Jenkins and reference them in pipelines?

Jenkins has a built-in Credentials Store (encrypted at rest). Never hardcode secrets in Jenkinsfile.

// Reference credentials in Declarative Pipeline
environment {
  DB_PASS = credentials('db-password-secret-id')
  AWS_CREDS = credentials('aws-access-key')
}

// For username+password type:
withCredentials([usernamePassword(
  credentialsId: 'dockerhub-creds',
  usernameVariable: 'USER',
  passwordVariable: 'PASS'
)]) {
  sh 'docker login -u $USER -p $PASS'
}

For enterprise: integrate Jenkins with HashiCorp Vault or AWS Secrets Manager via plugins for dynamic secret injection.

MED What are parallel stages in Jenkins pipelines and when do you use them?

Parallel stages run multiple steps simultaneously, reducing pipeline duration. Ideal when steps are independent — e.g., unit tests + integration tests + security scan can all run at the same time.

stage('Parallel Tests') {
  parallel {
    stage('Unit Tests') {
      steps { sh 'mvn test -Dgroups=unit' }
    }
    stage('Integration Tests') {
      steps { sh 'mvn test -Dgroups=integration' }
    }
    stage('SAST Scan') {
      steps { sh './run-security-scan.sh' }
    }
  }
}
03 Spring Security & JWT High Frequency
EASY Difference between Authentication and Authorization?

Authentication: Verifying WHO you are. (Login with username/password, JWT, OAuth2 token)

Authorization: Verifying WHAT you can do. (Role check — can this user access /admin?)

In Spring Security: Authentication is handled by AuthenticationManager + UserDetailsService. Authorization is handled by SecurityFilterChain with .hasRole() or @PreAuthorize.

HARD Write Spring Security code with JWT — full implementation
// 1. JWT Utility
@Component
public class JwtUtil {
  private final String SECRET = "change-this-demo-key-to-32-plus-random-bytes!!"; // HS256 requires >= 256 bits (32 bytes) or jjwt throws WeakKeyException

  public String generateToken(String username) {
    return Jwts.builder()
      .subject(username)
      .issuedAt(new Date())
      .expiration(new Date(System.currentTimeMillis() + 86_400_000)) // 24h
      .signWith(Keys.hmacShaKeyFor(SECRET.getBytes()))
      .compact();
  }

  public String extractUsername(String token) {
    return Jwts.parser()
      .verifyWith(Keys.hmacShaKeyFor(SECRET.getBytes()))
      .build().parseSignedClaims(token)
      .getPayload().getSubject();
  }
}
// 2. JWT Filter
@Component
public class JwtFilter extends OncePerRequestFilter {
  @Autowired JwtUtil jwtUtil;
  @Autowired UserDetailsService userDetailsService;

  @Override
  protected void doFilterInternal(HttpServletRequest req,
      HttpServletResponse res, FilterChain chain) throws IOException, ServletException {
    String header = req.getHeader("Authorization");
    if (header != null && header.startsWith("Bearer ")) {
      String token = header.substring(7);
      String username = jwtUtil.extractUsername(token);
      if (username != null && SecurityContextHolder.getContext().getAuthentication() == null) {
        UserDetails user = userDetailsService.loadUserByUsername(username);
        UsernamePasswordAuthenticationToken auth =
          new UsernamePasswordAuthenticationToken(user, null, user.getAuthorities());
        SecurityContextHolder.getContext().setAuthentication(auth);
      }
    }
    chain.doFilter(req, res);
  }
}
// 3. Security Config
@Configuration
@EnableWebSecurity
public class SecurityConfig {
  private final JwtFilter jwtFilter;
  SecurityConfig(JwtFilter jwtFilter) { this.jwtFilter = jwtFilter; }

  @Bean
  public SecurityFilterChain filterChain(HttpSecurity http) throws Exception {
    return http
      .csrf(AbstractHttpConfigurer::disable)
      .sessionManagement(s -> s.sessionCreationPolicy(STATELESS))
      .authorizeHttpRequests(auth -> auth
        .requestMatchers("/api/auth/**").permitAll()
        .requestMatchers("/api/admin/**").hasRole("ADMIN")
        .anyRequest().authenticated())
      .addFilterBefore(jwtFilter, UsernamePasswordAuthenticationFilter.class)
      .build();
  }
}
MED Token in Cookie vs Header — which is more secure?
  • Header (Bearer token): Stored in JS memory or localStorage. Vulnerable to XSS if stored in localStorage. SPAs typically use this.
  • HttpOnly Cookie: JS cannot read it — XSS-proof. But vulnerable to CSRF unless SameSite=Strict is set.
  • Most secure: HttpOnly + Secure + SameSite=Strict cookie. Combine with CSRF token for form submissions.
Bank-grade answer: Access token in memory (short TTL), refresh token in HttpOnly cookie. Never store JWT in localStorage in production.
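A sketch of the Set-Cookie header that setup produces (cookie name and path are illustrative; in Spring you would build this with ResponseCookie):

```java
public class CookieDemo {
  // Builds the raw Set-Cookie value for the refresh token so each
  // security attribute is visible.
  static String refreshCookie(String token, long maxAgeSeconds) {
    return "refreshToken=" + token
        + "; Max-Age=" + maxAgeSeconds
        + "; Path=/api/auth/refresh"
        + "; HttpOnly"         // JS cannot read it — blocks XSS token theft
        + "; Secure"           // sent over HTTPS only
        + "; SameSite=Strict"; // browser won't attach it cross-site — blocks CSRF
  }

  public static void main(String[] args) {
    System.out.println(refreshCookie("opaque-refresh-token", 7 * 24 * 3600));
  }
}
```

Scoping Path to the refresh endpoint means the cookie isn't even transmitted on normal API calls.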
MED How to validate fields in @RequestBody?
// DTO with validation annotations
public class UserRequest {
  @NotBlank(message = "Name is required")
  private String name;

  @Email(message = "Invalid email")
  private String email;

  @Min(18) @Max(100)
  private int age;
}

// Controller
@PostMapping("/users")
public ResponseEntity<?> createUser(@Valid @RequestBody UserRequest req) { ... }

// Global exception handler
@RestControllerAdvice
public class GlobalExceptionHandler {
  @ExceptionHandler(MethodArgumentNotValidException.class)
  public ResponseEntity<Map<String, String>> handleValidation(MethodArgumentNotValidException ex) {
    Map<String, String> errors = ex.getBindingResult().getFieldErrors().stream()
      .collect(toMap(FieldError::getField, FieldError::getDefaultMessage));
    return ResponseEntity.badRequest().body(errors);
  }
}
HARD SSO Login Process and SAML assertion in SSO flow

SSO Flow (OAuth2/OIDC):

  • User clicks Login → redirected to Identity Provider (Okta, Keycloak, Azure AD)
  • IdP authenticates user → returns Authorization Code
  • App exchanges code for Access Token + ID Token
  • App validates token, creates session

SAML (older enterprise SSO): XML-based. IdP returns a signed SAML Assertion (XML document) containing user attributes. SP validates the signature using IdP's public key. Spring Security SAML extension handles this. Key: SAML assertions are Base64-encoded XML, signed with IdP's private key — any tampering breaks the signature.

04 Java Core Coding Heavy
MED Write Java 8 code to find frequency of each String
List<String> words = List.of("java", "spring", "java", "kafka", "spring", "java");

// Approach 1: Collectors.groupingBy + counting
Map<String, Long> freq = words.stream()
  .collect(Collectors.groupingBy(
    Function.identity(),
    Collectors.counting()
  ));
// Output: {java=3, spring=2, kafka=1}

// Approach 2: toMap with merge function
Map<String, Integer> freq2 = words.stream()
  .collect(Collectors.toMap(
    Function.identity(), w -> 1,
    Integer::sum
  ));
HARD HashMap output-based coding question — explain internals
// Classic trick question
Map<Integer, String> map = new HashMap<>();
map.put(1, "one");
map.put(1, "ONE"); // duplicate key
System.out.println(map.size()); // Output: 1 (not 2)
System.out.println(map.get(1)); // Output: "ONE" (overwritten)

Internals: HashMap uses an array of buckets. The key's hashCode() determines the bucket index. Colliding entries chain as a linked list; since Java 8 a bucket converts to a Red-Black Tree once it holds more than 8 entries (and the table has at least 64 buckets). Default capacity=16, load factor=0.75 → resizes at 12 entries.

null key: HashMap allows one null key (always stored in bucket 0). Hashtable does NOT allow null keys or values.
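The null-key difference is easy to demonstrate:

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

public class NullKeyDemo {
  public static void main(String[] args) {
    Map<String, String> hashMap = new HashMap<>();
    hashMap.put(null, "allowed");          // one null key, stored in bucket 0
    System.out.println(hashMap.get(null)); // allowed

    Map<String, String> hashtable = new Hashtable<>();
    try {
      hashtable.put(null, "boom");         // Hashtable null-checks the key
    } catch (NullPointerException e) {
      System.out.println("Hashtable rejects null keys");
    }
  }
}
```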

HARD Explain race conditions in threads — with example

A race condition occurs when multiple threads access shared mutable state concurrently, producing unpredictable results depending on thread scheduling.

// Race condition: counter++ is NOT atomic
int counter = 0;
// Thread 1 reads counter=5, Thread 2 reads counter=5
// Both increment to 6, both write 6 → lost update!

// Fix 1: AtomicInteger
AtomicInteger counter = new AtomicInteger(0);
counter.incrementAndGet(); // CAS operation — atomic

// Fix 2: synchronized
synchronized(this) { counter++; }

// Fix 3: ReentrantLock
lock.lock();
try { counter++; } finally { lock.unlock(); }
MED Methods in Object class & use of wait() and notify()

Object class methods: equals(), hashCode(), toString(), clone(), wait(), notify(), notifyAll(), getClass(), finalize() (deprecated since Java 9)

wait() / notify(): Used for inter-thread communication. wait() releases the monitor lock and pauses the thread. notify() wakes one waiting thread. Must be called inside a synchronized block.

// Producer-Consumer with wait/notify
synchronized (lock) {
  while (queue.isEmpty()) {
    lock.wait(); // releases lock, waits
  }
  process(queue.poll());
}

// Producer
synchronized (lock) {
  queue.add(item);
  lock.notifyAll(); // wake waiting consumers
}
MED CompletableFuture vs Stream API — key differences
  • Stream API: Synchronous (blocking), sequential/parallel data processing pipeline. Operations like map, filter, reduce on collections.
  • CompletableFuture: Asynchronous, non-blocking. Represents a future result. Chains async operations, handles failures, combines multiple futures.
// Stream — synchronous processing
List<Long> result = orders.stream()
  .filter(o -> o.getAmount() > 1000)
  .map(Order::getId).collect(toList());

// CompletableFuture — async chaining
CompletableFuture.supplyAsync(() -> fetchUser(id))
  .thenApplyAsync(user -> fetchOrders(user))
  .thenAcceptAsync(orders -> sendEmail(orders))
  .exceptionally(ex -> { log.error("async call failed", ex); return null; });
MED WebClient vs RestClient vs HttpClient — when to use which?
  • WebClient (Spring WebFlux): Reactive, non-blocking. Best for high-throughput async calls. Can be used in servlet apps too.
  • RestClient (Spring Boot 3.2+): Synchronous, fluent API. Replacement for RestTemplate. Simple and modern for blocking calls.
  • HttpClient (Java 11+): Standard library, no Spring dependency. Supports sync and async. Good for non-Spring projects.
// RestClient (Spring Boot 3.2+) — preferred for simple cases
RestClient client = RestClient.create();
User user = client.get()
  .uri("https://api.example.com/users/{id}", userId)
  .retrieve()
  .body(User.class);

// WebClient — reactive/async
webClient.get().uri("/users/{id}", id)
  .retrieve().bodyToMono(User.class)
  .subscribe(user -> process(user));
HARD Java Memory Model — heap, metaspace, stack, and what causes OOM errors?
  • Heap: Object instances, arrays. Divided into Young (Eden + Survivor S0/S1) and Old Generation. GC manages this.
  • Metaspace (Java 8+): Class metadata, method bytecode. Replaced PermGen. Grows dynamically — set -XX:MaxMetaspaceSize to cap it.
  • Stack: Per-thread. Stores method frames, local primitives, references. StackOverflowError = infinite recursion.
  • Off-heap: Direct ByteBuffers, NIO. Used by Kafka, Netty. Not GC-managed.

OOM types:

  • Java heap space: Memory leak or heap too small
  • Metaspace: Class loader leak (CGLIB proxies, JSP engines)
  • GC overhead limit exceeded: JVM spending >98% time in GC — classic leak sign
  • Unable to create native thread: Too many threads, OS limit hit
Diagnosis: -Xlog:gc for GC logs. -XX:+HeapDumpOnOutOfMemoryError for heap dump. Analyze with Eclipse MAT — check Dominator Tree for retained heap.
HARD What is a memory leak in Java — common causes and how to diagnose in production?

Memory leak = objects kept alive by references even though no longer needed. GC cannot collect them — heap grows until OOM.

Common causes (senior-level answer):

  • Static collections accumulating data never cleared
  • Unclosed resources — DB connections, InputStream not in try-with-resources
  • ThreadLocal not removed in pooled threads — thread never dies, value lives forever
  • Event listeners registered but never deregistered
  • Cache without eviction (Guava/Caffeine without maximumSize or expireAfterWrite)
// Dangerous: ThreadLocal leak in pooled threads
static ThreadLocal<UserContext> ctx = new ThreadLocal<>();
// Fix: always clean up in finally block
try {
  ctx.set(userContext);
  doWork();
} finally {
  ctx.remove(); // CRITICAL — never skip this
}

Diagnosis: 1) Monitor heap growth in Grafana. 2) jmap -dump:live,format=b,file=heap.hprof <pid>. 3) Open in Eclipse MAT → Dominator Tree → find the GC root holding the leak chain.

HARD Deadlock in Java — detect, prevent, resolve

Deadlock: Thread A holds lock1, waits for lock2. Thread B holds lock2, waits for lock1. Both wait forever.

// Classic deadlock — different lock order
// Thread 1: lockA → lockB
// Thread 2: lockB → lockA — DEADLOCK

// Prevention: always acquire in same order
synchronized(lockA) {
  synchronized(lockB) { doWork(); }  // both threads same order
}

// Or use tryLock with timeout (ReentrantLock) — release lock1 if lock2 fails
if (lock1.tryLock(1, SECONDS)) {
  try {
    if (lock2.tryLock(1, SECONDS)) {
      try { doWork(); } finally { lock2.unlock(); }
    }
  } finally { lock1.unlock(); }
}

Detection: jstack <pid> — JVM prints deadlock cycle with thread dump. Also visible in VisualVM/JConsole. ThreadMXBean.findDeadlockedThreads() in code.

Prefer higher-level concurrent structures: ConcurrentHashMap, BlockingQueue, Semaphore — they handle locking internally and reduce deadlock surface area.
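The ThreadMXBean detection mentioned above can be sketched end to end — two daemon threads deadlock on purpose, then the JVM reports the cycle:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class DeadlockDetect {
  static final Object A = new Object(), B = new Object();

  public static void main(String[] args) throws Exception {
    // Thread 1: A → B, Thread 2: B → A — opposite lock order
    Thread t1 = new Thread(() -> { synchronized (A) { pause(); synchronized (B) { } } });
    Thread t2 = new Thread(() -> { synchronized (B) { pause(); synchronized (A) { } } });
    t1.setDaemon(true);  // daemons so the JVM can still exit
    t2.setDaemon(true);
    t1.start(); t2.start();
    Thread.sleep(500);   // let the deadlock form

    ThreadMXBean mx = ManagementFactory.getThreadMXBean();
    long[] ids = mx.findDeadlockedThreads();  // null if no deadlock
    System.out.println(ids != null
        ? "deadlock detected: " + ids.length + " threads"
        : "no deadlock");
  }

  static void pause() {
    try { Thread.sleep(100); } catch (InterruptedException e) { }
  }
}
```

This is the same cycle jstack prints as "Found one Java-level deadlock", just queried programmatically.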
HARD Java Virtual Threads (Java 21) — internals, when to use, what to watch out for

Platform threads = 1:1 mapped to OS threads. ~1MB stack, costly to create. Thread pools required. Blocking wastes an OS thread.

Virtual threads = JVM-managed, ~few KB. Millions can exist. When VT blocks (I/O, sleep), JVM parks it and reuses the carrier OS thread for another VT. Blocking is cheap.

// One virtual thread per task — no pool tuning needed
try (var exec = Executors.newVirtualThreadPerTaskExecutor()) {
  IntStream.range(0, 100_000).forEach(i ->
    exec.submit(() -> callDatabase(i))
  );
}

# Spring Boot 3.2+ — one line to enable
spring.threads.virtual.enabled=true
  • Best for: I/O-bound work (DB, HTTP, file). Each request gets its own VT.
  • Not for: CPU-intensive computation — use platform threads + ForkJoinPool.
  • Pinning trap: synchronized blocks inside VT pin the carrier thread — negates the benefit. Replace with ReentrantLock.
  • Observe pinning: -Djdk.tracePinnedThreads=full
HARD volatile vs synchronized vs AtomicInteger — when to use which?
  • volatile: Visibility only. Every read sees latest write. No atomicity. Use for: flags, state fields read by multiple threads without compound operations.
  • synchronized: Visibility + atomicity + mutual exclusion. Use for: compound actions on shared state, methods that must be atomic as a whole.
  • AtomicInteger/AtomicReference: Lock-free atomic operations via CAS (Compare-And-Swap). Faster than synchronized for single-variable updates under high contention.
// volatile — correct: simple flag
volatile boolean shutdown = false;

// volatile — WRONG: compound action
volatile int count = 0;
count++;  // read-modify-write is NOT atomic!

// Correct: use Atomic
AtomicInteger count = new AtomicInteger(0);
count.incrementAndGet();  // atomic CAS — no lock

// Double-checked locking — volatile required
private volatile Singleton instance;
if (instance == null) {
  synchronized(this) {
    if (instance == null) instance = new Singleton();
  }
}
HARD Java 8 to Java 21 — key new features you actively use in production
  • Java 8: Streams, Lambda, Optional, CompletableFuture, LocalDate/Time API
  • Java 9: List.of(), Map.of() immutable factories; module system (rarely used in practice)
  • Java 11: String.isBlank(), strip(), lines(); HttpClient; var in lambda params
  • Java 15: Text blocks — multiline strings for JSON/SQL inline
  • Java 16: Records (GA), instanceof pattern matching
  • Java 17 (LTS): Sealed classes; switch expressions (GA earlier, in Java 14)
  • Java 21 (LTS): Virtual Threads (GA), Sequenced Collections, pattern matching in switch, record patterns
// Record — Java 16+ (replaces Lombok @Data for simple DTOs)
record OrderSummary(Long id, String status, BigDecimal amount) {}

// Sealed class + pattern switch — Java 21
sealed interface PaymentResult permits Success, Failure {}
record Success(String txId) implements PaymentResult {}
record Failure(String reason) implements PaymentResult {}

String message = switch (result) {
  case Success s -> "Paid: " + s.txId();
  case Failure f -> "Failed: " + f.reason();
};
HARD SOLID principles — show violation and fix for each in Java
  • S — Single Responsibility: UserService doing auth + email + reporting → split into AuthService, NotificationService, ReportService.
  • O — Open/Closed: PaymentService with if-else per payment type → PaymentStrategy interface + UpiStrategy, CardStrategy implementations.
  • L — Liskov Substitution: Square extends Rectangle but breaks setWidth/setHeight — a square can't be a rectangle behaviorally. Fix: no inheritance, separate classes.
  • I — Interface Segregation: Animal interface forces Dog to implement swim(). Fix: separate Swimmable, Flyable interfaces.
  • D — Dependency Inversion: OrderService directly instantiates EmailSender. Fix: inject NotificationService interface via Spring @Autowired.
// O — Open/Closed violation vs fix
// Bad
void pay(String type, double amt) {
  if (type.equals("UPI")) { ... }
  else if (type.equals("CARD")) { ... }
}
// Good — add new types without touching existing code
interface PaymentStrategy { void pay(double amount); }
class UpiStrategy implements PaymentStrategy { ... }
class CardStrategy implements PaymentStrategy { ... }
HARD Explain ConcurrentHashMap internals and why it's preferred over Hashtable

Hashtable: All methods synchronized on the whole object. One lock for the entire map — complete bottleneck under concurrency.

ConcurrentHashMap (Java 8+): No global lock. Uses CAS (Compare-And-Swap) to insert into an empty bucket and synchronizes only on a bucket's first node when there is a collision. (Java 7 used 16 lock segments; Java 8 dropped segments entirely in favor of this finer-grained, per-bucket scheme.)

// Thread-safe compute — no external sync needed
ConcurrentHashMap<String, Long> wordCount = new ConcurrentHashMap<>();
wordCount.merge(word, 1L, Long::sum);  // atomic merge
wordCount.compute(word, (k, v) -> v == null ? 1 : v + 1);  // atomic compute

// Wrong: NOT atomic even with CHM
if (!map.containsKey(key)) map.put(key, val);  // use putIfAbsent instead
map.putIfAbsent(key, val);  // atomic check-and-put
ConcurrentHashMap does NOT allow null keys or values (unlike HashMap). Null would be ambiguous — is the value missing or is it null?
05 Microservices Patterns Architecture
HARD CQRS Design Pattern — explain with example

CQRS (Command Query Responsibility Segregation): Separate the write model (Commands) from the read model (Queries). Commands mutate state; Queries only read. They use different databases optimized for each purpose.

  • Write side: PostgreSQL with normalized schema. Handles order creation, updates.
  • Read side: Elasticsearch or MongoDB with denormalized documents. Optimized for fast queries.
  • Sync: Domain events published to Kafka → event handler updates read store.
Why use it? When read and write loads differ drastically (e.g., 100x more reads than writes). Enables independent scaling of read and write sides.
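A toy end-to-end sketch of the flow, with in-memory maps standing in for PostgreSQL, Kafka, and the denormalized read store:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CqrsDemo {
  record OrderCreated(String orderId, String status) {}

  static final Map<String, String> writeStore = new HashMap<>();  // normalized (PostgreSQL)
  static final Map<String, String> readStore  = new HashMap<>();  // denormalized view (ES/Mongo)
  static final List<OrderCreated> eventBus = new ArrayList<>();   // Kafka stand-in

  // Command side: mutate state, publish a domain event
  static void handleCreateOrder(String orderId) {
    writeStore.put(orderId, "NEW");
    eventBus.add(new OrderCreated(orderId, "NEW"));
  }

  // Projection: event handler keeps the read model in sync
  static void project() {
    for (OrderCreated e : eventBus) {
      readStore.put(e.orderId(), "Order " + e.orderId() + " [" + e.status() + "]");
    }
    eventBus.clear();
  }

  // Query side: reads only, never touches the write model
  static String queryOrder(String orderId) {
    return readStore.get(orderId);
  }

  public static void main(String[] args) {
    handleCreateOrder("o-1");
    project();
    System.out.println(queryOrder("o-1"));
  }
}
```

The read model is eventually consistent — the query only sees the order after the projection runs, which is the trade-off to mention in an interview.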
HARD Concurrent clicks for order — how do microservices handle it?

Classic double-submit problem. Solutions:

  • Idempotency Key: Client sends a unique key (UUID) per request. Server stores processed keys in Redis with TTL. Duplicate requests return cached response.
  • Optimistic Locking (JPA): @Version field — concurrent update throws OptimisticLockException, client retries.
  • Database UNIQUE constraint: Unique constraint on (userId, idempotencyKey) ensures only one insert succeeds.
  • Distributed Lock (Redisson): Acquire Redis lock before processing, release after.
@Version
private Long version; // JPA optimistic lock

// Redis idempotency check
String key = "order:" + idempotencyKey;
Boolean isNew = redis.setIfAbsent(key, "processing", 10, MINUTES);
if (!isNew) return getCachedResponse(key);
HARD TraceId logic in microservices — how does distributed tracing work?

Every incoming request gets a traceId (unique per user request) and a spanId (unique per service hop). Both are propagated via HTTP headers (W3C TraceContext standard).

# Headers propagated between services
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
#             version-traceId-parentSpanId-flags

In Spring Boot 3.x: Micrometer Tracing auto-propagates traceId via WebClient/RestClient/Kafka headers. Add traceId to MDC for structured logs:

# logback-spring.xml pattern
%d [%X{traceId}] [%X{spanId}] %-5level %logger - %msg%n
Visualization: Send traces to Zipkin or Grafana Tempo. You can see the full call tree across all services for a single user request.
HARD Service-to-service communication and authorization

Communication options:

  • Synchronous: REST (RestClient/WebClient) or gRPC (proto-based, faster)
  • Asynchronous: Kafka events (decoupled, resilient)

Authorization between services:

  • mTLS: Each service has a client certificate. Mutual authentication — both sides verify identity. Used in service meshes (Istio).
  • Service account JWT: Each service gets its own JWT (client_credentials OAuth2 flow) from the IdP. Sent as Bearer token in service calls.
  • API Gateway: Gateway validates external tokens. Internal services trust gateway and verify internal JWTs only.
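The service-account JWT option above boils down to a token request against the IdP's token endpoint. A minimal sketch of that flow — the endpoint URL, client id, and secret are placeholders, and the actual HTTP call is left as a comment:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.stream.Collectors;

// OAuth2 client_credentials flow: a service exchanges its own credentials
// for a JWT, then sends it as a Bearer token on calls to other services.
public class ClientCredentials {

  // Build the x-www-form-urlencoded body for the token request
  static String formBody(Map<String, String> params) {
    return params.entrySet().stream()
        .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
            + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
        .collect(Collectors.joining("&"));
  }

  static String tokenRequestBody() {
    // POST this body to https://idp.example.com/oauth2/token (placeholder URL)
    // with java.net.http.HttpClient, parse access_token from the JSON response,
    // then attach it as "Authorization: Bearer <token>" on service-to-service calls.
    return formBody(Map.of(
        "grant_type", "client_credentials",
        "client_id", "order-service",
        "client_secret", "change-me"));
  }
}
```

In Spring, `spring-security-oauth2-client` can manage this flow (token caching and refresh) instead of hand-rolling it.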
HARD Config Server and API Gateway design

Spring Cloud Config Server: Centralizes externalized configuration. Reads from Git repo. Each microservice fetches config at startup (and on refresh via /actuator/refresh).

API Gateway (Spring Cloud Gateway): Single entry point. Handles: routing, load balancing, authentication, rate limiting, CORS, request/response transformation.

# gateway application.yml
spring.cloud.gateway.routes:
  - id: order-service
    uri: lb://ORDER-SERVICE  # load balanced via Eureka
    predicates:
      - Path=/api/orders/**
    filters:
      - AuthFilter    # custom JWT validation filter
      - name: RequestRateLimiter
        args:
          redis-rate-limiter.replenishRate: 10
          redis-rate-limiter.burstCapacity: 20
HARD Event Sourcing pattern — how is it different from traditional CRUD?

Traditional CRUD: Store current state. UPDATE overwrites the previous value. No history.

Event Sourcing: Never update. Append immutable events to an event store. Current state = replay of all events. Complete audit trail.

// Traditional: UPDATE orders SET status='SHIPPED'
// Event Sourcing: append event
record OrderShipped(String orderId, Instant shippedAt, String carrier) {}

// State rebuilt by replaying events
Order rebuildState(List<OrderEvent> events) {
  Order order = new Order();
  events.forEach(order::apply);
  return order;
}
  • Benefits: Full audit trail, temporal queries (state at any point in time), event replay for projections, debugging.
  • Combined with CQRS: Event store = write side. Events project to read models (Elasticsearch, Redis).
  • Drawback: Eventual consistency, complex replay logic, schema evolution of old events.
HARD Saga pattern — choreography vs orchestration — production tradeoffs

Saga manages distributed transactions across microservices without 2PC. Each step publishes an event; on failure, compensating transactions undo previous steps.

Choreography: No central coordinator. Each service reacts to events and publishes its own. Decoupled but hard to track overall state.

Orchestration: Central orchestrator (e.g., Saga orchestrator service or Temporal) tells each service what to do. Easier to track, debug, and add steps. More coupling to orchestrator.

// Orchestration saga — order placement
// 1. Reserve inventory → 2. Charge payment → 3. Ship order
// On payment failure: compensate → release inventory
class OrderSaga {
  void execute(Order order) {
    inventoryService.reserve(order);  // step 1
    try {
      paymentService.charge(order);   // step 2
      shippingService.ship(order);    // step 3
    } catch (PaymentException e) {
      inventoryService.release(order); // compensate
    }
  }
}
For production: Axon Framework or Temporal handle saga state persistence, retries, and timeouts. Don't implement from scratch.
06 Kafka & Async Processing Data Engineering
HARD What will you do if Kafka has high message volume at a particular time?

This is a backpressure + scaling problem. Layered approach:

  • Scale consumers: Add more consumer instances (up to partition count). Each partition is consumed by exactly one consumer in a group.
  • Increase partitions: More partitions = more parallelism. Plan this upfront — repartitioning is disruptive.
  • Batch processing: Set max.poll.records higher, process in micro-batches instead of one-by-one.
  • Async consumer processing: Don't do heavy work in poll loop. Offload to thread pool, commit offsets after processing.
  • Back-pressure monitoring: Alert on consumer lag (kafka_consumer_lag metric in Prometheus). Set threshold alerts.
  • Dead Letter Topic: Failed messages go to DLT for retry/manual review — don't block the main partition.
@KafkaListener(topics = "orders", containerFactory = "batchFactory")
public void processBatch(List<ConsumerRecord<String, Order>> records) {
  // parallel processing trades per-partition ordering for throughput
  records.parallelStream().forEach(r -> processOrder(r.value()));
}
HARD Ways for async programming to process a batch of records
  • CompletableFuture with thread pool: Submit each record as a future, join all at the end.
  • @Async (Spring): Simple annotation-based async — uses TaskExecutor.
  • Virtual Threads (Java 21): One virtual thread per record — massively scalable with simple blocking code.
  • Reactive Streams (WebFlux): Flux.fromIterable(records).flatMap(r -> processAsync(r)).
  • Spring Batch: For large-scale batch jobs with retry, skip, restart capabilities.
// CompletableFuture batch processing
List<CompletableFuture<Void>> futures = records.stream()
  .map(r -> CompletableFuture.runAsync(() -> process(r), executor))
  .collect(toList());
CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();

// Virtual Threads (Java 21)
try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
  records.forEach(r -> executor.submit(() -> process(r)));
}
HARD Kafka exactly-once semantics — how do you guarantee it end-to-end?

Three delivery guarantees: at-most-once (fire and forget), at-least-once (retry on failure, may duplicate), exactly-once (no loss, no duplicate). For financial transactions you need exactly-once semantics — or at-least-once delivery paired with idempotent processing, which achieves the same effect.

  • Producer side: enable.idempotence=true (retries don't duplicate) + transactional.id for atomic multi-partition writes.
  • Consumer side: isolation.level=read_committed — only read committed messages from transactional producers.
  • Application side: Idempotent consumer — store processed messageId in DB. Before processing, check if already handled.
# Producer config for exactly-once
enable.idempotence=true
transactional.id=order-producer-1
acks=all
retries=Integer.MAX_VALUE

# Consumer config
isolation.level=read_committed
True exactly-once across Kafka + DB requires the Outbox pattern — write to DB + outbox table in one transaction, relay publishes to Kafka. This is the production-grade approach.
HARD Kafka consumer group rebalancing — what triggers it and how to minimize disruption?

Rebalancing = partition reassignment across consumer group members. During rebalance, ALL consumers stop processing (stop-the-world by default).

Triggers: Consumer joins/leaves group, consumer crashes (heartbeat timeout), partitions added, session.timeout.ms exceeded.

Minimize disruption:

  • Static group membership: group.instance.id — consumer rejoins with same ID, avoids full rebalance on restart. Ideal for Kubernetes rolling updates.
  • Cooperative Sticky Assignor: Only moves partitions that need to move (vs eager assignors that revoke ALL). partition.assignment.strategy=org.apache.kafka.clients.consumer.CooperativeStickyAssignor.
  • Tune heartbeat: heartbeat.interval.ms=3000, session.timeout.ms=45000. Don't set too aggressively.
  • Commit offsets before shutdown: Call consumer.commitSync() in shutdown hook.
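Put together, the consumer settings above look like this (values illustrative):

```properties
group.id=order-consumers
group.instance.id=order-consumer-pod-0   # static membership — unique per pod
partition.assignment.strategy=org.apache.kafka.clients.consumer.CooperativeStickyAssignor
heartbeat.interval.ms=3000
session.timeout.ms=45000
```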
HARD Kafka vs SQS — when would you choose one over the other?
  • Kafka strengths: High throughput (millions/sec), message replay (retention period), multiple independent consumer groups, event streaming, exactly-once semantics, Kafka Streams for stream processing.
  • SQS strengths: Fully managed (no broker to maintain), auto-scaling, dead letter queue built-in, visibility timeout for safe processing, FIFO queues for ordering, native AWS integration (Lambda triggers). Max message size 256KB.
  • Choose Kafka when: You need replay, multiple consumers reading the same events independently, high throughput, event sourcing, or stream processing (Kafka Streams/Flink).
  • Choose SQS when: Simple task queuing on AWS, you want zero ops burden, Lambda-driven processing, or loose coupling between AWS services.
For a fintech system with audit requirements — Kafka. For a simple background job queue on AWS — SQS. Both valid; question is whether you need the event log.
07 System Design L2 Focus
HARD Design a system with high throughput and better performance

Framework: CACHING → ASYNC → SHARDING → CDN → OPTIMIZE

  • Caching: Redis L2 cache for hot data. Cache-aside pattern. TTL based on data volatility.
  • Async processing: Kafka for write-heavy operations. Return 202 Accepted immediately, process in background.
  • DB read replicas: Route read queries to replicas, writes to primary.
  • Connection pooling: HikariCP (Spring Boot default) — tune pool size = (core_count * 2) + effective_spindle_count.
  • Horizontal scaling + Load balancer: Keep services stateless; externalize any session state to Redis rather than relying on sticky sessions.
  • CDN: Static assets + edge caching for geographically distributed users.
  • DB indexes: Cover query patterns, use EXPLAIN ANALYZE, avoid N+1.
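The cache-aside pattern from the first bullet can be sketched in a few lines. A ConcurrentHashMap stands in for Redis here; a real implementation would also set a TTL on the cached entry:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Cache-aside: read the cache first, fall back to the DB on a miss,
// then populate the cache so later readers hit it.
public class CacheAside {
  private final Map<String, String> cache = new ConcurrentHashMap<>();
  private final Function<String, String> db;   // DB lookup (assumed)
  int dbHits = 0;                              // counts cache misses

  CacheAside(Function<String, String> db) { this.db = db; }

  String get(String key) {
    String cached = cache.get(key);
    if (cached != null) return cached;         // cache hit
    String value = db.apply(key);              // cache miss → load from DB
    dbHits++;
    cache.put(key, value);                     // populate for next readers
    return value;
  }
}
```

In Spring the same pattern is usually expressed declaratively with @Cacheable on the service method.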
HARD Database sharding vs partitioning — and read-heavy vs write-heavy design

Partitioning: Splitting a table within the same DB server (horizontal partitions by range/hash/list). Transparent to application.

Sharding: Splitting data across multiple DB servers. Each shard is an independent DB. Application must know which shard to query (shard key). More complex but truly distributed.

Read-heavy design:

  • Multiple read replicas, route reads via read-only datasource
  • Aggressive caching (Redis) with cache-aside or write-through
  • CQRS — separate read model (Elasticsearch)
  • Materialized views for complex queries

Write-heavy design:

  • Async writes via Kafka — decouple write pressure from DB
  • Batch inserts instead of row-by-row
  • Write to append-only log (Event Sourcing)
  • Time-series DB for metrics (InfluxDB, TimescaleDB)
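With sharding, the application must map a shard key to a shard. A minimal hash-based router sketch (shard names illustrative) — note that plain modulo hashing remaps most keys when the shard count changes, which is why production systems prefer consistent hashing or a lookup table:

```java
// Shard routing: hash the shard key (e.g., userId) to pick which database
// serves the query. floorMod keeps the index non-negative even when
// hashCode() is negative.
public class ShardRouter {
  private final String[] shards;

  ShardRouter(String... shards) { this.shards = shards; }

  String shardFor(String shardKey) {
    int idx = Math.floorMod(shardKey.hashCode(), shards.length);
    return shards[idx];
  }
}
```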
HARD Rate limiting — explanation and implementation

Rate limiting controls how many requests a client can make in a time window.

Algorithms:

  • Token Bucket: N tokens refilled per second. Each request consumes one token. Allows bursts.
  • Fixed Window Counter: Count per window (e.g., 100/min). Simple but boundary spike issue.
  • Sliding Window Log: Most accurate. Tracks request timestamps in Redis sorted set.
// Spring Cloud Gateway rate limiter (Redis-backed)
@Bean
public KeyResolver userKeyResolver() {
  return exchange -> Mono.just(
    exchange.getRequest().getHeaders()
      .getFirst("X-User-Id")
  );
}
In practice: API Gateway handles rate limiting at edge. Per-user limits stored in Redis. Return 429 Too Many Requests with Retry-After header.
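The Token Bucket algorithm from the list above fits in a few lines. This is a single-node sketch (the gateway's Redis-backed limiter shares state across instances); the time source is passed in explicitly so refill behavior is easy to reason about:

```java
// Token bucket: up to `ratePerSec` tokens are refilled per second, capped at
// `capacity`; each request consumes one token. Bursts up to `capacity` pass.
public class TokenBucket {
  private final double capacity, ratePerSec;
  private double tokens;
  private long lastRefillNanos;

  TokenBucket(double capacity, double ratePerSec, long nowNanos) {
    this.capacity = capacity;
    this.ratePerSec = ratePerSec;
    this.tokens = capacity;          // start full
    this.lastRefillNanos = nowNanos;
  }

  synchronized boolean tryAcquire(long nowNanos) {
    double elapsedSec = (nowNanos - lastRefillNanos) / 1e9;
    tokens = Math.min(capacity, tokens + elapsedSec * ratePerSec);
    lastRefillNanos = nowNanos;
    if (tokens >= 1.0) { tokens -= 1.0; return true; }
    return false;                    // caller responds 429 Too Many Requests
  }
}
```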
MED Liquibase — DB schema creation and maintenance in deployment

Liquibase manages DB migrations as versioned changesets. On app startup, it runs pending changesets in order. Applied changesets are tracked in DATABASECHANGELOG table.

# db/changelog/v1.0/create-orders-table.yaml
databaseChangeLog:
  - changeSet:
      id: 1
      author: bheemesh
      changes:
        - createTable:
            tableName: orders
            columns:
              - column:
                  name: id
                  type: BIGINT
                  autoIncrement: true
                  constraints:
                    primaryKey: true

Zero-downtime migration rules: Only additive changes (new columns with defaults, new tables). Never rename columns directly. Use expand-contract pattern: add new column → migrate data → remove old column (in separate deployments).

HARD Design a URL shortener like bit.ly — full system design walkthrough

Requirements: Shorten URL, redirect, handle ~100M URLs, 10:1 read-write ratio, 99.99% availability.

  • Short key generation: Base62 encoding of auto-increment ID or MD5 hash of URL (first 7 chars). Base62 (a-z A-Z 0-9) gives 62^7 = 3.5 trillion unique keys.
  • Storage: MySQL/Postgres — schema: (short_key, original_url, created_at, user_id, expiry). Index on short_key.
  • Read path (redirect): GET /{key} → Redis cache lookup → DB fallback → 301/302 redirect. Cache hit rate ~99%.
  • Write path: POST /shorten → generate key → write to DB → cache it → return short URL.
  • Scale: Read replicas for DB. Redis cluster for cache. CDN at edge for globally popular URLs.
  • 301 vs 302: 301 = permanent (browser caches, no future tracking). 302 = temporary (every redirect hits your server → analytics possible).
Interviewer follow-ups: custom aliases, expiry, click analytics (Kafka for async event stream), spam prevention (URL blacklist), rate limiting per user.
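The Base62 key generation above is a short loop — repeatedly take the ID modulo 62 and map to a character. The digit-first alphabet order here is just a convention:

```java
// Base62-encode a numeric ID into a short URL key. With 7 characters this
// covers 62^7 ≈ 3.5 trillion URLs.
public class Base62 {
  private static final String ALPHABET =
      "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

  static String encode(long id) {
    if (id == 0) return "0";
    StringBuilder sb = new StringBuilder();
    while (id > 0) {
      sb.append(ALPHABET.charAt((int) (id % 62)));  // least significant digit
      id /= 62;
    }
    return sb.reverse().toString();
  }
}
```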
HARD Design a real-time notification system (push/email/SMS) at scale

Requirements: Send notifications across channels (push, email, SMS, in-app). High volume, low latency. Retry on failure. User preferences respected.

  • Producer services publish NotificationEvent to Kafka topic
  • Notification Router consumes events, checks user preferences (DB), routes to channel-specific topics
  • Channel workers (email-worker, sms-worker, push-worker) consume their topic, call third-party providers (SES, Twilio, FCM/APNs)
  • Retry: Exponential backoff via Spring Retry or DLT in Kafka. Provider failures → retry up to 3x, then DLT for alert
  • Template service: Handlebars/Thymeleaf templates per notification type, per language
  • Delivery tracking: Webhook callbacks from providers update notification status in DB → exposed via API
// NotificationEvent on Kafka
record NotificationEvent(
  String userId, NotificationType type,
  Map<String,String> payload, Instant createdAt
) {}
HARD CAP theorem — explain with examples of real databases

CAP theorem: A distributed system can guarantee only 2 of these 3 properties simultaneously:

  • Consistency (C): Every read gets the most recent write (or an error)
  • Availability (A): Every request gets a response (not necessarily latest data)
  • Partition Tolerance (P): System continues operating despite network partitions

In practice: network partitions WILL happen. So the real choice is CP vs AP.

  • CP systems: HBase, MongoDB (in strong consistency mode), Zookeeper. During partition, refuse to serve stale reads → consistent but potentially unavailable.
  • AP systems: Cassandra, CouchDB, DynamoDB (eventual consistency). During partition, serve possibly stale data → always available.
  • CA systems: Traditional RDBMS (PostgreSQL, MySQL) — but CA is only achievable when network partitions cannot occur, which in practice means a single node. Any multi-node deployment must choose CP or AP.
Extension: PACELC theorem adds that even without partition, there's a tradeoff between latency (L) and consistency (C). Lower latency usually means eventual consistency.
08 Observability & AWS Cloud
HARD Prometheus + Micrometer integration in Spring Boot

Micrometer is Spring Boot's metrics facade — like SLF4J for metrics. It records metrics and exports to various backends (Prometheus, Datadog, CloudWatch).

# pom.xml
<dependency>
  <groupId>io.micrometer</groupId>
  <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>

# application.yml
management.endpoints.web.exposure.include: prometheus
management.prometheus.metrics.export.enabled: true
// Custom metric example
@Autowired MeterRegistry registry;

Counter orderCounter = Counter.builder("orders.created")
  .tag("type", "premium")
  .register(registry);
orderCounter.increment();
Prometheus scrapes /actuator/prometheus every 15s. Grafana queries Prometheus and visualizes dashboards. Set alerts on error_rate > 5% or latency_p99 > 500ms.
MED AWS services used in projects — with follow-up on RDS Replication

Commonly used: EC2 (compute), ECS/EKS (containers), RDS (managed DB), S3 (storage), SQS (queue), SNS (notifications), API Gateway, Lambda, ElastiCache (Redis), CloudWatch (logs/metrics), IAM (access), Secrets Manager.

RDS Read Replication:

  • Primary handles all writes. Read replicas async-replicate from primary.
  • Replicas can serve SELECT queries — reduce primary load.
  • Replication lag is typically <1 second but can spike under write pressure.
  • Multi-AZ = synchronous replication for failover (high availability). Read Replica = async replication for read scaling. These are different!
  • Spring: configure separate DataSource beans for read/write routing using AbstractRoutingDataSource.
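The idea behind AbstractRoutingDataSource can be shown with a stdlib-only sketch: a ThreadLocal holds the current route, and lookups resolve against it. Plain JDBC URLs (placeholders) stand in for DataSource objects; in Spring you would return this routing key from determineCurrentLookupKey():

```java
import java.util.Map;

// Read/write routing: reads go to the replica, writes to the primary.
// An aspect or interceptor typically sets the route before the query runs.
public class ReadWriteRouter {
  enum Route { READ, WRITE }

  private static final ThreadLocal<Route> CURRENT =
      ThreadLocal.withInitial(() -> Route.WRITE);  // default to primary

  private final Map<Route, String> dataSources = Map.of(
      Route.WRITE, "jdbc:postgresql://primary:5432/app",   // placeholder URL
      Route.READ,  "jdbc:postgresql://replica:5432/app");  // placeholder URL

  static void route(Route r) { CURRENT.set(r); }

  String current() { return dataSources.get(CURRENT.get()); }
}
```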
HARD Sidecar observability pattern — what is it?

The Sidecar pattern deploys a secondary container alongside your main service container in the same Kubernetes pod. The sidecar handles cross-cutting concerns so the main service doesn't have to.

Observability sidecar examples:

  • Envoy proxy: Captures all in/out traffic metrics (requests, latency, errors) transparently
  • Fluentd/Fluent Bit: Tails log files, ships to ELK/Loki — no logging SDK needed in app
  • Istio (service mesh): Injects Envoy sidecar automatically — gives you mTLS, distributed tracing, and metrics for free
This is an L2 differentiator topic. Mention: "We used Istio's sidecar injection to get service mesh observability without changing application code."
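A minimal pod spec for the log-shipping sidecar variant might look like this (image names and paths are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: order-service
spec:
  containers:
    - name: app                      # main service container
      image: order-service:1.0
      volumeMounts:
        - { name: logs, mountPath: /var/log/app }
    - name: log-shipper              # sidecar: tails logs, ships to Loki/ELK
      image: fluent/fluent-bit:2.2
      volumeMounts:
        - { name: logs, mountPath: /var/log/app, readOnly: true }
  volumes:
    - name: logs
      emptyDir: {}                   # shared between app and sidecar
```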
HARD How do you set up alerting in production — what metrics matter most for Java services?

The Four Golden Signals (Google SRE) — these are the most critical metrics for any service:

  • Latency: p50, p95, p99 response times. Alert when p99 > SLA threshold (e.g., 500ms).
  • Traffic: Requests per second. Sudden drop = possible outage. Sudden spike = potential DDoS.
  • Errors: HTTP 5xx rate, exception rate. Alert when error rate > 1% of traffic.
  • Saturation: CPU, heap usage, DB connection pool usage, Kafka consumer lag.

Java-specific alerts (Prometheus + AlertManager):

  • jvm_memory_used_bytes / jvm_memory_max_bytes > 0.85 → heap pressure
  • hikaricp_connections_pending > 5 → connection pool exhaustion
  • kafka_consumer_lag > 10000 → consumer falling behind
  • GC pause time > 500ms → GC tuning needed
# Prometheus AlertManager rule example
- alert: HighErrorRate
  expr: sum(rate(http_server_requests_seconds_count{status=~"5.."}[5m])) /
        sum(rate(http_server_requests_seconds_count[5m])) > 0.01
  for: 2m
  labels:
    severity: critical
HARD ELK stack vs Loki+Grafana — how do you choose and set up structured logging?

ELK (Elasticsearch + Logstash + Kibana): Full-text search on logs. High storage cost. Complex to operate. Best when you need powerful ad-hoc log querying, log analytics, or regex search across millions of logs.

Loki + Grafana: Label-based log querying (like Prometheus for logs). Much cheaper storage (logs stored compressed, not indexed by content). Best when you already use Grafana for metrics — single pane of glass.

Structured logging setup in Spring Boot:

# application.yml — JSON structured logs
logging.structured.format.console: ecs  # Spring Boot 3.4+

# Or with logback-spring.xml
# Output: {"@timestamp":"...","level":"INFO","traceId":"abc","message":"..."}
// Add custom fields to MDC (appears in every log line)
MDC.put("userId", userId);
MDC.put("orderId", orderId);
log.info("Order processed");  // MDC fields auto-included
MDC.clear();
Senior tip: Always log at service boundaries — incoming request (with traceId), outgoing calls, DB queries that exceed 100ms, and all exceptions with full context.
HARD AWS Lambda + API Gateway — how do you build serverless with Spring Boot?

Spring Cloud Function + AWS Lambda Adapter lets you run Spring Boot logic serverlessly. The function is the handler; API Gateway triggers it via HTTP.

// Spring Cloud Function handler
@Bean
public Function<Order, OrderConfirmation> processOrder() {
  return order -> {
    // business logic
    return new OrderConfirmation(order.getId(), "CONFIRMED");
  };
}

# application.properties
spring.cloud.function.definition=processOrder

Cold start problem: JVM startup time (2-3s) causes first Lambda invocation to be slow. Solutions:

  • Provisioned concurrency: Keep N Lambdas warm — costs money
  • GraalVM native compilation: Spring Boot 3 + native image — startup in <100ms. No JVM overhead.
  • SnapStart (Java 21): AWS takes snapshot after JVM init — restores from snapshot in ~1s
09 Database & Caching Performance
MED What is Redis and where is it used? How do you handle sensitive data?

Redis is an in-memory data structure store used as: cache, session store, rate limiter, pub/sub message broker, distributed lock.

Use cases in my project:

  • L2 cache for frequently read data (product catalog, user profile)
  • JWT blacklist for logout/token revocation
  • Rate limiting counters (API Gateway)
  • Distributed locks (Redisson) for idempotency
  • Session storage for stateless JWT auth

Sensitive data in Redis: Encrypt before storing. Use Redis AUTH password + TLS in transit. Never store raw PII — hash or encrypt it. Set TTL to minimize exposure window.
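The encrypt-before-store step can be sketched with the JDK's built-in AES-GCM support. Key management (KMS, rotation) is out of scope here, the key is generated in-process for illustration, and in production the IV must be random per write and stored alongside the ciphertext — never reused with the same key:

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.util.Base64;

// Encrypt sensitive values before caching them in Redis, using AES-GCM
// (authenticated encryption: tampering is detected on decrypt).
public class RedisCrypto {

  static SecretKey newKey() {
    try {
      KeyGenerator kg = KeyGenerator.getInstance("AES");
      kg.init(256);
      return kg.generateKey();
    } catch (Exception e) { throw new IllegalStateException(e); }
  }

  static String encrypt(SecretKey key, byte[] iv, String plain) {
    try {
      Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
      c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
      return Base64.getEncoder().encodeToString(c.doFinal(plain.getBytes()));
    } catch (Exception e) { throw new IllegalStateException(e); }
  }

  static String decrypt(SecretKey key, byte[] iv, String cipherText) {
    try {
      Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
      c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
      return new String(c.doFinal(Base64.getDecoder().decode(cipherText)));
    } catch (Exception e) { throw new IllegalStateException(e); }
  }
}
```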

MED SQL query — average salary from Employee table + ORDER BY vs GROUP BY
-- Average salary per department
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
HAVING AVG(salary) > 50000
ORDER BY avg_salary DESC;

GROUP BY: Collapses multiple rows into summary groups. Used with aggregate functions (SUM, AVG, COUNT). Executed before HAVING.

ORDER BY: Sorts the result set. Applied last — after GROUP BY, HAVING, SELECT. Does not reduce rows, just reorders them.

Execution order: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT

HARD DynamoDB — NoSQL usage and key concepts
  • Partition Key (PK): Determines which partition stores the item. Must be unique if used alone.
  • Sort Key (SK): Enables range queries within a partition. PK+SK combo must be unique.
  • GSI (Global Secondary Index): Query on non-key attributes. Has its own PK/SK.
  • Single-table design: Store multiple entity types in one table using PK/SK patterns (e.g., PK=USER#123, SK=ORDER#456).
  • Capacity modes: On-demand (pay per request) vs Provisioned (set RCU/WCU).
  • DynamoDB Streams: Change data capture — trigger Lambda on insert/update/delete.
DynamoDB is schema-less but access-pattern-driven. Model your data based on your queries, not entity relationships.
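The single-table PK/SK idea can be illustrated without the AWS SDK: a TreeMap stands in for one partition (all items sharing PK=USER#123), and a sort-key prefix scan mimics a begins_with Query. Entity names are illustrative:

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Single-table design: composite keys like PK=USER#123, SK=ORDER#456 let one
// table hold several entity types, and sort-key prefixes select one type.
public class SingleTable {
  private final SortedMap<String, String> partition = new TreeMap<>();

  void put(String sk, String item) { partition.put(sk, item); }

  // All items whose sort key starts with the prefix, e.g. "ORDER#"
  SortedMap<String, String> queryPrefix(String prefix) {
    // '\uffff' sorts after every character that can follow the prefix
    return partition.subMap(prefix, prefix + '\uffff');
  }
}
```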
HARD JPA/Hibernate N+1 problem — how to detect and fix it in production?

N+1 problem: 1 query to fetch N entities, then N additional queries to fetch their associations. 100 orders → 101 queries. Invisible in dev, catastrophic in prod under load.

// N+1 trap — LAZY fetch with loop
List<Order> orders = orderRepo.findAll();  // 1 query
orders.forEach(o -> System.out.println(o.getCustomer().getName()));  // N queries!

// Fix 1: JOIN FETCH
@Query("SELECT o FROM Order o JOIN FETCH o.customer")
List<Order> findAllWithCustomer();

// Fix 2: @EntityGraph
@EntityGraph(attributePaths = "customer")
List<Order> findAll();

// Fix 3: batch fetching (hibernate)
@BatchSize(size = 50)
private List<OrderItem> items;

Detection: hibernate.show_sql=true in dev. In prod: enable Hibernate statistics, alert when query count per request exceeds threshold. Datasource-proxy logs all queries with call stack.

HARD Cache stampede (thundering herd) — what is it and how do you prevent it?

Cache stampede: A hot cache key expires. Suddenly 1000 concurrent requests all miss the cache simultaneously, all hit the DB to rebuild it, the DB gets overwhelmed.

Prevention strategies:

  • Mutex/distributed lock: Only one thread rebuilds cache. Others wait and then read the rebuilt value.
  • Probabilistic early expiration: Randomly start refreshing cache slightly before TTL — distributes the rebuild load.
  • Stale-while-revalidate: Serve stale cache while asynchronously refreshing in background. No thundering herd, slight staleness acceptable.
  • TTL jitter: Add random offset to TTL (e.g., 300s ± 30s) so not all keys expire simultaneously.
// Mutex approach with Redisson
RLock lock = redisson.getLock("rebuild-lock:" + key);
if (lock.tryLock(0, 30, SECONDS)) {
  try {
    Object val = cache.get(key);  // double-check after lock
    if (val == null) cache.set(key, db.fetch(key), 300, SECONDS);
  } finally { lock.unlock(); }
} else { return waitAndGetCache(key); }
HARD Window functions in SQL — explain with a real-world example

Window functions perform calculations across a set of rows related to the current row — without collapsing rows like GROUP BY does.

-- Rank employees by salary within each department
SELECT
  name,
  department_id,
  salary,
  RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS dept_rank,
  LAG(salary) OVER (PARTITION BY department_id ORDER BY salary DESC) AS prev_salary,
  SUM(salary) OVER (PARTITION BY department_id) AS dept_total,
  salary * 100.0 / SUM(salary) OVER (PARTITION BY department_id) AS pct_of_dept
FROM employees;

-- Running total (cumulative sum)
SELECT order_date, amount,
  SUM(amount) OVER (ORDER BY order_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
  AS running_total
FROM orders;

Common window functions: ROW_NUMBER() (unique rank), RANK() (ties get same rank, gap after), DENSE_RANK() (no gap), LAG/LEAD (previous/next row value), NTILE() (divide into buckets).

10 L2 Advanced Round Senior Level
HARD Spring Boot auto-configuration — how does it work internally?

Auto-configuration is the magic behind Spring Boot. How it works:

  • @SpringBootApplication includes @EnableAutoConfiguration
  • Spring scans META-INF/spring/org.springframework.boot.autoconfigure.AutoConfiguration.imports (Boot 3.x) for all auto-configuration classes
  • Each class is annotated with @ConditionalOnClass, @ConditionalOnMissingBean, etc.
  • Only conditions that evaluate to true result in beans being registered
// Example: How DataSource auto-config works
@Configuration
@ConditionalOnClass(DataSource.class)           // only if JDBC on classpath
@ConditionalOnMissingBean(DataSource.class)       // only if no custom bean
@EnableConfigurationProperties(DataSourceProperties.class)
public class DataSourceAutoConfiguration { ... }
To debug: run with --debug flag. Spring prints "Positive matches" and "Negative matches" for all auto-configuration classes.
MED Pull Request — maximum recommended lines of change?

Industry standard: 200–400 lines per PR. Research shows review quality degrades sharply above 400 lines — reviewers stop carefully reading and just approve.

Best practices:

  • One PR = one concern (single feature, single bug fix)
  • Large features = stacked PRs (feature branch → intermediate → main)
  • Generated code (migrations, DTOs) can exceed limits — annotate clearly
  • Include: what changed, why, how to test, screenshots if UI
HARD Design patterns used in production and reasoning behind them
  • Saga Pattern: Manage distributed transactions across microservices. Each step publishes an event; compensating transactions on failure.
  • Outbox Pattern: Atomically write to DB + publish event. Avoid dual-write problem. Write to outbox table in same transaction; relay reads and publishes to Kafka.
  • Circuit Breaker (Resilience4j): Stop calling a failing service. States: CLOSED → OPEN (on failure threshold) → HALF_OPEN (test recovery).
  • Builder: Complex object creation (RequestDTO, configuration objects).
  • Strategy: Payment processing — different strategies for UPI/card/wallet without if-else chains.
  • Factory: Notification service — EmailNotification vs SMSNotification via NotificationFactory.
  • Decorator: Layered caching — CachingOrderRepository wraps JpaOrderRepository.
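The Strategy bullet for payments can be sketched as a map of implementations keyed by payment method — no if-else chain. Method names and the card-fee logic are illustrative:

```java
import java.util.Map;

// Strategy: each payment method is its own implementation, looked up by key.
// Adding a new method means adding a strategy, not editing a conditional.
public class Payments {
  interface PaymentStrategy { String pay(int amount); }

  private final Map<String, PaymentStrategy> strategies =
      Map.<String, PaymentStrategy>of(
          "UPI",    amount -> "UPI paid " + amount,
          "CARD",   amount -> "CARD paid " + (amount + 2),  // fee (illustrative)
          "WALLET", amount -> "WALLET paid " + amount);

  String pay(String method, int amount) {
    PaymentStrategy s = strategies.get(method);
    if (s == null) throw new IllegalArgumentException("Unknown method: " + method);
    return s.pay(amount);
  }
}
```

In Spring, the map is typically built by injecting all PaymentStrategy beans, so new strategies register themselves.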
HARD How do you migrate Java and Spring Boot versions?

Spring Boot 2.x → 3.x migration steps:

  • Upgrade to Java 17+ first (Spring Boot 3 requires minimum Java 17)
  • Use OpenRewrite migration recipes: mvn rewrite:run -Drewrite.activeRecipes=org.openrewrite.java.spring.boot3.UpgradeSpringBoot_3_0
  • javax.* → jakarta.* package rename (biggest breaking change)
  • Spring Security config: SecurityFilterChain bean replaces WebSecurityConfigurerAdapter
  • Actuator endpoint changes — verify /actuator paths
  • Test with feature flags enabled on a canary environment first
OpenRewrite is the key tool here — automates ~80% of the mechanical changes. Mention this in interviews to show production awareness.
HARD JUnit + Mockito — which class do you mock in Controller tests?
// Controller layer: mock the Service, not the Repository
@WebMvcTest(OrderController.class)  // loads only web layer
class OrderControllerTest {

  @Autowired MockMvc mockMvc;
  @MockBean OrderService orderService;  // mock the service

  @Test
  void shouldCreateOrder() throws Exception {
    Order mockOrder = new Order(1L, "PENDING");
    given(orderService.create(any())).willReturn(mockOrder);

    mockMvc.perform(post("/api/orders")
        .contentType(APPLICATION_JSON)
        .content("{\"amount\": 500}"))
      .andExpect(status().isCreated())
      .andExpect(jsonPath("$.status").value("PENDING"));
  }
}
HARD How do you write integration tests for Spring Boot — TestContainers approach?

Integration tests spin up real infrastructure (DB, Redis, Kafka) using TestContainers. Docker containers start before tests, torn down after. Tests run against real implementations — not mocks.

@SpringBootTest
@Testcontainers
class OrderServiceIntegrationTest {

  @Container
  static PostgreSQLContainer<?> postgres =
    new PostgreSQLContainer<>("postgres:15");

  @Container
  static GenericContainer<?> redis =
    new GenericContainer<>("redis:7").withExposedPorts(6379);

  @DynamicPropertySource
  static void configureProperties(DynamicPropertyRegistry registry) {
    registry.add("spring.datasource.url", postgres::getJdbcUrl);
    registry.add("spring.data.redis.host", redis::getHost);
    registry.add("spring.data.redis.port", redis::getFirstMappedPort);
  }

  @Test
  void shouldCreateOrderAndPersistToDb() {
    // test against real Postgres + Redis
  }
}
TestContainers + Spring Boot 3.1+ has @ServiceConnection — auto-wires container properties without @DynamicPropertySource boilerplate.
HARD How do you handle backward compatibility in REST APIs across multiple consumers?

Breaking API changes in a microservices world hurt dependent consumers. Strategies to maintain backward compatibility:

  • URI versioning: /api/v1/orders vs /api/v2/orders. Simple, explicit. Old clients continue working on v1.
  • Header versioning: Accept: application/vnd.myapp.v2+json. Cleaner URLs but harder to test in browser.
  • Additive changes only: New fields are added, never removed. Consumers ignore unknown fields (Jackson default).
  • Deprecation header: Sunset: Sat, 31 Dec 2025 23:59:59 GMT — tells consumers when v1 will be removed.
  • Consumer-driven contracts (Pact): Each consumer defines a contract (expected request/response). Provider runs Pact tests to verify it still satisfies all contracts before release.
// Jackson — add new field safely (no breaking change)
public class OrderResponse {
  private Long id;
  private String status;
  @JsonInclude(NON_NULL)
  private String trackingCode;  // new field — old clients ignore it
}
HARD Resilience4j — Circuit Breaker, Retry, Rate Limiter, Bulkhead patterns with code

Resilience4j provides fault tolerance primitives for microservices. All composable via annotations.

// Circuit Breaker — stop calling failing service
@CircuitBreaker(name = "paymentService", fallbackMethod = "paymentFallback")
@Retry(name = "paymentService")
public PaymentResult processPayment(PaymentRequest req) {
  return paymentClient.pay(req);
}

public PaymentResult paymentFallback(PaymentRequest req, Exception ex) {
  log.warn("Payment service down, returning cached result", ex);
  return PaymentResult.pending(req.getOrderId());
}
# application.yml — CB config
resilience4j.circuitbreaker.instances.paymentService:
  slidingWindowSize: 10
  failureRateThreshold: 50       # open after 50% failures
  waitDurationInOpenState: 10s  # wait before half-open
  permittedNumberOfCallsInHalfOpenState: 3

resilience4j.retry.instances.paymentService:
  maxAttempts: 3
  waitDuration: 500ms
  retryExceptions: [java.io.IOException, java.util.concurrent.TimeoutException]

Bulkhead: Limit concurrent calls to a service — prevents one slow dependency from exhausting all threads and taking down everything else.

HARD How do you do performance testing and profiling of a Java application?

Load testing tools:

  • JMeter: GUI and CLI based. HTTP load testing. Thread groups simulate concurrent users. Assertions on response time.
  • Gatling: Scala DSL, CI-friendly, beautiful HTML reports. Better for complex scenarios.
  • k6: JS scripting, cloud-native, excellent for microservices API testing.

Profiling in production (low overhead):

  • Async Profiler: CPU + memory sampling. Can attach to running JVM. Output: flamegraph. Zero instrumentation overhead.
  • JFR (Java Flight Recorder): Built into the JVM; free in OpenJDK since JDK 11 (commercial-only in earlier Oracle JDKs). Continuous profiling with <1% overhead. View in JDK Mission Control.
  • VisualVM / JConsole: Heap, threads, GC in dev/staging. Too invasive for prod.
# Attach async profiler to running JVM
./profiler.sh -d 30 -f flamegraph.html <pid>

# Enable JFR in production JVM args
-XX:StartFlightRecording=duration=60s,filename=recording.jfr
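JFR can also be started programmatically via the jdk.jfr API — useful for capturing a recording around one specific suspicious operation. A minimal sketch (the enabled event is just an example):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import jdk.jfr.Recording;

public class JfrDemo {
    public static void main(String[] args) throws Exception {
        try (Recording recording = new Recording()) {
            recording.enable("jdk.JavaMonitorWait");  // pick the events you care about
            recording.start();

            // ... the workload you want to profile ...
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) sum += i;

            recording.stop();
            Path out = Files.createTempFile("recording", ".jfr");
            recording.dump(out);  // open this file in JDK Mission Control
            System.out.println("dumped " + Files.size(out) + " bytes");
        }
    }
}
```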
11 Spring AOP & Transactions Deep Spring
HARD How does Spring AOP work internally — JDK proxy vs CGLIB?

Spring AOP intercepts method calls by wrapping beans in proxies at startup. Two proxy types:

  • JDK Dynamic Proxy: Bean must implement an interface. Proxy implements the same interface. Intercepts interface method calls.
  • CGLIB Proxy: Subclasses the target class at runtime. Spring Boot defaults to CGLIB even when an interface exists (spring.aop.proxy-target-class=true). Cannot proxy final classes, final methods, or private methods.
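The JDK mechanism Spring builds on is plain java.lang.reflect. A minimal sketch of interface-based interception — the "advice" here is just println, where Spring would run transaction or cache logic (interface and names are made up for illustration):

```java
import java.lang.reflect.Proxy;

interface PaymentClient { String pay(String orderId); }

public class JdkProxyDemo {
    public static void main(String[] args) {
        PaymentClient target = orderId -> "paid:" + orderId;

        // Spring does the equivalent of this at startup for every advised bean
        PaymentClient proxy = (PaymentClient) Proxy.newProxyInstance(
                PaymentClient.class.getClassLoader(),
                new Class<?>[] { PaymentClient.class },
                (p, method, methodArgs) -> {
                    System.out.println("before " + method.getName());  // advice: e.g. begin tx
                    Object result = method.invoke(target, methodArgs);
                    System.out.println("after " + method.getName());   // advice: e.g. commit tx
                    return result;
                });

        System.out.println(proxy.pay("42"));  // call goes through the handler
    }
}
```

Because the advice lives in the proxy, a call that never passes through the proxy (self-invocation via `this`) never triggers it — which is exactly the trap below.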
// Self-invocation bug — THIS IS THE #1 AOP TRAP
@Service
public class OrderService {

  @Transactional
  public void createOrder(Order o) {
    saveOrder(o);
    sendNotification(o);  // calls internal method
  }

  @Transactional(propagation = REQUIRES_NEW)
  public void sendNotification(Order o) {
    // THIS @Transactional is IGNORED!
    // Because: this.sendNotification() bypasses the proxy
  }
}

// Fix: inject self reference or extract to separate bean
@Autowired private OrderService self;  // inject proxy
self.sendNotification(o);  // goes through proxy → AOP applies
Critical interview point: Any @Transactional, @Cacheable, @Async, or custom @Aspect annotation is SILENTLY IGNORED on internal (self) method calls. This is the most common Spring bug in production.
HARD @Transactional deep dive — propagation levels, isolation levels, common traps

Propagation levels (most important):

  • REQUIRED (default): Join existing tx if present, else create new one.
  • REQUIRES_NEW: Always create a new tx. Suspend existing one. Used for audit logging — must persist even if outer tx rolls back.
  • SUPPORTS: Join tx if exists, else run non-transactionally.
  • NOT_SUPPORTED: Suspend active tx, run non-transactionally. For read-only operations on unreliable resources.
  • NEVER: Throw exception if a transaction exists.
  • NESTED: Savepoint within existing tx. Partial rollback possible.

Isolation levels:

  • READ_COMMITTED (Postgres default): Cannot read uncommitted data. Prevents dirty reads.
  • REPEATABLE_READ: Same row read twice = same value. Prevents non-repeatable reads.
  • SERIALIZABLE: Full isolation. Prevents phantom reads. Highest contention.
// Common trap: @Transactional only works on public methods
@Transactional
protected void internalMethod() { }  // IGNORED — not public!
// (Spring Framework 6 can apply it to protected methods on class-based proxies,
//  but public remains the portable rule)

// Common trap: RuntimeException rolls back, checked exception does NOT
@Transactional
public void process() throws IOException {
  // IOException thrown → transaction COMMITS (not rolled back!)
}
// Fix:
@Transactional(rollbackFor = IOException.class)
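Spring's default rollback decision is essentially a type check on the thrown exception. A simplified sketch of that rule (not Spring's actual code — `rollbackFor`/`noRollbackFor` override it):

```java
import java.io.IOException;

public class RollbackRuleDemo {
    // Simplified version of Spring's default rule: roll back on
    // RuntimeException or Error, commit on checked exceptions.
    static boolean rollsBackByDefault(Throwable ex) {
        return (ex instanceof RuntimeException) || (ex instanceof Error);
    }

    public static void main(String[] args) {
        System.out.println(rollsBackByDefault(new IllegalStateException("boom")));  // true
        System.out.println(rollsBackByDefault(new IOException("disk full")));       // false
    }
}
```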
HARD Write a custom Spring AOP aspect — logging, timing, retry example
@Aspect
@Component
public class PerformanceAspect {

  // Pointcut: all methods in service layer
  @Around("execution(* com.app.service.*.*(..))")
  public Object logExecutionTime(ProceedingJoinPoint pjp) throws Throwable {
    long start = System.currentTimeMillis();
    String method = pjp.getSignature().toShortString();
    try {
      Object result = pjp.proceed();
      long elapsed = System.currentTimeMillis() - start;
      if (elapsed > 500) log.warn("SLOW method {} took {}ms", method, elapsed);
      return result;
    } catch (Exception e) {
      log.error("Exception in {} : {}", method, e.getMessage());
      throw e;
    }
  }
}

// Custom annotation-based pointcut
@Target(METHOD)
@Retention(RUNTIME)
public @interface Audited {}

@Around("@annotation(com.app.Audited)")
public Object audit(ProceedingJoinPoint pjp) throws Throwable {
  auditLog.record(pjp.getSignature().getName(), getCurrentUser());
  return pjp.proceed();
}
Aspect execution order: Use @Order(1) — lower number = higher priority (runs first on the way in, last on the way out). Transaction aspect has a default order — your custom aspects can wrap around it.
HARD Spring Bean lifecycle — how are beans created, initialized and destroyed?

Full Spring Bean lifecycle in order:

  • 1. Instantiation: Constructor called (or factory method)
  • 2. Dependency injection: @Autowired fields/setters injected
  • 3. BeanNameAware / BeanFactoryAware: Aware callbacks if implemented
  • 4. BeanPostProcessor.postProcessBeforeInitialization(): All registered BPPs run
  • 5. @PostConstruct / InitializingBean.afterPropertiesSet(): Init logic
  • 6. BeanPostProcessor.postProcessAfterInitialization(): Final processing (this is where AOP proxies are created)
  • 7. Bean is ready for use
  • 8. @PreDestroy / DisposableBean.destroy(): On context close
@Component
public class CacheLoader {
  @PostConstruct
  public void init() {
    // runs after all dependencies injected — safe to use @Autowired fields
    loadInitialCacheData();
  }

  @PreDestroy
  public void cleanup() {
    // runs on application shutdown — close connections, flush data
    cache.clear();
  }
}
12 React & Full-Stack Full-Stack
MED React hooks — explain useState, useEffect, useCallback, useMemo and when to use each
  • useState: Local component state. Re-renders component on state change.
  • useEffect: Side effects (API calls, subscriptions, timers). Runs after render. Cleanup function handles unmount.
  • useCallback: Memoize a function reference. Prevents child re-renders when parent re-renders but function hasn't changed.
  • useMemo: Memoize an expensive computed value. Only recomputes when dependencies change.
  • useRef: Mutable ref that doesn't trigger re-render. DOM access or storing interval IDs.
  • useContext: Consume React context without prop drilling.
// useEffect with cleanup
useEffect(() => {
  const sub = webSocket.subscribe(userId, setMessages);
  return () => sub.unsubscribe();  // cleanup on unmount
}, [userId]);  // re-run only when userId changes

// useCallback — stable function reference for child
const handleSubmit = useCallback((data) => {
  submitOrder(data);
}, [submitOrder]);  // only recreate if submitOrder changes
HARD How do you optimize React performance — what causes unnecessary re-renders?

React re-renders a component when its state, props, or context changes. But it also re-renders ALL children by default — even if their props didn't change.

Causes of unnecessary re-renders:

  • Inline object/array props: style={{color:'red'}} creates a new object every render
  • Inline callback props: onClick={() => doSomething()} — new function reference every render
  • Context: any context update re-renders all consumers, even if relevant value didn't change
  • Parent re-render cascading to all children unconditionally

Fixes:

  • React.memo: Wrap child component — only re-render if props actually changed
  • useCallback: Stable function references passed as props
  • useMemo: Stable object/array references passed as props
  • Context split: Separate frequently-changing context from stable context
  • React DevTools Profiler: Identify which components re-render and why
HARD How do you handle state management in large React apps — Context vs Redux vs Zustand?
  • React Context: Good for low-frequency global state (theme, auth user, locale). NOT good for high-frequency updates — any context update re-renders all consumers.
  • Redux Toolkit: Industry standard for complex global state with many slices, time-travel debugging, middleware (RTK Query for API caching). Verbose but powerful.
  • Zustand: Lightweight (~1KB). Simple API. No providers needed. Selective subscription — components only re-render when their specific slice changes. Great for medium complexity apps.
  • React Query / TanStack Query: Server state management (API calls, caching, background refetching). Not a replacement for client state.
// Zustand — simple global store
const useOrderStore = create((set) => ({
  orders: [],
  addOrder: (order) => set((state) => ({
    orders: [...state.orders, order]
  })),
}));

// Component — only re-renders when orders changes
const orders = useOrderStore((state) => state.orders);
HARD CORS — what is it, why does it happen, and how do you configure it in Spring Boot?

CORS (Cross-Origin Resource Sharing): The browser's same-origin policy blocks JS from reading responses from a different origin (scheme, domain, or port) than the page itself; CORS response headers let the server opt specific origins back in. This is enforced by the browser — not the server or the network.

When React app at localhost:3000 calls Spring Boot at localhost:8080 — different ports = different origin → CORS error.

// Spring Boot 3 — global CORS config
@Bean
public WebMvcConfigurer corsConfigurer() {
  return new WebMvcConfigurer() {
    @Override
    public void addCorsMappings(CorsRegistry registry) {
      registry.addMapping("/api/**")
        .allowedOrigins("https://myapp.com", "http://localhost:3000")
        .allowedMethods("GET", "POST", "PUT", "DELETE")
        .allowedHeaders("*")
        .allowCredentials(true)
        .maxAge(3600);
    }
  };
}

// Or per-controller
@CrossOrigin(origins = "https://myapp.com")
@RestController
public class OrderController { ... }
Preflight request: For non-simple requests (custom headers like Authorization, or a JSON Content-Type), the browser sends an OPTIONS request before the actual POST/PUT to check if CORS is allowed. Spring Security must also permit OPTIONS requests, or the JWT filter will reject the preflight — which never carries the token.
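The preflight handshake can be seen end to end with only the JDK's built-in HTTP server and client — a small sketch (the endpoint path and origin are made up for illustration):

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class PreflightDemo {
    public static void main(String[] args) throws Exception {
        // Stand-in for the backend: answers the OPTIONS preflight with CORS headers
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/api/orders", exchange -> {
            if ("OPTIONS".equals(exchange.getRequestMethod())) {
                exchange.getResponseHeaders().add("Access-Control-Allow-Origin", "http://localhost:3000");
                exchange.getResponseHeaders().add("Access-Control-Allow-Methods", "GET, POST, PUT, DELETE");
                exchange.sendResponseHeaders(204, -1);  // preflight: headers only, no body
            } else {
                exchange.sendResponseHeaders(200, -1);
            }
            exchange.close();
        });
        server.start();

        // What the browser sends before the real POST
        HttpRequest preflight = HttpRequest.newBuilder(
                URI.create("http://localhost:" + server.getAddress().getPort() + "/api/orders"))
            .method("OPTIONS", HttpRequest.BodyPublishers.noBody())
            .header("Origin", "http://localhost:3000")
            .header("Access-Control-Request-Method", "POST")
            .build();
        HttpResponse<Void> resp = HttpClient.newHttpClient()
            .send(preflight, HttpResponse.BodyHandlers.discarding());

        System.out.println("Allow-Origin: "
            + resp.headers().firstValue("Access-Control-Allow-Origin").orElse("(blocked)"));
        server.stop(0);
    }
}
```

If a security filter rejected the OPTIONS request, the 204 above would become a 401 and the browser would surface it as a CORS error.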