Load Balancing & Reverse Proxy — System Design Deep Dive

⚡ System Design · Week 1

Load Balancing
& Reverse Proxy

Complete production guide: L4/L7 internals, Spring Cloud Gateway routing, Nginx upstream configs, AWS ALB target groups, circuit breakers, health checks, and zero-downtime deploy strategies — with real Spring Boot examples at 10 YOE depth.

Spring Cloud Gateway Resilience4j AWS ALB Nginx Consul Kubernetes Docker Kafka JWT Redis Session

Deep Sections

40+

Code Snippets

10 YOE

Depth Level

∞

Production Focus

🌐

L4 vs L7 Load Balancing — OSI Model Deep Dive

Transport layer vs Application layer — what the difference actually means for your Spring Boot services

Foundation

1.1 What is Load Balancing? The full picture from TCP to HTTP Core

A load balancer is a system that distributes incoming network traffic across multiple backend servers (called an upstream pool or target group). The goal is threefold: eliminate single points of failure, scale horizontally, and reduce latency by routing to the healthiest/nearest instance.

The key architectural insight: a load balancer sits between clients and your application. All traffic flows through it. This makes it the ideal enforcement point for: rate limiting, SSL termination, authentication header injection, request tracing, blue-green routing, and canary deployments.

┌─────────────────────────────┐ │ CLIENT REQUEST │ │ GET /api/orders HTTP/1.1 │ │ Host: api.myapp.com │ └──────────────┬──────────────┘ │ ▼ ┌──────────────────────────────────────────────────────┐ │ LOAD BALANCER / REVERSE PROXY │ │ │ │ L4: Routes by TCP/UDP (IP:Port) — no payload read │ │ L7: Routes by HTTP headers, path, cookies, body │ │ │ │ → SSL Termination → Rate Limiting │ │ → Health Checks → Circuit Breaking │ │ → Auth Headers → Access Logging │ └──────────────┬────────────────┬────────────────┘ │ │ ▼ ▼ ┌──────────────────┐ ┌──────────────────┐ │ order-service │ │ order-service │ │ :8080 (pod-1) │ │ :8080 (pod-2) │ │ ✅ healthy │ │ ✅ healthy │ └──────────────────┘ └──────────────────┘

📐 Mental Model Think of a load balancer as a router that can read the mail. L4 only sees the envelope (IP + port). L7 opens the envelope and reads the letter (HTTP headers, path, cookies). Reading the letter takes more CPU but enables intelligent routing — like sending all /admin/* requests to a dedicated admin service, or routing a specific user to a specific pod for session affinity.

1.2 L4 Load Balancing — TCP/UDP Layer, How It Actually Works Core

L4 operates at the Transport Layer (TCP/UDP). It makes routing decisions based on IP address and port number only. It never looks at the payload — it doesn't know if the traffic is HTTP, HTTPS, Kafka, or a database protocol. This makes it extremely fast and protocol-agnostic.

How it works under the hood: The L4 LB receives a TCP SYN packet, picks a backend based on the algorithm (usually consistent hashing on src IP), rewrites the destination IP/port (NAT), and forwards. The backend responds directly through the LB (two-arm) or directly back to client via DSR (Direct Server Return — one-arm, more complex).

Key characteristics:

No SSL termination by default — the encrypted stream is just bytes to L4. TLS terminates at your Spring Boot app.
No header inspection — cannot route based on URL path or cookies
Connection-level stickiness — a given TCP connection always goes to the same backend, but a new connection may go elsewhere
Much lower latency — no HTTP parsing overhead, ~microseconds vs milliseconds
Works for any protocol — Kafka, Redis, MySQL, gRPC, WebSocket all route transparently

✅ Use L4 When • Routing non-HTTP protocols (Kafka, Redis, Postgres)
• You need sub-millisecond routing overhead
• Your app handles TLS internally (mTLS, end-to-end encryption)
• You're doing TCP port-based routing (port 9092 → Kafka cluster)
• High throughput raw data pipelines

⚠️ L4 Limitations • Cannot route based on URL path, headers, or cookies
• No HTTP-aware health checks (only TCP connect check)
• Cannot do HTTP redirect (301/302)
• Cannot inject headers like X-Real-IP
• Cannot do blue-green path routing

AWS NLB — L4 Terraform Config

# AWS Network Load Balancer — L4 (TCP)
resource "aws_lb" "kafka_nlb" {
  name               = "kafka-nlb"
  internal           = true
  load_balancer_type = "network"  # L4
  subnets            = var.private_subnet_ids
  enable_cross_zone_load_balancing = true
}

resource "aws_lb_target_group" "kafka_tg" {
  name     = "kafka-brokers"
  port     = 9092
  protocol = "TCP"  # L4 — no HTTP here
  vpc_id   = var.vpc_id

  health_check {
    protocol            = "TCP"   # L4 health check — just TCP connect
    port                = "traffic-port"
    healthy_threshold   = 2
    unhealthy_threshold = 2
    interval            = 10
  }
}

resource "aws_lb_listener" "kafka" {
  load_balancer_arn = aws_lb.kafka_nlb.arn
  port              = "9092"
  protocol          = "TCP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.kafka_tg.arn
  }
}

1.3 L7 Load Balancing — HTTP/HTTPS Layer, Content-Based Routing Advanced

L7 operates at the Application Layer (HTTP/HTTPS). It fully terminates the connection, parses the HTTP request, makes a routing decision based on the content, and establishes a new connection to the backend. This is your Spring Cloud Gateway, AWS ALB, Nginx (in HTTP mode).

What L7 can inspect and route on:

Host header — route api.myapp.com vs admin.myapp.com to different backends
URL path — /api/v1/orders/** → order-service, /api/v1/users/** → user-service
HTTP method — GET vs POST to different handlers
Request headers — X-API-Version: 2 → v2 service, Accept: application/json routing
Cookies — JSESSIONID for sticky sessions, canary=true for canary deployments
Query parameters — ?version=beta routing
JWT claims — route based on user role extracted from Bearer token (Spring Cloud Gateway)

SSL Termination: The L7 LB handles TLS handshake. Your backend services talk plain HTTP, which simplifies certificate management enormously. You have one cert at the LB, not on every pod. AWS ACM provides free auto-rotating certificates for ALB.

CLIENT (HTTPS) L7 LOAD BALANCER BACKENDS (HTTP) ───────────── ──────────────── ─────────────── GET /api/orders ─────TLS─────► SSL TERMINATION ─────HTTP────► order-service GET /api/users ─────TLS─────► PATH MATCH ─────HTTP────► user-service GET /admin/stats ─────TLS─────► HOST MATCH ─────HTTP────► admin-service GET /api/v2/* ─────TLS─────► HEADER MATCH ─────HTTP────► v2-service ┌─────────────────────────────────────────┐ │ L7 can ALSO inject headers: │ │ X-Real-IP: 203.0.113.42 │ │ X-Forwarded-For: 203.0.113.42 │ │ X-Request-ID: uuid-1234-... │ │ X-User-Role: ADMIN (from JWT) │ └─────────────────────────────────────────┘

🏗️ Key Interview Point: L7 Connection Pooling L7 LBs maintain two separate connection pools: one facing clients and one facing backends. This enables connection multiplexing — a single persistent HTTP/2 connection from the LB to the backend can serve thousands of short-lived client requests. This massively reduces connection overhead on your Spring Boot services. AWS ALB uses this. In HTTP/2 mode, Spring Cloud Gateway also multiplexes via Reactor Netty.

# application.yml — Spring Cloud Gateway (L7 content-based routing)
spring:
  cloud:
    gateway:
      routes:
        # Route 1: Path-based routing to order-service
        - id: order-service-route
          uri: lb://order-service          # lb:// = Spring Cloud LB discovery
          predicates:
            - Path=/api/v1/orders/**
            - Method=GET,POST,PUT,DELETE
          filters:
            - StripPrefix=2                  # strip /api/v1 before forwarding
            - name: RequestRateLimiter
              args:
                redis-rate-limiter.replenishRate: 100
                redis-rate-limiter.burstCapacity: 200

        # Route 2: Header-based version routing (canary)
        - id: order-service-v2
          uri: lb://order-service-v2
          predicates:
            - Path=/api/orders/**
            - Header=X-API-Version, 2      # only if header matches

        # Route 3: Host-based routing (multi-tenant)
        - id: admin-route
          uri: lb://admin-service
          predicates:
            - Host=admin.myapp.com
            - Path=/api/**

        # Route 4: Cookie-based routing (sticky / A-B test)
        - id: beta-route
          uri: lb://order-service-beta
          predicates:
            - Cookie=X-Beta, true           # route cookie=true to beta
            - Path=/api/orders/**

        # Route 5: Weight-based routing (blue-green / canary %)
        - id: orders-green
          uri: lb://order-service-green
          predicates:
            - Path=/api/orders/**
            - Weight=orders-group, 90      # 90% traffic to green
        - id: orders-blue
          uri: lb://order-service-blue
          predicates:
            - Path=/api/orders/**
            - Weight=orders-group, 10      # 10% traffic to blue (canary)

      globalcors:
        add-to-simple-url-handler-mapping: true
        corsConfigurations:
          '[/**]':
            allowedOrigins: ["https://myapp.com"]
            allowedMethods: ["GET", "POST", "PUT", "DELETE", "OPTIONS"]

⚖️

Load Balancing Algorithms

Round-robin, least connections, IP hash, weighted — when to use each in your microservices

Critical

2.1 All 6 Algorithms Explained — With Spring Boot Context Advanced

Algorithm	How It Works	Best For	Avoid When	Spring/AWS
Round Robin	Cyclically distribute each request to next server in pool	Stateless services with uniform request cost (REST APIs returning JSON)	Requests have vastly different processing times (some requests hit heavy DB queries)	Default in Spring Cloud LB; ALB default
Weighted Round Robin	Round-robin but servers with higher weight get proportionally more requests	Mixed instance types (some pods on larger EC2 instances)	You don't know relative capacity ahead of time	Nginx upstream weight param; ALB target weight
Least Connections	New request goes to server with fewest active connections	Long-running requests, heavy compute, WebSocket endpoints	Very short-lived requests — overhead of tracking isn't worth it	Nginx least_conn; Not default on ALB
Least Response Time	Routes to server with lowest active connections AND lowest average response time	Mixed workloads; when some backends are slower (cold JVM, GC pause)	High-throughput APIs where measurement overhead matters	HAProxy, Nginx Plus; Spring custom LoadBalancer
IP Hash / Consistent Hash	Hash client IP to deterministically pick server — same IP always goes to same server	In-memory session, when you can't use distributed session (Redis)	Behind NAT (all users share same IP); when servers scale up/down frequently	Nginx ip_hash; NLB by default
Random with Two Choices	Pick 2 random servers, send to the one with fewer connections (Power of Two Choices)	Very large pools where tracking all servers is expensive; distributed systems	Small pools (2–3 servers) where randomness hurts more than helps	Envoy Proxy default; Netflix Ribbon option

💡 Senior Interview Point: Why Round-Robin Fails for Spring Boot Services Round-robin distributes request count equally, not load equally. If your POST /orders takes 500ms and your GET /orders/123 takes 5ms, round-robin sends them equally to all pods — but a pod that got 20 POSTs will be saturated while a pod with 20 GETs is idle. Least connections + request rate limiting is the safer default for mixed-workload APIs. For your Snowflake sync service which does heavy batch queries, definitely use least-connections.

/**
 * Custom LoadBalancer: Least Response Time strategy
 * Use when your order-service has mixed fast (GET) and slow (POST/batch) endpoints
 */
@Component
public class LeastResponseTimeLoadBalancer implements ReactorServiceInstanceLoadBalancer {

    private final ObjectProvider<ServiceInstanceListSupplier> supplierProvider;
    private final ConcurrentHashMap<String, AtomicLong> responseTimes = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, AtomicInteger> activeConnections = new ConcurrentHashMap<>();

    @Override
    public Mono<Response<ServiceInstance>> choose(Request request) {
        return supplierProvider.getIfAvailable()
            .get()
            .next()
            .map(instances -> chooseInstance(instances));
    }

    private Response<ServiceInstance> chooseInstance(List<ServiceInstance> instances) {
        if (instances.isEmpty()) return new EmptyResponse<>();

        // Score = activeConnections * avgResponseTime (lower = better)
        ServiceInstance best = instances.stream()
            .min(Comparator.comparingLong(i -> {
                String key = i.getHost() + ":" + i.getPort();
                long conns = activeConnections.getOrDefault(key, new AtomicInteger()).get();
                long avgMs  = responseTimes.getOrDefault(key, new AtomicLong(1)).get();
                return conns * avgMs;
            }))
            .orElse(instances.get(0));

        return new DefaultResponse<>(best);
    }

    // Call this from a Gateway filter to record actual response times
    public void recordResponseTime(String instanceKey, long responseTimeMs) {
        responseTimes.compute(instanceKey, (k, existing) -> {
            if (existing == null) return new AtomicLong(responseTimeMs);
            // Exponential moving average: α=0.3
            long ema = (long) (0.3 * responseTimeMs + 0.7 * existing.get());
            existing.set(ema);
            return existing;
        });
    }
}

// Register the custom LoadBalancer
@Configuration
@LoadBalancerClient(name = "order-service", configuration = LeastResponseTimeConfig.class)
public class GatewayConfig {}

@Configuration
public class LeastResponseTimeConfig {
    @Bean
    public ReactorLoadBalancer<ServiceInstance> reactorServiceInstanceLoadBalancer(
            Environment env,
            LoadBalancerClientFactory factory) {
        String name = env.getProperty(LoadBalancerClientFactory.PROPERTY_NAME);
        return new LeastResponseTimeLoadBalancer(
            factory.getLazyProvider(name, ServiceInstanceListSupplier.class));
    }
}

📌

Sticky Sessions — Implementation, Problems & Redis Alternative

How sticky sessions work, when they break, and the right way with Spring Session + Redis

Critical

3.1 How Sticky Sessions Work — Cookie vs IP-Hash Approach Advanced

Sticky sessions (also called session persistence or session affinity) ensure that a client always reaches the same backend instance for the duration of their session. This is necessary when session state is stored in-memory on the application server rather than in a distributed store.

Two mechanisms:

Cookie-based (L7 only): The LB injects a cookie (e.g., AWSALB, SERVERID) on the first request. Subsequent requests carry this cookie. The LB reads the cookie and routes to the same backend. Works across NAT/proxies. Requires L7.
IP Hash (L4 or L7): Hash the client IP to pick a server. Same IP always same server. Breaks when users are behind a corporate NAT (thousands of users appear as one IP → all hit one server). Also breaks when you scale up/down (consistent hashing helps but doesn't eliminate the problem).

FIRST REQUEST (no cookie) SUBSEQUENT REQUEST (with cookie) ───────────────────────── ──────────────────────────────── Client → LB Client → LB GET /dashboard GET /dashboard (no AWSALB cookie) Cookie: AWSALB=abc123def456 │ │ ▼ ▼ LB picks pod-2 via round-robin LB reads AWSALB cookie │ → decodes → pod-2 IP ▼ │ pod-2 creates session ▼ Session ID: sess-xyz LB forwards to pod-2 (ALWAYS) User data: {userId: 42} │ │ ▼ ▼ pod-2 session still valid LB injects Set-Cookie header: User sees their data ✅ Set-Cookie: AWSALB=abc123def456; Path=/ (encrypted pod-2 identifier)

❌ Why Sticky Sessions Are Dangerous at Scale Pod crash = session loss: If pod-2 crashes, all sticky-session users lose their session. They get routed to a different pod which has no session data → forced logout or error.

Uneven load distribution: If pod-2 gets all heavy users (long sessions), it gets overloaded while pod-1 is idle. You can't rebalance without dropping sessions.

Auto-scaling incompatible: When you scale down, you can't gracefully terminate an instance if sticky-session users are still active on it. You either wait indefinitely or drop their sessions.

The fix: use Spring Session + Redis — session state lives outside the application, any pod can serve any user.

/* ─── pom.xml dependencies ─── */
// spring-session-data-redis
// spring-boot-starter-data-redis
// spring-boot-starter-security (optional)

/* ─── application.yml ─── */
spring:
  session:
    store-type: redis                    # session → Redis, not JVM memory
    timeout: 30m
    redis:
      namespace: myapp:session          # Redis key prefix
      flush-mode: on-save                # write to Redis when session modified
      # IMMEDIATE: write every attribute change (safer but more Redis ops)
      # ON_SAVE: write at end of request (better performance, small staleness window)
  data:
    redis:
      host: redis-cluster.internal
      port: 6379
      password: ${REDIS_PASSWORD}
      ssl: true                           # encrypt Redis traffic
      lettuce:
        pool:
          max-active: 20
          max-idle: 5
          min-idle: 2

/* ─── Spring Security + Session config ─── */
@Configuration
@EnableRedisHttpSession(
    maxInactiveIntervalInSeconds = 1800,   // 30 min TTL in Redis
    redisNamespace = "myapp"
)
public class SessionConfig {

    @Bean
    public CookieSerializer cookieSerializer() {
        DefaultCookieSerializer serializer = new DefaultCookieSerializer();
        serializer.setCookieName("SESS_ID");      // custom cookie name
        serializer.setCookieMaxAge(-1);             // session cookie (browser close)
        serializer.setUseHttpOnlyCookie(true);      // prevent XSS
        serializer.setUseSecureCookie(true);        // HTTPS only
        serializer.setSameSite("Strict");           // CSRF protection
        serializer.setDomainName("myapp.com");      // shared across subdomains
        return serializer;
    }

    @Bean
    public RedisSerializer<Object> springSessionDefaultRedisSerializer() {
        // Use JSON (not Java serialization) for Redis session storage
        // Avoids deserialization issues when deploying new app versions
        return new GenericJackson2JsonRedisSerializer();
    }
}

/* ─── How it works now ─── */
// ANY pod can serve ANY user:
// 1. User hits pod-1: Spring Security creates session → stores in Redis
//    Key: "myapp:session:sessions:{sessionId}"
//    Value: {userId: 42, roles: [ORDER_MANAGER], createdAt: ...}
// 2. Next request hits pod-3 (round-robin): reads same session from Redis
// 3. pod-1 crashes → user's session still alive in Redis → seamless
// 4. Scale to 50 pods → LB distributes freely → no stickiness needed

/* ─── AWS ALB with Redis sessions — disable sticky sessions ─── */
// ALB target group: stickiness.enabled = false (round-robin works perfectly now)

# ALB Target Group with sticky sessions (use only if you can't use Redis session)
resource "aws_lb_target_group" "order_service" {
  name     = "order-service-tg"
  port     = 8080
  protocol = "HTTP"
  vpc_id   = var.vpc_id
  target_type = "ip"                  # ECS Fargate / K8s pods

  stickiness {
    type            = "lb_cookie"       # ALB-managed cookie (AWSALB)
    cookie_duration = 86400             # 24 hours sticky
    enabled         = true              # set false if using Redis session!
  }

  health_check {
    enabled             = true
    path                = "/actuator/health"
    port                = "traffic-port"
    protocol            = "HTTP"
    healthy_threshold   = 2
    unhealthy_threshold = 3
    timeout             = 5
    interval            = 15
    matcher             = "200"
  }

  deregistration_delay = 30            # wait 30s before removing from pool
  slow_start           = 60            # warm up: gradually increase traffic for 60s
}

🏥

Health Checks — Liveness, Readiness, Deep Health

Spring Boot Actuator health checks that actually catch real failure modes

Critical

4.1 Health Check Types, Kubernetes Probes, and Spring Boot Actuator Advanced

Health checks are how your load balancer knows whether to send traffic to an instance. Getting them wrong is one of the most common causes of production outages. There are three distinct levels:

Liveness check: "Is the process alive?" Should ONLY fail if the app is in an unrecoverable state (deadlock, OOM) and needs to be killed and restarted. Do NOT check external dependencies here.
Readiness check: "Can this instance accept traffic?" Should fail if the app is still starting up, performing a warm-up, or if a critical dependency (DB, Redis) is unavailable. The LB should NOT send traffic to unready pods.
Deep health check: "Are all dependencies healthy?" Used for monitoring and alerting, NOT for LB routing decisions. Checking slow external APIs here can cause cascading LB-removal failures.

/* ─── application.yml — health endpoint config ─── */
management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus,env
  endpoint:
    health:
      show-details: always                 # show component breakdown
      show-components: always
      group:
        liveness:                          # Kubernetes liveness probe URL
          include: livenessState           # /actuator/health/liveness
        readiness:                         # Kubernetes readiness probe URL
          include: readinessState,db,redis # /actuator/health/readiness
        deep:                              # Detailed monitoring (not LB)
          include: db,redis,kafka,diskSpace,ping,snowflake
  health:
    db:
      enabled: true
    redis:
      enabled: true
    kafka:
      enabled: true
    defaults:
      enabled: true
  server:
    port: 8081                              # separate port for management (don't expose to internet)

/* ─── Custom Health Indicator — Snowflake Connection ─── */
@Component
@Slf4j
public class SnowflakeHealthIndicator implements HealthIndicator {

    private final SnowflakeDataSource snowflake;
    private static final String HEALTH_QUERY = "SELECT 1";
    private static final int TIMEOUT_SECONDS = 3;

    @Override
    public Health health() {
        try (Connection conn = snowflake.getConnection();
             Statement stmt = conn.createStatement()) {

            stmt.setQueryTimeout(TIMEOUT_SECONDS);
            ResultSet rs = stmt.executeQuery(HEALTH_QUERY);

            if (rs.next()) {
                return Health.up()
                    .withDetail("database", "snowflake")
                    .withDetail("status", "reachable")
                    .withDetail("connectionPool", snowflake.getActiveConnections())
                    .build();
            }
        } catch (Exception e) {
            log.error("Snowflake health check failed", e);
            return Health.down()
                .withDetail("error", e.getMessage())
                .withException(e)
                .build();
        }
        return Health.unknown().build();
    }
}

/* ─── Custom Readiness: warm-up check ─── */
@Component
public class WarmUpReadinessIndicator implements ApplicationListener<ApplicationReadyEvent> {

    private final ApplicationContext context;
    private volatile boolean warmedUp = false;

    @Override
    public void onApplicationEvent(ApplicationReadyEvent event) {
        // Perform JPA warm-up (loads Hibernate metamodel, connection pool)
        performWarmUp();
        warmedUp = true;
        // Now readiness probe will return 200 → LB starts sending traffic
    }

    private void performWarmUp() {
        try {
            // Run a lightweight query to warm Hibernate metadata cache
            // and establish DB connection pool connections
            context.getBean(OrderRepository.class).count();
            log.info("Warm-up complete — ready for traffic");
        } catch (Exception e) {
            log.warn("Warm-up failed, pod may be slow initially", e);
        }
    }
}

# kubernetes deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
spec:
  replicas: 3
  template:
    spec:
      containers:
      - name: order-service
        image: my-registry/order-service:v2.3.1
        ports:
        - containerPort: 8080    # app traffic
        - containerPort: 8081    # management (health)

        livenessProbe:                           # Kill and restart if fails
          httpGet:
            path: /actuator/health/liveness
            port: 8081
          initialDelaySeconds: 60              # wait for JVM start
          periodSeconds: 10
          failureThreshold: 3                  # fail 3x before killing
          timeoutSeconds: 3

        readinessProbe:                          # Remove from LB if fails
          httpGet:
            path: /actuator/health/readiness     # checks db + redis
            port: 8081
          initialDelaySeconds: 30              # spring boot starts in ~25s
          periodSeconds: 5                     # check every 5s
          failureThreshold: 2                  # remove from LB after 2 failures (10s)
          successThreshold: 1                  # add back after 1 success
          timeoutSeconds: 3

        startupProbe:                            # Allow slow starts (first JVM boot)
          httpGet:
            path: /actuator/health/liveness
            port: 8081
          failureThreshold: 30                # 30 * 10s = 5 min max start time
          periodSeconds: 10
          # Once startupProbe passes, liveness + readiness take over

        lifecycle:
          preStop:                               # Graceful shutdown signal
            exec:
              command: ["/bin/sh", "-c", "sleep 5"]
              # Wait 5s for LB to stop sending new connections before shutdown

        resources:
          requests:
            memory: "512Mi"
            cpu: "250m"
          limits:
            memory: "1Gi"
            cpu: "1000m"

🚀

Spring Cloud Gateway — Full Production Config

Filters, predicates, rate limiting, JWT auth, request/response transformation, observability

Deep Dive

5.1 Spring Cloud Gateway Architecture — How It Works Internally Advanced

Spring Cloud Gateway is built on Spring WebFlux + Project Reactor + Netty — a fully non-blocking, event-driven architecture. Unlike Zuul 1.x (blocking Servlet threads), SCG uses a small number of threads handling thousands of concurrent connections via Reactor event loop.

Request flow through SCG:

Client → Gateway Handler Mapping

Incoming HTTP request hits Netty. Gateway iterates all configured routes. For each route, evaluates ALL predicates (Path, Host, Header, Method, Cookie, Weight, etc.). First matching route wins.

Pre-Filters Execute (ordered chain)

Pre-filters run before forwarding to backend: AddRequestHeader, StripPrefix, RequestRateLimiter, CircuitBreaker, JWT validation, request logging, body modification. Each filter can modify the ServerWebExchange or short-circuit with a response.

Service Discovery + Load Balancing

If URI is lb://order-service, the ReactiveLoadBalancerExchangeFilterFunction resolves it via Spring Cloud LoadBalancer (Consul/Eureka/Kubernetes service). Picks an instance using configured algorithm. Falls back to next instance on failure.

Netty HTTP Client → Backend

Non-blocking HTTP call via Reactor Netty. Connection pooling, configurable timeouts, HTTP/2 support. Handles connection reuse transparently.

Post-Filters Execute + Response

Post-filters run after backend responds: AddResponseHeader, CORS headers, response body transformation, logging response status/duration, circuit breaker state update.

/* ─── build.gradle / pom.xml ─── */
// spring-cloud-starter-gateway
// spring-cloud-starter-loadbalancer
// spring-cloud-starter-consul-discovery (or eureka)
// spring-boot-starter-actuator
// spring-boot-starter-data-redis-reactive  (for rate limiting)
// resilience4j-spring-boot3

@SpringBootApplication
@EnableDiscoveryClient
public class GatewayApplication {
    public static void main(String[] args) {
        SpringApplication.run(GatewayApplication.class, args);
    }
}

@Configuration
public class GatewayRoutingConfig {

    @Bean
    public RouteLocator routes(RouteLocatorBuilder builder,
                                  JwtAuthenticationFilter jwtFilter,
                                  RequestLoggingFilter loggingFilter) {
        return builder.routes()

            // ─── Order Service Route ───────────────────────────────────
            .route("order-service", r -> r
                .path("/api/v1/orders/**")
                .and().not(p -> p.path("/api/v1/orders/admin/**"))
                .filters(f -> f
                    .stripPrefix(2)                         // /api/v1 stripped
                    .addRequestHeader("X-Service-Name", "gateway")
                    .addRequestHeader("X-Request-ID", "#{T(java.util.UUID).randomUUID().toString()}")
                    .requestRateLimiter(c -> c
                        .setRateLimiter(redisRateLimiter())
                        .setKeyResolver(userKeyResolver())    // rate limit per user
                        .setDenyEmptyKey(false)
                        .setEmptyKeyStatus("FORBIDDEN")
                    )
                    .circuitBreaker(c -> c
                        .setName("orderServiceCB")
                        .setFallbackUri("forward:/fallback/orders")
                    )
                    .retry(c -> c
                        .setRetries(2)
                        .setStatuses(HttpStatus.BAD_GATEWAY, HttpStatus.SERVICE_UNAVAILABLE)
                        .setMethods(HttpMethod.GET)           // only retry GET (idempotent)
                        .setBackoff(Duration.ofMillis(100), Duration.ofMillis(1000), 2, true)
                    )
                )
                .uri("lb://order-service")              // lb:// = service discovery
            )

            // ─── User Service Route ────────────────────────────────────
            .route("user-service", r -> r
                .path("/api/v1/users/**")
                .filters(f -> f
                    .stripPrefix(2)
                    .filter(jwtFilter)                     // JWT validation
                    .addRequestHeader("X-User-ID", "#{@jwtService.extractUserId(request)}")
                    .modifyResponseBody(String.class, String.class, (exchange, body) -> {
                        // Example: mask sensitive fields in response
                        return Mono.just(maskSensitiveData(body));
                    })
                )
                .uri("lb://user-service")
            )

            .build();
    }

    @Bean
    public RedisRateLimiter redisRateLimiter() {
        // replenishRate = tokens per second
        // burstCapacity = max burst (bucket size)
        // requestedTokens = tokens per request (default 1)
        return new RedisRateLimiter(100, 200, 1);
    }

    @Bean
    public KeyResolver userKeyResolver() {
        // Rate limit per authenticated user ID (from JWT)
        return exchange -> {
            String userId = exchange.getRequest().getHeaders()
                .getFirst("X-User-ID");
            return Mono.justOrEmpty(userId)
                .switchIfEmpty(Mono.just(
                    exchange.getRequest().getRemoteAddress()
                        .getAddress().getHostAddress()  // fallback to IP
                ));
        };
    }
}

/* ─── JWT Authentication Global Filter ─── */
@Component
@Order(Ordered.HIGHEST_PRECEDENCE)
public class JwtAuthenticationFilter implements GatewayFilter, Ordered {

    private static final List<String> OPEN_PATHS = List.of(
        "/api/v1/auth/login", "/api/v1/auth/register", "/actuator/health"
    );

    private final JwtService jwtService;

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
        String path = exchange.getRequest().getPath().value();

        if (OPEN_PATHS.stream().anyMatch(path::startsWith)) {
            return chain.filter(exchange);  // bypass auth for open routes
        }

        String authHeader = exchange.getRequest().getHeaders()
            .getFirst(HttpHeaders.AUTHORIZATION);

        if (authHeader == null || !authHeader.startsWith("Bearer ")) {
            exchange.getResponse().setStatusCode(HttpStatus.UNAUTHORIZED);
            return exchange.getResponse().setComplete();
        }

        String token = authHeader.substring(7);
        try {
            Claims claims = jwtService.validateAndExtract(token);
            // Inject claims as headers for downstream services
            ServerHttpRequest mutatedRequest = exchange.getRequest().mutate()
                .header("X-User-ID",    claims.getSubject())
                .header("X-User-Roles", claims.get("roles", String.class))
                .header("X-Tenant-ID",  claims.get("tenantId", String.class))
                .build();
            return chain.filter(exchange.mutate().request(mutatedRequest).build());
        } catch (JwtException e) {
            exchange.getResponse().setStatusCode(HttpStatus.UNAUTHORIZED);
            return exchange.getResponse().setComplete();
        }
    }

    @Override
    public int getOrder() { return -100; } // run before other filters
}

⚡

Circuit Breaker Pattern — Resilience4j Deep Dive

State machine, sliding window, half-open probing, bulkhead, and fallback strategies

Critical

6.1 Circuit Breaker State Machine + Resilience4j Full Config Expert

A circuit breaker is a proxy that monitors calls to a downstream service. When failures reach a threshold, it opens the circuit — subsequent calls fail immediately without even attempting the real call. This prevents cascading failures: if order-service is down, you don't want 100 threads blocking for 30 seconds each waiting for a timeout.

┌─────────────────────────────────────────────────────────────────────┐ │ CIRCUIT BREAKER STATE MACHINE │ └─────────────────────────────────────────────────────────────────────┘ ┌──────────────────────┐ │ CLOSED │ ← Normal operation │ Calls go through │ Counts failures in sliding window │ Sliding window: 10 │ failureRate < 50%: stay CLOSED └──────────┬───────────┘ │ failureRate >= 50% (5/10 calls failed) │ ▼ ┌──────────────────────┐ │ OPEN │ ← Tripped! Fail-fast mode │ Calls REJECTED │ No calls reach backend │ Immediate exception │ Returns fallback response └──────────┬───────────┘ │ waitDurationInOpenState = 30s elapsed │ ▼ ┌──────────────────────┐ │ HALF-OPEN │ ← Probing recovery │ permittedCalls: 5 │ Let 5 test calls through │ Measure success │ Success >= 50% → CLOSED └──────────┬───────────┘ Failure → back to OPEN │ ├──── success rate >= 50% ───────► CLOSED ✅ └──── failure rate >= 50% ───────► OPEN ❌

# application.yml — Resilience4j configuration
resilience4j:
  circuitbreaker:
    instances:
      # Circuit breaker for order-service HTTP calls
      orderServiceCB:
        registerHealthIndicator: true            # expose in /actuator/health
        slidingWindowType: COUNT_BASED            # or TIME_BASED
        slidingWindowSize: 10                     # last 10 calls
        minimumNumberOfCalls: 5                   # need at least 5 calls before evaluating
        failureRateThreshold: 50                  # open if 50%+ fail
        slowCallRateThreshold: 80                 # also open if 80%+ are slow
        slowCallDurationThreshold: 3s              # "slow" = >3 seconds
        waitDurationInOpenState: 30s               # stay OPEN for 30s, then try HALF-OPEN
        permittedNumberOfCallsInHalfOpenState: 5  # probe with 5 calls
        automaticTransitionFromOpenToHalfOpenEnabled: true
        recordExceptions:
          - java.io.IOException
          - java.net.ConnectException
          - org.springframework.web.reactive.function.client.WebClientResponseException$InternalServerError
          - org.springframework.web.reactive.function.client.WebClientResponseException$ServiceUnavailable
        ignoreExceptions:
          - com.myapp.exception.BusinessValidationException  # 400s are not circuit failures
          - com.myapp.exception.NotFoundException             # 404 not a circuit issue

      # Separate CB for Snowflake (longer wait, fewer calls)
      snowflakeCB:
        slidingWindowSize: 5
        failureRateThreshold: 60
        waitDurationInOpenState: 60s               # longer recovery for DB
        permittedNumberOfCallsInHalfOpenState: 2

  bulkhead:
    instances:
      # Limit concurrent calls to order-service (prevent thread starvation)
      orderServiceBulkhead:
        maxConcurrentCalls: 20                    # max 20 concurrent calls
        maxWaitDuration: 100ms                     # wait 100ms for a slot, then reject

  timelimiter:
    instances:
      orderServiceTimeout:
        timeoutDuration: 3s                        # cancel call after 3s
        cancelRunningFuture: true

  retry:
    instances:
      orderServiceRetry:
        maxAttempts: 3
        waitDuration: 200ms
        enableExponentialBackoff: true
        exponentialBackoffMultiplier: 2           # 200ms → 400ms → 800ms
        retryExceptions:
          - java.io.IOException
          - java.net.ConnectException
        ignoreExceptions:
          - com.myapp.exception.BusinessValidationException

/* ─── Java: Service using @CircuitBreaker + @Retry ─── */
@Service
@Slf4j
public class OrderServiceClient {

    private final WebClient webClient;

    // @CircuitBreaker + @Bulkhead + @TimeLimiter + @Retry
    // Order of decoration (outer to inner): Retry → CB → Bulkhead → TimeLimiter
    @CircuitBreaker(name = "orderServiceCB", fallbackMethod = "getOrdersFallback")
    @Bulkhead(name = "orderServiceBulkhead", type = Bulkhead.Type.SEMAPHORE)
    @Retry(name = "orderServiceRetry")
    @TimeLimiter(name = "orderServiceTimeout")
    public CompletableFuture<List<OrderDTO>> getOrders(String userId) {
        return webClient.get()
            .uri("/orders?userId={userId}", userId)
            .retrieve()
            .bodyToFlux(OrderDTO.class)
            .collectList()
            .toFuture();
    }

    // Fallback must have same signature + Throwable/Exception param at end
    public CompletableFuture<List<OrderDTO>> getOrdersFallback(String userId, Throwable t) {
        log.warn("Circuit breaker fallback for userId={}, reason={}", userId, t.getMessage());

        if (t instanceof CallNotPermittedException) {
            // Circuit is OPEN — return cached/stale data
            return CompletableFuture.completedFuture(getCachedOrders(userId));
        }

        if (t instanceof BulkheadFullException) {
            // Too many concurrent calls — return partial response
            return CompletableFuture.completedFuture(List.of());
        }

        // Timeout or connection error
        return CompletableFuture.failedFuture(
            new ServiceUnavailableException("Order service temporarily unavailable"));
    }

    // Programmatic CB state inspection (useful in admin endpoints)
    public CircuitBreaker.State getCircuitBreakerState() {
        return CircuitBreakerRegistry.ofDefaults()
            .circuitBreaker("orderServiceCB")
            .getState();
    }
}

🔧

Nginx Upstream Configuration — Production Tuning

Keepalive pools, upstream health, rate limiting, WebSocket proxying, SSL termination

Production

7.1 Nginx as Reverse Proxy — Full Production nginx.conf Expert

# /etc/nginx/nginx.conf — Production config for Spring Boot microservices

user www-data;
worker_processes auto;          # 1 worker per CPU core
worker_rlimit_nofile 65535;      # max open file descriptors per worker

events {
    worker_connections 4096;     # max concurrent connections per worker
    multi_accept on;             # accept all pending connections at once
    use epoll;                   # Linux kernel event queue (most efficient)
}

http {
    # ─── Performance ─────────────────────────────────────────
    sendfile on;
    tcp_nopush on;               # batch TCP segments (better throughput)
    tcp_nodelay on;              # disable Nagle for low-latency responses
    keepalive_timeout 65;        # keep client connections open 65s
    keepalive_requests 1000;     # reuse connection for up to 1000 requests
    types_hash_max_size 2048;

    # ─── Upstream: order-service pods ────────────────────────
    upstream order_service {
        least_conn;              # least connections algorithm

        server 10.0.1.10:8080 weight=1 max_fails=3 fail_timeout=30s;
        server 10.0.1.11:8080 weight=1 max_fails=3 fail_timeout=30s;
        server 10.0.1.12:8080 weight=1 max_fails=3 fail_timeout=30s;
        server 10.0.1.13:8080 weight=2 max_fails=3 fail_timeout=30s; # bigger instance

        keepalive 32;            # keep 32 idle connections to backends
        keepalive_requests 100;  # reuse upstream conn for 100 requests
        keepalive_timeout 60s;   # close idle upstream conn after 60s
        # keepalive prevents TCP connection setup overhead on every request
        # This is critical for high-throughput Spring Boot APIs
    }

    # ─── Upstream: user-service ──────────────────────────────
    upstream user_service {
        server 10.0.2.10:8080 max_fails=2 fail_timeout=20s;
        server 10.0.2.11:8080 max_fails=2 fail_timeout=20s;
        server 10.0.2.12:8080 backup;          # only used if all primary fail
        keepalive 16;
    }

    # ─── Rate Limiting Zones ─────────────────────────────────
    limit_req_zone $binary_remote_addr zone=api_limit:10m rate=100r/s;
    # 10m = 10MB shared memory zone (~160k IP addresses)
    # rate=100r/s = max 100 requests per second per IP

    limit_req_zone $http_x_user_id zone=user_limit:10m rate=50r/s;
    # per-user rate limiting using X-User-ID header from JWT

    # ─── Server Block: API Gateway ───────────────────────────
    server {
        listen 443 ssl http2;
        server_name api.myapp.com;

        # SSL Configuration
        ssl_certificate     /etc/ssl/certs/myapp.com.crt;
        ssl_certificate_key /etc/ssl/private/myapp.com.key;
        ssl_protocols       TLSv1.2 TLSv1.3;
        ssl_ciphers         ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256;
        ssl_session_cache   shared:SSL:10m;
        ssl_session_timeout 10m;
        ssl_stapling        on;       # OCSP stapling (faster SSL handshake)
        add_header Strict-Transport-Security "max-age=31536000" always;

        # ─── Order Service Routes ──────────────────────────────
        location /api/v1/orders {
            limit_req zone=api_limit burst=200 nodelay;
            # burst=200: allow temporary burst of 200 extra requests
            # nodelay: process burst immediately (no queueing delay)

            proxy_pass http://order_service;     # L7 proxy to upstream pool
            proxy_http_version 1.1;              # required for keepalive
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "";      # "" enables keepalive upstream

            # Pass real client IP to Spring Boot
            proxy_set_header Host              $host;
            proxy_set_header X-Real-IP         $remote_addr;
            proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;

            # Timeouts — tune per endpoint requirements
            proxy_connect_timeout  5s;           # backend TCP connect timeout
            proxy_send_timeout     30s;          # write to backend
            proxy_read_timeout     30s;          # read from backend

            # Buffer sizing
            proxy_buffer_size        16k;
            proxy_buffers            8 16k;
            proxy_busy_buffers_size  32k;
        }

        # ─── WebSocket Support (for real-time order updates) ───
        location /ws {
            proxy_pass http://order_service;
            proxy_http_version 1.1;
            proxy_set_header Upgrade    $http_upgrade;  # WebSocket upgrade
            proxy_set_header Connection "upgrade";       # keep connection open
            proxy_read_timeout 3600s;                    # 1 hour for WS connections
            proxy_send_timeout 3600s;
        }

        # ─── Static health check (no backend hit) ──────────────
        location /health {
            return 200 '{"status":"UP"}';
            add_header Content-Type application/json;
        }

        # ─── Deny direct access to actuator ──────────────────
        location /actuator {
            deny all;
            return 403;
        }
    }

    # HTTP → HTTPS redirect
    server {
        listen 80;
        server_name api.myapp.com;
        return 301 https://$host$request_uri;
    }
}

☁️

AWS ALB — Target Groups, Routing Rules, ECS/EKS Integration

Listener rules, weighted target groups, WAF integration, access logs, Spring Boot on Fargate

Production

8.1 ALB Listener Rules and Target Groups — Full Architecture Expert

┌──────────────────────────────────────────────────────────────────────┐ │ AWS APPLICATION LOAD BALANCER │ │ api.myapp.com │ │ │ │ LISTENER: HTTPS :443 → Listener Rules (evaluated top to bottom) │ │ ───────────────────────────────────────────────────────────────── │ │ Rule 1: IF path=/api/v1/orders/* AND method=GET → TG: orders-ro │ │ Rule 2: IF path=/api/v1/orders/* → TG: orders-rw │ │ Rule 3: IF path=/api/v1/users/* → TG: users │ │ Rule 4: IF header[X-API-Version]=2 → TG: orders-v2 │ │ Rule 5: DEFAULT → TG: orders-rw │ └──────────────────────────────────────────────────────────────────────┘ │ │ │ │ ▼ ▼ ▼ ▼ TG: orders-ro TG: orders-rw TG: users TG: orders-v2 (read replicas) (main pods) (user svc) (canary pods) pod-1: :8080 pod-3: :8080 pod-5: :8080 pod-7: :8080 pod-2: :8080 pod-4: :8080 pod-6: :8080 pod-8: :8080

# ─── ALB ─────────────────────────────────────────────────────────────
resource "aws_lb" "api_alb" {
  name               = "myapp-api-alb"
  internal           = false           # internet-facing
  load_balancer_type = "application"   # L7
  security_groups    = [aws_security_group.alb_sg.id]
  subnets            = var.public_subnet_ids

  enable_deletion_protection       = true
  enable_http2                     = true
  enable_cross_zone_load_balancing = true
  drop_invalid_header_fields       = true  # security best practice
  desync_mitigation_mode           = "strictest"

  # Access Logs → S3 (critical for debugging, compliance)
  access_logs {
    bucket  = aws_s3_bucket.alb_logs.bucket
    prefix  = "api-alb"
    enabled = true
  }
}

# ─── HTTPS Listener ──────────────────────────────────────────────────
resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.api_alb.arn
  port              = "443"
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS13-1-2-2021-06"  # TLS 1.2+
  certificate_arn   = aws_acm_certificate.api_cert.arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.orders_rw.arn
  }
}

# ─── Listener Rule: Path-based routing ───────────────────────────────
resource "aws_lb_listener_rule" "orders_read" {
  listener_arn = aws_lb_listener.https.arn
  priority     = 10  # lower = higher priority

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.orders_ro.arn
  }

  condition {
    path_pattern { values = ["/api/v1/orders/*"] }
  }
  condition {
    http_request_method { values = ["GET", "HEAD"] }
  }
}

# ─── Listener Rule: Header-based canary routing ───────────────────────
resource "aws_lb_listener_rule" "canary_v2" {
  listener_arn = aws_lb_listener.https.arn
  priority     = 5

  action {
    type = "forward"
    forward {
      target_group {
        arn    = aws_lb_target_group.orders_v2.arn
        weight = 100
      }
    }
  }

  condition {
    http_header {
      http_header_name = "X-API-Version"
      values           = ["2"]
    }
  }
}

# ─── Target Group: Order Service (Spring Boot) ───────────────────────
resource "aws_lb_target_group" "orders_rw" {
  name        = "orders-rw-tg"
  port        = 8080
  protocol    = "HTTP"
  vpc_id      = var.vpc_id
  target_type = "ip"   # For ECS Fargate / Kubernetes pods

  health_check {
    enabled             = true
    path                = "/actuator/health/readiness"  # Spring Boot Actuator
    port                = "8081"                         # management port
    protocol            = "HTTP"
    matcher             = "200"
    interval            = 15
    timeout             = 5
    healthy_threshold   = 2
    unhealthy_threshold = 3   # 3 failures × 15s = 45s to mark unhealthy
  }

  deregistration_delay = 30   # wait 30s before removing (drain connections)
  slow_start           = 60   # ramp up traffic over 60s (JVM warm-up)

  load_balancing_algorithm_type = "least_outstanding_requests"
  # LOR: route to instance with fewest in-flight requests (better than round-robin for Spring Boot)

  stickiness {
    type    = "lb_cookie"
    enabled = false  # disabled because we use Redis sessions
  }
}

# ─── WAF Association ─────────────────────────────────────────────────
resource "aws_wafv2_web_acl_association" "alb_waf" {
  resource_arn = aws_lb.api_alb.arn
  web_acl_arn  = aws_wafv2_web_acl.api_waf.arn
  # WAF rules: SQL injection, XSS, rate limiting, IP blocklist
}

# ─── CloudWatch Alarms ───────────────────────────────────────────────
resource "aws_cloudwatch_metric_alarm" "alb_5xx_alarm" {
  alarm_name          = "alb-5xx-rate-high"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "2"
  metric_name         = "HTTPCode_Target_5XX_Count"
  namespace           = "AWS/ApplicationELB"
  period              = "60"
  statistic           = "Sum"
  threshold           = "10"  # alert if >10 5xx per minute
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    LoadBalancer = aws_lb.api_alb.arn_suffix
  }
}

🔄

Zero-Downtime Deploys — Rolling, Blue-Green, Canary

How LB enables all three strategies, with Spring Boot graceful shutdown and Kubernetes rollout configs

Production

9.1 All Three Strategies Compared with Spring Boot Graceful Shutdown Expert

🔄 Rolling Deploy

Replace instances one-at-a-time. LB removes pod from pool, pod drains, new version starts, health check passes, pod re-added. Never have zero capacity. Risk: old and new versions run simultaneously (requires backward compatibility).

🔵🟢 Blue-Green

Two identical environments. Deploy to idle environment (blue). Test. Switch LB to point to blue (single DNS/ALB update). Keep green warm as rollback target. Zero compatibility issues, instant rollback. Cost: 2× infra during deploy.

🐦 Canary Release

Route small % (1-10%) of traffic to new version. Monitor error rates, latency, business metrics. Gradually increase % if healthy. Real-world testing with limited blast radius. Requires weighted routing (ALB, SCG Weight predicate).

# application.yml — Graceful Shutdown (Spring Boot 2.3+)
server:
  shutdown: graceful                   # default is "immediate" — ALWAYS use graceful
  tomcat:
    threads:
      max: 200

spring:
  lifecycle:
    timeout-per-shutdown-phase: 30s    # wait up to 30s for in-flight requests

# What happens on SIGTERM (Kubernetes pod shutdown):
# 1. Spring receives SIGTERM
# 2. Sets readiness probe to DOWN (LB stops sending new requests — ~10s)
# 3. Waits for in-flight requests to complete (up to 30s)
# 4. Closes database connections, Kafka producers
# 5. JVM exits cleanly

/* ─── Kafka Consumer: pause before shutdown ─── */
@Component
public class KafkaGracefulShutdown implements ApplicationListener<ContextClosedEvent> {

    private final KafkaListenerEndpointRegistry registry;

    @Override
    public void onApplicationEvent(ContextClosedEvent event) {
        // Pause all Kafka consumers to stop fetching new messages
        registry.getListenerContainers().forEach(MessageListenerContainer::pause);
        log.info("Kafka consumers paused — waiting for in-flight to complete");
        // Spring lifecycle then waits for in-flight message handlers to finish
    }
}

/* ─── Kubernetes Rolling Deploy ─── */
apiVersion: apps/v1
kind: Deployment
spec:
  replicas: 6
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2         # add 2 new pods before removing old ones
      maxUnavailable: 0  # NEVER have fewer than 6 pods running (zero downtime)
  template:
    spec:
      terminationGracePeriodSeconds: 60  # must be > spring.lifecycle.timeout
      containers:
      - name: order-service
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "sleep 5"]
              # Sleep 5s BEFORE SIGTERM: allows LB to stop routing new connections
              # Kubernetes sends SIGTERM to app AND updates endpoints simultaneously
              # Without sleep, app starts shutting down while LB still routes traffic

/* ─── Canary via Spring Cloud Gateway Weight ─── */
# Canary deploy: 10% traffic to new version
# STEP 1: Deploy order-service-v2 with 2 pods (alongside v1's 8 pods)
# STEP 2: Set gateway weights: v1=90, v2=10
# STEP 3: Monitor Prometheus: error_rate{service="order-service-v2"}
# STEP 4: If ok: shift to 50/50, then 100% v2
# STEP 5: Scale down v1 pods

# Monitoring canary health with Prometheus
record: job:order_service:error_rate_5m
expr: |
  rate(http_server_requests_seconds_count{
    job="order-service",status=~"5..",version="v2"}[5m])
  /
  rate(http_server_requests_seconds_count{
    job="order-service",version="v2"}[5m])
# Alert if v2 error rate > 2x v1 error rate

🎯

Senior Interview Q&A — 10 YOE Level

Questions that test architectural judgment, not just memorized facts

Interview

10.1 Architecture Decision Questions Expert

Q: Your Spring Boot order-service starts slowing down after traffic spikes. Orders come from a Kafka topic but API calls also hit it. How would you architect the LB strategy?

Answer: This is a mixed-workload problem. First, separate concerns: Kafka consumers should be a separate deployment from the HTTP API handlers (even if same codebase, scale independently). For the HTTP side, I'd use least-outstanding-requests on the ALB (not round-robin) because API latency is variable — some POST endpoints do heavy DB writes, GET endpoints are fast.

For the Kafka side, there's no load balancer — Kafka's own partition assignment distributes load across consumer group members. I'd scale Kafka consumers to match partition count (1 consumer thread per partition, no more). If consumers are slow, increase partition count (irreversible, plan ahead).

I'd also add a circuit breaker between the API and any downstream calls (like a payment service), and a bulkhead to prevent order-processing threads from being starved by, say, a slow reporting endpoint.

Q: Sticky sessions vs distributed sessions — when would you keep sticky sessions in production?

Answer: Almost never in a greenfield system. Sticky sessions are a liability: they prevent free horizontal scaling, create hot-spots, and mean a pod crash directly causes user sessions to fail.

The only legitimate case I'd keep sticky sessions is: legacy applications that store complex in-memory state that can't be serialized to Redis without significant refactoring risk (e.g., a Tomcat app with 2000-line HttpSession objects). In that case, use IP-hash stickiness at Nginx level (L4 hash, survives if one pod restarts — other pods remain sticky to their IPs).

For new Spring Boot services: Spring Session + Redis from day one. Use a CookieSerializer with SameSite=Strict, HttpOnly, Secure, and serialize session to JSON (not Java serialization) for version compatibility.

Q: You're using Spring Cloud Gateway as your API gateway. A backend service is returning 503s. Walk me through how circuit breakers, retries, and timeouts interact.

Answer: The decorator order matters: Retry wraps CircuitBreaker wraps Bulkhead wraps TimeLimiter. When the backend returns 503:

TimeLimiter — if the call takes longer than threshold, cancels and throws TimeoutException
CircuitBreaker — records the failure. If failure rate crosses threshold, opens circuit
Retry — retries up to N times (only for GET — never retry POST/PATCH blindly, they're not idempotent)
Fallback — if CB is OPEN or retries exhausted, calls the fallback method

Critical nuance: retries amplify load. If you retry 3× with 50 concurrent clients, you get 150 requests to an already-struggling backend. Always pair retries with exponential backoff and jitter. Also, retries on a CLOSED circuit are fine; if they fail, CB opens and subsequent requests fail immediately (protecting the backend).

Q: How do you do a zero-downtime deployment of a Spring Boot service that consumes Kafka AND serves HTTP?

Answer: The HTTP side is straightforward with Kubernetes rolling update + graceful shutdown (server.shutdown=graceful, terminationGracePeriodSeconds=60, preStop: sleep 5). The tricky part is Kafka.

When a pod receives SIGTERM: (1) Pause Kafka consumers immediately (stop fetching new messages), (2) Allow in-flight message handlers to complete (respect spring.lifecycle.timeout-per-shutdown-phase), (3) Commit final offsets, (4) Close Kafka producer, (5) JVM exits.

Kafka consumer group rebalancing happens twice during a rolling deploy: once when old pod leaves the group, once when new pod joins. During rebalancing, no partition is being consumed (rebalance gap). To minimize this: use partition.assignment.strategy=CooperativeStickyAssignor (incremental rebalance — only moves necessary partitions, others keep consuming). Also set max.poll.interval.ms high enough to survive shutdown without being kicked from the group mid-processing.

💀

Production Pitfalls — What Actually Goes Wrong

Real failure modes at 10 YOE level — with root cause and fix

Warnings

11.1 The 8 Most Common LB-Related Production Incidents Expert

🔴 Connection Pool Exhaustion LB has 100 connections to pods but DB pool is 10. LB allows 100 concurrent requests → 90 queue on DB pool → requests time out at LB → 502s. Fix: ALB timeout < DB pool timeout. Rate limit at gateway to match DB capacity.

🔴 False Health Check Flapping Health check hits /actuator/health which checks Redis. Redis is slow but available. Health check times out at 3s. Pod marked unhealthy → removed from LB → pod re-added → repeat. Fix: Separate liveness (no deps) from readiness (deps). Increase timeout.

🔴 Thundering Herd on CB Open Circuit opens. 30s later, all queued requests retry simultaneously (HALF-OPEN). Backend gets 500 req/s instead of 10. Fix: Jitter in waitDurationInOpenState. Only permit 5 calls in HALF-OPEN, not unlimited.

🟡 Deregistration Without Drain Pod receives SIGTERM, deregisters from ALB immediately, but ALB already dispatched requests to it. Those requests fail mid-flight. Fix: preStop sleep 5s + deregistrationDelay=30s on target group.

🟡 X-Forwarded-For Trust Issue Client sends X-Forwarded-For: evil.ip, 1.2.3.4. App trusts it, logs wrong IP, bypasses IP-based rate limits. Fix: Configure Spring's RemoteIpFilter to only trust the LB's IP range. Set trusted-proxies.

🟡 Retry on Non-Idempotent Ops Gateway retries POST /orders on 503. Order is created twice. Fix: Only retry GET (idempotent). For POST, use idempotency keys. Never retry writes automatically.

🔵 Slow Start + Cold JVM New pod passes health check (returns 200 quickly) but JVM is cold. First real requests take 2-3s (class loading, Hibernate metadata). Users see latency spike. Fix: Slow start (60s on ALB), JVM warmup in ApplicationReadyEvent, GraalVM native if latency critical.

🔵 Session Serialization Break Deploy v2 with new fields in User session object. Redis has v1 serialized sessions. v2 pod reads them, fails to deserialize → 500 for all logged-in users. Fix: Use JSON serialization (GenericJackson2JsonRedisSerializer), version session schema, or invalidate sessions on deploy.