L4 vs L7 Load Balancing — OSI Model Deep Dive
Transport layer vs Application layer — what the difference actually means for your Spring Boot services
A load balancer is a system that distributes incoming network traffic across multiple backend servers (called an upstream pool or target group). The goal is threefold: eliminate single points of failure, scale horizontally, and reduce latency by routing to the healthiest/nearest instance.
The key architectural insight: a load balancer sits between clients and your application. All traffic flows through it. This makes it the ideal enforcement point for: rate limiting, SSL termination, authentication header injection, request tracing, blue-green routing, and canary deployments.
An L7 balancer can go further: routing /admin/* requests to a dedicated admin service, or pinning a specific user to a specific pod for session affinity.
L4 operates at the Transport Layer (TCP/UDP). It makes routing decisions based on IP address and port number only. It never looks at the payload — it doesn't know if the traffic is HTTP, HTTPS, Kafka, or a database protocol. This makes it extremely fast and protocol-agnostic.
How it works under the hood: The L4 LB receives a TCP SYN packet, picks a backend based on the algorithm (usually consistent hashing on src IP), rewrites the destination IP/port (NAT), and forwards. The backend responds directly through the LB (two-arm) or directly back to client via DSR (Direct Server Return — one-arm, more complex).
Key characteristics:
- No SSL termination by default — the encrypted stream is just bytes to L4. TLS terminates at your Spring Boot app.
- No header inspection — cannot route based on URL path or cookies
- Connection-level stickiness — a given TCP connection always goes to the same backend, but a new connection may go elsewhere
- Much lower latency — no HTTP parsing overhead, ~microseconds vs milliseconds
- Works for any protocol — Kafka, Redis, MySQL, gRPC, WebSocket all route transparently
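The connection-level stickiness above falls out of how the backend is chosen: the decision is a pure function of connection-level inputs (source IP and port), never the payload. A minimal sketch of that idea (toy code, not any particular LB's algorithm; backend addresses are made up):

```java
import java.util.List;

public class FlowHashChooser {
    private final List<String> backends;

    public FlowHashChooser(List<String> backends) {
        this.backends = backends;
    }

    // An L4-style decision: only connection-level inputs, never the payload.
    // The same (srcIp, srcPort) pair always maps to the same backend.
    public String choose(String srcIp, int srcPort) {
        int hash = (srcIp + ":" + srcPort).hashCode();
        return backends.get(Math.floorMod(hash, backends.size()));
    }

    public static void main(String[] args) {
        FlowHashChooser lb = new FlowHashChooser(
            List.of("10.0.1.10", "10.0.1.11", "10.0.1.12"));
        // One TCP connection (one src ip:port) is sticky to one backend...
        System.out.println("conn A -> " + lb.choose("203.0.113.7", 54321));
        // ...but a new connection from the same host may land elsewhere
        System.out.println("conn B -> " + lb.choose("203.0.113.7", 54322));
    }
}
```

Because the payload never enters the decision, the same chooser works unchanged for Kafka, Redis, or HTTPS bytes.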
Use L4 when:
• You need sub-millisecond routing overhead
• Your app handles TLS internally (mTLS, end-to-end encryption)
• You're doing TCP port-based routing (port 9092 → Kafka cluster)
• You're running high-throughput raw data pipelines

L4 limitations:
• No HTTP-aware health checks (only a TCP connect check)
• Cannot do HTTP redirects (301/302)
• Cannot inject headers like X-Real-IP
• Cannot do blue-green path routing
AWS NLB — L4 Terraform Config
```hcl
# AWS Network Load Balancer — L4 (TCP)
resource "aws_lb" "kafka_nlb" {
  name                             = "kafka-nlb"
  internal                         = true
  load_balancer_type               = "network"   # L4
  subnets                          = var.private_subnet_ids
  enable_cross_zone_load_balancing = true
}

resource "aws_lb_target_group" "kafka_tg" {
  name     = "kafka-brokers"
  port     = 9092
  protocol = "TCP"   # L4 — no HTTP here
  vpc_id   = var.vpc_id

  health_check {
    protocol            = "TCP"   # L4 health check — just TCP connect
    port                = "traffic-port"
    healthy_threshold   = 2
    unhealthy_threshold = 2
    interval            = 10
  }
}

resource "aws_lb_listener" "kafka" {
  load_balancer_arn = aws_lb.kafka_nlb.arn
  port              = "9092"
  protocol          = "TCP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.kafka_tg.arn
  }
}
```
L7 operates at the Application Layer (HTTP/HTTPS). It fully terminates the connection, parses the HTTP request, makes a routing decision based on the content, and establishes a new connection to the backend. This is your Spring Cloud Gateway, AWS ALB, Nginx (in HTTP mode).
What L7 can inspect and route on:
- Host header — route api.myapp.com vs admin.myapp.com to different backends
- URL path — /api/v1/orders/** → order-service, /api/v1/users/** → user-service
- HTTP method — GET vs POST to different handlers
- Request headers — X-API-Version: 2 → v2 service, Accept: application/json routing
- Cookies — JSESSIONID for sticky sessions, canary=true for canary deployments
- Query parameters — ?version=beta routing
- JWT claims — route based on user role extracted from the Bearer token (Spring Cloud Gateway)
SSL Termination: The L7 LB handles TLS handshake. Your backend services talk plain HTTP, which simplifies certificate management enormously. You have one cert at the LB, not on every pod. AWS ACM provides free auto-rotating certificates for ALB.
```yaml
# application.yml — Spring Cloud Gateway (L7 content-based routing)
spring:
  cloud:
    gateway:
      routes:
        # Route 1: Path-based routing to order-service
        - id: order-service-route
          uri: lb://order-service   # lb:// = Spring Cloud LB discovery
          predicates:
            - Path=/api/v1/orders/**
            - Method=GET,POST,PUT,DELETE
          filters:
            - StripPrefix=2   # strip /api/v1 before forwarding
            - name: RequestRateLimiter
              args:
                redis-rate-limiter.replenishRate: 100
                redis-rate-limiter.burstCapacity: 200

        # Route 2: Header-based version routing (canary)
        - id: order-service-v2
          uri: lb://order-service-v2
          predicates:
            - Path=/api/orders/**
            - Header=X-API-Version, 2   # only if header matches

        # Route 3: Host-based routing (multi-tenant)
        - id: admin-route
          uri: lb://admin-service
          predicates:
            - Host=admin.myapp.com
            - Path=/api/**

        # Route 4: Cookie-based routing (sticky / A-B test)
        - id: beta-route
          uri: lb://order-service-beta
          predicates:
            - Cookie=X-Beta, true   # route cookie=true to beta
            - Path=/api/orders/**

        # Route 5: Weight-based routing (blue-green / canary %)
        - id: orders-green
          uri: lb://order-service-green
          predicates:
            - Path=/api/orders/**
            - Weight=orders-group, 90   # 90% traffic to green
        - id: orders-blue
          uri: lb://order-service-blue
          predicates:
            - Path=/api/orders/**
            - Weight=orders-group, 10   # 10% traffic to blue (canary)

      globalcors:
        add-to-simple-url-handler-mapping: true
        corsConfigurations:
          '[/**]':
            allowedOrigins: ["https://myapp.com"]
            allowedMethods: ["GET", "POST", "PUT", "DELETE", "OPTIONS"]
```
Load Balancing Algorithms
Round-robin, least connections, IP hash, weighted — when to use each in your microservices
| Algorithm | How It Works | Best For | Avoid When | Spring/AWS |
|---|---|---|---|---|
| Round Robin | Cyclically distribute each request to next server in pool | Stateless services with uniform request cost (REST APIs returning JSON) | Requests have vastly different processing times (some requests hit heavy DB queries) | Default in Spring Cloud LB; ALB default |
| Weighted Round Robin | Round-robin but servers with higher weight get proportionally more requests | Mixed instance types (some pods on larger EC2 instances) | You don't know relative capacity ahead of time | Nginx upstream weight param; ALB target weight |
| Least Connections | New request goes to server with fewest active connections | Long-running requests, heavy compute, WebSocket endpoints | Very short-lived requests — overhead of tracking isn't worth it | Nginx least_conn; Not default on ALB |
| Least Response Time | Routes to server with lowest active connections AND lowest average response time | Mixed workloads; when some backends are slower (cold JVM, GC pause) | High-throughput APIs where measurement overhead matters | HAProxy, Nginx Plus; Spring custom LoadBalancer |
| IP Hash / Consistent Hash | Hash client IP to deterministically pick server — same IP always goes to same server | In-memory session, when you can't use distributed session (Redis) | Behind NAT (all users share same IP); when servers scale up/down frequently | Nginx ip_hash; NLB by default |
| Random with Two Choices | Pick 2 random servers, send to the one with fewer connections (Power of Two Choices) | Very large pools where tracking all servers is expensive; distributed systems | Small pools (2–3 servers) where randomness hurts more than helps | Envoy Proxy default; Netflix Ribbon option |
If your POST /orders takes 500ms and your GET /orders/123 takes 5ms, round-robin sends them equally to all pods — but a pod that got 20 POSTs will be saturated while a pod with 20 GETs sits idle. Least connections plus request rate limiting is the safer default for mixed-workload APIs. For your Snowflake sync service, which runs heavy batch queries, definitely use least-connections.
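That saturation effect is easy to reproduce. A toy simulation (hypothetical request costs of 500ms and 5ms, two pods; not benchmark data) comparing round-robin with a least-loaded chooser:

```java
import java.util.Arrays;

public class MixedWorkloadSim {
    // Dispatch request costs to 2 pods round-robin; return total work per pod
    static int[] roundRobin(int[] costs) {
        int[] load = new int[2];
        for (int i = 0; i < costs.length; i++) {
            load[i % 2] += costs[i];
        }
        return load;
    }

    // Dispatch each request to whichever pod currently has less outstanding work
    static int[] leastLoaded(int[] costs) {
        int[] load = new int[2];
        for (int c : costs) {
            int target = load[0] <= load[1] ? 0 : 1;
            load[target] += c;
        }
        return load;
    }

    public static void main(String[] args) {
        // Alternating heavy POST (500ms) and cheap GET (5ms), as in the text.
        // Round-robin happens to send every POST to pod 0 and every GET to pod 1.
        int[] costs = new int[40];
        for (int i = 0; i < costs.length; i++) {
            costs[i] = (i % 2 == 0) ? 500 : 5;
        }
        System.out.println("round-robin load:  " + Arrays.toString(roundRobin(costs)));
        System.out.println("least-loaded load: " + Arrays.toString(leastLoaded(costs)));
    }
}
```

With this pattern round-robin leaves one pod with ~100x the work of the other, while the least-loaded chooser balances them almost exactly.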
```java
/**
 * Custom LoadBalancer: Least Response Time strategy.
 * Use when your order-service has mixed fast (GET) and slow (POST/batch) endpoints.
 */
@Component
public class LeastResponseTimeLoadBalancer implements ReactorServiceInstanceLoadBalancer {

    private final ObjectProvider<ServiceInstanceListSupplier> supplierProvider;
    private final ConcurrentHashMap<String, AtomicLong> responseTimes = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, AtomicInteger> activeConnections = new ConcurrentHashMap<>();

    public LeastResponseTimeLoadBalancer(ObjectProvider<ServiceInstanceListSupplier> supplierProvider) {
        this.supplierProvider = supplierProvider;
    }

    @Override
    public Mono<Response<ServiceInstance>> choose(Request request) {
        return supplierProvider.getIfAvailable()
            .get()
            .next()
            .map(this::chooseInstance);
    }

    private Response<ServiceInstance> chooseInstance(List<ServiceInstance> instances) {
        if (instances.isEmpty()) return new EmptyResponse();
        // Score = activeConnections * avgResponseTime (lower = better)
        ServiceInstance best = instances.stream()
            .min(Comparator.comparingLong(i -> {
                String key = i.getHost() + ":" + i.getPort();
                long conns = activeConnections.getOrDefault(key, new AtomicInteger()).get();
                long avgMs = responseTimes.getOrDefault(key, new AtomicLong(1)).get();
                return conns * avgMs;
            }))
            .orElse(instances.get(0));
        return new DefaultResponse(best);
    }

    // Call this from a Gateway filter to record actual response times
    public void recordResponseTime(String instanceKey, long responseTimeMs) {
        responseTimes.compute(instanceKey, (k, existing) -> {
            if (existing == null) return new AtomicLong(responseTimeMs);
            // Exponential moving average: α = 0.3
            long ema = (long) (0.3 * responseTimeMs + 0.7 * existing.get());
            existing.set(ema);
            return existing;
        });
    }
}

// Register the custom LoadBalancer
@Configuration
@LoadBalancerClient(name = "order-service", configuration = LeastResponseTimeConfig.class)
public class GatewayConfig {}

@Configuration
public class LeastResponseTimeConfig {
    @Bean
    public ReactorLoadBalancer<ServiceInstance> reactorServiceInstanceLoadBalancer(
            Environment env, LoadBalancerClientFactory factory) {
        String name = env.getProperty(LoadBalancerClientFactory.PROPERTY_NAME);
        return new LeastResponseTimeLoadBalancer(
            factory.getLazyProvider(name, ServiceInstanceListSupplier.class));
    }
}
```
Sticky Sessions — Implementation, Problems & Redis Alternative
How sticky sessions work, when they break, and the right way with Spring Session + Redis
Sticky sessions (also called session persistence or session affinity) ensure that a client always reaches the same backend instance for the duration of their session. This is necessary when session state is stored in-memory on the application server rather than in a distributed store.
Two mechanisms:
- Cookie-based (L7 only): The LB injects a cookie (e.g., AWSALB, SERVERID) on the first request. Subsequent requests carry this cookie; the LB reads it and routes to the same backend. Works across NAT/proxies. Requires L7.
- IP Hash (L4 or L7): Hash the client IP to pick a server — same IP always lands on the same server. Breaks when users are behind a corporate NAT (thousands of users appear as one IP → all hit one server). Also breaks when you scale up/down (consistent hashing helps but doesn't eliminate the problem).
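Why scaling breaks plain hashing, and how a consistent-hash ring softens it, can be sketched numerically. This is illustrative toy code: the server names and the `mix` helper are assumptions (added because String.hashCode clusters badly on similar strings), and a real ring would use a stronger hash such as murmur or MD5:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RehashDemo {
    // Cheap integer mixer to spread Java's weak String.hashCode values
    static int mix(int h) { h ^= h >>> 16; h *= 0x45d9f3b; h ^= h >>> 16; return h; }

    // Plain modulo placement: server index = hash % N
    static int modulo(String key, int n) { return Math.floorMod(mix(key.hashCode()), n); }

    // Minimal consistent-hash ring lookup: first node clockwise from the key
    static String onRing(String key, TreeMap<Integer, String> ring) {
        Map.Entry<Integer, String> e = ring.ceilingEntry(mix(key.hashCode()));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    // Each server gets `vnodes` points on the ring to smooth the distribution
    static TreeMap<Integer, String> buildRing(List<String> servers, int vnodes) {
        TreeMap<Integer, String> ring = new TreeMap<>();
        for (String s : servers)
            for (int v = 0; v < vnodes; v++)
                ring.put(mix((s + "#" + v).hashCode()), s);
        return ring;
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> r3 = buildRing(List.of("s1", "s2", "s3"), 50);
        TreeMap<Integer, String> r4 = buildRing(List.of("s1", "s2", "s3", "s4"), 50);

        int keys = 10_000, movedMod = 0, movedRing = 0;
        for (int i = 0; i < keys; i++) {
            String ip = "10.0." + (i / 256) + "." + (i % 256);     // fake client IPs
            if (modulo(ip, 3) != modulo(ip, 4)) movedMod++;        // 3 → 4 servers, modulo
            if (!onRing(ip, r3).equals(onRing(ip, r4))) movedRing++; // 3 → 4 servers, ring
        }
        System.out.println("modulo remapped: " + movedMod + " / " + keys);
        System.out.println("ring remapped:   " + movedRing + " / " + keys);
    }
}
```

With modulo, roughly three quarters of clients change servers when a fourth is added (every sticky session on those clients is lost); on the ring, only the clients that land on the new server's arcs move.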
Uneven load distribution: If pod-2 gets all heavy users (long sessions), it gets overloaded while pod-1 is idle. You can't rebalance without dropping sessions.
Auto-scaling incompatible: When you scale down, you can't gracefully terminate an instance if sticky-session users are still active on it. You either wait indefinitely or drop their sessions.
The fix: use Spring Session + Redis — session state lives outside the application, any pod can serve any user.
```yaml
# ─── pom.xml dependencies ───
#   spring-session-data-redis
#   spring-boot-starter-data-redis
#   spring-boot-starter-security (optional)

# ─── application.yml ───
spring:
  session:
    store-type: redis           # session → Redis, not JVM memory
    timeout: 30m
    redis:
      namespace: myapp:session  # Redis key prefix
      flush-mode: on-save       # write to Redis when session modified
      # IMMEDIATE: write every attribute change (safer but more Redis ops)
      # ON_SAVE: write at end of request (better performance, small staleness window)
  data:
    redis:
      host: redis-cluster.internal
      port: 6379
      password: ${REDIS_PASSWORD}
      ssl: true                 # encrypt Redis traffic
      lettuce:
        pool:
          max-active: 20
          max-idle: 5
          min-idle: 2
```

```java
/* ─── Spring Security + Session config ─── */
@Configuration
@EnableRedisHttpSession(
    maxInactiveIntervalInSeconds = 1800,  // 30 min TTL in Redis
    redisNamespace = "myapp"
)
public class SessionConfig {

    @Bean
    public CookieSerializer cookieSerializer() {
        DefaultCookieSerializer serializer = new DefaultCookieSerializer();
        serializer.setCookieName("SESS_ID");      // custom cookie name
        serializer.setCookieMaxAge(-1);           // session cookie (until browser close)
        serializer.setUseHttpOnlyCookie(true);    // prevent XSS
        serializer.setUseSecureCookie(true);      // HTTPS only
        serializer.setSameSite("Strict");         // CSRF protection
        serializer.setDomainName("myapp.com");    // shared across subdomains
        return serializer;
    }

    @Bean
    public RedisSerializer<Object> springSessionDefaultRedisSerializer() {
        // Use JSON (not Java serialization) for Redis session storage —
        // avoids deserialization issues when deploying new app versions
        return new GenericJackson2JsonRedisSerializer();
    }
}

/* ─── How it works now ─── */
// ANY pod can serve ANY user:
// 1. User hits pod-1: Spring Security creates session → stores in Redis
//    Key:   "myapp:session:sessions:{sessionId}"
//    Value: {userId: 42, roles: [ORDER_MANAGER], createdAt: ...}
// 2. Next request hits pod-3 (round-robin): reads same session from Redis
// 3. pod-1 crashes → user's session still alive in Redis → seamless
// 4. Scale to 50 pods → LB distributes freely → no stickiness needed

/* ─── AWS ALB with Redis sessions — disable sticky sessions ─── */
// ALB target group: stickiness.enabled = false (round-robin works perfectly now)
```
```hcl
# ALB Target Group with sticky sessions (use only if you can't use Redis session)
resource "aws_lb_target_group" "order_service" {
  name        = "order-service-tg"
  port        = 8080
  protocol    = "HTTP"
  vpc_id      = var.vpc_id
  target_type = "ip"   # ECS Fargate / K8s pods

  stickiness {
    type            = "lb_cookie"  # ALB-managed cookie (AWSALB)
    cookie_duration = 86400        # 24 hours sticky
    enabled         = true         # set false if using Redis session!
  }

  health_check {
    enabled             = true
    path                = "/actuator/health"
    port                = "traffic-port"
    protocol            = "HTTP"
    healthy_threshold   = 2
    unhealthy_threshold = 3
    timeout             = 5
    interval            = 15
    matcher             = "200"
  }

  deregistration_delay = 30   # wait 30s before removing from pool
  slow_start           = 60   # warm up: gradually increase traffic for 60s
}
```
Health Checks — Liveness, Readiness, Deep Health
Spring Boot Actuator health checks that actually catch real failure modes
Health checks are how your load balancer knows whether to send traffic to an instance. Getting them wrong is one of the most common causes of production outages. There are three distinct levels:
- Liveness check: "Is the process alive?" Should ONLY fail if the app is in an unrecoverable state (deadlock, OOM) and needs to be killed and restarted. Do NOT check external dependencies here.
- Readiness check: "Can this instance accept traffic?" Should fail if the app is still starting up, performing a warm-up, or if a critical dependency (DB, Redis) is unavailable. The LB should NOT send traffic to unready pods.
- Deep health check: "Are all dependencies healthy?" Used for monitoring and alerting, NOT for LB routing decisions. Checking slow external APIs here can cause cascading LB-removal failures.
```yaml
# ─── application.yml — health endpoint config ───
management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus,env
  endpoint:
    health:
      show-details: always        # show component breakdown
      show-components: always
      group:
        liveness:                 # Kubernetes liveness probe URL
          include: livenessState  # /actuator/health/liveness
        readiness:                # Kubernetes readiness probe URL
          include: readinessState,db,redis  # /actuator/health/readiness
        deep:                     # detailed monitoring (not LB)
          include: db,redis,kafka,diskSpace,ping,snowflake
  health:
    db:
      enabled: true
    redis:
      enabled: true
    kafka:
      enabled: true
    defaults:
      enabled: true
  server:
    port: 8081   # separate port for management (don't expose to internet)
```

```java
/* ─── Custom Health Indicator — Snowflake connection ─── */
@Component
@Slf4j
public class SnowflakeHealthIndicator implements HealthIndicator {

    private final SnowflakeDataSource snowflake;
    private static final String HEALTH_QUERY = "SELECT 1";
    private static final int TIMEOUT_SECONDS = 3;

    public SnowflakeHealthIndicator(SnowflakeDataSource snowflake) {
        this.snowflake = snowflake;
    }

    @Override
    public Health health() {
        try (Connection conn = snowflake.getConnection();
             Statement stmt = conn.createStatement()) {
            stmt.setQueryTimeout(TIMEOUT_SECONDS);
            ResultSet rs = stmt.executeQuery(HEALTH_QUERY);
            if (rs.next()) {
                return Health.up()
                    .withDetail("database", "snowflake")
                    .withDetail("status", "reachable")
                    .withDetail("connectionPool", snowflake.getActiveConnections())
                    .build();
            }
        } catch (Exception e) {
            log.error("Snowflake health check failed", e);
            return Health.down()
                .withDetail("error", e.getMessage())
                .withException(e)
                .build();
        }
        return Health.unknown().build();
    }
}

/* ─── Custom readiness: warm-up check ─── */
@Component
@Slf4j
public class WarmUpReadinessIndicator
        implements ApplicationListener<ApplicationReadyEvent>, HealthIndicator {

    private final ApplicationContext context;
    private volatile boolean warmedUp = false;

    public WarmUpReadinessIndicator(ApplicationContext context) {
        this.context = context;
    }

    @Override
    public Health health() {
        // Included in the readiness group: DOWN until warm-up completes
        return warmedUp ? Health.up().build()
                        : Health.down().withDetail("reason", "warming up").build();
    }

    @Override
    public void onApplicationEvent(ApplicationReadyEvent event) {
        // Perform JPA warm-up (loads Hibernate metamodel, fills the connection pool)
        performWarmUp();
        warmedUp = true; // now the readiness probe returns 200 → LB starts sending traffic
    }

    private void performWarmUp() {
        try {
            // Run a lightweight query to warm the Hibernate metadata cache
            // and establish DB connection pool connections
            context.getBean(OrderRepository.class).count();
            log.info("Warm-up complete — ready for traffic");
        } catch (Exception e) {
            log.warn("Warm-up failed, pod may be slow initially", e);
        }
    }
}
```
```yaml
# kubernetes deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: order-service
          image: my-registry/order-service:v2.3.1
          ports:
            - containerPort: 8080   # app traffic
            - containerPort: 8081   # management (health)
          livenessProbe:            # kill and restart if it fails
            httpGet:
              path: /actuator/health/liveness
              port: 8081
            initialDelaySeconds: 60 # wait for JVM start
            periodSeconds: 10
            failureThreshold: 3     # fail 3x before killing
            timeoutSeconds: 3
          readinessProbe:           # remove from LB if it fails
            httpGet:
              path: /actuator/health/readiness  # checks db + redis
              port: 8081
            initialDelaySeconds: 30 # Spring Boot starts in ~25s
            periodSeconds: 5        # check every 5s
            failureThreshold: 2     # remove from LB after 2 failures (10s)
            successThreshold: 1     # add back after 1 success
            timeoutSeconds: 3
          startupProbe:             # allow slow starts (first JVM boot)
            httpGet:
              path: /actuator/health/liveness
              port: 8081
            failureThreshold: 30    # 30 * 10s = 5 min max start time
            periodSeconds: 10
            # once startupProbe passes, liveness + readiness take over
          lifecycle:
            preStop:                # graceful shutdown signal
              exec:
                command: ["/bin/sh", "-c", "sleep 5"]
                # wait 5s for the LB to stop sending new connections before shutdown
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "1Gi"
              cpu: "1000m"
```
Spring Cloud Gateway — Full Production Config
Filters, predicates, rate limiting, JWT auth, request/response transformation, observability
Spring Cloud Gateway is built on Spring WebFlux + Project Reactor + Netty — a fully non-blocking, event-driven architecture. Unlike Zuul 1.x (blocking Servlet threads), SCG uses a small number of threads handling thousands of concurrent connections via Reactor event loop.
Request flow through SCG:
Predicates match the incoming request to a route and its filters run; then, when the route URI is lb://order-service, the ReactiveLoadBalancerExchangeFilterFunction resolves it via Spring Cloud LoadBalancer (Consul/Eureka/Kubernetes service discovery), picks an instance using the configured algorithm, and falls back to the next instance on failure.

```java
/* ─── build.gradle / pom.xml ─── */
// spring-cloud-starter-gateway
// spring-cloud-starter-loadbalancer
// spring-cloud-starter-consul-discovery (or eureka)
// spring-boot-starter-actuator
// spring-boot-starter-data-redis-reactive (for rate limiting)
// resilience4j-spring-boot3

@SpringBootApplication
@EnableDiscoveryClient
public class GatewayApplication {
    public static void main(String[] args) {
        SpringApplication.run(GatewayApplication.class, args);
    }
}

@Configuration
public class GatewayRoutingConfig {

    @Bean
    public RouteLocator routes(RouteLocatorBuilder builder,
                               JwtAuthenticationFilter jwtFilter,
                               RequestLoggingFilter loggingFilter) {
        return builder.routes()
            // ─── Order Service Route ───────────────────────────────────
            .route("order-service", r -> r
                .path("/api/v1/orders/**")
                .and().not(p -> p.path("/api/v1/orders/admin/**"))
                .filters(f -> f
                    .stripPrefix(2)   // /api/v1 stripped
                    .addRequestHeader("X-Service-Name", "gateway")
                    .addRequestHeader("X-Request-ID",
                        "#{T(java.util.UUID).randomUUID().toString()}")
                    .requestRateLimiter(c -> c
                        .setRateLimiter(redisRateLimiter())
                        .setKeyResolver(userKeyResolver())   // rate limit per user
                        .setDenyEmptyKey(false)
                        .setEmptyKeyStatus("FORBIDDEN")
                    )
                    .circuitBreaker(c -> c
                        .setName("orderServiceCB")
                        .setFallbackUri("forward:/fallback/orders")
                    )
                    .retry(c -> c
                        .setRetries(2)
                        .setStatuses(HttpStatus.BAD_GATEWAY, HttpStatus.SERVICE_UNAVAILABLE)
                        .setMethods(HttpMethod.GET)   // only retry GET (idempotent)
                        .setBackoff(Duration.ofMillis(100), Duration.ofMillis(1000), 2, true)
                    )
                )
                .uri("lb://order-service")   // lb:// = service discovery
            )
            // ─── User Service Route ────────────────────────────────────
            .route("user-service", r -> r
                .path("/api/v1/users/**")
                .filters(f -> f
                    .stripPrefix(2)
                    .filter(jwtFilter)   // JWT validation
                    .addRequestHeader("X-User-ID", "#{@jwtService.extractUserId(request)}")
                    .modifyResponseBody(String.class, String.class,
                        (exchange, body) -> {
                            // Example: mask sensitive fields in the response
                            // (maskSensitiveData is a helper defined elsewhere)
                            return Mono.just(maskSensitiveData(body));
                        })
                )
                .uri("lb://user-service")
            )
            .build();
    }

    @Bean
    public RedisRateLimiter redisRateLimiter() {
        // replenishRate   = tokens per second
        // burstCapacity   = max burst (bucket size)
        // requestedTokens = tokens per request (default 1)
        return new RedisRateLimiter(100, 200, 1);
    }

    @Bean
    public KeyResolver userKeyResolver() {
        // Rate limit per authenticated user ID (from JWT)
        return exchange -> {
            String userId = exchange.getRequest().getHeaders().getFirst("X-User-ID");
            return Mono.justOrEmpty(userId)
                .switchIfEmpty(Mono.just(
                    exchange.getRequest().getRemoteAddress()
                        .getAddress().getHostAddress()   // fall back to client IP
                ));
        };
    }
}

/* ─── JWT Authentication Global Filter ─── */
@Component
@Order(Ordered.HIGHEST_PRECEDENCE)
public class JwtAuthenticationFilter implements GatewayFilter, Ordered {

    private static final List<String> OPEN_PATHS = List.of(
        "/api/v1/auth/login", "/api/v1/auth/register", "/actuator/health"
    );

    private final JwtService jwtService;

    public JwtAuthenticationFilter(JwtService jwtService) {
        this.jwtService = jwtService;
    }

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
        String path = exchange.getRequest().getPath().value();
        if (OPEN_PATHS.stream().anyMatch(path::startsWith)) {
            return chain.filter(exchange);   // bypass auth for open routes
        }

        String authHeader = exchange.getRequest().getHeaders()
            .getFirst(HttpHeaders.AUTHORIZATION);
        if (authHeader == null || !authHeader.startsWith("Bearer ")) {
            exchange.getResponse().setStatusCode(HttpStatus.UNAUTHORIZED);
            return exchange.getResponse().setComplete();
        }

        String token = authHeader.substring(7);
        try {
            Claims claims = jwtService.validateAndExtract(token);
            // Inject claims as headers for downstream services
            ServerHttpRequest mutatedRequest = exchange.getRequest().mutate()
                .header("X-User-ID", claims.getSubject())
                .header("X-User-Roles", claims.get("roles", String.class))
                .header("X-Tenant-ID", claims.get("tenantId", String.class))
                .build();
            return chain.filter(exchange.mutate().request(mutatedRequest).build());
        } catch (JwtException e) {
            exchange.getResponse().setStatusCode(HttpStatus.UNAUTHORIZED);
            return exchange.getResponse().setComplete();
        }
    }

    @Override
    public int getOrder() { return -100; }   // run before other filters
}
```
Circuit Breaker Pattern — Resilience4j Deep Dive
State machine, sliding window, half-open probing, bulkhead, and fallback strategies
A circuit breaker is a proxy that monitors calls to a downstream service. When failures reach a threshold, it opens the circuit — subsequent calls fail immediately without even attempting the real call. This prevents cascading failures: if order-service is down, you don't want 100 threads blocking for 30 seconds each waiting for a timeout.
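The state machine behind this (CLOSED, OPEN, HALF-OPEN) can be sketched in a few dozen lines. This is a simplified consecutive-failure version, not Resilience4j's sliding-window implementation; the thresholds are illustrative:

```java
import java.util.function.Supplier;

public class MiniCircuitBreaker {
    public enum State { CLOSED, OPEN, HALF_OPEN }

    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0;
    private final int failureThreshold;  // failures before tripping (cf. failureRateThreshold)
    private final long waitMillis;       // cf. waitDurationInOpenState

    public MiniCircuitBreaker(int failureThreshold, long waitMillis) {
        this.failureThreshold = failureThreshold;
        this.waitMillis = waitMillis;
    }

    public synchronized <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (System.currentTimeMillis() - openedAt >= waitMillis) {
                state = State.HALF_OPEN;   // wait elapsed: let a probe call through
            } else {
                return fallback.get();     // fail fast, no real call attempted
            }
        }
        try {
            T result = action.get();
            state = State.CLOSED;          // success (incl. a HALF_OPEN probe) closes the circuit
            consecutiveFailures = 0;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;        // trip: reject all calls until waitMillis passes
                openedAt = System.currentTimeMillis();
            }
            return fallback.get();
        }
    }

    public synchronized State state() { return state; }
}
```

The key property is visible in the OPEN branch: once tripped, callers get the fallback immediately instead of holding a thread for a 30-second timeout.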
```yaml
# application.yml — Resilience4j configuration
resilience4j:
  circuitbreaker:
    instances:
      # Circuit breaker for order-service HTTP calls
      orderServiceCB:
        registerHealthIndicator: true   # expose in /actuator/health
        slidingWindowType: COUNT_BASED  # or TIME_BASED
        slidingWindowSize: 10           # last 10 calls
        minimumNumberOfCalls: 5         # need at least 5 calls before evaluating
        failureRateThreshold: 50        # open if 50%+ fail
        slowCallRateThreshold: 80       # also open if 80%+ are slow
        slowCallDurationThreshold: 3s   # "slow" = >3 seconds
        waitDurationInOpenState: 30s    # stay OPEN for 30s, then try HALF-OPEN
        permittedNumberOfCallsInHalfOpenState: 5   # probe with 5 calls
        automaticTransitionFromOpenToHalfOpenEnabled: true
        recordExceptions:
          - java.io.IOException
          - java.net.ConnectException
          - org.springframework.web.reactive.function.client.WebClientResponseException$InternalServerError
          - org.springframework.web.reactive.function.client.WebClientResponseException$ServiceUnavailable
        ignoreExceptions:
          - com.myapp.exception.BusinessValidationException  # 400s are not circuit failures
          - com.myapp.exception.NotFoundException            # 404 is not a circuit issue
      # Separate CB for Snowflake (longer wait, fewer calls)
      snowflakeCB:
        slidingWindowSize: 5
        failureRateThreshold: 60
        waitDurationInOpenState: 60s    # longer recovery for DB
        permittedNumberOfCallsInHalfOpenState: 2
  bulkhead:
    instances:
      # Limit concurrent calls to order-service (prevent thread starvation)
      orderServiceBulkhead:
        maxConcurrentCalls: 20    # max 20 concurrent calls
        maxWaitDuration: 100ms    # wait 100ms for a slot, then reject
  timelimiter:
    instances:
      orderServiceTimeout:
        timeoutDuration: 3s       # cancel the call after 3s
        cancelRunningFuture: true
  retry:
    instances:
      orderServiceRetry:
        maxAttempts: 3
        waitDuration: 200ms
        enableExponentialBackoff: true
        exponentialBackoffMultiplier: 2   # 200ms → 400ms → 800ms
        retryExceptions:
          - java.io.IOException
          - java.net.ConnectException
        ignoreExceptions:
          - com.myapp.exception.BusinessValidationException
```

```java
/* ─── Java: service using @CircuitBreaker + @Retry ─── */
@Service
@Slf4j
public class OrderServiceClient {

    private final WebClient webClient;

    public OrderServiceClient(WebClient webClient) {
        this.webClient = webClient;
    }

    // Order of decoration (outer to inner): Retry → CircuitBreaker → Bulkhead → TimeLimiter
    @CircuitBreaker(name = "orderServiceCB", fallbackMethod = "getOrdersFallback")
    @Bulkhead(name = "orderServiceBulkhead", type = Bulkhead.Type.SEMAPHORE)
    @Retry(name = "orderServiceRetry")
    @TimeLimiter(name = "orderServiceTimeout")
    public CompletableFuture<List<OrderDTO>> getOrders(String userId) {
        return webClient.get()
            .uri("/orders?userId={userId}", userId)
            .retrieve()
            .bodyToFlux(OrderDTO.class)
            .collectList()
            .toFuture();
    }

    // Fallback must have the same signature + a Throwable/Exception param at the end
    public CompletableFuture<List<OrderDTO>> getOrdersFallback(String userId, Throwable t) {
        log.warn("Circuit breaker fallback for userId={}, reason={}", userId, t.getMessage());
        if (t instanceof CallNotPermittedException) {
            // Circuit is OPEN — return cached/stale data (helper defined elsewhere)
            return CompletableFuture.completedFuture(getCachedOrders(userId));
        }
        if (t instanceof BulkheadFullException) {
            // Too many concurrent calls — return a partial response
            return CompletableFuture.completedFuture(List.of());
        }
        // Timeout or connection error
        return CompletableFuture.failedFuture(
            new ServiceUnavailableException("Order service temporarily unavailable"));
    }

    // Programmatic CB state inspection (useful in admin endpoints)
    public CircuitBreaker.State getCircuitBreakerState() {
        return CircuitBreakerRegistry.ofDefaults()
            .circuitBreaker("orderServiceCB")
            .getState();
    }
}
```
Nginx Upstream Configuration — Production Tuning
Keepalive pools, upstream health, rate limiting, WebSocket proxying, SSL termination
# /etc/nginx/nginx.conf — Production config for Spring Boot microservices user www-data; worker_processes auto; # 1 worker per CPU core worker_rlimit_nofile 65535; # max open file descriptors per worker events { worker_connections 4096; # max concurrent connections per worker multi_accept on; # accept all pending connections at once use epoll; # Linux kernel event queue (most efficient) } http { # ─── Performance ───────────────────────────────────────── sendfile on; tcp_nopush on; # batch TCP segments (better throughput) tcp_nodelay on; # disable Nagle for low-latency responses keepalive_timeout 65; # keep client connections open 65s keepalive_requests 1000; # reuse connection for up to 1000 requests types_hash_max_size 2048; # ─── Upstream: order-service pods ──────────────────────── upstream order_service { least_conn; # least connections algorithm server 10.0.1.10:8080 weight=1 max_fails=3 fail_timeout=30s; server 10.0.1.11:8080 weight=1 max_fails=3 fail_timeout=30s; server 10.0.1.12:8080 weight=1 max_fails=3 fail_timeout=30s; server 10.0.1.13:8080 weight=2 max_fails=3 fail_timeout=30s; # bigger instance keepalive 32; # keep 32 idle connections to backends keepalive_requests 100; # reuse upstream conn for 100 requests keepalive_timeout 60s; # close idle upstream conn after 60s # keepalive prevents TCP connection setup overhead on every request # This is critical for high-throughput Spring Boot APIs } # ─── Upstream: user-service ────────────────────────────── upstream user_service { server 10.0.2.10:8080 max_fails=2 fail_timeout=20s; server 10.0.2.11:8080 max_fails=2 fail_timeout=20s; server 10.0.2.12:8080 backup; # only used if all primary fail keepalive 16; } # ─── Rate Limiting Zones ───────────────────────────────── limit_req_zone $binary_remote_addr zone=api_limit:10m rate=100r/s; # 10m = 10MB shared memory zone (~160k IP addresses) # rate=100r/s = max 100 requests per second per IP limit_req_zone $http_x_user_id zone=user_limit:10m rate=50r/s; # per-user 
```nginx
# rate limiting using X-User-ID header from JWT

# ─── Server Block: API Gateway ───────────────────────────
server {
    listen 443 ssl http2;
    server_name api.myapp.com;

    # SSL Configuration
    ssl_certificate     /etc/ssl/certs/myapp.com.crt;
    ssl_certificate_key /etc/ssl/private/myapp.com.key;
    ssl_protocols       TLSv1.2 TLSv1.3;
    ssl_ciphers         ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256;
    ssl_session_cache   shared:SSL:10m;
    ssl_session_timeout 10m;
    ssl_stapling        on;   # OCSP stapling (faster SSL handshake)
    add_header Strict-Transport-Security "max-age=31536000" always;

    # ─── Order Service Routes ──────────────────────────────
    location /api/v1/orders {
        limit_req zone=api_limit burst=200 nodelay;
        # burst=200: allow temporary burst of 200 extra requests
        # nodelay: process burst immediately (no queueing delay)

        proxy_pass http://order_service;   # L7 proxy to upstream pool
        proxy_http_version 1.1;            # required for keepalive
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "";    # "" enables keepalive upstream

        # Pass real client IP to Spring Boot
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Timeouts — tune per endpoint requirements
        proxy_connect_timeout 5s;    # backend TCP connect timeout
        proxy_send_timeout    30s;   # write to backend
        proxy_read_timeout    30s;   # read from backend

        # Buffer sizing
        proxy_buffer_size 16k;
        proxy_buffers 8 16k;
        proxy_busy_buffers_size 32k;
    }

    # ─── WebSocket Support (for real-time order updates) ───
    location /ws {
        proxy_pass http://order_service;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;   # WebSocket upgrade
        proxy_set_header Connection "upgrade";    # keep connection open
        proxy_read_timeout 3600s;   # 1 hour for WS connections
        proxy_send_timeout 3600s;
    }

    # ─── Static health check (no backend hit) ──────────────
    location /health {
        default_type application/json;   # sets Content-Type for the return body
        return 200 '{"status":"UP"}';
    }

    # ─── Deny direct access to actuator ────────────────────
    location /actuator {
        deny all;   # nginx answers 403 itself; the request never reaches the backend
    }
}

# HTTP → HTTPS redirect
server {
    listen 80;
    server_name api.myapp.com;
    return 301 https://$host$request_uri;
}
}   # end of the enclosing http {} block
```
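The `proxy_set_header` lines above forward the client's real IP and scheme, but Spring Boot ignores `X-Forwarded-*` headers unless told otherwise, so `getRemoteAddr()` and generated redirect URLs would reflect the LB instead of the client. A minimal `application.yml` fragment (the property exists in Spring Boot 2.2+; choosing the `framework` strategy over `native` is an assumption):

```yaml
server:
  # Honor X-Forwarded-For / X-Forwarded-Proto set by the proxy.
  # "framework" registers Spring's ForwardedHeaderFilter;
  # "native" delegates to the servlet container (Tomcat's RemoteIpValve) instead.
  forward-headers-strategy: framework
```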
AWS ALB — Target Groups, Routing Rules, ECS/EKS Integration
Listener rules, weighted target groups, WAF integration, access logs, Spring Boot on Fargate
```hcl
# ─── ALB ─────────────────────────────────────────────────────────────
resource "aws_lb" "api_alb" {
  name               = "myapp-api-alb"
  internal           = false          # internet-facing
  load_balancer_type = "application"  # L7
  security_groups    = [aws_security_group.alb_sg.id]
  subnets            = var.public_subnet_ids

  enable_deletion_protection = true
  enable_http2               = true
  drop_invalid_header_fields = true        # security best practice
  desync_mitigation_mode     = "strictest"
  # Note: cross-zone load balancing is always on for ALBs; the
  # enable_cross_zone_load_balancing argument only applies to NLBs.

  # Access Logs → S3 (critical for debugging, compliance)
  access_logs {
    bucket  = aws_s3_bucket.alb_logs.bucket
    prefix  = "api-alb"
    enabled = true
  }
}

# ─── HTTPS Listener ──────────────────────────────────────────────────
resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.api_alb.arn
  port              = 443
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS13-1-2-2021-06"  # TLS 1.2+
  certificate_arn   = aws_acm_certificate.api_cert.arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.orders_rw.arn
  }
}

# ─── Listener Rule: Path-based routing ───────────────────────────────
resource "aws_lb_listener_rule" "orders_read" {
  listener_arn = aws_lb_listener.https.arn
  priority     = 10   # lower = higher priority

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.orders_ro.arn
  }

  condition {
    path_pattern {
      values = ["/api/v1/orders/*"]
    }
  }

  condition {
    http_request_method {
      values = ["GET", "HEAD"]
    }
  }
}

# ─── Listener Rule: Header-based canary routing ──────────────────────
resource "aws_lb_listener_rule" "canary_v2" {
  listener_arn = aws_lb_listener.https.arn
  priority     = 5

  action {
    type = "forward"
    forward {
      target_group {
        arn    = aws_lb_target_group.orders_v2.arn
        weight = 100
      }
    }
  }

  condition {
    http_header {
      http_header_name = "X-API-Version"
      values           = ["2"]
    }
  }
}

# ─── Target Group: Order Service (Spring Boot) ───────────────────────
resource "aws_lb_target_group" "orders_rw" {
  name        = "orders-rw-tg"
  port        = 8080
  protocol    = "HTTP"
  vpc_id      = var.vpc_id
  target_type = "ip"   # for ECS Fargate / Kubernetes pods

  health_check {
    enabled             = true
    path                = "/actuator/health/readiness"  # Spring Boot Actuator
    port                = "8081"                        # management port
    protocol            = "HTTP"
    matcher             = "200"
    interval            = 15
    timeout             = 5
    healthy_threshold   = 2
    unhealthy_threshold = 3   # 3 failures × 15s = 45s to mark unhealthy
  }

  deregistration_delay = 30   # wait 30s before removing (drain connections)
  slow_start           = 60   # ramp up traffic over 60s (JVM warm-up)

  # LOR: route to the instance with the fewest in-flight requests
  # (better than round-robin for Spring Boot's variable-latency endpoints)
  load_balancing_algorithm_type = "least_outstanding_requests"

  stickiness {
    type    = "lb_cookie"
    enabled = false   # disabled because we use Redis sessions
  }
}

# ─── WAF Association ─────────────────────────────────────────────────
resource "aws_wafv2_web_acl_association" "alb_waf" {
  resource_arn = aws_lb.api_alb.arn
  web_acl_arn  = aws_wafv2_web_acl.api_waf.arn
  # WAF rules: SQL injection, XSS, rate limiting, IP blocklist
}

# ─── CloudWatch Alarms ───────────────────────────────────────────────
resource "aws_cloudwatch_metric_alarm" "alb_5xx_alarm" {
  alarm_name          = "alb-5xx-rate-high"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "HTTPCode_Target_5XX_Count"
  namespace           = "AWS/ApplicationELB"
  period              = 60
  statistic           = "Sum"
  threshold           = 10   # alert if >10 5xx per minute
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    LoadBalancer = aws_lb.api_alb.arn_suffix
  }
}
```
Zero-Downtime Deploys — Rolling, Blue-Green, Canary
How LB enables all three strategies, with Spring Boot graceful shutdown and Kubernetes rollout configs
🔄 Rolling Deploy
Replace instances one-at-a-time. LB removes pod from pool, pod drains, new version starts, health check passes, pod re-added. Never have zero capacity. Risk: old and new versions run simultaneously (requires backward compatibility).
🔵🟢 Blue-Green
Two identical environments. Blue serves live traffic; deploy to the idle one (green). Test. Switch LB to point to green (single DNS/ALB update). Keep blue warm as rollback target. Zero compatibility issues, instant rollback. Cost: 2× infra during deploy.
🐦 Canary Release
Route small % (1-10%) of traffic to new version. Monitor error rates, latency, business metrics. Gradually increase % if healthy. Real-world testing with limited blast radius. Requires weighted routing (ALB, SCG Weight predicate).
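All three strategies reduce to how the LB splits traffic between two pools. A minimal sketch of the weighted split behind canary routing (class and method names are hypothetical, not any ALB or Spring Cloud Gateway API):

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch: weighted pick between a stable pool (v1) and a
// canary pool (v2), mirroring a weighted target group / Weight predicate.
public class WeightedCanaryRouter {
    private final int v2Weight; // 0..100, percent of traffic sent to the canary

    public WeightedCanaryRouter(int v2Weight) {
        if (v2Weight < 0 || v2Weight > 100) {
            throw new IllegalArgumentException("weight must be 0..100");
        }
        this.v2Weight = v2Weight;
    }

    /** Returns "v2" for roughly v2Weight% of calls, "v1" otherwise. */
    public String route() {
        return ThreadLocalRandom.current().nextInt(100) < v2Weight ? "v2" : "v1";
    }
}
```

Ramping a canary is then just raising `v2Weight` from 10 to 50 to 100 while watching error rates.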
```yaml
# application.yml — Graceful Shutdown (Spring Boot 2.3+)
server:
  shutdown: graceful   # default is "immediate" — ALWAYS use graceful
  tomcat:
    threads:
      max: 200

spring:
  lifecycle:
    timeout-per-shutdown-phase: 30s  # wait up to 30s for in-flight requests

# What happens on SIGTERM (Kubernetes pod shutdown):
# 1. Spring receives SIGTERM
# 2. Sets readiness probe to DOWN (LB stops sending new requests — ~10s)
# 3. Waits for in-flight requests to complete (up to 30s)
# 4. Closes database connections, Kafka producers
# 5. JVM exits cleanly
```

```java
/* ─── Kafka Consumer: pause before shutdown ─── */
import org.springframework.context.ApplicationListener;
import org.springframework.context.event.ContextClosedEvent;
import org.springframework.kafka.config.KafkaListenerEndpointRegistry;
import org.springframework.kafka.listener.MessageListenerContainer;
import org.springframework.stereotype.Component;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;

@Slf4j
@Component
@RequiredArgsConstructor
public class KafkaGracefulShutdown implements ApplicationListener<ContextClosedEvent> {

    private final KafkaListenerEndpointRegistry registry;

    @Override
    public void onApplicationEvent(ContextClosedEvent event) {
        // Pause all Kafka consumers to stop fetching new messages
        registry.getListenerContainers().forEach(MessageListenerContainer::pause);
        log.info("Kafka consumers paused — waiting for in-flight to complete");
        // Spring lifecycle then waits for in-flight message handlers to finish
    }
}
```

```yaml
# ─── Kubernetes Rolling Deploy ───
apiVersion: apps/v1
kind: Deployment
spec:
  replicas: 6
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2          # add 2 new pods before removing old ones
      maxUnavailable: 0    # NEVER have fewer than 6 pods running (zero downtime)
  template:
    spec:
      terminationGracePeriodSeconds: 60  # must be > spring.lifecycle.timeout
      containers:
        - name: order-service
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 5"]
                # Sleep 5s BEFORE SIGTERM: allows LB to stop routing new connections.
                # Kubernetes sends SIGTERM to the app AND updates endpoints simultaneously;
                # without the sleep, the app starts shutting down while the LB still routes traffic.
```

```yaml
# ─── Canary via Spring Cloud Gateway Weight ───
# Canary deploy: 10% traffic to new version
# STEP 1: Deploy order-service-v2 with 2 pods (alongside v1's 8 pods)
# STEP 2: Set gateway weights: v1=90, v2=10
# STEP 3: Monitor Prometheus: error_rate{service="order-service-v2"}
# STEP 4: If ok: shift to 50/50, then 100% v2
# STEP 5: Scale down v1 pods

# Monitoring canary health with Prometheus
- record: job:order_service:error_rate_5m
  expr: |
    rate(http_server_requests_seconds_count{
      job="order-service", status=~"5..", version="v2"}[5m])
    /
    rate(http_server_requests_seconds_count{
      job="order-service", version="v2"}[5m])
  # Alert if v2 error rate > 2x v1 error rate
```
Senior Interview Q&A — 10 YOE Level
Questions that test architectural judgment, not just memorized facts
Q: Your Spring Boot order-service starts slowing down after traffic spikes. Orders come from a Kafka topic but API calls also hit it. How would you architect the LB strategy?
Answer: This is a mixed-workload problem. First, separate concerns: Kafka consumers should be a separate deployment from the HTTP API handlers (even if same codebase, scale independently). For the HTTP side, I'd use least-outstanding-requests on the ALB (not round-robin) because API latency is variable — some POST endpoints do heavy DB writes, GET endpoints are fast.
For the Kafka side, there's no load balancer — Kafka's own partition assignment distributes load across consumer group members. I'd scale Kafka consumers to match partition count (1 consumer thread per partition, no more). If consumers are slow, increase partition count (irreversible, plan ahead).
I'd also add a circuit breaker between the API and any downstream calls (like a payment service), and a bulkhead to prevent order-processing threads from being starved by, say, a slow reporting endpoint.
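The least-outstanding-requests choice above comes down to simple bookkeeping: track in-flight requests per backend and pick the minimum. A toy sketch (hypothetical class, not the ALB implementation):

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical sketch of least-outstanding-requests (LOR) selection:
// route each request to the backend with the fewest in-flight requests,
// so one slow POST handler doesn't back up an otherwise idle instance.
public class LeastOutstandingBalancer {
    private final AtomicIntegerArray inFlight;

    public LeastOutstandingBalancer(int backends) {
        this.inFlight = new AtomicIntegerArray(backends);
    }

    /** Picks the backend with the fewest in-flight requests (lowest index on ties). */
    public int acquire() {
        int best = 0;
        for (int i = 1; i < inFlight.length(); i++) {
            if (inFlight.get(i) < inFlight.get(best)) best = i;
        }
        inFlight.incrementAndGet(best);
        return best;
    }

    /** Call when the backend's response completes. */
    public void release(int backend) {
        inFlight.decrementAndGet(backend);
    }
}
```

Round-robin would keep feeding a backend stuck on slow DB writes; LOR naturally steers new requests away from it.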
Q: Sticky sessions vs distributed sessions — when would you keep sticky sessions in production?
Answer: Almost never in a greenfield system. Sticky sessions are a liability: they prevent free horizontal scaling, create hot-spots, and mean a pod crash directly causes user sessions to fail.
The only legitimate case I'd keep sticky sessions is: legacy applications that store complex in-memory state that can't be serialized to Redis without significant refactoring risk (e.g., a Tomcat app with 2000-line HttpSession objects). In that case, use IP-hash stickiness at the Nginx level (`ip_hash` / `hash $remote_addr`): if one pod restarts, only the clients hashed to it are remapped, while other clients stay pinned to their pods.
For new Spring Boot services: Spring Session + Redis from day one. Use a CookieSerializer with SameSite=Strict, HttpOnly, Secure, and serialize session to JSON (not Java serialization) for version compatibility.
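The IP-hash stickiness from the answer above can be sketched in a few lines (hypothetical class; note that plain modulo hashing remaps many clients when the pool size changes, which is why head-end L4 balancers usually prefer consistent hashing):

```java
// Hypothetical sketch of IP-hash stickiness: the same client IP always
// maps to the same backend, giving session affinity without cookies.
public class IpHashBalancer {
    private final int backends;

    public IpHashBalancer(int backends) {
        this.backends = backends;
    }

    /** Same srcIp always yields the same backend index in [0, backends). */
    public int pick(String srcIp) {
        // floorMod keeps the result non-negative even for negative hashCodes
        return Math.floorMod(srcIp.hashCode(), backends);
    }
}
```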
Q: You're using Spring Cloud Gateway as your API gateway. A backend service is returning 503s. Walk me through how circuit breakers, retries, and timeouts interact.
Answer: The decorator order matters. Resilience4j's default aspect order is Retry wraps CircuitBreaker wraps RateLimiter wraps TimeLimiter wraps Bulkhead. When the backend returns 503:
- TimeLimiter — if the call takes longer than threshold, cancels and throws TimeoutException
- CircuitBreaker — records the failure. If failure rate crosses threshold, opens circuit
- Retry — retries up to N times (only for GET — never retry POST/PATCH blindly, they're not idempotent)
- Fallback — if CB is OPEN or retries exhausted, calls the fallback method
Critical nuance: retries amplify load. If you retry 3× with 50 concurrent clients, you get 150 requests to an already-struggling backend. Always pair retries with exponential backoff and jitter. Also, retries on a CLOSED circuit are fine; if they fail, CB opens and subsequent requests fail immediately (protecting the backend).
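The backoff-with-jitter calculation above is worth making concrete. A sketch of "full jitter" (hypothetical helper, not the Resilience4j API): the delay for attempt n is a uniform random value in [0, min(base * 2^n, max)], which spreads retries from many clients instead of synchronizing them into thundering herds.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch of exponential backoff with full jitter.
public final class Backoff {
    private Backoff() {}

    /**
     * Delay before retry number `attempt` (0-based):
     * uniform random in [0, min(baseMs * 2^attempt, maxMs)].
     */
    public static long delayMs(int attempt, long baseMs, long maxMs) {
        long exp = baseMs * (1L << Math.min(attempt, 20)); // cap the shift to avoid overflow
        long capped = Math.min(exp, maxMs);
        return ThreadLocalRandom.current().nextLong(capped + 1); // bound is exclusive
    }
}
```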
Q: How do you do a zero-downtime deployment of a Spring Boot service that consumes Kafka AND serves HTTP?
Answer: The HTTP side is straightforward with Kubernetes rolling update + graceful shutdown (server.shutdown=graceful, terminationGracePeriodSeconds=60, preStop: sleep 5). The tricky part is Kafka.
When a pod receives SIGTERM: (1) Pause Kafka consumers immediately (stop fetching new messages), (2) Allow in-flight message handlers to complete (respect spring.lifecycle.timeout-per-shutdown-phase), (3) Commit final offsets, (4) Close Kafka producer, (5) JVM exits.
Kafka consumer group rebalancing happens twice during a rolling deploy: once when old pod leaves the group, once when new pod joins. During rebalancing, no partition is being consumed (rebalance gap). To minimize this: use partition.assignment.strategy=CooperativeStickyAssignor (incremental rebalance — only moves necessary partitions, others keep consuming). Also set max.poll.interval.ms high enough to survive shutdown without being kicked from the group mid-processing.
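The consumer settings from the answer above, collected in one place. The config keys are standard Kafka consumer properties; the group id and values are illustrative assumptions:

```java
import java.util.Properties;

// Sketch: consumer properties tuned for rolling deploys (illustrative values).
public class RollingDeployConsumerConfig {

    public static Properties props() {
        Properties p = new Properties();
        p.put("group.id", "order-service");
        // Incremental rebalance: only reassigned partitions stop consuming
        p.put("partition.assignment.strategy",
              "org.apache.kafka.clients.consumer.CooperativeStickyAssignor");
        // Must exceed worst-case message-processing time during drain,
        // or the broker evicts the consumer mid-processing
        p.put("max.poll.interval.ms", "300000"); // 5 minutes
        // Commit offsets explicitly after processing, not on a timer
        p.put("enable.auto.commit", "false");
        return p;
    }
}
```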
Production Pitfalls — What Actually Goes Wrong
Real failure modes at 10 YOE level — with root cause and fix
A client sends `X-Forwarded-For: evil.ip, 1.2.3.4`. The app trusts it, logs the wrong IP, and bypasses IP-based rate limits. Fix: configure Spring's RemoteIpFilter to trust only the LB's IP range, and set trusted-proxies accordingly.
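One hedged way to wire that up in Spring Boot, using the `server.tomcat.remoteip.*` properties (available since Spring Boot 2.2); the subnet regex below is an assumption, adjust it to your LB's actual CIDR:

```yaml
server:
  tomcat:
    remoteip:
      remote-ip-header: X-Forwarded-For
      protocol-header: X-Forwarded-Proto
      # Only requests from these source IPs may set X-Forwarded-*.
      # Note: this is a Java regex, not CIDR notation.
      internal-proxies: "10\\.0\\.\\d{1,3}\\.\\d{1,3}"
```

With this in place, an `X-Forwarded-For` header arriving from any address outside the trusted range is ignored rather than believed.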