Java Spring Boot
Microservices Patterns
Production-grade code examples for every major microservices pattern. Built for engineers with 10+ years of experience who want real implementations — not theory.
Circuit Breaker
Prevents cascading failures by wrapping downstream calls and tripping open when failures exceed a threshold.
Three states: CLOSED (normal flow) → OPEN (fast-fail, no calls) → HALF_OPEN (probe with limited calls).
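The state machine above can be sketched in a few lines of plain Java (illustrative only; `SimpleCircuitBreaker` is a hypothetical name, not a library class — use Resilience4j in production):

```java
import java.util.function.Supplier;

// Minimal sketch of the CLOSED -> OPEN -> HALF_OPEN state machine.
// Illustrative only: real breakers also track sliding windows and slow calls.
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private State state = State.CLOSED;
    private int failures = 0;
    private final int failureThreshold;
    private final long openDurationMillis;
    private long openedAt;

    SimpleCircuitBreaker(int failureThreshold, long openDurationMillis) {
        this.failureThreshold = failureThreshold;
        this.openDurationMillis = openDurationMillis;
    }

    <T> T call(Supplier<T> supplier, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (System.currentTimeMillis() - openedAt >= openDurationMillis) {
                state = State.HALF_OPEN;      // wait elapsed: allow a probe call
            } else {
                return fallback.get();        // fast-fail, no downstream call
            }
        }
        try {
            T result = supplier.get();
            failures = 0;
            state = State.CLOSED;             // probe succeeded: close again
            return result;
        } catch (RuntimeException ex) {
            failures++;
            if (state == State.HALF_OPEN || failures >= failureThreshold) {
                state = State.OPEN;           // trip open
                openedAt = System.currentTimeMillis();
            }
            return fallback.get();
        }
    }

    State state() { return state; }
}
```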
COUNT_BASED (last N calls) vs TIME_BASED (calls in the last N seconds) sliding windows. Always pair with @TimeLimiter — a hanging call never counts as a failure without it.

<!-- Spring Boot 3 -->
<dependency>
    <groupId>io.github.resilience4j</groupId>
    <artifactId>resilience4j-spring-boot3</artifactId>
    <version>2.2.0</version>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-aop</artifactId>
</dependency>
resilience4j:
  circuitbreaker:
    instances:
      orderService:
        # COUNT_BASED: evaluate last N calls | TIME_BASED: calls in last N seconds
        slidingWindowType: COUNT_BASED
        slidingWindowSize: 10                  # evaluate last 10 calls
        failureRateThreshold: 50               # OPEN if 50%+ fail
        slowCallRateThreshold: 80              # also treat slow calls as failures
        slowCallDurationThreshold: 3s
        waitDurationInOpenState: 10s           # stay OPEN 10s before probing
        permittedNumberOfCallsInHalfOpenState: 3
        minimumNumberOfCalls: 5                # need 5 calls before evaluating
        automaticTransitionFromOpenToHalfOpenEnabled: true
        recordExceptions:
          - java.io.IOException
          - java.util.concurrent.TimeoutException
        ignoreExceptions:
          - com.example.BusinessException      # don't count business errors
  retry:
    instances:
      orderService:
        maxAttempts: 3
        waitDuration: 500ms
        enableExponentialBackoff: true
        exponentialBackoffMultiplier: 2
        retryExceptions:
          - java.io.IOException
        ignoreExceptions:
          - com.example.PaymentDeclinedException
@Service
@Slf4j
@RequiredArgsConstructor
public class OrderService {

    private final InventoryClient inventoryClient;
    private final CircuitBreakerRegistry cbRegistry;

    // ── Annotation approach ──────────────────────────────────────────────
    @CircuitBreaker(name = "orderService", fallbackMethod = "getFallbackInventory")
    @TimeLimiter(name = "orderService")   // always pair — prevents hanging calls
    public CompletableFuture<InventoryResponse> checkInventory(String productId) {
        return CompletableFuture.supplyAsync(() -> inventoryClient.getStock(productId));
    }

    // Fallback must match the method signature, plus a trailing Throwable param
    public CompletableFuture<InventoryResponse> getFallbackInventory(
            String productId, Throwable ex) {
        log.warn("CB fallback for productId={}, reason={}", productId, ex.getMessage());
        return CompletableFuture.completedFuture(
            InventoryResponse.builder()
                .productId(productId)
                .available(false)
                .source("CACHE_FALLBACK")
                .build());
    }

    // ── Programmatic approach — full control + event listeners ───────────
    public InventoryResponse checkInventoryProgrammatic(String productId) {
        CircuitBreaker cb = cbRegistry.circuitBreaker("orderService");

        // Register event listeners for alerting/metrics
        cb.getEventPublisher()
            .onStateTransition(evt -> log.warn("CB state: {} -> {}",
                evt.getStateTransition().getFromState(),
                evt.getStateTransition().getToState()))
            .onFailureRateExceeded(evt -> log.error("Failure rate: {}%", evt.getFailureRate()));

        Supplier<InventoryResponse> decorated = CircuitBreaker.decorateSupplier(
            cb, () -> inventoryClient.getStock(productId));
        try {
            return decorated.get();
        } catch (CallNotPermittedException ex) {   // CB is OPEN: call was rejected
            return cachedFallback(productId);
        } catch (Exception ex) {
            return defaultFallback(productId);
        }
    }
}
// CORRECT order: Retry → CircuitBreaker → TimeLimiter → Bulkhead → function
@CircuitBreaker(name = "paymentService", fallbackMethod = "paymentFallback")
@Retry(name = "paymentService")        // outermost aspect — each retry attempt counts toward CB
@TimeLimiter(name = "paymentService")
public CompletableFuture<PaymentResult> processPayment(PaymentRequest req) {
    return CompletableFuture.supplyAsync(() -> paymentClient.charge(req));
}
Bulkhead Pattern
Isolate failures by limiting concurrent calls per downstream dependency. Like a ship's bulkheads — if one section floods, others stay dry. Prevents a slow downstream from consuming all your threads.
SemaphoreBulkhead (same thread, lightweight, caps concurrent calls) vs ThreadPoolBulkhead (dedicated thread pool, good for blocking I/O). Combine with Circuit Breaker for maximum isolation.

# application.yml
resilience4j:
  bulkhead:
    instances:
      inventoryService:
        maxConcurrentCalls: 20     # max parallel calls at any time
        maxWaitDuration: 100ms     # wait before rejecting with BulkheadFullException
  thread-pool-bulkhead:
    instances:
      paymentService:
        maxThreadPoolSize: 10
        coreThreadPoolSize: 5
        queueCapacity: 20          # queue before rejecting
        keepAliveDuration: 20ms

@Service
@Slf4j
public class ProductService {

    // Semaphore Bulkhead — lightweight, same thread
    @Bulkhead(name = "inventoryService", fallbackMethod = "inventoryFallback")
    @CircuitBreaker(name = "inventoryService")
    public InventoryStatus checkInventory(String productId) {
        return inventoryClient.check(productId);
    }

    // Thread Pool Bulkhead — dedicated pool for payment calls
    @Bulkhead(name = "paymentService", type = Bulkhead.Type.THREADPOOL,
              fallbackMethod = "paymentFallback")
    public CompletableFuture<PaymentResult> processPayment(PaymentRequest req) {
        return CompletableFuture.supplyAsync(() -> paymentClient.charge(req));
    }

    public InventoryStatus inventoryFallback(String productId, BulkheadFullException ex) {
        log.warn("Bulkhead full for inventoryService, productId={}", productId);
        return InventoryStatus.UNKNOWN;   // safe degraded response
    }

    // Programmatic: per-tenant isolation (multi-tenancy)
    private final Map<String, Bulkhead> tenantBulkheads = new ConcurrentHashMap<>();

    public Object callWithTenantIsolation(String tenantId, Supplier<?> call) {
        Bulkhead bulkhead = tenantBulkheads.computeIfAbsent(tenantId, id ->
            Bulkhead.of("tenant-" + id, BulkheadConfig.custom()
                .maxConcurrentCalls(10)
                .maxWaitDuration(Duration.ofMillis(50))
                .build()));
        return Bulkhead.decorateSupplier(bulkhead, call).get();
    }
}
Saga Pattern
Replaces distributed ACID transactions (2PC). Each service executes a local transaction and publishes an event. On failure, compensating transactions undo prior steps in reverse order.
Orchestration
- Central coordinator controls workflow
- Easy to visualize and debug
- Coordinator is a single point of failure
- Best for complex multi-step flows
- State persisted in saga table
Choreography
- Each service reacts to events independently
- No central coordinator — fully decoupled
- Hard to trace workflow across services
- Best for simple 2-3 step flows
- Add correlationId for debugging
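One way to make a choreographed saga traceable is a shared event envelope: every event carries the correlationId of the saga that started it. A minimal sketch (the `EventEnvelope` record and its field names are hypothetical, not from any framework):

```java
import java.time.Instant;
import java.util.UUID;

// Hypothetical event envelope for choreography: every event in the saga
// carries the same correlationId, so the flow can be stitched together in logs.
record EventEnvelope(
        String eventType,
        String correlationId,   // constant across the whole saga
        String causationId,     // id of the event that caused this one
        String eventId,
        Instant occurredAt,
        String payloadJson) {

    // First event in a saga: correlationId is its own eventId
    static EventEnvelope start(String eventType, String payloadJson) {
        String id = UUID.randomUUID().toString();
        return new EventEnvelope(eventType, id, id, id, Instant.now(), payloadJson);
    }

    // Follow-up event: inherits correlationId, records what caused it
    EventEnvelope next(String eventType, String payloadJson) {
        return new EventEnvelope(eventType, correlationId, eventId,
                UUID.randomUUID().toString(), Instant.now(), payloadJson);
    }
}
```

Filtering logs by correlationId then reconstructs the whole order.created → inventory.reserved → payment.processed chain, and causationId distinguishes retries from fresh events.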
@Service
@Slf4j
@Transactional
@RequiredArgsConstructor
public class OrderSagaOrchestrator {

    private final InventoryServiceClient inventoryClient;
    private final PaymentServiceClient paymentClient;
    private final ShippingServiceClient shippingClient;
    private final SagaRepository sagaRepository;

    public void startOrderSaga(CreateOrderCommand cmd) {
        // Persist saga state FIRST — survive crashes & restarts
        SagaState saga = SagaState.builder()
            .sagaId(UUID.randomUUID().toString())
            .orderId(cmd.getOrderId())
            .status(SagaStatus.STARTED)
            .completedSteps(new ArrayList<>())
            .build();
        sagaRepository.save(saga);

        try {
            // Step 1 — Reserve Inventory
            InventoryResponse inv = inventoryClient.reserve(cmd.getProductId(), cmd.getQuantity());
            saga.addStep("INVENTORY_RESERVED", inv.getReservationId());
            sagaRepository.save(saga);   // checkpoint after each step

            // Step 2 — Process Payment
            PaymentResponse payment = paymentClient.charge(cmd.getCustomerId(), cmd.getAmount());
            saga.addStep("PAYMENT_PROCESSED", payment.getTransactionId());
            sagaRepository.save(saga);

            // Step 3 — Schedule Shipping
            ShippingResponse shipping = shippingClient.schedule(cmd);
            saga.addStep("SHIPPING_SCHEDULED", shipping.getTrackingId());
            saga.setStatus(SagaStatus.COMPLETED);
            sagaRepository.save(saga);
        } catch (PaymentException | ShippingException ex) {
            log.error("Saga {} failed, compensating...", saga.getSagaId(), ex);
            compensate(saga);
        }
    }

    private void compensate(SagaState saga) {
        // Execute compensations in REVERSE order
        List<SagaStep> steps = saga.getCompletedSteps();
        for (int i = steps.size() - 1; i >= 0; i--) {
            SagaStep step = steps.get(i);
            try {
                switch (step.getName()) {
                    case "PAYMENT_PROCESSED" -> paymentClient.refund(step.getReferenceId());
                    case "INVENTORY_RESERVED" -> inventoryClient.release(step.getReferenceId());
                    case "SHIPPING_SCHEDULED" -> shippingClient.cancel(step.getReferenceId());
                }
            } catch (Exception compensationEx) {
                // Compensation failures = manual intervention needed
                log.error("COMPENSATION FAILED for step={}, sagaId={}",
                    step.getName(), saga.getSagaId(), compensationEx);
                alertOpsTeam(saga, step, compensationEx);
            }
        }
        saga.setStatus(SagaStatus.COMPENSATED);
        sagaRepository.save(saga);
    }
}
// InventoryService — reacts to order.created, emits success/failure
@Service
public class InventoryEventHandler {

    @KafkaListener(topics = "order.created", groupId = "inventory-saga")
    public void onOrderCreated(OrderCreatedEvent event) {
        try {
            String reservationId = reserveInventory(event.getProductId(), event.getQuantity());
            // SUCCESS — triggers Payment Service
            kafka.send("inventory.reserved",
                new InventoryReservedEvent(event.getOrderId(), reservationId));
        } catch (InsufficientStockException ex) {
            // FAILURE — triggers compensations upstream
            kafka.send("inventory.reservation.failed",
                new InventoryFailedEvent(event.getOrderId(), ex.getMessage()));
        }
    }

    // Compensating transaction — triggered when payment fails
    @KafkaListener(topics = "payment.failed", groupId = "inventory-compensate")
    public void onPaymentFailed(PaymentFailedEvent event) {
        releaseInventory(event.getSagaId());
        kafka.send("inventory.released", new InventoryReleasedEvent(event.getOrderId()));
    }
}

// PaymentService — reacts to inventory.reserved
@Service
public class PaymentEventHandler {

    @KafkaListener(topics = "inventory.reserved", groupId = "payment-saga")
    public void onInventoryReserved(InventoryReservedEvent event) {
        try {
            chargeCustomer(event.getCustomerId(), event.getAmount());
            kafka.send("payment.processed", new PaymentProcessedEvent(event.getOrderId()));
        } catch (PaymentException ex) {
            // This triggers the inventory compensation above
            kafka.send("payment.failed",
                new PaymentFailedEvent(event.getOrderId(), event.getSagaId()));
        }
    }
}
Transactional Outbox
Solves the dual-write problem: writing to DB and publishing to Kafka must be atomic. Without Outbox, your DB can commit but Kafka publish fails — causing silent data loss. Write the event to an outbox_events table in the same transaction, then a poller or Debezium CDC publishes it.
// ── Outbox Entity ────────────────────────────────────────────────────────
@Entity
@Table(name = "outbox_events")
@Data
@Builder
@NoArgsConstructor
@AllArgsConstructor
public class OutboxEvent {

    @Id
    @GeneratedValue(strategy = GenerationType.UUID)
    private String id;

    private String aggregateId;     // e.g., orderId
    private String aggregateType;   // e.g., "Order"
    private String eventType;       // e.g., "OrderCreated"

    @Column(columnDefinition = "jsonb")
    private String payload;

    @Enumerated(EnumType.STRING)
    @Builder.Default
    private OutboxStatus status = OutboxStatus.PENDING;

    private Instant createdAt;
    private Instant processedAt;

    @Builder.Default
    private int retryCount = 0;
}

// ── Service — BOTH writes in ONE @Transactional ──────────────────────────
@Service
@Transactional   // KEY: atomicity guaranteed — if either fails, both roll back
public class OrderService {

    public Order createOrder(CreateOrderRequest req) throws JsonProcessingException {
        // 1. Save business entity
        Order order = Order.builder()
            .id(UUID.randomUUID().toString())
            .customerId(req.getCustomerId())
            .status(OrderStatus.PENDING)
            .build();
        orderRepo.save(order);

        // 2. Save outbox event IN SAME TRANSACTION
        OutboxEvent outbox = OutboxEvent.builder()
            .aggregateId(order.getId())
            .aggregateType("Order")
            .eventType("OrderCreated")
            .payload(mapper.writeValueAsString(OrderCreatedEvent.from(order)))
            .createdAt(Instant.now())
            .build();
        outboxRepo.save(outbox);

        return order;   // both committed or both rolled back
    }
}

// ── Outbox Poller — publishes pending events to Kafka ────────────────────
@Component
@Slf4j
public class OutboxPoller {

    @Scheduled(fixedDelay = 1000)   // every 1 second
    @Transactional
    public void processOutbox() {
        List<OutboxEvent> pending = outboxRepo
            .findTop100ByStatusOrderByCreatedAtAsc(OutboxStatus.PENDING);

        for (OutboxEvent event : pending) {
            try {
                kafka.send(
                    toTopic(event.getEventType()),
                    event.getAggregateId(),   // partition key — preserves per-aggregate order
                    event.getPayload()
                ).get(5, TimeUnit.SECONDS);   // wait for broker ack
                event.setStatus(OutboxStatus.PROCESSED);
                event.setProcessedAt(Instant.now());
            } catch (Exception ex) {
                log.error("Outbox publish failed id={}", event.getId(), ex);
                event.setRetryCount(event.getRetryCount() + 1);
                if (event.getRetryCount() >= 5) {
                    event.setStatus(OutboxStatus.DEAD_LETTER);
                }
            }
            outboxRepo.save(event);
        }
    }
}
CQRS
Separate the write model (Commands → PostgreSQL) from the read model (Queries → Elasticsearch/Redis). Domain events bridge the two. Read models are denormalized — no joins needed at query time.
// ── Command Side — writes to PostgreSQL ──────────────────────────────────
public record CreateProductCommand(String name, BigDecimal price, int quantity) {}

@Service
@Transactional
public class ProductCommandHandler {

    public String handle(CreateProductCommand cmd) throws JsonProcessingException {
        if (cmd.price().compareTo(BigDecimal.ZERO) <= 0)
            throw new InvalidCommandException("Price must be positive");

        Product product = Product.builder()
            .id(UUID.randomUUID().toString())
            .name(cmd.name())
            .price(cmd.price())
            .build();
        writeRepo.save(product);

        // Publish via Outbox to update the read model asynchronously
        outboxRepo.save(OutboxEvent.of(
            product.getId(), "ProductCreated",
            mapper.writeValueAsString(ProductCreatedEvent.from(product))));

        return product.getId();
    }
}

// ── Read Model — denormalized Elasticsearch document ─────────────────────
@Document(indexName = "products")
@Data
public class ProductReadModel {
    @Id
    private String id;
    private String name;
    private BigDecimal price;
    private String categoryName;     // Denormalized — no join needed at query time
    private double averageRating;    // Pre-computed aggregate
    private int reviewCount;
    private Instant lastUpdated;
}

// ── Query Handler — rich Elasticsearch queries ───────────────────────────
@Service
public class ProductQueryHandler {

    public Page<ProductReadModel> search(ProductSearchQuery query) {
        NativeQuery nq = NativeQuery.builder()
            .withQuery(q -> q.bool(b -> b
                .must(m -> m.match(match -> match.field("name").query(query.keyword())))
                .filter(f -> f.range(r -> r.field("price")
                    .gte(JsonData.of(query.minPrice()))
                    .lte(JsonData.of(query.maxPrice()))))))
            .withSort(Sort.by("averageRating").descending())
            .withPageable(PageRequest.of(query.page(), query.size()))
            .build();
        return esOps.searchForPage(nq, ProductReadModel.class);
    }
}

// ── Read Model Updater — keeps ES in sync via Kafka events ───────────────
@Service
public class ProductReadModelUpdater {

    @KafkaListener(topics = "product.created", groupId = "product-read-model")
    public void onProductCreated(ProductCreatedEvent event) {
        readRepo.save(ProductReadModel.from(event));
    }

    @KafkaListener(topics = "review.added", groupId = "product-read-model")
    public void onReviewAdded(ReviewAddedEvent event) {
        ProductReadModel model = readRepo.findById(event.getProductId()).orElseThrow();
        model.setAverageRating(event.getNewAverageRating());   // pre-computed update
        model.setReviewCount(model.getReviewCount() + 1);
        model.setLastUpdated(Instant.now());
        readRepo.save(model);
    }
}
Event Sourcing
Store every state change as an immutable event. Current state is rebuilt by replaying events. The event log IS the source of truth. Snapshots avoid replaying thousands of events for performance.
// ── Domain Aggregate — state derived entirely from events ────────────────
public class BankAccount {

    private String accountId;
    private BigDecimal balance;
    private AccountStatus status;
    private long version = 0;
    private final List<DomainEvent> uncommittedEvents = new ArrayList<>();

    // Static factory — raises AccountOpenedEvent
    public static BankAccount open(String id, BigDecimal initialDeposit) {
        BankAccount account = new BankAccount();
        account.applyEvent(new AccountOpenedEvent(id, initialDeposit, Instant.now()));
        return account;
    }

    public void deposit(BigDecimal amount) {
        if (amount.compareTo(BigDecimal.ZERO) <= 0)
            throw new IllegalArgumentException("Deposit must be positive");
        applyEvent(new MoneyDepositedEvent(accountId, amount, Instant.now()));
    }

    public void withdraw(BigDecimal amount) {
        if (balance.compareTo(amount) < 0)
            throw new InsufficientFundsException(accountId, amount, balance);
        applyEvent(new MoneyWithdrawnEvent(accountId, amount, Instant.now()));
    }

    // applyEvent mutates state ONLY — never saves
    private void applyEvent(DomainEvent event) {
        if (event instanceof AccountOpenedEvent e) {
            this.accountId = e.getAccountId();
            this.balance = e.getInitialDeposit();
            this.status = AccountStatus.ACTIVE;
        } else if (event instanceof MoneyDepositedEvent e) {
            this.balance = this.balance.add(e.getAmount());
        } else if (event instanceof MoneyWithdrawnEvent e) {
            this.balance = this.balance.subtract(e.getAmount());
        }
        this.version++;
        uncommittedEvents.add(event);
    }

    // Reconstitute from event history (replay)
    public static BankAccount reconstitute(List<DomainEvent> history) {
        BankAccount account = new BankAccount();
        history.forEach(account::applyEvent);
        account.uncommittedEvents.clear();   // historical events are already committed
        return account;
    }
}

// ── Event Store — save with optimistic locking ───────────────────────────
@Repository
public class EventStoreRepository {

    @Transactional
    public void save(String aggregateId, List<DomainEvent> events, long expectedVersion)
            throws JsonProcessingException {
        long currentVersion = getCurrentVersion(aggregateId);
        if (currentVersion != expectedVersion)
            throw new OptimisticLockingException(
                "Expected: " + expectedVersion + ", Got: " + currentVersion);

        long version = expectedVersion;
        for (DomainEvent event : events) {
            version++;
            jdbc.update(
                "INSERT INTO event_store (aggregate_id, version, event_type, event_data) "
                    + "VALUES (?, ?, ?, ?::jsonb)",
                aggregateId, version, event.getClass().getSimpleName(),
                mapper.writeValueAsString(event));
        }
    }

    // Load from latest snapshot + subsequent events for performance
    public List<DomainEvent> loadEvents(String aggregateId) {
        Optional<Snapshot> snapshot = loadLatestSnapshot(aggregateId);
        long fromVersion = snapshot.map(Snapshot::getVersion).orElse(0L);
        return jdbc.query(
                "SELECT * FROM event_store WHERE aggregate_id = ? AND version > ? ORDER BY version ASC",
                eventRowMapper, aggregateId, fromVersion)
            .stream().map(this::deserialize).collect(Collectors.toList());
    }
}
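The loader above reads from the latest snapshot, but snapshot creation isn't shown. A minimal in-memory sketch of the write side (names like `SnapshotStore` and `SNAPSHOT_EVERY` are hypothetical; real code would persist to a snapshots table alongside event_store):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative in-memory snapshot store: snapshot every N events so that
// replay cost stays bounded regardless of aggregate age.
class SnapshotStore {
    record Snapshot(String aggregateId, long version, String stateJson) {}

    static final int SNAPSHOT_EVERY = 100;
    private final Map<String, Snapshot> latest = new HashMap<>();

    // Called after appending events; only snapshots on a multiple of SNAPSHOT_EVERY
    void maybeSnapshot(String aggregateId, long version, String stateJson) {
        if (version % SNAPSHOT_EVERY == 0) {
            latest.put(aggregateId, new Snapshot(aggregateId, version, stateJson));
        }
    }

    // Replay should resume from events AFTER this version (0 = replay everything)
    long replayFromVersion(String aggregateId) {
        Snapshot s = latest.get(aggregateId);
        return s == null ? 0L : s.version();
    }
}
```

An aggregate with 10,000 events then replays at most 99 events on load instead of all 10,000.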
API Gateway + BFF
API Gateway = single entry point for all clients. BFF (Backend for Frontend) = dedicated aggregator per client type with client-specific data shaping, reducing over-fetching and multiple round trips.
Use Mono.zip() for parallel downstream calls — never call them sequentially.

@Configuration
public class GatewayConfig {

    @Bean
    public RouteLocator routes(RouteLocatorBuilder builder) {
        return builder.routes()
            .route("order-service", r -> r
                .path("/api/v1/orders/**")
                .filters(f -> f
                    .filter(authFilter)                      // JWT validation
                    .requestRateLimiter(config -> config
                        .setRateLimiter(redisRateLimiter())
                        .setKeyResolver(userKeyResolver()))
                    .circuitBreaker(cb -> cb
                        .setName("orderCB")
                        .setFallbackUri("forward:/fallback/orders"))
                    .retry(retry -> retry.setRetries(3))
                    .addRequestHeader("X-Gateway-Source", "api-gateway")
                    .removeRequestHeader("Cookie"))          // security: strip cookies
                .uri("lb://order-service"))                  // lb:// = load balanced

            // Mobile BFF route — different auth + dedicated service
            .route("mobile-bff", r -> r
                .path("/mobile/**")
                .and().header("User-Agent", ".*MobileApp.*")
                .filters(f -> f
                    .stripPrefix(1)                          // remove /mobile prefix
                    .filter(mobileAuthFilter))
                .uri("lb://mobile-bff-service"))
            .build();
    }

    @Bean
    public RedisRateLimiter redisRateLimiter() {
        return new RedisRateLimiter(10, 20, 1);   // 10 req/s, burst 20
    }
}
@RestController
@RequestMapping("/api/mobile/v1")
public class MobileBFFController {

    @GetMapping("/dashboard")
    public Mono<MobileDashboard> getDashboard(@AuthenticationPrincipal JwtUser user) {
        // ALL 3 calls fire in parallel — not sequential!
        Mono<List<OrderSummary>> orders = orderClient.get()
            .uri("/internal/orders/recent/{id}", user.getId())
            .retrieve().bodyToFlux(Order.class)
            .map(this::toMobileOrderSummary)             // mobile-specific mapping
            .take(5)                                     // mobile needs 5, web needs 20
            .collectList()
            .onErrorReturn(Collections.emptyList());     // graceful degradation

        Mono<UserProfile> profile = userClient.get()
            .uri("/internal/users/{id}", user.getId())
            .retrieve().bodyToMono(UserProfile.class)
            .onErrorReturn(UserProfile.anonymous());

        Mono<List<ProductSummary>> recs = productClient.get()
            .uri("/internal/products/recommended/{id}", user.getId())
            .retrieve().bodyToFlux(Product.class)
            // Mobile: thumbnail only. A web BFF would include full-res images
            .map(p -> new ProductSummary(p.getId(), p.getName(), p.getThumbnailUrl()))
            .take(10).collectList()
            .onErrorReturn(Collections.emptyList());

        // Zip — waits for ALL 3 in parallel, then builds the response
        return Mono.zip(orders, profile, recs)
            .map(t -> MobileDashboard.builder()
                .recentOrders(t.getT1())
                .user(t.getT2())
                .recommendations(t.getT3())
                .build());
    }
}
Strangler Fig + Anti-Corruption Layer
Incrementally migrate a monolith by routing migrated endpoints to new services. The ACL translates between the legacy domain model and the new one — preventing legacy concepts from polluting your new domain.
// ── Gateway Strangler Routing ────────────────────────────────────────────
@Component
public class StranglerFigFilter implements GlobalFilter, Ordered {

    private final MigrationConfigService migrationConfig;

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
        String path = exchange.getRequest().getPath().value();
        MigrationStatus status = migrationConfig.getStatus(path);
        return switch (status) {
            case FULLY_MIGRATED -> routeToNewService(exchange, chain);
            case CANARY -> Math.random() < 0.10
                ? routeToNewService(exchange, chain)   // 10% canary
                : routeToMonolith(exchange, chain);    // 90% legacy
            default -> routeToMonolith(exchange, chain);
        };
    }

    @Override
    public int getOrder() { return -1; }   // run before the default filters
}

// ── Anti-Corruption Layer — translates legacy → new domain model ─────────
@Service
public class LegacyOrderAdapter {

    private final LegacyOrderRepository legacyRepo;   // points to monolith DB
    private final NewOrderRepository newRepo;

    public Order findById(String orderId) {
        // Try the new service DB first, fall back to the monolith with translation
        return newRepo.findById(orderId)
            .orElseGet(() -> translateFromLegacy(
                legacyRepo.findByOrderNumber(orderId).orElseThrow()));
    }

    private Order translateFromLegacy(LegacyOrder legacy) {
        return Order.builder()
            .id(legacy.getOrderNumber())                       // renamed field
            .customerId(legacy.getCustId())                    // abbreviated → full name
            .status(mapLegacyStatus(legacy.getStatusCode()))   // code → enum
            .totalAmount(legacy.getTotalCents()                // cents → BigDecimal
                .divide(BigDecimal.valueOf(100)))
            .createdAt(legacy.getCreateDttm().toInstant())
            .build();
    }

    private OrderStatus mapLegacyStatus(String code) {
        return switch (code) {
            case "P" -> OrderStatus.PENDING;
            case "A" -> OrderStatus.PROCESSING;   // 'A' = Active in legacy system
            case "C" -> OrderStatus.COMPLETED;
            case "X" -> OrderStatus.CANCELLED;
            default -> throw new UnknownStatusException(code);
        };
    }
}
Sidecar Pattern & Istio Service Mesh
A sidecar container (Envoy proxy) runs alongside your service in the same pod, handling mTLS, retries, circuit breaking, and observability — without any code changes in your app.
# ── VirtualService — traffic splitting + retries + fault injection ───────
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: order-service
spec:
  hosts:
    - order-service
  http:
    - match:
        - headers:
            x-canary: {exact: 'true'}
      route:
        - destination:
            host: order-service
            subset: v2
          weight: 100
    - route:
        - destination:
            host: order-service
            subset: v1
          weight: 90
        - destination:
            host: order-service
            subset: v2
          weight: 10                     # 10% canary
      retries:
        attempts: 3
        perTryTimeout: 2s
        retryOn: gateway-error,connect-failure,retriable-4xx
      timeout: 10s
      fault:                             # chaos engineering — inject 5% delays in staging
        delay:
          percentage: {value: 5.0}
          fixedDelay: 5s
---
# ── DestinationRule — CB + mTLS + connection pool (Bulkhead at mesh level) ──
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: order-service
spec:
  host: order-service
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100              # Bulkhead at mesh level
      http:
        http1MaxPendingRequests: 50
    outlierDetection:                    # Circuit Breaker at mesh level
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
      maxEjectionPercent: 50
    tls:
      mode: ISTIO_MUTUAL                 # automatic mTLS — zero code change
  subsets:
    - name: v1
      labels: {version: v1}
    - name: v2
      labels: {version: v2}
JWT + OAuth2 (Zero-Trust)
External calls use user JWT Bearer tokens. Internal service-to-service calls use Client Credentials grant (machine tokens). Every service validates independently — never trust the network.
@Configuration
@EnableWebSecurity
@EnableMethodSecurity   // enables @PreAuthorize on methods
public class SecurityConfig {

    @Bean
    public SecurityFilterChain filterChain(HttpSecurity http) throws Exception {
        return http
            .csrf(AbstractHttpConfigurer::disable)
            .sessionManagement(s -> s.sessionCreationPolicy(SessionCreationPolicy.STATELESS))
            .authorizeHttpRequests(auth -> auth
                .requestMatchers("/actuator/health").permitAll()
                .requestMatchers("/actuator/**").hasRole("ADMIN")
                .requestMatchers(HttpMethod.GET, "/api/v1/products/**")
                    .hasAnyRole("USER", "ADMIN", "SERVICE")
                .requestMatchers("/api/v1/admin/**").hasRole("ADMIN")
                .anyRequest().authenticated())
            .oauth2ResourceServer(oauth2 -> oauth2
                .jwt(jwt -> jwt.jwtAuthenticationConverter(jwtAuthConverter())))
            .build();
    }

    @Bean
    public JwtAuthenticationConverter jwtAuthConverter() {
        JwtGrantedAuthoritiesConverter converter = new JwtGrantedAuthoritiesConverter();
        converter.setAuthoritiesClaimName("roles");   // custom claim name
        converter.setAuthorityPrefix("ROLE_");
        JwtAuthenticationConverter jwtConverter = new JwtAuthenticationConverter();
        jwtConverter.setJwtGrantedAuthoritiesConverter(converter);
        return jwtConverter;
    }
}

// ── Service-to-Service — Client Credentials WebClient ────────────────────
@Bean
public WebClient paymentServiceClient(ReactiveOAuth2AuthorizedClientManager clientManager) {
    // Automatically fetches, caches, and refreshes the client_credentials token
    ServerOAuth2AuthorizedClientExchangeFilterFunction oauth =
        new ServerOAuth2AuthorizedClientExchangeFilterFunction(clientManager);
    oauth.setDefaultClientRegistrationId("payment-service");
    return WebClient.builder()
        .baseUrl("http://payment-service")
        .filter(oauth)   // auto-attaches Bearer token to every request
        .build();
}

// ── application.yml — client registration ────────────────────────────────
// spring.security.oauth2.client.registration.payment-service:
//   client-id: order-service
//   client-secret: ${ORDER_SERVICE_SECRET}   # from Vault
//   authorization-grant-type: client_credentials
//   scope: payment:write payment:read
Distributed Tracing
Every request gets a traceId spanning all services and a spanId per service hop. Spring Boot 3 uses Micrometer Tracing (replaces Sleuth). Propagate through Kafka headers, WebClient, and @Async boundaries.
Trace context must survive @Async thread pools, WebClient calls, and Kafka messages. Spring auto-propagates over HTTP, but Kafka requires manual header injection/extraction.

# application.yml
management:
  tracing:
    sampling:
      probability: 1.0            # 100% in dev, 0.1 (10%) in prod
  zipkin:
    tracing:
      endpoint: http://zipkin:9411/api/v2/spans
logging:
  pattern:
    # traceId + spanId auto-injected into every log line
    level: '%5p [${spring.application.name},%X{traceId},%X{spanId}]'

@Service
@Slf4j
public class OrderProcessingService {

    private final Tracer tracer;

    public OrderResult processOrder(Order order) {
        Span span = tracer.nextSpan()
            .name("process-order")
            .tag("order.id", order.getId())
            .tag("order.amount", order.getAmount().toString())
            .start();
        try (Tracer.SpanInScope ws = tracer.withSpan(span)) {
            log.info("Processing order");   // traceId auto-injected in log
            OrderResult result = executeOrder(order);
            span.tag("order.status", result.getStatus().name());
            return result;
        } catch (Exception ex) {
            span.error(ex);   // marks span as ERROR in Zipkin/Tempo
            throw ex;
        } finally {
            span.end();
        }
    }
}

// ── Kafka Consumer — MUST extract trace context from message headers ─────
@Component
public class TracingKafkaConsumer {

    @KafkaListener(topics = "order.created")
    public void consume(ConsumerRecord<String, String> record) {
        // Extract the parent trace from Kafka headers to continue the same trace
        Span span = propagator.extract(record.headers(), (headers, key) -> {
                Header h = headers.lastHeader(key);
                return h != null ? new String(h.value()) : null;
            })
            .name("consume-order-created")
            .start();
        try (Tracer.SpanInScope ws = tracer.withSpan(span)) {
            processMessage(record);
        } finally {
            span.end();
        }
    }
}
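The consumer side extracts headers; the producer side must inject them first. The wire format is the W3C `traceparent` header (`version-traceId-spanId-flags`). A minimal stdlib-only sketch of that round trip (real code would use Micrometer's Propagator against Kafka ProducerRecord headers; `TraceHeaderPropagation` is a hypothetical helper):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of trace-context propagation through message headers
// using the W3C traceparent format: version-traceId-spanId-flags.
class TraceHeaderPropagation {

    static String traceparent(String traceId, String spanId) {
        return "00-" + traceId + "-" + spanId + "-01";   // 00 = version, 01 = sampled
    }

    // Producer side: write the current trace context into the outgoing headers
    static void inject(Map<String, String> headers, String traceId, String spanId) {
        headers.put("traceparent", traceparent(traceId, spanId));
    }

    // Consumer side: recover the parent traceId so the new span joins the same trace
    static String extractTraceId(Map<String, String> headers) {
        String tp = headers.get("traceparent");
        return tp == null ? null : tp.split("-")[1];
    }
}
```

If either side is skipped, the consumer starts a brand-new trace and the Kafka hop disappears from the Zipkin view.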
Observability — RED & USE Metrics
RED: Rate (req/sec), Errors (error %), Duration (latency p95/p99). USE: Utilization (CPU/pool %), Saturation (queue depth), Errors (system errors). Custom health indicators expose deep service health beyond just "UP".
Expose /health/liveness and /health/readiness for Kubernetes probes.

// ── Custom Health Indicator — Kafka Consumer Lag ─────────────────────────
@Component
public class KafkaHealthIndicator implements HealthIndicator {

    @Override
    public Health health() {
        try {
            long maxLag = offsetService.getMaxConsumerLag("order-processor");
            Map<String, Object> details = Map.of(
                "maxConsumerLag", maxLag,
                "brokerCount", getBrokerCount());
            if (maxLag > 10000) {
                return Health.down()
                    .withDetails(details)
                    .withDetail("reason", "Consumer lag too high")
                    .build();
            }
            return Health.up().withDetails(details).build();
        } catch (Exception ex) {
            return Health.down(ex).build();
        }
    }
}

// ── RED Metrics with Micrometer ──────────────────────────────────────────
@Service
public class OrderMetricsService {

    private final MeterRegistry registry;
    private final Counter orderCreatedCounter;
    private final Counter orderFailedCounter;
    private final Timer orderProcessingTimer;

    public OrderMetricsService(MeterRegistry registry) {
        this.registry = registry;
        // RED: Rate
        orderCreatedCounter = Counter.builder("orders.created.total")
            .description("Total orders created")
            .register(registry);
        // RED: Errors
        orderFailedCounter = Counter.builder("orders.failed.total")
            .register(registry);
        // RED: Duration — publishes p50, p95, p99
        orderProcessingTimer = Timer.builder("orders.processing.duration")
            .publishPercentiles(0.5, 0.95, 0.99)
            .publishPercentileHistogram(true)
            .serviceLevelObjectives(Duration.ofMillis(100), Duration.ofMillis(500))
            .register(registry);
    }

    public Order createOrder(CreateOrderRequest req) {
        return orderProcessingTimer.record(() -> {
            try {
                Order o = doCreateOrder(req);
                orderCreatedCounter.increment();
                // Tag-based breakdown by region/channel
                registry.counter("orders.created.by.region",
                        "region", req.getRegion(),
                        "channel", req.getChannel())
                    .increment();
                return o;
            } catch (Exception ex) {
                orderFailedCounter.increment();
                registry.counter("orders.failed.by.reason",
                        "reason", ex.getClass().getSimpleName())
                    .increment();
                throw ex;
            }
        });
    }
}
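The RED side is covered above; the USE side can be sketched over a plain ThreadPoolExecutor. In production these would be registered as Micrometer gauges rather than computed ad hoc (`ThreadPoolUseMetrics` is a hypothetical helper):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// USE-side sketch: Utilization and Saturation for a worker thread pool.
class ThreadPoolUseMetrics {
    private final ThreadPoolExecutor pool;
    private final int queueCapacity;

    ThreadPoolUseMetrics(ThreadPoolExecutor pool, int queueCapacity) {
        this.pool = pool;
        this.queueCapacity = queueCapacity;
    }

    // Utilization: fraction of the pool's threads currently busy
    double utilization() {
        return (double) pool.getActiveCount() / pool.getMaximumPoolSize();
    }

    // Saturation: fraction of the work queue in use; rising saturation
    // at full utilization is the classic "scale out" signal
    double saturation() {
        return (double) pool.getQueue().size() / queueCapacity;
    }
}
```

Errors (the third USE letter) come from rejected-execution counts and downstream exception counters, as in the RED example above.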
Consumer-Driven Contracts
Consumers define contracts (what API they need). Providers verify against those contracts in CI. Both sides get automated verification without running all services together — enabling truly independent deployments.
// ── Contract (lives in the provider project's test resources) ────────────
// src/test/resources/contracts/order/get-order-by-id.groovy
Contract.make {
    description 'should return order by ID'
    request {
        method GET()
        url '/api/v1/orders/12345'
        headers {
            header('Authorization',
                $(consumer(regex('Bearer .+')), producer('Bearer valid-token')))
        }
    }
    response {
        status OK()
        body(
            orderId: '12345',
            status: $(anyOf('PENDING', 'PROCESSING', 'SHIPPED')),
            customerId: $(anyNonEmptyString()),
            totalAmount: $(anyPositiveDecimal())
        )
    }
}

// ── Provider Base Test — stub the data the contract expects ──────────────
@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.DEFINED_PORT)
public class OrderContractBase {

    @MockBean OrderRepository orderRepo;
    @LocalServerPort int port;

    @BeforeEach
    void setUp() {
        given(orderRepo.findById("12345")).willReturn(
            Optional.of(Order.builder()
                .id("12345")
                .status(OrderStatus.PROCESSING)
                .customerId("cust-99")
                .totalAmount(new BigDecimal("99.99"))
                .build()));
        RestAssured.baseURI = "http://localhost:" + port;
    }
    // The plugin auto-generates test classes from contracts that extend this base
}

// ── Consumer Test — uses generated WireMock stubs ────────────────────────
@SpringBootTest
@AutoConfigureStubRunner(
    ids = "com.example:order-service:+:stubs",   // group:artifact:version:classifier
    stubsMode = StubRunnerProperties.StubsMode.CLASSPATH
)
class OrderClientTest {

    @Autowired OrderClient orderClient;

    @Test
    void shouldFetchOrderById() {
        // The stub runner starts WireMock with the contract stubs automatically
        OrderResponse response = orderClient.getOrder("12345");
        assertThat(response.getOrderId()).isEqualTo("12345");
        assertThat(response.getStatus()).isIn("PENDING", "PROCESSING", "SHIPPED");
        assertThat(response.getTotalAmount()).isPositive();
    }
}
Testing Strategy
70% Unit tests (business logic isolation), 20% Integration tests (real DB/Kafka via Testcontainers), 10% Component/Contract/E2E. Never mock what you own — use Testcontainers for real databases.
```java
// ── Integration Test — real PostgreSQL + Kafka in Docker ──────────────────────
@SpringBootTest
@Testcontainers
class OrderRepositoryIntegrationTest {

    @Container
    static PostgreSQLContainer<?> postgres = new PostgreSQLContainer<>("postgres:15")
        .withDatabaseName("testdb")
        .withInitScript("schema.sql");

    @Container
    static KafkaContainer kafka =
        new KafkaContainer(DockerImageName.parse("confluentinc/cp-kafka:7.4.0"));

    @DynamicPropertySource
    static void configure(DynamicPropertyRegistry registry) {
        registry.add("spring.datasource.url", postgres::getJdbcUrl);
        registry.add("spring.datasource.username", postgres::getUsername);
        registry.add("spring.datasource.password", postgres::getPassword);
        registry.add("spring.kafka.bootstrap-servers", kafka::getBootstrapServers);
    }

    @Autowired OrderRepository orderRepo;

    @Test
    void shouldPersistOrder() {
        Order order = Order.builder()
            .id(UUID.randomUUID().toString())
            .customerId("cust-1")
            .status(OrderStatus.PENDING)
            .build();
        orderRepo.save(order);
        assertThat(orderRepo.findById(order.getId())).isPresent();
    }
}

// ── Component Test — real Spring context, mocked downstream ───────────────────
@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT)
class OrderServiceComponentTest {

    @RegisterExtension
    static WireMockExtension inventoryMock = WireMockExtension.newInstance()
        .options(wireMockConfig().port(8081)).build();   // matches inventory URL in test config

    @RegisterExtension
    static WireMockExtension paymentMock = WireMockExtension.newInstance()
        .options(wireMockConfig().port(8082)).build();

    @Autowired TestRestTemplate restTemplate;            // was referenced below but never declared

    @Test
    void shouldCreateOrderWhenInventoryAndPaymentSucceed() {
        inventoryMock.stubFor(get("/api/inventory/PROD-1")
            .willReturn(okJson("{\"available\":true,\"productId\":\"PROD-1\"}")));
        paymentMock.stubFor(post("/api/payment/charge")
            .willReturn(okJson("{\"transactionId\":\"TXN-99\",\"status\":\"SUCCESS\"}")));

        ResponseEntity<Order> response = restTemplate.postForEntity(
            "/api/v1/orders",
            new CreateOrderRequest("cust-1", "PROD-1", 2),
            Order.class);

        assertThat(response.getStatusCode()).isEqualTo(HttpStatus.CREATED);
        assertThat(response.getBody().getStatus()).isEqualTo(OrderStatus.CONFIRMED);
        // Verify downstream services were actually called
        inventoryMock.verify(1, getRequestedFor(urlEqualTo("/api/inventory/PROD-1")));
        paymentMock.verify(1, postRequestedFor(urlEqualTo("/api/payment/charge")));
    }
}
```
Quick Reference
| Problem | Pattern | Key Interview Points |
|---|---|---|
| Cascading failures from slow downstream | Circuit Breaker | 3 states: CLOSED→OPEN→HALF_OPEN. failureRateThreshold=50, slidingWindow=10, waitDuration=10s. Always pair with TimeLimiter. |
| DB write + Kafka publish atomicity | Transactional Outbox | Same @Transactional saves entity + outbox event. Poller/Debezium publishes. At-least-once delivery — consumers must be idempotent. |
| Multi-service business transaction | Saga (Orchestration) | Stateful coordinator + compensating transactions in reverse order. Persist saga state in DB after each step. Compensation failures need manual ops. |
| Decoupled event-driven transaction | Saga (Choreography) | Each service emits success/failure events. Hard to debug — add correlationId/sagaId to all events. No single point of failure. |
| High read/write load divergence | CQRS | Write: PostgreSQL. Read: Elasticsearch/Redis. Events bridge the two. Expect ms-to-second eventual consistency lag. |
| Full audit trail / time travel queries | Event Sourcing | Immutable event log. Replay = current state. Snapshots every 100 events for performance. Schema evolution via upcasters. |
| Thread starvation from slow dependency | Bulkhead | Separate thread pool per downstream. queueCapacity limits backlog. Semaphore for lightweight rate-limiting. |
| Monolith to microservices migration | Strangler Fig + ACL | Gateway routes migrated endpoints. ACL translates legacy model. Canary routing for gradual cutover. Always have rollback switch. |
| API breaking changes detection | Consumer Contracts | Pact/Spring Cloud Contract. Stubs in Maven. Provider auto-tests from DSL. Catches breaks in CI before deploy. |
| Cross-service call debugging | Distributed Tracing | TraceId across all logs. Propagate through Kafka headers, @Async, WebClient. Sample 10% in prod, 100% in dev. |
| Service-to-service authentication | Client Credentials | Machine tokens, 5-15 min TTL, auto-cached + auto-refreshed. mTLS from Istio for defense in depth. |
| Client-specific API aggregation | BFF Pattern | Separate aggregator per client type. Parallel calls via Mono.zip(). No business logic — only aggregation. Team owns their BFF. |
| Zero-code retries, CB, mTLS between services | Service Mesh (Istio) | Envoy sidecar handles everything. Config in VirtualService/DestinationRule YAML. Ops changes without redeploying services. |
| Integration tests with real dependencies | Testcontainers | Real PostgreSQL + Kafka in Docker. @DynamicPropertySource wires ports. Never mock what you own. |
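Several rows above hinge on the same point: at-least-once delivery (Outbox, both Saga styles) means every consumer needs a dedupe step before its side effect. A minimal in-memory sketch of that check — class and method names are illustrative, and a real service would persist the seen event IDs in a `processed_events` table written in the same transaction as the side effect:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Toy idempotent consumer: dedupes by eventId before applying the side effect.
public class IdempotentConsumer {

    private final Map<String, Boolean> processed = new ConcurrentHashMap<>();
    private final AtomicInteger applied = new AtomicInteger();

    /** Returns true if the event was applied, false if it was a duplicate. */
    public boolean handle(String eventId) {
        // putIfAbsent is atomic: only the first delivery of an eventId wins
        if (processed.putIfAbsent(eventId, Boolean.TRUE) != null) {
            return false;                 // duplicate — at-least-once redelivery
        }
        applied.incrementAndGet();        // the actual side effect goes here
        return true;
    }

    public int appliedCount() { return applied.get(); }

    public static void main(String[] args) {
        IdempotentConsumer consumer = new IdempotentConsumer();
        // Simulate at-least-once delivery: evt-1 arrives twice
        System.out.println(consumer.handle("evt-1"));   // true
        System.out.println(consumer.handle("evt-1"));   // false (duplicate)
        System.out.println(consumer.handle("evt-2"));   // true
        System.out.println(consumer.appliedCount());    // 2
    }
}
```

The in-memory map is the whole trick scaled down: the database version swaps `putIfAbsent` for an `INSERT` into a table with a unique constraint on the event ID, so the dedupe check and the business write commit or roll back together.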