Java Spring Boot
Microservices Patterns
Production-grade code examples for every major microservices pattern. Built for engineers with 10+ years of experience who want real implementations — not theory.
Circuit Breaker
Prevents cascading failures by wrapping downstream calls and tripping open when failures exceed a threshold.
Three states: CLOSED (normal flow) → OPEN (fast-fail, no calls) → HALF_OPEN (probe with limited calls).
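The state machine above can be sketched in a few lines of plain Java (illustrative only; `SimpleCircuitBreaker` is a hypothetical name, not a library class — use Resilience4j in production):

```java
import java.util.function.Supplier;

// Minimal sketch of the CLOSED -> OPEN -> HALF_OPEN state machine.
// Illustrative only: real breakers also track sliding windows and slow calls.
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private State state = State.CLOSED;
    private int failures = 0;
    private final int failureThreshold;
    private final long openDurationMillis;
    private long openedAt;

    SimpleCircuitBreaker(int failureThreshold, long openDurationMillis) {
        this.failureThreshold = failureThreshold;
        this.openDurationMillis = openDurationMillis;
    }

    <T> T call(Supplier<T> supplier, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (System.currentTimeMillis() - openedAt >= openDurationMillis) {
                state = State.HALF_OPEN;      // wait elapsed: allow a probe call
            } else {
                return fallback.get();        // fast-fail, no downstream call
            }
        }
        try {
            T result = supplier.get();
            failures = 0;
            state = State.CLOSED;             // probe succeeded: close again
            return result;
        } catch (RuntimeException ex) {
            failures++;
            if (state == State.HALF_OPEN || failures >= failureThreshold) {
                state = State.OPEN;           // trip open
                openedAt = System.currentTimeMillis();
            }
            return fallback.get();
        }
    }

    State state() { return state; }
}
```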
COUNT_BASED (last N calls) vs TIME_BASED (calls in the last N seconds) sliding windows. Always pair with @TimeLimiter — a hanging call never counts as a failure without it.

<!-- Spring Boot 3 -->
<dependency>
    <groupId>io.github.resilience4j</groupId>
    <artifactId>resilience4j-spring-boot3</artifactId>
    <version>2.2.0</version>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-aop</artifactId>
</dependency>
resilience4j:
  circuitbreaker:
    instances:
      orderService:
        # COUNT_BASED: evaluate last N calls | TIME_BASED: calls in last N seconds
        slidingWindowType: COUNT_BASED
        slidingWindowSize: 10                  # evaluate last 10 calls
        failureRateThreshold: 50               # OPEN if 50%+ fail
        slowCallRateThreshold: 80              # also treat slow calls as failures
        slowCallDurationThreshold: 3s
        waitDurationInOpenState: 10s           # stay OPEN 10s before probing
        permittedNumberOfCallsInHalfOpenState: 3
        minimumNumberOfCalls: 5                # need 5 calls before evaluating
        automaticTransitionFromOpenToHalfOpenEnabled: true
        recordExceptions:
          - java.io.IOException
          - java.util.concurrent.TimeoutException
        ignoreExceptions:
          - com.example.BusinessException      # don't count business errors
  retry:
    instances:
      orderService:
        maxAttempts: 3
        waitDuration: 500ms
        enableExponentialBackoff: true
        exponentialBackoffMultiplier: 2
        retryExceptions:
          - java.io.IOException
        ignoreExceptions:
          - com.example.PaymentDeclinedException
@Service
@Slf4j
@RequiredArgsConstructor
public class OrderService {

    private final InventoryClient inventoryClient;
    private final CircuitBreakerRegistry cbRegistry;

    // ── Annotation approach ──────────────────────────────────────────────
    @CircuitBreaker(name = "orderService", fallbackMethod = "getFallbackInventory")
    @TimeLimiter(name = "orderService")   // always pair — prevents hanging calls
    public CompletableFuture<InventoryResponse> checkInventory(String productId) {
        return CompletableFuture.supplyAsync(() -> inventoryClient.getStock(productId));
    }

    // Fallback must match the method signature, plus a trailing Throwable param
    public CompletableFuture<InventoryResponse> getFallbackInventory(
            String productId, Throwable ex) {
        log.warn("CB fallback for productId={}, reason={}", productId, ex.getMessage());
        return CompletableFuture.completedFuture(
            InventoryResponse.builder()
                .productId(productId)
                .available(false)
                .source("CACHE_FALLBACK")
                .build());
    }

    // ── Programmatic approach — full control + event listeners ───────────
    public InventoryResponse checkInventoryProgrammatic(String productId) {
        CircuitBreaker cb = cbRegistry.circuitBreaker("orderService");

        // Register event listeners for alerting/metrics
        cb.getEventPublisher()
            .onStateTransition(evt -> log.warn("CB state: {} -> {}",
                evt.getStateTransition().getFromState(),
                evt.getStateTransition().getToState()))
            .onFailureRateExceeded(evt -> log.error("Failure rate: {}%", evt.getFailureRate()));

        Supplier<InventoryResponse> decorated = CircuitBreaker.decorateSupplier(
            cb, () -> inventoryClient.getStock(productId));
        try {
            return decorated.get();
        } catch (CallNotPermittedException ex) {   // CB is OPEN: call was rejected
            return cachedFallback(productId);
        } catch (Exception ex) {
            return defaultFallback(productId);
        }
    }
}
// CORRECT order: Retry → CircuitBreaker → TimeLimiter → Bulkhead → function
@CircuitBreaker(name = "paymentService", fallbackMethod = "paymentFallback")
@Retry(name = "paymentService")        // outermost aspect — each retry attempt counts toward CB
@TimeLimiter(name = "paymentService")
public CompletableFuture<PaymentResult> processPayment(PaymentRequest req) {
    return CompletableFuture.supplyAsync(() -> paymentClient.charge(req));
}
Bulkhead Pattern
Isolate failures by limiting concurrent calls per downstream dependency. Like a ship's bulkheads — if one section floods, others stay dry. Prevents a slow downstream from consuming all your threads.
SemaphoreBulkhead (same thread, lightweight, caps concurrent calls) vs ThreadPoolBulkhead (dedicated thread pool, good for blocking I/O). Combine with Circuit Breaker for maximum isolation.

# application.yml
resilience4j:
  bulkhead:
    instances:
      inventoryService:
        maxConcurrentCalls: 20     # max parallel calls at any time
        maxWaitDuration: 100ms     # wait before rejecting with BulkheadFullException
  thread-pool-bulkhead:
    instances:
      paymentService:
        maxThreadPoolSize: 10
        coreThreadPoolSize: 5
        queueCapacity: 20          # queue before rejecting
        keepAliveDuration: 20ms

@Service
@Slf4j
public class ProductService {

    // Semaphore Bulkhead — lightweight, same thread
    @Bulkhead(name = "inventoryService", fallbackMethod = "inventoryFallback")
    @CircuitBreaker(name = "inventoryService")
    public InventoryStatus checkInventory(String productId) {
        return inventoryClient.check(productId);
    }

    // Thread Pool Bulkhead — dedicated pool for payment calls
    @Bulkhead(name = "paymentService", type = Bulkhead.Type.THREADPOOL,
              fallbackMethod = "paymentFallback")
    public CompletableFuture<PaymentResult> processPayment(PaymentRequest req) {
        return CompletableFuture.supplyAsync(() -> paymentClient.charge(req));
    }

    public InventoryStatus inventoryFallback(String productId, BulkheadFullException ex) {
        log.warn("Bulkhead full for inventoryService, productId={}", productId);
        return InventoryStatus.UNKNOWN;   // safe degraded response
    }

    // Programmatic: per-tenant isolation (multi-tenancy)
    private final Map<String, Bulkhead> tenantBulkheads = new ConcurrentHashMap<>();

    public Object callWithTenantIsolation(String tenantId, Supplier<?> call) {
        Bulkhead bulkhead = tenantBulkheads.computeIfAbsent(tenantId, id ->
            Bulkhead.of("tenant-" + id, BulkheadConfig.custom()
                .maxConcurrentCalls(10)
                .maxWaitDuration(Duration.ofMillis(50))
                .build()));
        return Bulkhead.decorateSupplier(bulkhead, call).get();
    }
}
Saga Pattern
Replaces distributed ACID transactions (2PC). Each service executes a local transaction and publishes an event. On failure, compensating transactions undo prior steps in reverse order.
Orchestration
- Central coordinator controls workflow
- Easy to visualize and debug
- Coordinator is a single point of failure
- Best for complex multi-step flows
- State persisted in saga table
Choreography
- Each service reacts to events independently
- No central coordinator — fully decoupled
- Hard to trace workflow across services
- Best for simple 2-3 step flows
- Add correlationId for debugging
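One way to make a choreographed saga traceable is a shared event envelope: every event carries the correlationId of the saga that started it. A minimal sketch (the `EventEnvelope` record and its field names are hypothetical, not from any framework):

```java
import java.time.Instant;
import java.util.UUID;

// Hypothetical event envelope for choreography: every event in the saga
// carries the same correlationId, so the flow can be stitched together in logs.
record EventEnvelope(
        String eventType,
        String correlationId,   // constant across the whole saga
        String causationId,     // id of the event that caused this one
        String eventId,
        Instant occurredAt,
        String payloadJson) {

    // First event in a saga: correlationId is its own eventId
    static EventEnvelope start(String eventType, String payloadJson) {
        String id = UUID.randomUUID().toString();
        return new EventEnvelope(eventType, id, id, id, Instant.now(), payloadJson);
    }

    // Follow-up event: inherits correlationId, records what caused it
    EventEnvelope next(String eventType, String payloadJson) {
        return new EventEnvelope(eventType, correlationId, eventId,
                UUID.randomUUID().toString(), Instant.now(), payloadJson);
    }
}
```

Filtering logs by correlationId then reconstructs the whole order.created → inventory.reserved → payment.processed chain, and causationId distinguishes retries from fresh events.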
@Service
@Slf4j
@Transactional
@RequiredArgsConstructor
public class OrderSagaOrchestrator {

    private final InventoryServiceClient inventoryClient;
    private final PaymentServiceClient paymentClient;
    private final ShippingServiceClient shippingClient;
    private final SagaRepository sagaRepository;

    public void startOrderSaga(CreateOrderCommand cmd) {
        // Persist saga state FIRST — survive crashes & restarts
        SagaState saga = SagaState.builder()
            .sagaId(UUID.randomUUID().toString())
            .orderId(cmd.getOrderId())
            .status(SagaStatus.STARTED)
            .completedSteps(new ArrayList<>())
            .build();
        sagaRepository.save(saga);

        try {
            // Step 1 — Reserve Inventory
            InventoryResponse inv = inventoryClient.reserve(cmd.getProductId(), cmd.getQuantity());
            saga.addStep("INVENTORY_RESERVED", inv.getReservationId());
            sagaRepository.save(saga);   // checkpoint after each step

            // Step 2 — Process Payment
            PaymentResponse payment = paymentClient.charge(cmd.getCustomerId(), cmd.getAmount());
            saga.addStep("PAYMENT_PROCESSED", payment.getTransactionId());
            sagaRepository.save(saga);

            // Step 3 — Schedule Shipping
            ShippingResponse shipping = shippingClient.schedule(cmd);
            saga.addStep("SHIPPING_SCHEDULED", shipping.getTrackingId());
            saga.setStatus(SagaStatus.COMPLETED);
            sagaRepository.save(saga);
        } catch (PaymentException | ShippingException ex) {
            log.error("Saga {} failed, compensating...", saga.getSagaId(), ex);
            compensate(saga);
        }
    }

    private void compensate(SagaState saga) {
        // Execute compensations in REVERSE order
        List<SagaStep> steps = saga.getCompletedSteps();
        for (int i = steps.size() - 1; i >= 0; i--) {
            SagaStep step = steps.get(i);
            try {
                switch (step.getName()) {
                    case "PAYMENT_PROCESSED" -> paymentClient.refund(step.getReferenceId());
                    case "INVENTORY_RESERVED" -> inventoryClient.release(step.getReferenceId());
                    case "SHIPPING_SCHEDULED" -> shippingClient.cancel(step.getReferenceId());
                }
            } catch (Exception compensationEx) {
                // Compensation failures = manual intervention needed
                log.error("COMPENSATION FAILED for step={}, sagaId={}",
                    step.getName(), saga.getSagaId(), compensationEx);
                alertOpsTeam(saga, step, compensationEx);
            }
        }
        saga.setStatus(SagaStatus.COMPENSATED);
        sagaRepository.save(saga);
    }
}
// InventoryService — reacts to order.created, emits success/failure
@Service
public class InventoryEventHandler {

    @KafkaListener(topics = "order.created", groupId = "inventory-saga")
    public void onOrderCreated(OrderCreatedEvent event) {
        try {
            String reservationId = reserveInventory(event.getProductId(), event.getQuantity());
            // SUCCESS — triggers Payment Service
            kafka.send("inventory.reserved",
                new InventoryReservedEvent(event.getOrderId(), reservationId));
        } catch (InsufficientStockException ex) {
            // FAILURE — triggers compensations upstream
            kafka.send("inventory.reservation.failed",
                new InventoryFailedEvent(event.getOrderId(), ex.getMessage()));
        }
    }

    // Compensating transaction — triggered when payment fails
    @KafkaListener(topics = "payment.failed", groupId = "inventory-compensate")
    public void onPaymentFailed(PaymentFailedEvent event) {
        releaseInventory(event.getSagaId());
        kafka.send("inventory.released", new InventoryReleasedEvent(event.getOrderId()));
    }
}

// PaymentService — reacts to inventory.reserved
@Service
public class PaymentEventHandler {

    @KafkaListener(topics = "inventory.reserved", groupId = "payment-saga")
    public void onInventoryReserved(InventoryReservedEvent event) {
        try {
            chargeCustomer(event.getCustomerId(), event.getAmount());
            kafka.send("payment.processed", new PaymentProcessedEvent(event.getOrderId()));
        } catch (PaymentException ex) {
            // This triggers the inventory compensation above
            kafka.send("payment.failed",
                new PaymentFailedEvent(event.getOrderId(), event.getSagaId()));
        }
    }
}
Transactional Outbox
Solves the dual-write problem: writing to DB and publishing to Kafka must be atomic. Without Outbox, your DB can commit but Kafka publish fails — causing silent data loss. Write the event to an outbox_events table in the same transaction, then a poller or Debezium CDC publishes it.
// ── Outbox Entity ────────────────────────────────────────────────────────
@Entity
@Table(name = "outbox_events")
@Data
@Builder
@NoArgsConstructor
@AllArgsConstructor
public class OutboxEvent {

    @Id
    @GeneratedValue(strategy = GenerationType.UUID)
    private String id;

    private String aggregateId;     // e.g., orderId
    private String aggregateType;   // e.g., "Order"
    private String eventType;       // e.g., "OrderCreated"

    @Column(columnDefinition = "jsonb")
    private String payload;

    @Enumerated(EnumType.STRING)
    @Builder.Default
    private OutboxStatus status = OutboxStatus.PENDING;

    private Instant createdAt;
    private Instant processedAt;

    @Builder.Default
    private int retryCount = 0;
}

// ── Service — BOTH writes in ONE @Transactional ──────────────────────────
@Service
@Transactional   // KEY: atomicity guaranteed — if either fails, both roll back
public class OrderService {

    public Order createOrder(CreateOrderRequest req) throws JsonProcessingException {
        // 1. Save business entity
        Order order = Order.builder()
            .id(UUID.randomUUID().toString())
            .customerId(req.getCustomerId())
            .status(OrderStatus.PENDING)
            .build();
        orderRepo.save(order);

        // 2. Save outbox event IN SAME TRANSACTION
        OutboxEvent outbox = OutboxEvent.builder()
            .aggregateId(order.getId())
            .aggregateType("Order")
            .eventType("OrderCreated")
            .payload(mapper.writeValueAsString(OrderCreatedEvent.from(order)))
            .createdAt(Instant.now())
            .build();
        outboxRepo.save(outbox);

        return order;   // both committed or both rolled back
    }
}

// ── Outbox Poller — publishes pending events to Kafka ────────────────────
@Component
@Slf4j
public class OutboxPoller {

    @Scheduled(fixedDelay = 1000)   // every 1 second
    @Transactional
    public void processOutbox() {
        List<OutboxEvent> pending = outboxRepo
            .findTop100ByStatusOrderByCreatedAtAsc(OutboxStatus.PENDING);

        for (OutboxEvent event : pending) {
            try {
                kafka.send(
                    toTopic(event.getEventType()),
                    event.getAggregateId(),   // partition key — preserves per-aggregate order
                    event.getPayload()
                ).get(5, TimeUnit.SECONDS);   // wait for broker ack
                event.setStatus(OutboxStatus.PROCESSED);
                event.setProcessedAt(Instant.now());
            } catch (Exception ex) {
                log.error("Outbox publish failed id={}", event.getId(), ex);
                event.setRetryCount(event.getRetryCount() + 1);
                if (event.getRetryCount() >= 5) {
                    event.setStatus(OutboxStatus.DEAD_LETTER);
                }
            }
            outboxRepo.save(event);
        }
    }
}
CQRS
Separate the write model (Commands → PostgreSQL) from the read model (Queries → Elasticsearch/Redis). Domain events bridge the two. Read models are denormalized — no joins needed at query time.
// ── Command Side — writes to PostgreSQL ──────────────────────────────────
public record CreateProductCommand(String name, BigDecimal price, int quantity) {}

@Service
@Transactional
public class ProductCommandHandler {

    public String handle(CreateProductCommand cmd) throws JsonProcessingException {
        if (cmd.price().compareTo(BigDecimal.ZERO) <= 0)
            throw new InvalidCommandException("Price must be positive");

        Product product = Product.builder()
            .id(UUID.randomUUID().toString())
            .name(cmd.name())
            .price(cmd.price())
            .build();
        writeRepo.save(product);

        // Publish via Outbox to update the read model asynchronously
        outboxRepo.save(OutboxEvent.of(
            product.getId(), "ProductCreated",
            mapper.writeValueAsString(ProductCreatedEvent.from(product))));

        return product.getId();
    }
}

// ── Read Model — denormalized Elasticsearch document ─────────────────────
@Document(indexName = "products")
@Data
public class ProductReadModel {
    @Id
    private String id;
    private String name;
    private BigDecimal price;
    private String categoryName;     // Denormalized — no join needed at query time
    private double averageRating;    // Pre-computed aggregate
    private int reviewCount;
    private Instant lastUpdated;
}

// ── Query Handler — rich Elasticsearch queries ───────────────────────────
@Service
public class ProductQueryHandler {

    public Page<ProductReadModel> search(ProductSearchQuery query) {
        NativeQuery nq = NativeQuery.builder()
            .withQuery(q -> q.bool(b -> b
                .must(m -> m.match(match -> match.field("name").query(query.keyword())))
                .filter(f -> f.range(r -> r.field("price")
                    .gte(JsonData.of(query.minPrice()))
                    .lte(JsonData.of(query.maxPrice()))))))
            .withSort(Sort.by("averageRating").descending())
            .withPageable(PageRequest.of(query.page(), query.size()))
            .build();
        return esOps.searchForPage(nq, ProductReadModel.class);
    }
}

// ── Read Model Updater — keeps ES in sync via Kafka events ───────────────
@Service
public class ProductReadModelUpdater {

    @KafkaListener(topics = "product.created", groupId = "product-read-model")
    public void onProductCreated(ProductCreatedEvent event) {
        readRepo.save(ProductReadModel.from(event));
    }

    @KafkaListener(topics = "review.added", groupId = "product-read-model")
    public void onReviewAdded(ReviewAddedEvent event) {
        ProductReadModel model = readRepo.findById(event.getProductId()).orElseThrow();
        model.setAverageRating(event.getNewAverageRating());   // pre-computed update
        model.setReviewCount(model.getReviewCount() + 1);
        model.setLastUpdated(Instant.now());
        readRepo.save(model);
    }
}
Event Sourcing
Store every state change as an immutable event. Current state is rebuilt by replaying events. The event log IS the source of truth. Snapshots avoid replaying thousands of events for performance.
// ── Domain Aggregate — state derived entirely from events ────────────────
public class BankAccount {

    private String accountId;
    private BigDecimal balance;
    private AccountStatus status;
    private long version = 0;
    private final List<DomainEvent> uncommittedEvents = new ArrayList<>();

    // Static factory — raises AccountOpenedEvent
    public static BankAccount open(String id, BigDecimal initialDeposit) {
        BankAccount account = new BankAccount();
        account.applyEvent(new AccountOpenedEvent(id, initialDeposit, Instant.now()));
        return account;
    }

    public void deposit(BigDecimal amount) {
        if (amount.compareTo(BigDecimal.ZERO) <= 0)
            throw new IllegalArgumentException("Deposit must be positive");
        applyEvent(new MoneyDepositedEvent(accountId, amount, Instant.now()));
    }

    public void withdraw(BigDecimal amount) {
        if (balance.compareTo(amount) < 0)
            throw new InsufficientFundsException(accountId, amount, balance);
        applyEvent(new MoneyWithdrawnEvent(accountId, amount, Instant.now()));
    }

    // applyEvent mutates state ONLY — never saves
    private void applyEvent(DomainEvent event) {
        if (event instanceof AccountOpenedEvent e) {
            this.accountId = e.getAccountId();
            this.balance = e.getInitialDeposit();
            this.status = AccountStatus.ACTIVE;
        } else if (event instanceof MoneyDepositedEvent e) {
            this.balance = this.balance.add(e.getAmount());
        } else if (event instanceof MoneyWithdrawnEvent e) {
            this.balance = this.balance.subtract(e.getAmount());
        }
        this.version++;
        uncommittedEvents.add(event);
    }

    // Reconstitute from event history (replay)
    public static BankAccount reconstitute(List<DomainEvent> history) {
        BankAccount account = new BankAccount();
        history.forEach(account::applyEvent);
        account.uncommittedEvents.clear();   // historical events are already committed
        return account;
    }
}

// ── Event Store — save with optimistic locking ───────────────────────────
@Repository
public class EventStoreRepository {

    @Transactional
    public void save(String aggregateId, List<DomainEvent> events, long expectedVersion)
            throws JsonProcessingException {
        long currentVersion = getCurrentVersion(aggregateId);
        if (currentVersion != expectedVersion)
            throw new OptimisticLockingException(
                "Expected: " + expectedVersion + ", Got: " + currentVersion);

        long version = expectedVersion;
        for (DomainEvent event : events) {
            version++;
            jdbc.update(
                "INSERT INTO event_store (aggregate_id, version, event_type, event_data) "
                    + "VALUES (?, ?, ?, ?::jsonb)",
                aggregateId, version, event.getClass().getSimpleName(),
                mapper.writeValueAsString(event));
        }
    }

    // Load from latest snapshot + subsequent events for performance
    public List<DomainEvent> loadEvents(String aggregateId) {
        Optional<Snapshot> snapshot = loadLatestSnapshot(aggregateId);
        long fromVersion = snapshot.map(Snapshot::getVersion).orElse(0L);
        return jdbc.query(
                "SELECT * FROM event_store WHERE aggregate_id = ? AND version > ? ORDER BY version ASC",
                eventRowMapper, aggregateId, fromVersion)
            .stream().map(this::deserialize).collect(Collectors.toList());
    }
}
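The loader above reads from the latest snapshot, but snapshot creation isn't shown. A minimal in-memory sketch of the write side (names like `SnapshotStore` and `SNAPSHOT_EVERY` are hypothetical; real code would persist to a snapshots table alongside event_store):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative in-memory snapshot store: snapshot every N events so that
// replay cost stays bounded regardless of aggregate age.
class SnapshotStore {
    record Snapshot(String aggregateId, long version, String stateJson) {}

    static final int SNAPSHOT_EVERY = 100;
    private final Map<String, Snapshot> latest = new HashMap<>();

    // Called after appending events; only snapshots on a multiple of SNAPSHOT_EVERY
    void maybeSnapshot(String aggregateId, long version, String stateJson) {
        if (version % SNAPSHOT_EVERY == 0) {
            latest.put(aggregateId, new Snapshot(aggregateId, version, stateJson));
        }
    }

    // Replay should resume from events AFTER this version (0 = replay everything)
    long replayFromVersion(String aggregateId) {
        Snapshot s = latest.get(aggregateId);
        return s == null ? 0L : s.version();
    }
}
```

An aggregate with 10,000 events then replays at most 99 events on load instead of all 10,000.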
API Gateway + BFF
API Gateway = single entry point for all clients. BFF (Backend for Frontend) = dedicated aggregator per client type with client-specific data shaping, reducing over-fetching and multiple round trips.
Use Mono.zip() for parallel downstream calls — never call them sequentially.

@Configuration
public class GatewayConfig {

    @Bean
    public RouteLocator routes(RouteLocatorBuilder builder) {
        return builder.routes()
            .route("order-service", r -> r
                .path("/api/v1/orders/**")
                .filters(f -> f
                    .filter(authFilter)                      // JWT validation
                    .requestRateLimiter(config -> config
                        .setRateLimiter(redisRateLimiter())
                        .setKeyResolver(userKeyResolver()))
                    .circuitBreaker(cb -> cb
                        .setName("orderCB")
                        .setFallbackUri("forward:/fallback/orders"))
                    .retry(retry -> retry.setRetries(3))
                    .addRequestHeader("X-Gateway-Source", "api-gateway")
                    .removeRequestHeader("Cookie"))          // security: strip cookies
                .uri("lb://order-service"))                  // lb:// = load balanced

            // Mobile BFF route — different auth + dedicated service
            .route("mobile-bff", r -> r
                .path("/mobile/**")
                .and().header("User-Agent", ".*MobileApp.*")
                .filters(f -> f
                    .stripPrefix(1)                          // remove /mobile prefix
                    .filter(mobileAuthFilter))
                .uri("lb://mobile-bff-service"))
            .build();
    }

    @Bean
    public RedisRateLimiter redisRateLimiter() {
        return new RedisRateLimiter(10, 20, 1);   // 10 req/s, burst 20
    }
}
@RestController
@RequestMapping("/api/mobile/v1")
public class MobileBFFController {

    @GetMapping("/dashboard")
    public Mono<MobileDashboard> getDashboard(@AuthenticationPrincipal JwtUser user) {
        // ALL 3 calls fire in parallel — not sequential!
        Mono<List<OrderSummary>> orders = orderClient.get()
            .uri("/internal/orders/recent/{id}", user.getId())
            .retrieve().bodyToFlux(Order.class)
            .map(this::toMobileOrderSummary)             // mobile-specific mapping
            .take(5)                                     // mobile needs 5, web needs 20
            .collectList()
            .onErrorReturn(Collections.emptyList());     // graceful degradation

        Mono<UserProfile> profile = userClient.get()
            .uri("/internal/users/{id}", user.getId())
            .retrieve().bodyToMono(UserProfile.class)
            .onErrorReturn(UserProfile.anonymous());

        Mono<List<ProductSummary>> recs = productClient.get()
            .uri("/internal/products/recommended/{id}", user.getId())
            .retrieve().bodyToFlux(Product.class)
            // Mobile: thumbnail only. A web BFF would include full-res images
            .map(p -> new ProductSummary(p.getId(), p.getName(), p.getThumbnailUrl()))
            .take(10).collectList()
            .onErrorReturn(Collections.emptyList());

        // Zip — waits for ALL 3 in parallel, then builds the response
        return Mono.zip(orders, profile, recs)
            .map(t -> MobileDashboard.builder()
                .recentOrders(t.getT1())
                .user(t.getT2())
                .recommendations(t.getT3())
                .build());
    }
}
Strangler Fig + Anti-Corruption Layer
Incrementally migrate a monolith by routing migrated endpoints to new services. The ACL translates between the legacy domain model and the new one — preventing legacy concepts from polluting your new domain.
// ── Gateway Strangler Routing ────────────────────────────────────────────
@Component
public class StranglerFigFilter implements GlobalFilter, Ordered {

    private final MigrationConfigService migrationConfig;

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
        String path = exchange.getRequest().getPath().value();
        MigrationStatus status = migrationConfig.getStatus(path);
        return switch (status) {
            case FULLY_MIGRATED -> routeToNewService(exchange, chain);
            case CANARY -> Math.random() < 0.10
                ? routeToNewService(exchange, chain)   // 10% canary
                : routeToMonolith(exchange, chain);    // 90% legacy
            default -> routeToMonolith(exchange, chain);
        };
    }

    @Override
    public int getOrder() { return -1; }   // run before the default filters
}

// ── Anti-Corruption Layer — translates legacy → new domain model ─────────
@Service
public class LegacyOrderAdapter {

    private final LegacyOrderRepository legacyRepo;   // points to monolith DB
    private final NewOrderRepository newRepo;

    public Order findById(String orderId) {
        // Try the new service DB first, fall back to the monolith with translation
        return newRepo.findById(orderId)
            .orElseGet(() -> translateFromLegacy(
                legacyRepo.findByOrderNumber(orderId).orElseThrow()));
    }

    private Order translateFromLegacy(LegacyOrder legacy) {
        return Order.builder()
            .id(legacy.getOrderNumber())                       // renamed field
            .customerId(legacy.getCustId())                    // abbreviated → full name
            .status(mapLegacyStatus(legacy.getStatusCode()))   // code → enum
            .totalAmount(legacy.getTotalCents()                // cents → BigDecimal
                .divide(BigDecimal.valueOf(100)))
            .createdAt(legacy.getCreateDttm().toInstant())
            .build();
    }

    private OrderStatus mapLegacyStatus(String code) {
        return switch (code) {
            case "P" -> OrderStatus.PENDING;
            case "A" -> OrderStatus.PROCESSING;   // 'A' = Active in legacy system
            case "C" -> OrderStatus.COMPLETED;
            case "X" -> OrderStatus.CANCELLED;
            default -> throw new UnknownStatusException(code);
        };
    }
}
Sidecar Pattern & Istio Service Mesh
A sidecar container (Envoy proxy) runs alongside your service in the same pod, handling mTLS, retries, circuit breaking, and observability — without any code changes in your app.
# ── VirtualService — traffic splitting + retries + fault injection ───────
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: order-service
spec:
  hosts:
    - order-service
  http:
    - match:
        - headers:
            x-canary: {exact: 'true'}
      route:
        - destination:
            host: order-service
            subset: v2
          weight: 100
    - route:
        - destination:
            host: order-service
            subset: v1
          weight: 90
        - destination:
            host: order-service
            subset: v2
          weight: 10                     # 10% canary
      retries:
        attempts: 3
        perTryTimeout: 2s
        retryOn: gateway-error,connect-failure,retriable-4xx
      timeout: 10s
      fault:                             # chaos engineering — inject 5% delays in staging
        delay:
          percentage: {value: 5.0}
          fixedDelay: 5s
---
# ── DestinationRule — CB + mTLS + connection pool (Bulkhead at mesh level) ──
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: order-service
spec:
  host: order-service
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100              # Bulkhead at mesh level
      http:
        http1MaxPendingRequests: 50
    outlierDetection:                    # Circuit Breaker at mesh level
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
      maxEjectionPercent: 50
    tls:
      mode: ISTIO_MUTUAL                 # automatic mTLS — zero code change
  subsets:
    - name: v1
      labels: {version: v1}
    - name: v2
      labels: {version: v2}
JWT + OAuth2 (Zero-Trust)
External calls use user JWT Bearer tokens. Internal service-to-service calls use Client Credentials grant (machine tokens). Every service validates independently — never trust the network.
@Configuration
@EnableWebSecurity
@EnableMethodSecurity   // enables @PreAuthorize on methods
public class SecurityConfig {

    @Bean
    public SecurityFilterChain filterChain(HttpSecurity http) throws Exception {
        return http
            .csrf(AbstractHttpConfigurer::disable)
            .sessionManagement(s -> s.sessionCreationPolicy(SessionCreationPolicy.STATELESS))
            .authorizeHttpRequests(auth -> auth
                .requestMatchers("/actuator/health").permitAll()
                .requestMatchers("/actuator/**").hasRole("ADMIN")
                .requestMatchers(HttpMethod.GET, "/api/v1/products/**")
                    .hasAnyRole("USER", "ADMIN", "SERVICE")
                .requestMatchers("/api/v1/admin/**").hasRole("ADMIN")
                .anyRequest().authenticated())
            .oauth2ResourceServer(oauth2 -> oauth2
                .jwt(jwt -> jwt.jwtAuthenticationConverter(jwtAuthConverter())))
            .build();
    }

    @Bean
    public JwtAuthenticationConverter jwtAuthConverter() {
        JwtGrantedAuthoritiesConverter converter = new JwtGrantedAuthoritiesConverter();
        converter.setAuthoritiesClaimName("roles");   // custom claim name
        converter.setAuthorityPrefix("ROLE_");
        JwtAuthenticationConverter jwtConverter = new JwtAuthenticationConverter();
        jwtConverter.setJwtGrantedAuthoritiesConverter(converter);
        return jwtConverter;
    }
}

// ── Service-to-Service — Client Credentials WebClient ────────────────────
@Bean
public WebClient paymentServiceClient(ReactiveOAuth2AuthorizedClientManager clientManager) {
    // Automatically fetches, caches, and refreshes the client_credentials token
    ServerOAuth2AuthorizedClientExchangeFilterFunction oauth =
        new ServerOAuth2AuthorizedClientExchangeFilterFunction(clientManager);
    oauth.setDefaultClientRegistrationId("payment-service");
    return WebClient.builder()
        .baseUrl("http://payment-service")
        .filter(oauth)   // auto-attaches Bearer token to every request
        .build();
}

// ── application.yml — client registration ────────────────────────────────
// spring.security.oauth2.client.registration.payment-service:
//   client-id: order-service
//   client-secret: ${ORDER_SERVICE_SECRET}   # from Vault
//   authorization-grant-type: client_credentials
//   scope: payment:write payment:read
Distributed Tracing
Every request gets a traceId spanning all services and a spanId per service hop. Spring Boot 3 uses Micrometer Tracing (replaces Sleuth). Propagate through Kafka headers, WebClient, and @Async boundaries.
Trace context must survive @Async thread pools, WebClient calls, and Kafka messages. Spring auto-propagates over HTTP, but Kafka requires manual header injection/extraction.

# application.yml
management:
  tracing:
    sampling:
      probability: 1.0            # 100% in dev, 0.1 (10%) in prod
  zipkin:
    tracing:
      endpoint: http://zipkin:9411/api/v2/spans
logging:
  pattern:
    # traceId + spanId auto-injected into every log line
    level: '%5p [${spring.application.name},%X{traceId},%X{spanId}]'

@Service
@Slf4j
public class OrderProcessingService {

    private final Tracer tracer;

    public OrderResult processOrder(Order order) {
        Span span = tracer.nextSpan()
            .name("process-order")
            .tag("order.id", order.getId())
            .tag("order.amount", order.getAmount().toString())
            .start();
        try (Tracer.SpanInScope ws = tracer.withSpan(span)) {
            log.info("Processing order");   // traceId auto-injected in log
            OrderResult result = executeOrder(order);
            span.tag("order.status", result.getStatus().name());
            return result;
        } catch (Exception ex) {
            span.error(ex);   // marks span as ERROR in Zipkin/Tempo
            throw ex;
        } finally {
            span.end();
        }
    }
}

// ── Kafka Consumer — MUST extract trace context from message headers ─────
@Component
public class TracingKafkaConsumer {

    @KafkaListener(topics = "order.created")
    public void consume(ConsumerRecord<String, String> record) {
        // Extract the parent trace from Kafka headers to continue the same trace
        Span span = propagator.extract(record.headers(), (headers, key) -> {
                Header h = headers.lastHeader(key);
                return h != null ? new String(h.value()) : null;
            })
            .name("consume-order-created")
            .start();
        try (Tracer.SpanInScope ws = tracer.withSpan(span)) {
            processMessage(record);
        } finally {
            span.end();
        }
    }
}
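The consumer side extracts headers; the producer side must inject them first. The wire format is the W3C `traceparent` header (`version-traceId-spanId-flags`). A minimal stdlib-only sketch of that round trip (real code would use Micrometer's Propagator against Kafka ProducerRecord headers; `TraceHeaderPropagation` is a hypothetical helper):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of trace-context propagation through message headers
// using the W3C traceparent format: version-traceId-spanId-flags.
class TraceHeaderPropagation {

    static String traceparent(String traceId, String spanId) {
        return "00-" + traceId + "-" + spanId + "-01";   // 00 = version, 01 = sampled
    }

    // Producer side: write the current trace context into the outgoing headers
    static void inject(Map<String, String> headers, String traceId, String spanId) {
        headers.put("traceparent", traceparent(traceId, spanId));
    }

    // Consumer side: recover the parent traceId so the new span joins the same trace
    static String extractTraceId(Map<String, String> headers) {
        String tp = headers.get("traceparent");
        return tp == null ? null : tp.split("-")[1];
    }
}
```

If either side is skipped, the consumer starts a brand-new trace and the Kafka hop disappears from the Zipkin view.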
Observability — RED & USE Metrics
RED: Rate (req/sec), Errors (error %), Duration (latency p95/p99). USE: Utilization (CPU/pool %), Saturation (queue depth), Errors (system errors). Custom health indicators expose deep service health beyond just "UP".
Expose /health/liveness and /health/readiness for Kubernetes probes.

// ── Custom Health Indicator — Kafka Consumer Lag ─────────────────────────
@Component
public class KafkaHealthIndicator implements HealthIndicator {

    @Override
    public Health health() {
        try {
            long maxLag = offsetService.getMaxConsumerLag("order-processor");
            Map<String, Object> details = Map.of(
                "maxConsumerLag", maxLag,
                "brokerCount", getBrokerCount());
            if (maxLag > 10000) {
                return Health.down()
                    .withDetails(details)
                    .withDetail("reason", "Consumer lag too high")
                    .build();
            }
            return Health.up().withDetails(details).build();
        } catch (Exception ex) {
            return Health.down(ex).build();
        }
    }
}

// ── RED Metrics with Micrometer ──────────────────────────────────────────
@Service
public class OrderMetricsService {

    private final MeterRegistry registry;
    private final Counter orderCreatedCounter;
    private final Counter orderFailedCounter;
    private final Timer orderProcessingTimer;

    public OrderMetricsService(MeterRegistry registry) {
        this.registry = registry;
        // RED: Rate
        orderCreatedCounter = Counter.builder("orders.created.total")
            .description("Total orders created")
            .register(registry);
        // RED: Errors
        orderFailedCounter = Counter.builder("orders.failed.total")
            .register(registry);
        // RED: Duration — publishes p50, p95, p99
        orderProcessingTimer = Timer.builder("orders.processing.duration")
            .publishPercentiles(0.5, 0.95, 0.99)
            .publishPercentileHistogram(true)
            .serviceLevelObjectives(Duration.ofMillis(100), Duration.ofMillis(500))
            .register(registry);
    }

    public Order createOrder(CreateOrderRequest req) {
        return orderProcessingTimer.record(() -> {
            try {
                Order o = doCreateOrder(req);
                orderCreatedCounter.increment();
                // Tag-based breakdown by region/channel
                registry.counter("orders.created.by.region",
                        "region", req.getRegion(),
                        "channel", req.getChannel())
                    .increment();
                return o;
            } catch (Exception ex) {
                orderFailedCounter.increment();
                registry.counter("orders.failed.by.reason",
                        "reason", ex.getClass().getSimpleName())
                    .increment();
                throw ex;
            }
        });
    }
}
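The RED side is covered above; the USE side can be sketched over a plain ThreadPoolExecutor. In production these would be registered as Micrometer gauges rather than computed ad hoc (`ThreadPoolUseMetrics` is a hypothetical helper):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// USE-side sketch: Utilization and Saturation for a worker thread pool.
class ThreadPoolUseMetrics {
    private final ThreadPoolExecutor pool;
    private final int queueCapacity;

    ThreadPoolUseMetrics(ThreadPoolExecutor pool, int queueCapacity) {
        this.pool = pool;
        this.queueCapacity = queueCapacity;
    }

    // Utilization: fraction of the pool's threads currently busy
    double utilization() {
        return (double) pool.getActiveCount() / pool.getMaximumPoolSize();
    }

    // Saturation: fraction of the work queue in use; rising saturation
    // at full utilization is the classic "scale out" signal
    double saturation() {
        return (double) pool.getQueue().size() / queueCapacity;
    }
}
```

Errors (the third USE letter) come from rejected-execution counts and downstream exception counters, as in the RED example above.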
Consumer-Driven Contracts
Consumers define contracts (what API they need). Providers verify against those contracts in CI. Both sides get automated verification without running all services together — enabling truly independent deployments.
// ── Contract (lives in the provider project's test resources) ────────────
// src/test/resources/contracts/order/get-order-by-id.groovy
Contract.make {
    description 'should return order by ID'
    request {
        method GET()
        url '/api/v1/orders/12345'
        headers {
            header('Authorization',
                $(consumer(regex('Bearer .+')), producer('Bearer valid-token')))
        }
    }
    response {
        status OK()
        body(
            orderId: '12345',
            status: $(anyOf('PENDING', 'PROCESSING', 'SHIPPED')),
            customerId: $(anyNonEmptyString()),
            totalAmount: $(anyPositiveDecimal())
        )
    }
}

// ── Provider Base Test — stub the data the contract expects ──────────────
@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.DEFINED_PORT)
public class OrderContractBase {

    @MockBean OrderRepository orderRepo;
    @LocalServerPort int port;

    @BeforeEach
    void setUp() {
        given(orderRepo.findById("12345")).willReturn(
            Optional.of(Order.builder()
                .id("12345")
                .status(OrderStatus.PROCESSING)
                .customerId("cust-99")
                .totalAmount(new BigDecimal("99.99"))
                .build()));
        RestAssured.baseURI = "http://localhost:" + port;
    }
    // The plugin auto-generates test classes from contracts that extend this base
}

// ── Consumer Test — uses generated WireMock stubs ────────────────────────
@SpringBootTest
@AutoConfigureStubRunner(
    ids = "com.example:order-service:+:stubs",   // group:artifact:version:classifier
    stubsMode = StubRunnerProperties.StubsMode.CLASSPATH
)
class OrderClientTest {

    @Autowired OrderClient orderClient;

    @Test
    void shouldFetchOrderById() {
        // The stub runner starts WireMock with the contract stubs automatically
        OrderResponse response = orderClient.getOrder("12345");
        assertThat(response.getOrderId()).isEqualTo("12345");
        assertThat(response.getStatus()).isIn("PENDING", "PROCESSING", "SHIPPED");
        assertThat(response.getTotalAmount()).isPositive();
    }
}
Testing Strategy
70% Unit tests (business logic isolation), 20% Integration tests (real DB/Kafka via Testcontainers), 10% Component/Contract/E2E. Never mock what you own — use Testcontainers for real databases.
```java
// ── Integration Test — real PostgreSQL + Kafka in Docker ──────────────────────
@SpringBootTest
@Testcontainers
class OrderRepositoryIntegrationTest {

    @Container
    static PostgreSQLContainer<?> postgres = new PostgreSQLContainer<>("postgres:15")
        .withDatabaseName("testdb")
        .withInitScript("schema.sql");

    @Container
    static KafkaContainer kafka =
        new KafkaContainer(DockerImageName.parse("confluentinc/cp-kafka:7.4.0"));

    @DynamicPropertySource
    static void configure(DynamicPropertyRegistry registry) {
        registry.add("spring.datasource.url", postgres::getJdbcUrl);
        registry.add("spring.datasource.username", postgres::getUsername);
        registry.add("spring.datasource.password", postgres::getPassword);
        registry.add("spring.kafka.bootstrap-servers", kafka::getBootstrapServers);
    }

    @Autowired OrderRepository orderRepo;

    @Test
    void shouldPersistOrder() {
        Order order = Order.builder()
            .id(UUID.randomUUID().toString())
            .customerId("cust-1")
            .status(OrderStatus.PENDING)
            .build();
        orderRepo.save(order);
        assertThat(orderRepo.findById(order.getId())).isPresent();
    }
}

// ── Component Test — real Spring context, mocked downstream ───────────────────
@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT)
class OrderServiceComponentTest {

    @RegisterExtension
    static WireMockExtension inventoryMock = WireMockExtension.newInstance()
        .options(wireMockConfig().port(8081)).build();   // matches inventory URL in test config

    @RegisterExtension
    static WireMockExtension paymentMock = WireMockExtension.newInstance()
        .options(wireMockConfig().port(8082)).build();

    @Autowired TestRestTemplate restTemplate;            // was referenced below but never declared

    @Test
    void shouldCreateOrderWhenInventoryAndPaymentSucceed() {
        inventoryMock.stubFor(get("/api/inventory/PROD-1")
            .willReturn(okJson("{\"available\":true,\"productId\":\"PROD-1\"}")));
        paymentMock.stubFor(post("/api/payment/charge")
            .willReturn(okJson("{\"transactionId\":\"TXN-99\",\"status\":\"SUCCESS\"}")));

        ResponseEntity<Order> response = restTemplate.postForEntity(
            "/api/v1/orders",
            new CreateOrderRequest("cust-1", "PROD-1", 2),
            Order.class);

        assertThat(response.getStatusCode()).isEqualTo(HttpStatus.CREATED);
        assertThat(response.getBody().getStatus()).isEqualTo(OrderStatus.CONFIRMED);
        // Verify downstream services were actually called
        inventoryMock.verify(1, getRequestedFor(urlEqualTo("/api/inventory/PROD-1")));
        paymentMock.verify(1, postRequestedFor(urlEqualTo("/api/payment/charge")));
    }
}
```
Quick Reference
| Problem | Pattern | Key Interview Points |
|---|---|---|
| Cascading failures from slow downstream | Circuit Breaker | 3 states: CLOSED→OPEN→HALF_OPEN. failureRateThreshold=50, slidingWindow=10, waitDuration=10s. Always pair with TimeLimiter. |
| DB write + Kafka publish atomicity | Transactional Outbox | Same @Transactional saves entity + outbox event. Poller/Debezium publishes. At-least-once delivery — consumers must be idempotent. |
| Multi-service business transaction | Saga (Orchestration) | Stateful coordinator + compensating transactions in reverse order. Persist saga state in DB after each step. Compensation failures need manual ops. |
| Decoupled event-driven transaction | Saga (Choreography) | Each service emits success/failure events. Hard to debug — add correlationId/sagaId to all events. No single point of failure. |
| High read/write load divergence | CQRS | Write: PostgreSQL. Read: Elasticsearch/Redis. Events bridge the two. Expect ms-to-second eventual consistency lag. |
| Full audit trail / time travel queries | Event Sourcing | Immutable event log. Replay = current state. Snapshots every 100 events for performance. Schema evolution via upcasters. |
| Thread starvation from slow dependency | Bulkhead | Separate thread pool per downstream. queueCapacity limits backlog. Semaphore for lightweight rate-limiting. |
| Monolith to microservices migration | Strangler Fig + ACL | Gateway routes migrated endpoints. ACL translates legacy model. Canary routing for gradual cutover. Always have rollback switch. |
| API breaking changes detection | Consumer Contracts | Pact/Spring Cloud Contract. Stubs in Maven. Provider auto-tests from DSL. Catches breaks in CI before deploy. |
| Cross-service call debugging | Distributed Tracing | TraceId across all logs. Propagate through Kafka headers, @Async, WebClient. Sample 10% in prod, 100% in dev. |
| Service-to-service authentication | Client Credentials | Machine tokens, 5-15 min TTL, auto-cached + auto-refreshed. mTLS from Istio for defense in depth. |
| Client-specific API aggregation | BFF Pattern | Separate aggregator per client type. Parallel calls via Mono.zip(). No business logic — only aggregation. Team owns their BFF. |
| Zero-code retries, CB, mTLS between services | Service Mesh (Istio) | Envoy sidecar handles everything. Config in VirtualService/DestinationRule YAML. Ops changes without redeploying services. |
| Integration tests with real dependencies | Testcontainers | Real PostgreSQL + Kafka in Docker. @DynamicPropertySource wires ports. Never mock what you own. |
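Several rows above hinge on the same point: at-least-once delivery (Outbox, both Saga styles) means every consumer needs a dedupe step before its side effect. A minimal in-memory sketch of that check — class and method names are illustrative, and a real service would persist the seen event IDs in a `processed_events` table written in the same transaction as the side effect:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Toy idempotent consumer: dedupes by eventId before applying the side effect.
public class IdempotentConsumer {

    private final Map<String, Boolean> processed = new ConcurrentHashMap<>();
    private final AtomicInteger applied = new AtomicInteger();

    /** Returns true if the event was applied, false if it was a duplicate. */
    public boolean handle(String eventId) {
        // putIfAbsent is atomic: only the first delivery of an eventId wins
        if (processed.putIfAbsent(eventId, Boolean.TRUE) != null) {
            return false;                 // duplicate — at-least-once redelivery
        }
        applied.incrementAndGet();        // the actual side effect goes here
        return true;
    }

    public int appliedCount() { return applied.get(); }

    public static void main(String[] args) {
        IdempotentConsumer consumer = new IdempotentConsumer();
        // Simulate at-least-once delivery: evt-1 arrives twice
        System.out.println(consumer.handle("evt-1"));   // true
        System.out.println(consumer.handle("evt-1"));   // false (duplicate)
        System.out.println(consumer.handle("evt-2"));   // true
        System.out.println(consumer.appliedCount());    // 2
    }
}
```

The in-memory map is the whole trick scaled down: the database version swaps `putIfAbsent` for an `INSERT` into a table with a unique constraint on the event ID, so the dedupe check and the business write commit or roll back together.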