Execution Tracing in Stove 0.21.0¶
If you've spent any time debugging e2e test failures, you know the routine. The test says "expected 201 but was 500" and you're left reverse-engineering what actually happened. Did the request reach the controller? Did the database reject the write? Did a downstream service return something unexpected? You open the logs, grep for request IDs, cross-reference timestamps, and eventually piece together the story. Twenty minutes later, you have an answer.
The fundamental problem is that e2e tests treat the application as a black box. They can tell you the output was wrong, but they have no visibility into the execution path that produced it. For simple flows that's fine. For a request that touches a gRPC service, two REST APIs, a database, and a Kafka topic before returning a response, it's a real productivity drain. In a microservice architecture with multiple integration points, this kind of failure can easily take 30 minutes to diagnose. Multiply that by every flaky test in your CI pipeline, and the cost adds up fast.
Stove 0.21.0 introduces execution tracing to address this. When a test fails, you get the entire call chain of your application: every controller method, every database query, every Kafka message, every HTTP call, with timing and the exact point of failure. The bug might be buried deep in the persistence layer, but the trace pinpoints it without a single grep.
Stove in 30 Seconds¶
For those new to Stove: it's an end-to-end testing framework for the JVM. It spins up your real application with real dependencies (PostgreSQL, Kafka, MongoDB, Redis, etc. via Testcontainers) and gives you a unified Kotlin DSL for assertions across all of them. It works with Spring Boot, Ktor, Micronaut, and Quarkus. Tests can be written in Kotlin, Java, or Scala.
The key idea: test your entire application stack as it runs in production, not a stripped-down mock version.
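To give a feel for the DSL before the full walkthrough below, a minimal test might look like this (a sketch reusing calls from the showcase example later in this post; `request`, `ProductResponse`, and `ProductCreatedEvent` are placeholders):

test("creating a product persists it and publishes an event") {
  stove {
    http {
      // Drive the real application over HTTP
      postAndExpectBody<ProductResponse>(uri = "/api/products", body = request.some()) { response ->
        response.status shouldBe 201
      }
    }
    kafka {
      // Assert against the real broker the application published to
      shouldBePublished<ProductCreatedEvent> { actual.id == request.id }
    }
  }
}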
A Real Application: The Spring Showcase¶
To demonstrate tracing, let's walk through a realistic application. The spring-showcase recipe is an order service that touches six different integration points during a single request:
flowchart LR
A["HTTP POST /api/orders"] --> B[OrderService]
B --> C["Fraud Detection (gRPC)"]
B --> D["Inventory Check (REST)"]
B --> E["Payment (REST)"]
B --> F["PostgreSQL - Save Order"]
B --> G["Kafka - Publish Events"]
B --> H["db-scheduler - Schedule Email"]
Here's the service code. Each method is annotated with `@WithSpan` so the OpenTelemetry agent captures it:
@Service
class OrderService(
  private val orderRepository: OrderRepository,
  private val inventoryClient: InventoryClient,
  private val paymentClient: PaymentClient,
  private val fraudDetectionClient: FraudDetectionClient,
  private val eventPublisher: OrderEventPublisher,
  private val emailSchedulerService: EmailSchedulerService
) {
  @WithSpan("OrderService.createOrder")
  suspend fun createOrder(userId: String, productId: String, amount: Double): Order {
    // Generate the order id used by every step below
    val orderId = UUID.randomUUID().toString()
    // Step 1: Check fraud via gRPC
    checkFraudViaGrpc(orderId, userId, amount, productId)
    // Step 2: Check inventory via REST
    checkInventoryViaRest(productId)
    // Step 3: Process payment via REST
    val payment = processPaymentViaRest(userId, amount)
    // Step 4: Save to database
    val savedOrder = saveOrderToDatabase(orderId, userId, productId, amount, payment.transactionId!!)
    // Step 5: Publish events to Kafka
    publishEventsToKafka(savedOrder, payment.transactionId)
    // Step 6: Schedule confirmation email
    scheduleConfirmationEmail(savedOrder)
    return savedOrder
  }
}
And here's how the Stove test covers the entire flow in a single test:
test("The Complete Order Flow - Every Feature in One Test") {
stove {
// 1. Mock the external gRPC service (Fraud Detection)
grpcMock {
mockUnary(
serviceName = "frauddetection.FraudDetectionService",
methodName = "CheckFraud",
response = CheckFraudResponse.newBuilder()
.setIsFraudulent(false)
.setRiskScore(0.15)
.build()
)
}
// 2. Mock the external REST APIs (Inventory + Payment)
wiremock {
mockGet(url = "/inventory/$productId", statusCode = 200,
responseBody = InventoryResponse(productId, available = true, quantity = 10).some())
mockPost(url = "/payments/charge", statusCode = 200,
responseBody = PaymentResult(success = true, transactionId = "txn-123", amount = amount).some())
}
// 3. Call our API
http {
postAndExpectBody<OrderResponse>(uri = "/api/orders",
body = CreateOrderRequest(userId, productId, amount).some()
) { response ->
response.status shouldBe 201
response.body().status shouldBe "CONFIRMED"
}
}
// 4. Verify database state
postgresql {
shouldQuery<OrderRow>(
query = "SELECT * FROM orders WHERE user_id = '$userId'",
mapper = { row -> OrderRow(/* ... */) }
) { orders ->
orders.size shouldBe 1
orders.first().status shouldBe "CONFIRMED"
}
}
// 5. Verify Kafka events
kafka {
shouldBePublished<OrderCreatedEvent> {
actual.userId == userId && actual.productId == productId
}
shouldBePublished<PaymentProcessedEvent> {
actual.amount == amount && actual.success
}
}
// 6. Verify the consumer updated the read model (CQRS)
kafka {
shouldBeConsumed<OrderCreatedEvent> {
actual.userId == userId
}
}
// 7. Test our gRPC server
grpc {
channel<OrderQueryServiceCoroutineStub> {
val order = getOrder(GetOrderRequest.newBuilder().setOrderId(orderId!!).build())
order.found shouldBe true
}
}
// 8. Verify scheduled tasks
tasks {
shouldBeExecuted<OrderEmailPayload> {
this.orderId == orderId && this.userId == userId
}
}
}
}
One test covering eight integration points against real infrastructure.
Setting Up Tracing¶
Tracing takes two configuration steps.
Step 1: Enable in your Stove config¶
Stove()
  .with {
    tracing {
      enableSpanReceiver()
    }
    // ... your other systems (http, kafka, postgresql, etc.)
  }
  .run()
Step 2: Attach the OpenTelemetry agent in your build¶
Copy `StoveTracingConfiguration.kt` to your project's `buildSrc/src/main/kotlin/` directory, then add to your `build.gradle.kts`:
import com.trendyol.stove.gradle.stoveTracing
stoveTracing {
  serviceName = "my-service"
  testTaskNames = listOf("e2eTest") // optional: scope to specific test tasks
}
This handles downloading the OpenTelemetry Java Agent, configuring JVM arguments, attaching the agent to your test tasks, and dynamically assigning ports so parallel test runs don't conflict.
Gradle Plugin available since 0.21.2
Starting with 0.21.2, a standalone Gradle plugin is available that eliminates the need to copy this file. See the 0.21.2 release notes for details.
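If you're curious what the helper actually wires up, it's roughly equivalent to this manual configuration (a sketch: the agent path and exact system properties here are assumptions, and in reality Stove assigns the receiver port dynamically):

// build.gradle.kts -- illustrative manual equivalent, not the helper's actual code
tasks.named<Test>("e2eTest") {
  jvmArgs(
    "-javaagent:$rootDir/otel/opentelemetry-javaagent.jar", // assumed download location
    "-Dotel.service.name=my-service",
    "-Dotel.traces.exporter=otlp",
    "-Dotel.metrics.exporter=none",
    "-Dotel.exporter.otlp.protocol=grpc",
    "-Dotel.exporter.otlp.endpoint=http://localhost:4317" // Stove picks a free port per run
  )
}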
No code changes to your application are needed. The OpenTelemetry agent automatically instruments 100+ libraries (Spring, JDBC, Kafka, gRPC, HTTP clients, Redis, MongoDB, and more). The `@WithSpan` annotations are optional; they add your own method-level spans on top of what the agent already captures.
What Happens When a Test Fails¶
To see this in practice, we ran the spring-showcase with a bug deliberately injected in the persistence layer: a validation that rejects orders over $1000. The test output included the full execution report:
╔═════════════════════════════════════════════════════════════════════════════
STOVE TEST EXECUTION REPORT
Test: The Complete Order Flow - Every Feature in One Test
ID: TheShowcase::The Complete Order Flow - Every Feature in One Test
Status: FAILED
╠═════════════════════════════════════════════════════════════════════════════
TIMELINE
────────
17:27:22.298 ✓ PASSED [gRPC Mock] Register unary stub: FraudDetectionService/CheckFraud
Output: risk_score: 0.15 reason: "low_risk_user"
17:27:22.335 ✓ PASSED [WireMock] Register stub: GET /inventory/macbook-pro-16
Metadata: {statusCode=200}
17:27:22.341 ✓ PASSED [WireMock] Register stub: POST /payments/charge
Metadata: {statusCode=200}
17:27:25.092 ✗ FAILED [HTTP] POST /api/orders
Input: CreateOrderRequest(userId=user-4b9bb522, productId=macbook-pro-16, amount=2499.99)
Output: {"message":"Internal server error","errorCode":"INTERNAL_ERROR"}
Metadata: {status=500}
Expected: Response<OrderResponse> matching expectation
Error: expected:<201> but was:<500>
╠═════════════════════════════════════════════════════════════════════════════
SYSTEM SNAPSHOTS
────────────────
┌─ GRPC MOCK ────────────────────────────
Registered stubs: 1
Received requests: 1
Matched requests: 1
┌─ WIREMOCK ─────────────────────────────
Registered stubs (this test): 2
Served requests (this test): 2 (matched: 2)
┌─ KAFKA ────────────────────────────────
Consumed: 0
Published: 0
Failed: 0
╚═════════════════════════════════════════════════════════════════════════════
The report is structured in two parts. First, a timeline of every test step showing what passed and what failed. Then, a snapshot of each system's state at the moment of failure. You can already read the situation: the gRPC mock matched its request, WireMock served both stubs successfully, but Kafka has zero messages. The application crashed before it could publish any events.
Below the report, the execution trace shows what happened inside the application:
═══════════════════════════════════════════════════════════════
EXECUTION TRACE (Call Chain)
═══════════════════════════════════════════════════════════════
✓ POST /api/orders [250ms]
├── ✓ OrderService.createOrder [245ms]
│ ├── ✓ OrderService.checkFraudViaGrpc [30ms]
│ │ └── ✓ FraudDetectionClient.checkFraud [25ms]
│ ├── ✓ OrderService.checkInventoryViaRest [40ms]
│ │ └── http.url: http://localhost:54648/inventory/macbook-pro-16
│ ├── ✓ OrderService.processPaymentViaRest [35ms]
│ │ └── http.url: http://localhost:54648/payments/charge
│ ├── ✗ OrderService.saveOrderToDatabase [8ms] ◄── FAILURE POINT
│ │ └── ✗ PostgresOrderRepository.save [5ms]
│ │ │ Error: OrderPersistenceException
│ │ │ Message: Failed to persist order: amount exceeds internal threshold
│ │ │ at PostgresOrderRepository.validateOrderAmount(PostgresOrderRepository.kt:102)
│ │ └── db.system: postgresql
The fraud, inventory, and payment steps all passed. The failure happened in `OrderService.saveOrderToDatabase`, specifically in `PostgresOrderRepository.save`, with the exception type, message, and stack trace right there. Without tracing, this would have been a 500 error with no context. With tracing, the root cause is immediately visible.
Automatic Trace Propagation¶
Stove injects trace headers into every outgoing interaction without any test code changes:
- HTTP requests get a `traceparent` header
- Kafka messages get trace headers
- gRPC calls get trace metadata
This is visible in the actual test output. The HTTP request sent by Stove:
REQUEST: http://localhost:8024/api/orders
METHOD: POST
HEADERS:
Accept: application/json
X-Stove-Test-Id: TheShowcase::The Complete Order Flow - Every Feature in One Test
traceparent: 00-475e686523af0b4ee0433f91a69a6b55-81edd5ba7e4dec42-01
And the WireMock request log confirming the propagation reached the downstream call:
Request received:
127.0.0.1 - GET /inventory/macbook-pro-16
traceparent: [00-475e686523af0b4ee0433f91a69a6b55-e3f138ac02509a0b-01]
Same trace ID (`475e686523af0b4ee0433f91a69a6b55`), different span ID. The entire call chain is correlated.
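If the `traceparent` format is unfamiliar: it's four dash-separated fields, `version-traceId-spanId-flags`. A minimal Kotlin parser, purely for illustration (not part of Stove):

// Decompose a W3C traceparent header
data class TraceParent(val version: String, val traceId: String, val spanId: String, val flags: String)

fun parseTraceParent(header: String): TraceParent {
  val (version, traceId, spanId, flags) = header.split("-")
  return TraceParent(version, traceId, spanId, flags)
}

// parseTraceParent("00-475e686523af0b4ee0433f91a69a6b55-81edd5ba7e4dec42-01")
// => traceId = 475e686523af0b4ee0433f91a69a6b55, spanId = 81edd5ba7e4dec42, flags = 01 (sampled)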
Per-Test Trace Isolation¶
A critical detail: every test gets its own trace. Stove generates a unique trace ID at the start of each test and injects it into every outgoing interaction. All spans collected during that test are correlated back to that trace ID and that test alone.
This means traces from concurrent or sequential tests never bleed into each other. When a test fails, the execution trace shows only what happened during that specific test, not spans from a previous test that happened to use the same Kafka topic or a background job triggered by an earlier request.
This is not something you get for free with OpenTelemetry. In production, a trace starts when a request enters the system. In testing, there's no natural entry point. Stove creates one. It manages the W3C trace context lifecycle (start, propagate, end) per test, ties it to the test identity (the `X-Stove-Test-Id` header), and ensures the OTLP receiver maps incoming spans to the correct test. The result is that tracing in Stove is deterministic and test-scoped, not a sampling-based best effort like production tracing.
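Conceptually, the per-test scoping works something like this (a simplified sketch, not Stove's real implementation):

import kotlin.random.Random

// One scope per test: a single trace id for the whole test,
// a fresh span id per outgoing interaction
class TestTraceScope(private val testId: String) {
  val traceId: String = randomHex(16)

  fun outgoingHeaders(): Map<String, String> = mapOf(
    "X-Stove-Test-Id" to testId,                      // ties spans back to the test identity
    "traceparent" to "00-$traceId-${randomHex(8)}-01" // correlates spans to this test's trace
  )

  private fun randomHex(bytes: Int): String =
    List(bytes) { "%02x".format(Random.nextInt(256)) }.joinToString("")
}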
Trace Validation DSL¶
Beyond automatic failure reports, you can actively assert on the execution flow using the `tracing { }` DSL. This is useful when you want to verify how your application handled a request, not just that it produced the right output:
test("order processing should call all expected services") {
stove {
http {
postAndExpectBody<OrderResponse>("/api/orders", request.some()) { response ->
response.status shouldBe 201
}
}
tracing {
// Verify which operations happened
shouldContainSpan("OrderService.createOrder")
shouldContainSpan("OrderService.checkFraudViaGrpc")
shouldContainSpan("OrderService.checkInventoryViaRest")
shouldContainSpan("PostgresOrderRepository.save")
// Verify no operations failed
shouldNotHaveFailedSpans()
// Performance assertions
executionTimeShouldBeLessThan(500.milliseconds)
// Attribute assertions
shouldHaveSpanWithAttribute("db.system", "postgresql")
// Debugging helpers
println(renderTree()) // Print the hierarchical tree
println(renderSummary()) // Print compact summary
}
}
}
The DSL supports:
- Span assertions: `shouldContainSpan()`, `shouldNotContainSpan()`, `shouldContainSpanMatching()`
- Failure assertions: `shouldNotHaveFailedSpans()`, `shouldHaveFailedSpan()`
- Performance assertions: `executionTimeShouldBeLessThan()`, `spanCountShouldBeAtLeast()`
- Attribute assertions: `shouldHaveSpanWithAttribute()`, `shouldHaveSpanWithAttributeContaining()`
- Query methods: `findSpanByName()`, `getFailedSpans()`, `getTotalDuration()`
- Async support: `waitForSpans(expectedCount, timeoutMs)` for async flows (see the sketch below)
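For async flows, a usage sketch combining a few of these (the span name and counts are hypothetical):

tracing {
  // Block until the consumer's spans have been exported before asserting
  waitForSpans(expectedCount = 5, timeoutMs = 5_000)
  shouldContainSpan("OrderEventConsumer.consume") // hypothetical consumer span
  shouldNotHaveFailedSpans()
}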
How It Works¶
sequenceDiagram
participant Test as Stove Test
participant App as Application
participant OTel as OTel Agent
participant Receiver as OTLP Receiver
participant Report as Report Builder
Test->>App: HTTP POST with traceparent
OTel->>OTel: Instrument libraries
App->>App: Process request
OTel->>Receiver: Export spans via OTLP gRPC
Receiver->>Receiver: Correlate spans by trace ID
alt Test passes
Test->>Test: Traces available via DSL
else Test fails
Report->>Receiver: Query spans for this test
Report->>Report: Build report + trace tree
Report->>Test: Display combined report
end
The architecture:
- OpenTelemetry Java Agent attaches to your application process (configured via Gradle) and instruments 100+ libraries without code changes
- Stove starts an OTLP gRPC receiver on a dynamically assigned port that collects spans exported by the agent
- W3C `traceparent` headers are injected into every HTTP, Kafka, and gRPC interaction, correlating all spans back to the originating test
- On test failure, the report builder queries the collected spans, builds a hierarchical tree, and renders it alongside the execution report
- Ports are dynamically assigned so parallel test runs on CI don't conflict
Worth noting: the OTel agent does add some startup overhead to the test JVM (a few seconds). For most e2e test suites that spin up Testcontainers, this is negligible relative to container startup time. If it matters, tracing can be toggled off with `enabled = false` in the Gradle config:
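stoveTracing {
  serviceName = "my-service"
  enabled = false // skip agent download and attachment for this run
}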
Practical Advice¶
- Enable tracing by default. The overhead is minimal compared to container startup, and the diagnostic value on failure is significant.
- Use `tracing { }` sparingly. The automatic failure reports cover most debugging needs. Reserve the DSL for cases where you want to assert on the execution flow itself, for example verifying that a cache was hit instead of the database.
- Start with `shouldNotHaveFailedSpans()`. It's the simplest assertion that catches unexpected errors anywhere in the call chain.
- Filter noisy instrumentations. Some libraries generate a lot of spans. Tune with `disabledInstrumentations`:
stoveTracing {
  serviceName = "my-service"
  disabledInstrumentations = listOf("jdbc", "hibernate", "spring-scheduling")
}
Getting Started¶
Add the dependencies:
dependencies {
  testImplementation(platform("com.trendyol:stove-bom:0.21.0"))
  testImplementation("com.trendyol:stove")
  testImplementation("com.trendyol:stove-spring") // or stove-ktor, stove-micronaut
  testImplementation("com.trendyol:stove-tracing")
  testImplementation("com.trendyol:stove-extensions-kotest") // or stove-extensions-junit
  // Add components as needed: stove-postgres, stove-kafka, stove-http, etc.
}
Enable tracing in two steps:
// build.gradle.kts
import com.trendyol.stove.gradle.stoveTracing

stoveTracing {
  serviceName = "my-service"
}

// Stove config
tracing {
  enableSpanReceiver()
}
For a complete working example, see the spring-showcase recipe. It demonstrates all Stove features together (HTTP, gRPC, Kafka, PostgreSQL, WireMock, db-scheduler, and tracing) in a realistic Spring Boot application.
Links: