Behaviour-Driven Development (BDD) is a software development process that
evolved from Test-Driven Development (TDD). Although testing remains integral,
the primary focus is on collaboration and communication among developers, QA
teams, business analysts, and product owners. This guide walks through
implementing BDD in Rust with the modern cucumber testing framework, covering
practical techniques, best practices, and lessons from real-world projects.
At its core, BDD is not merely a testing technique but a methodology for building a shared understanding of a system's behaviour.1 The central goal is to create a ubiquitous language that both technical and non-technical stakeholders can use to describe and agree upon software requirements.2 This process is centred on conversation; the discussions about how a feature should behave are the most valuable output of BDD.3
The tangible artefact of these conversations is a set of specifications written in a structured, natural language format. These specifications serve a dual purpose: they are human-readable documentation of the system's features, and they are also executable tests that verify the system's behaviour. This approach ensures that documentation and implementation cannot drift apart, creating a suite of "living documentation."
The value of BDD is realized before a single line of implementation code is written. When a development team writes behaviour specifications in isolation, they are simply using a different syntax for their tests. The transformative potential of BDD is unlocked only when these specifications are co-created and validated through a collaborative process involving all team members. This ensures that what is built is precisely what the business needs, reducing ambiguity and rework.
To facilitate this process, BDD frameworks like Cucumber use a specific Domain-
Specific Language (DSL) called Gherkin.4 Gherkin provides a simple,
structured grammar for writing executable specifications in plain text files
with a .feature extension.5 Its syntax is designed to be intuitive and
accessible, enabling clear communication across different project roles.3
A Gherkin document is line-oriented, with most lines beginning with a specific keyword. The primary keywords give structure and meaning to the specifications.6
| Keyword | Purpose | Simple Example |
|---|---|---|
| Feature | Provides a high-level description of a software feature and groups related scenarios.3 | Feature: User Authentication |
| Scenario | Describes a single, concrete example of the feature's behaviour.3 | Scenario: Successful login with valid credentials |
| Given | Sets the initial context or preconditions for a scenario.4 | Given the user is on the login page |
| When | Describes the key action or event that triggers the behaviour being tested.1 | When the user enters their username and password |
| Then | Specifies the expected outcome or result of the action.7 | Then the user should be redirected to the dashboard |
| And, But | Used to add more steps to a Given, When, or Then clause without repetition, improving readability.3 | And the user's name should be displayed |
| Background | Defines a set of steps that run before every Scenario in a Feature, used for common setup.5 | Background: Given a registered user "Alice" exists |
| Scenario Outline | A template for running the same Scenario multiple times with different data sets.3 | Scenario Outline: Login with various credentials |
| Examples | A data table that provides the values for a Scenario Outline.3 | username | password | outcome |
For developers, the Given-When-Then structure is not an entirely new concept.
It is a highly effective reformulation of well-established testing patterns
that many are already familiar with from unit testing.4 The most common
parallel is the Arrange-Act-Assert (AAA) pattern, conceptualized by Bill
Wake.
-
Given corresponds to Arrange: This phase sets up the world. It establishes all preconditions, initializes objects, and brings the system under test (SUT) to the specific state required for the test. In Gherkin, this is where the team describes the context before the behaviour begins.4
-
When corresponds to Act: This is the single, pivotal action performed on the SUT. It's the event or trigger whose consequences are being specified. This phase should ideally contain only one primary action.4
-
Then corresponds to Assert: This phase verifies the outcome. After the action in the
Whenstep, theThensteps check that the SUT's state has changed as expected. These steps should contain the assertions and should be free of side effects.4
This connection demystifies BDD. It is not an alien methodology but a structured, collaborative application of a pattern developers already use. The power of Gherkin lies in making the Arrange-Act-Assert pattern legible and verifiable by non-programmers, thereby turning a simple test into a piece of shared, executable documentation.
Setting up a Rust project to use the cucumber crate requires a few specific
configurations in Cargo.toml and a well-defined directory structure. This
section walks through creating a minimal, runnable test suite from scratch.
To begin, the necessary dependencies must be added and a custom test runner
configured. The cucumber crate is async-native and requires an async runtime
to execute tests; tokio is the most common choice and is used throughout the
official documentation.8
The key configuration step is defining a [[test]] target in Cargo.toml.
This tells Cargo to build a specific test executable. Setting harness = false
is crucial; it disables Rust's default test harness, allowing the cucumber
runner to take control of the process and print its own formatted output to the
console.9
| Section | Key | Value / Description |
|---|---|---|
| [dependencies] | tokio | The async runtime. Required with features like macros and rt-multi-thread.9 |
| [dev-dependencies] | cucumber | The main testing framework crate.10 |
| [dev-dependencies] | futures | Often needed for async operations, particularly with older examples or for specific combinators.11 |
| [[test]] | name | The name of the test-runner file (e.g., "cucumber"). This must match the filename in tests/. |
| [[test]] | harness | Must be set to false so cucumber can manage test execution and output.12 |
Here is a complete Cargo.toml configuration snippet:
[package]
name = "rust-cucumber-guide"
version = "0.1.0"
edition = "2021"
# Your application's dependencies go here
[dependencies]
[dev-dependencies]
cucumber = "0.20"
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
[[test]]
name = "cucumber" # Corresponds to `tests/cucumber.rs`
harness = falseA well-organized project structure is vital for maintainable BDD tests. The standard convention separates the human-readable feature specifications from the Rust implementation code.11
.
├── Cargo.lock
├── Cargo.toml
├── src/
│ └── lib.rs # Your application code
└── tests/
├── cucumber.rs # The main test runner
├── features/ # Directory for Gherkin files
│ └── calculator.feature
└── steps/ # Directory for step definition modules
└── calculator_steps.rs
This structure physically embodies the BDD philosophy of separating concerns.
The .feature files in tests/features/ define what the system should do.
These can be read, written, and reviewed by non-technical stakeholders. The
Rust files in tests/steps/ define how those behaviours are tested. This
clear boundary is a cornerstone of effective BDD practice and is strongly
recommended.12
The World is the most critical concept in cucumber-rs. It is a user-defined
struct that encapsulates all the shared state for a single test scenario.10
Each time a scenario begins, a new instance of the World is created. This
instance is then passed mutably to each step (Given, When, Then) within
that scenario.11
This design provides a powerful mechanism for test isolation. Because each
scenario gets its private World instance, there is no risk of state leaking
from one test to another, even when tests are run concurrently.13 This is a
significant advantage of the Rust implementation, leveraging the language's
ownership model to solve a common and difficult problem in test automation.14
To create a World, define a struct and derive cucumber::World. It is also
conventional to derive Debug and Default.8
Worked Example: For a simple calculator application, the World might look
like this:
// In a shared location, e.g., tests/cucumber.rs
use cucumber::World;
// Assuming `my_crate::Calculator` is defined in your `src` directory.
#
pub struct CalculatorWorld {
pub calculator: my_crate::Calculator,
pub last_result: Option<i32>,
}By default, cucumber will instantiate the World using Default::default().
If a World requires more complex initialization (for example, starting a mock
server or connecting to a test database), provide a custom constructor function
using the #[world(init = ...)] attribute.13
With the harness = false setting in Cargo.toml, supply a custom main
function in the test target file (for example, tests/cucumber.rs). This
function acts as the entry point for the test suite.
Because cucumber-rs is async, the main function must be an async fn and
is typically annotated with #[tokio::main].9 The core of the function is
a single line that invokes the test runner:
YourWorld::run("path/to/features").await.10
Worked Example:
// In tests/cucumber.rs
use cucumber::World;
// Define your application's types or import them.
// For this example, we assume a simple Calculator struct exists.
pub mod my_crate {
#
pub struct Calculator {
value: i32,
}
impl Calculator {
pub fn add(&mut self, num: i32) {
self.value += num;
}
pub fn set(&mut self, num: i32) {
self.value = num;
}
pub fn get_value(&self) -> i32 {
self.value
}
}
}
// Define the World for our tests
#
pub struct CalculatorWorld {
pub calculator: my_crate::Calculator,
}
// Import the step definitions
mod steps;
#[tokio::main]
async fn main() {
CalculatorWorld::run("tests/features").await;
}At this point, there is a complete, albeit empty, test suite. Running
cargo test --test cucumber will compile the runner, which will then discover
.feature files in tests/features, find no matching steps, and report them
as undefined.
Step definitions are the "glue" that connects the human-readable Gherkin steps
in .feature files to executable Rust code. The cucumber crate provides
procedural macros to make this connection seamless and type-safe.
The core of step definition is a set of attribute macros: #[given],
#[when], and #[then].8 You apply these macros to Rust functions. When
the test runner encounters a Gherkin step, it looks for a function annotated
with the corresponding macro and a matching text pattern.
Each step definition function must accept a mutable reference to the World
struct as its first argument (for example, world: &mut CalculatorWorld).11
This affords the function the ability to modify the shared state for the
current scenario.
A key design choice in cucumber-rs is the strict separation of these step
types. A function marked with #[then] cannot be used to satisfy a Given
step in a feature file.13 This is a deliberate feature, not a limitation. It
encourages developers to maintain the clean Arrange-Act-Assert structure by
preventing them from accidentally using assertion logic during setup, or
performing actions during verification. This discipline leads to more readable,
robust, and maintainable tests.
Worked Example:
// In tests/steps/calculator_steps.rs
use crate::CalculatorWorld; // Import the World from the test runner
use cucumber::{given, when, then};
#[given("a calculator")]
fn a_calculator(world: &mut CalculatorWorld) {
// The world is already initialized with a default Calculator
// via `CalculatorWorld::default()`, so this step can be empty.
// It exists to make the Gherkin readable.
}
#[when(expr = "I input {int}")]
fn input_number(world: &mut CalculatorWorld, number: i32) {
world.calculator.set(number);
}
#[when(expr = "I add {int}")]
fn add_number(world: &mut CalculatorWorld, number: i32) {
world.calculator.add(number);
}
#[then(expr = "the result should be {int}")]
fn check_result(world: &mut CalculatorWorld, expected: i32) {
assert_eq!(world.calculator.get_value(), expected);
}To make steps dynamic, captured fragments of the Gherkin text must be passed as
arguments to the corresponding Rust functions. cucumber-rs supports two
mechanisms for this: regular expressions and Cucumber Expressions.10
-
Cucumber Expressions (
expr = "..."): This is the recommended default. They are less powerful than regex but are more readable and explicitly designed for this purpose. They provide built-in parsing for common types like{int},{float},{word}, and{string}(in quotes).10 The framework automatically handles parsing the captured string into the corresponding Rust type in your function signature. -
Regular Expressions (
regex = "..."): For more complex matching needs, full regex syntax can be used. Capture groups(...)in the regex correspond to function arguments.11 The framework will still attempt to parse the captured&strinto the function's argument type. It is a best practice to anchor the regex with^and$to ensure the entire step text is matched, preventing partial or ambiguous matches.11.
| Feature | Cucumber Expression Example | Regex Example | Recommendation |
|---|---|---|---|
| Basic Capture | expr = "I have {int} cucumbers" | regex = r"^I have (\d+) cucumbers$" | Use expressions for clarity. |
| Type Conversion | {int} automatically maps to i32, u64, etc. | Capture group (\d+) is a &str, parsed to the function's numeric type. | Expressions are more direct and less error-prone. |
| Readability | High. The intent is clear from the expression itself. | Medium to Low. Regex can become complex and hard to read. | Expressions are superior for collaboration. |
| Flexibility | Limited to its defined syntax (e.g., cannot match complex patterns). | High. Can match almost any text pattern. | Use Regex as a power tool when expressions are insufficient. |
The Then steps are where you verify the system's state. The most
straightforward way to do this is with Rust's standard assertion macros, like
assert_eq! or assert!.10 If an assertion fails, the thread will panic,
and cucumber will mark the step as failed.
However, a more idiomatic and powerful approach is to have your step functions
return a Result.13 A step that returns
Ok(()) passes, while one that returns an Err(...) fails. This has two major
benefits:
-
Cleaner Code: Using the
?operator propagates errors from the application logic or from parsing steps, leading to more concise and readable code. -
Richer Failure Messages: A panic from an
assert!often gives a limited error message. Returning a custom error type that implementsstd::error::Errorprovides detailed, contextual information about why the test failed. This is invaluable for debugging.
Rust's error handling philosophy is built around the Result enum for
recoverable errors, and a test failure is a recoverable error from the test
runner's perspective.15 Embracing this pattern in your step definitions is a
significant best practice.
Worked Example (using Result):
// In a custom error module
# // Using the `thiserror` crate for convenience
pub enum TestError {
#[error("API call failed: {0}")]
ApiError(#[from] reqwest::Error),
#[error("Unexpected status code: expected {expected}, got {actual}")]
UnexpectedStatus { expected: u16, actual: u16 },
#
ParseError(#[from] serde_json::Error),
}
// In a `then` step
#[then(expr = "the response status should be {int}")]
fn check_status(world: &mut ApiWorld, expected_status: u16) -> Result<(), TestError> {
let response = world.last_response.as_ref().expect("No response found");
let actual_status = response.status().as_u16();
if actual_status!= expected_status {
Err(TestError::UnexpectedStatus {
expected: expected_status,
actual: actual_status,
})
} else {
Ok(())
}
}This approach provides a clear, structured error that is much more informative than a simple assertion failure.
As test suites grow in complexity, more advanced Gherkin features become necessary to keep them maintainable and expressive. This section covers techniques for data-driven testing, handling complex inputs, and managing asynchronous operations.
Often, the same behaviour must be tested with various inputs and expected
outputs. Writing a separate Scenario for each case would be highly
repetitive. Gherkin solves this with the Scenario Outline keyword.3
A Scenario Outline acts as a template. You write the steps using placeholders
enclosed in angle brackets, like <input> or <output>. Below the outline,
you provide an Examples table. Each row in this table represents a concrete
run of the scenario, with the column headers matching the placeholders in the
steps.16
Worked Example:
# In tests/features/calculator.feature
Feature: Basic arithmetic
As a user
I want to perform calculations
So I can avoid doing math in my head
Scenario Outline: Adding various numbers
Given a calculator
When I input <num1>
And I add <num2>
Then the result should be <total>
Examples:
| num1 | num2 | total |
| 2 | 3 | 5 |
| -5 | 5 | 0 |
| 100 | 0 | 100 |
| 0 | -20 | -20 |This single Scenario Outline will generate and run four separate tests, each
with its own World instance, providing excellent test coverage with minimal
boilerplate.
Sometimes, a step requires a more complex data structure than can be passed with simple arguments. For example, setting up an initial inventory or providing a list of users. For this, Gherkin provides Data Tables.17
A Data Table is a pipe-delimited table placed directly after a Gherkin step. To
access this table in a Rust step definition, add a
step: &cucumber::gherkin::Step argument to the function. The table can then
be accessed via step.table (which is an Option<Table>).17
Data tables encourage a more declarative style of testing. Instead of writing a series of imperative steps to build up a state (e.g., "Given I add a user 'Alice'", "And I set her role to 'Admin'"), the entire state can be described in a single, readable table.18.
This makes the
Given steps more concise and focused on the initial context.
Worked Example:
# In tests/features/inventory.feature
Scenario: Checking stock levels
Given the following items are in the warehouse:
| name | quantity | location |
| Widgets | 100 | Aisle 1 |
| Gadgets | 50 | Aisle 3 |
When a customer orders 20 "Widgets"
Then the stock level for "Widgets" should be 80// In tests/steps/inventory_steps.rs
use cucumber::{gherkin::Step, given};
use crate::InventoryWorld;
use std::collections::HashMap;
#[given("the following items are in the warehouse:")]
fn given_items_in_warehouse(world: &mut InventoryWorld, step: &Step) {
let table = step.table.as_ref().expect("Step should have a data table");
// The first row is the header.
let header = &table.rows;
let name_idx = header.iter().position(|h| h == "name").unwrap();
let quantity_idx = header.iter().position(|h| h == "quantity").unwrap();
// Iterate over the data rows.
for row in table.rows.iter().skip(1) {
let name = row[name_idx].clone();
let quantity: u32 = row[quantity_idx].parse().unwrap();
world.stock.insert(name, quantity);
}
}If every scenario in a .feature file shares the same set of initial Given
steps, you can use the Background keyword to reduce duplication.5 The
steps listed under
Background will be executed before each Scenario in that feature
file.19
Worked Example:
Feature: User account management
Background:
Given the user is logged in as "Alice"
And the user navigates to the "Profile" page
Scenario: Updating user's email
When the user changes their email to "alice.new@example.com"
Then a confirmation email should be sent to "alice.new@example.com"
Scenario: Changing user's password
When the user updates their password
Then the user should be logged outPitfall Warning: Use Background with caution. If it becomes too long or
is not relevant to every single scenario, it can make the tests harder to
understand by hiding essential context. If only some scenarios share setup, it
is better to create a dedicated Given step and repeat it.14
Modern Rust applications, especially those involving networking, databases, or
file I/O, are heavily asynchronous. The cucumber-rs crate is designed with
this in mind, making it an excellent choice for integration and end-to-end
(E2E) testing.
Step definition functions can be declared as async fn.8 Inside these
functions, any Future – such as a database query or HTTP request – can be
.await-ed. This requires that your test runner’s main function is powered
by an async runtime like tokio.9
The async-first design of cucumber-rs is one of its most powerful features.
It allows for writing tests that accurately reflect the asynchronous nature of
the application under test. Furthermore, because cucumber can run scenarios
concurrently by default, I/O-bound tests can execute in parallel, dramatically
reducing the total runtime of the test suite compared with traditional
synchronous, serial test runners.13 This makes it feasible to run
comprehensive integration test suites as part of your regular development
workflow.
Worked Example (Async Step):
// In tests/steps/api_steps.rs
use crate::ApiWorld;
use cucumber::when;
use std::time::Duration;
#[when(expr = "the client requests the user profile for {word}")]
async fn request_user_profile(world: &mut ApiWorld, username: String) {
// Simulate a network request
tokio::time::sleep(Duration::from_millis(50)).await;
let client = &world.http_client;
let url = format!("{}/users/{}", world.server_address, username);
let response = client.get(&url).send().await.unwrap();
world.last_response = Some(response);
}This section synthesizes the concepts from previous parts into a complete,
practical example: testing a simple key-value store REST API. This demonstrates
a realistic use case for cucumber-rs as an integration testing tool,
leveraging reqwest for HTTP requests and wiremock-rs for creating an
isolated, in-process mock server.
First, the desired behaviour is defined in a Gherkin .feature file. This file
serves as the executable specification for the API.
# In tests/features/kv_store.feature
Feature: Key-Value Store API
As a client of the service
I want to store and retrieve values by key
So that I can persist simple data
Scenario: Successfully storing and retrieving a value
Given the key "hello" does not exist in the store
When a client POSTs the value "world" to "/kv/hello"
Then the response status should be 201
And when the client GETs "/kv/hello"
Then the response status should be 200
And the response body should be "world"
Scenario: Retrieving a non-existent key
Given the key "goodbye" does not exist in the store
When a client GETs "/kv/goodbye"
Then the response status should be 404The World for this test suite needs to manage the state of the HTTP client
and the mock server. It will also store the last API response so that Then
steps can perform assertions on it.
A crucial aspect of this design is that the mock server is part of the World.
This means each scenario gets its own, completely isolated mock server instance
running on a random port. This is the key to enabling fast, reliable, and
parallelizable integration tests.13
// In tests/cucumber.rs
use cucumber::World;
use reqwest::Response;
use wiremock::MockServer;
pub struct ApiWorld {
pub server: MockServer,
pub client: reqwest::Client,
pub last_response: Option<Response>,
}
impl ApiWorld {
async fn new() -> Self {
Self {
server: MockServer::start().await,
client: reqwest::Client::new(),
last_response: None,
}
}
}
mod steps;
#[tokio::main]
async fn main() {
ApiWorld::run("tests/features/kv_store.feature").await;
}Note the use of # and the async fn new() implementation. This is necessary
because starting the MockServer is an async operation and cannot be done in a
Default::default() implementation.13
wiremock-rs is a pure-Rust library for mocking HTTP-based APIs.20
Expectations can be defined (for example, "expect a GET request to /foo") and
specify responses. This is done in the Given steps to set up the state of the
external world before the When action occurs.
Using an in-process mock server like wiremock-rs is a superior pattern for
integration testing. It avoids the complexity and slowness of managing external
services or Docker containers, leading to faster and more reliable test
execution.20
The step definitions will use the server from the ApiWorld to set up mocks
and the client to make requests.
// In tests/steps/api_steps.rs
use crate::ApiWorld;
use cucumber::{given, when, then};
use wiremock::{Mock, ResponseTemplate};
use wiremock::matchers::{method, path};
#[given(expr = "the key {string} does not exist in the store")]
async fn key_does_not_exist(world: &mut ApiWorld, key: String) {
// For a GET, we mock a 404 response.
Mock::given(method("GET"))
.and(path(format!("/kv/{}", key)))
.respond_with(ResponseTemplate::new(404))
.mount(&world.server)
.await;
// For a POST, we mock a 201 Created response.
// This mock will also store the value so a subsequent GET can find it.
Mock::given(method("POST"))
.and(path(format!("/kv/{}", key)))
.respond_with(|req_body: &bytes::Bytes| {
// Mock the successful retrieval after creation
Mock::given(method("GET"))
.and(path(format!("/kv/{}", key)))
.respond_with(ResponseTemplate::new(200).set_body_bytes(req_body.clone()))
.mount(&world.server); // Note: cannot await here, but mount is sync
ResponseTemplate::new(201)
})
.mount(&world.server)
.await;
}
#
async fn client_posts_value(world: &mut ApiWorld, value: String, path: String) {
let url = format!("{}{}", world.server.uri(), path);
let response = world.client.post(&url).body(value).send().await.unwrap();
world.last_response = Some(response);
}
#
async fn client_gets_path(world: &mut ApiWorld, path: String) {
let url = format!("{}{}", world.server.uri(), path);
let response = world.client.get(&url).send().await.unwrap();
world.last_response = Some(response);
}
#[then(expr = "the response status should be {int}")]
async fn check_response_status(world: &mut ApiWorld, expected_status: u16) {
let response = world.last_response.as_ref().expect("No response stored");
assert_eq!(response.status().as_u16(), expected_status);
}
#[then(expr = "the response body should be {string}")]
async fn check_response_body(world: &mut ApiWorld, expected_body: String) {
let response = world.last_response.take().expect("No response stored");
let body = response.text().await.unwrap();
assert_eq!(body, expected_body);
}This complete example demonstrates the full BDD cycle: defining behaviour,
setting up an isolated test environment in the World, mocking dependencies,
executing actions, and asserting outcomes, all within Rust's powerful async
ecosystem.
As a project grows, so does its test suite. Adhering to best practices is essential to ensure that your Cucumber tests remain a valuable asset rather than a maintenance burden.
A fundamental principle for writing clean Gherkin is that each scenario should test exactly one behaviour.5 A common anti-pattern is to chain multiple actions and outcomes within a single scenario, often indicated by multiple
When-Then pairs.
Anti-Pattern:
Scenario: User manages their cart
Given a user has an item in their cart
When the user increases the item quantity
Then the cart total should update
And when the user applies a discount code
Then the final price should be lowerThis scenario is testing two distinct behaviours: updating quantity and
applying a discount. If the second Then fails, it's unclear if the issue is
with the discount logic or if the state from the first action was incorrect.
Best Practice: Split this into two focused scenarios.
Scenario: Updating item quantity updates cart total
Given a user has an item in their cart
When the user increases the item quantity
Then the cart total should update
Scenario: Applying a valid discount code reduces the final price
Given a user has an item in their cart
When the user applies a valid discount code
Then the final price should be lowerThis approach isolates failures, improves clarity, and makes each scenario an independent specification of a single rule.5
The most maintainable test suites favor a declarative style over an imperative one.
-
Imperative steps describe how an action is performed, often coupling the test to specific UI elements or implementation details (e.g., "When I click the 'submit-button'"). This makes tests brittle; a small UI change can break many tests.18
-
Declarative steps describe what the user is trying to achieve, focusing on intent and behaviour (e.g., "When I submit my registration").
The collective set of step definitions should evolve into a project-specific Domain-Specific Language (DSL).3 A step like
When I register my account is declarative. Internally, its Rust
implementation might perform several imperative actions (fill form fields,
click a button, wait for an API response), but these details are abstracted
away from the .feature file. This abstraction is the key to creating a robust
and maintainable test suite that communicates business value.
In large projects, the World struct can become a "god object," accumulating
dozens of fields and becoming difficult to manage. To avoid this, use
composition.
Best Practice: Instead of a monolithic World, structure it with smaller,
focused context structs.
// Less maintainable
pub struct MonolithicWorld {
user_id: Option<u32>,
auth_token: Option<String>,
api_client: ApiClient,
db_connection: DbPool,
last_api_response: Option<ApiResponse>,
//… and 20 more fields
}
// More maintainable
pub struct AuthContext {
user_id: Option<u32>,
auth_token: Option<String>,
}
pub struct ComposedWorld {
pub auth: AuthContext,
pub api_client: ApiClient,
pub db_pool: DbPool,
pub last_response: Option<ApiResponse>,
}This approach organizes state logically and makes the World easier to reason
about. For complex setup, always prefer a custom constructor with
#[world(init =...)] over trying to force everything into Default.13
Test code should be organized in the same way as application code.
-
Feature Files: Group
.featurefiles by application capability or user story.19 For example,tests/features/authentication/,tests/features/product_catalog/, etc. -
Step Definitions: Mirror the feature file structure in your
tests/steps/directory. Create a Rust module for each feature area (e.g.,tests/steps/ authentication_steps.rs,tests/steps/catalog_steps.rs). This prevents having a single, massive step definition file and makes it easier to find the code corresponding to a Gherkin step.
Even with best practices, developers can encounter common issues when implementing BDD. Recognizing these pitfalls is the first step to avoiding them.
Pitfall: Sharing state between scenarios using static variables, global
state, or external files. This is a primary cause of flaky, non-deterministic
tests, especially because cucumber-rs runs scenarios concurrently by
default.13
Solution: The World object is the only sanctioned place for scenario
state. Treat each scenario as if it could be running at the same time as any
other. If you must interact with a shared, singular resource (like a physical
hardware device), you must tag the relevant scenarios with @serial. This
forces them to run one at a time.13 However, overuse of @serial is often a
sign of a poor test design and negates the performance benefits of concurrency.
This tag should be used sparingly.
Pitfall: Tests that fail intermittently, often due to timing issues or race conditions in asynchronous code.21 A common mistake is using fixed delays (
tokio::time::sleep) to "wait" for an operation to complete.
Solution:
-
Avoid Arbitrary Sleeps: Never use fixed delays to wait for an event. The correct duration is impossible to guess and will lead to either slow tests or flaky tests.
-
Use Deterministic Mocks: When possible, use tools like
wiremock-rs. The interactions are deterministic and immediate, eliminating timing issues related to network latency.20 -
Implement Explicit Synchronization: When testing against real systems, use mechanisms like polling with a timeout, waiting for a specific log message, or checking a database flag to know when an operation is complete.
-
Use Built-in Retries: For tests that are inherently prone to transient failures (e.g., E2E tests over a real network), use the
cucumberrunner's retry mechanism (--retry <count>) to automatically re-run failed scenarios.22
Pitfall: Littering step definitions with .unwrap() and .expect(). When
these panic, the resulting error message is often generic and lacks the context
needed to quickly diagnose the problem.15 For example, a panic on
world.last_response.as_ref().unwrap() does not indicate which API call failed
to produce a response.
Solution: As discussed in section 3.3, step functions should return a
Result. Define custom, descriptive error types using crates like thiserror
or anyhow to wrap underlying errors and add context. A well-defined Err
variant is far more valuable for debugging than a stack trace from a panic.13
Pitfall: The test run fails with an "ambiguous step" error. This means a single Gherkin step matches the patterns of two or more Rust functions.14
Solution:
-
Be More Specific: Make the Gherkin step text or the matching pattern more precise to eliminate the ambiguity.
-
Anchor Regex: When using regular expressions, always anchor them with
^at the start and$at the end (e.g.,regex = r"^the user is logged in$"). This prevents a step like"the admin user is logged in"from accidentally matching a less specific pattern likeregex = r"user is logged in".11
BDD is most effective when it is an integral part of the daily development workflow and the automated CI/CD pipeline.
Running the entire test suite can be slow. The cucumber test runner supports
a rich set of command-line arguments that allow you to run a targeted subset of
scenarios. These arguments are passed to your test executable after a --
separator: cargo test --test cucumber --.23
| Flag | Purpose | Example Usage |
|---|---|---|
-t, --tags <EXPR> |
Filter scenarios by a tag expression. Supports and, or, and not. | cargo test --test cucumber -- -t "@wip and not @slow" |
-n, --name <REGEX> |
Filter scenarios by a regular expression matching their name. | cargo test --test cucumber -- -n "login" |
| --fail-fast | Stop the test run on the first failure. | cargo test --test cucumber -- --fail-fast |
-c, --concurrency <N> |
Limit the number of scenarios running concurrently. | cargo test --test cucumber -- -c 1 (for serial execution) |
--retry <N> |
Retry failed scenarios up to N times. | cargo test --test cucumber -- --retry 2 |
These flags are essential for developer productivity, enabling rapid feedback by running only the tests relevant to the current work.
The ultimate goal of BDD is to have a suite of executable specifications that continuously validate the system's behaviour. Integrating Cucumber tests into a CI/CD pipeline is what brings this "living documentation" to life.24
The process involves two main steps:
-
Run the tests: The CI job executes
cargo test --test cucumber. Thecucumberrunner will exit with a non-zero status code if any scenario fails, which automatically fails the CI build.24 -
Publish reports: Many CI platforms can parse and display test results in a structured format. The
cucumbercrate supports generating JUnit XML reports via theoutput-junitfeature flag.10 These XML files can then be published as test artifacts for platforms like GitHub Actions, GitLab CI, or Jenkins to consume.24
This CI integration closes the BDD loop. The .feature files, once checked
into version control, are no longer static documents. They become active
participants in the build process. A CI failure on a Cucumber test provides
immediate, unambiguous feedback that the implementation has diverged from the
agreed- upon behaviour, prompting a conversation to either fix the code or
update the specification.
Worked Example (GitHub Actions):
# .github/workflows/ci.yml
name: Rust CI
on:
push:
branches: [ "main" ]
pull_request:
branches: [ "main" ]
jobs:
test:
name: Run tests
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v4
- name: Install Rust toolchain
uses: actions-rs/toolchain@v1
with:
toolchain: stable
components: clippy, rustfmt
- name: Run linter
run: cargo clippy -- -D warnings
- name: Check formatting
run: cargo fmt -- --check
- name: Run Cucumber tests and generate JUnit report
run: cargo test --test cucumber -- --format junit --out-dir target/junit
- name: Publish Test Report
uses: actions/upload-artifact@v4
if: always() # Upload report even if the test step fails
with:
name: junit-test-report
path: target/junit/*.xmlThis workflow demonstrates a standard CI setup for a Rust project, including linting, formatting, and running the Cucumber tests. The final step ensures that the test results are always available for inspection, providing a clear and continuous record of the application's behavioural health.25
Behaviour-Driven Development with the cucumber crate offers a powerful
paradigm for building robust, well-documented, and correct Rust applications.
By shifting the focus from "testing" to "specifying behaviour," BDD fosters
collaboration and creates a shared language that bridges the gap between
business and technical teams.
For Rust developers, the cucumber-rs ecosystem provides a modern, idiomatic,
and high-performance toolset. Its async-first design, combined with the safety
of the World-based state management, makes it uniquely suited for writing the
comprehensive integration and E2E tests required by today's complex, I/O-bound
systems. By mastering the Gherkin syntax, embracing best practices for scenario
and step definition, and integrating these executable specifications into an
automated CI/CD pipeline, development teams can build higher-quality software
with greater confidence and clarity, ensuring that what they build is always
aligned with what is needed.
Footnotes
-
"Given When Then" Framework: a step-by-step guide with examples — Miro, accessed on 14 July 2025, https://miro.com/agile/given-when-then-framework/ ↩ ↩2
-
Is it acceptable to write a "Given When Then When Then" test in Gherkin? — Stack Overflow, accessed on 14 July 2025, https://stackoverflow.com/questions/12060011/is-it-acceptable-to-write-a-given-when-then-when-then-test-in-gherkin ↩
-
Gherkin in Testing: A Beginner's Guide — Rafał Buczyński, Medium, accessed on 14 July 2025, https://medium.com/@buczynski.rafal/gherkin-in-testing-a-beginners-guide-f2e179d5e2df ↩ ↩2 ↩3 ↩4 ↩5 ↩6 ↩7 ↩8 ↩9
-
Given When Then — Martin Fowler, accessed on 14 July 2025, https://martinfowler.com/bliki/GivenWhenThen.html ↩ ↩2 ↩3 ↩4 ↩5 ↩6
-
How To Start Writing Gherkin Test Scenarios? - Selleo.com, accessed on 14 July 2025, https://selleo.com/blog/how-to-start-writing-gherkin-test-scenarios ↩ ↩2 ↩3 ↩4 ↩5
-
Reference — Cucumber, accessed on 14 July 2025, https://cucumber.io/docs/gherkin/reference/ ↩
-
Given-When-Then - Wikipedia, accessed on 14 July 2025, https://en.wikipedia.org/wiki/Given-When-Then ↩
-
Cucumber Rust Book — Introduction, accessed on 14 July 2025, https://cucumber-rs.github.io/cucumber/main/ ↩ ↩2 ↩3 ↩4
-
Rust BDD tests with Cucumber - DEV Community, accessed on 14 July 2025 https://dev.to/rogertorres/rust-bdd-with-cucumber-4p68 ↩ ↩2 ↩3 ↩4
-
cucumber - Rust - Docs.rs, accessed on 14 July 2025, https://docs.rs/cucumber ↩ ↩2 ↩3 ↩4 ↩5 ↩6 ↩7
-
Quickstart — Cucumber Rust Book, accessed on 14 July 2025, https://cucumber-rs.github.io/cucumber/current/quickstart.html
on 14 July 2025, https://www.florianreinhard.de/cucumber-in-rust-beginners-tutorial/ ↩ ↩2 ↩3 ↩4 ↩5 ↩6 ↩7
-
Cucumber-rs — fully-native Cucumber testing framework for Rust with no external test runners or dependencies. GitHub, accessed on 14 July 2025, https://github.com/AidaPaul/cucumber-rust ↩ ↩2
-
Quickstart - Cucumber Rust Book, accessed on 14 July 2025, https://cucumber-rs.github.io/cucumber/main/quickstart.html ↩ ↩2 ↩3 ↩4 ↩5 ↩6 ↩7 ↩8 ↩9 ↩10 ↩11
-
Common Pitfalls and Troubleshooting in Cucumber - GeeksforGeeks, accessed on July 14, 2025, https://www.geeksforgeeks.org/software-testing/common-pitfalls-and-troubleshooting-in-cucumber/ ↩ ↩2 ↩3
-
How to do error handling in Rust and what are the common pitfalls? - Stack Overflow, accessed on 14 July 2025, https://stackoverflow.com/questions/30505639/how-to-do-error-handling-in-rust-and-what-are-the-common-pitfalls ↩ ↩2
-
Writing scenarios with Gherkin syntax — CucumberStudio Documentation, accessed on 14 July 2025, https://support.smartbear.com/cucumberstudio/docs/bdd/write-gherkin-scenarios.html ↩
-
Data tables - Cucumber Rust Book, accessed on 14 July 2025, https://cucumber-rs.github.io/cucumber/main/writing/data_tables.html ↩ ↩2
-
Best practices for scenario writing | CucumberStudio Documentation ↩ ↩2
-
Cucumber Best Practices to follow for efficient BDD Testing | by KailashPathak - Medium, accessed on 14 July 2025, https://kailash-pathak.medium.com/cucumber-best-practices-to-follow-for-efficient-bdd-testing-b3eb1c7e9757 ↩ ↩2
-
Rust Solutions - WireMock, accessed on 14 July 2025, https://wiremock.org/docs/solutions/rust/ ↩ ↩2 ↩3
-
Common Challenges in Cucumber Testing and How to Overcome Them - Medium, accessed on July 14, 2025, https://medium.com/@realtalkdev/common-challenges-in-cucumber-testing-and-how-to-overcome-them-dc95fffb43c8 ↩
-
Cucumber in cucumber - Rust - Docs.rs, accessed on 14 July 2025, https://docs.rs/cucumber/latest/cucumber/struct.Cucumber.html ↩
-
CLI (command-line interface) - Cucumber Rust Book, accessed on 14 July 2025, https://cucumber-rs.github.io/cucumber/main/cli.html ↩
-
Continuous Integration - Cucumber, accessed on 14 July 2025, https://cucumber.io/docs/guides/continuous-integration ↩ ↩2 ↩3
-
Setting up effective CI/CD for Rust projects - a short primer - shuttle.dev, accessed on 14 July 2025, https://www.shuttle.dev/blog/2025/01/23/setup-rust-ci-cd ↩