RFC: Test Sharding

Summary

This RFC proposes adding test sharding to Tuist, allowing users to split their test suites across multiple CI runners for faster feedback loops. The system will distribute tests across shards using timing data collected by the Tuist server, with fallback strategies when no historical data is available. The splitting granularity differs by build system: module-level for Xcode (test targets) and suite-level for Gradle (test suites, i.e. Gradle test classes), reflecting the conventions and tooling capabilities of each ecosystem. While the initial focus is on GitHub Actions integration, the design is CI-agnostic.

Motivation

As projects grow, test suites become the bottleneck in CI pipelines. A monorepo with dozens of test targets can take 30+ minutes to run sequentially on a single machine. Teams work around this by manually splitting tests across CI jobs, but this approach is fragile, hard to maintain, and leads to unbalanced shards where one job takes 20 minutes while others finish in 5.

Tuist is uniquely positioned to solve this because:

  1. Tuist already knows the project graph – it understands which test targets exist and their dependencies.
  2. The Tuist server already collects test timing data – per-module and per-test-case durations from tuist test result uploads, stored in ClickHouse with recent_durations and avg_duration fields.
  3. Tuist already detects CI environments – GitHub Actions, GitLab CI, CircleCI, Buildkite, Bitrise, and Codemagic.

The missing piece is an orchestration layer that uses this data to produce balanced shard assignments and integrates with CI matrix strategies.

Prior Art

Buildkite Test Engine Client (bktec)

Buildkite uses a bin-packing algorithm with historical timing data to distribute tests so all parallel workers finish at roughly the same time. It supports file-level and example-level splitting, marks files exceeding 70% of a worker’s estimated time for finer-grained splitting, and suggests parallelism counts that keep all workers within a ~2-minute completion window. New tests default to an estimated 1000ms until real data is available.

CircleCI circleci tests split

CircleCI offers three strategies: by name (round-robin alphabetically), by timing (historical execution times from store_test_results), and by file size. The parallelism: N key spins up N containers, each aware of its index via $CIRCLE_NODE_INDEX / $CIRCLE_NODE_TOTAL. Timing data accumulates automatically from uploaded test results.

Bazel shard_count

Bazel sets TEST_TOTAL_SHARDS and TEST_SHARD_INDEX environment variables. The test runner selects tests via index % total_shards == shard_index. Purely count-based with no timing optimization.
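The selection rule can be sketched in a few lines (a minimal sketch; the function name is illustrative):

```python
def bazel_shard(tests, total_shards, shard_index):
    """Bazel-style count-based selection: test i runs on shard i % total."""
    return [t for i, t in enumerate(sorted(tests))
            if i % total_shards == shard_index]

tests = ["a_test", "b_test", "c_test", "d_test", "e_test"]
print(bazel_shard(tests, 2, 0))  # ['a_test', 'c_test', 'e_test']
```

Because the rule ignores durations, shards are balanced only by count: a single slow test can still dominate one shard.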

Gradle / Develocity

Gradle’s built-in maxParallelForks uses round-robin across JVM forks (single machine). Develocity Test Distribution (commercial) uses timing-based partitioning across remote agents with real-time work-stealing.

Proposed Solution

Overview

The sharding workflow has three phases:

  1. Plan – The CLI (or Gradle plugin) queries the server for test timing data and computes a shard assignment.
  2. Execute – Each CI runner receives its shard index and runs only its assigned tests.
  3. Report – Each shard uploads its test results to the server as it does today.

Shard Configuration

Sharding is configured via CLI flags on tuist test --build-only or tuist xcodebuild build-for-testing (for Xcode) or via flags on the prepareTestShards Gradle task (for Gradle). Because the configuration lives in the CI workflow file, which is branch-specific, users can experiment with sharding in feature branches before rolling it out.

See the sharding flags table in Section 1 (Xcode projects) below for the full list of options.

Running a Specific Shard

There are two execution paths depending on the build system.

1. Xcode projects

Sharding is built into two command layers:

  • tuist test (recommended for Tuist-generated projects) — the existing tuist test command gains --shard-* flags. When --build-only is combined with sharding flags, it generates the project, builds for testing, computes shards, and outputs the matrix. When --without-building is used with shard environment variables, it pulls the filtered .xctestrun and runs the assigned tests. This path is recommended because it handles project generation, selective testing, and sharding in a single command.
  • tuist xcodebuild build-for-testing / test-without-building (for non-generated projects) — the same sharding behavior, but without project generation or selective testing. This is the path for projects that manage their own .xcodeproj / .xcworkspace.

Under the hood, both paths use the same .xctestproducts bundle mechanism — the only difference is that tuist test also handles project generation and selective testing.

Test module discovery via .xctestrun: The .xctestrun plist file (embedded inside the .xctestproducts bundle produced by xcodebuild build-for-testing -testProductsPath) contains an entry for every test target in the scheme. This is the authoritative source of “what test modules exist in this project right now” — it works regardless of whether the project uses Tuist manifests.

This solves three problems:

  • New modules: A newly added test target appears in the .xctestrun file immediately. The server won’t have timing data for it, so it gets a default duration estimate. It will still be included in a shard and tested.
  • Removed modules: A deleted test target disappears from the .xctestrun file. The server may still have historical timing data, but since the module isn’t in the discovered set, it’s excluded from shard computation. Stale server data is harmlessly ignored.
  • First run: No bootstrapping problem. The .xctestrun file provides the full module list even when the server has no historical data at all. All modules get default estimates, producing a round-robin-like distribution.

Build-once, test-many pattern across machines: In CI, the build step and shard test steps typically run on different machines. The shard runners cannot reference local files from the build agent. To handle this:

  • The plan step (tuist test --build-only or tuist xcodebuild build-for-testing) auto-injects the -testProductsPath flag, producing a .xctestproducts bundle — a self-contained, portable artifact that packages the .xctestrun file alongside the compiled .xctest bundles.
  • When sharding flags are present, the bundle is uploaded to the Tuist server as part of the shard session. Each shard runner downloads the bundle with a filtered .xctestrun containing only its assigned test targets. No CI-provider-specific artifact sharing is needed.

The .xctestproducts bundle format (validated experimentally):

MyApp.xctestproducts/
├── Info.plist                          # Maps test plans to xctestrun file paths
├── Tests/
│   └── 0/
│       ├── MyApp.xctestrun            # The xctestrun file
│       └── Debug -> ../../Binaries/0/Debug  # Symlink to binaries
└── Binaries/
    └── 0/
        └── Debug/
            ├── AppTests.xctest/       # Compiled test bundles
            ├── CoreTests.xctest/
            └── ...

Key properties:

  • Self-contained and portable: The bundle contains everything needed to run tests on another machine — no source code, intermediate build artifacts, or DerivedData.
  • __TESTROOT__ resolves automatically: The .xctestrun file uses __TESTROOT__ placeholders. Inside the bundle, Tests/0/Debug symlinks to ../../Binaries/0/Debug, so __TESTROOT__/Debug/*.xctest resolves correctly.
  • xcodebuild test-without-building -testProductsPath consumes this bundle directly. Filtering works by modifying the .xctestrun inside the bundle to remove test target entries — only the targets present in the .xctestrun are executed.
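Since an .xctestrun file is a plist, both module discovery and shard filtering reduce to plist manipulation. A minimal sketch using Python's plistlib (function names are illustrative; the TestConfigurations / TestTargets / BlueprintName keys are the ones described above):

```python
import plistlib

def discover_test_targets(xctestrun_path):
    """List test modules: each TestTargets entry carries a BlueprintName."""
    with open(xctestrun_path, "rb") as f:
        plist = plistlib.load(f)
    return [t["BlueprintName"]
            for t in plist["TestConfigurations"][0]["TestTargets"]]

def filter_xctestrun(xctestrun_path, assigned_targets):
    """Strip targets not assigned to this shard; only remaining ones run."""
    with open(xctestrun_path, "rb") as f:
        plist = plistlib.load(f)
    for config in plist["TestConfigurations"]:
        config["TestTargets"] = [t for t in config["TestTargets"]
                                 if t["BlueprintName"] in assigned_targets]
    with open(xctestrun_path, "wb") as f:
        plistlib.dump(plist, f)
```

The same filtering logic runs server-side in this design, so shard runners receive an already-filtered bundle.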

Step 1: Plan job:

# Tuist-generated projects (recommended):
# Generates the project, builds for testing, computes shards, outputs the matrix.
tuist test --build-only --shard-max 6

# Non-generated projects:
# Build, compute shards, push to server, and output the shard matrix — all in one command.
tuist xcodebuild build-for-testing \
  -workspace MyApp.xcworkspace \
  -scheme MyApp \
  -destination 'platform=iOS Simulator,name=iPhone 16' \
  --shard-max 6

Sharding flags (available on both tuist test and tuist xcodebuild build-for-testing):

Flag Description Default
--shard-max N Maximum number of shards Number of test modules
--shard-min N Minimum number of shards 1
--shard-total N Exact number of shards (overrides min/max) Auto-determined
--shard-max-duration N Target max shard duration (seconds) None

These are Tuist-specific flags (not passed through to xcodebuild). The presence of any --shard-* flag activates sharding.

Automatic -testProductsPath injection: When sharding is active, the CLI auto-injects -testProductsPath (e.g., .tuist/test-products/<scheme>.xctestproducts) so the bundle is produced in a known location. For tuist xcodebuild, users can override this by passing their own -testProductsPath.

When sharding flags are present, the plan step (either tuist test --build-only or tuist xcodebuild build-for-testing) extends its normal behavior with:

  1. Auto-injects -testProductsPath if not already present.
  2. Runs xcodebuild build-for-testing with the passthrough arguments.
  3. Locates the .xctestrun file inside the produced .xctestproducts bundle.
  4. Parses the .xctestrun plist to discover test targets (each entry in TestConfigurations[0].TestTargets is a test module with a BlueprintName).
  5. Sends the module list, .xctestrun file, and shard configuration (min/max/total/max-duration) to the server. The server fetches timing data, computes shard assignments via bin-packing, and stores the .xctestrun + assignments tagged with a shard session ID.
  6. Receives the shard assignments back from the server.
  7. Outputs the shard matrix to the CI provider:
    • GitHub Actions: Writes matrix={"shard":[0,1,2,...]} directly to $GITHUB_OUTPUT (detected via the GITHUB_OUTPUT environment variable).
    • Buildkite: Uploads the matrix via buildkite-agent pipeline upload.
    • Other CI providers: Writes a tuist-shard-matrix.json file. Future integrations can add native output for more providers.

Without sharding flags, the command behaves exactly as it does today.
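The matrix output step can be sketched under these assumptions (the file shapes follow this RFC; the function name is hypothetical):

```python
import json
import os

def emit_shard_matrix(shard_indices):
    """Write the shard matrix where the CI provider can pick it up:
    $GITHUB_OUTPUT on GitHub Actions, tuist-shard-matrix.json elsewhere."""
    github_output = os.environ.get("GITHUB_OUTPUT")
    if github_output:
        with open(github_output, "a") as f:
            f.write(f'matrix={json.dumps({"shard": shard_indices})}\n')
    else:
        with open("tuist-shard-matrix.json", "w") as f:
            json.dump({"shard_count": len(shard_indices),
                       "shards": [{"index": i} for i in shard_indices]}, f)
```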

Step 2: Shard jobs:

# Tuist-generated projects (recommended):
# Downloads filtered .xctestrun, runs assigned tests, uploads results.
tuist test --without-building

# Non-generated projects:
# Same behavior, but without project generation or selective testing.
tuist xcodebuild test-without-building \
  -destination 'platform=iOS Simulator,name=iPhone 16'

When the TUIST_SHARD_INDEX environment variable is set, the shard step (either tuist test --without-building or tuist xcodebuild test-without-building) extends its normal behavior with:

  1. Downloads the .xctestproducts bundle for this shard from the Tuist server (session ID auto-detected from CI environment). The bundle contains a filtered .xctestrun with only the test targets assigned to this shard.
  2. Places the bundle at the known location (.tuist/test-products/<scheme>.xctestproducts) and auto-injects -testProductsPath.
  3. Runs xcodebuild test-without-building -testProductsPath <bundle-path> with the passthrough arguments.
  4. After tests complete, uploads test results to the server (as today), with shard metadata attached.

Without TUIST_SHARD_INDEX, the command behaves exactly as it does today — all tests run.

The server stores the original .xctestproducts bundle and the shard assignments. When a shard runner requests its bundle, the server removes the TestTargets entries that don’t belong to that shard from the .xctestrun’s TestConfigurations and returns the modified bundle.

Shard detection for Xcode shard runners:

Env Var Description
TUIST_SHARD_INDEX The index of this shard (0-based)

This is set in the CI workflow (e.g., from GitHub Actions matrix.shard). The total number of shards is already stored in the shard session on the server — the runner only needs to know its own index.

Coupling plan and shard jobs — shard session ID: The build and test steps need a shared identifier so shard runners can find the correct .xctestrun on the server. This is handled via a shard session ID derived from the CI environment:

CI Provider Session ID derived from
GitHub Actions github-{GITHUB_RUN_ID}-{GITHUB_RUN_ATTEMPT}
CircleCI circleci-{CIRCLE_WORKFLOW_ID}
Buildkite buildkite-{BUILDKITE_BUILD_ID}
GitLab CI gitlab-{CI_PIPELINE_ID}
Other / local Explicit --session <id> flag required

Since the Tuist CLI already detects CI environments, the session ID is auto-detected in most cases. The plan job and shard jobs within the same CI run share the same environment variables, so they produce the same session ID without any manual passing.

For retries: GITHUB_RUN_ATTEMPT is included so a retried workflow run gets a fresh session, avoiding stale shard assignments from a previous attempt.
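The derivation table above translates to a simple environment probe (a sketch; the function name is hypothetical):

```python
import os

def derive_session_id(env=os.environ):
    """Derive a shard session ID from CI environment variables.

    Returns None when no known CI is detected; the caller must then
    require an explicit --session <id> flag."""
    if "GITHUB_RUN_ID" in env:
        return f"github-{env['GITHUB_RUN_ID']}-{env.get('GITHUB_RUN_ATTEMPT', '1')}"
    if "CIRCLE_WORKFLOW_ID" in env:
        return f"circleci-{env['CIRCLE_WORKFLOW_ID']}"
    if "BUILDKITE_BUILD_ID" in env:
        return f"buildkite-{env['BUILDKITE_BUILD_ID']}"
    if "CI_PIPELINE_ID" in env:
        return f"gitlab-{env['CI_PIPELINE_ID']}"
    return None
```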

2. Gradle projects: Tuist Gradle plugin

For Gradle projects, sharding is integrated into the Tuist Gradle plugin (dev.tuist:tuist-gradle-plugin). The plugin already hooks into Gradle’s test lifecycle for test insights and quarantine; sharding extends this with a prepareTestShards task and a test filtering step. To align with the Xcode workflow, shard configuration is passed as flags to the prepareTestShards task rather than being declared in the Gradle DSL.

Why suite-level splitting for Gradle: Gradle projects vary widely in modularization. Many Gradle projects follow a multi-module architecture (:feature:home, :core:network, etc.), but it’s equally common to see projects with a handful of modules or even a single monolithic :app module containing all tests. Module-level sharding would be useless in the latter case. Since the Tuist Gradle plugin already collects per-suite timing data (Gradle test classes map to Tuist test suites) via its TestListener, and Gradle’s filter.includeTestsMatching() API natively supports suite-level filtering (already used for test quarantine), suite-level splitting is both more practical and more effective.

Configuration: Sharding is configured via flags on the prepareTestShards Gradle task, mirroring the --shard-* flags used by tuist test and tuist xcodebuild for Xcode:

Flag Description Default
--shard-max <n> Maximum number of shards Required
--shard-min <n> Minimum number of shards 1
--shard-max-duration <s> Target max shard duration (seconds) None

The TUIST_SHARD_INDEX environment variable tells the plugin which shard this runner is. When absent and prepareTestShards is invoked, the plugin runs the plan step — it discovers test suites, sends them to the server, and outputs the shard matrix (same CI integration as Xcode: GitHub Actions $GITHUB_OUTPUT, Buildkite buildkite-agent pipeline upload, or tuist-shard-matrix.json). When TUIST_SHARD_INDEX is set, the plugin runs in shard mode — it pulls its assigned test suites from the server and filters accordingly.

How it works:

Plan step (./gradlew prepareTestShards --shard-max <n>):

  1. The plugin compiles test sources and scans the test classpath to discover all current test suites. This is the source of truth for what exists now (same principle as .xctestrun for Xcode).
  2. The plugin packages the compiled test runtime classpath (compiled classes, application classes, and dependencies).
  3. The plugin calls the Tuist server’s shard session endpoint, uploading the packaged classpath along with the discovered test suites and shard configuration from the task flags.
  4. The server fetches per-suite timing data from test_suite_runs, computes shard assignments via bin-packing, and stores the session alongside the classpath.
  5. The plugin receives the shard assignments and outputs the shard matrix to the CI provider.
  6. Tests do not run in this step — the plan step only compiles, uploads, and computes the matrix.

Shard step (TUIST_SHARD_INDEX set):

  1. The plugin downloads the compiled test classpath from the Tuist server (session ID auto-detected from CI environment, same as Xcode).
  2. The plugin pulls the assigned test suites for this shard from the server.
  3. The plugin uses Gradle’s filter.includeTestsMatching() API (the same mechanism used for test quarantine today) to include only the assigned test suites, and configures the test task to use the downloaded classpath — skipping compilation entirely.
  4. Tests run and results are uploaded as usual, with shard metadata included.

# Plan step — compiles, uploads test classpath to server, computes shards, outputs matrix
./gradlew prepareTestShards --shard-max 6

# Shard step — downloads compiled test classpath from server, runs assigned tests
TUIST_SHARD_INDEX=${{ matrix.shard }} ./gradlew test

Build-once, test-many for Gradle: Like Xcode’s .xctestproducts bundle, the plan step packages and uploads the compiled test runtime classpath (compiled classes, application classes, and dependencies) to the Tuist server. Shard runners download it and run tests without recompilation — the same pattern Develocity Test Distribution uses when transferring compiled test binaries to remote agents.

Partitioning Strategy

The initial implementation uses a single strategy: timing-based bin-packing. The algorithm is the same for both build systems; what differs is the unit of distribution:

  • Xcode: test modules (targets) — data from test_module_runs
  • Gradle: test suites — data from test_suite_runs

timing (default and only strategy)

Uses historical test durations from the Tuist server to create balanced shards via a greedy bin-packing algorithm (Longest Processing Time first, or LPT). The algorithm runs server-side.

The core idea is simple: if you’re packing items of different sizes into a fixed number of bins, you get the most even distribution by placing the largest item first into the emptiest bin, then repeating. Applied to test sharding, each “item” is a test unit (module or suite) with a known duration, and each “bin” is a shard. The algorithm minimizes the longest shard’s total duration, which is what determines overall CI wall-clock time.

Steps:

  1. Fetch avg_duration for each unit (module or suite) from ClickHouse (scoped to the project and default branch).
  2. Sort units by duration descending (longest first).
  3. For each unit, assign it to the shard with the lowest total estimated duration so far.

Example: Given 5 modules with durations [30s, 25s, 20s, 15s, 10s] and 3 shards:

  • Shard 0 ← 30s → total: 30s
  • Shard 1 ← 25s → total: 25s
  • Shard 2 ← 20s → total: 20s
  • Shard 2 ← 15s → total: 35s (was lowest at 20s)
  • Shard 1 ← 10s → total: 35s (was lowest at 25s)
  • Result: shards of 30s, 35s, 35s — well-balanced despite uneven module sizes.

Units with no timing data are assigned an estimated duration equal to the median of known units (or a default of 30 seconds for modules / 5 seconds for suites if no data exists at all). When the server has no timing data for any unit (e.g., a project that hasn’t uploaded test results yet), all units receive equal estimates, which effectively produces a round-robin distribution.
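The steps and the worked example above can be sketched as (a minimal sketch; names are illustrative):

```python
import heapq

def lpt_shards(durations, shard_count):
    """Greedy LPT bin-packing: longest unit first into the emptiest shard.

    durations: {unit_name: seconds}. Returns (assignments, totals)."""
    # Min-heap of (running total, shard index): the root is the emptiest shard.
    heap = [(0.0, i) for i in range(shard_count)]
    heapq.heapify(heap)
    assignments = [[] for _ in range(shard_count)]
    for name, secs in sorted(durations.items(), key=lambda kv: -kv[1]):
        total, idx = heapq.heappop(heap)
        assignments[idx].append(name)
        heapq.heappush(heap, (total + secs, idx))
    totals = [sum(durations[u] for u in units) for units in assignments]
    return assignments, totals

# The 5-module, 3-shard example from above:
mods = {"A": 30, "B": 25, "C": 20, "D": 15, "E": 10}
shards, totals = lpt_shards(mods, 3)
print(sorted(totals))  # [30, 35, 35]
```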

Future strategies

The --strategy flag is reserved for future extensibility. If the need arises, we could add strategies such as:

  • round-robin – Distributes test modules alphabetically in round-robin order. No server communication needed. Could serve as an explicit offline fallback for projects not connected to the Tuist server.
  • uniform – Distributes test modules to produce an equal count per shard (±1), ignoring timing data. Useful when test modules have roughly similar execution times.
  • dynamic – A queue-based approach (similar to Knapsack Pro) where runners pull work from a server-side queue at runtime, enabling real-time load balancing. This would require significant server-side infrastructure but would produce optimal shard balance.

We intentionally start with a single strategy to keep the initial implementation focused and gather real-world usage data before investing in alternatives.

Auto-Determining Shard Count

When --shard-total is not specified and --shard-min/--shard-max bounds are set, the timing strategy auto-determines the optimal shard count:

  1. Compute total estimated test duration from server data.
  2. Set target shard duration to total / N for candidate N values within [min, max].
  3. Select the largest N within [min, max] where the longest shard (after bin-packing) stays within 20% of the target duration. (Small N values, in particular N = 1, satisfy the balance criterion trivially, so scanning from the top maximizes parallelism while keeping shards balanced.)

If --shard-max-duration is set, start from ceil(total_duration / max_duration) and clamp to [min, max].
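A sketch of the selection loop under these assumptions (names hypothetical; it scans candidates from the largest downward, since for small N, in particular N = 1, the 20% balance criterion is satisfied trivially):

```python
import heapq
import math

def _lpt_totals(durations, n):
    # Greedy LPT: longest unit first into the currently-emptiest shard.
    heap = [(0.0, i) for i in range(n)]
    heapq.heapify(heap)
    for secs in sorted(durations.values(), reverse=True):
        total, idx = heapq.heappop(heap)
        heapq.heappush(heap, (total + secs, idx))
    return [t for t, _ in heap]

def auto_shard_count(durations, shard_min, shard_max, max_duration=None):
    """Pick the largest N in [shard_min, shard_max] whose longest shard
    (after bin-packing) stays within 20% of the total/N target."""
    total = sum(durations.values())
    if max_duration:
        # Seed the lower bound from the target max shard duration.
        shard_min = max(shard_min, min(shard_max, math.ceil(total / max_duration)))
    for n in range(shard_max, shard_min - 1, -1):  # prefer more parallelism
        if max(_lpt_totals(durations, n)) <= (total / n) * 1.2:
            return n
    return shard_min
```

For the five-module example above with bounds [1, 6], this selects 4 shards: at 5 or 6 shards the 30-second module exceeds the 20% tolerance of the shrinking target.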

Server API

Shard session creation endpoint (for Xcode)

The CLI sends the discovered module list and shard configuration. The server fetches timing data, computes shard assignments, and returns the result along with an upload URL for the test artifacts.

Step 1: Create shard session

POST /api/projects/:project_handle/tests/shards

Request:

{
  "session_id": "github-12345-1",
  "modules": ["AppTests", "CoreTests", "NetworkTests", "NewFeatureTests"],
  "shard_min": 1,
  "shard_max": 6,
  "shard_max_duration": null
}

The server:

  1. Queries the test_module_runs ClickHouse table for timing data (filtered to CI runs on the default branch).
  2. Modules with no timing data get a default estimated duration (median of known modules, or 30 seconds if no data exists at all).
  3. Modules in the server’s history but not in the request (removed modules) are ignored.
  4. Determines the optimal shard count based on the configuration (min/max/total/max-duration).
  5. Computes shard assignments via the bin-packing algorithm.
  6. Stores the shard assignments and returns an S3 upload URL for the .xctestproducts bundle.

Response:

{
  "session_id": "github-12345-1",
  "shard_count": 4,
  "shards": [
    { "index": 0, "test_targets": ["AppTests", "CoreTests"], "estimated_duration_ms": 45000 },
    { "index": 1, "test_targets": ["NetworkTests", "AuthTests"], "estimated_duration_ms": 43000 }
  ],
  "upload_url": "https://storage.tuist.dev/..."
}

Step 2: Upload test artifacts

After receiving the response, the CLI uploads the .xctestproducts bundle (compressed) directly to S3 via the presigned upload_url. The server uses a conventional path based on the project and session ID, so shard runners can retrieve it later.

Step 3: Download shard (called by each shard runner):

GET /api/projects/:project_handle/tests/shards/:session_id/:shard_index

Response: JSON with the shard assignment and a download URL for the .xctestproducts bundle. The server returns the bundle with a pre-filtered .xctestrun — test targets not assigned to this shard are already stripped from the TestConfigurations array. The shard runner downloads the bundle and runs tests directly, with no client-side filtering needed.

{
  "test_targets": ["AppTests", "CoreTests"],
  "download_url": "https://storage.tuist.dev/..."
}

Shard sessions are ephemeral — the server can garbage-collect them after a configurable TTL (e.g., 24 hours).

Shard session creation endpoint (for Gradle plugin)

The Gradle plugin follows the same session-based approach as Xcode: the plan step creates a session, and shard runners pull their assignments by index. It calls the same POST /api/projects/:project_handle/tests/shards endpoint, but sends test_suites instead of modules:

Request:

{
  "session_id": "github-12345-1",
  "test_suites": [
    "com.example.auth.LoginTest",
    "com.example.auth.SignupTest",
    "com.example.core.UtilsTest",
    "com.example.core.DatabaseTest"
  ],
  "shard_max": 4
}

The response includes an upload_url for the compiled test classpath (same pattern as .xctestproducts for Xcode). The plugin uploads the packaged classpath to S3 after creating the session.

Shard runners call GET /api/projects/:project_handle/tests/shards/:session_id/:shard_index and receive their assigned test suites plus a download_url for the compiled classpath:

{
  "test_suites": [
    "com.example.auth.LoginTest",
    "com.example.core.DatabaseTest"
  ],
  "download_url": "https://storage.tuist.dev/..."
}

This follows the same pattern as Xcode: the plan step provides the source of truth for what exists, the server provides timing data and computes balanced assignments, and shard runners only need their index to pull their assignments and artifacts.

Shard Computation Location

The shard computation happens on the server. The CLI sends the discovered module list (or suite list for Gradle), the .xctestrun file, and the shard configuration. The server fetches timing data from ClickHouse, runs the bin-packing algorithm, stores the .xctestrun and shard assignments, and returns the result to the CLI. This keeps the algorithm centralized (one implementation shared across Xcode and Gradle paths), allows the server to evolve the algorithm without CLI updates, and ensures the computation has direct access to timing data without an extra round-trip.

Dashboard Integration

Sharded test runs should appear as a single test run in the dashboard, not as separate entries per shard. This preserves the user’s mental model: “I ran my tests” produces one result, regardless of how many shards executed in parallel.

Each shard uploads its test results independently (as it does today), but tagged with the shard session ID and shard index. The server merges results into the single parent test run.

Dashboard UI Changes

The test run detail page (test_run_live) needs adjustments to surface shard information:

  • Overview tab: Show shard metadata when the run is sharded — total shard count, per-shard durations (e.g., a bar showing how balanced the shards were), and which shard was the bottleneck.
  • Test Cases / Test Suites / Test Modules tabs: Add a “Shard” column or filter, so users can see which shard ran which tests. A shard filter dropdown lets users drill into a specific shard’s results.
  • Failures tab: Each failure should show which shard it came from, helping users reproduce failures on the right shard.
  • Shard balance visualization: A simple bar chart or breakdown showing per-shard duration and test count. This helps users understand whether sharding is well-balanced and whether they should adjust --shard-max.

GitHub Actions Integration

Xcode — Tuist-generated projects (using tuist test)

This is the recommended path for projects that use Tuist manifests. tuist test handles project generation, selective testing, and sharding in a single command.

name: Tests
on: [pull_request]

jobs:
  plan:
    runs-on: macos-15
    outputs:
      matrix: ${{ steps.build.outputs.matrix }}
    steps:
      - uses: actions/checkout@v4
      - name: Install Tuist
        run: mise install
      # Generates the project, builds for testing, uploads the .xctestproducts
      # bundle to the Tuist server, computes shards, and writes the matrix
      # to $GITHUB_OUTPUT — all in one step.
      - name: Build and prepare shards
        id: build
        run: tuist test --build-only --shard-max 6

  test:
    runs-on: macos-15
    needs: plan
    strategy:
      fail-fast: false
      matrix: ${{ fromJson(needs.plan.outputs.matrix) }}
    steps:
      - uses: actions/checkout@v4
      - name: Install Tuist
        run: mise install
      # Downloads the .xctestproducts bundle and filtered .xctestrun from
      # the Tuist server, runs the assigned tests, and uploads results.
      - name: Run shard tests
        env:
          TUIST_SHARD_INDEX: ${{ matrix.shard }}
        run: tuist test --without-building

Xcode — non-generated projects (using tuist xcodebuild)

For projects that manage their own .xcodeproj / .xcworkspace without Tuist manifests.

name: Tests
on: [pull_request]

jobs:
  plan:
    runs-on: macos-15
    outputs:
      matrix: ${{ steps.build.outputs.matrix }}
    steps:
      - uses: actions/checkout@v4
      - name: Install Tuist
        run: mise install
      - name: Build and prepare shards
        id: build
        run: |
          tuist xcodebuild build-for-testing \
            -workspace MyApp.xcworkspace \
            -scheme MyApp \
            -destination 'platform=iOS Simulator,name=iPhone 16' \
            --shard-max 6

  test:
    runs-on: macos-15
    needs: plan
    strategy:
      fail-fast: false
      matrix: ${{ fromJson(needs.plan.outputs.matrix) }}
    steps:
      - uses: actions/checkout@v4
      - name: Install Tuist
        run: mise install
      - name: Run shard tests
        env:
          TUIST_SHARD_INDEX: ${{ matrix.shard }}
        run: |
          tuist xcodebuild test-without-building \
            -destination 'platform=iOS Simulator,name=iPhone 16'

Gradle (plugin-driven sharding)

Sharding is configured via flags on the prepareTestShards task. Shard runners use TUIST_SHARD_INDEX to pull their assignments.

name: Tests
on: [pull_request]

jobs:
  plan:
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.plan.outputs.matrix }}
    steps:
      - uses: actions/checkout@v4
      - name: Set up JDK
        uses: actions/setup-java@v4
        with:
          java-version: '17'
          distribution: 'temurin'
      # Compiles test sources, discovers test suites, computes shards,
      # and writes matrix to $GITHUB_OUTPUT — all via the Gradle plugin.
      - name: Plan shards
        id: plan
        run: ./gradlew prepareTestShards --shard-max 6

  test:
    runs-on: ubuntu-latest
    needs: plan
    strategy:
      fail-fast: false
      matrix: ${{ fromJson(needs.plan.outputs.matrix) }}
    steps:
      - uses: actions/checkout@v4
      - name: Set up JDK
        uses: actions/setup-java@v4
        with:
          java-version: '17'
          distribution: 'temurin'
      - name: Run tests
        env:
          TUIST_SHARD_INDEX: ${{ matrix.shard }}
        run: ./gradlew test

Integration with Other CI Providers

The system works with any CI provider that supports parallel jobs. The key contract is:

  1. A plan job runs the plan step (tuist test --build-only, tuist xcodebuild build-for-testing, or ./gradlew prepareTestShards) and produces a shard matrix.
  2. Shard jobs set TUIST_SHARD_INDEX and run their assigned subset (tuist test --without-building, tuist xcodebuild test-without-building, or ./gradlew test).

The CLI outputs the matrix in a CI-native format when possible (GitHub Actions $GITHUB_OUTPUT, Buildkite buildkite-agent pipeline upload) and falls back to writing a tuist-shard-matrix.json file for other providers.

CI providers with native parallelism (CircleCI parallelism, GitLab CI parallel) provide their own index/total environment variables. These map to TUIST_SHARD_INDEX (note that GitLab’s CI_NODE_INDEX is 1-based, while TUIST_SHARD_INDEX is 0-based):

# CircleCI — Tuist-generated project (Xcode)
TUIST_SHARD_INDEX=$CIRCLE_NODE_INDEX tuist test --without-building

# CircleCI — non-generated project (Xcode)
TUIST_SHARD_INDEX=$CIRCLE_NODE_INDEX \
  tuist xcodebuild test-without-building -destination 'platform=iOS Simulator,name=iPhone 16'

# GitLab CI — Gradle (plugin-driven); CI_NODE_INDEX is 1-based
TUIST_SHARD_INDEX=$((CI_NODE_INDEX - 1)) ./gradlew test

Unsupported or custom CI providers can use tuist-shard-matrix.json directly. After the build step, the file contains the full shard assignment. Users read it to spawn parallel jobs in whatever way their CI supports:

# Plan step: build and write tuist-shard-matrix.json (use tuist test --build-only for generated projects)
tuist xcodebuild build-for-testing -scheme MyApp -destination '...' --shard-max 6

# Read the matrix
cat tuist-shard-matrix.json
# {"shard_count":4,"shards":[{"index":0,...},{"index":1,...},...]}

# Shard steps: set the shard index and run (how you spawn these depends on your CI)
TUIST_SHARD_INDEX=0 tuist test --without-building
# or: TUIST_SHARD_INDEX=0 tuist xcodebuild test-without-building -destination '...'

The tuist-shard-matrix.json format is stable and documented, so it can be consumed by any scripting or CI orchestration layer.
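As a sketch of what consuming that file looks like, the snippet below reads the matrix and produces one command line per shard, using only the keys shown above (`shards`, each shard's `index`); how the commands are actually launched in parallel depends on the CI system.

```python
import json

def shard_commands(matrix_path, command="tuist test --without-building"):
    # Emit one shell command per shard, setting TUIST_SHARD_INDEX as the
    # contract above describes. Only the documented "shards"/"index" keys
    # are read; any other fields in the matrix file are ignored here.
    with open(matrix_path) as f:
        matrix = json.load(f)
    return [f"TUIST_SHARD_INDEX={s['index']} {command}"
            for s in matrix["shards"]]
```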

Alternatives Considered

Test-case-level splitting for Xcode

Splitting individual test methods (or suites) across shards for Xcode projects would give the most granular control and best balance. However, it requires enumerating all test cases before running (expensive for large suites), creates complex --only-testing argument lists, and breaks test fixtures that assume suite-level setup/teardown. Module-level splitting avoids these issues while still providing meaningful parallelism for Xcode, where projects managed by Tuist tend to be well-modularized. (Note: for Gradle, we do use suite-level splitting because Gradle’s filtering API handles it cleanly and Gradle projects are often not well-modularized.)

Future Direction: Tuist-Managed Runners

This RFC focuses on static shard assignment — the plan step computes a fixed partition, and each CI job runs its assigned subset independently. This requires users to configure CI matrix strategies and artifact sharing themselves.

A natural evolution is Tuist-managed test distribution, where a single tuist test or ./gradlew test invocation provisions remote runners, distributes tests across them, and streams results back in real time — similar to Develocity Test Distribution. This would eliminate the need for CI matrix configuration entirely: users would just run tuist test and Tuist would handle parallelism transparently.

This capability is explicitly out of scope for the current RFC. The static sharding design proposed here lays the groundwork (server-side timing data, bin-packing algorithm, session management) that a future dynamic distribution system would build on.

Open Questions

  1. Should we support test-suite-level splitting for Xcode? Some Xcode projects have a single monolithic test target. Module-level sharding would not help here. We could support an opt-in --granularity suite mode in a future phase (for Gradle, suite-level is already the default).

  2. Should sharding activation be explicit? Currently, the presence of any --shard-* flag implicitly activates sharding. An alternative would be an explicit --shard flag (or similar) to opt in, with --shard-max, --shard-min, etc. as configuration. The implicit approach is more concise, but an explicit flag would make intent clearer in CI workflows.

  3. Should Gradle shard configuration live in settings.gradle.kts instead of task flags? The current proposal uses flags on prepareTestShards (e.g., --shard-max 6) to align with the Xcode CLI approach. An alternative is a DSL block in settings.gradle.kts, closer to how Develocity configures test distribution:

    // settings.gradle.kts
    tuist {
        testSharding {
            maxShards = 6
            // minShards = 2
            // maxDuration = 300
            // isEnabled = System.getenv("CI") != null
        }
    }
    

The DSL approach might be more idiomatic for Gradle users and allows configuration to be checked in once rather than repeated in CI workflow files, but I’m not sure how common that would be. The task flags approach is simpler and more consistent across build systems.


Thanks for putting this together. I’m aligned with the direction. Some comments on things that I noted.

Is the processing logic macOS-bound? Sounds like we'll be able to reuse the infrastructure for the server-side processing. In that case, will the client poll the server until the shard information is ready to continue? Or is the plan to do it synchronously on the server?

Do jobs need to pull the sources for this, given that the file is self-contained and includes the binaries?

Does Gradle have a portable format as Xcode does? Or will we have to come up with one ourselves?

Can/should we also support passing the shard as a CLI option?

./gradlew test --shard ${{ matrix.shard }}

Not a big deal, though.

I noticed the payload of this one is very similar to the Xcode one, with a slight difference in terminology: test_suites vs modules. Do you think there's an opportunity here to align the naming? Or do you think it's better to go with a different payload based on the project's build system?

I’d leave this out of the scope of this work, but something definitely to consider down the line.

I think the presence of --shard-* is explicit enough.

Since the sharding configuration is closer to the CI automation, I think it’s better to make it explicit from the command invocation. If needed, we can add this later.


I’d recommend sharing this with the users that we know are doing sharding already to see if we’ve missed anything.

.xctestrun is a plist. I’d try to process it directly server-side, so we don’t need to deal with it being asynchronous. In the worst case, we could use our processing nodes, yeah. But macOS shouldn’t be necessary for processing it, no.

For tuist test, the picture is a bit more complex as we need the graph to be available to upload selective test results. But we should be able to upload the graph.json and use that. Will make sure to cover this use-case when building this out.

It does not. We will need to bundle individual binaries/resources/etc. There are existing plugins that do that, like this, that we can take inspiration from. It’s certainly doable, but yeah, the complexity for Gradle might be a bit higher because of this.

Yeah, we can support passing the shard both as an environment variable and as a CLI option.

Test suites !== modules. Modules are a coarser level of granularity aimed at modularized Xcode projects, where the module feels like the better abstraction. As mentioned in the RFC, down the road we could support test-suite granularity for Xcode projects and module granularity for Gradle projects. The naming is already aligned with our current database and API test model conventions.

Agree :+1:

Will do.

Great RFC!

Thank you for adding it, it is a great feature, which mobile teams have to implement themselves.

Quick question: I remember previously having issues gathering code coverage from tests executed via .xctestproducts, and we had to use DerivedData instead.

Is this no longer an issue with this approach?

Great question!

I verified this locally and code coverage works with .xctestproducts bundles without needing DerivedData. The .xctestrun file inside the bundle already includes CodeCoverageBuildableInfos with source file references and TESTROOT placeholders, so coverage data is produced correctly even when running from a completely different directory.

The challenge with sharding is merging coverage across shards. Each shard produces its own .xcresult with partial coverage. Only the test targets that ran on that shard have actual coverage data; the rest show 0%. Apple's xcrun xcresulttool merge can combine them into a single report, but it's a macOS-only tool.

I’d propose two options:

  1. A new Tuist command (e.g., tuist test merge-results --session <id>) that downloads all shard .xcresult bundles from the server, runs xcresulttool merge, and produces a single merged .xcresult. This would run on a macOS CI runner as a post-sharding step. The advantage is that it's simple, uses Apple's official tooling, and the merged result is a standard .xcresult that integrates with any existing coverage workflows.
  2. Server-side merging where the Tuist server merges the results. Since xcresulttool is macOS-only, this would require running a macOS node in our infrastructure. The advantage is that it's fully automated, no extra CI job is needed, and the merged result would be available directly in the dashboard.

I’d probably start with 1. and see if there would still be a need for 2.

Thanks for raising this!


Based on some extra feedback, we decided that suite-level splitting should also be supported from day one for Xcode. Here's how it would work technically.

Filtering mechanism

The .xctestrun plist already supports class-level filtering natively. Each TestTarget entry accepts:

  • OnlyTestIdentifiers: array of identifiers to include
  • SkipTestIdentifiers: array of identifiers to exclude

The identifier format is ClassName or ClassName/testMethodName. So the server can inject OnlyTestIdentifiers per test target to restrict each shard to its assigned classes, the same way it currently strips entire TestTarget entries for module-level sharding, but one level deeper:

<key>OnlyTestIdentifiers</key>
<array>
    <string>CalculatorTests</string>
    <string>NetworkClientTests</string>
</array>

No -only-testing flags needed. The filtered .xctestrun is self-contained.
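A minimal sketch of that server-side injection using Python's plistlib. The flat top-level layout (one dict entry per test target plus a `__xctestrun_metadata__` entry) is an assumption about the common .xctestrun format; newer format versions nest targets under TestConfigurations, so a real implementation would need to handle both.

```python
import plistlib

def inject_only_test_identifiers(xctestrun_bytes, assignments):
    # `assignments` maps a test target name to the class identifiers this
    # shard should run. Each matching TestTarget entry gets an
    # OnlyTestIdentifiers array; metadata entries are left untouched.
    plist = plistlib.loads(xctestrun_bytes)
    for target_name, identifiers in assignments.items():
        if target_name in plist:
            plist[target_name]["OnlyTestIdentifiers"] = identifiers
    return plistlib.dumps(plist)
```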

Test suite discovery

For module-level sharding, the .xctestrun plist is the source of truth. Each TestTarget entry’s BlueprintName gives the complete module list. For suite-level sharding, the .xctestrun doesn’t list individual classes inside each target, so we need an additional enumeration step.

The plan step already runs xcodebuild build-for-testing. After building, xcodebuild test-without-building -enumerate-tests (Xcode 16+) enumerates all test targets, classes, and methods from the built products without executing them. The client sends this class list to the server the same way it sends the module list for module-level sharding. The server does the same bin-packing either way.

Performance: enumerate-tests doesn’t execute any tests, but it does load the test bundles into a simulator or test host process to reflect on XCTestCase subclasses. This can add 10-30 seconds on top of the build depending on project size and whether the simulator is already booted. Since the plan step already runs build-for-testing (which typically boots a simulator), the incremental cost should be modest. An alternative would be parsing the .xctest Mach-O binaries directly with nm to extract test* symbols — this is instant but fragile across Swift name mangling changes and wouldn’t catch dynamically generated tests. enumerate-tests is the safer default. If it turns out to be too slow for large projects, we could revisit this.

How it fits into the existing design

The change is minimal. It’s the same .xctestrun filtering mechanism, just at a finer granularity:

  • Module-level (current default): Server removes TestTarget entries not assigned to the shard.
  • Suite-level (opt-in, e.g., --granularity suite): Server keeps all TestTarget entries but adds OnlyTestIdentifiers to each, filtering to the assigned classes.
  • Timing data: Uses test_suite_runs (avg_duration per class) instead of test_module_runs.
  • Bin-packing: Same LPT algorithm, just operating on classes instead of modules.
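The LPT step is the same regardless of granularity; a minimal sketch of the greedy Longest Processing Time heuristic (not Tuist's actual implementation), where `durations` maps a unit (module or test class) to its historical average duration:

```python
import heapq

def lpt_shards(durations, shard_count):
    # LPT: sort units by duration descending, then always assign the next
    # unit to the currently least-loaded shard, tracked with a min-heap
    # keyed on each shard's accumulated duration.
    heap = [(0.0, index, []) for index in range(shard_count)]
    heapq.heapify(heap)
    for unit, duration in sorted(durations.items(), key=lambda kv: -kv[1]):
        load, index, units = heapq.heappop(heap)
        units.append(unit)
        heapq.heappush(heap, (load + duration, index, units))
    return {index: units for _, index, units in heap}
```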

In a very similar way, we could also do sharding at the individual test case level.