RFC: Test Sharding

marekfort · March 6, 2026, 6:27pm

Based on some extra feedback, we decided that suite-level splitting should also be supported for Xcode from day one. Here’s how it would work technically.

Filtering mechanism

The .xctestrun plist already supports class-level filtering natively. Each TestTarget entry accepts:

OnlyTestIdentifiers: array of identifiers to include
SkipTestIdentifiers: array of identifiers to exclude

The identifier format is ClassName or ClassName/testMethodName. So the server can inject OnlyTestIdentifiers per test target to restrict each shard to its assigned classes, the same way it currently strips entire TestTarget entries for module-level sharding, but one level deeper:

<key>OnlyTestIdentifiers</key>
<array>
    <string>CalculatorTests</string>
    <string>NetworkClientTests</string>
</array>

No -only-testing flags needed. The filtered .xctestrun is self-contained.
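
As a rough illustration of the injection step, here is a minimal Python sketch that adds OnlyTestIdentifiers to a single parsed TestTarget entry. The helper name restrict_target and the AppTests entry are hypothetical, and a real .xctestrun entry carries many more keys; this only shows the shape of the change, not the server's actual implementation.

```python
import plistlib


def restrict_target(test_target: dict, class_names: list[str]) -> dict:
    """Return a copy of one TestTarget entry limited to the given classes."""
    restricted = dict(test_target)  # shallow copy; the input entry is left untouched
    # De-duplicate and sort so the generated plist is stable between runs.
    restricted["OnlyTestIdentifiers"] = sorted(set(class_names))
    return restricted


# Hypothetical entry: only BlueprintName is shown, other keys are omitted.
target = {"BlueprintName": "AppTests"}
shard_target = restrict_target(target, ["NetworkClientTests", "CalculatorTests"])

# Serialize to XML to see the fragment that would end up in the shard's file.
xml = plistlib.dumps(shard_target).decode("utf-8")
print(xml)
```

The sketch works on one entry at a time on purpose: where the entries sit in the file and what to do with a target that has no classes assigned to the shard are separate decisions that it does not try to make.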

Test suite discovery

For module-level sharding, the .xctestrun plist is the source of truth. Each TestTarget entry’s BlueprintName gives the complete module list. For suite-level sharding, the .xctestrun doesn’t list individual classes inside each target, so we need an additional enumeration step.

The plan step already runs xcodebuild build-for-testing. After building, xcodebuild test-without-building -enumerate-tests (Xcode 16+) enumerates all test targets, classes, and methods from the built products without executing them. The client sends this class list to the server the same way it sends the module list for module-level sharding. The server does the same bin-packing either way.
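
To make the "class list" concrete, here is a small Python sketch of how the client could reduce enumerated tests to classes per target before sending them to the server. It assumes the enumeration has already been flattened into Target/Class/method strings; that input shape, the function name suites_by_target, and the sample identifiers are all assumptions for illustration, not the documented output format of -enumerate-tests.

```python
from collections import defaultdict


def suites_by_target(identifiers):
    """Group 'Target/Class/method' identifiers into {target: sorted class names}."""
    grouped = defaultdict(set)
    for identifier in identifiers:
        parts = identifier.split("/")
        if len(parts) < 2:
            continue  # a bare target name carries no class information
        target, class_name = parts[0], parts[1]
        grouped[target].add(class_name)
    return {target: sorted(classes) for target, classes in grouped.items()}


# Hypothetical enumeration result, already flattened to identifier strings.
enumerated = [
    "AppTests/CalculatorTests/testAdd",
    "AppTests/CalculatorTests/testSubtract",
    "AppTests/NetworkClientTests/testGet",
    "CoreTests/ParserTests/testEmptyInput",
]
suites = suites_by_target(enumerated)
print(suites)
```

Keeping the target name alongside each class matters because the server later needs to know which TestTarget entry each class filter belongs to.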

Performance: -enumerate-tests doesn’t execute any tests, but it does load the test bundles into a simulator or test host process to reflect on XCTestCase subclasses. This can add 10-30 seconds on top of the build, depending on project size and whether the simulator is already booted. Since the plan step already runs build-for-testing (which typically boots a simulator), the incremental cost should be modest. An alternative would be parsing the .xctest Mach-O binaries directly with nm to extract test* symbols. That is near-instant, but it is fragile across Swift name mangling changes and wouldn’t catch dynamically generated tests. -enumerate-tests is the safer default; if it turns out to be too slow for large projects, we can revisit this.

How it fits into the existing design

The change is minimal. It’s the same .xctestrun filtering mechanism, just at a finer granularity:

Module-level (current default): Server removes TestTarget entries not assigned to the shard.
Suite-level (opt-in, e.g., --granularity suite): Server keeps all TestTarget entries but adds OnlyTestIdentifiers to each, filtering to the assigned classes.
Timing data: Uses test_suite_runs (avg_duration per class) instead of test_module_runs.
Bin-packing: Same LPT (longest processing time first) algorithm, just operating on classes instead of modules.
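
For readers unfamiliar with LPT (longest processing time first), here is a minimal Python sketch of the greedy assignment at class granularity. The function name lpt_pack and the sample durations are hypothetical; in the design above the durations would come from avg_duration in test_suite_runs, and the real server may handle missing timing data or name collisions across targets differently.

```python
import heapq


def lpt_pack(durations, shard_count):
    """Assign each class to the currently lightest shard, longest classes first."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    shards = [[] for _ in range(shard_count)]
    # Min-heap of (total assigned duration, shard index).
    heap = [(0.0, index) for index in range(shard_count)]
    heapq.heapify(heap)
    # Sort by descending duration; break ties by name so the result is deterministic.
    ordered = sorted(durations.items(), key=lambda item: (-item[1], item[0]))
    for name, duration in ordered:
        load, index = heapq.heappop(heap)
        shards[index].append(name)
        heapq.heappush(heap, (load + duration, index))
    return shards


# Hypothetical average durations in seconds, keyed by test class.
durations = {
    "CalculatorTests": 12.0,
    "NetworkClientTests": 30.0,
    "ParserTests": 8.0,
    "CacheTests": 20.0,
}
shards = lpt_pack(durations, 2)
print(shards)
```

With these numbers the two shards end up with 38 and 32 seconds of estimated work. Nothing in the algorithm depends on the unit being a class, which is why the same code path can serve module-level and suite-level sharding.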

The same approach would extend to sharding at the individual test case level, since OnlyTestIdentifiers also accepts ClassName/testMethodName identifiers.