Adjust quarantine behavior

Summary

This RFC proposes changing test quarantine behavior from skipping quarantined tests entirely to running them but overriding the exit status when quarantined tests fail. This aligns with industry practice and solves the fundamental problem that we cannot unmark tests as flaky or unquarantine them when they are never executed.

Motivation

Today, when a test is quarantined in Tuist:

  1. The CLI fetches the quarantined test list from the server.
  2. Quarantined tests are excluded from execution (added to skip lists in Xcode/Gradle).
  3. No new test data is collected for those tests.

This creates a dead end: once a test is quarantined, we have no signal to determine whether it has been fixed. Teams must either unquarantine tests manually and hope they pass or, worse, let quarantined tests accumulate indefinitely without ever being addressed.

The desired behavior is:

  • Quarantined tests continue to run, preserving data collection.
  • Their failures do not block CI (exit code is overridden).
  • The dashboard surfaces quarantined test results distinctly, enabling automatic unquarantine when tests stabilize.

This matches the quarantine model used by Trunk.io and other test observability platforms. The key insight is that quarantine decouples flaky failures from the CI signal; it does not hide tests.

Prior Art

Trunk.io

Trunk runs quarantined tests normally, then checks results against the quarantine list. If all failures come from quarantined tests, the exit code is overridden to 0. The dashboard tracks “quarantined jobs” and shows ROI metrics (builds saved, developer time reclaimed). They support both manual and auto-quarantine, and auto-unquarantine when tests stabilize.

Buildkite Test Engine (formerly Buildkite Test Analytics)

Buildkite quarantines flaky tests by letting them run but muting their impact on build status. Results are still recorded and visible in the dashboard with a “quarantined” badge.

Datadog Intelligent Test Runner

Datadog’s approach similarly runs known-flaky tests but separates their results from the “real” test outcome, allowing teams to track flakiness trends over time without blocking pipelines.

Proposed Solution

Overview

The core change: quarantined tests run normally, but their failures are intercepted post-execution and excluded from the exit code determination.

┌──────────────┐     ┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│  Fetch       │     │  Run ALL     │     │  Upload      │     │  Override    │
│  quarantined │────▶│  tests       │────▶│  results     │────▶│  exit code   │
│  test list   │     │  (including  │     │  (mark       │     │  if only     │
│  from server │     │  quarantined)│     │  quarantined │     │  quarantined │
│              │     │              │     │  failures)   │     │  tests fail  │
└──────────────┘     └──────────────┘     └──────────────┘     └──────────────┘

Exit Code Behavior

Scenario                                         Exit Code           CI Result
-----------------------------------------------  ------------------  -----------------
All tests pass                                   0                   Pass
Only quarantined tests fail                      0                   Pass
Any non-quarantined test fails                   Non-zero            Fail
Mix of quarantined and non-quarantined failures  Non-zero            Fail
Server unreachable (fail-safe)                   Original exit code  Original behavior
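
The exit code rules above can be sketched as a single pure function. This is a minimal illustration, not the CLI's implementation: the name resolveExitCode and its parameters are hypothetical, and Kotlin is used only to match the Gradle snippet in this RFC. It also assumes one case the table does not list: a non-zero exit with no recorded test failures (for example a build error) keeps the original exit code.

```kotlin
/**
 * Hypothetical helper: decide the final exit code after a test run.
 * `quarantined` is null when the quarantine list could not be fetched.
 */
fun resolveExitCode(
    originalExitCode: Int,
    failedTests: Set<String>,
    quarantined: Set<String>?
): Int {
    // The run already succeeded: nothing to override.
    if (originalExitCode == 0) return 0
    // Fail-safe: without a quarantine list, keep the original behavior.
    if (quarantined == null) return originalExitCode
    // Assumption: a failure with no failed tests (e.g. a build error) is never masked.
    if (failedTests.isEmpty()) return originalExitCode
    // Override only when every failed test is quarantined.
    return if (quarantined.containsAll(failedTests)) 0 else originalExitCode
}
```

Keeping the decision in one function with no I/O makes each row of the table straightforward to unit test.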

CLI Changes

tuist test (Generated Xcode Projects)

Before: Quarantined tests are added to skipTestTargets and excluded from the generated test plan.

After:

  1. Fetch the quarantined test list from the server (same API call as today).
  2. Run all tests, including quarantined ones — do not add them to skip lists.
  3. After execution, parse the .xcresult bundle to identify which tests failed.
  4. Cross-reference failures against the quarantined test list.
  5. Upload results to the server with original pass/fail statuses preserved. The CLI includes the list of quarantined test identifiers in the upload payload so the server knows which test failures should be excluded from the overall run verdict, without altering the recorded test result itself.
  6. Determine exit code:
    • If all failures are from quarantined tests → exit 0.
    • If any non-quarantined test fails → preserve the original non-zero exit code.
  7. Print a summary distinguishing quarantined failures from real failures:
Testing complete.

✓ 142 tests passed
✗ 3 tests failed (quarantined — not blocking CI):
  · MyModule/LoginTests/test_login_with_expired_token
  · MyModule/LoginTests/test_login_with_invalid_credentials
  · MyModule/NetworkTests/test_retry_on_timeout

All non-quarantined tests passed. Exiting with code 0.
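
Step 4 (cross-referencing failures against the quarantined list) could look roughly like the sketch below. FailureBreakdown and partitionFailures are hypothetical names, Kotlin is used only to match the Gradle snippet in this RFC, and matching on the Module/Class/test identifier shown in the summary is an assumption about how tests are keyed.

```kotlin
// Hypothetical result type: failures split by quarantine status, in run order.
data class FailureBreakdown(val quarantined: List<String>, val real: List<String>)

// Split the failed test identifiers into quarantined and real failures.
fun partitionFailures(
    failedTests: List<String>,
    quarantinedIds: Set<String>
): FailureBreakdown {
    val (quarantined, real) = failedTests.partition { it in quarantinedIds }
    return FailureBreakdown(quarantined, real)
}
```

The two lists feed the summary in step 7, and an empty real list is the condition for exiting with 0 in step 6.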

tuist xcodebuild test (Vanilla Xcode Projects)

Same approach as tuist test:

  1. Fetch quarantined test list.
  2. Pass through to xcodebuild test without adding skip filters.
  3. After xcodebuild completes, inspect the exit code and .xcresult.
  4. If xcodebuild returned a failure:
    • Parse the .xcresult to identify which tests failed.
    • If all failures are quarantined → override exit code to 0.
    • Otherwise → preserve the non-zero exit code.
  5. Upload results with original pass/fail statuses preserved, including the list of quarantined test identifiers so the server can distinguish which failures to exclude from the run verdict.
  6. Print the quarantine summary (same format as above).

Xcode Logs and .xcresult Caveat

This is the main trade-off for the Xcode path. When quarantined tests fail:

  • Xcode’s build log will show red failure markers for those tests.
  • The .xcresult bundle will contain test failures.
  • Any Xcode-native CI integrations (e.g., Xcode Cloud, GitHub checks that parse .xcresult) will see failures.

Mitigation strategies:

  1. CLI summary output (primary): The tuist test / tuist xcodebuild test output clearly distinguishes quarantined failures from real failures, so developers reading CI logs see the correct status.
  2. Exit code override (primary): CI systems that rely on exit codes (most of them) will see the correct pass/fail status.
  3. Future enhancement: We could explore post-processing the .xcresult to annotate quarantined tests, but this is out of scope for this RFC, primarily because Xcode doesn’t have a notion of quarantined tests.

Gradle Plugin Changes

Current Behavior

The Gradle plugin uses testTask.filter.excludeTestsMatching(pattern) to skip quarantined tests entirely.

New Behavior

  1. Remove exclusion filters — quarantined tests run normally.
  2. Add a custom TestListener to intercept test results:
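import org.gradle.api.tasks.testing.TestDescriptor
import org.gradle.api.tasks.testing.TestListener
import org.gradle.api.tasks.testing.TestResult

testTask.addTestListener(object : TestListener {
    // Called once per test: sort each failure into quarantined or real.
    override fun afterTest(desc: TestDescriptor, result: TestResult) {
        if (result.resultType == TestResult.ResultType.FAILURE) {
            val testId = "${desc.className}.${desc.name}"
            if (quarantinedTests.contains(testId)) {
                quarantinedFailures.add(testId)
            } else {
                realFailures.add(testId)
            }
        }
    }

    // TestListener has no default implementations, so the other callbacks are no-ops.
    override fun beforeTest(desc: TestDescriptor) {}
    override fun beforeSuite(suite: TestDescriptor) {}
    override fun afterSuite(suite: TestDescriptor, result: TestResult) {}
})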
  3. Override build failure if only quarantined tests failed:

    • Set testTask.ignoreFailures = true when quarantine is active.
    • After test execution, if realFailures is non-empty, explicitly fail the build.
    • If only quarantinedFailures exist, let the build pass.
  4. Print a summary (analogous to the CLI summary):

Tuist: 3 quarantined test(s) failed (not blocking build):
  · com.example.LoginTest.testExpiredToken
  · com.example.NetworkTest.testRetryTimeout
  · com.example.NetworkTest.testConnectionDrop
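
The bookkeeping behind the Gradle steps above can be isolated from the Gradle API, as in the following sketch. QuarantineTracker and its methods are hypothetical names. In the plugin, recordFailure would presumably be called from afterTest and verify from an action that runs after the test task (for example doLast), throwing a GradleException. The sketch throws IllegalStateException instead so that it stays self-contained.

```kotlin
// Hypothetical helper: tracks failures and decides whether the build must fail.
class QuarantineTracker(private val quarantinedTests: Set<String>) {
    val quarantinedFailures = mutableListOf<String>()
    val realFailures = mutableListOf<String>()

    // Record one failed test, keyed as "<className>.<testName>" like the listener above.
    fun recordFailure(className: String?, testName: String) {
        val testId = "$className.$testName"
        if (testId in quarantinedTests) quarantinedFailures += testId else realFailures += testId
    }

    // With ignoreFailures enabled, the build only fails if this throws.
    fun verify() {
        if (realFailures.isNotEmpty()) {
            throw IllegalStateException(
                "${realFailures.size} non-quarantined test(s) failed: " + realFailures.joinToString()
            )
        }
    }
}
```

Separating the decision from the listener keeps the plugin wiring thin and lets the pass/fail logic be tested without running a Gradle build.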

Gradle Test Report Caveat

Similar to Xcode, the standard Gradle test report (HTML) will show quarantined tests as failures. The Tuist dashboard becomes the source of truth for actual test health. If Gradle’s TestListener API allows modifying results before report generation, we could explore marking quarantined failures as “skipped” in reports, but this needs investigation.

Dashboard Changes

Dashboard: Test Run Detail View

When viewing a specific test run that included quarantined tests:

  • Show quarantined test failures in a separate section from real failures.
  • Use distinct visual treatment (e.g., muted colors, “quarantined” badge).
  • Make it clear that these failures did not affect the CI outcome.
  • Surface test case runs as “Quarantined” in the test cases list.

Alternatives Considered

1. Keep Current Skip Behavior

Continue excluding quarantined tests from execution entirely. Rejected because:

  • No new test data is ever collected for quarantined tests, so there is no signal to determine if a test has been fixed.
  • Teams must manually unquarantine tests and hope they pass, or quarantined tests accumulate indefinitely.
  • This is the core problem this RFC aims to solve.

2. Make Quarantine Behavior Configurable (Skip vs. Run-Through)

Allow projects to choose between the current skip behavior and the new run-through behavior via a per-project setting. Rejected because:

  • Two distinct quarantine behaviors make the feature harder to reason about for both users and maintainers.
  • It increases the testing surface for the CLI, Gradle plugin, and dashboard — each would need to support both modes.

3. Modify .xcresult Post-Execution

After test execution, rewrite the .xcresult bundle to mark quarantined failures as passed or skipped. Rejected because:

  • .xcresult is a complex, undocumented binary format, so modifying it is fragile.
  • Risk of corrupting result data.

On board with the suggested behavior.