Summary
This RFC proposes adding comparison workflows to Tuist by combining existing data-fetching tools/commands with new MCP prompts and a CLI skill that guide AI agents through structured comparisons. No new server-side comparison endpoints are needed. The approach fills gaps in the current MCP tool surface, adds URL support to MCP tools, and layers comparison intelligence on top via prompts and skills.
Motivation
Tuist already collects rich data about builds, test runs, and bundles. Users and agents can inspect individual resources, but answering comparative questions like “Did this PR make the build slower?” or “Are there new test failures compared to main?” requires multi-step orchestration with no guidance. Prompts and skills solve this by teaching agents what to fetch, what to compare, and how to present findings.
Why Client-Side Comparison
I considered server-side `compare_*` endpoints. Client-side comparison is better because:
- Payloads are small. A build show or test run show response is ~300-500 tokens. Two side by side is ~1K, which is negligible.
- LLMs are good at diffing. Given two JSON objects and instructions, an LLM produces context-aware comparisons better than a rigid delta endpoint.
- Less server work. No new endpoints, controllers, or tests. Reuse existing infrastructure.
- Faster iteration. Updating a prompt or skill is a text change, not a deployment.
- Flexibility. Prompts can guide the agent to focus on whatever dimension matters for the user’s question.
If a specific comparison involves genuinely large payloads (e.g., diffing 1000+ test case results), we can add a targeted server endpoint later. Start without it.
Current Inventory
CLI Commands (all support --json)
| Command | Filters | Pagination |
|---|---|---|
| `tuist build list` | `--git-branch`, `--status`, `--scheme`, `--configuration`, `--tags`, `--values` | `--page`, `--page-size` |
| `tuist build show <id>` | - | - |
| `tuist test show <test-run-id>` | - | - |
| `tuist test case list` | `--flaky`, `--quarantined` | `--page`, `--page-size` |
| `tuist test case show <id-or-identifier>` | - | - |
| `tuist test case run list [id-or-identifier]` | `--flaky`, `--test-run-id` | `--page`, `--page-size` |
| `tuist test case run show <id>` | - | - |
| `tuist bundle list` | `--git-branch` | - |
| `tuist bundle show <id>` | - | - |
MCP Tools
| Tool | Parameters |
|---|---|
| `list_projects` | (none) |
| `list_test_cases` | `account_handle`, `project_handle`, `flaky`, `quarantined`, `module_name`, `name`, `suite_name`, `page`, `page_size` |
| `get_test_case` | `test_case_id` OR `account_handle` + `project_handle` + `identifier` |
| `get_test_run` | `test_run_id` |
| `get_test_case_run` | `test_case_run_id` |
MCP Prompts
| Prompt | Purpose |
|---|---|
| `fix_flaky_test` | Guides the agent through diagnosing and fixing a flaky test |
Structured Data Currently Tracked
Understanding the full data surface is critical for designing a granular interface. Here is everything Tuist stores.
Build Data
Build Run (top-level)
| Field | Type | Description |
|---|---|---|
| `id` | UUID | Unique identifier |
| `duration` | Int32 (ms) | Total build duration |
| `status` | Enum | `success`, `failure` |
| `category` | Enum | `clean`, `incremental` |
| `scheme` | String | Build scheme |
| `configuration` | String | Build configuration (Debug, Release, etc.) |
| `xcode_version` | String | Xcode version |
| `macos_version` | String | macOS version |
| `model_identifier` | String | Machine model (e.g., MacBookPro18,1) |
| `is_ci` | Boolean | Whether run on CI |
| `ci_provider` | Enum | `github`, `gitlab`, `bitrise`, `circleci`, `buildkite`, `codemagic` |
| `ci_run_id` | String | CI system run ID |
| `ci_project_handle` | String | CI project reference |
| `ci_host` | String | CI host URL |
| `git_branch` | String | Git branch |
| `git_commit_sha` | String | Git commit SHA |
| `git_ref` | String | Git reference/tag |
| `cacheable_tasks_count` | Int32 | Total cacheable compilation tasks |
| `cacheable_task_local_hits_count` | Int32 | Local cache hits |
| `cacheable_task_remote_hits_count` | Int32 | Remote cache hits |
| `custom_tags` | Array(String) | Up to 50 custom tags |
| `custom_values` | Map(String, String) | Up to 20 custom key-value pairs |
| `ran_by` | String | Account handle of who ran the build |
Build Targets (per-target metrics)
| Field | Type | Description |
|---|---|---|
| `name` | String | Target name |
| `project` | String | Project name (within monorepo) |
| `build_duration` | UInt64 (ms) | Total build duration for this target |
| `compilation_duration` | UInt64 (ms) | Compilation-only duration |
| `status` | Enum | `success`, `failure` |
Build Files (per-file compilation metrics)
| Field | Type | Description |
|---|---|---|
| `type` | Enum | `swift`, `c` |
| `target` | String | Parent target |
| `project` | String | Parent project |
| `path` | String | File path relative to project root |
| `compilation_duration` | UInt64 (ms) | Compilation time for this file |
Build Issues (errors and warnings)
| Field | Type | Description |
|---|---|---|
| `type` | Enum | `warning`, `error` |
| `target` | String | Target where issue occurred |
| `project` | String | Project where issue occurred |
| `title` | String | Issue title |
| `message` | String | Full message |
| `signature` | String | Deduplication signature |
| `path` | String | File path |
| `starting_line`, `ending_line` | UInt64 | Line range |
| `starting_column`, `ending_column` | UInt64 | Column range |
| `step_type` | Enum | `c_compilation`, `swift_compilation`, `script_execution`, `create_static_library`, `linker`, `copy_swift_libs`, `compile_assets_catalog`, `compile_storyboard`, `write_auxiliary_file`, `link_storyboards`, `copy_resource_file`, `merge_swift_module`, `xib_compilation`, `swift_aggregated_compilation`, `precompile_bridging_header`, `validate_embedded_binary`, `validate`, `other` |
Cacheable Tasks (per-task cache hit/miss)
| Field | Type | Description |
|---|---|---|
| `type` | Enum | `clang`, `swift` |
| `status` | Enum | `hit_local`, `hit_remote`, `miss` |
| `key` | String | Cache key |
| `read_duration` | Float64 (ms) | Time to read from cache |
| `write_duration` | Float64 (ms) | Time to write to cache |
| `description` | String | Task description |
| `cas_output_node_ids` | Array(String) | Linked CAS output nodes |
CAS Outputs (content-addressed storage operations)
| Field | Type | Description |
|---|---|---|
| `node_id` | String | CAS node ID |
| `checksum` | String | Content checksum |
| `size` | UInt64 (bytes) | Uncompressed size |
| `compressed_size` | UInt64 (bytes) | Compressed size |
| `duration` | UInt64 (ms) | Transfer duration |
| `operation` | Enum | `download`, `upload` |
| `type` | String | Content type (swift, object, pch, dSYM, swiftmodule, etc. – 45+ types) |
Test Data
Test Run (top-level)
| Field | Type | Description |
|---|---|---|
| `id` | UUID | Unique identifier |
| `duration` | Int32 (ms) | Total test run duration |
| `status` | Enum | `success`, `failure`, `skipped` |
| `is_ci` | Boolean | Whether run on CI |
| `is_flaky` | Boolean | Whether any test was flaky |
| `scheme` | String | Build scheme |
| `xcode_version` | String | Xcode version |
| `macos_version` | String | macOS version |
| `model_identifier` | String | Machine model |
| `git_branch` | String | Git branch |
| `git_commit_sha` | String | Git commit SHA |
| `git_ref` | String | Git reference |
| `build_run_id` | UUID | Associated build run |
| `ci_provider` | String | CI provider |
| `ci_run_id` | String | CI run ID |
| `total_test_count` | Int | Total test cases |
| `failed_test_count` | Int | Failed test cases |
| `flaky_test_count` | Int | Flaky test cases |
| `avg_test_duration` | Int | Average test case duration |
Test Module Run (per-module)
| Field | Type | Description |
|---|---|---|
| `name` | String | Module/target name |
| `status` | Enum | `success`, `failure` |
| `is_flaky` | Boolean | Whether any test in module was flaky |
| `duration` | Int32 (ms) | Total module duration |
| `test_suite_count` | Int32 | Number of test suites |
| `test_case_count` | Int32 | Total test cases |
| `avg_test_case_duration` | Int32 (ms) | Average test case duration |
Test Suite Run (per-suite)
| Field | Type | Description |
|---|---|---|
| `name` | String | Suite name |
| `status` | Enum | `success`, `failure`, `skipped` |
| `is_flaky` | Boolean | Whether any test in suite was flaky |
| `duration` | Int32 (ms) | Total suite duration |
| `test_case_count` | Int32 | Test cases in suite |
| `avg_test_case_duration` | Int32 (ms) | Average duration |
Test Case Run (per-test)
| Field | Type | Description |
|---|---|---|
| `name` | String | Test case name |
| `module_name` | String | Module/target |
| `suite_name` | String | Suite |
| `status` | Enum | `success`, `failure`, `skipped` |
| `is_flaky` | Boolean | Whether this run was flaky |
| `is_new` | Boolean | First time this test was seen |
| `is_ci` | Boolean | Ran on CI |
| `duration` | Int32 (ms) | Duration |
| `scheme` | String | Build scheme |
| `git_branch` | String | Git branch |
| `git_commit_sha` | String | Git commit SHA |
Test Case Failure
| Field | Type | Description |
|---|---|---|
| `message` | String | Failure message / assertion |
| `path` | String | File path |
| `line_number` | Int32 | Source line |
| `issue_type` | String | `error_thrown`, `assertion_failure`, `issue_recorded`, `unknown` |
Test Case Run Repetition (retry attempts)
| Field | Type | Description |
|---|---|---|
| `repetition_number` | Int32 | 1 = first run, 2 = retry 1, etc. |
| `name` | String | Human-readable (e.g., “Retry 1”) |
| `status` | String | `success`, `failure` |
| `duration` | Int32 (ms) | Duration of this attempt |
Test Case Run Attachment
| Field | Type | Description |
|---|---|---|
| `file_name` | String | Attachment filename (screenshots, logs, etc.) |
Crash Report
| Field | Type | Description |
|---|---|---|
| `exception_type` | String | e.g., “EXC_CRASH” |
| `signal` | String | e.g., “SIGABRT” |
| `exception_subtype` | String | Subtype |
| `triggered_thread_frames` | String | Formatted stack trace |
Test Case (deduplicated definition with latest metrics)
| Field | Type | Description |
|---|---|---|
| `name` | String | Test name |
| `module_name` | String | Module |
| `suite_name` | String | Suite |
| `last_status` | Enum | Last observed status |
| `last_duration` | Int32 (ms) | Duration of last run |
| `is_flaky` | Boolean | Currently flaky |
| `is_quarantined` | Boolean | Currently quarantined |
| `recent_durations` | Array(Int32) | Recent run durations for trends |
| `avg_duration` | Int64 | Average duration |
| `reliability_rate` | Float | Percentage of successful runs |
| `flakiness_rate` | Float | Percentage of flaky runs (last 30 days) |
| `total_runs` | Int | Lifetime run count |
| `failed_runs` | Int | Failed run count |
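To make the derived fields concrete, here is one plausible reading of `reliability_rate` in terms of the lifetime counters above, sketched in Python. The server-side definition may differ (e.g., in how skipped runs are counted), so treat this as illustrative only:

```python
def reliability_rate(total_runs: int, failed_runs: int) -> float:
    """Percentage of runs that succeeded, from lifetime counters.

    Illustrative sketch; assumes every non-failed run counts as
    successful, which may not match the server's exact definition.
    """
    if total_runs == 0:
        return 0.0
    return 100.0 * (total_runs - failed_runs) / total_runs


reliability_rate(200, 10)  # 95.0
```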
Bundle Data
Bundle (top-level)
| Field | Type | Description |
|---|---|---|
| `id` | UUID | Unique identifier |
| `name` | String | Bundle name |
| `app_bundle_id` | String | App bundle identifier |
| `version` | String | Version string |
| `type` | Enum | `ipa`, `app`, `xcarchive`, `aab`, `apk` |
| `supported_platforms` | Array(String) | iOS, Android, macOS, tvOS, watchOS, visionOS, simulators |
| `install_size` | Int (bytes) | Installed size |
| `download_size` | Int (bytes) | Download size (nullable) |
| `git_branch` | String | Git branch |
| `git_commit_sha` | String | Git commit SHA |
| `git_ref` | String | Git reference |
| `uploaded_by_account` | String | Account that uploaded |
Bundle Artifact (recursive tree)
| Field | Type | Description |
|---|---|---|
| `artifact_type` | String | Type of artifact |
| `path` | String | Relative path in bundle |
| `size` | Int (bytes) | Size |
| `shasum` | String | SHA checksum |
| `children` | Array | Recursive child artifacts |
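Because artifacts form a recursive tree, bundle-comparison summaries (e.g., per-platform or per-type size breakdowns) reduce to a tree walk. A hypothetical sketch in Python, using the field names from the table above:

```python
from collections import defaultdict


def size_by_type(artifact: dict, totals=None) -> dict:
    """Total leaf sizes per artifact_type across the recursive tree.

    Counting only leaves avoids double-counting container artifacts
    whose size already includes their children.
    """
    totals = totals if totals is not None else defaultdict(int)
    children = artifact.get("children") or []
    if not children:
        totals[artifact["artifact_type"]] += artifact["size"]
    for child in children:
        size_by_type(child, totals)
    return dict(totals)
```

An agent comparing two bundles could run this on both trees and diff the resulting maps to report which artifact types grew.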
What’s Needed
1. Granular CLI Commands and MCP Tools for Builds
The current tuist build show returns all build data in a single response. As we track more data, this becomes unwieldy for agents that only need a specific slice. Rather than changing build show (which would break existing scripts and integrations), we add dedicated subcommands that let agents and users query individual pieces of a build.
tuist build show and tuist build list remain unchanged – they continue to return the same data they do today. The new subcommands are purely additive.
New CLI Subcommands
| Command | Description | Key Filters |
|---|---|---|
| `tuist build target list <build-id>` | List targets in a build with per-target durations | `--status` |
| `tuist build target show <build-id> <target-name>` | Show detailed metrics for a single target | - |
| `tuist build file list <build-id>` | List files with compilation durations | `--target`, `--type`, `--sort-by` |
| `tuist build issue list <build-id>` | List build issues (errors/warnings) | `--type`, `--target`, `--step-type` |
| `tuist build cache list <build-id>` | List cacheable tasks with hit/miss status | `--status`, `--type` |
| `tuist build cas list <build-id>` | List CAS (content-addressed storage) operations | `--operation`, `--type` |
These subcommands return the same data that is nested inside tuist build show, but individually and with filtering. This is valuable for agents (which can progressively drill down without fetching everything) and for scripts that only care about one dimension (e.g., “give me just the errors”).
New MCP Tools for Builds
| New MCP Tool | Mirrors CLI Command | Key Parameters |
|---|---|---|
| `list_builds` | `tuist build list` | `account_handle`, `project_handle`, `git_branch`, `status`, `scheme`, `configuration`, `tags`, `values`, `page`, `page_size` |
| `get_build` | `tuist build show` | `build_run_id` (ID or URL) |
| `list_build_targets` | `tuist build target list` | `build_run_id`, `status` |
| `get_build_target` | `tuist build target show` | `build_run_id`, `target_name` |
| `list_build_files` | `tuist build file list` | `build_run_id`, `target`, `type`, `sort_by` |
| `list_build_issues` | `tuist build issue list` | `build_run_id`, `type`, `target`, `step_type` |
| `list_build_cache_tasks` | `tuist build cache list` | `build_run_id`, `status`, `type` |
| `list_build_cas_outputs` | `tuist build cas list` | `build_run_id`, `operation`, `type` |
`get_build` returns the full build response (same as `tuist build show` today, including nested data). The granular tools exist so agents can fetch individual slices with filtering when they don’t need the full payload:
- “Which targets took longest?” → `list_build_targets` sorted by duration
- “What files are slow to compile?” → `list_build_files --sort-by compilation_duration`
- “What errors happened?” → `list_build_issues --type error`
- “How is the cache performing?” → `list_build_cache_tasks`
- “How much data transferred?” → `list_build_cas_outputs`
2. Granular CLI Commands and MCP Tools for Tests
Similarly, test data should be navigable at each level of the hierarchy: run → module → suite → case. As with builds, `tuist test show` remains unchanged; the new subcommands are additive.
New CLI Subcommands
| Command | Description | Key Filters |
|---|---|---|
| `tuist test list` | List test runs | `--git-branch`, `--status`, `--scheme`, `--page`, `--page-size` |
| `tuist test module list <test-run-id>` | List module runs within a test run | `--status` |
| `tuist test module show <test-run-id> <module-name>` | Show a specific module run | - |
| `tuist test suite list <test-run-id>` | List suite runs within a test run | `--module`, `--status` |
| `tuist test suite show <test-run-id> <suite-name>` | Show a specific suite run | - |
Existing test case commands (tuist test case list, tuist test case show, tuist test case run list, tuist test case run show) remain as-is since they already follow the granular pattern.
New MCP Tools for Tests
| New MCP Tool | Mirrors CLI Command | Key Parameters |
|---|---|---|
| `list_test_runs` | `tuist test list` | `account_handle`, `project_handle`, `git_branch`, `status`, `scheme`, `page`, `page_size` |
| `list_test_module_runs` | `tuist test module list` | `test_run_id`, `status` |
| `get_test_module_run` | `tuist test module show` | `test_run_id`, `module_name` |
| `list_test_suite_runs` | `tuist test suite list` | `test_run_id`, `module_name`, `status` |
| `get_test_suite_run` | `tuist test suite show` | `test_run_id`, `suite_name` |
| `list_test_case_runs` | `tuist test case run list` | `account_handle`, `project_handle`, `test_case_id`, `test_run_id`, `flaky`, `page`, `page_size` |
3. Fill Remaining MCP Tool Gaps (Bundles)
| New MCP Tool | Mirrors CLI Command | Key Parameters |
|---|---|---|
| `list_bundles` | `tuist bundle list` | `account_handle`, `project_handle`, `git_branch`, `page`, `page_size` |
| `get_bundle` | `tuist bundle show` | `bundle_id` (ID or URL) |
4. Add URL Support to MCP Tools
MCP tools that accept a resource ID should also accept a Tuist dashboard URL. URLs follow these patterns:
https://tuist.dev/:account/:project/builds/build-runs/:id
https://tuist.dev/:account/:project/tests/test-runs/:id
https://tuist.dev/:account/:project/tests/test-cases/:id
https://tuist.dev/:account/:project/tests/test-cases/runs/:id
https://tuist.dev/:account/:project/bundles/:id
Implement a shared `Tuist.MCP.URLParser` module that pattern-matches on path segments and returns `{:ok, %{account_handle, project_handle, resource_type, resource_id}}`. Each tool checks whether the ID parameter starts with `https://` and parses it automatically.
The CLI does not need native URL parsing. The compare skill handles URL parsing in its instructions, extracting the ID and project handle before calling CLI commands. This avoids a cross-cutting change to every CLI command’s argument parser for a use case that’s primarily agent-driven.
5. Add Comparison Prompts (MCP) and Skill (CLI)
MCP prompts and a CLI skill guide agents through comparisons using existing and newly added tools/commands. They follow the same pattern as `fix_flaky_test` (MCP) and `fix-flaky-tests` (CLI skill).
Implicit Baseline
When only a single resource is provided (one URL, one ID, or “this build”), the prompt/skill instructs the agent to resolve a baseline automatically by fetching the latest equivalent resource on the repository’s default branch (typically main). For example, if the user provides a single build URL, the agent fetches the latest build on main with matching scheme/configuration as the baseline. This means users can say “how does this build look?” and get a comparison without specifying both sides.
`compare_builds` (MCP prompt) / builds section of `compare` skill (CLI)
Parameters: `base`, `head` (ID, URL, or branch; `head` defaults to the provided resource, `base` defaults to latest on `main`), `account_handle`, `project_handle`
Guidance covers:
- Resolving references (URLs go to `get_build` / `tuist build show` directly; branch names use `list_builds` / `tuist build list` with `page_size: 1`)
- Comparing top-level metrics: status, duration, cache hit rate, environment, category, git context, custom metadata
- If duration regressed: drill into `list_build_targets` / `tuist build target list` to find which targets slowed down
- If cache hit rate dropped: drill into `list_build_cache_tasks` / `tuist build cache list` to find new misses
- If errors increased: drill into `list_build_issues` / `tuist build issue list` to find new errors
- Only drill into per-file data (`list_build_files` / `tuist build file list`) if asked or if a specific target shows a large regression

Lead with regressions. Summarize improvements briefly.
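The prompt can pin down the delta framing it wants ("Duration increased by 12.0s (+26.7%)") so agents report consistently. A minimal sketch, assuming millisecond durations as in the build run table:

```python
def duration_delta(base_ms: int, head_ms: int) -> str:
    """Format a duration change the way the comparison summaries do.

    Hypothetical helper for illustration; a zero delta is reported
    as "improved by 0.0s" and would need a special case in practice.
    """
    diff = head_ms - base_ms
    pct = 100.0 * diff / base_ms
    verb = "increased" if diff > 0 else "improved"
    return f"Duration {verb} by {abs(diff) / 1000:.1f}s ({pct:+.1f}%)"


duration_delta(45000, 57000)  # "Duration increased by 12.0s (+26.7%)"
```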
`compare_test_runs` (MCP prompt) / test runs section of `compare` skill (CLI)
Parameters: `base`, `head` (ID, URL, or branch; `base` defaults to latest on `main`), `account_handle`, `project_handle`
Guidance covers:
- Resolving references
- Comparing top-level metrics: status, failure count, flaky count, test count, duration, environment
- If failures increased: drill into `list_test_module_runs` / `tuist test module list` to find which modules regressed, then `list_test_case_runs` filtered by `test_run_id` to identify new failures, then `get_test_case_run` / `tuist test case run show` for failure details (limit to the top 5)
- If flaky count increased: identify newly flaky tests and chain into `fix_flaky_test` / `fix-flaky-tests`
- If duration regressed: drill into `list_test_module_runs` to find slow modules, then `list_test_suite_runs` for slow suites
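The "identify new failures" step is a set difference over the two test runs' case results. An illustrative sketch, where the module/suite/name triple as identity key is an assumption based on the test case run fields listed earlier:

```python
def new_failures(base_runs: list[dict], head_runs: list[dict]) -> list[str]:
    """Tests failing in head that were not failing in base.

    base_runs/head_runs stand in for list_test_case_runs responses
    filtered by each run's test_run_id.
    """
    def failing(runs: list[dict]) -> set[str]:
        return {
            f"{r['module_name']}/{r['suite_name']}/{r['name']}"
            for r in runs
            if r["status"] == "failure"
        }
    return sorted(failing(head_runs) - failing(base_runs))
```

Running the difference in the other direction yields fixed tests, which the summary reports more briefly.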
compare_bundles (MCP prompt) / bundles section of compare skill (CLI)
Parameters: base, head (ID, URL, or branch; base defaults to latest on main), account_handle, project_handle
Guidance covers: resolving references, comparing total size, per-platform size, platform coverage changes, download vs install size delta.
compare_test_case (MCP prompt) / test case section of compare skill (CLI)
Parameters: test_case (ID, URL, or identifier), base_branch (defaults to main), head_branch, account_handle, project_handle
Guidance covers: fetching test case metrics, getting recent runs per branch via `list_test_case_runs` / `tuist test case run list` with a `git_branch` filter, comparing status patterns, flakiness, duration, and failure messages. If newly flaky, chain into `fix_flaky_test` / `fix-flaky-tests`.
Example Interactions
Single resource with implicit baseline
User: How does this build look?
https://tuist.dev/tuist/ios-app/builds/build-runs/def456
[Agent fetches the build, sees it's on feature/new-cache, scheme: Debug]
[Agent fetches latest build on main with matching scheme as baseline]
Agent: Compared against latest build on main:
Duration improved by 5.2s (-11%), from 47.1s to 41.9s.
Cache hit rate improved from 87% to 94%.
Everything else unchanged. Looks good.
Drilling into target-level data
User: Why is this build slower?
https://tuist.dev/tuist/ios-app/builds/build-runs/abc123
[Agent fetches the build, resolves baseline on main]
[Agent sees duration regressed by 12s]
[Agent calls list_build_targets for both builds]
Agent: Build duration increased by 12.0s (+26.7%).
Slowest regressions by target:
- NetworkKit: +8.2s (was 15.1s, now 23.3s)
- CoreUI: +3.5s (was 9.8s, now 13.3s)
NetworkKit had 3 cache misses that were hits on main.
Want me to check which files in NetworkKit are slow?
User: Yes
[Agent calls list_build_files --target NetworkKit --sort-by compilation_duration]
Agent: Top slow files in NetworkKit:
- APIClient.swift: 4.1s
- RequestBuilder.swift: 2.8s
- ResponseParser.swift: 1.3s
APIClient.swift alone accounts for half the regression.
Two dashboard URLs
User: Compare these two builds
https://tuist.dev/tuist/ios-app/builds/build-runs/abc123
https://tuist.dev/tuist/ios-app/builds/build-runs/def456
[Agent calls get_build for each URL in parallel]
Agent: Duration increased by 12.0s (+26.7%), from 45.0s to 57.0s.
Cache hit rate dropped from 92% to 87% (3 additional misses).
Xcode version changed from 16.2 to 16.3.
The duration regression correlates with the cache misses and
the Xcode upgrade. Want me to check which targets missed?
Branch-based test run comparison
User: Did my PR make the tests worse?
[Agent resolves current branch, fetches latest test run on each branch]
Agent: Comparing latest test run on feature/auth-refactor vs main:
2 new test failures:
- AuthTests/testTokenRefresh: "Timeout waiting for expectation"
- AuthTests/testSessionExpiry: "Expected true, got false"
1 test fixed: BuildTests/testCleanBuild
Overall duration improved by 30s (-9.4%).
Both new failures are in AuthTests. Want me to investigate?
Drilling into test module data
User: Which test modules got slower?
[Agent calls list_test_module_runs for both test runs]
Agent: Module duration changes:
- NetworkTests: +12.3s (was 25.1s, now 37.4s) -- 2 new failures
- UITests: +3.1s (was 18.5s, now 21.6s)
- CoreTests: -1.2s (was 14.3s, now 13.1s) -- improved
NetworkTests has the biggest regression. Want me to drill
into which suites or test cases are responsible?
Bundle comparison
User: How much bigger is the latest bundle on release/2.0 vs main?
[Agent fetches latest bundle on each branch]
Agent: Bundle size increased by 1.2MB (+2.6%), from 45.3MB to 46.5MB.
iOS: +0.8MB, macOS: +0.4MB
watchOS platform was added on release/2.0.
Chaining into fix-flaky-tests
Agent: Test run comparison found 2 newly flaky tests:
- NetworkTests/testRetry
- CacheTests/testEviction
Want me to investigate and fix these?
User: Yes
[Agent chains into fix-flaky-tests skill/prompt for each test]
Implementation Plan
Phase 1: Granular Build Data Access
- Add server API endpoints for build sub-resources (targets, files, issues, cacheable tasks, CAS outputs)
- Add CLI subcommands: `tuist build target list|show`, `tuist build file list`, `tuist build issue list`, `tuist build cache list`, `tuist build cas list`
- Add MCP tools: `list_builds`, `get_build`, `list_build_targets`, `get_build_target`, `list_build_files`, `list_build_issues`, `list_build_cache_tasks`, `list_build_cas_outputs`
- Existing `tuist build show` and `tuist build list` remain unchanged (no breaking changes)
Phase 2: Granular Test Data Access
- Add server API endpoints for test sub-resources (module runs, suite runs)
- Add CLI subcommands: `tuist test list`, `tuist test module list|show`, `tuist test suite list|show`
- Add MCP tools: `list_test_runs`, `list_test_module_runs`, `get_test_module_run`, `list_test_suite_runs`, `get_test_suite_run`, `list_test_case_runs`
- Existing `tuist test show` and test case commands remain unchanged (no breaking changes)
Phase 3: Bundle MCP Tools
- Add `list_bundles` and `get_bundle` MCP tools (reuse `bundles_controller.ex`)
Phase 4: URL Support in MCP
- Implement the `Tuist.MCP.URLParser` module
- Update all MCP tools that accept resource IDs to also accept dashboard URLs
Phase 5: Comparison Prompts and Skill
- Add MCP prompts: `compare_builds`, `compare_test_runs`, `compare_bundles`, `compare_test_case`
- Add the CLI `compare` skill (with URL parsing in the skill instructions)
- Document chaining with `fix-flaky-tests`
Open Questions
- `tuist test list` on CLI. The CLI has `tuist test show` but no `tuist test list`. Should we add it? It’s needed for branch-based test run comparison. (This RFC assumes yes.)
- Contextual selectors. Should tools/commands support `latest:main` syntax natively, or should prompts/skills handle the “list with `page_size` 1” pattern? The latter is more flexible but adds a round-trip.
- Pagination for build sub-resources. Builds with thousands of files or CAS outputs may need pagination on the sub-resource endpoints. Should we add pagination from the start, or start without it and add if needed?