Summary
This RFC proposes adding comparison workflows to Tuist by combining existing data-fetching tools/commands with new MCP prompts and a CLI skill that guide AI agents through structured comparisons. No new server-side comparison endpoints are needed. The approach fills gaps in the current MCP tool surface, adds URL support to MCP tools, and layers comparison intelligence on top via prompts and skills.
Motivation
Tuist already collects rich data about builds, test runs, and bundles. Users and agents can inspect individual resources, but answering comparative questions like “Did this PR make the build slower?” or “Are there new test failures compared to main?” requires multi-step orchestration with no guidance. Prompts and skills solve this by teaching agents what to fetch, what to compare, and how to present findings.
Why Client-Side Comparison
I considered server-side `compare_*` endpoints. Client-side comparison is better because:
- Payloads are small. A build show or test run show response is ~300-500 tokens. Two side by side is ~1K, which is negligible.
- LLMs are good at diffing. Given two JSON objects and instructions, an LLM produces context-aware comparisons better than a rigid delta endpoint.
- Less server work. No new endpoints, controllers, or tests. Reuse existing infrastructure.
- Faster iteration. Updating a prompt or skill is a text change, not a deployment.
- Flexibility. Prompts can guide the agent to focus on whatever dimension matters for the user’s question.
If a specific comparison involves genuinely large payloads (e.g., diffing 1000+ test case results), we can add a targeted server endpoint later. Start without it.
Current Inventory
CLI Commands (all support --json)
| Command | Filters | Pagination |
|---|---|---|
| `tuist build list` | `--git-branch`, `--status`, `--scheme`, `--configuration`, `--tags`, `--values` | `--page`, `--page-size` |
| `tuist build show <id>` | - | - |
| `tuist test show <test-run-id>` | - | - |
| `tuist test case list` | `--flaky`, `--quarantined` | `--page`, `--page-size` |
| `tuist test case show <id-or-identifier>` | - | - |
| `tuist test case run list [id-or-identifier]` | `--flaky`, `--test-run-id` | `--page`, `--page-size` |
| `tuist test case run show <id>` | - | - |
| `tuist bundle list` | `--git-branch` | - |
| `tuist bundle show <id>` | - | - |
MCP Tools
| Tool | Parameters |
|---|---|
| `list_projects` | (none) |
| `list_test_cases` | `account_handle`, `project_handle`, `flaky`, `quarantined`, `module_name`, `name`, `suite_name`, `page`, `page_size` |
| `get_test_case` | `test_case_id` OR `account_handle` + `project_handle` + `identifier` |
| `get_test_run` | `test_run_id` |
| `get_test_case_run` | `test_case_run_id` |
MCP Prompts
| Prompt | Purpose |
|---|---|
| `fix_flaky_test` | Guides the agent through diagnosing and fixing a flaky test |
Structured Data Currently Tracked
Understanding the full data surface is critical for designing a granular interface. Here is everything Tuist stores.
Build Data
Build Run (top-level)
| Field | Type | Description |
|---|---|---|
| `id` | UUID | Unique identifier |
| `duration` | Int32 (ms) | Total build duration |
| `status` | Enum | `success`, `failure` |
| `category` | Enum | `clean`, `incremental` |
| `scheme` | String | Build scheme |
| `configuration` | String | Build configuration (Debug, Release, etc.) |
| `xcode_version` | String | Xcode version |
| `macos_version` | String | macOS version |
| `model_identifier` | String | Machine model (e.g., MacBookPro18,1) |
| `is_ci` | Boolean | Whether run on CI |
| `ci_provider` | Enum | `github`, `gitlab`, `bitrise`, `circleci`, `buildkite`, `codemagic` |
| `ci_run_id` | String | CI system run ID |
| `ci_project_handle` | String | CI project reference |
| `ci_host` | String | CI host URL |
| `git_branch` | String | Git branch |
| `git_commit_sha` | String | Git commit SHA |
| `git_ref` | String | Git reference/tag |
| `cacheable_tasks_count` | Int32 | Total cacheable compilation tasks |
| `cacheable_task_local_hits_count` | Int32 | Local cache hits |
| `cacheable_task_remote_hits_count` | Int32 | Remote cache hits |
| `custom_tags` | Array(String) | Up to 50 custom tags |
| `custom_values` | Map(String, String) | Up to 20 custom key-value pairs |
| `ran_by` | String | Account handle of who ran the build |
Build Targets (per-target metrics)
| Field | Type | Description |
|---|---|---|
| `name` | String | Target name |
| `project` | String | Project name (within monorepo) |
| `build_duration` | UInt64 (ms) | Total build duration for this target |
| `compilation_duration` | UInt64 (ms) | Compilation-only duration |
| `status` | Enum | `success`, `failure` |
Build Files (per-file compilation metrics)
| Field | Type | Description |
|---|---|---|
| `type` | Enum | `swift`, `c` |
| `target` | String | Parent target |
| `project` | String | Parent project |
| `path` | String | File path relative to project root |
| `compilation_duration` | UInt64 (ms) | Compilation time for this file |
Build Issues (errors and warnings)
| Field | Type | Description |
|---|---|---|
| `type` | Enum | `warning`, `error` |
| `target` | String | Target where issue occurred |
| `project` | String | Project where issue occurred |
| `title` | String | Issue title |
| `message` | String | Full message |
| `signature` | String | Deduplication signature |
| `path` | String | File path |
| `starting_line`, `ending_line` | UInt64 | Line range |
| `starting_column`, `ending_column` | UInt64 | Column range |
| `step_type` | Enum | `c_compilation`, `swift_compilation`, `script_execution`, `create_static_library`, `linker`, `copy_swift_libs`, `compile_assets_catalog`, `compile_storyboard`, `write_auxiliary_file`, `link_storyboards`, `copy_resource_file`, `merge_swift_module`, `xib_compilation`, `swift_aggregated_compilation`, `precompile_bridging_header`, `validate_embedded_binary`, `validate`, `other` |
Cacheable Tasks (per-task cache hit/miss)
| Field | Type | Description |
|---|---|---|
| `type` | Enum | `clang`, `swift` |
| `status` | Enum | `hit_local`, `hit_remote`, `miss` |
| `key` | String | Cache key |
| `read_duration` | Float64 (ms) | Time to read from cache |
| `write_duration` | Float64 (ms) | Time to write to cache |
| `description` | String | Task description |
| `cas_output_node_ids` | Array(String) | Linked CAS output nodes |
CAS Outputs (content-addressed storage operations)
| Field | Type | Description |
|---|---|---|
| `node_id` | String | CAS node ID |
| `checksum` | String | Content checksum |
| `size` | UInt64 (bytes) | Uncompressed size |
| `compressed_size` | UInt64 (bytes) | Compressed size |
| `duration` | UInt64 (ms) | Transfer duration |
| `operation` | Enum | `download`, `upload` |
| `type` | String | Content type (swift, object, pch, dSYM, swiftmodule, etc. – 45+ types) |
Test Data
Test Run (top-level)
| Field | Type | Description |
|---|---|---|
| `id` | UUID | Unique identifier |
| `duration` | Int32 (ms) | Total test run duration |
| `status` | Enum | `success`, `failure`, `skipped` |
| `is_ci` | Boolean | Whether run on CI |
| `is_flaky` | Boolean | Whether any test was flaky |
| `scheme` | String | Build scheme |
| `xcode_version` | String | Xcode version |
| `macos_version` | String | macOS version |
| `model_identifier` | String | Machine model |
| `git_branch` | String | Git branch |
| `git_commit_sha` | String | Git commit SHA |
| `git_ref` | String | Git reference |
| `build_run_id` | UUID | Associated build run |
| `ci_provider` | String | CI provider |
| `ci_run_id` | String | CI run ID |
| `total_test_count` | Int | Total test cases |
| `failed_test_count` | Int | Failed test cases |
| `flaky_test_count` | Int | Flaky test cases |
| `avg_test_duration` | Int | Average test case duration |
Test Module Run (per-module)
| Field | Type | Description |
|---|---|---|
| `name` | String | Module/target name |
| `status` | Enum | `success`, `failure` |
| `is_flaky` | Boolean | Whether any test in module was flaky |
| `duration` | Int32 (ms) | Total module duration |
| `test_suite_count` | Int32 | Number of test suites |
| `test_case_count` | Int32 | Total test cases |
| `avg_test_case_duration` | Int32 (ms) | Average test case duration |
Test Suite Run (per-suite)
| Field | Type | Description |
|---|---|---|
| `name` | String | Suite name |
| `status` | Enum | `success`, `failure`, `skipped` |
| `is_flaky` | Boolean | Whether any test in suite was flaky |
| `duration` | Int32 (ms) | Total suite duration |
| `test_case_count` | Int32 | Test cases in suite |
| `avg_test_case_duration` | Int32 (ms) | Average duration |
Test Case Run (per-test)
| Field | Type | Description |
|---|---|---|
| `name` | String | Test case name |
| `module_name` | String | Module/target |
| `suite_name` | String | Suite |
| `status` | Enum | `success`, `failure`, `skipped` |
| `is_flaky` | Boolean | Whether this run was flaky |
| `is_new` | Boolean | First time this test was seen |
| `is_ci` | Boolean | Ran on CI |
| `duration` | Int32 (ms) | Duration |
| `scheme` | String | Build scheme |
| `git_branch` | String | Git branch |
| `git_commit_sha` | String | Git commit SHA |
Test Case Failure
| Field | Type | Description |
|---|---|---|
| `message` | String | Failure message / assertion |
| `path` | String | File path |
| `line_number` | Int32 | Source line |
| `issue_type` | String | `error_thrown`, `assertion_failure`, `issue_recorded`, `unknown` |
Test Case Run Repetition (retry attempts)
| Field | Type | Description |
|---|---|---|
| `repetition_number` | Int32 | 1 = first run, 2 = retry 1, etc. |
| `name` | String | Human-readable (e.g., “Retry 1”) |
| `status` | String | `success`, `failure` |
| `duration` | Int32 (ms) | Duration of this attempt |
Test Case Run Attachment
| Field | Type | Description |
|---|---|---|
| `file_name` | String | Attachment filename (screenshots, logs, etc.) |
Crash Report
| Field | Type | Description |
|---|---|---|
| `exception_type` | String | e.g., “EXC_CRASH” |
| `signal` | String | e.g., “SIGABRT” |
| `exception_subtype` | String | Subtype |
| `triggered_thread_frames` | String | Formatted stack trace |
Test Case (deduplicated definition with latest metrics)
| Field | Type | Description |
|---|---|---|
| `name` | String | Test name |
| `module_name` | String | Module |
| `suite_name` | String | Suite |
| `last_status` | Enum | Last observed status |
| `last_duration` | Int32 (ms) | Duration of last run |
| `is_flaky` | Boolean | Currently flaky |
| `is_quarantined` | Boolean | Currently quarantined |
| `recent_durations` | Array(Int32) | Recent run durations for trends |
| `avg_duration` | Int64 | Average duration |
| `reliability_rate` | Float | Percentage of successful runs |
| `flakiness_rate` | Float | Percentage of flaky runs (last 30 days) |
| `total_runs` | Int | Lifetime run count |
| `failed_runs` | Int | Failed run count |
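To make the derived fields concrete, here is one plausible reading of `reliability_rate` in terms of the lifetime counters above, sketched in Python. The server-side definition may differ (e.g., in how skipped runs are counted), so treat this as illustrative only:

```python
def reliability_rate(total_runs: int, failed_runs: int) -> float:
    """Percentage of runs that succeeded, from lifetime counters.

    Illustrative sketch; assumes every non-failed run counts as
    successful, which may not match the server's exact definition.
    """
    if total_runs == 0:
        return 0.0
    return 100.0 * (total_runs - failed_runs) / total_runs


reliability_rate(200, 10)  # 95.0
```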
Bundle Data
Bundle (top-level)
| Field | Type | Description |
|---|---|---|
| `id` | UUID | Unique identifier |
| `name` | String | Bundle name |
| `app_bundle_id` | String | App bundle identifier |
| `version` | String | Version string |
| `type` | Enum | `ipa`, `app`, `xcarchive`, `aab`, `apk` |
| `supported_platforms` | Array(String) | iOS, Android, macOS, tvOS, watchOS, visionOS, simulators |
| `install_size` | Int (bytes) | Installed size |
| `download_size` | Int (bytes) | Download size (nullable) |
| `git_branch` | String | Git branch |
| `git_commit_sha` | String | Git commit SHA |
| `git_ref` | String | Git reference |
| `uploaded_by_account` | String | Account that uploaded |
Bundle Artifact (recursive tree)
| Field | Type | Description |
|---|---|---|
| `artifact_type` | String | Type of artifact |
| `path` | String | Relative path in bundle |
| `size` | Int (bytes) | Size |
| `shasum` | String | SHA checksum |
| `children` | Array | Recursive child artifacts |
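Because artifacts form a recursive tree, bundle-comparison summaries (e.g., per-platform or per-type size breakdowns) reduce to a tree walk. A hypothetical sketch in Python, using the field names from the table above:

```python
from collections import defaultdict


def size_by_type(artifact: dict, totals=None) -> dict:
    """Total leaf sizes per artifact_type across the recursive tree.

    Counting only leaves avoids double-counting container artifacts
    whose size already includes their children.
    """
    totals = totals if totals is not None else defaultdict(int)
    children = artifact.get("children") or []
    if not children:
        totals[artifact["artifact_type"]] += artifact["size"]
    for child in children:
        size_by_type(child, totals)
    return dict(totals)
```

An agent comparing two bundles could run this on both trees and diff the resulting maps to report which artifact types grew.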
What’s Needed
1. Granular CLI Commands and MCP Tools for Builds
The current tuist build show returns all build data in a single response. As we track more data, this becomes unwieldy for agents that only need a specific slice. Rather than changing build show (which would break existing scripts and integrations), we add dedicated subcommands that let agents and users query individual pieces of a build.
tuist build show and tuist build list remain unchanged – they continue to return the same data they do today. The new subcommands are purely additive.
New CLI Subcommands
| Command | Description | Key Filters |
|---|---|---|
| `tuist build target list <build-id>` | List targets in a build with per-target durations | `--status` |
| `tuist build target show <build-id> <target-name>` | Show detailed metrics for a single target | - |
| `tuist build file list <build-id>` | List files with compilation durations | `--target`, `--type`, `--sort-by` |
| `tuist build issue list <build-id>` | List build issues (errors/warnings) | `--type`, `--target`, `--step-type` |
| `tuist build cache list <build-id>` | List cacheable tasks with hit/miss status | `--status`, `--type` |
| `tuist build cas list <build-id>` | List CAS (content-addressed storage) operations | `--operation`, `--type` |
These subcommands return the same data that is nested inside tuist build show, but individually and with filtering. This is valuable for agents (which can progressively drill down without fetching everything) and for scripts that only care about one dimension (e.g., “give me just the errors”).
New MCP Tools for Builds
| New MCP Tool | Mirrors CLI Command | Key Parameters |
|---|---|---|
| `list_builds` | `tuist build list` | `account_handle`, `project_handle`, `git_branch`, `status`, `scheme`, `configuration`, `tags`, `values`, `page`, `page_size` |
| `get_build` | `tuist build show` | `build_run_id` (ID or URL) |
| `list_build_targets` | `tuist build target list` | `build_run_id`, `status` |
| `get_build_target` | `tuist build target show` | `build_run_id`, `target_name` |
| `list_build_files` | `tuist build file list` | `build_run_id`, `target`, `type`, `sort_by` |
| `list_build_issues` | `tuist build issue list` | `build_run_id`, `type`, `target`, `step_type` |
| `list_build_cache_tasks` | `tuist build cache list` | `build_run_id`, `status`, `type` |
| `list_build_cas_outputs` | `tuist build cas list` | `build_run_id`, `operation`, `type` |
`get_build` returns the full build response (same as `tuist build show` today, including nested data). The granular tools exist so agents can fetch individual slices with filtering when they don’t need the full payload:
- “Which targets took longest?” → `list_build_targets` sorted by duration
- “What files are slow to compile?” → `list_build_files --sort-by compilation_duration`
- “What errors happened?” → `list_build_issues --type error`
- “How is the cache performing?” → `list_build_cache_tasks`
- “How much data transferred?” → `list_build_cas_outputs`
2. Granular CLI Commands and MCP Tools for Tests
Similarly, test data should be navigable at each level of the hierarchy: run → module → suite → case. As with builds, `tuist test show` remains unchanged; the new subcommands are additive.
New CLI Subcommands
| Command | Description | Key Filters |
|---|---|---|
| `tuist test list` | List test runs | `--git-branch`, `--status`, `--scheme`, `--page`, `--page-size` |
| `tuist test module list <test-run-id>` | List module runs within a test run | `--status` |
| `tuist test module show <test-run-id> <module-name>` | Show a specific module run | - |
| `tuist test suite list <test-run-id>` | List suite runs within a test run | `--module`, `--status` |
| `tuist test suite show <test-run-id> <suite-name>` | Show a specific suite run | - |
Existing test case commands (tuist test case list, tuist test case show, tuist test case run list, tuist test case run show) remain as-is since they already follow the granular pattern.
New MCP Tools for Tests
| New MCP Tool | Mirrors CLI Command | Key Parameters |
|---|---|---|
| `list_test_runs` | `tuist test list` | `account_handle`, `project_handle`, `git_branch`, `status`, `scheme`, `page`, `page_size` |
| `list_test_module_runs` | `tuist test module list` | `test_run_id`, `status` |
| `get_test_module_run` | `tuist test module show` | `test_run_id`, `module_name` |
| `list_test_suite_runs` | `tuist test suite list` | `test_run_id`, `module_name`, `status` |
| `get_test_suite_run` | `tuist test suite show` | `test_run_id`, `suite_name` |
| `list_test_case_runs` | `tuist test case run list` | `account_handle`, `project_handle`, `test_case_id`, `test_run_id`, `flaky`, `page`, `page_size` |
3. Fill Remaining MCP Tool Gaps (Bundles)
| New MCP Tool | Mirrors CLI Command | Key Parameters |
|---|---|---|
| `list_bundles` | `tuist bundle list` | `account_handle`, `project_handle`, `git_branch`, `page`, `page_size` |
| `get_bundle` | `tuist bundle show` | `bundle_id` (ID or URL) |
4. Add URL Support to MCP Tools
MCP tools that accept a resource ID should also accept a Tuist dashboard URL. URLs follow these patterns:
https://tuist.dev/:account/:project/builds/build-runs/:id
https://tuist.dev/:account/:project/tests/test-runs/:id
https://tuist.dev/:account/:project/tests/test-cases/:id
https://tuist.dev/:account/:project/tests/test-cases/runs/:id
https://tuist.dev/:account/:project/bundles/:id
Implement a shared `Tuist.MCP.URLParser` module that pattern-matches on path segments and returns `{:ok, %{account_handle, project_handle, resource_type, resource_id}}`. Each tool checks whether the ID parameter starts with `https://` and parses it automatically.
The CLI does not need native URL parsing. The compare skill handles URL parsing in its instructions, extracting the ID and project handle before calling CLI commands. This avoids a cross-cutting change to every CLI command’s argument parser for a use case that’s primarily agent-driven.
5. Add Comparison Prompts (MCP) and Skill (CLI)
MCP prompts and a CLI skill guide agents through comparisons using existing and newly added tools/commands. They follow the same pattern as `fix_flaky_test` (MCP) and `fix-flaky-tests` (CLI skill).
Implicit Baseline
When only a single resource is provided (one URL, one ID, or “this build”), the prompt/skill instructs the agent to resolve a baseline automatically by fetching the latest equivalent resource on the repository’s default branch (typically main). For example, if the user provides a single build URL, the agent fetches the latest build on main with matching scheme/configuration as the baseline. This means users can say “how does this build look?” and get a comparison without specifying both sides.
`compare_builds` (MCP prompt) / builds section of `compare` skill (CLI)
Parameters: `base`, `head` (ID, URL, or branch; `head` defaults to the provided resource, `base` defaults to latest on `main`), `account_handle`, `project_handle`
Guidance covers:
- Resolving references (URLs go to `get_build` / `tuist build show` directly; branch names use `list_builds` / `tuist build list` with `page_size: 1`)
- Comparing top-level metrics: status, duration, cache hit rate, environment, category, git context, custom metadata
- If duration regressed: drill into `list_build_targets` / `tuist build target list` to find which targets slowed down
- If cache hit rate dropped: drill into `list_build_cache_tasks` / `tuist build cache list` to find new misses
- If errors increased: drill into `list_build_issues` / `tuist build issue list` to find new errors
- Only drill into per-file data (`list_build_files` / `tuist build file list`) if asked or if a specific target shows a large regression

Lead with regressions. Summarize improvements briefly.
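The prompt can pin down the delta framing it wants ("Duration increased by 12.0s (+26.7%)") so agents report consistently. A minimal sketch, assuming millisecond durations as in the build run table:

```python
def duration_delta(base_ms: int, head_ms: int) -> str:
    """Format a duration change the way the comparison summaries do.

    Hypothetical helper for illustration; a zero delta is reported
    as "improved by 0.0s" and would need a special case in practice.
    """
    diff = head_ms - base_ms
    pct = 100.0 * diff / base_ms
    verb = "increased" if diff > 0 else "improved"
    return f"Duration {verb} by {abs(diff) / 1000:.1f}s ({pct:+.1f}%)"


duration_delta(45000, 57000)  # "Duration increased by 12.0s (+26.7%)"
```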
`compare_test_runs` (MCP prompt) / test runs section of `compare` skill (CLI)
Parameters: `base`, `head` (ID, URL, or branch; `base` defaults to latest on `main`), `account_handle`, `project_handle`
Guidance covers:
- Resolving references
- Comparing top-level metrics: status, failure count, flaky count, test count, duration, environment
- If failures increased: drill into `list_test_module_runs` / `tuist test module list` to find which modules regressed, then `list_test_case_runs` filtered by `test_run_id` to identify new failures, then `get_test_case_run` / `tuist test case run show` for failure details (limit to the top 5)
- If flaky count increased: identify newly flaky tests and chain into `fix_flaky_test` / `fix-flaky-tests`
- If duration regressed: drill into `list_test_module_runs` to find slow modules, then `list_test_suite_runs` for slow suites
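The "identify new failures" step is a set difference over the two test runs' case results. An illustrative sketch, where the module/suite/name triple as identity key is an assumption based on the test case run fields listed earlier:

```python
def new_failures(base_runs: list[dict], head_runs: list[dict]) -> list[str]:
    """Tests failing in head that were not failing in base.

    base_runs/head_runs stand in for list_test_case_runs responses
    filtered by each run's test_run_id.
    """
    def failing(runs: list[dict]) -> set[str]:
        return {
            f"{r['module_name']}/{r['suite_name']}/{r['name']}"
            for r in runs
            if r["status"] == "failure"
        }
    return sorted(failing(head_runs) - failing(base_runs))
```

Running the difference in the other direction yields fixed tests, which the summary reports more briefly.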
compare_bundles (MCP prompt) / bundles section of compare skill (CLI)
Parameters: base, head (ID, URL, or branch; base defaults to latest on main), account_handle, project_handle
Guidance covers: resolving references, comparing total size, per-platform size, platform coverage changes, download vs install size delta.
compare_test_case (MCP prompt) / test case section of compare skill (CLI)
Parameters: test_case (ID, URL, or identifier), base_branch (defaults to main), head_branch, account_handle, project_handle
Guidance covers: fetching test case metrics, getting recent runs per branch via `list_test_case_runs` / `tuist test case run list` with a `git_branch` filter, comparing status patterns, flakiness, duration, and failure messages. If newly flaky, chain into `fix_flaky_test` / `fix-flaky-tests`.
Example Interactions
Single resource with implicit baseline
User: How does this build look?
https://tuist.dev/tuist/ios-app/builds/build-runs/def456
[Agent fetches the build, sees it's on feature/new-cache, scheme: Debug]
[Agent fetches latest build on main with matching scheme as baseline]
Agent: Compared against latest build on main:
Duration improved by 5.2s (-11%), from 47.1s to 41.9s.
Cache hit rate improved from 87% to 94%.
Everything else unchanged. Looks good.
Drilling into target-level data
User: Why is this build slower?
https://tuist.dev/tuist/ios-app/builds/build-runs/abc123
[Agent fetches the build, resolves baseline on main]
[Agent sees duration regressed by 12s]
[Agent calls list_build_targets for both builds]
Agent: Build duration increased by 12.0s (+26.7%).
Slowest regressions by target:
- NetworkKit: +8.2s (was 15.1s, now 23.3s)
- CoreUI: +3.5s (was 9.8s, now 13.3s)
NetworkKit had 3 cache misses that were hits on main.
Want me to check which files in NetworkKit are slow?
User: Yes
[Agent calls list_build_files --target NetworkKit --sort-by compilation_duration]
Agent: Top slow files in NetworkKit:
- APIClient.swift: 4.1s
- RequestBuilder.swift: 2.8s
- ResponseParser.swift: 1.3s
APIClient.swift alone accounts for half the regression.
Two dashboard URLs
User: Compare these two builds
https://tuist.dev/tuist/ios-app/builds/build-runs/abc123
https://tuist.dev/tuist/ios-app/builds/build-runs/def456
[Agent calls get_build for each URL in parallel]
Agent: Duration increased by 12.0s (+26.7%), from 45.0s to 57.0s.
Cache hit rate dropped from 92% to 87% (3 additional misses).
Xcode version changed from 16.2 to 16.3.
The duration regression correlates with the cache misses and
the Xcode upgrade. Want me to check which targets missed?
Branch-based test run comparison
User: Did my PR make the tests worse?
[Agent resolves current branch, fetches latest test run on each branch]
Agent: Comparing latest test run on feature/auth-refactor vs main:
2 new test failures:
- AuthTests/testTokenRefresh: "Timeout waiting for expectation"
- AuthTests/testSessionExpiry: "Expected true, got false"
1 test fixed: BuildTests/testCleanBuild
Overall duration improved by 30s (-9.4%).
Both new failures are in AuthTests. Want me to investigate?
Drilling into test module data
User: Which test modules got slower?
[Agent calls list_test_module_runs for both test runs]
Agent: Module duration changes:
- NetworkTests: +12.3s (was 25.1s, now 37.4s) -- 2 new failures
- UITests: +3.1s (was 18.5s, now 21.6s)
- CoreTests: -1.2s (was 14.3s, now 13.1s) -- improved
NetworkTests has the biggest regression. Want me to drill
into which suites or test cases are responsible?
Bundle comparison
User: How much bigger is the latest bundle on release/2.0 vs main?
[Agent fetches latest bundle on each branch]
Agent: Bundle size increased by 1.2MB (+2.6%), from 45.3MB to 46.5MB.
iOS: +0.8MB, macOS: +0.4MB
watchOS platform was added on release/2.0.
Chaining into fix-flaky-tests
Agent: Test run comparison found 2 newly flaky tests:
- NetworkTests/testRetry
- CacheTests/testEviction
Want me to investigate and fix these?
User: Yes
[Agent chains into fix-flaky-tests skill/prompt for each test]
Implementation Plan
Phase 1: Granular Build Data Access
- Add server API endpoints for build sub-resources (targets, files, issues, cacheable tasks, CAS outputs)
- Add CLI subcommands: `tuist build target list|show`, `tuist build file list`, `tuist build issue list`, `tuist build cache list`, `tuist build cas list`
- Add MCP tools: `list_builds`, `get_build`, `list_build_targets`, `get_build_target`, `list_build_files`, `list_build_issues`, `list_build_cache_tasks`, `list_build_cas_outputs`
- Existing `tuist build show` and `tuist build list` remain unchanged (no breaking changes)
Phase 2: Granular Test Data Access
- Add server API endpoints for test sub-resources (module runs, suite runs)
- Add CLI subcommands: `tuist test list`, `tuist test module list|show`, `tuist test suite list|show`
- Add MCP tools: `list_test_runs`, `list_test_module_runs`, `get_test_module_run`, `list_test_suite_runs`, `get_test_suite_run`, `list_test_case_runs`
- Existing `tuist test show` and test case commands remain unchanged (no breaking changes)
Phase 3: Bundle MCP Tools
- Add `list_bundles` and `get_bundle` MCP tools (reuse `bundles_controller.ex`)
Phase 4: URL Support in MCP
- Implement the `Tuist.MCP.URLParser` module
- Update all MCP tools that accept resource IDs to also accept dashboard URLs
Phase 5: Comparison Prompts and Skill
- Add MCP prompts: `compare_builds`, `compare_test_runs`, `compare_bundles`, `compare_test_case`
- Add the CLI `compare` skill (with URL parsing in the skill instructions)
- Document chaining with `fix-flaky-tests`
Open Questions
- `tuist test list` on CLI. The CLI has `tuist test show` but no `tuist test list`. Should we add it? It’s needed for branch-based test run comparison. (This RFC assumes yes.)
- Contextual selectors. Should tools/commands support `latest:main` syntax natively, or should prompts/skills handle the “list with `page_size` 1” pattern? The latter is more flexible but adds a round-trip.
- Pagination for build sub-resources. Builds with thousands of files or CAS outputs may need pagination on the sub-resource endpoints. Should we add pagination from the start, or start without it and add if needed?