We’re getting close to unlocking our first Tuist QA workflow:
- PR is created with your changes
- `tuist share` is run as part of the CI
- You run the Tuist QA by adding a comment such as `/tuist qa Test my featureX`
- Tuist spins up a new agent that tests the Preview based on the prompt
- Once the agent is finished, Tuist posts the QA run summary, including:
  - what was tested
  - issues found
  - screenshots as it tests the functionality
While we will initially spend a lot of time improving the core functionality (improving the base prompt, iterating on the output, etc.), we also think it’s good to start thinking ahead. I will outline a couple of features that I feel would make Tuist QA significantly better – but I’m happy to hear your ideas about what you’d find useful.
App context
There is some context that, while not required, will make the agent more efficient across runs.
App description
The larger the app, the more difficult it will be for the agent to reliably test a feature based on the testing prompt. Features are named in specific ways, and they can be buried deep inside the app, making them hard for the agent to find. We will need to explore which techniques lead to the best results, but initially, I’m thinking of two different pieces of app context:
- Overall description of the app – what’s it used for, how individual app domains are called, etc.
- Detailed description of where features are located
We can try generating both with AI, but it might be better if the former is manually written instead.
The latter, however, should be generated by an agent: it would start from the human-supplied description of the app and then go ahead and explore. Based on the exploration, it would provide a summary of how to best navigate the app and what all of its features are. This exploration run would be quite long for large apps, but it would make subsequent QA runs far more efficient. I see this exploration being done regularly, such as once per day, to keep the description relatively fresh.
Log in data
Even though the agent could probably figure out how to sign up and sign in on every run, this is definitely not optimal. We’ll need a way to specify the login credentials and how to use them (such as “Sign in by email, username: xx, password: 12345”).
We can also explore whether we can supply the credentials from the command line instead of having the agent re-run the same sign-in flow over and over again.
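One way to keep credentials out of the prompt itself would be to read them from the environment and build the sign-in instruction for the agent from there. This is only a sketch; the environment variable names are hypothetical, not a shipped interface:

```python
import os

def login_instruction(method: str = "email") -> str:
    """Build a sign-in instruction for the agent from environment
    variables instead of hard-coding credentials in the prompt.
    The TUIST_QA_* variable names are hypothetical."""
    username = os.environ.get("TUIST_QA_USERNAME", "")
    password = os.environ.get("TUIST_QA_PASSWORD", "")
    if not username or not password:
        raise ValueError("QA credentials are not configured")
    return f"Sign in by {method}, username: {username}, password: {password}"

# Example usage with dummy values:
os.environ["TUIST_QA_USERNAME"] = "qa@example.com"
os.environ["TUIST_QA_PASSWORD"] = "12345"
print(login_instruction())
```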
Triggering QA outside of PRs
While our initial focus is on triggering Tuist QA from the PR, we definitely see the value in also triggering QA directly from the Tuist dashboard or from the Tuist app.
Tuist Previews should have a button to trigger Tuist QA along with the prompt. Additionally, each Preview detail should link to Tuist QA runs associated with that preview.
Tuist QA insights
We should have a page similar to our Previews or Bundles where we:
- List all QA runs as they happen
- Tuist QA analytics for a given time frame:
- Number of runs
- Average time it takes to run Tuist QA
- App issues found
- …? Which time-frame analytics will be useful will become more obvious once teams start using Tuist QA more actively
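As a sketch, the time-frame aggregates above could be computed from raw run records along these lines (the record shape is an assumption, not an existing API):

```python
from datetime import timedelta

# Hypothetical run records for a given time frame: (duration, issues found).
runs = [
    (timedelta(minutes=4), 0),
    (timedelta(minutes=6), 2),
    (timedelta(minutes=5), 1),
]

number_of_runs = len(runs)
# Summing timedeltas needs an explicit timedelta() start value.
average_duration = sum((d for d, _ in runs), timedelta()) / number_of_runs
issues_found = sum(issues for _, issues in runs)

print(number_of_runs, average_duration, issues_found)  # → 3 0:05:00 3
```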
Tuist QA detail
We should start with surfacing basic metadata about the QA run:
- preview that was tested
- triggered by
- started at
- duration
- …
We should also surface the same information we’re surfacing in the GitHub PR:
- Summary
- Steps taken
- Screenshots taken
Agent replay
Once we surface the most important information from the agent, the next step will be an agent replay: a chat-like replay including artifacts like screenshots and the majority of the agent’s logs as it navigates the app.
This will help folks understand better how exactly the agent tested the app and debug their prompts.
Additionally, we should record the simulator screen and show the recording alongside the replay, so you can always see exactly what was on the screen while the agent was testing the app.
Live sessions
This is not too different from the agent replay – but we’d be streaming what the agent is doing live, including the simulator screen.
Human-in-the-loop
This is definitely more long-term, but once we can show what the agent is doing live, we can also let a human redirect the agent as it’s running the tests. This is particularly useful for understanding which prompts work best, or when the feature has a wide scope and the testing is more exploratory.
Automatically triggering Tuist QA
Another step will be automatically triggering the Tuist QA.
Deriving what to test based on the PR description
If the PR includes a description with a summary of what the feature adds, we can run Tuist QA automatically without engineers specifying a prompt. We can either take a conventional approach to what should be included in the PR description (such as a `## Tuist QA instructions` section) or derive it with AI. In the latter case, we would trigger Tuist QA only if the AI had a high degree of certainty that the PR description describes well enough what should be tested.
Common scenarios
We see the first iteration of Tuist QA as focused on testing PR-specific changes, without re-running a given set of tests on every PR or merge to main. But we think Tuist QA could eventually be run more often for more repetitive tasks as well, so teams don’t have to maintain their own UI test suite – and these tests could easily be written by non-engineers, too. This is something I’d leave for later, once we nail the more dynamic UI test replacement with our first iteration.
Gathering more information from runs
If there’s an issue, it’s important to provide as much context as possible. Right now, we’re limiting ourselves to the agent interactions and screenshots, but we can certainly expand this to:
- gathering network logs
- integrating with libraries like TCA to track the app state
- … and more
Feedback
This post highlights the overall direction of Tuist QA. We’re still early, so a lot of this is subject to change – and the space is moving fast.
If there are specific areas of Tuist QA that you’d like us to explore, we’d love to hear those.
Overall, any feedback is appreciated!