Improve Tuist's performance

Tuist, and especially tuist generate, is run often, so we need to ensure the command is fast. On top of that, we should put checks in place so that performance does not deteriorate again in the future.

About

  • Champion: @marekfort
  • Expected finish date: end of 2024

Why

Tuist relies on project generation to get rid of excessive merge conflicts by defining projects in Swift and builds on top of the project generation to provide extra functionality such as caching.

But to work with a Tuist project, developers need to generate Xcode projects often. On warm runs, this action can take up to 10 seconds in large projects – this performance is subpar, to say the least.

Goals

The goal is to cut the generation time by 50 to 80 %. Additionally, we should track command run times in our public dashboard to ensure we don't regress performance again in the future. That dashboard will surface only anonymous telemetry data from users signed up on the Tuist server, as we don't have any tracking otherwise.

Initial measurements

The baseline measurements on my MacBook M3 Pro with 36 GB of RAM are the following.

Fixture

The first point of measurement is a fixture generated by the tuistfixturegenerator command. The tested project was generated with tuistfixturegenerator --projects 10 --targets 30 --sources 300.

Using the hyperfine command, we got the following results:

# Cold runs
Benchmark 1: tuist generate --no-open
  Time (mean ± σ):      3.630 s ±  0.162 s    [User: 23.107 s, System: 3.319 s]
  Range (min … max):    3.400 s …  3.812 s    10 runs

# Warm runs

Benchmark 1: tuist generate --no-open
  Time (mean ± σ):      3.268 s ±  0.281 s    [User: 24.825 s, System: 2.128 s]
  Range (min … max):    2.895 s …  3.574 s    10 runs

Real-world project

We have access to a real-world project that we unfortunately can't share. The initial measurements are the following:

# Cold runs
Benchmark 1: tuist generate --no-open
  Time (mean ± σ):     12.006 s ±  0.181 s    [User: 21.492 s, System: 4.961 s]
  Range (min … max):   11.849 s … 12.471 s    10 runs

# Warm runs
Benchmark 1: tuist generate --no-open
  Time (mean ± σ):      6.354 s ±  0.244 s    [User: 10.129 s, System: 2.629 s]
  Range (min … max):    6.096 s …  6.816 s    10 runs

Proposed solution

After an initial investigation with Xcode Instruments, we found that we're not leveraging the CPU efficiently. The CPU sits unused for large stretches of time:

The ongoing migration to the FileSystem and Command utilities, which are built with Swift concurrency, will help us parallelize. Globbing and hashing files are the operations that stand to benefit most from parallelization. As part of this initiative, we aim to finish the migration and ensure we parallelize where it makes sense.
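The hashing side of this can be sketched with a Swift concurrency task group. `hashFiles` and the SHA-256 digest below are illustrative stand-ins, not Tuist's actual hashing implementation:

```swift
import CryptoKit
import Foundation

// A minimal sketch of hashing files in parallel with a task group.
// `hashFiles` is a hypothetical helper, not Tuist's actual API.
func hashFiles(at paths: [URL]) async throws -> [URL: String] {
    try await withThrowingTaskGroup(
        of: (URL, String).self,
        returning: [URL: String].self
    ) { group in
        for path in paths {
            // Each file is read and hashed in its own child task.
            group.addTask {
                let data = try Data(contentsOf: path)
                let hex = SHA256.hash(data: data)
                    .map { String(format: "%02x", $0) }
                    .joined()
                return (path, hex)
            }
        }
        // Collect results as the child tasks finish, in completion order.
        var hashes: [URL: String] = [:]
        for try await (path, hex) in group {
            hashes[path] = hex
        }
        return hashes
    }
}
```

Because the child tasks are independent, the runtime can spread the file I/O and hashing across cores instead of leaving the CPU idle.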

We will also optimize some of the slowest methods. For example, ConfigGenerator.swiftMacrosDerivedSettings is a blocking operation that accounts for ~4.7 % of the total time. Blocking operations like this have an outsized impact on overall performance.

By the end of this project, CPU usage should be significantly more efficient, with no long stretches of time when the CPU is underutilized. We will use the performance dashboard to track our progress.

We will continue updating this initiative as we dig deeper.


Hey, everyone!

This is the first update of this initiative.

Progress made

Performance dashboard

We added command run analytics to our public dashboard. The average generation time often reaches 10 to 20 seconds, which again highlights the importance of this initiative. We will keep monitoring the dashboard as we roll out performance improvements.

Parallelization of hashing

The profiler identified file hashing as one of the bottlenecks: hashing would take a long time while CPU usage stayed low. We fixed that by hashing files in parallel, improving performance by up to 15 % :tada:

Globs

Another bottleneck we identified was globbing. We migrated FileHandler.glob to the asynchronous FileSystem.glob. However, we don't expect to see any performance benefits yet, as issues we uncovered in the swift-glob library forced us to keep using the old implementation underneath. We will start working on fixing those issues, as we believe that moving to that library will significantly improve globbing performance.

What’s next

  • We will migrate AbsolutePath.glob to FileSystem.glob.
  • We will start working on fixing the issues in swift-glob.
  • Another bottleneck we uncovered is the allSwiftPluginExecutables method in our GraphTraverser. That method repeatedly walks each target's direct and transitive dependencies, which is most likely why it's slow. We will look into improving its performance.
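The usual fix for a traversal like this is to cache already visited nodes so each subtree is walked only once. A sketch of that idea, with `Target` as a simplified stand-in for Tuist's graph types:

```swift
// A hedged sketch of memoized transitive-dependency traversal.
// `Target` and the string-keyed lookup are illustrative, not Tuist's
// actual GraphTraverser types.
struct Target: Hashable {
    let name: String
    let dependencies: [String]
}

func transitiveDependencies(
    of name: String,
    in targets: [String: Target],
    cache: inout [String: Set<String>]
) -> Set<String> {
    // Return the memoized result if this node was already visited.
    if let cached = cache[name] { return cached }
    var result: Set<String> = []
    for dependency in targets[name]?.dependencies ?? [] {
        result.insert(dependency)
        result.formUnion(
            transitiveDependencies(of: dependency, in: targets, cache: &cache)
        )
    }
    cache[name] = result
    return result
}
```

Without the cache, a method called once per target revisits shared subtrees over and over, which matches the symptom described above.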

I had some time to put together this PR, which should hopefully help the performance of allSwiftPluginExecutables.


Another week, another set of performance improvements!

Progress made

Globs

As reported last week, we migrated to the swift-glob API, but we were not able to use the library just yet. We have now fixed the library's issues in our fork, which we plan to eventually upstream to the original repository.

The new globbing implementation resulted in an improvement of around 20 % on warm runs.

Graph traversal

@waltflanagan has improved the performance of allSwiftPluginExecutables by caching already visited nodes. Optimizing that single method resulted in a 7 % performance improvement!

Reading Swift tools version

Reading the Swift tools version by running swift package tools-version is slow, and the information is already included in the output produced by the swift package dump-package command. Reading the tools version from that output instead resulted in an extra 12 % performance improvement!
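Decoding the version from the dump could look roughly like this. The JSON shape assumed here (a `toolsVersion` object with a `_version` string) matches what swift package dump-package emits at the time of writing, but treat it as an assumption; `PackageDump` and `toolsVersion(fromDump:)` are hypothetical names:

```swift
import Foundation

// A sketch of extracting the tools version from `swift package dump-package`
// output instead of spawning `swift package tools-version` separately.
struct PackageDump: Decodable {
    struct ToolsVersion: Decodable {
        // Assumed key: dump-package nests the version under "_version".
        let _version: String
    }
    let toolsVersion: ToolsVersion
}

func toolsVersion(fromDump json: Data) throws -> String {
    try JSONDecoder().decode(PackageDump.self, from: json).toolsVersion._version
}
```

Since the dump is needed anyway to read the manifest, this saves one whole process invocation per package.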

What’s next

Overall, the warm tuist generate time in a real-world project has improved from around 6.5 seconds to 3.9 seconds, which is already a 40 % improvement.

We’re not stopping there!

We’ve identified the following methods where the CPU is not being utilized efficiently for long stretches of time:

Most of the operations these methods perform are necessary, but we can improve performance by running them in parallel. However, this will require some extra refactoring, as those methods work with XcodeProj classes and update them in place, which is not concurrency-safe.

We’ll need to rewrite those methods as pure functions and only update the XcodeProj classes from a single thread once we have gathered the results. Updating the XcodeProj classes with precomputed results should be fast.
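The shape of that refactor can be sketched as follows: compute results as Sendable value types in parallel, then apply them to the mutable model from one place. `ComputedSettings`, `MutableTarget`, and `computeSettings` are illustrative placeholders for the XcodeProj classes and generator methods:

```swift
// Sendable value type carrying precomputed results across tasks.
struct ComputedSettings: Sendable {
    let targetName: String
    let buildSettings: [String: String]
}

// Stand-in for a mutable, non-Sendable XcodeProj class.
final class MutableTarget {
    let name: String
    var buildSettings: [String: String] = [:]
    init(name: String) { self.name = name }
}

// Pure function: no shared mutable state is touched, so it can run
// concurrently. The settings it returns are illustrative.
func computeSettings(for name: String) -> ComputedSettings {
    ComputedSettings(targetName: name, buildSettings: ["SWIFT_VERSION": "5.10"])
}

func applySettings(to targets: [MutableTarget]) async {
    // Phase 1: compute everything in parallel over value types.
    let computed = await withTaskGroup(
        of: ComputedSettings.self,
        returning: [String: ComputedSettings].self
    ) { group in
        for target in targets {
            let name = target.name
            group.addTask { computeSettings(for: name) }
        }
        var results: [String: ComputedSettings] = [:]
        for await settings in group {
            results[settings.targetName] = settings
        }
        return results
    }
    // Phase 2: mutate the in-place model from a single thread.
    for target in targets {
        if let settings = computed[target.name] {
            target.buildSettings = settings.buildSettings
        }
    }
}
```

The expensive work happens in phase 1 where it parallelizes freely; phase 2 is just assignments, which is why applying precomputed results should be fast.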

I will be shifting my focus back to Tuist Previews for a bit, so I don’t know when I will be able to post the next update with those improvements. If there’s someone from the community who would like to give it a try, let me know! But rest assured, one way or another, we will get it done. With the parallelization of those methods, we should be able to achieve our goal of a 50 % performance improvement.


I’m curious whether we want to take the leap and use actor types in XcodeProj to make them thread-safe and more easily parallelizable. It would be a larger interface change to the dependency and would impact other consumer projects, but it may be a better approach than manually locking properties. Thoughts?

Yes, moving XcodeProj to value and actor types is something we should consider long-term, and I’m aligned with that. But it’s a major piece of work. I do think we can parallelize at the tuist/tuist level with smaller changes by precomputing results and then working with the shared XcodeProj classes on a single thread, before diving into the XcodeProj refactor itself.
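For context, the actor-based direction discussed above would look roughly like this. `ProjectStore` is a hypothetical wrapper, not XcodeProj's actual API:

```swift
// A minimal sketch of isolating a mutable project model behind an actor so
// concurrent generation code can touch it safely. Illustrative only; the
// real XcodeProj model is far richer than a settings dictionary.
actor ProjectStore {
    private var buildSettings: [String: [String: String]] = [:]

    // Callers from any task serialize through the actor's executor.
    func setSettings(_ settings: [String: String], forTarget target: String) {
        buildSettings[target] = settings
    }

    func settings(forTarget target: String) -> [String: String]? {
        buildSettings[target]
    }
}
```

The trade-off is exactly the one raised above: every access becomes an await, which changes the public interface for all consumers, whereas the precompute-then-apply approach keeps XcodeProj's synchronous API intact.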