Flag to disable caching Manifests during Project generation

Why is this needed?

We have encountered a problem with unnecessary manifest caching on CI.
Each build on CI gets a new directory/workspace and a different set of environment variables, so the matching manifest cache is never found in the ~/.cache/tuist directory. The manifests are therefore compiled on every build, and each new cache entry is stored in ~/.cache/tuist, which gradually grows.

I know that the cache directory can be overridden using the environment variable XDG_CACHE_HOME, but we do not want to disable the entire cache. We still need to share the dependency cache.

The tuist clean manifests command is unsafe in a multi-process scenario.
There may be several builds running in parallel on a single CI node, which can lead to unexpected behavior if the cache is deleted during project generation. We have encountered this problem in practice when using this approach.

Proposal

A new flag --no-manifests-cache could be added, which would not cache Manifests and ProjectDescriptionHelpers when generating a project:

tuist generate --no-manifests-cache

:wave: @Ernest0N

I’m not sure if I follow the problem here. Is the problem that:

  • The cache is not re-used across builds
  • The cache grows indefinitely and that’s concerning
  • The cache leads to non-deterministic behaviours in Tuist

Would you mind expanding more on the problem/challenge?

I believe this is not a problem that we should be solving by adding an extra flag.

Have you considered cleaning the cache weekly during your company’s downtime?

Alternatively, you can use a custom XDG_CACHE_HOME and symlink the parts of the Tuist cache that you still want to be in the global cache directory.

Alternatively, you can use a custom XDG_CACHE_HOME and symlink the parts of the Tuist cache that you still want to be in the global cache directory.

Yes, that is what I ended up doing, but this solution feels like a workaround that relies too much on the internal logic of tuist, which could break in the future.

Something similar already happened when the .cache/tuist-cloud/Binaries folder was renamed to .cache/tuist/Binaries: we didn’t notice it right away, and it cost us a lot of time.


This hack also creates a different problem:
Other CLI tools also use this environment variable, which is why we can’t override it in mise.toml. We have to override it at the call site in each script where tuist is called.
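For readers following along, the workaround described above could be sketched roughly as below. This is only an illustration, not something Tuist provides: the `setup_build_cache` helper and the `.build-cache` directory name are hypothetical, and the layout `<cache home>/tuist/<Category>` is an assumption based on the `.cache/tuist/Binaries` path mentioned earlier.

```shell
#!/usr/bin/env bash
# Sketch of the workaround: a throwaway cache root per build, with the
# long-lived parts symlinked back to the shared cache.
# Assumption: cache categories live in <cache home>/tuist/<Category>.

setup_build_cache() {
  local shared_cache="$1"   # e.g. "$HOME/.cache"
  local build_cache="$2"    # e.g. "$PWD/.build-cache" (hypothetical name)
  shift 2                   # remaining arguments: categories to keep shared

  mkdir -p "$build_cache/tuist"
  local category
  for category in "$@"; do
    mkdir -p "$shared_cache/tuist/$category"
    # -n: replace an existing link instead of descending into it
    ln -sfn "$shared_cache/tuist/$category" "$build_cache/tuist/$category"
  done
}

# Usage at the call site (the variable is not exported, so other tools
# keep their own XDG_CACHE_HOME):
#   setup_build_cache "$HOME/.cache" "$PWD/.build-cache" Binaries
#   XDG_CACHE_HOME="$PWD/.build-cache" tuist generate
#   rm -rf "$PWD/.build-cache"   # removes the links and per-build data only
```

Because the override is set only on the `tuist` command line, it has to be repeated in every script that calls tuist, which is exactly the inconvenience described above.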

The cache is not re-used across builds

That’s right. Each new build for each new PR is a new unique directory with a cloned repository and a partially unique set of environment variable values.

The cache grows indefinitely and that’s concerning

That’s right. It’s concerning. We believe that the manifest cache, like DerivedData, should be stored within the project directory, not in the user directory, so that it can be deleted after the build is complete.

But we don’t mind the dependency cache remaining in the user directory, since in the scenario with external dependencies, their change/update/hash reinvalidation does not happen that often (that’s why the SPM and CocoaPods caches are located in the user directory by design).

The cache leads to non-deterministic behaviours in Tuist

This also happens sometimes, but it is not the real reason for this discussion.


At the moment, all the solutions that a tuist user can apply on their side are workarounds:

  • Clear cache once every N days
  • Override XDG_CACHE_HOME and substitute symbolic links (a kind of backpressure)

@Ernest0N what are the cacheable artifacts (e.g. manifests…) that you’d like to scope globally, and which ones per project?

@pepicrft

Globally:

  • Binaries
  • SelectiveTests

Locally:

  • EditProjects
  • ProjectDescriptionHelpers
  • Manifests

But a more flexible option (IMHO) would be to specify an ENV variable for each category:

TUIST_<CATEGORY>_CACHE_PATH

To maintain backward compatibility, you can reference xdg_cache_home by default, as mise does: Directory Structure | mise-en-place
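To make the idea concrete, here is a minimal sketch of how such a per-category lookup could resolve. Everything in it is an assumption for illustration: `TUIST_<CATEGORY>_CACHE_PATH` is only the proposal above and not an existing Tuist variable, the `resolve_category_cache` function is hypothetical, and the upper-casing rule and the `<cache home>/tuist/<Category>` default are guesses.

```shell
#!/usr/bin/env bash
# Hypothetical resolution of TUIST_<CATEGORY>_CACHE_PATH.
# This is a sketch of the proposal, not an existing Tuist feature.

resolve_category_cache() {
  local category="$1"   # e.g. Manifests, Binaries
  local upper
  upper="$(printf '%s' "$category" | tr '[:lower:]' '[:upper:]')"
  local var_name="TUIST_${upper}_CACHE_PATH"

  # Read the override from the environment, if it is exported.
  local override
  override="$(printenv "$var_name" || true)"

  if [ -n "$override" ]; then
    printf '%s\n' "$override"
  else
    # Backward-compatible default: <XDG cache home>/tuist/<Category>
    printf '%s\n' "${XDG_CACHE_HOME:-$HOME/.cache}/tuist/$category"
  fi
}
```

With this shape, a CI job could export `TUIST_MANIFESTS_CACHE_PATH` to a directory inside the workspace while Binaries keeps resolving to the shared cache.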

I think it’s a good idea. Thoughts @core? Connecting the above with the XDG Base Directory Specification, which Mise follows:

  • XDG_CACHE_HOME: For Tuist’s global state that’s not intended to be shared across machines. As you mentioned, that’s server-specific state that’s not intended to be shared.
  • XDG_DATA_HOME: For everything else that’s portable across environments.

The spec also defines XDG_STATE_HOME, for data that’s reusable across runs but not portable across environments (I believe we don’t have that scenario at Tuist), and XDG_CONFIG_HOME, for global configuration of Tuist, which we don’t support yet.

I wouldn’t go that far in granularity unless strictly necessary.

Let’s wait to see what other people think before we commit to implementing anything. I quite like how the XDG spec uses different variables depending on the portability of the data being stored in the directories.


granularity unless strictly necessary

Yes, that flexibility now may create difficulties in the future if it becomes necessary to implement a cache whose structure the user should not need to know.

  • XDG_DATA_HOME: For everything else that’s portable across environments.

I think this might get us closer to a more convenient solution, but I still see a problem in that we rely on the tool-independent XDG_* variables to override the values of a specific tool (in this case the tuist environment).

The XDG Base Directory Specification can be used by other tools as well (notably mise), and it might still be useful for us to be able to override the environment not for the entire infrastructure, but only for specific tools.

It seems to me that a more refined solution would be to implement your own XDG-based variables within the tuist environment namespace:

TUIST_CACHE_DIR (default: XDG_CACHE_HOME)
TUIST_DATA_DIR (default: XDG_DATA_HOME)

A similar approach is used in mise-en-place (if you open the link I provided earlier):

MISE_CACHE_DIR (default: XDG_CACHE_HOME)
MISE_DATA_DIR (default: XDG_DATA_HOME)
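The fallback chain could look roughly like the sketch below. `TUIST_CACHE_DIR` and `TUIST_DATA_DIR` are only the names proposed here and do not exist in Tuist today; the function names and the trailing `/tuist` subdirectory are assumptions. The `$HOME/.cache` and `$HOME/.local/share` fallbacks are the defaults the XDG Base Directory Specification prescribes when the XDG variables are unset.

```shell
#!/usr/bin/env bash
# Sketch of tool-scoped overrides that fall back to the XDG variables.
# TUIST_CACHE_DIR / TUIST_DATA_DIR are proposed names, not existing ones.

tuist_cache_dir() {
  # Tool-specific override first, then XDG_CACHE_HOME, then the spec default.
  printf '%s\n' "${TUIST_CACHE_DIR:-${XDG_CACHE_HOME:-$HOME/.cache}/tuist}"
}

tuist_data_dir() {
  # Tool-specific override first, then XDG_DATA_HOME, then the spec default.
  printf '%s\n' "${TUIST_DATA_DIR:-${XDG_DATA_HOME:-$HOME/.local/share}/tuist}"
}
```

The point of the design is that setting `TUIST_CACHE_DIR` affects only tuist, while other tools that read `XDG_CACHE_HOME` are left untouched.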

I’m very much on board with the plan. As I said, let’s see what others think and align before we move to any implementation.

I’m aligned with adding TUIST_CACHE_DIR that defaults to XDG_CACHE_HOME.

I don’t think that distinction is correct. Here is the XDG specification for these directories:

  • There is a single base directory relative to which user-specific data files should be written. This directory is defined by the environment variable $XDG_DATA_HOME.
  • There is a single base directory relative to which user-specific non-essential (cached) data should be written. This directory is defined by the environment variable $XDG_CACHE_HOME.

Putting selective tests and binaries into XDG_DATA_HOME doesn’t feel right to me, as both are cached data and not user-specific data files. User-specific data files would be, for example, credential files.

I’m not sure here. Many users believe the data is not user-specific. Hence many attempts to leverage CI caching APIs to copy data over across environments. However, the data is user-specific and tied to the environment where it got generated or pulled into (from the server), so establishing a distinction here might help eliminate some confusion and re-emphasize the fact that selective testing and binary cache artefacts are not shareable.

Selective tests and binaries are user-specific; what I meant is that they are not user-specific data files. They are both user-specific cache data, the same way manifests and helpers are.

So, putting selective tests and binaries into XDG_DATA_HOME means imho going against the XDG specification.