We have encountered a problem with unnecessary manifest caching on CI.
Each build on CI gets a new directory/workspace and a new set of environment variables, so the matching manifest cache is never found in the ~/.cache/tuist directory. As a result, manifests are compiled on every build, and each run adds new entries to ~/.cache/tuist, which gradually grows.
I know that the cache directory can be overridden using the environment variable XDG_CACHE_HOME, but we do not want to disable the entire cache. We still need to share the dependency cache.
The tuist clean manifests command is unsafe in a multi-process scenario.
There may be several builds running in parallel on a single CI node, which can lead to unexpected behavior if the cache is deleted during project generation. We have encountered this problem in practice when using this approach.
Proposal
A new flag, --no-manifests-cache, could be added so that Manifests and ProjectDescriptionHelpers are not cached when generating a project.
Alternatively, you can use a custom XDG_CACHE_HOME and symlink the parts of the Tuist cache that you still want to be in the global cache directory.
Yes, that is what I ended up doing, but this solution seems to me to be a workaround that relies too much on the internal logic of tuist, which could break in the future.
Something similar already happened when the .cache/tuist-cloud/Binaries folder was renamed to .cache/tuist/Binaries: we didn’t notice it right away and lost a lot of time.
This hack also creates a different problem:
Other CLI tools also use this environment variable, which is why we can’t override it in mise.toml. We have to override it at the call site in each script where tuist is called.
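For readers following along, the workaround described above could be sketched roughly as follows. This is a minimal Python sketch, not Tuist functionality: the helper name `isolated_cache_env` and the choice of `Binaries` as the only shared subdirectory are assumptions made for illustration. It creates a per-build cache root, symlinks the shared part back to the global cache, and returns an environment that overrides XDG_CACHE_HOME only for the tuist call, so other tools keep their own value.

```python
import os
import tempfile
from pathlib import Path
from typing import Dict, Optional, Sequence


def isolated_cache_env(
    global_cache: Path,
    build_cache: Optional[Path] = None,
    shared: Sequence[str] = ("Binaries",),
) -> Dict[str, str]:
    """Build an environment for a single tuist call (hypothetical helper).

    global_cache is the usual XDG cache root (for example ~/.cache);
    build_cache is a throwaway root that can be deleted after the build.
    """
    if build_cache is None:
        build_cache = Path(tempfile.mkdtemp(prefix="tuist-cache-"))

    tuist_dir = build_cache / "tuist"
    tuist_dir.mkdir(parents=True, exist_ok=True)

    for name in shared:
        # Assumption: the shared part lives at <global_cache>/tuist/<name>.
        target = global_cache / "tuist" / name
        target.mkdir(parents=True, exist_ok=True)
        link = tuist_dir / name
        if not os.path.lexists(link):
            link.symlink_to(target, target_is_directory=True)

    # Copy the current environment and override the variable for this call only.
    env = dict(os.environ)
    env["XDG_CACHE_HOME"] = str(build_cache)
    return env
```

A CI script could pass the returned mapping as `env=` to `subprocess.run(["tuist", "generate"], ...)` and remove the per-build directory afterwards. The sketch still depends on Tuist's internal cache layout, which is exactly the fragility described above.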
That’s right. Each new build for each new PR is a new unique directory with a cloned repository and a partially unique set of environment variable values.
The cache grows indefinitely and that’s concerning
That’s right. It’s concerning. We believe that the manifest cache, like DerivedData, should be stored within the project directory, not in the user directory, so that it can be deleted after the build is complete.
But we don’t mind the dependency cache remaining in the user directory, since external dependencies are not changed, updated, or re-hashed that often (that’s why the SPM and CocoaPods caches are located in the user directory by design).
The cache leads to non-deterministic behaviours in Tuist
This also happens sometimes, but it is not the real reason for this discussion.
At the moment, every solution a tuist user can apply on their side is a crutch:
Clear the cache once every N days
Override XDG_CACHE_HOME and substitute symbolic links (a kind of backpressure)
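The first of these crutches could look something like the sketch below. It is only an illustration under stated assumptions: the function name `prune_older_than` is hypothetical, the caller decides which directory to prune, and entries are judged by modification time. It does not solve the parallel-build problem, because a build running on the same node may still be reading an entry while it is removed.

```python
import shutil
import time
from pathlib import Path
from typing import List, Optional


def prune_older_than(
    cache_dir: Path, max_age_days: float, now: Optional[float] = None
) -> List[str]:
    """Delete top-level entries of cache_dir older than max_age_days.

    Returns the sorted names of the removed entries.
    """
    if not cache_dir.is_dir():
        return []
    if now is None:
        now = time.time()
    cutoff = now - max_age_days * 86400  # seconds per day

    removed = []
    for entry in cache_dir.iterdir():
        # lstat() inspects the entry itself, not the target of a symlink.
        if entry.lstat().st_mtime >= cutoff:
            continue
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
        removed.append(entry.name)
    return sorted(removed)
```

A scheduled job might call it as `prune_older_than(Path.home() / ".cache" / "tuist" / "Manifests", 7)`, where the `Manifests` subdirectory name is an assumption about the cache layout.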
XDG_CACHE_HOME: For Tuist’s global state that’s not intended to be shared across machines. As you mentioned, that’s server-specific state that’s not intended to be shared.
XDG_DATA_HOME: For everything else that’s portable across environments.
The spec also defines XDG_STATE_HOME, for data that’s reusable across runs but not portable across environments (I believe we don’t have that scenario at Tuist), and XDG_CONFIG_HOME, for global configuration of Tuist, which we don’t support yet.
I wouldn’t go that far in granularity unless strictly necessary.
Let’s wait to see what other people think before we commit to implementing anything. I quite like how the XDG spec uses different variables depending on the portability of the data being stored in the directories.
Yes, this gives flexibility now, but it may create difficulties in the future if a cache has to be implemented whose structure the user should not need to know.
XDG_DATA_HOME: For everything else that’s portable across environments.
I think this might get us closer to a more convenient solution, but I still see a problem in that we rely on the tool-independent XDG_foo_bar variables to override the values of a specific tool (in this case the tuist environment).
The XDG Base Directory specification can be used by other tools as well (notably mise), and it might still be useful for us to be able to override the environment not for the entire infrastructure, but only for specific tools.
It seems to me that a more refined solution would be to implement your own XDG-based variables within the tuist env space.
I’m aligned with adding TUIST_CACHE_DIR that defaults to XDG_CACHE_HOME.
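To make the proposal concrete, the lookup order could be something like the sketch below. This is not how Tuist resolves its cache today: TUIST_CACHE_DIR is only a proposal in this thread, the function name `resolve_tuist_cache_dir` is hypothetical, and treating the variable's value as the full cache directory (rather than a base to which `tuist` is appended) is an assumption. The XDG fallback follows the Base Directory specification: an unset, empty, or relative XDG_CACHE_HOME is ignored in favor of $HOME/.cache.

```python
import os
from pathlib import Path
from typing import Mapping


def resolve_tuist_cache_dir(env: Mapping[str, str], home: Path) -> Path:
    """Resolve the cache directory with a tool-specific override first."""
    # 1. Proposed tool-specific variable (hypothetical until implemented).
    override = env.get("TUIST_CACHE_DIR")
    if override:
        return Path(override)

    # 2. Shared XDG variable; the spec requires an absolute path.
    xdg = env.get("XDG_CACHE_HOME")
    if xdg and os.path.isabs(xdg):
        return Path(xdg) / "tuist"

    # 3. XDG default when the variable is unset, empty, or invalid.
    return home / ".cache" / "tuist"
```

With this order, a CI setup could leave XDG_CACHE_HOME untouched for other tools and set only TUIST_CACHE_DIR for tuist.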
I don’t think that distinction is correct. Here is the XDG specification for these directories:
There is a single base directory relative to which user-specific data files should be written. This directory is defined by the environment variable $XDG_DATA_HOME.
There is a single base directory relative to which user-specific non-essential (cached) data should be written. This directory is defined by the environment variable $XDG_CACHE_HOME.
Putting selective tests and binaries into XDG_DATA_HOME doesn’t feel right to me, as both are cached data and not user-specific data files. User-specific data files would be, for example, credential files.
I’m not sure here. Many users believe the data is not user-specific. Hence many attempts to leverage CI caching APIs to copy data over across environments. However, the data is user-specific and tied to the environment where it got generated or pulled into (from the server), so establishing a distinction here might help eliminate some confusion and re-emphasize the fact that selective testing and binary cache artefacts are not shareable.
Selective tests and binaries are user-specific; what I meant is that they are not user-specific data files. They are both user-specific cache data, the same way manifests and helpers are.
So, putting selective tests and binaries into XDG_DATA_HOME means imho going against the XDG specification.