Researchers tested an elastic caching approach in Spanner and against several publicly available cache traces, measuring how the system could adjust cache time-to-live (TTL) settings based on workload patterns and costs. The results showed lower memory use with only a small increase in cache misses, suggesting a way to reduce cache footprint without significantly raising storage costs.
For production workloads, the team developed an algorithm that assigns a TTL to each cached page on every page request. Because Spanner handles billions of requests per second, the model had to be lightweight, so the team used a shallow decision tree that could be translated into a few lines of C++ code. The model considered features such as the size of the data, the cost of a cache miss, and the type of database operation.
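To make the idea of a shallow decision tree concrete, here is a minimal sketch in Python of a tree that maps the three features named above to a TTL. Everything in it is an assumption for illustration: the `PageRequest` type, the operation labels, the thresholds, and the TTL values are hypothetical and are not taken from Spanner's production model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageRequest:
    """Hypothetical feature record for one page request."""
    size_bytes: int    # size of the cached page
    miss_cost: float   # relative cost of re-fetching the page from storage
    operation: str     # hypothetical labels such as "read" or "scan"


SMALL_PAGE_BYTES = 64 * 1024   # assumed threshold, illustration only
EXPENSIVE_MISS = 10.0          # assumed threshold, illustration only


def assign_ttl_seconds(req: PageRequest) -> float:
    """Return a TTL from a hand-written tree of depth three.

    A trained tree would have the same shape (nested comparisons),
    but its splits and leaf values would come from training data.
    """
    if req.operation == "scan":
        # Assumption: scanned pages are rarely re-read, so expire them quickly.
        return 1.0
    if req.miss_cost >= EXPENSIVE_MISS:
        # Expensive misses justify holding the page longer,
        # especially when it occupies little memory.
        return 600.0 if req.size_bytes <= SMALL_PAGE_BYTES else 120.0
    # Cheap misses: keep the page only briefly.
    return 30.0 if req.size_bytes <= SMALL_PAGE_BYTES else 5.0
```

Because the whole model is a handful of comparisons, evaluating it on every request adds very little work, which is the property the article attributes to the production design.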
The elastic caching policy was integrated into Spanner’s production servers over several months. Compared with a standard fixed-size cache, memory usage was reduced by 15.5%, cache misses increased by only 5.5%, and total cost of ownership (TCO) was reduced by approximately 5%.
The researchers said the algorithm is “cost-aware,” meaning the small increase in cache misses was concentrated on data that is cheap to fetch from storage. As a result, the impact on actual I/O costs was a negligible 0.5%.
They also evaluated the approach using several publicly available cache traces. For the baseline, they used an optimized implementation of the Greedy-Dual-Size-Frequency (GDSF) eviction algorithm, described as a generalization of the well-known LRU policy that allows for pages of different sizes.
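For readers unfamiliar with GDSF, the sketch below shows the textbook form of the policy in Python. It is not the optimized implementation the researchers used; the class name `GDSFCache` and its methods are hypothetical, and eviction uses a linear scan for clarity where a real implementation would typically use a priority queue. Each entry gets a priority of `clock + frequency * cost / size`, the lowest-priority entry is evicted first, and the clock is raised to the evicted priority so that older entries age out.

```python
class GDSFCache:
    """Textbook Greedy-Dual-Size-Frequency cache (illustrative sketch)."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.clock = 0.0    # inflation value, raised on every eviction
        self.entries = {}   # key -> dict with priority, freq, size, cost

    def _priority(self, freq, cost, size):
        return self.clock + freq * cost / size

    def access(self, key, size, cost=1.0):
        """Record an access; return True on a hit and False on a miss."""
        entry = self.entries.get(key)
        if entry is not None:
            entry["freq"] += 1
            entry["priority"] = self._priority(
                entry["freq"], entry["cost"], entry["size"])
            return True
        if size > self.capacity:
            return False    # page can never fit, so do not cache it
        # Evict lowest-priority entries until the new page fits.
        while self.used + size > self.capacity:
            victim = min(self.entries,
                         key=lambda k: self.entries[k]["priority"])
            self.clock = self.entries[victim]["priority"]
            self.used -= self.entries[victim]["size"]
            del self.entries[victim]
        self.entries[key] = {
            "freq": 1,
            "size": size,
            "cost": cost,
            "priority": self._priority(1, cost, size),
        }
        self.used += size
        return False
```

With equal sizes and costs, a page that has been accessed more often has a higher priority and survives eviction longer; differences in size or cost shift that priority accordingly.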
The public-trace tests used four variants of elastic caching, depending on which ski rental algorithm was used and whether a machine-learned model was included. Because the public traces did not have application-level features for training, the team did not implement decision trees for prediction. Instead, they split each trace in half and used the first half for training.
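The ski rental framing can be illustrated with the classic textbook algorithms, sketched here in Python. Keeping a page in memory is treated as "renting" (paying a memory cost per second), and taking a miss is treated as "buying" (paying the miss cost once). The function names and the `memory_cost_per_byte_second` parameter are assumptions for illustration; the article does not give the exact formulas the team used.

```python
import math
import random


def breakeven_ttl(miss_cost, size_bytes, memory_cost_per_byte_second):
    """Deterministic ski rental: keep the page until the memory cost
    paid so far equals the cost of one miss, then let it expire."""
    return miss_cost / (size_bytes * memory_cost_per_byte_second)


def randomized_ttl(miss_cost, size_bytes, memory_cost_per_byte_second,
                   rng=random):
    """Classic randomized ski rental: draw the TTL from [0, breakeven]
    with density proportional to exp(t / breakeven).

    Sampling uses the inverse CDF: t = B * ln(1 + u * (e - 1)),
    where B is the breakeven TTL and u is uniform on [0, 1).
    """
    b = breakeven_ttl(miss_cost, size_bytes, memory_cost_per_byte_second)
    u = rng.random()
    return b * math.log(1.0 + u * (math.e - 1.0))
```

In the standard analysis, the deterministic rule never pays more than twice the cost of the best choice in hindsight, while the randomized rule improves the expected ratio to e / (e - 1), roughly 1.58.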
For each page in the training trace, the team computed the best TTL for that page to minimize cost over the training trace. They also warmed up all caches with one day’s worth of requests from the second half of the trace before running tests and measurements. During testing, pages seen during training were assigned their best precomputed TTL, while unseen pages used either the breakeven or randomized policies.
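One way to compute a per-page best TTL from a training trace is sketched below in Python. It assumes a simple cost model that the article does not spell out: memory cost accrues at `size_bytes * memory_cost_per_byte_second` while the page is cached, and each expiry followed by a later request costs one `miss_cost`. The function name and parameters are hypothetical, and the sketch ignores the first request and the time after the last request.

```python
def best_ttl(request_times, size_bytes, miss_cost,
             memory_cost_per_byte_second):
    """Return the TTL that minimizes total cost for one page.

    request_times must be sorted timestamps (seconds) of the requests
    for this page in the training trace.
    """
    gaps = [later - earlier
            for earlier, later in zip(request_times, request_times[1:])]
    rate = size_bytes * memory_cost_per_byte_second

    def total_cost(ttl):
        cost = 0.0
        for gap in gaps:
            if gap <= ttl:
                # Page is still cached at the next request: pay memory only.
                cost += rate * gap
            else:
                # Page expired after ttl seconds, then the request missed.
                cost += rate * ttl + miss_cost
        return cost

    # Between two consecutive gap lengths the cost only grows with the
    # TTL, so the optimum is either 0 or exactly one of the observed gaps.
    candidates = [0.0] + sorted(set(gaps))
    return min(candidates, key=total_cost)
```

For example, a page requested at seconds 0, 1, 2, 3, and 100 with a moderately expensive miss gets a TTL of 1 second under this model: long enough to serve the burst from cache, short enough to avoid paying for memory during the long idle gap.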
Source: research.google.

