Ruby handles hash keys and values as mutable objects. This means that if you pass a hash as a bound parameter to a query, the query cache cannot detect that the hash has since been mutated and can occasionally return a false-positive cache hit. This can lead to unexpected results.
First, let’s understand hash mutation in Ruby. Ruby hashes are mutable objects. This means that even when a hash’s contents change, it still has the same object ID (a unique identifier for that object). Let’s see this in action:
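A minimal sketch of this behaviour, assuming a made-up payload hash (the variable names are only for illustration):

```ruby
payload = { fuel: "kerosene" }
id_before = payload.object_id

# Mutate the hash in place: add a key and overwrite a value.
payload[:stages] = 2
payload[:fuel] = "methane"

id_after = payload.object_id

# Same object, even though its contents changed.
puts id_before == id_after
```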
However, when the value of an integer variable changes, the variable ends up pointing to a different object with a different object ID. Let’s see this in action:
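A small sketch with an illustrative counter variable (the names are assumptions, not from the original post):

```ruby
count = 1
id_one = count.object_id

# Integers cannot be changed in place; += rebinds the variable to another object.
count += 1
id_two = count.object_id

# The variable now refers to a different object.
puts id_one == id_two

# The integer 1 itself keeps the same object ID every time it is used.
puts 1.object_id == id_one
```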
Interestingly, if you try this out in your Rails console, you will see that the object ID of a given integer never changes: 1 has the same object ID every time you ask for it. This is because all integers are “frozen”. This means that they cannot be mutated, i.e., a variable’s value cannot change unless the object ID it points to also changes.
We can also freeze a hash to prevent it from being mutated. Let’s see this in action:
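A minimal sketch, assuming an illustrative payload hash. Writing to a frozen hash raises FrozenError (available since Ruby 2.5):

```ruby
payload = { fuel: "kerosene" }.freeze
error_class = nil

begin
  payload[:fuel] = "methane" # raises because the hash is frozen
rescue FrozenError => e
  error_class = e.class
end

puts error_class    # the exception class that was raised
puts payload[:fuel] # the original value is untouched
```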
You can check whether an object is frozen by calling the frozen? method on it.
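In fact, objects like integers, symbols, true, false and nil are all frozen by default. This means that they cannot be mutated. Strings are the exception here: they are mutable unless you freeze them, for example with the frozen_string_literal magic comment.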
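The sketch below collects the result of frozen? for a few kinds of objects. The key names are made up for the example, and String.new is used on purpose because whether a string literal is frozen depends on the frozen_string_literal setting:

```ruby
checks = {
  integer: 42.frozen?,
  symbol: :rocket.frozen?,
  nil_value: nil.frozen?,
  true_value: true.frozen?,
  plain_hash: { fuel: "kerosene" }.frozen?,
  frozen_hash: { fuel: "kerosene" }.freeze.frozen?,
  built_string: String.new("rocket").frozen?
}

# Immediate values are frozen; hashes and freshly built strings are not
# unless they are frozen explicitly.
checks.each { |name, frozen| puts "#{name}: #{frozen}" }
```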
Next, let’s look into how a hash looks up the key that is requested. It’s common knowledge that a hash stores key-value pairs. What is less well known is that the hash of the key (the integer returned by the key’s hash method) is also stored along with each pair. This is what enables fast lookups. It is also what allows a mutable object to be the key of a hash.
Let’s see this in action:
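A minimal sketch with an illustrative array (variable names are assumptions). The hash values differ between Ruby processes, so only the comparisons are meaningful:

```ruby
key = [1, 2]
digest_before = key.hash
id_before = key.object_id

key << 3 # mutate the array in place

digest_after = key.hash
id_after = key.object_id

# Still the same object.
puts id_before == id_after

# The hash follows the contents, so it is different after the mutation.
puts digest_before == digest_after

# Any array with equal contents produces the same hash within this process.
puts digest_after == [1, 2, 3].hash
```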
As you can see, the hash of the array changes when the array is mutated, since its data has changed. This might raise the question of why the key shown inside the hash keeps changing as well. It is because the hash stores a reference to the array object rather than a copy of it. Since the object ID of the array does not change, the key inside the hash always reflects the array’s current contents. However, since the contents of that array have changed, its hash has also changed, while the hash that was stored alongside the entry is still the old one.
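The following sketch, with made-up names, shows what happens to a hash entry whose key is mutated after it was stored. Hash#rehash is the standard way to recompute the stored hashes:

```ruby
key = [1, 2]
cache = { key => "original result" }

key << 3 # mutate the key after it has been stored

# The hash holds a reference to the array, so the stored key shows the new contents.
stored_key = cache.keys.first
puts stored_key.equal?(key)
puts stored_key.inspect

# The entry is still filed under the old hash, so this lookup usually misses.
# It can occasionally hit when the old and new hashes land in the same slot.
puts cache[key].inspect

# After rehash the stored hashes are recomputed and the lookup is reliable again.
cache.rehash
puts cache[key]
```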
Now any good hash function will have a low collision rate. This means that the probability of two different keys having the same hash is low. This is why the hash of the array is a good key to use for caching. However, low collision rate does not mean no collision rate. This is the basis for the false-positive query cache hits when querying with mutable bound parameters.
Let’s look into how the query cache works.
When a query is executed, the query cache stores the result and the query so that when the same query is requested again, the result can be fetched from the cache. This is a performance optimization that can be enabled.
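To make the idea concrete, here is a toy cache in plain Ruby. It is a hypothetical sketch, not Rails’ implementation: the ToyQueryCache class, its fetch method and the SQL strings are all invented for illustration, and the block stands in for actually hitting the database.

```ruby
# Hypothetical toy cache; it only mimics the idea of keying results by query and binds.
class ToyQueryCache
  attr_reader :misses

  def initialize
    @store = {}
    @misses = 0
  end

  # Returns the cached result for (sql, binds); runs the block on a miss.
  def fetch(sql, binds = [])
    key = [sql, binds]
    return @store[key] if @store.key?(key)

    @misses += 1
    @store[key] = yield
  end
end

cache = ToyQueryCache.new
sql = "SELECT * FROM rockets WHERE id = ?"

first  = cache.fetch(sql, [1]) { ["Falcon"] }           # miss: the block runs
second = cache.fetch(sql, [1]) { raise "not expected" } # hit: the block is skipped
third  = cache.fetch(sql, [2]) { ["Saturn"] }           # different binds: miss

puts cache.misses
```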
To see how such false-positives might occur, we will create a Rocket model with a payload attribute that can store hashes.
We will now create a Rocket record that stores a hash in its payload attribute and then search for it using a query that has a mutable bound parameter. A mutable bound parameter here simply means passing a hash as a parameter to the query.
Now when we run this test, we get a false-positive rate of 0.029%. This is because when the new hash key is generated, sometimes, it falls into the same bucket as the original hash key. So, instead of a cache miss, the query cache returns the original record.
Though this is tiny, it can be a problem if you are using a query cache in production.
Thanks to this PR which worked off the bug reported here, the hash query is simply frozen before being stored in the cache. This means that the query cache will not be affected by any mutations to the bound parameters.
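The general idea can be sketched in plain Ruby. This is not the actual Rails patch: SnapshotQueryCache, fetch and the query text are hypothetical, and the sketch keys the cache on a frozen shallow copy of the binds, which is one assumed way to keep later mutations away from the stored key.

```ruby
# Hypothetical toy cache; it illustrates the idea only, not the Rails change.
class SnapshotQueryCache
  def initialize
    @store = {}
  end

  def fetch(sql, binds)
    # Key on a frozen shallow copy so that later changes to the caller's
    # binds cannot reach the key stored in the cache.
    key = [sql, binds.dup.freeze].freeze
    return @store[key] if @store.key?(key)

    @store[key] = yield
  end
end

cache = SnapshotQueryCache.new
sql = "SELECT * FROM rockets WHERE payload = ?"
binds = { weight: 100 }

first = cache.fetch(sql, binds) { "rocket with weight 100" }

binds[:weight] = 200 # the caller mutates the same hash afterwards

# The mutated binds no longer equal the stored snapshot, so this is a miss.
second = cache.fetch(sql, binds) { "no rocket found" }

# A query with the original contents still hits the cached entry.
third = cache.fetch(sql, { weight: 100 }) { raise "should have been cached" }

puts first, second, third
```

Because the copy is shallow, nested mutable values inside the binds would need a deeper copy to get the same guarantee.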
Here is the modification (among others) that was made to the Rails codebase: