Ruby hashes are mutable objects. This means that if you pass a hash as a bound parameter to a query and later mutate it, the query cache cannot always detect the mutation and will occasionally return a false-positive cache hit. This can lead to unexpected results.
First, let’s understand hash mutation in Ruby. Because hashes are mutable, a hash keeps the same object ID (a unique identifier for that object) even when its contents change. Let’s see this in action:
irb(main):001:0> hash = {a: 1}
=> {:a=>1}
irb(main):002:0> hash.object_id
=> 30880
irb(main):003:0> hash.merge!(b: 2)
=> {:a=>1, :b=>2}
irb(main):004:0> hash.object_id
=> 30880
irb(main):005:0>
However, when we change the value of an integer variable, the variable ends up pointing at an object with a different object ID. Let’s see this in action:
irb(main):001:0> integer = 1
=> 1
irb(main):002:0> integer.object_id
=> 3
irb(main):003:0> integer += 1
=> 2
irb(main):004:0> integer.object_id
=> 5
irb(main):005:0>
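Note that integer += 1 does not mutate the integer 1. It rebinds the variable to a different object, 2, which has its own object ID. If you try this out in your Rails console, you will see that a given small integer such as 1 always has the same object ID. This is because all integers are “frozen”. This means that they cannot be mutated, i.e., an integer’s value cannot change, so a different value always means a different object with a different object ID.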
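The difference between rebinding a variable and mutating an object is easiest to see with a second reference to the same object. The following is a minimal sketch; the variable names are made up for illustration.

```ruby
# Rebinding: `+=` points `count` at a different Integer object.
# The second reference still sees the original value.
count = 1
count_alias = count
count += 1
rebinding_result = [count, count_alias] # [2, 1]

# Mutation: `merge!` changes the Hash in place.
# Every reference to that Hash sees the change.
params = { a: 1 }
params_alias = params
params.merge!(b: 2)
same_object = params.equal?(params_alias) # true, still one object
```

This is the property that matters for the rest of the article: anything that holds a reference to a hash sees later mutations of that hash.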
We can also freeze a hash to prevent it from being mutated. Let’s see this in action:
irb(main):001:0> hash = {a: 1}
=> {:a=>1}
irb(main):002:0> hash.object_id
=> 30880
irb(main):003:0> hash.freeze
=> {:a=>1}
irb(main):004:0> hash.merge!(b: 2)
RuntimeError (can't modify frozen Hash)
from (irb):4:in `merge!'
from (irb):4
from /Users/swaathi/.rbenv/versions/2.7.2/bin/irb:23:in `<main>'
irb(main):005:0> hash.object_id
=> 30880
irb(main):006:0>
You can check whether an object is frozen by calling the frozen? method on it.
irb(main):001:0> hash = {a: 1}
=> {:a=>1}
irb(main):002:0> hash.frozen?
=> false
irb(main):003:0> hash.freeze
=> {:a=>1}
irb(main):004:0> hash.frozen?
=> true
irb(main):005:0> integer = 1
=> 1
irb(main):006:0> integer.frozen?
=> true
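In fact, objects like integers, symbols, true, false, and nil are always frozen. This means that they cannot be mutated. Strings, on the other hand, are not frozen by default: a string literal is only frozen when the file starts with the frozen_string_literal: true magic comment.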
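One detail worth keeping in mind is that freeze is shallow: it protects the object it is called on, not the objects nested inside it. The sketch below illustrates this with made-up variable names. It uses String.new to get a string that is guaranteed to be unfrozen, and assumes Ruby 2.5 or later, where FrozenError exists.

```ruby
# Immediate values are always frozen.
immediates = [1, :name, true, false, nil]
all_frozen = immediates.all?(&:frozen?) # true

# Strings created with String.new are mutable.
text = String.new("abc")
text_frozen = text.frozen? # false
text << "d"

# freeze is shallow: the outer Hash is frozen, the inner Hash is not.
outer = { inner: { a: 1 } }.freeze
outer[:inner][:a] = 2 # allowed, mutates the nested Hash

error_class = nil
begin
  outer[:other] = 1 # not allowed, the outer Hash is frozen
rescue FrozenError => e
  error_class = e.class
end
```

This is why a cache that wants a truly stable key needs a deep copy in addition to freeze.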
Next, let’s look at how a hash looks up a requested key. It’s common knowledge that a hash stores key-value pairs. What is less well known is that the hash code of each key (the result of calling hash on it) is also stored along with the pair. This is what enables fast lookups. It is also what allows a mutable object to be the key of a hash.
Let’s see this in action:
irb(main):001:0> array = [0]
=> [0]
irb(main):002:0> array.object_id
=> 116740
irb(main):003:0> array.hash
=> -2503994910684356896
irb(main):004:0> hash = {array => true}
=> {[0]=>true}
irb(main):005:0> hash[array]
=> true
irb(main):006:0> hash.object_id
=> 154720
irb(main):007:0> array << 1
=> [0, 1]
irb(main):008:0> array.object_id
=> 116740
irb(main):009:0> array.hash
=> 3563926247310401007
irb(main):010:0> hash[array]
=> nil
irb(main):011:0> hash
=> {[0, 1]=>true}
irb(main):012:0>
As you can see, the hash code of the array changes when the array is mutated, since its contents have changed. This might raise the question of why the key shown inside the hash also changes. It is because the hash stores a reference to the array object itself, not a copy. The object ID of the array does not change, so inspecting the hash shows the array’s current contents. However, the entry is still filed under the array’s old hash code, so a lookup with the new hash code returns nil.
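Ruby provides Hash#rehash for exactly this situation: it recomputes the stored hash codes from the keys' current contents. The sketch below repeats the array example and then rehashes. Note that the lookup right after the mutation is nil in almost every run, but not with certainty, which is the very effect this article is about.

```ruby
key = [0]
lookup = { key => true }
before_mutation = lookup[key] # true

key << 1
# The entry is still filed under the hash code of [0].
# This lookup almost always misses (returns nil).
after_mutation = lookup[key]

# rehash re-indexes every entry using the keys' current hash codes.
lookup.rehash
after_rehash = lookup[key] # true
```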
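Now, any good hash function will have a low collision rate. This means that the probability of two different keys having the same hash code is low, which is what makes hash codes suitable for cache lookups. However, a low collision rate does not mean no collision rate. When a key is mutated after it has been stored, its new hash code will occasionally land in the same bucket as the old one, and the lookup then finds the old entry. Since the stored key is a reference to the mutated object, it compares equal and the stale value is returned. This is the basis for the false-positive query cache hits when querying with mutable bound parameters.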
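To make the collision case reproducible, the sketch below uses a hypothetical SearchParams class whose hash method always returns the same number, so every lookup collides. This is a deliberately bad hash function chosen for illustration only; with real hashes the collision happens only occasionally.

```ruby
# Hypothetical key class with a constant hash code, so the
# "same bucket" case happens on every lookup.
class SearchParams
  attr_accessor :value

  def initialize(value)
    @value = value
  end

  def hash
    42 # every instance collides on purpose
  end

  def eql?(other)
    other.is_a?(SearchParams) && value == other.value
  end
  alias_method :==, :eql?
end

cache = {}
search = SearchParams.new(1)
cache[search] = "record for 1"

search.value = 2 # mutates the key object stored inside the cache

# Same hash code, and the stored key is the mutated object itself,
# so the lookup succeeds and returns the stale entry.
stale_hit = cache[search]

# A brand new key with value 2 also matches the mutated stored key.
fresh_key_hit = cache[SearchParams.new(2)]
```

Both lookups return the record that was cached for value 1, even though the search now asks for value 2.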
Before
Let’s look into how the query cache works. When a query is executed, the query cache stores the result and the query so that when the same query is requested again, the result can be fetched from the cache. This is a performance optimization that can be enabled by calling ActiveRecord::Base.connection.enable_query_cache!.
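As a mental model, the query cache can be pictured as a hash of hashes: the SQL string maps to a second hash, which maps the bound parameters to the result. The sketch below is a simplified stand-in and not Rails' actual implementation. TinyQueryCache, fetch, and misses are hypothetical names, and the block stands in for the database call.

```ruby
# Simplified stand-in for a query cache (hypothetical, not Rails code).
class TinyQueryCache
  attr_reader :misses

  def initialize
    @store = Hash.new { |store, sql| store[sql] = {} }
    @misses = 0
  end

  # Returns the cached result for (sql, binds), or runs the block
  # and caches its return value.
  def fetch(sql, binds)
    entries = @store[sql]
    return entries[binds] if entries.key?(binds)

    @misses += 1
    entries[binds] = yield # binds is stored by reference, not copied
  end

  def clear
    @store.clear
  end
end

cache = TinyQueryCache.new
sql = "SELECT * FROM rockets WHERE payload = $1"

first = cache.fetch(sql, [1]) { "row for 1" }     # miss, runs the block
second = cache.fetch(sql, [1]) { "never called" } # hit, block is skipped
```

Because the binds are stored by reference, this sketch has the same weakness as the behaviour described in this section.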
To see how such false-positives might occur, we will create a Rocket model with a payload attribute that can store hashes.
db/migrate/…create_rocket.rb
class CreateRocket < ActiveRecord::Migration[6.0]
  def change
    create_table :rockets do |t|
      t.jsonb :payload
    end
  end
end
We will now create a Rocket record that stores a hash in its payload attribute and then search for it using a query that has a mutable bound parameter. A mutable bound parameter is simply a mutable object, here a hash, passed as a parameter to the query.
app/models/rocket.rb
def test_bug(iterations = 10_000, false_positives = 0)
  Rocket.connection.enable_query_cache!
  iterations.times do
    key, val = rand(100_000), rand(100_000)
    record = Rocket.create(payload: { key => val })
    search = { key => val }
    the_record = Rocket.where(payload: search).first # this should populate the cache
    raise unless the_record.present?
    # cache now looks like this, essentially:
    # { "SELECT * FROM rockets WHERE payload = $1" =>
    #   { [search] => the_record }
    # }
    # pick a new value that is guaranteed to differ from the original
    new_val = rand(100_000)
    new_val = rand(100_000) while new_val == val
    search.merge!(key => new_val) # this mutates the key inside the query cache
    # normally: because the hash of the key has changed, this is a cache miss
    # however, if the new hash key's numerical hash falls into the same bucket
    # as the original, the hash lookup will a) find the first query's entry and
    # b) use it, because the stored key is a reference to the mutated `search`
    # hash and therefore compares equal to it
    should_not_exist = Rocket.where(payload: search).first # this SHOULD not return a value
    false_positives += 1 if should_not_exist.present?
    record.destroy
    Rocket.connection.clear_query_cache
  end
  false_positives
end
Now when we run this test, we get 29 false positives in 10,000 iterations, a false-positive rate of 0.29%. This is because the hash code of the mutated search hash sometimes falls into the same bucket as the original. So, instead of a cache miss, the query cache returns the original record.
irb(main):001:0> test_bug()
=> 29
Though this is tiny, it can be a problem if you are using a query cache in production.
After
Thanks to this PR which worked off the bug reported here, the hash passed to the query is now duplicated and frozen before being stored in the cache. This means that the query cache will not be affected by any later mutations to the bound parameters.
irb(main):001:0> test_bug()
=> 0
Here is the modification (among others) that was made to the Rails codebase:
activemodel/lib/active_model/type/helpers/mutable.rb
module Type
  module Helpers # :nodoc: all
    module Mutable
      def immutable_value(value)
        value.deep_dup.freeze
      end

      def cast(value)
        deserialize(serialize(value))
      end
      ...
    end
  end
end
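The effect of storing a frozen deep copy can be reproduced in plain Ruby. The sketch below is an illustration under stated assumptions, not the Rails code path: immutable_snapshot is a hypothetical helper, and it uses a Marshal round trip as a stand-in for ActiveSupport's deep_dup so that it runs without Rails.

```ruby
# Hypothetical helper: deep copy via Marshal (stand-in for deep_dup),
# then freeze the copy.
def immutable_snapshot(value)
  Marshal.load(Marshal.dump(value)).freeze
end

cache = {}
search = { 42 => 1 }

# Store a frozen copy as the cache key instead of the caller's hash.
cache[immutable_snapshot(search)] = "record"

search.merge!(42 => 2) # mutates only the caller's hash

# The stored key is still { 42 => 1 }, so the mutated search can never
# compare equal to it, even if the hash codes happen to collide.
after_mutation = cache[search]    # nil
original_lookup = cache[{ 42 => 1 }] # "record"
key_frozen = cache.keys.first.frozen? # true
```

With the key decoupled from the caller's object, a collision can at worst cost an extra comparison. It can no longer produce a false-positive hit.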