ActiveRecord::Core "#find" now reuses "#find_by" cache key

When a model is queried with #find or #find_by, Rails caches the statement it builds for that query (the code below goes through StatementCache and cached_find_by_statement). On later calls Rails reuses the cached statement and only binds the new values, instead of rebuilding the query each time. Each method is responsible for generating its own cache key and storing the statement under it, which lets some irregularities creep in.

Before

One small oversight was that #find and #find_by(id: ...) used different cache keys. Both calls run the same query and return the same record, but their statements were not stored under the same cache entry.

Let’s look into how ActiveRecord Core works:

def find(*ids) # :nodoc:
  # We don't have cache keys for this stuff yet
  return super unless ids.length == 1
  return super if block_given? ||
                  primary_key.nil? ||
                  scope_attributes? ||
                  columns_hash.key?(inheritance_column) && !base_class?

  id = ids.first

  return super if StatementCache.unsupported_value?(id)

  key = primary_key

  statement = cached_find_by_statement(key) { |params|
    where(key => params.bind).limit(1)
  }

  record = statement.execute([id], connection)&.first
  unless record
    raise RecordNotFound.new("Couldn't find #{name} with '#{key}'=#{id}", name, key, id)
  end
  record
end

We can see here that the cache key is just primary_key (which in most scenarios is "id"), passed as a plain string.

Let’s go through the #find_by method that accepts a hash of attributes.

def find_by(*args) # :nodoc:
  return super if scope_attributes? || reflect_on_all_aggregations.any? ||
                  columns_hash.key?(inheritance_column) && !base_class?

  hash = args.first

  return super if !(Hash === hash) || hash.values.any? { |v|
    StatementCache.unsupported_value?(v)
  }

  return super unless hash.keys.all? { |k| columns_hash.has_key?(k.to_s) }

  keys = hash.keys

  statement = cached_find_by_statement(keys) { |params|
    wheres = keys.each_with_object({}) { |param, o|
      o[param] = params.bind
    }
    where(wheres).limit(1)
  }
  begin
    statement.execute(hash.values, connection)&.first
  rescue TypeError
    raise ActiveRecord::StatementInvalid
  end
end

The cache key here is set to hash.keys, which is an array of the columns that find_by searches on.

This is where the mismatch arises: #find uses the cache key "id", while #find_by uses the cache key ["id"].
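To make the mismatch concrete, here is a minimal sketch of a keyed statement cache. ToyStatementCache and its fetch method are hypothetical stand-ins written for this illustration, not the Rails API; the only point is that a String and a one-element Array are different Hash keys, so the same statement ends up being built and stored twice.

```ruby
# Hypothetical stand-in for a per-model statement cache (not the Rails API).
class ToyStatementCache
  attr_reader :builds

  def initialize
    @statements = {}
    @builds = 0
  end

  # Returns the statement stored under `key`, building it on first use.
  def fetch(key)
    @statements.fetch(key) do
      @builds += 1
      @statements[key] = yield
    end
  end

  def keys
    @statements.keys
  end
end

before_cache = ToyStatementCache.new

# Old #find: the key is the bare primary key name.
before_cache.fetch("id")   { "SELECT ... WHERE id = ? LIMIT 1" }
# #find_by(id: ...): the key is an Array of column names.
before_cache.fetch(["id"]) { "SELECT ... WHERE id = ? LIMIT 1" }

before_keys   = before_cache.keys   # two distinct entries
before_builds = before_cache.builds # the statement was built twice
```

In this sketch the two lookups never hit each other's entry, which mirrors the situation described above.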

After

ActiveRecord::Core#find now reuses the #find_by cache key, so both queries share the same cache entry.

Query	Cache Key
find(123)	["id"]
find_by(id: 123)	["id"]
find_by(id: 123, foo: true)	["id", "foo"]

The fix was a small change to the #find method, which now wraps primary_key in an array and passes it to cached_find_by.

def find(*ids) # :nodoc:
  # We don't have cache keys for this stuff yet
  return super unless ids.length == 1
  return super if block_given? || primary_key.nil? || scope_attributes?

  id = ids.first

  return super if StatementCache.unsupported_value?(id)

  cached_find_by([primary_key], [id]) ||
    raise(RecordNotFound.new("Couldn't find #{name} with '#{primary_key}'=#{id}", name, primary_key, id))
end
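The effect of the change can be sketched outside Rails with a toy model. ToyModel, ROWS and statement_cache are hypothetical names invented for this example, and converting the hash keys to strings is an assumption made so the keys match the table above; only the shape of the call, cached_find_by(keys, values), follows the listing.

```ruby
# Toy model that mimics the shape of the fix; this is not ActiveRecord.
class ToyModel
  ROWS = [
    { "id" => 1, "name" => "Ada" },
    { "id" => 2, "name" => "Grace" }
  ].freeze

  class << self
    # Hypothetical cache: one entry per distinct list of column names.
    def statement_cache
      @statement_cache ||= {}
    end

    def find(id)
      # Wrap the primary key in an Array so the key matches find_by(id: ...).
      cached_find_by(["id"], [id]) ||
        raise(KeyError, "Couldn't find ToyModel with 'id'=#{id}")
    end

    def find_by(hash)
      # Assumption for this sketch: column names are normalized to strings.
      cached_find_by(hash.keys.map(&:to_s), hash.values)
    end

    private

    def cached_find_by(keys, values)
      # Build the "statement" (here, a lambda) once per distinct key list.
      statement = statement_cache[keys] ||= lambda { |vals|
        ROWS.find { |row| keys.zip(vals).all? { |k, v| row[k] == v } }
      }
      statement.call(values)
    end
  end
end

ToyModel.find(1)
ToyModel.find_by(id: 1)
ToyModel.find_by(id: 2, name: "Grace")

cache_keys = ToyModel.statement_cache.keys
```

After these three calls the toy cache holds one entry shared by find and find_by(id: ...), plus a separate entry for the two-column lookup.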

Minor tweaks to core libraries can lead to huge benefits across applications!
