Rails 7 updates through_reflection to distribute transactions across database connections


While ActiveRecord works pretty great across most scenarios, it struggles with multi-database architectures. Fortunately, the past few releases have had a lot of work done in this aspect.

One major drawback is the way transactions work in a multi-database architecture. A transaction is basically a set of database operations that are performed as a single unit. Here’s an example,

irb(main):001:1* ActiveRecord::Base.transaction do
irb(main):002:1*   Company.first.touch
irb(main):003:1*   Company.last.touch
irb(main):004:0> end
  TRANSACTION (0.1ms)  begin transaction
  Company Load (0.2ms)  SELECT "companies".* FROM "companies" ORDER BY "companies"."id" ASC LIMIT ?  [["LIMIT", 1]]
  Company Update (1.2ms)  UPDATE "companies" SET "updated_at" = ? WHERE "companies"."id" = ?  [["updated_at", "2022-08-21 10:48:22.853590"], ["id", 1]]
  Company Load (0.1ms)  SELECT "companies".* FROM "companies" ORDER BY "companies"."id" DESC LIMIT ?  [["LIMIT", 1]]
  Company Update (0.1ms)  UPDATE "companies" SET "updated_at" = ? WHERE "companies"."id" = ?  [["updated_at", "2022-08-21 10:48:22.860248"], ["id", 2]]
  TRANSACTION (1.1ms)  commit transaction
...

Though there are two database operations (Company.first.touch and Company.last.touch), they are performed within a single transaction.

However, when you use a transaction across multiple databases, it is essentially left up-to the developer to ensure that the transactions are distributed across the databases in a way that is consistent with the intended use case. This is because transactions work on a single database connection.

Before

Let’s take the example of a multi-database architecture where there is a Company model on the primary database and a Freelancer model on the secondary database. Both these models are connected via the has_many :through association via the Contract model on the primary database.

irb(main):001:0> company = Company.create! name: 'Saeloun'
...
irb(main):002:0> freelancer1 = Freelancer.create! name: 'Joe'
...
irb(main):003:0> freelancer2 = Freelancer.create! name: 'Schmoe'
...
irb(main):004:0> company.freelancers = [freelancer1, freelancer2]
  TRANSACTION (0.1ms)  begin transaction
  Contract Create (0.8ms)  INSERT INTO "contract" ("company_id", "freelancer_id", "created_at", "updated_at") VALUES (?, ?, ?, ?)  [["company_id", 1], ["freelancer_id", 1], ["created_at", "2022-08-21 07:23:41.521082"], ["updated_at", "2022-08-21 07:23:41.521082"]]
  TRANSACTION (1.2ms)  commit transaction
  TRANSACTION (0.0ms)  begin transaction
  Contract Create (0.4ms)  INSERT INTO "contract" ("company_id", "freelancer_id", "created_at", "updated_at") VALUES (?, ?, ?, ?)  [["company_id", 2], ["freelancer_id", 1], ["created_at", "2022-08-21 07:23:41.525693"], ["updated_at", "2022-08-21 07:23:41.525693"]]
  TRANSACTION (0.6ms)  commit transaction
...

As you can see, two transactions take place to insert two records into the contract table. This is because the company.freelancers = [freelancer1, freelancer2] operation is performed on the primary database connection and the Contract.create! operation is performed on the secondary database connection.

It is up to the developer to wrap this around a double transaction block to ensure that the two operations are performed within a single transaction.

irb(main):001:1* Company.transaction do
irb(main):002:2*   Freelancer.transaction do
irb(main):003:2*     company.freelancers = [freelancer1, freelancer2]
irb(main):004:1*   end
irb(main):005:0> end
  TRANSACTION (0.1ms)  begin transaction
  Contract Create (0.7ms)  INSERT INTO "contract" ("company_id", "freelancer_id", "created_at", "updated_at") VALUES (?, ?, ?, ?)  [["company_id", 1], ["freelancer_id", 1], ["created_at", "2022-08-21 07:32:41.351072"], ["updated_at", "2022-08-21 07:32:41.351072"]]
  Contract Create (0.1ms)  INSERT INTO "contract" ("company_id", "freelancer_id", "created_at", "updated_at") VALUES (?, ?, ?, ?)  [["company_id", 2], ["freelancer_id", 1], ["created_at", "2022-08-21 07:32:41.351083"], ["updated_at", "2022-08-21 07:32:41.351083"]]
  TRANSACTION (1.3ms)  commit transaction
...

After

Thanks to this PR that updates how through_reflection distributes transactions across database connections, we can now perform the same operation without the need for a double transaction block.

irb(main):001:0> company.freelancers = [freelancer1, freelancer2]
  TRANSACTION (0.1ms)  begin transaction
  Contract Create (0.7ms)  INSERT INTO "contract" ("company_id", "freelancer_id", "created_at", "updated_at") VALUES (?, ?, ?, ?)  [["company_id", 1], ["freelancer_id", 1], ["created_at", "2022-08-21 07:32:41.351072"], ["updated_at", "2022-08-21 07:32:41.351072"]]
  Contract Create (0.1ms)  INSERT INTO "contract" ("company_id", "freelancer_id", "created_at", "updated_at") VALUES (?, ?, ?, ?)  [["company_id", 2], ["freelancer_id", 1], ["created_at", "2022-08-21 07:32:41.351083"], ["updated_at", "2022-08-21 07:32:41.351083"]]
  TRANSACTION (1.3ms)  commit transaction
...

This was achieved via a very simple update to the ThroughAssociation module. The transaction method opens up the second block with the connection of the through_reflection instead of the source_reflection.

module ActiveRecord
  module Associations
    module ThroughAssociation
      ...
      private
        def transaction(&block)
          through_reflection.klass.transaction(&block)
        end
      ...
    end
  end
end

Join Our Newsletter