We often use multiple databases in our Rails application. Rails 6 made it easier to connect to multiple databases.
Let’s consider a use case where:
- User can create multiple posts.
- Anyone looking at the post can add a comment - crowd-sourced comments.
In this case, we can store the crowd-sourced comments in a separate database, as comments can grow quickly and may need a different kind of data management approach.
The database.yml
can be like below:
default: &default
adapter: postgresql
encoding: unicode
pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
username: root
password: password
development:
primary:
<<: *default
database: rails_demo_development
crowd_sourced:
<<: *default
migrations_paths: db/migrate_crowd_sourced
database: rails_demo_crowd_sourced_development
The models would look like below:
# app/models/application_record.rb
class ApplicationRecord < ActiveRecord::Base
self.abstract_class = true
connects_to database: { writing: :primary, reading: :primary }
end
# app/models/crowd_sourced_record.rb
class CrowdSourcedRecord < ActiveRecord::Base
self.abstract_class = true
connects_to database: { writing: :crowd_sourced, reading: :crowd_sourced }
end
# app/models/user.rb
class User < ApplicationRecord
has_many :posts
end
# app/models/user.rb
class Post < ApplicationRecord
belongs_to :user
has_many :comments
end
# app/models/comment.rb
class Comment < CrowdSourcedRecord
end
Before
In Rails 6, to fetch comments on all posts of a user, we can add a custom method, as associations can’t join across databases.
# app/models/user.rb
class User < ApplicationRecord
has_many :posts
def comments
Comment.where(post_id: posts.pluck(:id))
end
end
Here, we are fetching the user-created post ids first, and then all the comments for those post ids.
> User.first.comments
User Load (0.3ms) SELECT "users".* FROM "users" ORDER BY "users"."id" ASC LIMIT $1 [["LIMIT", 1]]
(0.2ms) SELECT "posts"."id" FROM "posts" WHERE "posts"."user_id" = $1 [["user_id", 1]]
Comment Load (0.3ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = $1 /* loading for inspect */ LIMIT $2 [["post_id", 1], ["LIMIT", 11]]
=> #<ActiveRecord::Relation [#<Comment id: 1, post_id: 1, name: "Chantay Hayes", email: "ronna.dooley@friesen.us"...>]>
For all such associations across different databases we will have to add custom methods as above.
After
Rails 7 has
added
an option in has many through
association to find the elegant way to solve the problem of multiple database joining.
We can use the has_many :through
association with the newly added disable_joins: true
option like below:
class User < ApplicationRecord
has_many :posts
has_many :comments, through: :posts, disable_joins: true
end
If we try to load the associations now, it fires 2 separate queries to load the post ids, and the comments with those post ids.
> User.first.comments
User Load (0.3ms) SELECT "users".* FROM "users" ORDER BY "users"."id" ASC LIMIT $1 [["LIMIT", 1]]
Post Pluck (0.3ms) SELECT "posts"."id" FROM "posts" WHERE "posts"."user_id" = $1 [["user_id", 1]]
Comment Load (0.3ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = $1 /* loading for inspect */ LIMIT $2 [["post_id", 1], ["LIMIT", 11]]
=> #<ActiveRecord::Associations::CollectionProxy [#<Comment id: 1, post_id: 1, name: "Chantay Hayes", email: "ronna.dooley@friesen.us", .....]>
This allows us to eager load the comments. But if we iterate through the users, and try to load the comments there’s an additional query fired to select the post ids of the user.
> users = User.limit(5).includes(:comments).load
User Load (0.2ms) SELECT "users".* FROM "users" LIMIT $1 [["LIMIT", 5]]
Post Load (0.2ms) SELECT "posts".* FROM "posts" WHERE "posts"."user_id" IN ($1, $2, $3, $4, $5) [["user_id", 1], ["user_id", 2], ["user_id", 3], ["user_id", 4], ["user_id", 5]]
Comment Load (0.2ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" IN ($1, $2, $3, $4, $5) [["post_id", 1], ["post_id", 2], ["post_id", 3], ["post_id", 4], ["post_id", 5]]
=> #<ActiveRecord::Relation [#<User id: 1, name: "Milton Dicki", email: "lyman@heller.us", created_at: "2021-04-20 17:13:20.492541000 +0000", updated_at: "2021-04-20 17:13:20.492541000...
> users.collect &:comments
Post Pluck (0.2ms) SELECT "posts"."id" FROM "posts" WHERE "posts"."user_id" = $1 [["user_id", 1]]
Post Pluck (0.1ms) SELECT "posts"."id" FROM "posts" WHERE "posts"."user_id" = $1 [["user_id", 2]]
Post Pluck (0.1ms) SELECT "posts"."id" FROM "posts" WHERE "posts"."user_id" = $1 [["user_id", 3]]
Post Pluck (0.1ms) SELECT "posts"."id" FROM "posts" WHERE "posts"."user_id" = $1 [["user_id", 4]]
Post Pluck (0.1ms) SELECT "posts"."id" FROM "posts" WHERE "posts"."user_id" = $1 [["user_id", 5]]
=> [#<ActiveRecord::Associations::CollectionProxy [#<Comment id: 1, post_id: 1, name: "Chantay Hayes", email: "ronna.dooley@friesen.us">.......]>]
Though, there’s a workaround to solve this N+1 query issue, with the cost of loading associated posts in the memory like below:
> users = User.limit(5).includes(posts: :comments).load
User Load (0.2ms) SELECT "users".* FROM "users" LIMIT $1 [["LIMIT", 5]]
Post Load (0.3ms) SELECT "posts".* FROM "posts" WHERE "posts"."user_id" IN ($1, $2, $3, $4, $5) [["user_id", 1], ["user_id", 2], ["user_id", 3], ["user_id", 4], ["user_id", 5]]
Comment Load (0.3ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" IN ($1, $2, $3, $4, $5) [["post_id", 1], ["post_id", 2], ["post_id", 3], ["post_id", 4], ["post_id", 5]]
=> #<ActiveRecord::Relation [#<User id: 1, name: "Milton Dicki", email: "lyman@heller.us", created_at: "2021-04-20 17:13:20.492541000 +0000", updated_at: "2021-04-20 17:13:20.492541000...
> users.collect { |u| u.posts.collect(&:comments) }.flatten
=> [#<Comment id: 1, post_id: 1, name: "Chantay Hayes", email: "ronna.dooley@friesen.us", comment: "Bad song!".....>]