Rails 6 bulk insert records

There are cases when we want to insert records in bulk. For example, we may have a list of users in a CSV file and want to import them into our application.

Rails has long had methods like delete_all and update_all to delete and update records in bulk. A similar method to insert records in bulk was missing.
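
For example, both of these operate on all matching rows in a single query (a quick sketch, assuming boolean active and archived columns on users):

User.where(active: false).delete_all                  # one DELETE query
User.where(active: false).update_all(archived: true)  # one UPDATE query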

Rails 6 added insert_all, insert_all! and upsert_all to ActiveRecord::Persistence to solve this.

Before Rails 6

Before Rails 6, a bulk insert could be achieved in one of the following ways:

  • Using the activerecord-import gem
users = []
10.times do |i|
  users << User.new(name: "user #{i}")
end
User.import users
  • Creating records one by one
10.times do |i|
  User.create!(name: "user #{i}")
end
  • Using the SQL INSERT statement
# Assuming users is an array of user hashes
# like [{ name: "Sam" }, { name: "Charls" }]

sql = "INSERT INTO users (name) VALUES "

sql_values = []
users.each do |user|
  sql_values << "('#{user[:name]}')"
end

sql += sql_values.join(", ")
ActiveRecord::Base.connection.execute(sql)

Note that hand-building SQL strings like this is verbose and risks SQL injection unless every value is escaped.

In Rails 6

insert_all and insert_all!

Using insert_all, we can perform a bulk insert as shown below:

result = User.insert_all(
  [
    {
      name: "Sam",
      email: "sam@example.com"
    },
    {
      name: "Charls",
      email: "charls@example.com"
    }
  ]
)
# Bulk Insert (2.3ms) INSERT INTO "users"("name","email")
# VALUES("Sam", "sam@example"...)
# ON CONFLICT DO NOTHING RETURNING "id"

puts result.inspect
#<ActiveRecord::Result:0x00007fb6612a1ad8 @columns=["id"], @rows=[[1], [2]],
@hash_rows=nil, @column_types=
{"id"=>#<ActiveModel::Type::Integer:0x00007fb65f420078 ....>

puts User.count
=> 2

As seen in the query above, Rails adds an ON CONFLICT DO NOTHING clause, which is supported by the SQLite and PostgreSQL databases. If there is a conflict or a unique key violation during the bulk insert, the conflicting record is skipped and insertion proceeds with the next record.
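
For example, assuming the users table has a unique index on email and a user with sam@example.com already exists, the duplicate row is skipped silently:

result = User.insert_all(
  [
    { name: "Sam",      email: "sam@example.com" }, # duplicate, skipped
    { name: "New User", email: "new@example.com" }  # inserted
  ]
)

# On PostgreSQL, RETURNING only reports rows that were actually inserted,
# so result.rows contains just the id of the new record.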

If we need to ensure that all rows are inserted, we can use the bang version, insert_all!, which raises ActiveRecord::RecordNotUnique when a duplicate is encountered instead of skipping it.
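
A minimal sketch, again assuming a unique index on email:

User.insert_all!(
  [
    { name: "Sam",   email: "sam@example.com" },
    { name: "Sam 2", email: "sam@example.com" } # duplicate email
  ]
)
# => raises ActiveRecord::RecordNotUnique; the statement fails as a whole,
#    so no rows are inserted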

RETURNING "id" clause in the above query return the primary key(s) @rows=[[1], [2]]. If we want to check more attributes apart from id we can use an optional returning option, which expects an array of attribute names.

result = User.insert_all(
  [
    {
      name: "Sam",
      email: "sam@example.com"
    },
    {
      name: "Charls",
      email: "charls@example.com"
    }
  ],
  returning: %w[id name]
)
# Bulk Insert (2.3ms) INSERT INTO "users"("name","email")
# VALUES("Sam", "sam@example"...)
# ON CONFLICT DO NOTHING RETURNING "id", "name"

puts result.inspect
#<ActiveRecord::Result:0x00007fb6612a1ad8 @columns=["id", "name"],
@rows=[[1, "Sam"], [2, "Charls"]],
@hash_rows=nil, @column_types=
{"id"=>#<ActiveModel::Type::Integer:0x00007fb65f420078 ....>

upsert_all

insert_all and insert_all! either skip duplicate records or raise an exception, respectively, when a duplicate is encountered while bulk inserting.

Often we want to update a record if it exists, or else create a new one. This operation is known as an upsert.

The upsert_all method performs bulk upserts.

result = User.upsert_all(
  [
    {
      id: 1,
      name: "Sam new",
      email: "sam@example.com"
    },
    {
      id: 1,                  # duplicate id here
      name: "Sam's new",
      email: "sam@example.com"
    },
    {
      id: 2,
      name: "Charles",        # name updated
      email: "charls@example.com"
    },
    {
      id: 3,                  # new entry
      name: "David",
      email: "david@example.com"
    }
  ]
)

# Bulk Insert (26.3ms) INSERT INTO `users`(`id`,`name`,`email`)
# VALUES (1, 'Sam new', 'sam@example.com')...
# ON DUPLICATE KEY UPDATE `name`=VALUES(`name`)

puts User.count
=> 3

The second row in the input array has the duplicate id 1, hence the user's name becomes Sam's new instead of Sam new.

The third row has id 2, which already exists in the database, so the existing record is updated and the name becomes Charles.

The fourth row with id 3 is a new entry, hence an insert operation is performed for it.
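
We can verify the final state of the table; the expected result would be:

User.order(:id).pluck(:id, :name)
# => [[1, "Sam's new"], [2, "Charles"], [3, "David"]]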

Note: upsert_all behaviour differs across databases. The above example works with a MySQL database but will not work with SQLite.
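
On PostgreSQL, for instance, conflicts are resolved against the primary key by default, and the optional unique_by option lets us target a specific unique index instead. A sketch, assuming a unique index named index_users_on_email (unique_by is not supported on MySQL):

User.upsert_all(
  [{ name: "Sam new", email: "sam@example.com" }],
  unique_by: :index_users_on_email
)
# Generates INSERT ... ON CONFLICT ("email") DO UPDATE SET ... on PostgreSQL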

Inserting or updating records sequentially also carries a huge performance cost.

Let’s try to insert 1000 users sequentially and benchmark it.

print Benchmark.measure { 1000.times {|t| User.create(name: "name - #{t}")} }
  7.913459    1.129483    9.439012 ( 15.329382 )
=> nil

1000 users were created locally in 15.32 seconds, across 1000 transactions executing 1000 sequential INSERT queries.

Using insert_all, we can instead prepare the dataset of users and import them in a single query. Note that insert_all skips validations and callbacks and does not set timestamps, so created_at and updated_at are supplied explicitly.

users = 1000.times.map { |t| { name: "name - #{t}", created_at: Time.now, updated_at: Time.now }}
print Benchmark.measure { User.insert_all(users) }
  0.267381    0.018721    0.298123 ( 0.401876 )

As seen above, the time taken to import 1000 users drops from 15.32 seconds to just 0.40 seconds.
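
Coming back to the CSV example from the introduction, the whole import now fits in a few lines. A minimal sketch, assuming a users.csv file with name and email headers:

require "csv"

rows = CSV.foreach("users.csv", headers: true).map do |row|
  {
    name: row["name"],
    email: row["email"],
    created_at: Time.current,
    updated_at: Time.current
  }
end

User.insert_all(rows)

With insert_all, insert_all! and upsert_all, bulk imports that once needed a third-party gem or hand-written SQL are now a single Active Record call.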
