Rails 6 bulk insert records


There are cases when we want to insert records in bulk. For eg., we have a list of users in CSV and we want to import these users in our application.

Rails has methods like delete_all or update_all to delete and update bulk records respectively. Similar method to insert bulk records was missing.

Rails 6 added insert_all, insert_all! and upsert_all to ActiveRecord::Persistence, to solve the above issue.

Before Rails 6

Before Rails 6, bulk insert was achieved -

users = []
10.times do |i|
  users << User.new(name: "user #{i}")
end
User.import users
  • Creating records one by one
10.times do |i|
  User.create!(name: "user #{i}")
end
  • Using the SQL INSERT statement
# Assuming users is an array of user hash
# like [{ name: "Sam" }, { name: "Charls" }]

sql = "INSERT INTO users VALUES "

sql_values = []
users.each do |user|
  sql_values << "(#{user.values.join(", ")})"
end

sql += sql_values.join(", ")
ActiveRecord::Base.connection.insert_sql(sql)

In Rails 6

insert_all and insert_all!

Using insert_all we can perform bulk insert as below -

result = User.insert_all(
  [
    {
      name: "Sam",
      email: "sam@example.com"
    },
    {
      name: "Charls",
      email: "charls@example.com"
    }
  ]
)
# Bulk Insert (2.3ms) INSERT INTO "users"("name","email")
# VALUES("Sam", "sam@example"...)
# ON CONFLICT DO NOTHING RETURNING "id"

puts result.inspect
#<ActiveRecord::Result:0x00007fb6612a1ad8 @columns=["id"], @rows=[[1], [2]],
@hash_rows=nil, @column_types=
{"id"=>#<ActiveModel::Type::Integer:0x00007fb65f420078 ....>

puts User.count
=> 2

As mentioned above, we notice an ON CONFLICT DO NOTHING clause in the query. This is supported by SQLite and PostgreSQL databases. If there is a conflict or a unique key violation during the bulk insert process, it skips the conflicting record and proceeds with the insertion of the next record.

If we need to ensure all rows are inserted we can use insert_all! the bang version directly.

RETURNING "id" clause in the above query return the primary key(s) @rows=[[1], [2]]. If we want to check more attributes apart from id we can use an optional returning option, which expects an array of attribute names.

result = User.insert_all(
  [
    {
      name: "Sam",
      email: "sam@example.com"
    },
    {
      name: "Charls",
      email: "charls@example.com"
    }
  ],
  returning: %w[id name]
)
# Bulk Insert (2.3ms) INSERT INTO "users"("name","email")
# VALUES("Sam", "sam@example"...)
# ON CONFLICT DO NOTHING RETURNING "id", "name"

puts result.inspect
#<ActiveRecord::Result:0x00007fb6612a1ad8 @columns=["id", "name"],
@rows=[[1, "Sam"], [2, "Charls"]],
@hash_rows=nil, @column_types=
{"id"=>#<ActiveModel::Type::Integer:0x00007fb65f420078 ....>
upsert_all

insert_all and insert_all! either skip the duplicate records or raise an exception, if a duplicate record is encountered while bulk inserting.

If a record exists we want to update it or else create a new record. This is known as upsert.

The upsert_all method performs bulk upserts.

result = User.upsert_all(
  [
    {
      id: 1,
      name: "Sam new",
      email: "sam@example.com"
    },
    {
      id: 1,                  # duplicate id here
      name: "Sam's new",
      email: "sam@example.com"
    },
    {
      id: 2,
      name: "Charles",        # name updated
      email: "charls@example.com"
    },
    {
      id: 3,                  # new entry
      name: "David",
      email: "david@example.com"
    }
  ]
)

# Bulk Insert (26.3ms) INSERT INTO `users`(`id`,`name`,`email`)
# VALUES (1, 'Sam new', 'sam@example.com')...
# ON DUPLICATE KEY UPDATE `name`=VALUES(`name`)

puts User.count
=> 3

The second row in the input array has duplicate id 1 and hence the name of the user will be Sam's new instead of Sam new.

The third row in the input array has no duplicate and it will perform an update.

The fourth row with id 3 is a new entry and hence insert operation will be performed here.

Note: upsert_all behaviour is different for different databases. The above example works for MySQL database but will not work for SQLite.

Inserting or updating records sequentially has huge performance issue.

Let’s try to insert 1000 users sequentially and benchmark it.

print Benchmark.measure { 1000.times {|t| User.create(name: "name - #{t}")} }
  7.913459    1.129483    9.439012 ( 15.329382 )
=> nil

1000 users created locally in 15.32 seconds in 1000 transactions executing 1000 sequential insert query.

Using insert_all we can prepare dataset of users and import them in one query.

users = 1000.times.map { |t| { name: "name - "#{t}", created_at: Time.now, updated_at: Time.now }}
print Benchmark.measure { User.insert_all(users) }
  0.267381    0.018721    0.298123 ( 0.401876 )

As seen the time taken to import 1000 users has reduced from 15.32 seconds to just 0.40 seconds.