There are cases when we want to insert records in bulk. For example, we have a list of users in a CSV file and we want to import these users into our application. Rails already has methods like delete_all and update_all to delete and update records in bulk, but a similar method to insert records in bulk was missing. Rails 6 added insert_all, insert_all! and upsert_all to ActiveRecord::Persistence to solve this.
Before Rails 6
Before Rails 6, a bulk insert could be achieved in one of the following ways (the first two approaches are sketched below):
- Using the activerecord-import gem
- Creating records one by one
- Using a raw SQL INSERT statement
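For instance, creating records one by one or importing them with the activerecord-import gem might look like the following sketch; the User model and the data are assumed for illustration.

```ruby
# A hypothetical list of users parsed from a CSV file.
users = [
  { name: "Sam",   email: "sam@example.com" },
  { name: "Mike",  email: "mike@example.com" },
  { name: "Cindy", email: "cindy@example.com" }
]

# Creating records one by one fires a separate INSERT query
# (each in its own transaction) for every record.
users.each { |attributes| User.create!(attributes) }

# With the activerecord-import gem, the records can instead be
# imported in a single query.
User.import(users.map { |attributes| User.new(attributes) })
```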
In Rails 6
insert_all and insert_all!
Using insert_all, we can perform a bulk insert as sketched below.
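A minimal sketch, assuming a users table with name and email columns and a unique index on email; the attribute values are illustrative.

```ruby
User.insert_all(
  [
    { name: "Sam",  email: "sam@example.com" },
    { name: "Mike", email: "mike@example.com" }
  ]
)

# On PostgreSQL, the generated query looks roughly like:
#
#   INSERT INTO "users" ("name", "email")
#   VALUES ('Sam', 'sam@example.com'), ('Mike', 'mike@example.com')
#   ON CONFLICT DO NOTHING
#   RETURNING "id"
```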
Notice the ON CONFLICT DO NOTHING clause in the generated query. It is supported by the SQLite and PostgreSQL databases. If there is a conflict or a unique key violation during the bulk insert, the conflicting record is skipped and the insertion proceeds with the next record.
If we need to ensure that all rows are inserted, we can directly use the bang version, insert_all!.
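A sketch of the bang version, assuming the same unique index on email; here the duplicate row causes the insert to fail.

```ruby
User.insert_all!(
  [
    { name: "Sam",  email: "sam@example.com" },
    { name: "Mike", email: "sam@example.com" } # duplicate email
  ]
)
# => raises ActiveRecord::RecordNotUnique because of the duplicate email
```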
The RETURNING "id" clause in the above query returns the primary keys of the inserted records, e.g. @rows=[[1], [2]]. If we want to check more attributes apart from id, we can use the optional returning option, which expects an array of attribute names.
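A sketch using the returning option, assuming PostgreSQL (which supports a RETURNING clause); the attribute names are illustrative.

```ruby
result = User.insert_all(
  [
    { name: "Sam",  email: "sam@example.com" },
    { name: "Mike", email: "mike@example.com" }
  ],
  returning: %w[id name]
)

# result is an ActiveRecord::Result whose rows now contain
# both requested attributes, e.g. @rows=[[1, "Sam"], [2, "Mike"]]
```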
upsert_all
insert_all and insert_all! either skip the duplicate records or raise an exception, respectively, when a duplicate record is encountered while bulk inserting. Often, though, if a record already exists we want to update it, and otherwise create a new one. This is known as an upsert. The upsert_all method performs bulk upserts.
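A sketch matching the description below, assuming a users table that already contains records with id 1 (Sam) and id 2 (Mike); the names other than "Sam new" and "Sam's new" are illustrative.

```ruby
User.upsert_all(
  [
    { id: 1, name: "Sam new" },
    { id: 1, name: "Sam's new" }, # duplicate id within the input array
    { id: 2, name: "Mike new" },  # id 2 already exists, so it is updated
    { id: 3, name: "Cindy" }      # id 3 does not exist yet, so it is inserted
  ]
)
```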
The second row in the input array has a duplicate id 1, and hence the name of the user will be Sam's new instead of Sam new. The third row in the input array has no duplicate within the array, and it will update the existing record with the same id. The fourth row, with id 3, is a new entry, and hence an insert operation will be performed for it.
Note: the upsert_all behaviour differs across databases. The above example works for the MySQL database but will not work for SQLite.
Inserting or updating records sequentially has huge performance issues.
Let’s try to insert 1000 users sequentially and benchmark it.
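A minimal benchmarking sketch, assuming a User model with a name column; the timings quoted below are from the original run, not from this sketch.

```ruby
require "benchmark"

data = 1000.times.map { |i| { name: "user #{i}" } }

sequential_time = Benchmark.realtime do
  # One INSERT query (and one transaction) per record.
  data.each { |attributes| User.create!(attributes) }
end

puts "Sequential insert took #{sequential_time.round(2)} seconds"
```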
Locally, 1000 users were created in 15.32 seconds, using 1000 transactions and 1000 sequential INSERT queries.
Using insert_all, we can prepare the dataset of users and import them in one query, as sketched below.
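A sketch of the same import with insert_all, under the same assumptions as the sequential benchmark above.

```ruby
require "benchmark"

data = 1000.times.map { |i| { name: "user #{i}" } }

bulk_time = Benchmark.realtime do
  # A single multi-row INSERT statement for all 1000 records.
  User.insert_all(data)
end

puts "Bulk insert took #{bulk_time.round(2)} seconds"
```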
As seen, the time taken to import 1000 users reduced from 15.32 seconds to just 0.40 seconds.