Using Optimizer Hints in Rails


Query Plan

An execution plan describes a recommended method of execution for a SQL statement. The query optimizer attempts to generate the most optimal execution plan for a SQL statement.

The optimizer chooses the plan with the lowest cost in terms of I/O, CPU, and communication. The database gathers these statistics about our objects like the number of rows, distinct values, and data distribution. The optimizer has such internal statistics and tools at its disposal which are used to calculate the said cost and pick an execution plan.

Optimizer Hints

The Optimizer Hints give us the opportunity to influence the optimizer. We, as an application architect, might know information about our data that the optimizer does not. And that knowledge, on some occasions, can help us to choose a more optimized execution plan than the optimizer.

Before Rails 6

To use Optimizer Hints before Rails 6, we had to use raw SQL queries.

Example

pry(main)> sql = "Select /*+ FULL(students) */ first_name
pry(main)>*       From students 
pry(main)>*       Where year = 1;"
=> "Select /*+ FULL(students) */ first_name\nFrom students\nWhere year = 1;\n"
pry(main)> ActiveRecord::Base.connection.execute(sql)
   (0.4ms)  Select /*+ FULL(students) */ first_name
            From students
            Where year = 1;

=> #<PG::Result:0x00007fe47e521c90 status=PGRES_TUPLES_OK ntuples=12 nfields=1 cmd_tuples=18>
pry(main)>

In the example above, the + before the comment tells the database server that this is a hint. The FULL hint instructs that a full table scan should be performed on the students table.

Rails 6

Rails 6 adds support for setting optimizer hints on databases.

We can use ActiveRecord::QueryMethods#optimizer_hints to specify optimizer hints to be used in the SELECT statement.

Example

For MySQL:

pry(main)> User.optimizer_hints("MAX_EXECUTION_TIME(10000)", "INDEX_MERGE(students ssn, PRIMARY)")

User Load (0.2ms)  SELECT /*+ MAX_EXECUTION_TIME(10000) INDEX_MERGE(students ssn, PRIMARY) */ "users".* FROM "users" LIMIT ?  [["LIMIT", 11]]

For PostgreSQL:

pry(main)> User.joins(:comments).where(published: true).optimizer_hints("MergeJoin(users comments)", "SeqScan(users)")

User Load (0.2ms)  SELECT /*+ MAX_EXECUTION_TIME(10000) SeqScan(users) */ "users".* FROM "users" LIMIT ?  [["LIMIT", 11]]

In the example above, we can see that the query has our optimizer hints included.

Example from our Application

Let us discuss a live use-case of optimizer hints in one of our Rails App in production.

In one of our client’s projects, we were facing very long query times because we had to query a table with millions of records.

The solution was decided to be a timeout after 10 seconds on the user-facing query, and run the query in the background and cache for larger results.

It was implemented as below:

1. Using the timeout gem

# Return result from cache for large results
#
Timeout::timeout(SPECIFIC_CACHE_TIMEOUT) do
  ...
  # query with long execution time
  ...
rescue Timeout::Error => exception
  # Enqueue Job to cache the result and use in future
  ...
end

This example is based on work from our friends at Gumroad.

In the snippet above, we are using ruby’s timeout from the stdlib.

The timeout method allows us to terminate any operation taking longer than the specified duration.

This turned out to be a non-viable solution, as it stops the execution of the block, but does not stop the SQL query if it has already started, which was our intention, to begin with.

2. Using MySQL system variable max_execution_time

def timeout_queries(milliseconds:)
    connection = ActiveRecord::Base.connection
    ...
    connection.execute("set max_execution_time = #{milliseconds}")
    yield
  rescue ActiveRecord::StatementInvalid => e
    # query timed out 
  ensure
    # reset to original max_execution_time
  end
end

In the snippet above, we use the timeout_queries method to set the MySQL server system variable max_execution_time.

For every query that is wrapped around timeout_queries method call, we set the value of max_execution_time which results in query timing out if its execution time reaches milliseconds value.

We perform this extra overhead, so that we don’t have to rewrite all our existing queries with raw SQL.

Note: This only works on SELECT statements.

Use Optimizer Hints with Rails 6 for timeout

The above problem can now be fixed using Optimizer Hints instead of writing out the timeout method. MySQL provides an optimizer hint named MAX_EXECUTION_TIME.

This limits the query to not be executed for more than the specified time limit N(in milliseconds). This optimizer hint takes precedence over the value of the MySQL server system variable max_execution_time. Which means that even if there is a positive value of max_execution_time already set, the execution time limit given in the optimizer hint would be honored.

Here’s how we can achieve the same in Rails 6:

long_executing_query.optimizer_hints("MAX_EXECUTION_TIME(10000)")

In the snippet above, the query will be executed for a maximum of 10 seconds.

As we can see, this eliminates the need for setting the value of the MySQL server system variable max_execution_time before every query as well as using raw SQL.

Note: This only works on SELECT statements.

Conclusion

We saw what optimizer hints are and how we can use them in our queries. However, more often than not, we don’t require them. Modern-day databases do not encourage their use since with every upgrade, or a new patch, things are likely to go wrong.

However, we also saw a live example where using an optimizer hint, could help us achieve a particular behavior which was earlier done in a much more passive way.