Ruby introduces Regexp.timeout


Regular expressions (regexps) are patterns that describe the contents of a string. They’re generally used to test whether a string contains a given pattern and to determine which portions of the string match.

  • Syntax: regexp =~ string
  • Parameter: the string to match against
  • Return: the index of the first match, or nil if there is no match.

They are usually built with the /pat/ and %r{pat} literals or the Regexp.new constructor.
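
As a small sketch of the three construction forms mentioned above (the variable names are only illustrative), each of the following builds the same pattern:

```ruby
# Three equivalent ways to build the same pattern.
literal     = /par/
percent_r   = %r{par}
constructed = Regexp.new('par')

# Each form matches 'parrot' starting at index 0.
results = [literal, percent_r, constructed].map { |re| re =~ 'parrot' }
puts results.inspect          # [0, 0, 0]

# Regexps with the same source and options compare equal.
puts literal == constructed   # true
```

The %r{...} form tends to be convenient when the pattern itself contains forward slashes, and Regexp.new when the pattern is assembled from a string at runtime.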

A regexp is usually confined within forward slashes (/). For example:

/par/ =~ 'parrot'   #=> 0
/p/.match('parrot') #=> #<MatchData "p">

If a string has the pattern we are looking for, then it is said to be a match.

Here, the word ‘parrot’ does not contain the pattern ‘beak’, so it doesn’t match:

/beak/.match('parrot') #=> nil

Here, ‘parrot’ contains the pattern ‘par’, so it matches:

/par/.match('parrot')    #=> #<MatchData "par">
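
To illustrate how these results might be used in practice, here is a minimal sketch. The helper name describe_match is hypothetical and exists only for this example; it checks for nil and reads details from the MatchData object.

```ruby
# Hypothetical helper for illustration: reports where a pattern matched.
def describe_match(pattern, text)
  m = pattern.match(text)
  return "no match" if m.nil?   # match returns nil when the pattern is absent

  # m[0] is the matched text, m.begin(0) its start index,
  # and m.post_match the part of the string after the match.
  "matched #{m[0].inspect} at index #{m.begin(0)}, followed by #{m.post_match.inspect}"
end

puts describe_match(/par/, 'parrot')   # matched "par" at index 0, followed by "rot"
puts describe_match(/beak/, 'parrot')  # no match
```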

Some regexps can take a very long time to match certain inputs because of excessive backtracking. Malicious users can exploit this by supplying such inputs, causing a Denial-of-Service (DoS) attack known as ReDoS.

To mitigate this risk, Ruby 3.2 introduced Regexp.timeout. When a timeout is set and exceeded, Regexp matching raises Regexp::TimeoutError.

Timeout

There are two APIs for setting a timeout:

  • Regexp.timeout=: the process-global timeout configuration for Regexp matching.
  • The timeout keyword of Regexp.new: used when we want different timeout settings for particular Regexps.

Regexp.timeout

Regexp.timeout = 4
q = 'a' * 25 + 'd' + 'a' * 4 + 's' #=> "aaaaaaaaaaaaaaaaaaaaaaaaadaaaas"
/(b|a+)*s/ =~ q  #=> Regexp::TimeoutError is raised if matching takes longer than four seconds

Timeout keyword of Regexp.new

re = Regexp.new("(b|a+)*s", timeout: 4)
q = 'a' * 25 + 'd' + 'a' * 4 + 's' #=> "aaaaaaaaaaaaaaaaaaaaaaaaadaaaas"
re =~ q  #=> Regexp::TimeoutError is raised if matching takes longer than four seconds
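
Both settings can also be read back, which may help when checking which limit applies to a given Regexp. The following is a minimal sketch assuming Ruby 3.2 or later; the variable names are only illustrative.

```ruby
# Assumes Ruby 3.2 or later, where these accessors exist.
Regexp.timeout = 4                        # process-global limit, in seconds
global_limit = Regexp.timeout             # read back as a Float

re = Regexp.new('(b|a+)*s', timeout: 1)   # per-regexp limit for this object only
own_limit = re.timeout

plain = /par/                             # no per-regexp limit configured
plain_limit = plain.timeout               # nil: the global setting applies when matching

puts global_limit
puts own_limit
p plain_limit

Regexp.timeout = nil                      # restore the default (no limit)
```

When a Regexp has its own timeout, that value is used for it and the global setting is ignored.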

When we use a Regexp to match untrusted input, it’s important to understand and use the timeout feature to limit excessive backtracking.

Otherwise, the code will be prone to a Denial-of-Service attack: an attacker can supply an input crafted to make an inefficient Regexp take a very long time to match.
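
One way an application might guard a match on untrusted input is to rescue the timeout error and treat it as a failed match. This is a minimal sketch assuming Ruby 3.2 or later. The helper name safe_match and its limit: parameter are hypothetical and not part of Ruby. The slow pattern follows the style of the example in the Ruby 3.2 release notes.

```ruby
# Hypothetical helper (not part of Ruby): matches input under a per-regexp
# time limit and returns :timeout instead of raising when it is exceeded.
# Assumes Ruby 3.2+, where Regexp.new accepts the timeout: keyword.
def safe_match(source, input, limit: 1.0)
  re = Regexp.new(source, timeout: limit)
  re.match(input)
rescue Regexp::TimeoutError
  :timeout
end

# A cheap match completes well within the limit and returns MatchData.
fast = safe_match('par', 'parrot')

# This pattern needs a quadratic number of steps on the input below, so it
# is expected to exceed the short limit and produce :timeout.
slow = safe_match('^a*b?a*()\1$', 'a' * 50_000 + 'x', limit: 0.1)

p fast
p slow
```

Returning a sentinel such as :timeout keeps the caller in control of how to respond, for example by rejecting the request or logging the input for review.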

Note that no timeout is set by default, as an appropriate limit depends on the application’s needs and context.
