Rails 6 adds String#truncate_bytes


Rails 6 adds String#truncate_bytes to truncate a string to a maximum bytesize without breaking multibyte characters or grapheme clusters like 👩‍👩‍👦‍👦.
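To make "breaking multibyte characters" concrete, here is a minimal plain-Ruby sketch (no Rails needed) of what can go wrong when a string is cut at a raw byte offset. The variable names are only illustrative.

```ruby
# Plain Ruby, no Rails required.
sparkle = "\u2728"            # "✨", 3 bytes in UTF-8
text = sparkle * 2            # two characters, 6 bytes in total

# Cutting at byte 4 keeps the first character plus one stray byte of the second.
naive = text.byteslice(0, 4)

puts text.bytesize             # 6
puts naive.bytesize            # 4
puts naive.valid_encoding?     # false: the last byte is an incomplete character
puts naive.scrub("").bytesize  # 3: only the first character survives
```

A byte-limited truncation therefore has to cut on character (and grapheme cluster) boundaries rather than at an arbitrary byte.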

Before

Previously, to truncate a string containing multibyte characters, we first had to convert it with mb_chars.

We could then truncate such strings like so:

string = "👩‍❤️‍👩"
string.bytesize #=> 20
string.mb_chars.limit(string.bytesize).to_s #=> "👩‍❤️‍👩"

String#truncate_bytes

Using mb_chars.limit is relatively slow, and it requires converting the String to mb_chars first.

String#truncate_bytes provides a faster and easier alternative.

We can now achieve the same result like so:

string = "👩‍❤️‍👩"
string.bytesize #=> 20
string.truncate_bytes(string.bytesize) #=> "👩‍❤️‍👩"
string.truncate_bytes(string.bytesize - 1) #=> "…"

Note how, in the last example, truncate_bytes preserves the grapheme cluster by treating it as a single unit. Instead of treating it as the bytes of 3 separate characters merged together (“👩” + “❤” + “👩” = “👩‍❤️‍👩”), it drops the cluster as a whole: with a limit of 19 bytes and 3 bytes reserved for the omission, the 20-byte cluster cannot fit, so only “…” is returned.
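The behaviour described above can be approximated in plain Ruby with String#each_grapheme_cluster. This is only a simplified sketch: truncate_bytes_sketch is a hypothetical name, not the Rails method, and edge cases such as an omission longer than the limit are ignored.

```ruby
# Hypothetical helper for illustration; not the Rails implementation.
def truncate_bytes_sketch(string, limit, omission: "\u2026") # "\u2026" is "…"
  return string.dup if string.bytesize <= limit

  # Bytes left for content once the omission has been reserved.
  budget = limit - omission.bytesize
  result = +""

  string.each_grapheme_cluster do |cluster|
    # Stop at the first cluster that does not fit in full.
    break if result.bytesize + cluster.bytesize > budget
    result << cluster
  end

  result << omission
end

# The same emoji as in the example above, written with escapes:
# woman + ZWJ + heart + variation selector + ZWJ + woman.
couple = "\u{1F469}\u200D\u2764\uFE0F\u200D\u{1F469}"

puts couple.bytesize                              # 20
puts truncate_bytes_sketch(couple, 20) == couple  # true: fits, returned unchanged
puts truncate_bytes_sketch(couple, 19)            # only the omission is left
```

Iterating over grapheme clusters rather than bytes or code points is what keeps the emoji from being split at one of its zero-width joiners.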

String#truncate_bytes takes an optional omission keyword argument to end the string with, which defaults to “…”. The bytes of the omission count toward the limit.

chars = "✨✨✨✨✨✨✨✨"
chars.truncate_bytes(20) #=> "✨✨✨✨✨…"
chars.truncate_bytes(8) #=> "✨…"
chars.truncate_bytes(20, omission: "") #=> "✨✨✨✨✨✨"
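The three results above follow from simple byte arithmetic, sketched here in plain Ruby. The fit lambda is a hypothetical helper, and it assumes every character is the same size and that the string is longer than the limit.

```ruby
sparkle = "\u2728" # "✨", 3 bytes
dots    = "\u2026" # "…", the default omission, also 3 bytes

# How many whole sparkles fit once the omission's bytes are reserved.
fit = ->(limit, omission) { (limit - omission.bytesize) / sparkle.bytesize }

puts sparkle.bytesize  # 3
puts dots.bytesize     # 3
puts fit.(20, dots)    # 5  (17 bytes left for content)
puts fit.(8, dots)     # 1  (5 bytes left for content)
puts fit.(20, "")      # 6  (all 20 bytes available)
```

Note that with an empty omission the result is 18 bytes, not 20, because a seventh 3-byte character would exceed the limit.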

Apart from being a convenient method available on String, String#truncate_bytes is also much faster than mb_chars.limit:

require "benchmark"

N = 1000
string = "👩‍❤️‍👩"

# Compare both approaches, truncating to the string's full bytesize.
Benchmark.bmbm do |x|
  x.report("mbchars") { N.times { string.mb_chars.limit(string.bytesize).to_s } }
  x.report("truncate_bytes") { N.times { string.truncate_bytes(string.bytesize) } }
end

#Rehearsal --------------------------------------------------
#mbchars          0.002105   0.000607   0.002712 (  0.002710)
#truncate_bytes   0.000387   0.000001   0.000388 (  0.000388)
#----------------------------------------- total: 0.003100sec
#
#                     user     system      total        real
#mbchars          0.002029   0.000028   0.002057 (  0.002054)
#truncate_bytes   0.000451   0.000003   0.000454 (  0.000449)