Rails 6 adds String#truncate_bytes to truncate a string to a maximum bytesize without breaking multibyte characters or grapheme clusters like 👩‍👩‍👦‍👦.
Before
Previously, to truncate a string containing multibyte characters, we first had to convert it with mb_chars and then call limit:
string = "👩‍❤️‍👩"
string.bytesize #=> 20
string.mb_chars.limit(string.bytesize).to_s #=> "👩‍❤️‍👩"
String#truncate_bytes
Using mb_chars.limit is relatively slow, and it requires converting the String to mb_chars first.
String#truncate_bytes provides a faster and easier alternative.
We can now achieve the same result like so:
string = "👩‍❤️‍👩"
string.bytesize #=> 20
string.truncate_bytes(string.bytesize) #=> "👩‍❤️‍👩"
string.truncate_bytes(string.bytesize - 1) #=> "…"
Note how, in the last example, truncate_bytes preserves the grapheme cluster and treats it as a single unit.
Rather than treating it as the bytes of three separate characters joined together (“👩” + “❤” + “👩” = “👩‍❤️‍👩”), it drops the cluster as a whole: the 20-byte cluster does not fit within the 19-byte limit, so only the omission remains.
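The cluster-preserving behaviour can be approximated in plain Ruby. The following is a minimal sketch, not the Rails implementation: truncate_bytes_sketch is a hypothetical helper written for illustration, and it assumes Ruby 2.5 or later for String#each_grapheme_cluster.

```ruby
# Hypothetical helper for illustration only; not the Rails source.
def truncate_bytes_sketch(text, limit, omission: "…")
  # Nothing to do when the string already fits.
  return text.dup if text.bytesize <= limit

  if omission.bytesize > limit
    raise ArgumentError, "omission is larger than the #{limit}-byte limit"
  end

  budget = limit - omission.bytesize # bytes left for the content itself
  result = +""                       # mutable accumulator

  # each_grapheme_cluster yields whole user-perceived characters,
  # so a cluster is either kept intact or dropped entirely.
  text.each_grapheme_cluster do |cluster|
    break if result.bytesize + cluster.bytesize > budget
    result << cluster
  end

  result << omission
end

# "e" + combining acute accent: one 3-byte grapheme cluster, repeated four times.
accented = "e\u0301" * 4
truncate_bytes_sketch(accented, 8) # => "e\u0301…" (one whole cluster plus the omission)
```

A plain byteslice(0, 8) on the same string would cut through the third cluster and leave an invalid byte sequence, which is the failure this approach avoids.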
String#truncate_bytes takes an optional omission argument to end the string with, which defaults to “…”.
The omission's bytes count toward the limit, so the result never exceeds the requested bytesize.
chars = "✨✨✨✨✨✨✨✨"
chars.truncate_bytes(20) #=> "✨✨✨✨✨…"
chars.truncate_bytes(8) #=> "✨…"
chars.truncate_bytes(20, omission: "") #=> "✨✨✨✨✨✨"
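The counts in these results follow from simple byte arithmetic. The sketch below uses only core Ruby and assumes a 3-byte character such as ✨ with the default 3-byte omission; the variable names are illustrative.

```ruby
sparkle  = "✨" # U+2728, 3 bytes in UTF-8
omission = "…"  # U+2026, 3 bytes in UTF-8
limit    = 20

# Bytes left for content once the omission is reserved.
budget = limit - omission.bytesize

# Whole characters that fit, with and without an omission.
with_omission    = budget / sparkle.bytesize
without_omission = limit / sparkle.bytesize

puts budget           # 17
puts with_omission    # 5
puts without_omission # 6
```

With a limit of 8, the same arithmetic leaves 5 bytes of budget, which is room for exactly one 3-byte character before the omission.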
Apart from being a convenient method available on String, String#truncate_bytes is also considerably faster than mb_chars.limit, by roughly 4.5x in this benchmark:
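require "benchmark"
require "active_support"
require "active_support/core_ext/string" # provides mb_chars and truncate_bytes

N = 1000
string = "👩‍❤️‍👩"

Benchmark.bmbm do |x|
  x.report("mbchars") { N.times { string.mb_chars.limit(string.bytesize).to_s } }
  x.report("truncate_bytes") { N.times { string.truncate_bytes(string.bytesize) } }
end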
# Rehearsal --------------------------------------------------
# mbchars          0.002105   0.000607   0.002712 (  0.002710)
# truncate_bytes   0.000387   0.000001   0.000388 (  0.000388)
# ----------------------------------------- total: 0.003100sec
#
#                      user     system      total        real
# mbchars          0.002029   0.000028   0.002057 (  0.002054)
# truncate_bytes   0.000451   0.000003   0.000454 (  0.000449)