
Unicode at gigabytes per second
We often represent text using Unicode formats (UTF-8 and UTF-16). UTF-8 is increasingly popular (XML, HTML, JSON, Rust, Go, Swift, Ruby). UTF-16 is most common in Java, .NET, and inside operating systems such as Windows. Software systems frequently have to validate text or convert it from one encoding to the other. While recent disks have bandwidths of 5 GB/s or more, conventional approaches transcode non-ASCII text at a fraction of a gigabyte per second. We show that we can transcode between UTF-8 and UTF-16 at gigabytes per second on current systems (x64 and ARM) without sacrificing safety. Our open-source library can be ten times faster than the popular ICU library on non-ASCII strings and even faster on ASCII strings.
Invited talk at SPIRE 2021, 28th International Symposium on String Processing and Information Retrieval (October 4-6th, 2021 - Lille, France)
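
For readers unfamiliar with what "validate and transcode" involves, here is a minimal sketch of the conventional byte-by-byte approach that the abstract contrasts with. It is not the method presented in the talk and it is not taken from any library; the function name `utf8_to_utf16` is hypothetical, and the code is written for clarity rather than speed.

```python
def utf8_to_utf16(data: bytes) -> list[int]:
    """Validate UTF-8 bytes and return the equivalent UTF-16 code units.

    Hypothetical illustration of a scalar (one character at a time)
    transcoder. Raises ValueError on any ill-formed input.
    """
    out = []
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        # Classify the leading byte: payload bits, sequence length, and the
        # smallest code point that legitimately needs this length.
        if b0 < 0x80:
            cp, length, min_cp = b0, 1, 0
        elif 0xC2 <= b0 <= 0xDF:
            cp, length, min_cp = b0 & 0x1F, 2, 0x80
        elif 0xE0 <= b0 <= 0xEF:
            cp, length, min_cp = b0 & 0x0F, 3, 0x800
        elif 0xF0 <= b0 <= 0xF4:
            cp, length, min_cp = b0 & 0x07, 4, 0x10000
        else:
            raise ValueError(f"invalid leading byte at offset {i}")
        if i + length > n:
            raise ValueError(f"truncated sequence at offset {i}")
        # Each continuation byte must look like 10xxxxxx.
        for j in range(1, length):
            b = data[i + j]
            if b & 0xC0 != 0x80:
                raise ValueError(f"invalid continuation byte at offset {i + j}")
            cp = (cp << 6) | (b & 0x3F)
        # Reject overlong encodings, surrogates, and values past U+10FFFF.
        if cp < min_cp or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"invalid code point at offset {i}")
        if cp < 0x10000:
            out.append(cp)
        else:
            # Code points above U+FFFF become a surrogate pair in UTF-16.
            cp -= 0x10000
            out.append(0xD800 | (cp >> 10))
            out.append(0xDC00 | (cp & 0x3FF))
        i += length
    return out
```

Every character goes through several data-dependent branches, which is one reason scalar routines of this shape tend to slow down on non-ASCII text.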