Unicode at gigabytes per second
We often represent text using Unicode formats (UTF-8 and UTF-16). UTF-8 is increasingly popular (XML, HTML, JSON, Rust, Go, Swift, Ruby). UTF-16 is most common in Java, .NET, and inside operating systems such as Windows. Software systems frequently have to validate text or convert it from one encoding to the other. While recent disks have bandwidths of 5 GB/s or more, conventional approaches transcode non-ASCII text at a fraction of a gigabyte per second. We show that we can transcode between UTF-8 and UTF-16 at gigabytes per second on current systems (x64 and ARM) without sacrificing safety. Our open-source library can be ten times faster than the popular ICU library on non-ASCII strings and even faster on ASCII strings.
Invited talk at SPIRE 2021, 28th International Symposium on String Processing and Information Retrieval (October 4-6th, 2021 - Lille, France)
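
For readers unfamiliar with the two operations named in the abstract, the following is a minimal sketch of what "validate" and "transcode" mean in practice. It uses Python's built-in codecs, not the library presented in the talk, and makes no attempt at speed. The function names `is_valid_utf8` and `transcode_utf8_to_utf16le` are hypothetical and chosen only for illustration.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` is well-formed UTF-8."""
    try:
        # Strict decoding rejects overlong forms, encoded surrogates,
        # truncated sequences, and stray continuation bytes.
        data.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


def transcode_utf8_to_utf16le(data: bytes) -> bytes:
    """Validate UTF-8 input and re-encode it as little-endian UTF-16 (no BOM).

    Raises UnicodeDecodeError if the input is not valid UTF-8, so invalid
    input is never silently converted.
    """
    text = data.decode("utf-8", errors="strict")
    return text.encode("utf-16-le")


if __name__ == "__main__":
    sample = "héllo, 世界 🌍".encode("utf-8")
    utf16 = transcode_utf8_to_utf16le(sample)
    # Print the byte length of each encoding of the same text.
    print(len(sample), "UTF-8 bytes ->", len(utf16), "UTF-16 bytes")
    print(is_valid_utf8(b"\xc0\x80"))  # overlong encoding of U+0000
```

The sketch validates before converting, which is the safety property the abstract refers to: malformed input raises an error instead of producing corrupted output.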