Unicode at gigabytes per second
We often represent text using Unicode formats (UTF-8 and UTF-16). UTF-8 is increasingly popular (XML, HTML, JSON, Rust, Go, Swift, Ruby). UTF-16 is most common in Java, .NET, and inside operating systems such as Windows. Software systems frequently have to validate text or convert it from one encoding to the other. While recent disks have bandwidths of 5 GB/s or more, conventional approaches transcode non-ASCII text at a fraction of a gigabyte per second. We show that we can transcode between UTF-8 and UTF-16 at gigabytes per second on current systems (x64 and ARM) without sacrificing safety. Our open-source library can be ten times faster than the popular ICU library on non-ASCII strings and even faster on ASCII strings.
Invited talk at SPIRE 2021, 28th International Symposium on String Processing and Information Retrieval (October 4-6th, 2021 - Lille, France)
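
To make the two operations in the abstract concrete (validating text and transcoding between UTF-8 and UTF-16), here is a minimal sketch using only Python's built-in codecs. It is not the library presented in the talk and makes no attempt at high throughput. The function names `is_valid_utf8`, `utf8_to_utf16le`, and `utf16le_to_utf8` are hypothetical, chosen for illustration only.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` is well-formed UTF-8 (strict decoding)."""
    try:
        data.decode("utf-8")  # strict mode raises on any malformed sequence
        return True
    except UnicodeDecodeError:
        return False


def utf8_to_utf16le(data: bytes) -> bytes:
    """Transcode UTF-8 bytes to UTF-16 little-endian bytes (no BOM).

    Raises UnicodeDecodeError on invalid input, so invalid text is never
    silently converted.
    """
    return data.decode("utf-8").encode("utf-16-le")


def utf16le_to_utf8(data: bytes) -> bytes:
    """Transcode UTF-16 little-endian bytes (no BOM) to UTF-8 bytes."""
    return data.decode("utf-16-le").encode("utf-8")


if __name__ == "__main__":
    sample = "héllo 😀".encode("utf-8")
    print(is_valid_utf8(sample))    # well-formed input
    print(is_valid_utf8(b"\xc3"))   # truncated two-byte sequence
    print(utf16le_to_utf8(utf8_to_utf16le(sample)) == sample)
```

Both conversions validate as they go: malformed input raises an exception instead of producing output, which is one simple way to keep transcoding safe.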