#moarvm on 30 January 2025 - Raku Programming Language Log

Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes. Set by lizmat on 24 May 2021.
01:20 MasterDuke joined
MasterDuke	after way too much flailing at the keyboard (i hate c++ libraries), i managed to call simdutf's validate_utf8() in moarvm. so i added a fast path (that still uses our normalization) when decoding utf8 if validate_utf8() says it's valid	01:23
	i was hoping to measure if doing the validation + fast path was faster than what we do now (decoding that checks for validity along the way)	01:24
	but annoyingly, `raku -e ''` dies with `insert duplicate key	01:25
	at SETTING::src/core.c/allomorphs.rakumod:309 (/home/dan/r/install/share/perl6/runtime/CORE.c.setting.moarvm:val)`
	github.com/MoarVM/MoarVM/compare/m...th_simdutf if anybody feels like seeing what i tried	01:28
	if i use a moarvm built on main with a printf added in the UTF8_REJECT case, it never hits. so i guess my `decode_utf8_byte_nocheck()` is wrong?	01:44
ugexe	do you know what the string (key) is? maybe that is a blue	01:45
	clue
MasterDuke	not yet, working on that now	01:46
timo	i think your code is treating utf8 bytes as if they were codepoints?	01:54
	ready = MVM_unicode_normalizer_process_codepoint_to_grapheme(tc, &norm, decode_utf8_byte_nocheck((MVMuint8)*utf8), &g);
	like, this only takes a single byte and feeds it directly into codepoint_to_grapheme?	01:56
MasterDuke	but the old code did the same? `decode_utf8_byte(&state, &codepoint, (MVMuint8)*utf8)` writes a value in to `codepoint`, and then `ready = MVM_unicode_normalizer_process_codepoint_to_grapheme(tc, &norm, codepoint, &g);`	01:57
timo	that takes the value from "codepoint", which is only valid if the state that decode_utf8_byte returns is UTF8_ACCEPT	01:58
	i think it does that only after a full utf8 codepoint was read
MasterDuke	ah, i was assuming that continually getting UTF8_ACCEPT would be the same as `validate_utf8()` returning true	01:59
timo	right, that's not the case
MasterDuke	oh, i guess the old code actually only throws if there are two UTF8_REJECT in a row	02:00
timo	i think it jumps backwards after the first UTF8_REJECT
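
Editor's sketch: the point made above, that a byte-at-a-time decoder only yields a usable code point once it reports its accept state, can be illustrated in C. This is a minimal sketch under stated assumptions, not MoarVM's actual decoder. The names `sketch_feed`, `sketch_decode`, `SketchDecoder` and the `SKETCH_*` constants are hypothetical, and the sketch omits the overlong-encoding and surrogate checks that a real validator performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes, loosely modelled on UTF8_ACCEPT / UTF8_REJECT. */
enum { SKETCH_ACCEPT = 0, SKETCH_REJECT = 1, SKETCH_MORE = 2 };

typedef struct {
    uint32_t codepoint; /* partial value; complete only on SKETCH_ACCEPT */
    int      remaining; /* continuation bytes still expected */
} SketchDecoder;

/* Feed one byte. The code point is usable only when this returns SKETCH_ACCEPT. */
static int sketch_feed(SketchDecoder *d, uint8_t byte) {
    if (d->remaining == 0) {
        if (byte < 0x80) { d->codepoint = byte; return SKETCH_ACCEPT; }
        if ((byte & 0xE0) == 0xC0)      { d->codepoint = byte & 0x1F; d->remaining = 1; }
        else if ((byte & 0xF0) == 0xE0) { d->codepoint = byte & 0x0F; d->remaining = 2; }
        else if ((byte & 0xF8) == 0xF0) { d->codepoint = byte & 0x07; d->remaining = 3; }
        else return SKETCH_REJECT;      /* stray continuation or invalid lead byte */
        return SKETCH_MORE;
    }
    if ((byte & 0xC0) != 0x80) {        /* expected a continuation byte */
        d->remaining = 0;
        return SKETCH_REJECT;
    }
    d->codepoint = (d->codepoint << 6) | (uint32_t)(byte & 0x3F);
    d->remaining--;
    return d->remaining == 0 ? SKETCH_ACCEPT : SKETCH_MORE;
}

/* Decode a buffer, emitting a code point only when a full sequence was read.
 * Returns the number of code points, or (size_t)-1 on malformed input. */
static size_t sketch_decode(const uint8_t *in, size_t len, uint32_t *out) {
    SketchDecoder d = {0, 0};
    size_t n = 0;
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < len; i++) {
        int status = sketch_feed(&d, in[i]);
        if (status == SKETCH_REJECT) return (size_t)-1;
        if (status == SKETCH_ACCEPT) out[n++] = d.codepoint;
        /* SKETCH_MORE: mid-sequence, d.codepoint is not yet meaningful */
    }
    return d.remaining == 0 ? n : (size_t)-1;
}
```

Passing the intermediate value to a normalizer on every byte, rather than only on accept, would hand it partial code points, which is the kind of mistake discussed above.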
MasterDuke	hm. is there actually an easy way to use simdutf?	02:03
	or does our use of NFG make it impossible?
	since simdutf doesn't (yet) do any normalization	02:04
	could we use simdutf to decode if valid, and then post-process the result to get it into NFG?	02:05
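
Editor's sketch: one possible shape for the "validate first, then decode without checks" idea is a decoder that consumes a whole UTF-8 sequence per step instead of a single byte. The C below is only an illustration and assumes the input has already passed a validator such as simdutf's validate_utf8(); the function name `sketch_decode_valid` is hypothetical and nothing here is MoarVM code. Each resulting code point would still need to go through normalization afterwards to reach NFG.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode pre-validated UTF-8 into code points, one full sequence at a time.
 * ASSUMPTION: `in` is valid UTF-8, so every lead byte is followed by the
 * right number of continuation bytes; no bounds or validity checks are done.
 * `out` must have room for `len` code points in the worst case (all ASCII).
 * Returns the number of code points written. */
static size_t sketch_decode_valid(const uint8_t *in, size_t len, uint32_t *out) {
    size_t i = 0, n = 0;
    assert(in != NULL && out != NULL);
    while (i < len) {
        uint8_t b = in[i];
        if (b < 0x80) {                       /* 1 byte: 0xxxxxxx */
            out[n++] = b;
            i += 1;
        } else if (b < 0xE0) {                /* 2 bytes: 110xxxxx 10xxxxxx */
            out[n++] = ((uint32_t)(b & 0x1F) << 6)
                     |  (uint32_t)(in[i + 1] & 0x3F);
            i += 2;
        } else if (b < 0xF0) {                /* 3 bytes: 1110xxxx + 2 continuations */
            out[n++] = ((uint32_t)(b & 0x0F) << 12)
                     | ((uint32_t)(in[i + 1] & 0x3F) << 6)
                     |  (uint32_t)(in[i + 2] & 0x3F);
            i += 3;
        } else {                              /* 4 bytes: 11110xxx + 3 continuations */
            out[n++] = ((uint32_t)(b & 0x07) << 18)
                     | ((uint32_t)(in[i + 1] & 0x3F) << 12)
                     | ((uint32_t)(in[i + 2] & 0x3F) << 6)
                     |  (uint32_t)(in[i + 3] & 0x3F);
            i += 4;
        }
    }
    return n;
}
```

Whether this beats a checking decoder once normalization is added back is exactly the open question in the discussion above; the sketch makes no claim either way.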
timo	no clue tbh	02:10
	the speed benefit to simdutf, and many other things, may be in large part that they don't have to split the input into codepoints? maybe?	02:12
MasterDuke	i just tried printing the results of simdutf::count_utf8(), simdutf::utf8_length_from_latin1(), and the final `count` from our processing. `count` was always equal or slightly smaller (which is what i expected), so maybe if it's fast enough using that to preallocate the buffer to guaranteed-to-be-big-enough would be helpful	02:24
	oh, looks like more often we allocate it bigger than we need to, so maybe we'd be more likely to save memory (and the realloc to shrink it that we do if it's too big)	02:29
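
Editor's sketch: the preallocation idea above relies on knowing the number of code points before decoding. For valid UTF-8 that number equals the count of bytes that are not continuation bytes, which a scalar C loop can compute (simdutf's count_utf8() reports the same quantity using SIMD). The helper names `sketch_count_codepoints` and `sketch_alloc_for` are hypothetical, and treating the count as an upper bound on the final buffer size is taken from the observation in the log, not proven here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Count code points in (assumed valid) UTF-8 by counting every byte that is
 * not a continuation byte (continuation bytes look like 10xxxxxx). */
static size_t sketch_count_codepoints(const uint8_t *in, size_t len) {
    size_t n = 0;
    assert(in != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        n += (in[i] & 0xC0) != 0x80;
    return n;
}

/* Allocate an output buffer sized from the code point count rather than the
 * byte count. Returns NULL on allocation failure; the caller frees the result.
 * At least one element is requested so an empty input still gets a buffer. */
static uint32_t *sketch_alloc_for(const uint8_t *in, size_t len, size_t *capacity) {
    size_t n = sketch_count_codepoints(in, len);
    assert(capacity != NULL);
    *capacity = n;
    return malloc((n ? n : 1) * sizeof(uint32_t));
}
```

Sizing from the code point count instead of the byte count would, if the observation in the log holds, avoid both over-allocation and the later shrinking realloc.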
	afk	02:47
02:52 MasterDuke left
04:53 ab5tract left
04:56 ab5tract joined
10:48 nebuchadnezzar joined
11:15 sena_kun joined
12:33 nebuchadnezzar left
14:06 sena_kun left
14:07 sena_kun joined
22:45 sena_kun left
23:27 bloatable6 left, benchable6 left, shareable6 left, greppable6 left, notable6 left, sourceable6 left, evalable6 left, releasable6 left, unicodable6 left, linkable6 left, coverable6 left, committable6 left, nativecallable6 left, quotable6 left, bisectable6 left, tellable6 left
23:29 committable6 joined
23:30 sourceable6 joined, linkable6 joined, nativecallable6 joined, notable6 joined, greppable6 joined, coverable6 joined, quotable6 joined, releasable6 joined, guifa joined
23:31 evalable6 joined, bloatable6 joined, bisectable6 joined, shareable6 joined, benchable6 joined
23:32 unicodable6 joined, tellable6 joined
