#moarvm on 15 February 2025 - Raku Programming Language Log

00:37 guifa left 10:23 sena_kun joined 11:11 mef joined
mef	Hi,	11:11
11:11 sena_kun left
lizmat	o/ mef	11:11
11:11 sena_kun joined
mef	I've downloaded thinkc11@makoto 20:11:53/250215 (devel/MoarVM)% ls ../../distfiles/MoarVM-2025.01.tar.gz	11:12
	../../distfiles/MoarVM-2025.01.tar.gz
	thinkc11@makoto 20:11:57/250215 (devel/MoarVM)% sha1 ../../distfiles/MoarVM-2025.01.tar.gz
	SHA1 (../../distfiles/MoarVM-2025.01.tar.gz) = d43fcbc8140457070e411d94af85b463876571f6
	the timestamp inside looks strange
	tar ztvf segfaults
lizmat	I think this was discussed here before...	11:13
mef	OK, thanks. Just use that file is OK ?
lizmat	irclogs.raku.org/moarvm/2025-01-25.html#23:23	11:14
mef	(sorry, I'm new on this channel)
lizmat	yeah, it's a known issue if you downloaded it from moarvm.org, it should be ok
mef	the packaging can be done OK, ignoring some time stamp warning.	11:17
	I'll try zip version.
	(I may be reading zip related topic wrongly ;-)	11:19
	(I'm on NetBSD by the way, thanks always)	11:20
15:37 sena_kun left 15:40 sena_kun joined 15:41 camelia left 15:42 camelia joined 15:49 sena_kun left 16:46 MasterDuke joined
MasterDuke	well, i've now found even more slower ways to do find_cclass() github.com/MoarVM/MoarVM/compare/m...ass_slower	16:47
tellable6	2025-02-13T20:43:26Z #moarvm <lizmat> MasterDuke17 looks like the find_cclass update is causing HTTP::Tiny to fail
	2025-02-13T20:44:04Z #moarvm <lizmat> MasterDuke17 symptoms are having newlines at the end of a string where they were not expected according to the test
MasterDuke	but thanks timo for fixing the 8-bit case	16:48
lizmat	yeah, glad it was fixed so quickly. also glad it was spotted by an early blin run	16:49
MasterDuke	jnthn, nine, timo: any idea how difficult that python interpreter change (from computed goto to tail-calling) would be to implement for moarvm?	16:53
	jnthn: it looks like moarvm uses the cytron paper to compute SSA form in spesh. i ran across bernsteinbear.com/blog/ssa/ the other day and wondered if there would be any benefit to switching to a different algorithm?	16:56
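
For readers unfamiliar with the "tail-calling" interpreter style mentioned at 16:53, here is a minimal sketch of the general technique: each opcode handler ends by tail-calling the next handler instead of jumping to a label. The three-opcode bytecode, the handler names and the `run` entry point are all hypothetical and have nothing to do with MoarVM's real instruction set or interpreter loop. The `musttail` attribute is only used where the compiler reports support for it; without it the code is still correct C, but it depends on the optimizer to keep the stack flat.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bytecode for illustration only. */
enum { OP_INC, OP_DOUBLE, OP_HALT };

/* Use a guaranteed tail call where the compiler offers one. */
#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

/* Every handler shares one signature so any handler can tail-call any other. */
typedef int64_t (*op_handler)(const uint8_t *pc, int64_t acc);

static int64_t op_inc(const uint8_t *pc, int64_t acc);
static int64_t op_double(const uint8_t *pc, int64_t acc);
static int64_t op_halt(const uint8_t *pc, int64_t acc);

/* Dispatch table indexed by opcode. */
static const op_handler handlers[] = { op_inc, op_double, op_halt };

/* Look up the handler for the opcode at pc and tail-call it. */
#define DISPATCH(pc, acc) MUSTTAIL return handlers[*(pc)]((pc), (acc))

static int64_t op_inc(const uint8_t *pc, int64_t acc) {
    acc += 1;
    pc++;
    DISPATCH(pc, acc);
}

static int64_t op_double(const uint8_t *pc, int64_t acc) {
    acc *= 2;
    pc++;
    DISPATCH(pc, acc);
}

static int64_t op_halt(const uint8_t *pc, int64_t acc) {
    (void)pc;       /* nothing left to decode */
    return acc;     /* the only handler that does not dispatch */
}

/* Entry point: start with an accumulator of zero at the first opcode. */
static int64_t run(const uint8_t *code) {
    return handlers[*code](code, 0);
}
```

Running the program `{ OP_INC, OP_INC, OP_DOUBLE, OP_INC, OP_HALT }` through `run` yields 5. Interpreter state lives in the handler arguments, which gives the compiler a chance to keep it in registers across handlers.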
17:07 MasterDuke left 18:31 MasterDuke joined
MasterDuke	can i quickly tell if an array of utf8 bytes will result in any 32-bit graphemes without decoding it/them?	18:33
	we have `MVM_string_buf32_can_fit_into_8bit()`, but that takes an array of already-decoded graphemes
timo	it's not too hard to detect the case of "doesn't have even a single continuation byte"	18:43
	however, i'm not sure "only 7bit codepoints" corresponds directly to "no 32bit graphemes"	18:44
	after NFG happens
	are there any combining characters in the lowest 127 unicode codepoints?	18:45
MasterDuke	dunno	18:46
timo	if that is possible, a utf8 input that only has no-continuation-byte codepoints encoded in it could require a new synthetic codepoint to be allocated, and if that's possible, we just need 126 or 125 different sequences to be created in order to spill over into 32bit grapheme territory	18:47
MasterDuke	so as long as the byte sequence was less than 125 bytes long it could be known to be safe?	18:48
timo	no
	synthetics are global per instance
	they also do not get recycled	18:49
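
The "doesn't have even a single continuation byte" check from 18:43 can be sketched as below. In UTF-8, every byte of a multi-byte sequence has its high bit set, so a buffer with no high bits is pure 7-bit input. The function name `utf8_is_all_ascii` is made up for this illustration and is not a MoarVM API. As timo points out, passing this check says nothing certain about whether 32-bit graphemes appear after NFG, so it is at best a pre-filter.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if no byte in buf has its high bit set,
 * i.e. the UTF-8 input has no lead or continuation bytes at all. */
static int utf8_is_all_ascii(const uint8_t *buf, size_t len) {
    size_t i = 0;

    /* Test eight bytes per iteration; memcpy avoids unaligned loads. */
    for (; i + 8 <= len; i += 8) {
        uint64_t word;
        memcpy(&word, buf + i, sizeof word);
        if (word & UINT64_C(0x8080808080808080))
            return 0;
    }

    /* Handle the remaining zero to seven bytes one at a time. */
    for (; i < len; i++) {
        if (buf[i] & 0x80)
            return 0;
    }
    return 1;
}
```

The mask test works regardless of byte order, since it only asks whether any byte in the word has bit 7 set.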
MasterDuke	hm. well, faking it by calling `MVM_string_buf32_can_fit_into_8bit((MVMGrapheme32 *)utf8, bytes)` and then duplicating the decoding code into branches for 8 vs 32 doesn't seem to be faster, so i guess it doesn't really matter	18:53
timo	have you been looking at something like perf's per-instruction annotated assembly output for some metrics?	18:54
MasterDuke	sort of. i know arm assembly even less than x86. mostly just looking at the time taken to call `"CORE.c.setting".IO.slurp.lines` a bunch of times in a loop	18:56
timo	yeah, it will probably be quite tricky to figure out what's making it slow without some very low-level output i'm thinking	18:57
MasterDuke	since i figure that has a good mix of line lengths and char sizes
timo	doesn't have to be perf, can be cachegrind's cache simulation for example that could be of interest
	we don't have anything at the raku level that decodes utf8 without also doing normalization?	19:00
ugexe	i haven't followed along, but is there still a performance benefit for those string code paths despite any additional fixes that came after the original bench marking?
	im curious but im also too lazy to look at the fixes to have an intuitive idea	19:01
timo	right, the fix made it more expensive again
MasterDuke	haven't checked
timo	that would be worthwhile. maybe we should actually revert that and continue searching for something faster
MasterDuke	gist.github.com/MasterDuke17/dc4d3...3ece2ee734	19:03
	that's MVM_string_utf8_decode() at HEAD of main
timo	oh in my head i was still at the "found more ways to make find_cclass slower" from earlier today	19:04
	i know almost none of these mnemonics :D	19:05
MasterDuke	i'm looking at both MVM_string_utf8_decode and MVM_string_find_cclass, they're 2 and 3 in perf for that example i gave	19:06
	i think MVM_unicode_normalizer_get_grapheme has tons of branches	19:07
	or MVM_unicode_normalizer_process_codepoint_to_grapheme	19:08
timo	branches can be unproblematic if they are overwhelmingly predicted correctly	19:10
	but i would presume that for diverse input text like the core setting there's plenty times when it hits some rare branches?	19:11
MasterDuke	also the code is largish, so might not get inlined well
timo	branch mispredictions are also a thing perf can measure	19:12
MasterDuke	0.33% of all branches missed	19:14
timo	since inlining of C code happens only at compile time, you should be able to see it from the assembly view, whether it goes back and forth between one function and the other	19:15
	if you crank the -F parameter up a bit, we should see a slightly more even spread of instructions with nonzero percent values, which will also give a little bit of a hint about coverage	19:21
	in the last gif you shared, i think that really just looks like the main place it's hitting is where it has to actually wait for new parts of the input to arrive from memory. so that's the "ldr q31, [x0] #16" that queues the load and gets hit by sampling only 0.29% and the "bic v31.4s, #0x7f" after it which gets 5.55% of the hits	19:24
	ah, there are entries with more than 10% further up too	19:25
	sorry, i'm really not entirely completely good at this	19:27
MasterDuke	gist.github.com/MasterDuke17/eefd4...d3f4d27ba2 is with `-F max` and for 200 loop iterations instead of 50	19:28
	gist.github.com/MasterDuke17/0213d...ed0db46104 is the same thing on my x86 desktop	19:32
	or what about a fast way to find out if anything will need normalization? then we could have a decode body that doesn't need to go through the normalizer	19:34
timo	that will probably require attention every time we do a unicode update	19:35
MasterDuke	oh, don't we need to do a unicode update now regardless?	19:36
	we're on 15...	19:37
	yeah, 16 came out in september	19:38
timo	here's an idea that is probably worth barely anything at all: is it always safe to skip the `if (n->buffer_norm_end == n->buffer_start)` check from inside this code? actually, i thought i saw some time spent there but now i can't find it again	19:41
	gist.github.com/MasterDuke17/0213d...1-txt-L215 maybe i was seeing this?	19:47
MasterDuke	don't really notice a difference after removing it	19:49
timo	OK
	the unicode stuff is unfortunately an area where i've only got a surface-level understanding :\|	19:58
MasterDuke	and i've got less than that	19:59
20:39 MasterDuke left
timo	i'm going to have a look at the intel vtune profiler	21:22
22:19 sena_kun joined
jnthn	.tell MasterDuke It's not clear to me that any of the other algorithms mentioned there offer a significant performance win. I believe it's using a more modern (and faster) algorithm than was available when the Cytron paper was written to calculate the dominance.	22:27
tellable6	jnthn, I'll pass your message to MasterDuke	22:28
jnthn	.tell MasterDuke to be clear, MoarVM is already using the faster dominance algorithm
tellable6	jnthn, I'll pass your message to MasterDuke
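
jnthn does not name the "faster dominance algorithm" in the log. One widely used candidate that postdates the Cytron et al. paper is the iterative scheme by Cooper, Harvey and Kennedy, and the sketch below assumes that is the kind of algorithm meant. It is an illustration of that scheme only, not MoarVM's code. The function names and the array-based graph representation are invented for the example, and the nodes are assumed to be numbered in reverse postorder already, with node 0 as the entry block.

```c
#include <stddef.h>

#define IDOM_UNDEFINED (-1)

/* Walk two nodes up the partially built dominator tree until they meet.
 * With reverse postorder numbering, the larger number is the deeper node. */
static int intersect(const int *idom, int a, int b) {
    while (a != b) {
        while (a > b) a = idom[a];
        while (b > a) b = idom[b];
    }
    return a;
}

/* Compute immediate dominators by iterating to a fixed point.
 * preds[n] is the predecessor list of node n, num_preds[n] its length.
 * On return idom[n] holds the immediate dominator of n; idom[0] is 0. */
static void compute_idoms(int num_nodes, const int *const *preds,
                          const int *num_preds, int *idom) {
    for (int n = 0; n < num_nodes; n++)
        idom[n] = IDOM_UNDEFINED;
    idom[0] = 0;

    int changed = 1;
    while (changed) {
        changed = 0;
        /* Visit nodes in reverse postorder, skipping the entry block. */
        for (int n = 1; n < num_nodes; n++) {
            int new_idom = IDOM_UNDEFINED;
            for (int p = 0; p < num_preds[n]; p++) {
                int pred = preds[n][p];
                if (idom[pred] == IDOM_UNDEFINED)
                    continue;   /* predecessor not processed yet */
                new_idom = (new_idom == IDOM_UNDEFINED)
                    ? pred
                    : intersect(idom, pred, new_idom);
            }
            if (new_idom != idom[n]) {
                idom[n] = new_idom;
                changed = 1;
            }
        }
    }
}
```

For a diamond-shaped graph (0 branches to 1 and 2, both of which join at 3), the join node's immediate dominator comes out as the entry block 0, since neither branch alone dominates it.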
