#moarvm on 5 July 2024 - Raku Programming Language Log

Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes. Set by lizmat on 24 May 2021.
timo1	i've randomly become interested in how the worklists in our collect phase of the gc behave	00:01
[Coke]	timo! ~~	00:23
timo1	ohai coke	00:30
	i'm actually already dreadfully late to bed so i'll share my findings tomorrow at the earliest
[Coke]	~~	00:31
timo1	i was looking what alloc sizes the worklists ended up with when a gc run finished, and what reprs were causing it to double its size, and what reprs were causing it to grow by an exact amount with the "presize_for" function
	amusingly(?) there's some bits in our code where we first use presize_for to set an exact size based on some variable length array inside of an object, and then immediately use the regular worklist_add which, if the presize_for had just set the size exactly, will immediately cause a doubling of the size	00:33
	so that's not optimal, but it's also possibly not important at all
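The presize-then-add interaction described above can be modelled roughly as follows. This is a toy sketch, not MoarVM's code: the `Worklist` class, its fields, and the starting capacity are all assumptions made for illustration; only the names `presize_for` and the idea of a regular add that doubles when full come from the chat.

```python
class Worklist:
    """Toy model of a growable GC worklist (illustrative, not MoarVM's struct)."""

    def __init__(self, alloc=4):
        self.items = []
        self.alloc = alloc      # capacity in slots
        self.doublings = 0      # how often the regular add had to double

    def presize_for(self, extra):
        # Grow to exactly the capacity needed for `extra` more items.
        needed = len(self.items) + extra
        if needed > self.alloc:
            self.alloc = needed

    def add(self, item):
        # Regular add: double the capacity when the list is full.
        if len(self.items) == self.alloc:
            self.alloc *= 2
            self.doublings += 1
        self.items.append(item)


wl = Worklist(alloc=4)
wl.presize_for(10)          # exact fit for a 10-element array
for slot in range(10):
    wl.add(slot)            # fills the list to the brim, no doubling yet
wl.add("one more pointer")  # the very next regular add doubles 10 -> 20
```

In this model, presizing for one extra slot would avoid the doubling, at the cost of knowing in advance that another add follows.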
	then there's the core setting compile where i haven't run it to the end yet with my debug output spew, but the size of the worklist grows more and more over time, and i think a lot of that is driven by a single hash that becomes bigger and bigger	00:34
	i have a very rough prototype where objects that have a boatload of pointers in a row and want them all to be in the worklist can instead put a pointer to the start of these pointers, the distance between pointers, and the amount of pointers, into a separate thing inside the worklist so the collect process can grab pointers from there without copying them to a worklist first	00:37
	but this doesn't immediately work for hashes, because they are More Complicated, though not by very much	00:38
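A minimal sketch of the "start, distance, amount" idea, assuming memory is modelled as slot indices. The `ArrayRef` name and the `drain` helper are hypothetical and only illustrate how a collector could pull pointers from a range descriptor instead of from individually copied worklist entries.

```python
from collections import namedtuple

# Hypothetical descriptor: where the run of pointers starts, the distance
# between consecutive pointers, and how many pointers there are.
ArrayRef = namedtuple("ArrayRef", "start stride count")


def drain(singles, array_refs):
    """Yield every slot index the collector has to visit.

    `singles` holds slot indices that were added one by one;
    `array_refs` holds ArrayRef descriptors that are expanded lazily,
    so their pointers never get copied into `singles`.
    Both lists are consumed.
    """
    while singles or array_refs:
        if array_refs:
            ref = array_refs.pop()
            for i in range(ref.count):
                yield ref.start + i * ref.stride
        else:
            yield singles.pop()


visited = list(drain([7], [ArrayRef(start=10, stride=2, count=3)]))
```

A hash would need more than one fixed stride per object in a model like this, which may be part of why hashes do not fit the scheme directly.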
	just before stage mbc starts, there's a 590 kilobyte big worklist	00:42
	without my patch, at the end of stage parse the worklist is 100117 pointers (782 kbytes)	00:52
	75650 pointers (591 kbytes) before stage mbc starts	00:53
	ah, that means no change probably?
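The sizes quoted above follow from the pointer counts if each worklist entry is one 8-byte pointer and a "kbyte" is 1024 bytes. Both are assumptions here (a 64-bit build), and the helper name is made up for the check.

```python
POINTER_BYTES = 8  # assumption: 64-bit build, one pointer per worklist entry


def worklist_kib(pointers):
    """Size of a worklist holding `pointers` entries, in units of 1024 bytes."""
    return pointers * POINTER_BYTES / 1024


end_of_parse = round(worklist_kib(100117))  # 782
before_mbc = round(worklist_kib(75650))     # 591
```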
07:55 sena_kun joined
timo1	i have been thinking for a while if maybe a small cache of recently moved or seen collectables that can be rewritten immediately	11:17
	would work at all, and at what size
	i'll just fprintf every address i see to a log file, easy peasy! that will surely only result in a few megabytes of text! (smash cut to my hard drive exploding violently in a cloud of sharp spinning bits)	11:20
	boom, literally the first screenful of pointers are just three different pointers :D	11:24
lizmat	timo1++	11:29
timo1	this cache i'm thinking of may just have overhead that it can't bring back in	11:35
11:57 sena_kun left
timo1	welp, first result seems to say the cache sucks	12:32
lizmat	it's easy to underestimate the overhead a cache involves	12:35
timo1	this result is just from hit/miss percentages	12:37
	so here's my thinking on this:
	when we go collecting, we add every pointer an object has that goes to a collectable to the worklist, and then we take the next work item from the worklist
	when working on a work item, that's when we check, for example, if the object we're pointing to already has a valid forwarder, so the pointer can just be updated	12:38
	this involves following the pointer, so an indirect memory access	12:39
	if we happened to know with relative certainty that the target of the pointer is still available in the CPU's cache, then it would surely be cheaper to immediately follow the pointer to get the forwarder and update the pointer, instead of putting it into the worklist
	if we just always immediately check the forwarder, i think that would be an unfavorable memory access pattern? maybe?	12:41
	i'm not so sure about that any more tbh	12:42
	one thing that's beneficial about the "array refs" thing i came up with yesterday (that i turned off for this test, maybe i should turn it back on) is that instead of plopping a boatload of pointers into the worklist, then going through these worklist entries, this will interleave taking pointers from the array ref and updating them	12:46
	and currently my "check the cache if the pointer was recently updated" lives in the spot where a pointer gets added to the worklist
	so imagine an array with a few thousand pointers to the same object	12:47
	without arrayref, every one of these pointers would immediately be checked against "is it in the cache?" and would get "nope" unless it just happened to be in there already from an earlier reference to it
	and after the thousand pointers go in the worklist, they are checked and can go into the cache	12:48
	but at least then they are surely in the cpu cache instead, at least L3
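One way to get hit/miss percentages like the ones mentioned above is to replay a logged address trace through a small cache model. The sketch below assumes a direct-mapped cache indexed by the address with its low alignment bits dropped; the slot count, the 8-byte alignment, and the function name are assumptions, and the chat does not say how the real experiment was structured.

```python
def hit_rate(addresses, cache_slots=64):
    """Fraction of lookups served by a tiny direct-mapped cache of recent addresses."""
    cache = [None] * cache_slots
    hits = 0
    total = 0
    for addr in addresses:
        total += 1
        slot = (addr >> 3) % cache_slots  # drop alignment bits (assumes 8-byte alignment)
        if cache[slot] == addr:
            hits += 1
        else:
            cache[slot] = addr            # miss: remember this address for next time
    return hits / total if total else 0.0


same_object = hit_rate([0x1000] * 4)               # first lookup misses, the rest hit
all_distinct = hit_rate(list(range(0, 8000, 8)))   # every address is new
```

This model fills the cache at lookup time. In the scenario described above, the lookup happens when a pointer is added to the worklist but the cache is only filled when the work item is processed later, so a run of identical pointers would miss far more often than this sketch suggests.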
	i've also been wondering if it's worth doubling the size of the worklist by storing the position of the pointer as well as the value of the pointer so we can skip one indirect memory lookup when taking a work item off the list	12:56
	but that should still be in the cpu cache as well? if the worklist doesn't balloon too much?
jnthn	A common case is a pointer within the bounds of the fromspace, which will very likely be in the cache.	14:14
	So chasing that immediately should be cheap	14:15
timo1	i'll have a look at counting that case vs the other case	14:30
	do you have a feeling about changing the order we traverse the worklist in? right now it's a LIFO i.e. we pop items off the end, maybe it should be a FIFO instead, a queue	14:33
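The effect of the traversal order can be seen on a small object graph. Popping from the end of the worklist gives last-in-first-out (stack, roughly depth-first) order, while taking from the front gives first-in-first-out (queue, breadth-first) order. The graph and function below are made up for illustration and say nothing about which order is faster in MoarVM.

```python
from collections import deque


def traverse(graph, root, lifo=True):
    """Return the order in which objects are visited, starting from `root`."""
    worklist = deque([root])
    seen = {root}
    order = []
    while worklist:
        # LIFO pops the most recently added item, FIFO the oldest one.
        node = worklist.pop() if lifo else worklist.popleft()
        order.append(node)
        for child in graph.get(node, ()):
            if child not in seen:
                seen.add(child)
                worklist.append(child)
    return order


graph = {"a": ["b", "c"], "b": ["d"], "c": ["e"]}
stack_order = traverse(graph, "a", lifo=True)    # a, c, e, b, d
queue_order = traverse(graph, "a", lifo=False)   # a, b, c, d, e
```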
	i'm also thinking we should maybe round up to next power of two when "presize_for" bumps us over the alloc limit, so we don't do it like five times in a row by like 1000 units	14:38
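A quick model of the rounding idea: count how many reallocations happen when a full worklist is bumped by 1000 entries five times in a row, with exact growth versus rounding up to the next power of two. The starting size of 4096 and the helper names are assumptions picked for the example.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n, for n >= 1."""
    return 1 << (n - 1).bit_length()


def count_reallocs(start_alloc, bumps, round_up):
    """Count reallocations for a worklist that starts out full."""
    alloc = used = start_alloc
    reallocs = 0
    for extra in bumps:
        needed = used + extra
        if needed > alloc:
            alloc = next_power_of_two(needed) if round_up else needed
            reallocs += 1
        used = needed
    return reallocs


exact = count_reallocs(4096, [1000] * 5, round_up=False)    # grows on every bump
rounded = count_reallocs(4096, [1000] * 5, round_up=True)   # grows to 8192, then 16384
```

The trade-off is extra unused capacity after each rounded growth step.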
15:33 [Coke] left 15:35 [Coke] joined
timo1	.o( jit-compile gc_mark for different P6Opaque layouts ... )	15:57
	turns out 100% of gc_mark_slots in P6opaque in core setting compilation has P6str in them, so a fast path there to call the target function directly based on the repr might be worth 0.01%	17:25
lizmat	that feels like a lot of work for little gain?	17:26
timo1	yeah, my tuit-shape today is very strange	17:30
lizmat	timo1: are you aware of github.com/MoarVM/MoarVM/pull/1802 ?	17:36
	it appears stalled	17:37
timo1	the only issue i can tell from the conversation on the pull request is the flapping reproducible builds error; if it's a flapper, should we ignore it? doesn't seem like a good idea tbh. maybe i'll investigate that further	17:42
lizmat	always good to have a set of knowledgeable eyes on a problem	17:43
17:51 MasterDukeMobile joined
MasterDukeMobile	FYI, for 1802, the reproducible build test fails 100% of the time	17:53
	I also had seen it flap about less than 1% of the time on main, but it’s consistent on that branch	17:55
17:56 MasterDukeMobile left 17:57 MasterDukeMobile joined
MasterDukeMobile	I had hoped rebootstrapping would fix it, but no such luck	17:58
timo1	ohai there	17:59
MasterDukeMobile	I guess it’s something about strings changing their storage type during deserialization from what they were serialized as?
	I have no real idea, but that seems like the only thing that’s changed	18:00
	Hm, but then shouldn’t a third serialize/deserialize cycle be the same as the second?	18:01
	timo1: btw, happy to see you’ve got more optimization ideas	18:02
	And if you have any more ideas about where/how to make in-situ-strings I’m all ears	18:05
timo1	oh, we sometimes serialize strings better if we know they have no synthetics? i remember something like that, that kind of fits your description	18:08
MasterDukeMobile	I think something like that, though I can’t find it in the PR right now (on phone)	18:11
timo1	yup, i'll look into it right now
	Binary files QRegex.moarvm.three and QRegex.moarvm.two differ	18:12
	-0000c670: 0211 3802 114b 0211 1602 1119 0211 1a02 ..8..K..........	18:13
	+0000c670: 0211 3702 114b 0211 1602 1119 0211 1a02 ..7..K..........
MasterDukeMobile	Maybe the names of the funky capture variable? E.g., $€	18:14
timo1	how nice of it to have a single byte difference in one of these spots, and even visible ascii
	but yeah there's multiple differences, QRegex is the smallest of the files that differ	18:15
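Locating the differing byte in a hexdump row like the one pasted above can be done mechanically. The sketch below takes the two 16-byte rows from the chat and reports where they differ; the helper name is made up, and comparing whole files would work the same way on the bytes read from disk.

```python
def differing_offsets(a, b):
    """Indices at which two equally long byte strings differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


ROW_OFFSET = 0x0000C670  # file offset of the hexdump row shown in the chat

row_minus = bytes.fromhex("0211 3802 114b 0211 1602 1119 0211 1a02".replace(" ", ""))
row_plus = bytes.fromhex("0211 3702 114b 0211 1602 1119 0211 1a02".replace(" ", ""))

diffs = differing_offsets(row_minus, row_plus)
file_offsets = [hex(ROW_OFFSET + i) for i in diffs]
```

For this row the single difference is the byte `0x38` versus `0x37`, which is the ASCII `8` versus `7` visible in the right-hand column of the dump.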
MasterDukeMobile	I didn’t think to look at the actual binary. I just looked at the dumps produced in the test, but I didn’t find them helpful
timo1	i haven't looked at these dumps yet	18:16
	i thought it'd be simpler to check in nqp before going to rakudo in the hopes that it'll be wrong there already, and easier to spot
MasterDukeMobile	ISTR lots of different frame numbers, but I had no intuition about why that was happening	18:17
timo1	ok it compares the output of moar --dump then?
MasterDukeMobile	Yep
timo1	it dumps the same in the QRegex case :D
	the moar --dump output could even be a red herring	18:18
MasterDukeMobile	Doh
lizmat	perhaps raku.land/zef:lizmat/MoarVM::Bytecode could be helpful ?	18:19
MasterDukeMobile	Hm, I remember thinking of that, but don’t remember if I actually used it…	18:20
timo1	ah nice	18:21
MasterDukeMobile	Is nqp supposed to have a reproducible build? If so, a test there would be nice, since like timo1 said it should be less to compare	18:26
lizmat	I think it has?	18:28
timo1	i haven't checked if it isn't or doesn't	18:29
lizmat	t/serialization/*.t
timo1	but i would expect nqp is even less fussy than rakudo about the build being deterministic	18:30
lizmat	hMmm maybe not indeed
MasterDukeMobile	Well, all the nqp tests pass with the PR, so something isn’t getting tickled the same as it is in/with rakudo
timo1	MoarVM::Bytecode tests aren't passing, probably some version/compatibility drift?	18:31
MasterDukeMobile	Are you testing it with a MoarVM built on the branch?	18:32
timo1	yeah	18:34
	could be something wrong with the branch you think?
MasterDukeMobile	Well, we know there’s something going on with bytecode, right? Whether it’s “wrong”-error or just “wrong”-missing-some-adaptation I think is tbd	18:36
timo1	my uint $offset = -1;	18:38
	hehe. i see.
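The chat does not spell out what went wrong with `my uint $offset = -1;`, but a -1 sentinel cannot be represented as written in an unsigned slot. Assuming a 64-bit native unsigned integer that wraps modulo 2^64 (an assumption about the platform and the Rakudo version, not something stated above), the stored value can be modelled like this, with a hypothetical helper name:

```python
def store_uint64(value):
    """Model storing a Python int into a wrapping 64-bit unsigned slot."""
    return value % (1 << 64)


sentinel = store_uint64(-1)          # becomes the largest 64-bit value
still_negative = sentinel < 0        # a later "is it negative?" check can never fire
```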
MasterDukeMobile	Completely unrelated, but you (and a lot of the people in this channel) might find godbolt.org/z/dTEME34fn interesting	18:39
timo1	i'm not sure what i'm seeing. compiler bug, or unexpected behaviour of some compiler flags?	18:42
MasterDukeMobile	Compiler bug
timo1	ah, the left output completely does that work at compile time
MasterDukeMobile	Found because of github.com/jeaiii/itoa/issues/17	18:44
	Probably afk for a while, hopefully on later this evening	18:45
timo1	oh MoarVM::Bytecode is looking for all files that end in .moarvm inside my ~/raku, where there's also the folder of a fuzzing campaign	18:46
lizmat	hehe... you can be specific, no?	18:47
MasterDukeMobile	Ha, I think I have that same folder somewhere
18:49 sena_kun joined
timo1	oh yeah quite interesting find	18:51
18:51 MasterDukeMobile left
timo1	MoarVM::Bytecode doesn't seem to handle dispatch_* ops correctly yet, which makes sense since they are Special and New	19:16
lizmat	timo1: could you please make an issue for that, with examples ?	19:20
	and if you have any suggestions, please add them as a separate issue	19:27
timo1	rakudo --ll-exception -e 'use MoarVM::Bytecode; my $c = MoarVM::Bytecode.new("c"); $c.frames.raku.say'
	chars requires a concrete string, but got null
	how did i do this :D	19:28
lizmat	m: my str $a; use nqp; say nqp::chars($a)
camelia	0
lizmat	well, in nqp then
	hmmm	19:29
	m: use nqp; my $s := nqp::list_s; nqp::chars(nqp::atpos_s($s,0))	19:30
camelia	chars requires a concrete string, but got null in block <unit> at <tmp> line 1
timo1	there you go: github.com/lizmat/MoarVM-Bytecode/issues/1	19:41
	but i do see code that looks like it's meant to handle dispatch so i'm not sure what's going wrong	19:45
	i changed the title to reflect that	19:54
	i don't think i actually need to have this feature work properly for the purposes of finding the difference between the files	19:56
	yeah ok there's differences in the sc area	20:01
	i'll try the reproducible-builds test now just in case nqp is actually not expected to be reproducible	20:02
	i think i see the problem	20:13
	i may already have a fix	20:27
	we were comparing strings wrong, depending on the way they are stored	20:29
	that's kind of hard to test in a regular spectest, since we have no way to force a string to have a very specific storage type, and no way to introspect it either
	so even with nqp code it's tough to test
	as i see it, the code was accidentally comparing an in-situ-8 string with whatever bytes were in the storage attribute of the other, same with 32, which in some cases turned out to be string vs pointer?	20:34
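The class of bug described above can be illustrated with a toy model in which the same text is stored either inline in a fixed 8-byte field or behind a separate buffer. Everything here is hypothetical (the `ToyString` type, the `blob_8` kind, both comparison functions), and the actual fix is not shown in the chat; the point is only that comparing raw storage fields goes wrong once two equal strings use different storage types.

```python
from dataclasses import dataclass


@dataclass
class ToyString:
    kind: str        # "in_situ_8" or "blob_8" (made-up names)
    storage: object  # 8 inline bytes, or a list standing in for a pointed-to buffer
    length: int      # number of characters actually in use


def make_in_situ(text):
    data = text.encode("ascii")
    assert len(data) <= 8, "toy in-situ storage only holds 8 bytes"
    return ToyString("in_situ_8", data.ljust(8, b"\0"), len(data))


def make_blob(text):
    data = text.encode("ascii")
    return ToyString("blob_8", list(data), len(data))


def naive_equal(a, b):
    # Buggy: compares the raw storage fields without looking at `kind`.
    return a.storage == b.storage


def equal(a, b):
    # Storage-agnostic: compare character by character via indexing.
    if a.length != b.length:
        return False
    return all(a.storage[i] == b.storage[i] for i in range(a.length))


inline, blob = make_in_situ("abc"), make_blob("abc")
```

Here `naive_equal(inline, blob)` reports a mismatch although both hold "abc", while `equal(inline, blob)` compares the characters themselves.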
	string storage types counts: MVM_STRING_GRAPHEME_32 45022 MVM_STRING_GRAPHEME_ASCII 44 MVM_STRING_GRAPHEME_8 186097 MVM_STRING_STRAND 173176 MVM_STRING_IN_SITU_8 658137 MVM_STRING_IN_SITU_32 2257	21:43
	this is the last gen2 collection during setting compilation, just adding the number of strings seen on the heap with each storage type
22:04 sena_kun left
timo1	gist.github.com/timo/530e0e99bb6ae...3345c8c0b2 looks kinda promising	22:13
	surprised to see that str_hash_demolish is so far up in allocations, like it's creating a large amount of entries in the "free at safepoint" linked list?	22:28
	i guess it's not such a huge amount in total	22:46
