Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes.
Set by lizmat on 24 May 2021.
timo1 i've randomly become interested in how the worklists in our collect phase of the gc behave 00:01
[Coke] timo! ~~ 00:23
timo1 ohai coke 00:30
i'm actually already dreadfully late to bed so i'll share my findings tomorrow at the earliest
[Coke] ~~ 00:31
timo1 i was looking what alloc sizes the worklists ended up with when a gc run finished, and what reprs were causing it to double its size, and what reprs were causing it to grow by an exact amount with the "presize_for" function
amusingly(?) there's some bits in our code where we first use presize_for to set an exact size based on some variable length array inside of an object, and then immediately use the regular worklist_add which, if the presize_for had just set the size exactly, will immediately cause a doubling of the size 00:33
so that's not optimal, but it's also possibly not important at all
then there's the core setting compile where i haven't run it to the end yet with my debug output spew, but the size of the worklist grows more and more over time, and i think a lot of that is driven by a single hash that becomes bigger and bigger 00:34
i have a very rough prototype where objects that have a boatload of pointers in a row and want them all to be in the worklist can instead put a pointer to the start of these pointers, the distance between pointers, and the amount of pointers, into a separate thing inside the worklist so the collect process can grab pointers from there without copying them to a worklist first 00:37
but this doesn't immediately work for hashes, because they are More Complicated, though not by very much 00:38
just before stage mbc starts, there's a 590 kilobyte big worklist 00:42
without my patch, at the end of stage parse the worklist is 100117 pointers (782 kbytes) 00:52
75650 pointers (591 kbytes) before stage mbc starts 00:53
ah, that means no change probably?
07:55 sena_kun joined
timo1 i have been thinking for a while if maybe a small cache of recently moved or seen collectables that can be rewritten immediately 11:17
would work at all, and at what size
i'll just fprintf every address i see to a log file, easy peasy! that will surely only result in a few megabytes of text! (smash cut to my hard drive exploding violently in a cloud of sharp spinning bits) 11:20
boom, literally the first screenful of pointers are just three different pointers :D 11:24
lizmat timo1++ 11:29
timo1 this cache i'm thinking of may just have overhead that it can't bring back in 11:35
11:57 sena_kun left
timo1 welp, first result seems to say the cache sucks 12:32
lizmat it's easy to underestimate the overhead a cache involves 12:35
timo1 this result is just from hit/miss percentages 12:37
so here's my thinking on this:
when we go collecting, we add every pointer an object has that goes to a collectable to the worklist, and then we take the next work item from the worklist
when working on a work item, that's when we check, for example, if the object we're pointing to already has a valid forwarder, so the pointer can just be updated 12:38
this involves following the pointer, so an indirect memory access 12:39
if we happened to know with relative certainty that the target of the pointer is still available in the CPU's cache, then it would surely be cheaper to immediately follow the pointer to get the forwarder and update the pointer, instead of putting it into the worklist
if we just always immediately check the forwarder, i think that would be an unfavorable memory access pattern? maybe? 12:41
i'm not so sure about that any more tbh 12:42
one thing that's beneficial about the "array refs" thing i came up with yesterday (that i turned off for this test, maybe i should turn it back on) is that instead of plopping a boatload of pointers into the worklist, then going through these worklist entries, this will interleave taking pointers from the array ref and updating them 12:46
and currently my "check the cache if the pointer was recently updated" lives in the spot where a pointer gets added to the worklist
so imagine an array with a few thousand pointers to the same object 12:47
without arrayref, every one of these pointers would immediately be checked against "is it in the cache?" and would get "nope" unless it just happened to be in there already from an earlier reference to it
and after the thousand pointers go in the worklist, they are checked and can go into the cache 12:48
but at least then they are surely in the cpu cache instead, at least L3
i've also been wondering if it's worth doubling the size of the worklist by storing the position of the pointer as well as the value of the pointer so we can skip one indirect memory lookup when taking a work item off the list 12:56
but that should still be in the cpu cache as well? if the worklist doesn't balloon too much?
jnthn A common case is a pointer within the bounds of the fromspace, which will very likely be in the cache. 14:14
So chasing that immediately should be cheap 14:15
timo1 i'll have a look at counting that case vs the other case 14:30
do you have a feeling about changing the order we traverse the worklist in? right now it's a FIFO i.e. we pop items off the end, maybe it should be a LIFO instead, a queue 14:33
i'm also thinking we should maybe round up to next power of two when "presize_for" bumps us over the alloc limit, so we don't do it like five times in a row by like 1000 units 14:38
15:33 [Coke] left 15:35 [Coke] joined
timo1 .o( jit-compile gc_mark for different P6Opaque layouts ... ) 15:57
turns out 100% of gc_mark_slots in P6opaque in core setting compilation has P6str in them, so a fast path there to call the target function directly based on the repr might be worth 0.01% 17:25
lizmat that feels like a lot of work for little gain? 17:26
timo1 yeah, my tuit-shape today is very strange 17:30
lizmat timo1: are you aware of github.com/MoarVM/MoarVM/pull/1802 ? 17:36
it appears stalled 17:37
timo1 the only issue i can tell from the conversation on the pull request is the flapping reproducible builds error; if it's a flapper, should we ignore it? doesn't seem like a good idea tbh. maybe i'll investigate that further 17:42
lizmat always good to have a set of knowledgeable eyes on a problem 17:43
17:51 MasterDukeMobile joined
MasterDukeMobile FYI, for 1802, the reproducible build test fails 100% of the time 17:53
I also had seen it flap about less than 1% of the time on main, but itā€™s consistent on that branch 17:55
17:56 MasterDukeMobile left 17:57 MasterDukeMobile joined
MasterDukeMobile I had hoped rebootstrapping would fix it, but no such luck 17:58
timo1 ohai there 17:59
MasterDukeMobile I guess itā€™s something about strings changing their storage type during deserialization from wheat they were serialized as?
I have no real idea, but that seems like the only thing thatā€™s changed 18:00
Hm, but then shouldnā€™t a third serialize/deserialize cycle be the same as the second? 18:01
timo1: btw, happy to see youā€™ve got more optimization ideas 18:02
And if you have any more ideas about where/how to make in-situ-strings Iā€™m all ears 18:05
timo1 oh, we sometimes serialize strings better if we know they have no synthetics? i remember something like that, that kind of fits your description 18:08
MasterDukeMobile I think something like that, though I canā€™t find it in the PR right now (on phone) 18:11
timo1 yup, i'll look into it right now
Binary files QRegex.moarvm.three and QRegex.moarvm.two differ 18:12
-0000c670: 0211 3802 114b 0211 1602 1119 0211 1a02 ..8..K.......... 18:13
+0000c670: 0211 3702 114b 0211 1602 1119 0211 1a02 ..7..K..........
MasterDukeMobile Maybe the names of the funky capture variable? E.g., $ā‚¬ 18:14
timo1 how nice of it to have a single byte difference in one of these spots, and even visible ascii
but yeah there's multiple differences, QRegex is the smallest of the files that differ 18:15
MasterDukeMobile I didnā€™t think to look at the actual binary. I just looked at the dumps produced in the test, but I didnā€™t find them helpful
timo1 i haven't looked at these dumps yet 18:16
i thought it'd be simpler to check in nqp before going to rakudo in the hopes that it'll be wrong there already, and easier to spot
MasterDukeMobile ISTR lots of different frame numbers, but I had no intuition about why that was happening 18:17
timo1 ok it compares the output of moar --dump then?
MasterDukeMobile Yep
timo1 it dumps the same in the QRegex case :D
the moar --dump output could even be a red herring 18:18
MasterDukeMobile Doh
lizmat perhaps raku.land/zef:lizmat/MoarVM::Bytecode could be helpful ? 18:19
MasterDukeMobile Hm, I remember thinking of that, but donā€™t remember if I actually used itā€¦ 18:20
timo1 ah nice 18:21
MasterDukeMobile Is nqp supposed to have a reproducible build? If so, a test there would be nice, since like timo1 said it should be less to compare 18:26
lizmat I think it has? 18:28
timo1 i haven't checked if it isn't or doesn't 18:29
lizmat t/serialization/*.t
timo1 but i would expect nqp is even less fussy than rakudo about the build being deterministic 18:30
lizmat hMmm maybe not indeed
MasterDukeMobile Well, all the nqp test pass with the PR, so something isnā€™t getting tickled the same as it is in/with rakudo
timo1 MoarVM::Bytecode tests aren't passing, probably some version/compatibility drift? 18:31
MasterDukeMobile Are you testing it with a MoarVM built on the branch? 18:32
timo1 yeah 18:34
could be something wrong with the branch you think?
MasterDukeMobile Well, we know there something going on with bytecode, right? Whether itā€™s ā€œwrongā€-error or just ā€œwrongā€-missing-some-adaptation I think is tbd 18:36
timo1 my uint $offset = -1; 18:38
hehe. i see.
MasterDukeMobile Completely unrelated, but you (and a lot of the people in this channel) mind find godbolt.org/z/dTEME34fn interesting 18:39
timo1 i'm not sure what i'm seeing. compiler bug, or unexpected behaviour of some compiler flags? 18:42
MasterDukeMobile Compiler bug
timo1 ah, the left output completely does that work at compile time
MasterDukeMobile Found because of github.com/jeaiii/itoa/issues/17 18:44
Probably afk for a while, hopefully on later this evening 18:45
timo1 oh MoarVM::Bytecode is looking for all files that end in .moarvm inside my ~/raku, where there's also the folder of a fuzzing campaign 18:46
lizmat hehe... you *can* be specific, no? 18:47
MasterDukeMobile Ha, I think I have that same folder somewhere
18:49 sena_kun joined
timo1 oh yeah quite interesting find 18:51
18:51 MasterDukeMobile left
timo1 MoarVM::Bytecode doesn't seem to handle dispatch_* ops correctly yet, which makes sense since they are Special and New 19:16
lizmat timo1: could you please make an issue for that, with examples ? 19:20
and if you have any suggestions, please add them as a separate issue 19:27
timo1 rakudo --ll-exception -e 'use MoarVM::Bytecode; my $c = MoarVM::Bytecode.new("c"); $c.frames.raku.say'
chars requires a concrete string, but got null
how did i do this :D 19:28
lizmat m: my str $a; use nqp; say nqp::chars($a)
camelia 0
lizmat well, in nqp then
hmmm 19:29
m: use nqp; my $s := nqp::list_s; nqp::chars(nqp::atpos_s($s,0)) 19:30
camelia chars requires a concrete string, but got null
in block <unit> at <tmp> line 1
timo1 there you go: github.com/lizmat/MoarVM-Bytecode/issues/1 19:41
but i do see code that looks like it's meant to handle dispatch so i'm not sure what's going wrong 19:45
i changed the title to reflect that 19:54
i don't think i actually need to have this feature work properly for the purposes of finding the difference between the files 19:56
yeah ok there's differences in the sc area 20:01
i'll try the reproducible-builds test now just in case nqp is actually not expected to be reproducible 20:02
i think i see the problem 20:13
i may already have a fix 20:27
we were comparing strings wrong, depending on the way they are stored 20:29
that's kind of hard to test in a regular spectest, since we have no way to force a string to have a very specific storage type, and no way to introspect it either
so even with nqp code it's tough to test
as i see it, the code was accidentally comparing an in-situ-8 string with whatever bytes were in the storage attribute of the other, same with 32, which in some cases turned out to be string vs pointer? 20:34
string storage types counts: MVM_STRING_GRAPHEME_32 45022 MVM_STRING_GRAPHEME_ASCII 44 MVM_STRING_GRAPHEME_8 186097 MVM_STRING_STRAND 173176 MVM_STRING_IN_SITU_8 658137 MVM_STRING_IN_SITU_32 2257 21:43
this is the last gen2 collection during setting compilation, just adding the number of strings seen on the heap with each storage type
22:04 sena_kun left
timo1 gist.github.com/timo/530e0e99bb6ae...3345c8c0b2 looks kinda promising 22:13
surprised to see that str_hash_demolish is so far up in allocations, like it's creating a large amount of entries in the "free at safepoint" linked list? 22:28
i guess it's not such a huge amount in total 22:46