Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes.
Set by lizmat on 24 May 2021.
vrurg [Coke]: to clone you need the original. But it's too late to steal it! 00:35
Nicholas good *, #moarvm 07:19
nine Good sunrise! 07:22
Geth MoarVM/fix_phi_out_of_bounds_read: bf106de221 | (Stefan Seifert)++ | 2 files
Fix out of bounds read of PHI facts in spesh

During spesh optimization, we remove reads of registers with dead writers from PHI nodes. It could happen that the PHI node ended up with no registers to read at all. However the following analysis code assumed that we'd always have at least 1 register to read from, resulting in an array read out of bounds error and a variety of failure modes. ... (5 more lines)
08:55
MoarVM: niner++ created pull request #1610:
Fix out of bounds read of PHI facts in spesh
nine dogbert17: fix for complex.t ^^^
Geth MoarVM/fix_phi_out_of_bounds_read: 8a684b3304 | (Stefan Seifert)++ | 2 files
Fix out of bounds read of PHI facts in spesh

During spesh optimization, we remove reads of registers with dead writers from PHI nodes. It could happen that the PHI node ended up with no registers to read at all. However the following analysis code assumed that we'd always have at least 1 register to read from, resulting in an array read out of bounds error and a variety of failure modes. ... (7 more lines)
08:56
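The shape of the fix, as a hedged sketch rather than the actual patch (merge_phi_facts is a made-up name; only the guard is the point): after dead-writer elimination a PHI may have no read operands left, and the facts analysis has to tolerate that.

```c
#include "moar.h"

/* Hedged sketch, not the real spesh code: merge_phi_facts is a hypothetical
 * name. Operand 0 of a PHI is the written register; operands 1..n-1 are the
 * reads. The pre-fix code assumed at least one read always remained. */
static void merge_phi_facts(MVMThreadContext *tc, MVMSpeshGraph *g, MVMSpeshIns *phi) {
    MVMuint16 num_reads = phi->info->num_operands - 1;

    /* After removing reads whose writers are dead, a PHI can end up with zero
     * reads; without this guard the merge below would index past the operand
     * array and pick up bogus facts. */
    if (num_reads == 0)
        return;

    /* ... merge the facts of operands 1 .. num_operands-1 into operand 0 ... */
}
```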
nine (just added a reference to the GH issue to the commit message)
MasterDuke nice 08:59
dogbert17 nine++, was it an easy fix? 09:17
nine A few hours in total 09:22
Suddenly made a lot of sense when I dumped the spesh graph and saw a PHI that was not actually accessing the register that we got the bogus facts for
dogbert17 I wonder how often this actually happen 09:38
*happens 09:39
lizmat moarning! 09:47
so, with regards to the release... are we in agreement to postpone the release to 4 Dec ? 09:48
nine agrees
jnthnwrthngtn moarning o/ 09:54
Nicholas \o 09:55
lizmat and if we are in agreement on the postponement, does that also mean that the attrinited work by jnthn should still go in after that release 09:56
or that we move that forward ?
jnthnwrthngtn I locally ran blin on that branch over the weekend, didn't look at the results yet 09:58
dogbert17 dogbert@dogbert-VirtualBox:~/repos/oo-monitors$ perl6 -Ilib t/basic.t 11:37
===SORRY!=== Error while compiling /home/dogbert/repos/oo-monitors/t/basic.t
Missing or wrong version of dependency 'gen/moar/stage2/NQPHLL.nqp' (from '/home/dogbert/repos/oo-monitors/lib/OO/Monitors.pm6 (OO::Monitors)')
at /home/dogbert/repos/oo-monitors/t/basic.t:1
what would be needed in order to figure out why this problem occurs after a bump?
MasterDuke jnthnwrthngtn: an interesting change in behavior from new-disp, referenced in my most recent comment on github.com/rakudo/rakudo/pull/4650 12:05
committable6: 2021.09,2021.10 my Int $a; try { $a.VAR.log; CATCH { default { say $_.typename } } } 12:07
committable6 MasterDuke, ¦2021.09: «Int␤» ¦2021.10: «Scalar␤»
lizmat MasterDuke: I guess that enforces my point about improving the error message about containers :-) 12:25
jnthnwrthngtn MasterDuke: That looks more like a fix to me than anything? :) 12:37
MasterDuke heh, yeah. just wondering if there are any other places we should look for such things 12:38
though, tbh, i'm not sure why the invocant is `Int`. shouldn't it be `Scalar` also? 12:39
lizmat and yet another Rakudo Weekly News hits the Net: rakudoweekly.blog/2021/11/23/2021-...adler-rip/ 13:30
timo: in answer to your question, a golf of the declining CPU usage on race 14:14
say (^5000000).roll(20).race(:1batch).map: { [+] ^$_ .map: { $_ * $_ } } 14:15
the answer is not important, the CPU usage is
timo having only four cores makes this possibly not easy to reproduce? 14:18
lizmat only has 8 though 14:19
saw similar behaviour on a 4 core machine 14:21
gist.github.com/lizmat/2e5bb69739d...24fdc0dd47 # snapper 14:22
MasterDuke yeah, i see pretty much the same thing 14:27
lizmat hmmm... maybe this is not a good example... 14:39
playing with some debug messages in core
hmmm... is nqp::time threadsafe ? 14:43
jnthnwrthngtn Struggle to imagine it not being; it calculates and returns a simple numeric value? 14:44
lizmat I'm just seeing strange values in debug messages, like "13: completed in 7936936 msecs" 14:45
that would be more than 2 hours :-)
I basically added: 14:46
my $from = nqp::time;
say "$*THREAD.id(): starting task";
evalable6 1: starting task
lizmat to !run-one
and:
say "$*THREAD.id(): completed in { ((nqp::time() - $from) / 1000).Int } msecs";
at the end
jnthnwrthngtn Are the units of nqp::time micros or nanos? 14:51
lizmat yeah, the golf is flawed
nanos
jnthnwrthngtn Ah, maybe you intended msecs to be micro rather than milli, and I assumed milli... 14:52
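For reference, a self-contained sketch (not the actual MoarVM op body) of what nqp::time boils down to; it only samples the system clock, so there is no shared state to make it thread-unsafe, and since it returns nanoseconds, dividing by 1000 gives microseconds, not milliseconds.

```c
#include <stdint.h>
#include <time.h>

/* Sketch only: roughly what a nanosecond-resolution time op does.
 * No shared mutable state, hence trivially thread-safe. */
static uint64_t time_nanos_sketch(void) {
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* nanos / 1000      == microseconds (what the "msecs" debug output actually showed)
 * nanos / 1000000   == milliseconds                                                 */
```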
lizmat say (^5000000).roll(20).race(:1batch).map: { [+] ^$_ .race.map: { $_ * $_ } } # better golf 15:00
and this better golf shows no decline in CPU usage, so I guess it *is* my log loading algorithm that is to blame
jnthnwrthngtn Is your loading CPU or I/O bound, ooc? 15:02
lizmat well, I'd say CPU bound, as the IO is just a .slurp
timo: please disregard my golf, until I have a better one 15:31
dogbert17 nine: I have now been running complex.t in a loop for a couple of hours and it hasn't crashed, so your PHI fix works perfectly 16:26
dogbert17 wonders if nine's PR might have fixed the hyper bug as well 16:30
nine \o/ 16:38
MasterDuke nine: btw, did you see dev.azure.com/MoarVM/MoarVM/_build...559d8f7fdf ? 16:39
nine oh no 16:40
MasterDuke and while there's some recent talk about releases and whether to delay merging branches, anyone have thoughts/comments/suggestions on github.com/MoarVM/MoarVM/pull/1608 ?
nine MasterDuke: I actually have an idea about that failure 16:44
MasterDuke oh? 16:45
nine res is uninitialized here: github.com/MoarVM/MoarVM/blob/mast...ffi.c#L217 but added to frame roots here: github.com/MoarVM/MoarVM/blob/mast...ffi.c#L326 16:46
MasterDuke probably unrelated, but `values` leaks here github.com/MoarVM/MoarVM/blob/mast...ffi.c#L220 16:52
Geth MoarVM: 0006714d07 | (Stefan Seifert)++ | src/core/nativecall_libffi.c
Fix use of uninitialized memory in native callbacks with libffi

We're GC rooting res.o but didn't initialize the local variable. This could cause memory corruption or segfaults when the GC was trying to process this object.
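For readers not following the links, the problem pattern looks roughly like this (a simplified sketch, not the verbatim libffi callback handler):

```c
#include "moar.h"

/* Sketch of the bug nine describes: a local MVMRegister was pushed onto the
 * temporary GC roots while still holding whatever garbage was on the stack,
 * so a GC run could chase a junk pointer. Zero-initialising it first (the
 * spirit of 0006714d07) makes the root harmless until it is actually set. */
static void callback_handler_sketch(MVMThreadContext *tc) {
    MVMRegister res;
    res.o = NULL;   /* previously uninitialized, yet rooted below */

    MVM_gc_root_temp_push(tc, (MVMCollectable **)&res.o);
    /* ... invoke the Raku-level callback, eventually filling in res ... */
    MVM_gc_root_temp_pop(tc);
}
```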
MasterDuke it isn't used at all in callback_handler 16:55
nine MasterDuke: indeed! Please just remove it :) 16:56
MasterDuke it's alloca'ed in MVM_nativecall_invoke and MVM_nativecall_dispatch, but the individual elements are MVM_malloc'ed, so there could be a leak if an exception is thrown partway through the `for (i = 0; i < num_args; i++) {` loops 16:57
oh wait, don't all the elements in `values` leak even if there's no exception? 17:41
nine MasterDuke: looks like, yes 17:44
Geth MoarVM: 66688b941e | (Daniel Green)++ | src/core/nativecall_libffi.c
Remove unused variable
18:19
MasterDuke that was simple enough to just do, the other stuff i'll do in a branch/pr 18:20
nine Looks to me like those little mallocs for arg handling could become allocas easily 18:21
MasterDuke ah, maybe i'll add them to github.com/MoarVM/MoarVM/pull/1608 18:22
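A sketch of what that could look like (simplified and hypothetical, not the actual MVM_nativecall_invoke code): taking the per-argument storage with alloca in the same function that performs the ffi_call means it is released with the stack frame, so nothing needs an explicit free even on an exceptional exit.

```c
#include <alloca.h>
#include <ffi.h>
#include "moar.h"

/* Hypothetical, simplified sketch of the idea: per-argument buffers come
 * from alloca instead of MVM_malloc, so they live exactly as long as the
 * enclosing call and cannot leak, with or without an exception. */
static void invoke_sketch(MVMThreadContext *tc, ffi_cif *cif,
                          void (*entry_point)(void), MVMint16 num_args) {
    void  **values = alloca(num_args * sizeof(void *));
    ffi_arg result;
    MVMint16 i;

    for (i = 0; i < num_args; i++) {
        /* before: values[i] = MVM_malloc(sizeof(ffi_sarg)); with no matching free */
        values[i] = alloca(sizeof(ffi_sarg));
        /* ... marshal the i-th argument into values[i] ... */
    }

    ffi_call(cif, entry_point, &result, values);
}
```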
nine Btw. I've managed to catch one of those segfaults in the p6-GLib build process in rr 19:32
lizmat nine++ 19:43
MasterDuke nice 19:44
nine Seems like none of the hypotheses so far fits this new data 19:49
MasterDuke m: use nqp; my num $a; $a = nqp::time_n for ^1_000_000; say now - INIT now 19:54
camelia 0.04672441
MasterDuke m: use nqp; my num $a; $a = now.Num for ^1_000_000; say now - INIT now # huh, i thought this was closer to nqp::time_n than it is...
camelia 2.437410707
timo well, what does it spesh to? 19:56
nine What I've got so far: we're doing a return_o. This calls MVM_frame_try_return which finds an exit handler on the current frame. It then calls MVM_frame_dispatch_from_c to run this exit handler. MVM_frame_dispatch_from_c sets the caller's return address: cur_frame->return_address = *(tc->interp_cur_op)
But cur_op was already advanced, so it actually points right at the end of the bytecode 19:57
When we return from that exit handler, we then start processing whatever follows the bytecode in memory.
Now this much is pretty clear. What isn't is why this only happens now and then, not more deterministically. 19:58
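A hedged sketch of the kind of runloop sanity check nine mentions adding a bit later (hypothetical form; it ignores speshed bytecode for simplicity): verify that cur_op still lies within the current frame's bytecode, so escaping past the end is caught immediately instead of silently interpreting whatever follows in memory.

```c
#include "moar.h"

/* Sketch only: checks that the interpreter is still inside the current
 * frame's (non-speshed) bytecode. A real check would also have to account
 * for spesh candidates, which have their own bytecode buffers. */
static void assert_in_bytecode(MVMThreadContext *tc, MVMuint8 *cur_op) {
    MVMStaticFrame *sf    = tc->cur_frame->static_info;
    MVMuint8       *start = sf->body.bytecode;
    MVMuint8       *end   = start + sf->body.bytecode_size;

    if (cur_op < start || cur_op >= end)
        MVM_panic(1, "Interpreter left the bytecode of the current frame");
}
```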
MasterDuke ugh. $.tai needing to be a Rational in Instant is really annoying 20:06
even the 'after' spesh of 'Num' has gcd_I 20:07
ah, but some manual inlining seems to help 20:12
got it down to 0.325s
nine Oooh....it has something to do with spesh. I added an assertion to the runloop to reliably catch when we exit the proper bytecode. With MVM_SPESH_BLOCKING=1 it fails every time while with MVM_SPESH_DISABLE=1 I haven't seen it fail yet. 21:09
But it's a quite unusual spesh issue. It happens even with MVM_SPESH_LIMIT=1 21:29
So it's not speshing of any particular frame that causes the issue, but spesh being active at all. But then, I don't understand how MVM_SPESH_BLOCKING=1 can make such a difference
MasterDuke NO_DELAY change anything? 21:31
japhb is looking forward to MasterDuke's PR speeding up `now` 22:44
japhb D'oh! Now I see it. Sigh. ETOOMANYCHANNELS 23:13