Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes.
Set by lizmat on 24 May 2021.
vrurg [Coke]: to clone you need the original. But it's too late to steal it! 00:35
Nicholas good *, #moarvm 07:19
nine Good sunrise! 07:22
Geth MoarVM/fix_phi_out_of_bounds_read: bf106de221 | (Stefan Seifert)++ | 2 files
Fix out of bounds read of PHI facts in spesh

During spesh optimization, we remove reads of registers with dead writers from PHI nodes. It could happen that the PHI node ended up with no registers to read at all. However the following analysis code assumed that we'd always have at least 1 register to read from, resulting in an array read out of bounds error and a variety of failure modes. ... (5 more lines)
MoarVM: niner++ created pull request #1610:
Fix out of bounds read of PHI facts in spesh
nine dogbert17: fix for complex.t ^^^
Geth MoarVM/fix_phi_out_of_bounds_read: 8a684b3304 | (Stefan Seifert)++ | 2 files
Fix out of bounds read of PHI facts in spesh

During spesh optimization, we remove reads of registers with dead writers from PHI nodes. It could happen that the PHI node ended up with no registers to read at all. However the following analysis code assumed that we'd always have at least 1 register to read from, resulting in an array read out of bounds error and a variety of failure modes. ... (7 more lines)
nine (just added a reference to the GH issue to the commit message)
MasterDuke nice 08:59
dogbert17 nine++, was it an easy fix? 09:17
nine A few hours in total 09:22
Suddenly made a lot of sense when I dumped the spesh graph and saw that PHI that was not actually accessing the register that we got the bogus facts for
dogbert17 I wonder how often this actually happen 09:38
*happens 09:39
lizmat moarning! 09:47
so, with regards to the release... are we in agreement to postpone the release to 4 Dec ? 09:48
nine agrees
jnthnwrthngtn moarning o/ 09:54
Nicholas \o 09:55
lizmat and if we are in agreement on the postponement, does that also mean that the attrinited work by jnthn should still go in after that release 09:56
or that we move that forward ?
jnthnwrthngtn I locally ran blin on that branch over the weekend, didn't look at the results yet 09:58
dogbert17 dogbert@dogbert-VirtualBox:~/repos/oo-monitors$ perl6 -Ilib t/basic.t 11:37
===SORRY!=== Error while compiling /home/dogbert/repos/oo-monitors/t/basic.t
Missing or wrong version of dependency 'gen/moar/stage2/NQPHLL.nqp' (from '/home/dogbert/repos/oo-monitors/lib/OO/Monitors.pm6 (OO::Monitors)')
at /home/dogbert/repos/oo-monitors/t/basic.t:1
what would be needed in order to figure out why this problem occurs after a bump?
MasterDuke jnthnwrthngtn: an interesting change in behavior from new-disp, referenced in my most recent comment on github.com/rakudo/rakudo/pull/4650 12:05
committable6: 2021.09,2021.10 my Int $a; try { $a.VAR.log; CATCH { default { say $_.typename } } } 12:07
committable6 MasterDuke, ¦2021.09: «Int␤» ¦2021.10: «Scalar␤»
lizmat MasterDuke: I guess that enforces my point about improving the error message about containers :-) 12:25
jnthnwrthngtn MasterDuke: That looks more like a fix to me than anything? :) 12:37
MasterDuke heh, yeah. just wondering if there are any other places we should look for such things 12:38
though, tbh, i'm not sure why the invocant is `Int`. shouldn't it be `Scalar` also? 12:39
lizmat and yet another Rakudo Weekly News hits the Net: rakudoweekly.blog/2021/11/23/2021-...adler-rip/ 13:30
timo: in answer to your question, a golf of the declining CPU usage on race 14:14
say (^5000000).roll(20).race(:1batch).map: { [+] ^$_ .map: { $_ * $_ } } 14:15
the answer is not important, the CPU usage is
timo having only four cores makes this possibly not easy to reproduce? 14:18
lizmat only has 8 though 14:19
saw similar behaviour on a 4 core machine 14:21
gist.github.com/lizmat/2e5bb69739d...24fdc0dd47 # snapper 14:22
MasterDuke yeah, i see pretty much the same thing 14:27
lizmat hmmm... maybe this is not a good example... 14:39
playing with some debug messages in core
hmmm... is nqp::time threadsafe ? 14:43
jnthnwrthngtn Struggle to imagine it not being; it calculates and returns a simple numeric value? 14:44
lizmat I'm just seeing strange values in debug messages, like "13: completed in 7936936 msecs" 14:45
that would be more than 2 hours :-)
I basically added: 14:46
my $from = nqp::time;
say "$*THREAD.id(): starting task";
evalable6 1: starting task
lizmat to !run-one
say "$*THREAD.id(): completed in { ((nqp::time() - $from) / 1000).Int } msecs";
at the end
jnthnwrthngtn Are the units of nqp::time micros or nanos? 14:51
lizmat yeah, the golf is flawed
jnthnwrthngtn Ah, maybe you intended msecs to be micro rather than milli, and I assumed milli... 14:52
lizmat say (^5000000).roll(20).race(:1batch).map: { [+] ^$_ .race.map: { $_ * $_ } } # better golf 15:00
and this better golf shows no decline in CPU usage, so I guess it *is* my log loading algorithm that is to blame
jnthnwrthngtn Is your loading CPU or I/O bound, ooc? 15:02
lizmat well, I'd say CPU bound, as the IO is just a .slurp
timo: please disregard my golf, until I have a better one 15:31
dogbert17 nine: I have now been running complex.t in a loop for a couple of hours and it hasn't crashed so your PHI fix works perfectly 16:26
dogbert17 wonders if nine's PR might have fixed the hyper bug as well 16:30
nine \o/ 16:38
MasterDuke nine: btw, did you see dev.azure.com/MoarVM/MoarVM/_build...559d8f7fdf ? 16:39
nine oh no 16:40
MasterDuke and while there's some recent talk about releases and whether to delay merging branches, anyone have thoughts/comments/suggestions on github.com/MoarVM/MoarVM/pull/1608 ?
nine MasterDuke: I actually have an idea about that failure 16:44
MasterDuke oh? 16:45
nine res is uninitialized here: github.com/MoarVM/MoarVM/blob/mast...ffi.c#L217 but added to frame roots here: github.com/MoarVM/MoarVM/blob/mast...ffi.c#L326 16:46
MasterDuke probably unrelated, but `values` leaks here github.com/MoarVM/MoarVM/blob/mast...ffi.c#L220 16:52
Geth MoarVM: 0006714d07 | (Stefan Seifert)++ | src/core/nativecall_libffi.c
Fix use of uninitialized memory in native callbacks with libffi

We're GC rooting res.o but didn't initialize the local variable. This could cause memory corruption or segfaults when the GC was trying to process this object.
MasterDuke it isn't used at all in callback_handler 16:55
nine MasterDuke: indeed! Please just remove it :) 16:56
MasterDuke it's alloca'ed in MVM_nativecall_invoke and MVM_nativecall_dispatch, but the individual elements are MVM_malloc'ed, so there could be a leak if an exception is thrown partway through the `for (i = 0; i < num_args; i++) {` loops 16:57
oh wait, don't all the element in `values` leak even if there's no exception? 17:41
nine MasterDuke: looks like, yes 17:44
Geth MoarVM: 66688b941e | (Daniel Green)++ | src/core/nativecall_libffi.c
Remove unused variable
MasterDuke that was simple enough to just do, the other stuff i'll do in a branch/pr 18:20
nine Looks to me like those little mallocs for arg handling could become allocas easily 18:21
MasterDuke ah, maybe i'll add them to github.com/MoarVM/MoarVM/pull/1608 18:22
nine Btw. I've managed to catch one of those segfaults in the p6-GLib build process in rr 19:32
lizmat nine++ 19:43
MasterDuke nice 19:44
nine Seems like none of the hypotheses so far fits this new data 19:49
MasterDuke m: use nqp; my num $a; $a = nqp::time_n for ^1_000_000; say now - INIT now 19:54
camelia 0.04672441
MasterDuke m: use nqp; my num $a; $a = now.Num for ^1_000_000; say now - INIT now # huh, i thought this was closer to nqp::time_n than it is...
camelia 2.437410707
timo well, what does it spesh to? 19:56
nine What I've got so far: we're doing a return_o. This calls MVM_frame_try_return which finds an exit handler on the current frame. It then calls MVM_frame_dispatch_from_c to run this exit handler. MVM_frame_dispatch_from_c sets the caller's return address: cur_frame->return_address = *(tc->interp_cur_op)
But cur_op was already advanced, so it actually points right at the end of the bytecode 19:57
When we return from that exit handler, we then start processing whatever follows the bytecode in memory.
Now, this much is pretty clear. What isn't clear is why this only happens now and then rather than deterministically. 19:58
MasterDuke ugh. $.tai needing to be a Rational in Instant is really annoying 20:06
even the 'after' spesh of 'Num' has gcd_I 20:07
ah, but some manual inlining seems to help 20:12
got it down to 0.325s
nine Oooh....it has something to do with spesh. I added an assertion to the runloop to reliably catch when we exit the proper bytecode. With MVM_SPESH_BLOCKING=1 it fails every time while with MVM_SPESH_DISABLE=1 I haven't seen it fail yet. 21:09
But it's quite an unusual spesh issue. It happens even with MVM_SPESH_LIMIT=1 21:29
So it's not speshing of any particular frame that causes the issue, but spesh being active at all. But then, I don't understand how MVM_SPESH_BLOCKING=1 can make such a difference
MasterDuke NO_DELAY change anything? 21:31
japhb is looking forward to MasterDuke's PR speeding up `now` 22:44
japhb D'oh! Now I see it. Sigh. ETOOMANYCHANNELS 23:13