Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes.
Set by lizmat on 24 May 2021.
lizmat misses the moarning banter 09:11
dogbert17 yes, it's very silent today, perhaps people are fighting their coffee machines 09:32
both Nine and Nicholas are missing and they both tend to show up quite early in the moarning 09:39
lizmat I think jnthnwrthngtn is working on their presentations for the TRC 09:44
lizmat has done that part :-)
dogbert17 sounds like you're well prepared then 09:45
jnthnwrthngtn moarning o/ 09:46
lizmat: You're waaay ahead of me, then :)
lizmat yeah, that rarely happens :-)
jnthnwrthngtn I'm...not only doing TRC prep this week, but also trying to finish up a $dayjob task that I put off for the last couple of weeks because I was having too much fun with new-disp. Oh, plus dental work, and as if that wasn't already enough, I have to go to the interior ministry now this week too. 09:47
nine cancels his anticipation of new-disp goodies for this week 09:49
lizmat is writing Javascript for the first time in a *looong* time 09:50
jnthnwrthngtn lizmat: Enjoying the improvements? 09:55
lizmat well... I'm glad I don't need to cater for different browsers anymore 09:56
Altai-man lizmat, what are you writing?
jnthnwrthngtn I was more thinking the language improvements, but yes, this is also welcome :)
dogbert17 is there a way to get hold of all output from a 'make test' run, not just the condensed version we usually get? 09:57
lizmat working on the logs server :-)
nine dogbert17: yes, there is 09:58
dogbert17 nine: please elaborate :)
nine dogbert17: build.opensuse.org/package/view_fi...f?expand=1
Of course, simply having TAP::Harness::Archive installed would suffice. I just wanted to avoid the build dependency for the RPM package. 09:59
dogbert17 the reason I'm asking is that I get some random test failures when running 'MVM_SPESH_NODELAY=1 make test TEST_JOBS=<more jobs than you have CPUs>' 10:02
but I can't see what's happening due to the lack of detail
dogbert17 nine: it worked and now I can, for example, see the following: 10:33
dogbert@dogbert-VirtualBox:/tmp$ cat t/05-messages/02-errors.t
===SORRY!===
Cannot find method 'push_code_object' on object of type Perl6::World
I ran 'MVM_SPESH_NODELAY=1 make test TEST_JOBS=10'; my VM only has 8 cores allocated 10:35
nine What a weird error 10:48
nine Apparently I can reproduce it sometimes when running TEST_JOBS=40 make spectest in another shell 10:51
dogbert17 yes, it's very strange 11:17
lizmat m: say "hi" # just checking 13:13
camelia hi
dogbert17 Nine: I managed to get --ll-exception output for the weird error. Dunno if it helps though. 13:15
Cannot find method 'push_code_object' on object of type Perl6::World
at gen/moar/World.nqp:2536 (/home/dogbert/repos/rakudo/blib/Perl6/World.moarvm:stub_code_object) 13:16
from gen/moar/World.nqp:2527 (/home/dogbert/repos/rakudo/blib/Perl6/World.moarvm:create_code_object)
from gen/moar/World.nqp:2518 (/home/dogbert/repos/rakudo/blib/Perl6/World.moarvm:create_code_obj_and_add_child)
from gen/moar/World.nqp:2503 (/home/dogbert/repos/rakudo/blib/Perl6/World.moarvm:create_thunk)
from gen/moar/Actions.nqp:5933 (/home/dogbert/repos/rakudo/blib/Perl6/Actions.moarvm:trait_mod:sym<is>)
sena_kun just saw this error with `❌ TST: ===SORRY!=== Error while compiling /mnt/data/pakku/.cache/NCurses/NCurses-0.6.3-githubazawawi-/t/02-basic.t` 13:18
nine dogbert17: Perl6::World is indeed wrong there. That method is found on Perl6CompilationContext (a lexically defined class inside Perl6::World) 13:29
dogbert17 on my system it's enough to run 'while MVM_SPESH_NODELAY=1 ./rakudo-m -c --ll-exception -Ilib t/04-nativecall/02-simple-args.t; do :; done' for a while 13:41
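The same reproduce-until-failure loop can also count iterations, so a failure reports how many clean runs preceded it (useful later when comparing inlining on and off, or runs under rr). This is a minimal Python sketch, not tooling anyone in the channel used; the names `run_until_failure` and `command_runner` and the `max_runs` cap are illustrative assumptions.

```python
import os
import subprocess
from typing import Callable, Mapping, Optional, Sequence


def run_until_failure(run_once: Callable[[], int],
                      max_runs: Optional[int] = None) -> Optional[int]:
    """Call run_once() until it returns a non-zero exit code.

    Returns the 1-based number of the failing run, or None if max_runs
    runs all succeeded.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        runs += 1
        if run_once() != 0:
            return runs
    return None


def command_runner(argv: Sequence[str],
                   extra_env: Mapping[str, str]) -> Callable[[], int]:
    """Build a run_once() that runs argv with extra environment variables set."""
    env = {**os.environ, **extra_env}

    def run_once() -> int:
        return subprocess.run(list(argv), env=env).returncode

    return run_once
```

For example, `run_until_failure(command_runner(["./rakudo-m", "-c", "--ll-exception", "-Ilib", "t/04-nativecall/02-simple-args.t"], {"MVM_SPESH_NODELAY": "1"}))` returns the number of the first failing run.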
nine I wonder if there's a way to golf this. 13:44
dogbert17 FWIW, if I run with --stagestats it says 'Stage start' but crashes before writing 'Stage parse' 13:45
jnthnwrthngtn Is it sensitive to inlining, for example? 13:49
nine jnthnwrthngtn: looks like 13:56
15 minutes with loops in 18 konsole tabs have not produced a single error with inlining disabled 14:07
at the same time the one loop with inlining enabled threw multiple
Nicholas rr!
jnthnwrthngtn .oO( What's a pirate's favorite debugger? ) 14:08
Nicholas (I found something in Perl 5 by having a shell loop that repeated until there was a failure)
dogbert17 where's timo? he usually likes to tout the virtues of rr
Nicholas good *, timo :-) 14:09
timo :) 14:12
nine Good news: turning on the spesh log seems to fix this issue! 14:13
timo so, could be timing related? 14:21
nine That's my guess. Odd though that MVM_SPESH_BLOCKING=1 doesn't make it reliable either 14:23
Which...usually indicates that it's not timing, but memory contents 14:24
timo right. oof. 14:33
well, we do malloc some in the spesh log process
like when we turn mvm strings into c strings
wanna try removing those from the spesh logging code to see if that does anything? 14:34
nine Well, the first hard step seems to be reproducing it in a debug build 14:38
ah, finally :) At least that 14:39
At least it breaks with MVM_SPESH_INLINE_LOG=1. However, the case where it failed and 2 cases where it succeeded log exactly the same information about both push_code_object (the callee) and stub_code_object (the failing caller). And push_code_object actually gets inlined into stub_code_object 15:10
Nicholas rr!
nine 372 successful runs in rr and counting... 15:11
timo did you try the chaos mode that rr has?
nine just turned that on
timo should have pointed it out earlier
nine FWIW I don't think I've ever seen chaos mode really helping. I have had some success with --num-cpu-ticks= though 15:15
sena_kun The number of modules having difficulties with new-disp (including some false positives, alas) decreased from 62 to 37! 15:37
nine Run 660 failed in rr! 15:40
jnthnwrthngtn Just 37...wow 15:51
nine Does anyone have any hypotheses about how we could end up with a Perl6::World instead of a Perl6CompilationContext object in that dispatch that I could have a closer look at in that rr session? 15:53
lizmat since Perl6CompilationContext is defined inside Perl6::World, an off-by-one in lookup of parents? 15:55
nine The line that's failing is self.context().push_code_object($code); with method context() { $!context } defined in HLL::World 15:56
I find it a bit odd that there's nothing about method context in the inline log 15:57
jnthnwrthngtn nine: Does the code-gen look OK in the spesh log? 16:02
nine: Also, was there a deopt shortly before the effort? 16:03
uh, I'm not sure what word I wanted but "effort" was not it :D 16:05
oh, probably error
nine jnthnwrthngtn: haven't managed to get a spesh log yet :/ 16:33
Backtrace is: 16:36
#0 lang_meth_not_found (tc=0x2130e70, arg_info=...) at src/disp/boot.c:513
#1 0x00002d2f3838126b in run_dispatch (tc=0x2130e70, record=0x5d65160cae98, disp=0x214a7f4, capture=0x7ab1593ee8f0, thunked=0x7ffd9e2e8ebc) at src/disp/program.c:482
#2 0x00002d2f3838912b in MVM_disp_program_record_end (tc=0x2130e70, record=0x5d65160cae98, thunked=0x7ffd9e2e8ebc) at src/disp/program.c:2468
#3 0x00002d2f382cc932 in handle_end_of_dispatch_record (tc=0x2130e70, thunked=0x7ffd9e2e8ebc) at src/core/callstack.c:353
#4 0x00002d2f382ccf3a in MVM_callstack_unwind_frame (tc=0x2130e70, exceptional=0 '\000', thunked=0x7ffd9e2e8ebc) at src/core/callstack.c:463
#5 0x00002d2f382c883f in remove_one_frame (tc=0x2130e70, unwind=0 '\000') at src/core/frame.c:996
#6 0x00002d2f382c8e06 in MVM_frame_try_return (tc=0x2130e70) at src/core/frame.c:1122
#7 0x00002d2f3828aa96 in MVM_interp_run (tc=0x2130e70, initial_invoke=0x2d2f3841ebbe <toplevel_initial_invoke>, invoke_data=0x21e92e0, outer_runloop=0x0) at src/core/interp.c:570
Does that look sensible (except for missing newlines)?
The MVM_frame_try_return surprises me a little since we are kina in the middle of stub_code_object 16:37
s/kina/kinda/
jnthnwrthngtn Well, we're probably returning from a dispatcher (implemented in bytecode), and the lang-meth-not-found dispatcher (in C) thunks off that 16:38
So that bit is fine
If we're in this place, we're recording a new dispatch program 16:39
There's a few possible hypotheses from that
nine Yeah, the frame we're returning from looks very dispatchy 16:40
jnthnwrthngtn An interesting question is if we're just after a deopt
You can probably reverse breakpoint the deopt impl and see if the frame below the dispatch-y one is the same
The other interesting thing is to look at the frame below the one we're currently returning from, which is the one with the dispatch instruction in it that we were trying to handle 16:41
Firstly, is it specialized?
Second, we should be just after a dispatch instruction; look at the bytecode of the frame and see what the instruction is before, and also what registers were passed. Do they look sane? 16:42
nine There was indeed a deopt in stub_code_object
jnthnwrthngtn If there was a deopt and the values to the dispatch look bogus (e.g. wrong invocant) then it's possible we're looking at a deopt bug *but* that doesn't at all explain why this only happens sometimes 16:43
nine We deopted because of a failed sp_guardconc. Wanted a concrete Str, got a Block type object 16:45
Or rather a Block STable
Oh, no, it's actually a defined Block. Got confused there 16:46
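A rough mental model of the guard being discussed, assuming `sp_guardconc` checks both that the value is concrete (an instance, not a type object) and that its STable is the one spesh specialized for, deoptimizing if either check fails. This is a toy Python sketch, not MoarVM's C implementation; `Obj`, `Deopt` and `guard_conc` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Obj:
    """Toy stand-in for a MoarVM object: its STable name and whether it is concrete."""
    stable: str
    concrete: bool


class Deopt(Exception):
    """Raised where specialized code would fall back to unoptimized bytecode."""


def guard_conc(value: Obj, expected_stable: str) -> Obj:
    # Both conditions must hold for the specialized code to keep running.
    if not value.concrete:
        raise Deopt(f"expected a concrete {expected_stable}, got a type object")
    if value.stable != expected_stable:
        raise Deopt(f"expected a concrete {expected_stable}, got a {value.stable}")
    return value
```

Under this model, a defined Block fails the STable check rather than the concreteness check, which fits nine's correction that the value was not a type object.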
jnthnwrthngtn Hmm. Does a Block getting to that point instead of a Str make any sense whatsoever? 16:47
Or is the deopt likely "garbage in, deopt out"?
nine Looking at stub_code_object, it's actually expecting a Str there, which makes no sense. Also, the failing sp_guardconc seems to be pretty close to the end of the function, most notably after the inlined call to push_code_object 16:49
That's the bytecode of the frame with the failing sp_guardconc: gist.github.com/niner/23cf8d105dae...a28b0aef04 16:52
Ignore the arrow, we're actually at line 92 according to the register numbers I see
(rr) p GET_UI16(cur_op, -10)
$33 = 38
(rr) p GET_UI16(cur_op, -8)
$34 = 29
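The two `p GET_UI16(cur_op, ...)` commands peek at 16-bit operands of the instruction just before the current position by using negative offsets from `cur_op`. The same arithmetic as a Python sketch, assuming little-endian 16-bit operands; the instruction buffer is invented purely so that the two reads reproduce the 38 and 29 seen in the rr session, and is not the real encoding of any MoarVM op. `get_ui16` is a hypothetical helper mirroring what `GET_UI16` is assumed to do.

```python
import struct


def get_ui16(bytecode: bytes, cur_op: int, idx: int) -> int:
    """Read an unsigned little-endian 16-bit value at byte offset cur_op + idx."""
    return struct.unpack_from("<H", bytecode, cur_op + idx)[0]


# Invented 12-byte instruction: a 16-bit opcode followed by five 16-bit operands.
instr = struct.pack("<6H", 0x0123, 38, 29, 0, 0, 0)

# Assume the interpreter has already advanced past the whole instruction.
cur_op = len(instr)
```

Here `get_ui16(instr, cur_op, -10)` and `get_ui16(instr, cur_op, -8)` read the bytes at offsets 2 and 4, i.e. the first two operands.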
dinner& 16:53
[Coke] nine: did a build with your commits instead of my patches, all good. 17:15
also: with 'set TEST_JOBS=1' and 'nmake test', everything passes except pseudohash. 17:16
so it looks like the nativecall tests for new-disp on windows do not like being run at the same time as other tests. 17:21
It's not all of them, and I don't think the specific failures are the same each run. 17:22
nine [Coke]: do those test failures still happen after one successful run? 17:24
[Coke] yes. i did a test with jobs=1, then jobs=10; first all passed, then many NC fails. 17:28
(with pseudohash failing always)
nine Is that unique to new-disp or does it actually happen on master as well?
the NC failures
[Coke] build of master hung for me today. will retry
... building fine now, weird 17:32
[Coke] master *feels* slower on testing. should have timed it all 18:03
master TEST_JOBS=1 seems fine on nativecall.
next... 18:04
no failure on pseudohash on master, btw. so that might be exacerbated by new-disp 18:06
nine Yes, pseudohash is known to fail 18:07
[Coke] concurrent nativecall failures are a new-disp regression 18:10
getting a debugger attached there will be harder, i think 18:11
the test files that fail, fail entirely, with a wstat of 65280 18:19
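For reading that number: TAP::Harness reports `wstat` as a Perl `$?`-style wait status, with the exit code in the high byte and the signal number in the low 7 bits. 65280 is 0xFF00, so the failing test processes exited with status 255 and were not killed by a signal. A minimal Python sketch of the decoding (`decode_wstat` is an illustrative name):

```python
from typing import Tuple


def decode_wstat(wstat: int) -> Tuple[int, int]:
    """Split a POSIX/Perl-style wait status into (exit_code, signal_number)."""
    exit_code = (wstat >> 8) & 0xFF
    signal_number = wstat & 0x7F
    return exit_code, signal_number
```

`decode_wstat(65280)` gives `(255, 0)`.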
jnthnwrthngtn wonders how it's managed to regress those on Windows but not on Linux... 20:13
Nicholas I see pseudohash fail on linux on new-disp
jnthnwrthngtn Yes, I know about that; I meant the nativecall ones 20:14
Nicholas ah OK. Sorry to be confused
jnthnwrthngtn I wasn't especially clear, to be fair :)