Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes.
Set by lizmat on 24 May 2021.
00:03 reportable6 left 00:04 reportable6 joined 00:44 frost joined 02:28 vrurg left, vrurg joined 02:40 SmokeMachine left, leont left, SmokeMachine joined 02:41 leont joined 04:03 SmokeMachine left, SmokeMachine joined 04:04 leont left 04:21 leont joined 04:58 squashable6 left 06:02 reportable6 left 06:04 gfldex left, AlexDaniel left 06:05 gfldex joined, AlexDaniel joined 06:12 codesections left 06:13 codesections joined 06:32 squashable6 joined
Nicholas good *, #moarvm 06:42
japhb Good *, Nicholas :-) 06:52
07:03 reportable6 joined 08:12 frost left
nine This is...unbelievable. I already had the suspicion that the expr JIT version of gethow is somehow involved in the gh_1202.t segfaults. So from the point of the segfault I painstakingly reverse-stepied through a bunch of JIT compiled ops to the gethow code. The relevant part is: 09:14
│ 0x7f91c7e020ef mov rdx,QWORD PTR [rcx+0x70] │
│ 0x7f91c7e020f3 cmp QWORD PTR [rcx+0x70],0x0 │
│ 0x7f91c7e020f8 je 0x7f91c7e020fc
rcx+0x70 is STABLE(obj)->HOW which we store in rdx. Then we compare STABLE(obj)->HOW with 0 and if it's 0 jump to a call of MVM_6model_get_how which will should VMNull. But between the mov rdx,QWORD PTR [rcx+0x70] and cmp QWORD PTR [rcx+0x70],0x0 STABLE(obj)->HOW suddenly gets a value. 09:16
A change that gdb does not even catch, despite a watch -l on that location being in place. And AFAICT no other thread is in any code that could cause this.
The value is sensible though. It's Perl6::Metamodel::ClassHOW which is exactly what you'd expect 09:17
The expr jit template for gethow looks like: (template: gethow (let: (($how (^getf (^stable $1) MVMSTable HOW))) (if (nz $how) $how (...))))) so you'd think that the comparison would use rdx instead of reading that memory location again. But it doesn't, making this implementation thread-unsafe. 09:19
nine summons the brrt for ^^^ 09:20
The window of opportunity for this bug being just a single CPU instruction certainly explains why it needs ten thousands of runs to reproduce 09:21
Nicholas .tell brrt good *, brrt 09:35
tellable6 Nicholas, I'll pass your message to brrt
Nicholas I think there's one line missing from your "explanation for mortals". Which is: 09:39
so, if I have this right 09:40
1) we load the value from memory into a regiter
2) we have a conditional branch based on the "value" (if the value is NULL branch somewhere else), but that value we check comes from a second load of the same memory location 09:41
3) so we don't take the branch
4) and then we (attempt to) call the value in the register. Which is 0. So boom.
?
nine Almost. The rdx register, which we first load the value into is used as the return value of the gethow op, i.e. it will be written into a VM register and the explosion will happen when we try to decont the value in that register. The gethow op should return a VMNull pointer instead of the NULL. But since the cmp that decides whether that is necessary does find a value, we never put VMNull into rdx. 09:44
Nicholas aha right thanks
this is "how the heck do we get a C NULL pointer in a register at all?" bugs? (where the "right" answer was a C pointer to the VMNull singleton) ? 09:45
nine yes
jnthnwrthngtn moarning o/ 10:09
Nicholas \o 10:10
Geth MoarVM: 8aca849f90 | (Stefan Seifert)++ | 3 files
Remove counter productive NULL checks in decont ops

Object registers must never contain NULL values, so there should be no reason to check for NULL values when deconting. Historically bugs in other parts of the code have caused NULL values to appear in registers and these checks have worked around these bugs. But in truth they just hide them and make it more difficult to debug, so better to remove them. This might cost some stability in the short term, but should enable us to fix the underlying bugs for good.
10:11
MoarVM: 8b69d8e26c | (Jonathan Worthington)++ (committed using GitHub Web editor) | 3 files
Merge pull request #1593 from MoarVM/remove_counterproductive_null_checks

Remove counter productive NULL checks in decont ops
lizmat feels like we should unleash this short term potential instability on the world with a MoarVM bump ? 10:16
jnthnwrthngtn Prolly :) 11:06
lizmat on it 11:07
jnthnwrthngtn Oh wait
Hold a moment
lizmat waits
Geth MoarVM/master: 5 commits pushed by (Timo Paulssen)++, (Jonathan Worthington)++ 11:08
jnthnwrthngtn OK, go for it
lizmat oki 11:09
timo hooray 11:13
could merge the nqp and rakudo changes before bumping rakudo as well 11:15
for the replace-arg and replace-arg-literal-obj syscalls i mean 11:16
lizmat ah... saw too late 11:28
which PRs are we talking about ?
timo github.com/Raku/nqp/pull/745 11:32
github.com/rakudo/rakudo/pull/4604 11:33
lizmat ok will do in a mo
11:35 evalable6 left, linkable6 left 11:36 evalable6 joined
timo not sure if there's a "re-run all checks" button so i can have it tested against latest master + the pull request on github 11:38
11:39 frost joined
lizmat timo: github.com/rakudo/rakudo/pull/4604 has conflicts ? 11:40
timo ah, i'll have to check that out 11:41
lizmat I have merged the NQP one
timo yes, it does need an actual change 11:42
lizmat maybe I should first bump Rakudo
?
timo didn't you already? 11:47
ah rakudo build segfaults with that change :) :) 11:51
lizmat ah, oops? 11:57
timo oops indeed
lizmat so, wait for a fix?
or revert ?
timo revert please
12:02 reportable6 left 12:03 reportable6 joined
Nicholas timo: nqp master (now) fails an assertion during build: 12:37
+++ Compiling gen/moar/stage2/nqpmo.moarvm
moar: src/disp/program.c:2267: emit_args_ops: Assertion `p.path[i]->transformation == MVMDispProgramRecordingDrop' failed.
Aborted
12:37 linkable6 joined
Nicholas that might give an idea about what the problem is 12:38
timo wait what
Nicholas MVM_SPESH_NODELAY=1 MVM_SPESH_BLOCKING=1
+#define FSA_SIZE_DEBUG 1 12:39
+#define MVM_GC_DEBUG 1
+#define HASH_DEBUG_ITER 1
+#define MVM_SPESH_CHECK_DU 1
and
+#undef NDEBUG
they might be "helping" :-)
ASAN says nothing exciting. Rebuilding and using valgrind, nothing reported before the assertion fails 12:44
timo oh, assertions don't work if you just --debug=3, you literally need to put a -DNDEBUG somewhere? 12:46
er, undef
Nicholas I put it as the first line of src/moar.h
hangon. I think I'm just being paranoid 12:47
I think that makes sure it's unset (so assertions are enabled) for optimised builds
and ASAN generally is an "optimised" build
(it's too damn slow otherwise, and also you can't use gdb on an ASAN build, so most debugging stuff isn't useful) 12:48
12:48 linkable6 left
timo it seems like i should easily be able to reproduce that assertion failure, but it's not happening 12:50
where do you put the other defines? when i put them at the start of moar.h i get boatloads of redefinition warnings 13:00
Nicholas: i changed the assert that explodes on your end to an if statement with an MVM_panic and it still doesn't explode :| 13:48
13:49 linkable6 joined 13:53 frost left
timo there it is! 14:10
lizmat no worries 14:16
Geth MoarVM: e7ebdae40f | (Timo Paulssen)++ | src/disp/program.c
Make assertion more lenient to match new implementation
timo Nicholas: does the commit fix things? we may want to bump if so 14:42
nine I wonder if I have even a chance of fixing the gethow race condition. So far I haven't even managed to find how the expr JIT compiler deals with let. Let alone why it loads the expression into a register and uses the register in one case, but the original expression in another. 14:57
timo could be a scoring thing 15:00
from the tiles, i mean
16:58 sena_kun joined
nine jnthnwrthngtn: was the unlinked_dispatch thing we talked about yesterday by any chance the inspiration for the new way to handle uninitialized attributes? 17:38
MasterDuke nine: wow, impressive debugging re gethow. i presume you've found github.com/MoarVM/MoarVM/blob/mast...#L176-L209 already? 17:39
nine MasterDuke: thanks. I've found that, but not much about what happens later 17:40
MasterDuke ah
heh. use the rakudo/moarvm debugger + Inline:Perl5 to run the template compiler...maybe too Rube Goldberg 17:42
jnthnwrthngtn nine: No, it goes back a while further than that. 17:43
(Can't remember exactly when during the dispatcher work I thought of it...of course at the time was too busy with new-disp things to really think it through in any detail)
For Scalar attributes it's quite straightforward. For Array and Hash a bit less so
The interesting part is where to wedge it in... 17:45
Can't do it at the point Attribute is created, because the default and required are traits so are added later
Maybe needs to happen when we query it while doing the attribute compose protocol thing 17:46
18:02 reportable6 left 18:04 reportable6 joined
Nicholas timo: yes, that commit to MoarVM fixed the assertion fail and I can build nqp master and rakudo master to spectest completion 19:40
lizmat and yet another Rakudo Weekly hits the Net: rakudoweekly.blog/2021/11/08/2021-...wo-commas/ 20:04
21:52 linkable6 left, evalable6 left 21:54 evalable6 joined, linkable6 joined 22:07 dogbert17 joined 22:09 dogbert11 left
timo who knows why the nativecast sub from nativecall has a getcode + takeclosure in it at the start? 22:20