Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes. Set by lizmat on 24 May 2021. |
|||
00:03
reportable6 left
00:04
reportable6 joined
00:44
frost joined
02:28
vrurg left,
vrurg joined
02:40
SmokeMachine left,
leont left,
SmokeMachine joined
02:41
leont joined
04:03
SmokeMachine left,
SmokeMachine joined
04:04
leont left
04:21
leont joined
04:58
squashable6 left
06:02
reportable6 left
06:04
gfldex left,
AlexDaniel left
06:05
gfldex joined,
AlexDaniel joined
06:12
codesections left
06:13
codesections joined
06:32
squashable6 joined
|
|||
Nicholas | good *, #moarvm | 06:42 | |
japhb | Good *, Nicholas :-) | 06:52 | |
07:03
reportable6 joined
08:12
frost left
|
|||
nine | This is...unbelievable. I already had the suspicion that the expr JIT version of gethow is somehow involved in the gh_1202.t segfaults. So from the point of the segfault I painstakingly reverse-stepied through a bunch of JIT compiled ops to the gethow code. The relevant part is: | 09:14 | |
│ 0x7f91c7e020ef mov rdx,QWORD PTR [rcx+0x70] │ | |||
│ 0x7f91c7e020f3 cmp QWORD PTR [rcx+0x70],0x0 │ | |||
│ 0x7f91c7e020f8 je 0x7f91c7e020fc | |||
rcx+0x70 is STABLE(obj)->HOW which we store in rdx. Then we compare STABLE(obj)->HOW with 0 and if it's 0 jump to a call of MVM_6model_get_how which will should VMNull. But between the mov rdx,QWORD PTR [rcx+0x70] and cmp QWORD PTR [rcx+0x70],0x0 STABLE(obj)->HOW suddenly gets a value. | 09:16 | ||
A change that gdb does not even catch, despite a watch -l on that location being in place. And AFAICT no other thread is in any code that could cause this. | |||
The value is sensible though. It's Perl6::Metamodel::ClassHOW which is exactly what you'd expect | 09:17 | ||
The expr jit template for gethow looks like: (template: gethow (let: (($how (^getf (^stable $1) MVMSTable HOW))) (if (nz $how) $how (...))))) so you'd think that the comparison would use rdx instead of reading that memory location again. But it doesn't, making this implementation thread-unsafe. | 09:19 | ||
nine summons the brrt for ^^^ | 09:20 | ||
The window of opportunity for this bug being just a single CPU instruction certainly explains why it needs ten thousands of runs to reproduce | 09:21 | ||
Nicholas | .tell brrt good *, brrt | 09:35 | |
tellable6 | Nicholas, I'll pass your message to brrt | ||
Nicholas | I think there's one line missing from your "explanation for mortals". Which is: | 09:39 | |
so, if I have this right | 09:40 | ||
1) we load the value from memory into a regiter | |||
2) we have a conditional branch based on the "value" (if the value is NULL branch somewhere else), but that value we check comes from a second load of the same memory location | 09:41 | ||
3) so we don't take the branch | |||
4) and then we (attempt to) call the value in the register. Which is 0. So boom. | |||
? | |||
nine | Almost. The rdx register, which we first load the value into is used as the return value of the gethow op, i.e. it will be written into a VM register and the explosion will happen when we try to decont the value in that register. The gethow op should return a VMNull pointer instead of the NULL. But since the cmp that decides whether that is necessary does find a value, we never put VMNull into rdx. | 09:44 | |
Nicholas | aha right thanks | ||
this is "how the heck do we get a C NULL pointer in a register at all?" bugs? (where the "right" answer was a C pointer to the VMNull singleton) ? | 09:45 | ||
nine | yes | ||
jnthnwrthngtn | moarning o/ | 10:09 | |
Nicholas | \o | 10:10 | |
Geth | MoarVM: 8aca849f90 | (Stefan Seifert)++ | 3 files Remove counter productive NULL checks in decont ops Object registers must never contain NULL values, so there should be no reason to check for NULL values when deconting. Historically bugs in other parts of the code have caused NULL values to appear in registers and these checks have worked around these bugs. But in truth they just hide them and make it more difficult to debug, so better to remove them. This might cost some stability in the short term, but should enable us to fix the underlying bugs for good. |
10:11 | |
MoarVM: 8b69d8e26c | (Jonathan Worthington)++ (committed using GitHub Web editor) | 3 files Merge pull request #1593 from MoarVM/remove_counterproductive_null_checks Remove counter productive NULL checks in decont ops |
|||
lizmat | feels like we should unleash this short term potential instability on the world with a MoarVM bump ? | 10:16 | |
jnthnwrthngtn | Prolly :) | 11:06 | |
lizmat | on it | 11:07 | |
jnthnwrthngtn | Oh wait | ||
Hold a moment | |||
lizmat waits | |||
Geth | MoarVM/master: 5 commits pushed by (Timo Paulssen)++, (Jonathan Worthington)++ | 11:08 | |
jnthnwrthngtn | OK, go for it | ||
lizmat | oki | 11:09 | |
timo | hooray | 11:13 | |
could merge the nqp and rakudo changes before bumping rakudo as well | 11:15 | ||
for the replace-arg and replace-arg-literal-obj syscalls i mean | 11:16 | ||
lizmat | ah... saw too late | 11:28 | |
which PRs are we talking about ? | |||
timo | github.com/Raku/nqp/pull/745 | 11:32 | |
github.com/rakudo/rakudo/pull/4604 | 11:33 | ||
lizmat | ok will do in a mo | ||
11:35
evalable6 left,
linkable6 left
11:36
evalable6 joined
|
|||
timo | not sure if there's a "re-run all checks" button so i can have it tested against latest master + the pull request on github | 11:38 | |
11:39
frost joined
|
|||
lizmat | timo: github.com/rakudo/rakudo/pull/4604 has conflicts ? | 11:40 | |
timo | ah, i'll have to check that out | 11:41 | |
lizmat | I have merged the NQP one | ||
timo | yes, it does need an actual change | 11:42 | |
lizmat | maybe I should first bump Rakudo | ||
? | |||
timo | didn't you already? | 11:47 | |
ah rakudo build segfaults with that change :) :) | 11:51 | ||
lizmat | ah, oops? | 11:57 | |
timo | oops indeed | ||
lizmat | so, wait for a fix? | ||
or revert ? | |||
timo | revert please | ||
12:02
reportable6 left
12:03
reportable6 joined
|
|||
Nicholas | timo: nqp master (now) fails an assertion during build: | 12:37 | |
+++ Compiling gen/moar/stage2/nqpmo.moarvm | |||
moar: src/disp/program.c:2267: emit_args_ops: Assertion `p.path[i]->transformation == MVMDispProgramRecordingDrop' failed. | |||
Aborted | |||
12:37
linkable6 joined
|
|||
Nicholas | that might give an idea about what the problem is | 12:38 | |
timo | wait what | ||
Nicholas | MVM_SPESH_NODELAY=1 MVM_SPESH_BLOCKING=1 | ||
+#define FSA_SIZE_DEBUG 1 | 12:39 | ||
+#define MVM_GC_DEBUG 1 | |||
+#define HASH_DEBUG_ITER 1 | |||
+#define MVM_SPESH_CHECK_DU 1 | |||
and | |||
+#undef NDEBUG | |||
they might be "helping" :-) | |||
ASAN says nothing exciting. Rebuilding and using valgrind, nothing reported before the assertion fails | 12:44 | ||
timo | oh, assertions don't work if you just --debug=3, you literally need to put a -DNDEBUG somewhere? | 12:46 | |
er, undef | |||
Nicholas | I put it as the first line of src/moar.h | ||
hangon. I think I'm just being paranoid | 12:47 | ||
I think that makes sure it's unset (so assertions are enabled) for optimised builds | |||
and ASAN generally is an "optimised" build | |||
(it's too damn slow otherwise, and also you can't use gdb on an ASAN build, so most debugging stuff isn't useful) | 12:48 | ||
12:48
linkable6 left
|
|||
timo | it seems like i should easily be able to reproduce that assertion failure, but it's not happening | 12:50 | |
where do you put the other defines? when i put them at the start of moar.h i get boatloads of redefinition warnings | 13:00 | ||
Nicholas: i changed the assert that explodes on your end to an if statement with an MVM_panic and it still doesn't explode :| | 13:48 | ||
13:49
linkable6 joined
13:53
frost left
|
|||
timo | there it is! | 14:10 | |
lizmat | no worries | 14:16 | |
Geth | MoarVM: e7ebdae40f | (Timo Paulssen)++ | src/disp/program.c Make assertion more lenient to match new implementation |
||
timo | Nicholas: does the commit fix things? we may want to bump if so | 14:42 | |
nine | I wonder if I have even a chance of fixing the gethow race condition. So far I haven't even managed to find how the expr JIT compiler deals with let. Let alone why it loads the expression into a register and uses the register in one case, but the original expression in another. | 14:57 | |
timo | could be a scoring thing | 15:00 | |
from the tiles, i mean | |||
16:58
sena_kun joined
|
|||
nine | jnthnwrthngtn: was the unlinked_dispatch thing we talked about yesterday by any chance the inspiration for the new way to handle uninitialized attributes? | 17:38 | |
MasterDuke | nine: wow, impressive debugging re gethow. i presume you've found github.com/MoarVM/MoarVM/blob/mast...#L176-L209 already? | 17:39 | |
nine | MasterDuke: thanks. I've found that, but not much about what happens later | 17:40 | |
MasterDuke | ah | ||
heh. use the rakudo/moarvm debugger + Inline:Perl5 to run the template compiler...maybe too Rube Goldberg | 17:42 | ||
jnthnwrthngtn | nine: No, it goes back a while further than that. | 17:43 | |
(Can't remember exactly when during the dispatcher work I thought of it...of course at the time was too busy with new-disp things to really think it through in any detail) | |||
For Scalar attributes it's quite straightforward. For Array and Hash a bit less so | |||
The interesting part is where to wedge it in... | 17:45 | ||
Can't do it at the point Attribute is created, because the default and required are traits so are added later | |||
Maybe needs to happen when we query it while doing the attribute compose protocol thing | 17:46 | ||
18:02
reportable6 left
18:04
reportable6 joined
|
|||
Nicholas | timo: yes, that commit to MoarVM fixed the assertion fail and I can build nqp master and rakudo master to spectest completion | 19:40 | |
lizmat | and yet another Rakudo Weekly hits the Net: rakudoweekly.blog/2021/11/08/2021-...wo-commas/ | 20:04 | |
21:52
linkable6 left,
evalable6 left
21:54
evalable6 joined,
linkable6 joined
22:07
dogbert17 joined
22:09
dogbert11 left
|
|||
timo | who knows why the nativecast sub from nativecall has a getcode + takeclosure in it at the start? | 22:20 |