Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes. Set by lizmat on 24 May 2021. |
|||
00:02
reportable6 left
00:04
reportable6 joined
|
|||
MasterDuke | nine: read breakpoint in rr and see if something is accidentally reading it? | 00:06 | |
00:26
jgaz joined
01:26
jgaz left
02:34
discord-raku-bot left
04:42
unicodable6 left,
quotable6 left,
squashable6 left,
greppable6 left,
bisectable6 left,
nativecallable6 left,
evalable6 left,
committable6 left,
statisfiable6 left,
notable6 left,
tellable6 left,
benchable6 left,
shareable6 left,
sourceable6 left,
bloatable6 left,
releasable6 left,
coverable6 left,
linkable6 left,
reportable6 left
04:43
nativecallable6 joined,
evalable6 joined,
notable6 joined,
coverable6 joined,
statisfiable6 joined,
shareable6 joined
04:44
sourceable6 joined,
tellable6 joined,
squashable6 joined,
linkable6 joined,
committable6 joined
04:45
releasable6 joined,
reportable6 joined,
greppable6 joined,
unicodable6 joined,
benchable6 joined,
bloatable6 joined
04:46
bisectable6 joined,
quotable6 joined
05:17
dogbert11 left
05:36
dogbert11 joined
06:02
reportable6 left
06:04
reportable6 joined
06:23
squashable6 left,
squashable6 joined,
squashable6 left
06:25
squashable6 joined
06:32
squashable6 left
06:35
squashable6 joined
06:36
Merfont left,
Merfont joined
07:53
dogbert17 joined
07:55
dogbert11 left
|
|||
nine | This is getting weirder and weirder: I can give the nested compiler a clone of moar_frames and it's still OK. I can also manually copy all frames from the nested compiler's mast_frames to the outer compiler's after the compilation and it still works! | 08:24 | |
So the contents of mast_frames is exactly the same as with the original code, both during the nested compilation and after it. | 08:25 | ||
That can only mean that it's not about the contents of the hash. It must be about the reference to the hash itself. | |||
08:55
evalable6 left,
linkable6 left
08:56
linkable6 joined
08:57
evalable6 joined
|
|||
nine | Indeed. When I put the original mast_frames hash into a different key _and_ store it in an otherwise completely unused attribute in MASTCompilerInstance it breaks. | 08:58 | |
09:35
discord-raku-bot joined
|
|||
nine | A simple %!mast_frames := nqp::null; in MASTCompilerInstance's to_mast after the actual compilation fixes the problem. So I think there's enough evidence to conclude that somehow a reference to the MASTCompilerInstance makes it into the compiled bytecode. This will drag in a lot of objects (e.g. every closure and objects they reference). | 09:59 | |
Still, this doesn't actually explain why compilation results differ. Even if we reference everything in the world, this world should be exactly the same in each case. | 10:00 | ||
Thinking some more about it, it makes even less sense. After all I have already established that it's not about references to pre-existing outer frames, nor about references to inner frames. And it can hardly be about references to outer frames compiled after the nested compiler finishes, because the bytecode will already have been written by then. | 10:05 | ||
Also it's the outer compiler's bytecode that shows the differences. | |||
jnthnwrthngtn | In a threaded program it could be expected that the proxy readers get produced in different orders. Is the reproducible build test involving threads in the code used to test? | 11:37 | |
Though I think you also said only one gets produced, and so there'd not be an ordering question anyway | |||
One other thought: I compile a QAST::Block, does it get wrapped for me into a QAST::CompUnit at some point, and if so, does anything in that process cause issues? | 11:38 | ||
Those are the only thoughts I've had on it overnight. :) | |||
Yesterday evening I worked on findmethod/tryfindmethod/can all getting switched over to new-disp. findmethod and trymethod go fine; switching can over makes it all the way through the CORE.c.setting build and then mysteriously explodes during setting loading. | 11:40 | ||
nine | jnthnwrthngtn: no threads, except for ones triggered by the react/whenever on the Proc::Async in method precompile. But no actual concurrency going on in the test. | 11:41 | |
Trying to get a better picture of the differences between different compilation runs, I used the strings utility on the bytecode files and looked at the diff between the strings output. It seems like actually the only differences are between those mysterious strings containing only decimal numbers. | 11:43 | ||
That and different offsets, probably just caused by the difference in length of the strings section. | |||
So now I really want to know: what the hell are those numbers? | 11:44 | ||
MasterDuke | pause in gdb/rr and then grep through /proc/<...>/moar ? | 11:49 | |
think it might be related to that regression someone (xliff perhaps?) noticed that seems to cause doubled precomp for some things? | 11:50 | ||
nine | Possible. Those may also have been caused by the regression I fixed yesterday. And there's still the occasional reproducible-builds.t failure I've seen on CI that shows differences in check_routine_sanity's bytecode that I have totally failed to debug | 11:51 | |
Those numbers are part of the serialization context. | 11:53 | ||
Which means that I can at least catch and diagnose them in rr | 11:59 | ||
MasterDuke | are the numbers always the same? | 12:01 | |
12:02
reportable6 left
|
|||
nine | No, they are always different which is where the reproducibility issue comes from. But they seem to be always around 1 billion. Sometimes a little smaller, mostly a little larger. But strings containing 9 digit numbers don't seem to appear elsewhere | 12:02 | |
No, they are always different which is where the reproducibility issue comes from. But they seem to be always around 1 billion. Sometimes a little smaller, mostly a little larger. But strings containing 9 digit numbers don't seem to appear elsewhere | 12:03 | ||
12:03
reportable6 joined
|
|||
nine | These numbers are hash keys | 12:03 | |
MasterDuke | pointers perhaps? what else could be that large? | 12:04 | |
nine | That's exactly what I just thought | ||
Where do we have a hash that is keyed on pointers (or objectids)? | |||
Labels! | 12:05 | ||
MasterDuke | btw, i just pulled all repos up to HEAD and that precomp issue is still there | 12:07 | |
1 RMD: Repo changed: | 12:08 | ||
259C64E6436199E645AFDA4E251CCB6B06645979 | |||
18ADFBD55CA12AE71F64CF2A31B6BE6895AE4EED | |||
Need to re-check dependencies. | |||
1 RMD: Repo chain changed: | |||
259C64E6436199E645AFDA4E251CCB6B06645979 | |||
18ADFBD55CA12AE71F64CF2A31B6BE6895AE4EED | |||
Need to re-check dependencies. | |||
from `RAKUDO_MODULE_DEBUG=1 raku -e 'use Test'` | |||
nine | There's a %!labels lookup hash in MAST::Frame keyed on the nqp::objectid of MAST::Label objects. So the chain probably is: somehow the shared mast_frames hash leads to the MASTCompilerInstance getting references, which pulls in the MAST::Frame objects and this hash and the keys differ thanks to address layout randomization | 12:09 | |
12:11
Merfont is now known as Kaiepi,
Kaiepi left,
Kaiepi joined
|
|||
nine | Replacing nqp::objectid for that labels lookup hash with an id counter (like cuid for QAST::Block) fixes the issue | 12:22 | |
MasterDuke | does it make the precompiled setting bytecode smaller? | 12:23 | |
nine | No. Because we still reference stuff for no good reason | ||
MasterDuke | ha | 12:24 | |
how hard will it be to remove the reference? | |||
nine | Well....usually when the struggle to find a bug is as hard as this one, as soon as you got the first candidate fix, pretty much everything you try just works :D | 12:27 | |
That's just part of the general hostility and unfairness of the universe I guess :D | 12:28 | ||
Because sadly no one's gonna get that reference. It's from here: www.quotes.net/mquote/679153 | |||
MasterDuke | never watched babylon 5. will probably give it a try eventually | 12:31 | |
nine | Good time for it since despite years of claims that it's impossible, an HD remaster came out last winter | 12:32 | |
MasterDuke | oh, nice. i remember hearing people complain about that, didn't realize one had actually happened | 12:38 | |
nine | jnthnwrthngtn: my only guess is that the MASTCompilerInstance is referenced by a closure. Or maybe a capture? Do you have an idea where to look for those? Or can the newdisp way lead to more such references in general? | 12:40 | |
jnthnwrthngtn: my only guess is that the MASTCompilerInstance is referenced by a closure. Or maybe a capture? Do you have an idea where to look for those? Or can the newdisp way lead to more such references in general? | 12:41 | ||
MasterDuke | looks like i need hbo max to stream the hd version (or purchase from amazon or apple) | 12:42 | |
jnthnwrthngtn | nine: I'm struggling to imagine where that would happen. This is the only place that I'm having new-disp do some code generation. | 12:47 | |
nine: In general, any blocks that are handed back as code to invoke by new-disp are placed at unit level, not inside the dispatcher itself. | 12:48 | ||
MasterDuke | nine: does your fix change anything about that example i pasted above? | 12:55 | |
Geth | MoarVM/new-disp: 4fe1efd7e3 | (Stefan Seifert)++ | lib/MAST/Nodes.nqp Fix reproducible build issue caused by changing memory addresses of labels Fix reproducible build issue caused by changing memory addresses of labels When a QASTCompilerInstance is somehow referenced (directly or indirectly) by compiled bytecode, this pulls in references to frames and thus the frames' %!labels lookup hash which was keyed on the nqp::objectid of those labels. Since nqp::objectid uses memory addresses and those may change from run to run, this could lead to differences in build results. Use an incremented id counter instead, just like we do for QAST::Block's cuids. |
13:01 | |
nine | MasterDuke: please just try this ^^^ | ||
jnthnwrthngtn | nine: Does it help the reproducible-builds test also? | 13:16 | |
I think I've figured out the `can` problem. | |||
nine | jnthnwrthngtn: yes, this fixes the test | 13:17 | |
jnthnwrthngtn | Nice, will try it | 13:18 | |
Although sounds like there's still a bit of a mystery around how things are ended up in the precomp file | |||
nine | definitely | ||
13:24
linkable6 left,
evalable6 left,
evalable6 joined,
linkable6 joined
|
|||
jnthnwrthngtn | Yup, confirmed it's no longer failing | 13:26 | |
nine++ | |||
Also my findmethod/can transition do dispatcher builds and causes no `make test` regressions, now to see about spectest (I'm hoping for one more pass) | 13:27 | ||
I think there's now only 3 places that emit an invoke instruction rather than a dispatch instruction | 13:30 | ||
1. simple block invocations in the QAST compiler, 2. p6sink, which will get its own dispatcher written up, and p6bindassert (maybe the same story) | 13:31 | ||
s/and/3./ | 13:32 | ||
Yup, integration/advent2011-day07.t is fixed | 13:33 | ||
Geth | MoarVM/new-disp: 84ec79335f | (Jonathan Worthington)++ | src/disp/boot.c Fix a typo Spotted by MasterDuke++ |
13:37 | |
MasterDuke | Stage parse : 274.575 # ugh | 13:39 | |
jnthnwrthngtn | This is the "learning to appreciate what spesh does for us" phase of things, I guess :) | 13:40 | |
m: say 1321 / 1349 | 13:41 | ||
camelia | ( no output ) | ||
jnthnwrthngtn | ? | ||
m: say 1321 / 1349 | |||
camelia | ( no output ) | ||
nine | I was pretty well aware of spesh's benefits already :D | ||
jnthnwrthngtn | 97.9%, anyway :) | ||
MasterDuke | maybe since desktop video cards are impossible to find i should just replace this laptop instead | ||
13:42
cognominal left
|
|||
MasterDuke | but (eventually) no change in that RAKUDO_MODULE_DEBUG=1 output | 13:42 | |
m: say "hi?" | 13:43 | ||
camelia | ( no output ) | ||
dogbert17 | Hmm, the commit number is over a week old | 13:46 | |
MasterDuke | wasn't it throwing those out of space on device errors a couple days ago? | 13:48 | |
dogbert17 | that could definitely explain things | ||
nine | Oh, indeed: /dev/sda2 7.7G 7.3G 37M 100% / | 13:49 | |
jnthnwrthngtn goes to rest for a bit | 13:50 | ||
nine | Better: /dev/sda2 7.7G 5.6G 1.8G 77% / | 13:53 | |
Geth | MoarVM/new-disp: df9e48e615 | (Stefan Seifert)++ | lib/MAST/Nodes.nqp Fix reproducible build issue caused by changing memory addresses of labels When a QASTCompilerInstance is somehow referenced (directly or indirectly) by compiled bytecode, this pulls in references to frames and thus the frames' %!labels lookup hash which was keyed on the nqp::objectid of those labels. Since nqp::objectid uses memory addresses and those may change from run to run, this could lead to differences in build results. Use an incremented id counter instead, just like we do for QAST::Block's cuids. |
13:55 | |
nine | Fixed the type issue pointed out by zhuomingliang++ | 13:57 | |
dogbert17 | m: say 1321 / 1349 | 13:58 | |
camelia | sudo: /home/camelia/rakudo-m-inst/bin/perl6-m: command not found | ||
nine | may take a bit till the cron job runs and rebuilds | ||
dogbert17 | ah | 13:59 | |
14:09
cognominal joined
|
|||
dogbert17 | m: say "hi?" | 14:18 | |
camelia | hi? | ||
dogbert17 | m: say 1321 / 1349 | ||
camelia | 0.979244 | ||
dogbert17 | m: say "test" | 14:35 | |
camelia | test | ||
dogbert17 | ok, so now we're at the latest version, excellent | ||
14:35
squashable6 left
14:38
squashable6 joined
|
|||
MasterDuke | back to my roast questions/thoughts from yesterday. github.com/Raku/roast/commit/e657f...3bd3413c58 (from 7 years ago) changed an eval_dies_ok to a throws_like | 14:42 | |
Nicholas | Has commit 84ec79335f8dfd4c11ae251315f4f7eea995416b (temporarily) been lost from new-disp? | ||
MasterDuke | i'm going to assume lizmat just used the exception type it was throwing at the time (which wasn't a terrible exception) | 14:43 | |
but i think the type being thrown after my change is better | 14:44 | ||
and i doubt there will be any user code at all depending on the exact exception type thrown from a malformed embedded comment | 14:45 | ||
so i'm going to PR this and advocate for a roast change to go along with it | 14:46 | ||
Nicholas: maybe when nine force pushed df9e48e615? | 14:47 | ||
Nicholas | I assuume that that was the cause | 14:48 | |
Geth | MoarVM/new-disp: 7faf157570 | (Jonathan Worthington)++ (committed by Nicholas Clark) | src/disp/boot.c Fix a typo Spotted by MasterDuke++ |
14:55 | |
Nicholas | jnthnwrthngtn: beware of revisionist historians | ||
nine | Oh, sorry for my inattention | 15:23 | |
dogbert17 | heh, what does this mean? | 17:19 | |
===SORRY!=== | |||
Can only use manipulate a capture known in this dispatch | |||
I get it when running t/spec/S02-types/array.t with an 8k nursery | 17:25 | ||
timo | that looks like some dispatcher code is wrong and is calling a manipulation op on a value it hasn't guarded on or something like that | 17:36 | |
do dispatchers show up in backtraces? if so, it should lead you to the right spot | 17:37 | ||
18:02
reportable6 left
18:04
reportable6 joined
|
|||
dogbert17 | timo: I'll try it in gdb | 18:20 | |
running with MVM_GC_DEBUG=3 is instant SEGV | 18:44 | ||
Thread 1 "moar" received signal SIGSEGV, Segmentation fault. | 18:45 | ||
0x00007ffff77cf9ad in MVM_args_slurpy_positional (tc=0x55555555aea0, ctx=0x55555555b220, pos=2) at src/core/args.c:1081 | |||
1081 find_pos_arg(ctx, pos, arg_info); | |||
(gdb) bt | |||
#0 0x00007ffff77cf9ad in MVM_args_slurpy_positional (tc=0x55555555aea0, ctx=0x55555555b220, pos=2) at src/core/args.c:1081 | |||
#1 0x00007ffff77e169f in MVM_interp_run (tc=0x55555555aea0, initial_invoke=0x7ffff79bf2a0 <toplevel_initial_invoke>, invoke_data=0x5555555b8220, outer_runloop=0x0) at src/core/interp.c:1277 | |||
#2 0x00007ffff79bf358 in run_deserialization_frame (tc=0x55555555aea0, cu=0x5555555b7d20) at src/moar.c:492 | |||
timo: using MVM_GC_DEBUG=1 instead returns the original error, i.e. SORRY etc: gist.github.com/dogbert17/516531bb...68ef434ade | 18:50 | ||
timo | can you call MVM_dump_backtrace? | 18:52 | |
dogbert17 | timo: gist updated | 19:01 | |
timo | oh i guess the program in question is boot_code and it's in the C backtrace, d'oh | 19:02 | |
hm, which of the functions in boot_code in disp/boot.c can cause allocations i wonder | 19:04 | ||
with RR it might be easy to just step backwards and see where it happens | 19:05 | ||
dogbert17 | I was waiting for the rr comment :) | 19:09 | |
timo | haha | ||
dogbert17 | something for Nine or MasterDuke then since I still can't use rr | ||
VirtualBox and rr are not friends | 19:10 | ||
timo | would you like to try putting an MVMROOT for args_capture in the code? | ||
19:32
linkable6 left,
linkable6 joined
|
|||
dogbert17 | any lines in particular? | 19:40 | |
timo | 97 through the end of that block i assume | 19:41 | |
dogbert17 | I'm confused, as always, which file are you referring to? | 19:42 | |
timo | src/disp/boot.c | 19:44 | |
dogbert17 | ah, I see, let me try | 19:45 | |
it actually seems to work, timo++ | 19:48 | ||
timo | nobody knows if that fixes everything :) | 19:50 | |
dogbert17 | I'm running a spectest now, slow as h*ell given the tiny nursery | 19:52 | |
jnthnwrthngtn | Tiny nursery + no spesh = painfully slow, I imagine | 20:08 | |
timo++ dogbert17++ for fixing stuff | 20:09 | ||
timo | jnthnwrthngtn: you think the "compile disp programs to ops" branch needs a lot of changes to apply today? | 20:10 | |
jnthnwrthngtn | Dunno without looking at it (which I can't usefully do right now, since dose 2 may have made my arm less sore, but it's doing just as good at fatigue...) | 20:17 | |
Nudge me on Monday or so, I'll take a look. Although I think probably getting us setting params and correlation ID is a pre-req to it being useful. | |||
'cus we're probably just a few commits away from nothing going through the invoke code path at this point | 20:18 | ||
timo | ah, yes, i seem to recall something about that. i think i also did that in that branch | 20:19 | |
i should look at it myself first, it's been a whole year | |||
jnthnwrthngtn | I think we're probably about at the point where we have few enough failures that it's worth starting to reactivate spesh to get better build times to figure out the remaining stuff | 20:20 | |
I'm a bit annoyed because some night a couple of weeks ago as I was falling asleep I figured out a really neat way to do callwith/nextwith, and told myself to just sleep, I'd remember it... I can't remember what it was. :/ | 20:21 | ||
timo | if i recall correctly, the branch doesn't actually compile a single program successfully :D | ||
some stuff that everything uses is still missing or so? | 20:22 | ||
jnthnwrthngtn | I'd probably start from guards and value results | ||
Nicholas | It was all a dream :-( | ||
more pragmatically, is there a workable way to do it, until you rediscover the cool way? | |||
jnthnwrthngtn | And once those are working, add in the invokey bits | 20:23 | |
nine | If it was just a dream after all, there's no point in being annoyed :) | ||
jnthnwrthngtn | I suspect I'll figure out a decent way to do it :) | ||
The code for specializing invocations will get a huge amount simpler | 20:24 | ||
20:24
MasterDuke left
|
|||
jnthnwrthngtn | Because invoke spec goes away and multi caches go away | 20:25 | |
Inlining code gets simpler because takedispatcher and friends go away | |||
my `d` key is going to get some wear :) | 20:26 | ||
timo | gist.github.com/timo/eb3a20abba083...3f44eafcf0 is the diff between the then state of new-disp and the disp-spesh-codegen branch | 20:29 | |
(now in color) | 20:36 | ||
dogbert17 | timo: looks a lot better | 20:47 | |
timo | amazing what a splash of color can do | 20:48 | |
dogbert17 | but it didn't fix everything though | ||
MoarVM panic: Collectable 0x557051db04a8 in fromspace accessed | 20:49 | ||
the insta-SEGV is still there as well, should be something for Nine :) | 20:50 | ||
23:17
jgaz joined
|