Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes. Set by lizmat on 24 May 2021. |
|||
timo | sounds like we'd want a syscall for determining if an exception is resumable without actually resuming it, yeah | 01:00 | |
01:11
kjp left
01:13
kjp joined
01:14
kjp left,
kjp joined
|
|||
MasterDuke | timo: i checked and @fates where that patch is never has more than 5 elems. and switching to using last instead of $EMPTY isn't any faster (in a microbenchmark at least) | 01:48 | |
but then again, i just noticed and why is it using pop? just indexing might be a very tiny bit faster | |||
huh. callgrind reports that `nqp -e 'my $a; my $i := 0; while $i++ < 500_000 { my $b := nqp::list_i(0, 1, 2, 3, 4); my int $j := nqp::elems($b); while $j-- >= 0 { $a := nqp::atpos_i($b, $j); last if $a == 1 } }; say($a)'` is ~929m instructions, but `nqp -e 'my $a; my $i := 0; while $i++ < 500_000 { my $b := nqp::list_i(0, 1, 2, 3, 4); while nqp::elems($b) { $a := nqp::pop_i($b); last if $a == 1 } }; say($a)'` is only ~870m instructions (both with MVM_SPESH_BLOCKING=1) | 02:03 | |
wasn't expecting that | |||
timo | in the spesh log, does one have noticeably larger jit bytecode? is there roughly the same difference with jit disabled? | 02:39 | |
MasterDuke | with jit disabled, ~1.24b for pop version and ~1.44b for indexing version | 02:44 | |
81716 total jit bytecode size for pop version and 82314 total jit bytecode size for indexing version | 02:49 | ||
timo | i do see some things that aren't optimal | ||
so, i'm looking at the spesh log for the version that uses pop right now. r9 is the register that has the inner array in it, so $b | 02:51 | ||
going into the loop we have r9 version 4 with its known type | 02:52 | ||
in the first block where the jump back meets up with the step into the loop we merge register versions 4 and 6 together into version 5, and we don't know what the type of version 6 is, so the resulting facts for the new version 5 is "dunno" | 02:53 | ||
the next block we merge versions 0, 5, and 6 together into version 6, and version 0 i'm not sure where it comes from actually. it has nothing known about it at all, and there's no writer to it anywhere either | 02:54 | ||
MasterDuke | ha | ||
timo | so i'm not sure if this is "the" problem, but the version 0 of register 9 is definitely not helpful here | ||
so inside the loop all our reprops, which are elems and pop_i, are not even devirtualized | 02:55 | ||
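The merge behaviour timo describes can be pictured with a toy model. This is only a sketch under assumptions: `Facts`, `merge_facts` and the type name `IntArray` are hypothetical names for illustration, not MoarVM's spesh data structures, and the real fact merging tracks much more than a type.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Facts:
    """What the optimizer knows about one SSA version of a register."""
    known_type: Optional[str] = None  # None stands for "dunno"


def merge_facts(*inputs: Facts) -> Facts:
    """Facts for a PHI result: keep the type only if every input agrees on it."""
    types = {f.known_type for f in inputs}
    if len(types) == 1 and None not in types:
        return Facts(types.pop())
    return Facts(None)


# Hypothetical stand-ins for the versions of r9 discussed above.
r9_v0 = Facts()            # prelude version: no writer, nothing known
r9_v4 = Facts("IntArray")  # known type going into the loop
r9_v6 = Facts()            # back-edge version, not yet known when v5 is built

r9_v5 = merge_facts(r9_v4, r9_v6)
print(r9_v5.known_type)    # None
```

In this model a single input with nothing known about it (such as version 0) is enough to erase the type, which would leave ops like elems and pop_i without the information needed to devirtualize them.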
on the other version we have the inner array in register 9 as well, funnily enough | 02:57 | ||
MasterDuke | so the pop version is not well optimized, but it's still faster... | 02:58 | |
timo | we go into the loop with version 4 (known type), version 5, and version 6. version 6 is created later from merging together versions 0, 5, and 6, and here's the odd version 0 of the register coming in again | ||
if this were fixed properly, both versions would benefit at least a little bit | |||
one more thing about the version with the counter is that we're boxing the integer and not skipping the unbox, so even though we optimize the unbox_i into a low-level sp_get_i64, the overhead from having to box integers is still there | 02:59 | ||
i imagine it hits the int cache every time, though | |||
MasterDuke | huh. the difference is smaller if both use list_n, but the total instructions for each increases | 03:04 | |
timo | do we maybe lack some optimization for num arrays? | 03:05 | |
like, somewhere someone just didn't bother to write out the case for floats and doubles? | |||
MasterDuke | if only there was a utility to easily diff spesh logs... | 03:08 | |
_n version spends a bunch of time in coerce_boxed_num_to_int_impl | 03:11 | ||
huh. even if i change it to `== 1.0` | 03:12 | ||
timo | gist.github.com/timo/8376214a9b7bb...fae03fe9ce take this early version of comparify_jits and run with it if you like, i have to go to bed | 03:13 | |
be aware it will fill your working directory with many files | 03:14 | ||
MasterDuke | ah, `my num $a;` plus using num literals drops it back down to the same as the _i version | 03:15 | |
i'm off to bed too, i'll check it out tomorrow | 03:16 | ||
timo | ok good knight | ||
MasterDuke | later | ||
timo | the most likely explanation i have for the version 0 of the register coming into play is that BB 0 becomes a predecessor of every block that is part of a frame handler, which is necessary for some reason. and in BB 0, which is before every actual op, of course all registers have version 0 | 03:17 | |
not sure if we rely upon this behaviour to prevent us from optimizing stuff we can't prove stuff about | 03:18 | ||
08:24
MasterDuke left
|
|||
jnthn | iirc, it was to avoid having to initialize object registers to VMNull; write a prelude that nulls every register, and then all those that are initialized reliably before read leave the nulling out instructions with zero usages, and they're thus removed as part of the usual dead instruction elimination | 09:03 | |
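A rough sketch of the idea jnthn recalls, assuming straight-line code only (no branches or PHIs). The function name `eliminate_dead_nulls` and the tuple-based instruction format are made up for illustration and are not how MoarVM represents instructions.

```python
def eliminate_dead_nulls(num_regs, body):
    """Drop prelude 'null' instructions whose value (version 0) is never read.

    body is a list of ("write", reg) or ("read", reg) tuples in execution order.
    """
    version = [0] * num_regs       # current SSA version of each register
    prelude_uses = [0] * num_regs  # reads that still see the prelude's version 0
    for kind, reg in body:
        if kind == "read":
            if version[reg] == 0:
                prelude_uses[reg] += 1
        elif kind == "write":
            version[reg] += 1
        else:
            raise ValueError(f"unknown instruction kind: {kind!r}")
    # Keep only the nulling instructions that something actually depends on.
    prelude = [("null", reg) for reg in range(num_regs) if prelude_uses[reg] > 0]
    return prelude + list(body)


# r0 is written before it is read, r1 is read without ever being written.
program = [("write", 0), ("read", 0), ("read", 1)]
print(eliminate_dead_nulls(2, program))
```

Here the null for r0 disappears because r0 is reliably initialized before its first read, while the null for r1 stays because a read can still observe version 0.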
lizmat | I have a fairly reproducible case of Moar crashing using App::Rak | 09:45 | |
I've collected a few of the lldb outputs in: gist.github.com/lizmat/8f8c3413b5f...9c85c72b9d | 09:51 | ||
hope they'll make sense to someone other than: memory corrupted | |||
I have a hunch this is somehow related to the ConcBlockingQueue repr | 09:55 | ||
ParaSeq uses quite a few of them... and it feels less stable than .hyper | |||
which only uses one of them, if I remember correctly | 09:56 | ||
timo | that just looks like memory corruption to me :( | 10:48 | |
lizmat | also the almost last one? if (REPR(new_addr_obj)->gc_mark) ? | 10:49 | |
timo | yeah, that'd likely be the st pointer being wrong | 10:51 | |
since REPR(o) is like o->header.st->repr or something like that | |||
lizmat | check | ||
timo | could also be the STable has been corrupted | 10:52 | |
oh actually | |||
it says the bad address is 0x10 | |||
so that's an offset to a null pointer | |||
that offset fits the st pointer in MVMObject | 10:54 | ||
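The arithmetic behind "bad address 0x10 is an offset from a null pointer" can be checked with a stand-in layout. The field names and sizes below are assumptions for a 64-bit build, not copied from MoarVM's headers: a 16-byte collectable header followed by the st pointer.

```python
import ctypes


class Collectable(ctypes.Structure):
    # Assumed 16-byte header; the names are illustrative only.
    _fields_ = [
        ("forward_or_sc", ctypes.c_uint64),  # stands in for a pointer-sized union
        ("owner", ctypes.c_uint32),
        ("flags1", ctypes.c_uint8),
        ("flags2", ctypes.c_uint8),
        ("size", ctypes.c_uint16),
    ]


class Object(ctypes.Structure):
    _fields_ = [
        ("header", Collectable),
        ("st", ctypes.c_void_p),  # the STable pointer follows the header
    ]


def fault_address(base, field):
    """Address touched when reading `field` through an object pointer `base`."""
    return base + getattr(Object, field).offset


print(hex(fault_address(0, "st")))  # 0x10
```

With this layout, reading st through a null object pointer touches address 0x10, which matches the reported bad address.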
have you looked at the crossthreadwritelog? i think it's very noisy and i haven't used it often, so not sure how useful it will be to you | |||
it might even be bit rotted a little bit and need some more recent additions ported to it | 10:55 | ||
lizmat | no I haven't, how do I activate that ? | 11:29 | |
timo | it's an env var | 11:51 | |
MVM_CROSS_THREAD_WRITE_LOG Log unprotected cross-thread object writes to stderr | |||
lizmat tries | 11:52 | ||
so ideally that should be empty, right? | 11:53 | ||
a lot of: Thread 7 bound to an attribute of an object (ThreadPoolScheduler::GeneralWorker) allocated by thread 1 | 11:54 | ||
this is updating stats of total number of runs done | 11:58 | |
which is a native int attribute | |||
that was $!stats, similar for $!working | 11:59 | ||
Thread 7 bound to an attribute of an object (IO::Handle) allocated by thread 1 | 12:00 | ||
is more interesting maybe | |||
resetting either $!PIO or $!decoder | 12:01 | ||
timo | there are valid reasons to write across threads, so it's expected for it to not be empty | ||
lizmat | so, is a cross-thread write to a native attribute safe or not ? | ||
in the case of $!total, it feels it should be an atomic if we want to trust the stats | 12:02 | ||
also $!times-nothing-completed appears to be needing to be atomic | 12:03 | ||
Thread 1 bound to a hash key of an object (BOOTHash) allocated by thread 4 | 12:05 | ||
at src/Raku/ast/compunit.rakumod:421 | |||
if !$!precompilation-mode | 12:07 | ||
&& !$*INSIDE-EVAL | |||
&& +(@*MODULES // []) == 0 | |||
&& (my $main := self.find-lexical('&MAIN')) | |||
I'm not seeing a BOOTHash there? | |||
nine ?? | |||
ah, probably $!live-decl-map{$name} // $!scope.find-generated-lexical($name) // Nil in resolver.rakumod:1271 | 12:09 | ||
hmmm no, that wouldn't bind ? | |||
it's basically from RakuAST::Node.EVAL | 12:11 | ||
Thread 4 bound to an attribute of an object (Channel) allocated by thread 1 | 12:12 | ||
at SETTING::src/core.c/Channel.rakumod:281 | |||
nine | Maybe @*MODULES auto-vivifies in a hash where we store dynamic variables? | ||
If those natives should be atomic depends on whether we prefer correctness over speed. If it's just statistical analysis feeding a heuristic decision we may be able to live with losing a few counts in races in favor of less book keeping overhead | 12:13 | ||
lizmat | indeed... but are we sure that cross-thread writes to native int attributes are safe? | 12:14 | |
the Channel one is about closing the channel | |||
Thread 4 bound to an attribute of an object (Rakudo::Internals::SupplySequencer) allocated by thread 1 | 12:16 | ||
also a write to a native int | |||
attribute | |||
Thread 4 bound to an attribute of an object (Rakudo::Internals::SupplySequencer) allocated by thread 1 | 12:19 | ||
at SETTING::src/core.c/Rakudo/Internals.rakumod:681 | |||
feels like an odd one, as the line in question is: | |||
if $!buffer-start-seq == $!done-target { | |||
&!on-completed(); | |||
} | |||
both native ints, &!on-completed obviously not... | 12:20 | ||
but where does it bind there? | |||
another interesting one: | 12:21 | ||
Thread 4 shifted an object (BOOTArray) allocated by thread 1 | |||
at SETTING::src/core.c/Rakudo/Internals.rakumod:678 | |||
&!on-data-ready(nqp::shift($!buffer)) | |||
nine | Well if in doubt switching it to an atomic is certainly preferable. Especially when the failure mode might be a deadlock or crash. | |
lizmat | that last shift appears to be safe, as only executed inside a protect block | 12:22 | |
ok, that's it for this log | 12:24 | ||
ok, so: raku -I. bin/rak lizmat -i --count-only --matches-only ../../REA/META.json | 12:29 | ||
crashes for me roughly 20% of the time | |||
either zsh: segmentation fault or MoarVM panic: Internal error: zeroed target thread ID in work pass | 12:30 | ||
with MVM_CROSS_THREAD_WRITE_LOG=1 I can not get it to crash | |||
I guess Heisenberg applies :-( | |||
timo | lizmat: sorry i was busy workworkworking all day | 16:48 | |
interesting, i'm zef installing --deps-only . in App-Rak and i'm getting two versions of ParaSeq installed | 16:54 | ||
i guess i also want rak cloned locally to futz around with it if i need to | 16:56 | ||
17:01
sena_kun joined
|
|||
lizmat | interesting... hmmm | 17:01 | |
ah... I forgot to update JSON::Fast::Hyper... | 17:02 | ||
timo | i was looking for a way to have dependencies of a module shown, maybe even as a tree, but i can't get anything out of "zef depends" other than "Failed to resolve some missing dependencies" (with no details what exactly) | ||
zef rdepends on 'App::Rak:ver<0.3.12>:auth<zef:lizmat>' gives me a few App::Rak::Complete entries at least, after like 40 seconds of full cpu usage %) | |||
lizmat | the raku.land page is very informative | ||
[Coke] | timo: raku.land/zef:coke/App::Zef-Deps ? | 17:08 | |
timo | neat | ||
[Coke] | (which is just a front end for some zef stuff) | ||
timo | yes that's perfect | ||
[Coke] | oh good | 17:17 | |
timo | oh | 17:19 | |
> Failed to resolve some missing dependencies (use e.g. --exclude="git" to skip) | |||
this is the error message i was getting | |||
i thought the "git" was just an example | |||
now i was able to see for myself that yes, ParaSeq 0.2.6+ is in rak, and ParaSeq:ver<0.2.5> is in JSON::Fast::Hyper | 17:20 | ||
[Coke] | which should *theoretically* work, yes? | 17:29 | |
timo | yeah it should be no problem, and i don't think it's causing any trouble | 17:52 | |
got a little zef ticket out of it, just a minor thing though | 17:57 | ||
i can't give it a label like "low priority" or so, i hope it's not annoying | |||
wow, REA big huh | |||
all the individual files in it are .tar.gz, i wonder if mumble-mumble not compressing them on that level will mumble-mumble give better results because different versions of the same module can be compressed as a whole when git makes its packfiles | 17:59 | ||
lizmat: why can you use -i and i get a message that i'm supposed to use --ignorecase instead? | 18:01 | ||
but i do get a segfault, so that's good progress | |||
> Invalid value '1' for :degree on method hyper. | 18:02 | ||
huh | |||
i also get alternatingly 1072 matches and 1071 matches in 1 file, hehe. | 18:04 | ||
so yeah something smells an awful lot like unsafe concurrency being used | 18:05 | ||
is rak doing any automatic sensing of what degree it should use for hypers? i'm not sure why i have to run it with --degree=2 or higher if i want it to not error out with the "invalid value for :degree" error when recording with rr | 18:12 | ||
actually i can pass --degree=1 just fine in the commandline and not get the error | |||
--num-cores=N pretend to have N cores (rr will still | 18:24 | ||
^- this might be required then | |||
lizmat | rak takes cpucores - 1 by default | 18:25 | |
timo | what gives me cpucores? | ||
lizmat | re 1072/1071 yeah, that's still something I don't understand, as the counting should be done in a threadsafe manner | 18:26 | |
timo | do the splits not overlap or something? | ||
lizmat | they shouldn't | ||
it's a JSON::Fast::Hyper file | |||
timo | ok | 18:27 | |
lizmat | each top-level JSON is a single line | ||
timo | Floating point exception (core dumped) | 18:30 | |
whew | |||
i even get 1070 results | 18:31 | ||
well, it could be memory corruption messing with stuff | |||
lizmat | looks like 1071 is the current correct number | 18:32 | |
fg | |||
timo | ok, --num-cores=10 gives me 10 for Kernel.cpu-cores, so that part works at least. rak is still not crashing under "rr record" | 18:35 | |
lizmat | well, that's the thing: also with MVM_CROSS_THREAD_WRITE_LOG=1 I could not get it to crash | ||
so it feels very timing dependent | 18:36 | ||
timo | yeah, piping the stdout to something seems to cause a change as well | ||
the -h flag to rr randomizes scheduling decisions, so that should theoretically do something as well | |||
cool, turning spesh off keeps the crashyness | 18:38 | ||
one less system to worry about | |||
also, it's probably relevant that the crossthreadwritelog writes to standard error which usually means stdio.h which has some locking going on | 18:51 | ||
and i imagine the writing out of the stack trace is also not very cheap | 18:52 | ||
haha yeah 35% of time spent in fprintf, woof | 18:53 | ||
i also got 1073 this time while using "perf record" :D | 18:54 | ||
lizmat | that I find weird: I can create a mental model of missing increments, but not of additional increments | 18:55 | |
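The missing-increment model lizmat mentions can be made concrete with a deterministic simulation. This is a sketch, not a reproduction of the actual bug: `run` and the schedule format are hypothetical, and each "thread" does a non-atomic increment as a separate load and store.

```python
def run(schedule):
    """Replay (thread, step) pairs against one shared counter.

    Each increment is split into a "load" and a later "store" of load + 1,
    which is what a non-atomic increment amounts to.
    """
    counter = 0
    loaded = {}  # value each thread last read from the counter
    for thread, step in schedule:
        if step == "load":
            loaded[thread] = counter
        elif step == "store":
            counter = loaded[thread] + 1
        else:
            raise ValueError(f"unknown step: {step!r}")
    return counter


serial = [("A", "load"), ("A", "store"), ("B", "load"), ("B", "store")]
racy = [("A", "load"), ("B", "load"), ("A", "store"), ("B", "store")]

print(run(serial))  # 2
print(run(racy))    # 1
```

In the racy schedule both threads read 0 and both write 1, so one increment is lost. No interleaving of these steps can end above 2 in this model, which is why a count that is too high suggests something beyond a plain lost update.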
timo | have you gotten it to apparently infinite-loop yet? | ||
lizmat | yes, once today | ||
timo | wow those stacks be messed *up* | 18:56 | |
lizmat | timo: I think I got a standalone version of something that does weird things | 19:27 | |
timo | cool, that might help | 19:28 | |
lizmat | gist.github.com/lizmat/d0f1eb60e77...91ba670daa | ||
all is fine *until* the / lizmat / is introduced | |||
.contains("lizmat") is ok | 19:29 | ||
so looks like something regexy going off the reails | |||
*rails | |||
timo | hm, people might take the wrong thing from this huh :P | 19:32 | |
lizmat | well... there's context | 19:36 | |
timo | well, i will not stop introducing / lizmat / to things | 19:37 | |
lizmat | hehe | ||
timo | phew, building rakudo with asan is quite slow | 19:41 | |
lizmat | another datapoint: .contains(/ lizmat /) does not crash either | 19:42 | |
timo | oh, does contains accept a regex? | ||
lizmat | yes, ever since it was created :-) | ||
.contains(/foo/) basically only runs a single cursor and sees if it hit anything | 19:43 | ||
does not create a Match object and does not set $/ | |||
timo | neat | ||
lizmat | so: I'd conclude that the cursor logic is not an issue | 19:44 | |
but the full creation of a Match object *and* setting $/ is | |||
and I would suspect setting $/ first | |||
hmmmm | 19:45 | ||
for @lines { | 19:46 | ||
$ib.push($_) if .match(/ lizmat /); | |||
} | |||
is also fine, so scratch that | |||
that sets $/ | |||
timo | why did i even think asan would be good enough to still get a crash .. but i do get 1072; 1071 was supposed to be correct yes? | 19:47 | |
lizmat | that's what I get, yes | ||
timo | ah yes there's a 1071 again | ||
lizmat | vi also things 1071 | 19:48 | |
1,$s/lizmat/fooo/ | |||
timo | and 1070 too :D | ||
AddressSanitizer can not provide additional info. | 19:49 | ||
wow i actually got a segfault with asan and i've got an rr recording | 19:54 | ||
lizmat | fwiw, I'm on Apple silicon, so no JIT... | 19:55 | |
so if you're on Intel, maybe disable the JIT, might make things easier ? | |||
timo | i should have thought to disable the jit since it makes stack traces happier and so on, but with rr it's often not such a big deal | 19:56 | |
nooooo the program finished without crashing in the recording wtf | |||
Dispatch callback failed to delegate to a dispatcher | 20:01 | ||
in block at standalone_crash.raku line 78 | |||
in block at standalone_crash.raku line 59 | |||
what on earth is all that | |||
lizmat | disp/program.c line 3102 | 20:02 | |
timo | Cannot resolve caller Bool(Nil:U: ); none of these signatures matches, Bool(Match:U: ) and Bool(Nil:U ) | ||
lizmat | yeah, that's a weird message in and of itself | |
fwiw, if you write the for loop out | 20:05 | ||
timo | well, the numbers i get differ, for example 657 | ||
should i refresh the gist and get your latest? | 20:06 | ||
lizmat | gist updated | ||
in that version, it crashes in the same way with RAKUDO_RAKUAST=1 | 20:07 | |
which implies to me the problem is in NQP or deeper | |||
as that is what the legacy grammar and Raku grammar share | |||
hmmm could also be the setting... hmmm | 20:08 | ||
/ foo / ASTs to: Regex.clone($_, $/) | 20:13 | ||
and that's nqp::p6bindattrinvres( | |||
nqp::p6bindattrinvres(self.Method::clone, Regex, '$!topic', $topic), | |||
Regex, '$!slash', $slash) | |||
I wonder if nqp::p6bindattrinvres is suspect | 20:14 | ||
timo | does clone not make a new one right there? | ||
lizmat | it does | ||
changing the nqp::p6bindattrinvres( into bindattrs does not make a difference | 20:17 | ||
so I guess we can rule p6bindattrinvres out for now | |||
timo | right, it's just a small desugar op right? | ||
lizmat | yeah, think so | 20:19 | |
timo | oh, ew, MVM_dump_backtrace actually interacts with the GC i never realized that | 20:24 | |
the way your code is corrupting the hell out of the moarvm is quite an impressive sight | 20:26 | ||
i wonder if i can actually figure it out | |||
lizmat | well, I tried very hard to golf it down | 20:27 | |
do you need more explanation about the code ? | |||
timo | i think for now i can do without | 20:29 | |
lizmat | it's basically the ParaSeq logic stripped down, with the backpressure logic removed | 20:30 | |
raku.land/zef:lizmat/ParaSeq#hyper...ntrol-flow | 20:31 | ||
timo | i think i've seen the module | 20:33 | |
i think we're in src/vm/moar/dispatchers.nqp on line 2895 when we try to call &multi-no-match-handler and that somehow goes kaboom, but that's still downstream of the actual cause, i bet | 20:43 | ||
lizmat | getting there is already wrong, I'd say | 20:45 | |
that's really to catch Junctions, and no junctions are in play | 20:46 | ||
so it's getting there for the wrong reason, I'd say | |||
timo | so the moment - or just before - things are going wrong, two threads are in a dispatcher in the Bool method at the same time. i don't expect the dispatcher code to be racy, but at least that's where i'll be looking | 21:03 | |
lizmat | location? | 21:04 | |
I'll be looking at that tomorrow then | |||
timo | i think it's Match.BOOL | 21:05 | |
er, Match::Bool | |||
lizmat | proto method Bool(|) {*} | 21:06 | |
multi method Bool(Match:U: --> False) { } | |||
multi method Bool(Match:D:) { nqp::hllbool($!pos >= $!from) } | |||
not a lot to dispatch there... | |||
lizmat replaces the multi by an only | 21:07 | ||
ok, doesn't fix the problem, but now my test program hangs half of the time | 21:10 | ||
and the other half crashes | 21:11 | ||
lizmat is going to sleep over it | |||
22:30
sena_kun left
|
|||
timo | when i give the block in line 78 a "my $/" it stops behaving bed | 22:47 | |
bad* | |||
but taking away the my $/ again seems to only cause wrong "seen" numbers right now, i don't see the crashes any more | 22:48 | ||
"Cannot resolve caller Bool(Nil:U: ); none of these signatures matches" now again after i upped the optimize level in moar compilation from -Og (optimize for debugging) to -O2 | 22:50 | ||
it's quite possible that the dispatcher code got unhappy because the value of the scalar was changing from under it? | 22:51 | ||
turning spesh off makes it worse, which makes sense because when spesh is on, dispatch recordings get turned into compiled code, so it's no longer using the dispatchers but instead building guards and such, and when the guards fail we deopt and try again i think? at which point the value is possibly stable again for a little moment | 22:53 |