Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes.
Set by lizmat on 24 May 2021.
timo sounds like we'd want a syscall for determining if an exception is resumable without actually resuming it, yeah 01:00
01:11 kjp left 01:13 kjp joined 01:14 kjp left, kjp joined
MasterDuke timo: i checked and @fates where that patch is never has more than 5 elems. and switching to using last instead of $EMPTY isn't any faster (in a microbenchmark at least) 01:48
but then again, i just noticed and why is it using pop? just indexing might be a very tiny bit faster
huh. callgrind reports that `nqp -e 'my $a; my $i := 0; while $i++ < 500_000 { my $b := nqp::list_i(0, 1, 2, 3, 4); my int $j := nqp::elems($b); while $j-- >= 0 { $a := nqp::atpos_i($b, $j); last if $a == 1 } }; say($a)'` is ~929m instructions, but `nqp -e 'my $a; my $i := 0; while $i++ < 500_000 { my $b := nqp::list_i(0, 1, 2, 3, 4); while nqp::elems($b) { $a := nqp::pop_i($b); last if $a == 1 } }; say($a)'` is only ~870m instructions (both with MVM_SPESH_BLOCKING=1) 02:03
wasn't expecting that
timo in the spesh log, does one have noticeably larger jit bytecode? is there roughly the same difference with jit disabled? 02:39
MasterDuke with jit disabled, ~1.24b for pop version and ~1.44b for indexing version 02:44
81716 total jit bytecode size for pop version and 82314 total jit bytecode size for indexing version 02:49
timo i do see some things that aren't optimal
so, i'm looking at the spesh log for the version that uses pop right now. r9 is the register that has the inner array in it, so $b 02:51
going into the loop we have r9 version 4 with its known type 02:52
in the first block where the jump back meets up with the step into the loop we merge register versions 4 and 6 together into version 5, and we don't know what the type of version 6 is, so the resulting facts for the new version 5 is "dunno" 02:53
the next block we merge versions 0, 5, and 6 together into version 6, and version 0 i'm not sure where it comes from actually. it has nothing known about it at all, and there's no writer to it anywhere either 02:54
MasterDuke ha
timo so i'm not sure if this is "the" problem, but the version 0 of register 9 is definitely not helpful here
so inside the loop all our reprops, which are elems and pop_i, are not even devirtualized 02:55
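Editor's sketch (not from the channel): the effect timo describes, where merging a typed register version with an untyped one leaves the result as "dunno", can be modelled in a few lines of Python. The names `Facts`, `merge_at_phi` and the type label `"IntArray"` are hypothetical illustrations and are not MoarVM's actual spesh data structures.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Facts:
    """What an optimizer knows about one version of a register (hypothetical)."""
    known_type: Optional[str] = None  # None stands for "dunno"


def merge_at_phi(inputs: Sequence[Facts]) -> Facts:
    """Keep a known type only if every incoming version agrees on it."""
    types = {f.known_type for f in inputs}
    if len(types) == 1 and None not in types:
        return Facts(known_type=types.pop())
    return Facts()  # any unknown or conflicting input erases the fact


v0 = Facts()             # never written anywhere, so nothing is known
v4 = Facts("IntArray")   # known type going into the loop
v6 = Facts()             # arrives over the loop back edge, type not known

v5 = merge_at_phi([v4, v6])
v6_merged = merge_at_phi([v0, v5, v6])
print(v5.known_type, v6_merged.known_type)
```

Under this model a single untyped input (such as the version 0 mentioned above) is enough to lose the type for everything downstream, which would be consistent with elems and pop_i not being devirtualized inside the loop.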
on the other version we have the inner array in register 9 as well, funnily enough 02:57
MasterDuke so the pop version is not well optimized, but it's still faster... 02:58
timo we go into the loop with version 4 (known type), version 5, and version 6. version 6 is created later from merging together versions 0, 5, and 6, and here's the odd version 0 of the register coming in again
if this were fixed properly, both versions would benefit at least a little bit
one more thing about the version with the counter is that we're boxing the integer and not skipping the unbox, so even though we optimize the unbox_i into a low-level sp_get_i64, the overhead from having to box integers is still there 02:59
i imagine it hits the int cache every time, though
MasterDuke huh. the difference is smaller if both use list_n, but the total instructions for each increases 03:04
timo do we maybe lack some optimization for num arrays? 03:05
like, somewhere someone just didn't bother to write out the case for floats and doubles?
MasterDuke if only there was a utility to easily diff spesh logs... 03:08
_n version spends a bunch of time in coerce_boxed_num_to_int_impl 03:11
huh. even if i change it to `== 1.0` 03:12
timo gist.github.com/timo/8376214a9b7bb...fae03fe9ce take this early version of comparify_jits and run with it if you like, i have to go to bed 03:13
be aware it will fill your working directory with many files 03:14
MasterDuke ah, `my num $a;` plus using num literals drops it back down to the same as the _i version 03:15
i'm off to bed too, i'll check it out tomorrow 03:16
timo ok good knight
MasterDuke later
timo the most likely explanation i have for the version 0 of the register coming into play is that BB 0 becomes a predecessor of every block that is part of a frame handler, which is necessary for some reason. and in BB 0, which is before every actual op, of course all registers have version 0 03:17
not sure if we rely upon this behaviour to prevent us from optimizing stuff we can't prove stuff about 03:18
08:24 MasterDuke left
jnthn iirc, it was to avoid having to initialize object registers to VMNull; write a prelude that nulls every register, and then all those that are initialized reliably before read leave the nulling out instructions with zero usages, and they're thus removed as part of the usual dead instruction elimination 09:03
lizmat I have a fairly reproducible case of Moar crashing using App::Rak 09:45
I've collected a few of the lldb outputs in: gist.github.com/lizmat/8f8c3413b5f...9c85c72b9d 09:51
hope they'll make sense to someone other than: memory corrupted
I have a hunch this is somehow related to the ConcBlockingQueue repr 09:55
ParaSeq uses quite a few of them... and it feels less stable than .hyper
which only uses one of them, if I remember correctly 09:56
timo that just looks like memory corruption to me :( 10:48
lizmat also the almost last one? if (REPR(new_addr_obj)->gc_mark) ? 10:49
timo yeah, that'd likely be the st pointer being wrong 10:51
since REPR(o) is like o->header.st->repr or something like that
lizmat check
timo could also be the STable has been corrupted 10:52
oh actually
it says the bad address is 0x10
so that's an offset to a null pointer
that offset fits the st pointer in MVMObject 10:54
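Editor's sketch (not from the channel): why a bad address of 0x10 points at the st field of a null object pointer. The layout below is an assumption made for illustration (8 bytes of bookkeeping plus one pointer-sized slot in the header, followed by the st pointer); the field names are hypothetical and should be checked against the real MoarVM headers.

```python
import ctypes


class Collectable(ctypes.Structure):
    # Hypothetical stand-in for a GC header.
    _fields_ = [
        ("owner", ctypes.c_uint32),
        ("flags1", ctypes.c_uint8),
        ("flags2", ctypes.c_uint8),
        ("size", ctypes.c_uint16),
        ("forwarder", ctypes.c_void_p),  # stands in for a pointer-sized union
    ]


class Obj(ctypes.Structure):
    # Hypothetical object: header first, then the pointer to its STable.
    _fields_ = [
        ("header", Collectable),
        ("st", ctypes.c_void_p),
    ]


st_offset = Obj.st.offset
print(hex(st_offset))
```

With 8-byte pointers this layout puts st at offset 0x10, so reading `o->st` through a null `o` would fault at address 0x10, matching the reasoning above.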
have you looked at the crossthreadwritelog? i think it's very noisy and i haven't used it often, so not sure how useful it will be to you
it might even be bit rotted a little bit and need some more recent additions ported to it 10:55
lizmat no I haven't, how do I activate that ? 11:29
timo it's an env var 11:51
MVM_CROSS_THREAD_WRITE_LOG Log unprotected cross-thread object writes to stderr
lizmat tries 11:52
so ideally that should be empty, right? 11:53
a lot of: Thread 7 bound to an attribute of an object (ThreadPoolScheduler::GeneralWorker) allocated by thread 1 11:54
this is updating stats of total number of runs done 11:58
which is a native int attribute
that was $!stats, similar for $!working 11:59
Thread 7 bound to an attribute of an object (IO::Handle) allocated by thread 1 12:00
is more interesting maybe
resetting either $!PIO or $!decoder 12:01
timo there are valid reasons to write across threads, so it's expected for it to not be empty
lizmat so, is a cross-thread write to a native attribute safe or not ?
in the case of $!total, it feels it should be an atomic if we want to trust the stats 12:02
also $!times-nothing-completed appears to be needing to be atomic 12:03
Thread 1 bound to a hash key of an object (BOOTHash) allocated by thread 4 12:05
at src/Raku/ast/compunit.rakumod:421
if !$!precompilation-mode 12:07
&& !$*INSIDE-EVAL
&& +(@*MODULES // []) == 0
&& (my $main := self.find-lexical('&MAIN'))
I'm not seeing a BOOTHash there?
nine ??
ah, probably $!live-decl-map{$name} // $!scope.find-generated-lexical($name) // Nil in resolver.rakumod:1271 12:09
hmmm no, that wouldn't bind ?
it's basically from RakuAST::Node.EVAL 12:11
Thread 4 bound to an attribute of an object (Channel) allocated by thread 1 12:12
at SETTING::src/core.c/Channel.rakumod:281
nine Maybe @*MODULES auto-vivifies in a hash where we store dynamic variables?
If those natives should be atomic depends on whether we prefer correctness over speed. If it's just statistical analysis feeding a heuristic decision we may be able to live with losing a few counts in races in favor of less book keeping overhead 12:13
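Editor's sketch (not from the channel): how a non-atomic increment can lose counts. Real thread races are timing dependent, so this example uses generators to force the interleaving deterministically. `racy_increment` and `run_interleaved` are hypothetical helpers and have nothing to do with Rakudo's scheduler code.

```python
def racy_increment(counter):
    """A non-atomic increment, split into its load and store halves."""
    value = counter["n"]      # load the current value
    yield                     # another thread may run at this point
    counter["n"] = value + 1  # store, possibly overwriting a newer value


def run_interleaved(steps):
    """Advance all generators round-robin until every one has finished."""
    pending = list(steps)
    while pending:
        for g in list(pending):
            try:
                next(g)
            except StopIteration:
                pending.remove(g)


# Two increments whose load/store halves interleave: one update is lost.
counter = {"n": 0}
run_interleaved([racy_increment(counter), racy_increment(counter)])
lost = counter["n"]

# The same two increments run one after the other: both are counted.
counter = {"n": 0}
for g in (racy_increment(counter), racy_increment(counter)):
    run_interleaved([g])
serial = counter["n"]

print(lost, serial)
```

This only models missing increments. Whether that is acceptable depends, as nine says, on whether the counter feeds a heuristic or something that must be exact.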
lizmat indeed... but are we sure that cross-thread writes to native int attributes are safe? 12:14
the Channel one is about closing the channel
Thread 4 bound to an attribute of an object (Rakudo::Internals::SupplySequencer) allocated by thread 1 12:16
also a write to a native int
attribute
Thread 4 bound to an attribute of an object (Rakudo::Internals::SupplySequencer) allocated by thread 1 12:19
at SETTING::src/core.c/Rakudo/Internals.rakumod:681
feels like an odd one, as the line in question is:
if $!buffer-start-seq == $!done-target {
&!on-completed();
}
both native ints, &!on-completed obviously not... 12:20
but where does it bind there?
another interesting one: 12:21
Thread 4 shifted an object (BOOTArray) allocated by thread 1
at SETTING::src/core.c/Rakudo/Internals.rakumod:678
&!on-data-ready(nqp::shift($!buffer))
nine Well if in doubt switching it to an atomic is certainly preferable. Especially when the failure mode might be a deadlock or crash.
lizmat that last shift appears to be safe, as only executed inside a protect block 12:22
ok, that's it for this log 12:24
ok, so: raku -I. bin/rak lizmat -i --count-only --matches-only ../../REA/META.json 12:29
crashes for me roughly 20% of the time
either zsh: segmentation fault or MoarVM panic: Internal error: zeroed target thread ID in work pass 12:30
with MVM_CROSS_THREAD_WRITE_LOG=1 I can not get it to crash
I guess Heisenberg applies :-(
timo lizmat: sorry i was busy workworkworking all day 16:48
interesting, i'm zef installing --deps-only . in App-Rak and i'm getting two versions of ParaSeq installed 16:54
i guess i also want rak cloned locally to futz around with it if i need to 16:56
17:01 sena_kun joined
lizmat interesting... hmmm 17:01
ah... I forgot to update JSON::Fast::Hyper... 17:02
timo i was looking for a way to have dependencies of a module shown, maybe even as a tree, but i can't get anything out of "zef depends" other than "Failed to resolve some missing dependencies" (with no details what exactly)
zef rdepends on 'App::Rak:ver<0.3.12>:auth<zef:lizmat>' gives me a few App::Rak::Complete entries at least, after like 40 seconds of full cpu usage %)
lizmat the raku.land page is very informative
[Coke] timo: raku.land/zef:coke/App::Zef-Deps ? 17:08
timo neat
[Coke] (which is just a front end for some zef stuff)
timo yes that's perfect
[Coke] oh good 17:17
timo oh 17:19
> Failed to resolve some missing dependencies (use e.g. --exclude="git" to skip)
this is the error message i was getting
i thought the "git" was just an example
now i was able to see for myself that yes, ParaSeq 0.2.6+ is in rak, and ParaSeq:ver<0.2.5> is in JSON::Fast::Hyper 17:20
[Coke] which should *theoretically* work, yes? 17:29
timo yeah it should be no problem, and i don't think it's causing any trouble 17:52
got a little zef ticket out of it, just a minor thing though 17:57
i can't give it a label like "low priority" or so, i hope it's not annoying
wow, REA big huh
all the individual files in it are .tar.gz, i wonder if mumble-mumble not compressing them on that level will mumble-mumble give better results because different versions of the same module can be compressed as a whole when git makes its packfiles 17:59
lizmat: why can you use -i and i get a message that i'm supposed to use --ignorecase instead? 18:01
but i do get a segfault, so that's good progress
> Invalid value '1' for :degree on method hyper. 18:02
huh
i also get alternatingly 1072 matches and 1071 matches in 1 file, hehe. 18:04
so yeah something smells an awful lot like unsafe concurrency being used 18:05
is rak doing any automatic sensing of what degree it should use for hypers? i'm not sure why i have to run it with --degree=2 or higher if i want it to not error out with the "invalid value for :degree" error when recording with rr 18:12
actually i can pass --degree=1 just fine in the commandline and not get the error
--num-cores=N pretend to have N cores (rr will still 18:24
^- this might be required then
lizmat rak takes cpucores - 1 by default 18:25
timo what gives me cpucores?
lizmat re 1072/1071 yeah, that's still something I don't understand, as the counting should be done in a threadsafe manner 18:26
timo do the splits not overlap or something?
lizmat they shouldn't
it's a JSON::Fast::Hyper file
timo ok 18:27
lizmat each top-level JSON is a single line
timo Floating point exception (core dumped) 18:30
whew
i even get 1070 results 18:31
well, it could be memory corruption messing with stuff
lizmat looks like 1071 is the current correct number 18:32
fg
timo ok, --num-cores=10 gives me 10 for Kernel.cpu-cores, so that part works at least. rak is still not crashing under "rr record" 18:35
lizmat well, that's the thing: also with MVM_CROSS_THREAD_WRITE_LOG=1 I could not get it to crash
so it feels very timing dependent 18:36
timo yeah, piping the stdout to something seems to cause a change as well
the -h flag to rr randomizes scheduling decisions, so that should theoretically do something as well
cool, turning spesh off keeps the crashyness 18:38
one less system to worry about
also, it's probably relevant that the crossthreadwritelog writes to standard error which usually means stdio.h which has some locking going on 18:51
and i imagine the writing out of the stack trace is also not very cheap 18:52
haha yeah 35% of time spent in fprintf, woof 18:53
i also got 1073 this time while using "perf record" :D 18:54
lizmat that I find weird: I can create a mental model of missing increments, but not of additional increments 18:55
timo have you gotten it to apparently infinite-loop yet?
lizmat yes, once today
timo wow those stacks be messed *up* 18:56
lizmat timo: I think I got a standalone version of something that does weird things 19:27
timo cool, that might help 19:28
lizmat gist.github.com/lizmat/d0f1eb60e77...91ba670daa
all is fine *until* the / lizmat / is introduced
.contains("lizmat") is ok 19:29
so looks like something regexy going off the reails
*rails
timo hm, people might take the wrong thing from this huh :P 19:32
lizmat well... there's context 19:36
timo well, i will not stop introducing / lizmat / to things 19:37
lizmat hehe
timo phew, building rakudo with asan is quite slow 19:41
lizmat another datapoint: .contains(/ lizmat /) does not crash either 19:42
timo oh, does contains accept a regex?
lizmat yes, ever since it was created :-)
.contains(/foo/) basically only runs a single cursor and sees if it hit anything 19:43
does not create a Match object and does not set $/
timo neat
lizmat so: I'd conclude that the cursor logic is not an issue 19:44
but the full creation of a Match object *and* setting $/ is
and I would suspect setting $/ first
hmmmm 19:45
for @lines { 19:46
$ib.push($_) if .match(/ lizmat /);
}
is also fine, so scratch that
that sets $/
timo why did i even think asan would be good enough to still get a crash .. but i do get 1072; 1071 was supposed to be correct yes? 19:47
lizmat that's what I get, yes
timo ah yes there's a 1071 again
lizmat vi also thinks 1071 19:48
1,$s/lizmat/fooo/
timo and 1070 too :D
AddressSanitizer can not provide additional info. 19:49
wow i actually got a segfault with asan and i've got an rr recording 19:54
lizmat fwiw, I'm on Apple silicon, so no JIT... 19:55
so if you're on Intel, maybe disable the JIT, might make things easier ?
timo i should have thought to disable the jit since it makes stack traces happier and so on, but with rr it's often not such a big deal 19:56
nooooo the program finished without crashing in the recording wtf
Dispatch callback failed to delegate to a dispatcher 20:01
in block at standalone_crash.raku line 78
in block at standalone_crash.raku line 59
what on earth is all that
lizmat disp/program.c line 3102 20:02
timo Cannot resolve caller Bool(Nil:U: ); none of these signatures matches, Bool(Match:U: ) and Bool(Nil:U )
lizmat yeah, that's a weird message in and of itself
fwiw, if you write the for loop out 20:05
timo well, the numbers i get differ, for example 657
should i refresh the gist and get your latest? 20:06
lizmat gist updated
in that version, it crashes in the same way with RAKUDO_RAKUAST=1 20:07
which implies to me the problem is in NQP or deeper
as that is what the legacy grammar and Raku grammar share
hmmm could also be the setting... hmmm 20:08
/ foo / ASTs to: Regex.clone($_, $/) 20:13
and that's nqp::p6bindattrinvres(
nqp::p6bindattrinvres(self.Method::clone, Regex, '$!topic', $topic),
Regex, '$!slash', $slash)
I wonder if nqp::p6bindattrinvres is suspect 20:14
timo does clone not make a new one right there?
lizmat it does
changing the nqp::p6bindattrinvres( into bindattrs does not make a difference 20:17
so I guess we can rule p6bindattrinvres out for now
timo right, it's just a small desugar op right?
lizmat yeah, think so 20:19
timo oh, ew, MVM_dump_backtrace actually interacts with the GC i never realized that 20:24
the way your code is corrupting the hell out of the moarvm is quite an impressive sight 20:26
i wonder if i can actually figure it out
lizmat well, I tried very hard to golf it down 20:27
do you need more explanation about the code ?
timo i think for now i can do without 20:29
lizmat it's basically the ParaSeq logic stripped down, with the backpressure logic removed 20:30
raku.land/zef:lizmat/ParaSeq#hyper...ntrol-flow 20:31
timo i think i've seen the module 20:33
i think we're in src/vm/moar/dispatchers.nqp on line 2895 when we try to call &multi-no-match-handler and that somehow goes kaboom, but that's still downstream of the actual cause, i bet 20:43
lizmat getting there is already wrong, I'd say 20:45
that's really to catch Junctions, and no junctions are in play 20:46
so it's getting there for the wrong reason, I'd say
timo so the moment - or just before - things are going wrong, two threads are in a dispatcher in the Bool method at the same time. i don't expect the dispatcher code to be racy, but at least that's where i'll be looking 21:03
lizmat location? 21:04
I'll be looking at that tomorrow then
timo i think it's Match.BOOL 21:05
er, Match::Bool
lizmat proto method Bool(|) {*} 21:06
multi method Bool(Match:U: --> False) { }
multi method Bool(Match:D:) { nqp::hllbool($!pos >= $!from) }
not a lot to dispatch there...
lizmat replaces the multi by an only 21:07
ok, doesn't fix the problem, but now my test program hangs half of the time 21:10
and the other half crashes 21:11
lizmat is going to sleep over it
22:30 sena_kun left
timo when i give the block in line 78 a "my $/" it stops behaving bed 22:47
bad*
but taking away the my $/ again seems to only cause wrong "seen" numbers right now, i don't see the crashes any more 22:48
"Cannot resolve caller Bool(Nil:U: ); none of these signatures matches" now again after i upped the optimize level in moar compilation from -Og (optimize for debugging) to -O2 22:50
it's quite possible that the dispatcher code got unhappy because the value of the scalar was changing from under it? 22:51
turning spesh off makes it worse, which makes sense because when spesh is on, dispatch recordings get turned into compiled code, so it's no longer using the dispatchers but instead building guards and such, and when the guards fail we deopt and try again i think? at which point the value is possibly stable again for a little moment 22:53