Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes.
Set by lizmat on 24 May 2021.
00:02 reportable6 left 00:05 reportable6 joined 01:58 greppable6 left, evalable6 left, linkable6 left 01:59 greppable6 joined, evalable6 joined 02:01 nine left, nine joined 03:01 quotable6 left, squashable6 left, evalable6 left, tellable6 left, committable6 left, benchable6 left, greppable6 left, nativecallable6 left, coverable6 left, sourceable6 left, reportable6 left, bisectable6 left, shareable6 left, unicodable6 left, bloatable6 left, notable6 left, statisfiable6 left, releasable6 left 03:02 squashable6 joined, statisfiable6 joined, tellable6 joined, nativecallable6 joined 03:03 coverable6 joined, sourceable6 joined, unicodable6 joined 03:04 reportable6 joined, greppable6 joined, releasable6 joined 04:02 benchable6 joined, evalable6 joined, bisectable6 joined 04:04 shareable6 joined 05:03 committable6 joined 05:06 Util joined 05:59 linkable6 joined 06:02 reportable6 left 06:03 quotable6 joined 07:02 bloatable6 joined 07:03 reportable6 joined 07:15 frost joined 07:47 MasterDuke joined 08:03 notable6 joined 08:22 Altai-man left 08:28 Altai-man joined
MasterDuke hm. maybe all those times i saw leaks in --full-cleanup it was because of the sigabrt from gc_free (ConcBlockingQueue.c:80), but i didn't notice the abort had happened 09:06
jnthnwrthngtn moarning o/ 09:08
MasterDuke timo, nine, jnthnwrthngtn: any suggestions for how to handle those sigabrts?
Nicholas \o 09:11
MasterDuke oh github.com/MoarVM/MoarVM/commit/af...ec9e7516d0 has some work in that area
09:18 TempIRCLogger left, TempIRCLogger joined 09:26 TempIRCLogger left, TempIRCLogger joined 09:33 TempIRCLogger left, TempIRCLogger joined
MasterDuke interesting, with that commit cherry-picked and deconflicted with HEAD of master, `raku --full-cleanup -e 'for ^1_000 { (^100).race(batch=>1).map({ $_ }).List }'` now gives a bunch of `No exception handler located for catch` with a rakudo backtrace 09:33
09:38 TempIRCLogger left, TempIRCLogger joined
jnthnwrthngtn Isn't the abort also in the "the only real fix is to make our own locks" camp? 09:42
While looking at the profile, I was surprised to see that over a quarter of the time the spesh worker thread spends is on spesh stats cleanup 09:44
Trying a couple of changes to see if we can get that down, since if we can we can spend more time doing useful work on the specializer thread 09:45
nine jnthnwrthngtn: the alternative fix is to properly cleanup all our threads. The more_asan_fixes branches contain my work in that direction. IIRC it works rather well and personally I'm fine with the approach. I figured this is something to request your architect's hat on when new-disp and RakuAST are done 09:52
MasterDuke re making our own locks i have no idea, don't think i've heard that mentioned before
jnthnwrthngtn nine: What's in the scope of "our threads", though? The ones that MoarVM itself creates, such as spesh and event loop? 09:53
MasterDuke: I've normally mentioned it as "the JVM approach" 09:54
Which may also ring a bell
MasterDuke ah
nine jnthnwrthngtn: also the worker threads the ThreadPoolScheduler creates. I added signalling (queue returns a VMNull) so those know when to exit. Of course if users create their own low level threads, all bets would be off 09:55
lizmat nine: what would be needed to make that work in Telemetry ? 09:56
(as it creates its own thread)
or is that already covered by the END block it has ?
nine lizmat: I think the END block already does the trick 09:57
lizmat oki!
Nicholas jnthnwrthngtn: the other "hack" would be to have another thread to dispose of stats. (Optionally enabled) On the assumption that most code isn't using as many cores as modern CPUs provide. But this feels like admitting defeat. 10:05
jnthnwrthngtn Nicholas: The problem seems perhaps more that it's just really wasteful. 10:07
Nicholas yes. that was the "admitting defeat" part.
jnthnwrthngtn Goodbye to another 4 billion cycles 10:23
spesh stats cleanup goes from 2.4% of CPU cycles of the process (all on the spesh thread, so a huge portion of its cycles) to just 0.12% 10:24
Nicholas \o/
jnthnwrthngtn I'd never really looked at this before
Nicholas "in parallel universes in constant time" - clearly Damian's solution to this problem would have been use more 'universes'; 10:25
MasterDuke nice 10:26
Geth MoarVM/new-disp: 085c05c5ee | (Jonathan Worthington)++ | 3 files
Greatly decrease cost of spesh stats cleanup

This takes it from around 2.4% of CPU cycles in CORE.setting compilation (all of them on the spesh thread, and so more like 25% of the time that the spesh thread spends) to around 0.12%, measured in instruction fetches using callgrind.
... (9 more lines)
jnthnwrthngtn This wasn't even what I was meant to be improving, just spotted it last night when staring at profile output.
And yay, the thing I was hoping would get us another big win on CORE.setting build time does 10:40
MasterDuke oh!? 10:41
dogbert11 jnthnwrthngtn: why don't you continue staring for a bit longer :)
jnthnwrthngtn Stage parse is still longer than master, but optimize and mast and mbc are all faster than master
Total 51.818 on new-disp, total 50.395 on master 10:42
MasterDuke very cool
jnthnwrthngtn And there's still plenty of things to tune up yet
Nicholas nitrous kits and go faster stripes? :-) 10:43
Geth MoarVM/new-disp: 4ddb32cb70 | (Jonathan Worthington)++ | 3 files
Add syscall to check if a capture value is literal

For use when writing dispatch programs. Capture values may be literal either because they were originally marked that way in the original arguments capture, or because we inserted a literal into the capture (which may have happened in a dispatcher that delegated to us).
... (10 more lines)
10:52 evalable6 left, linkable6 left, evalable6 joined
jnthnwrthngtn That plus the NQP commit make the big difference. 10:53
10:54 linkable6 joined
jnthnwrthngtn We actually have fewer CPU instruction fetches than master by this point 11:20
322,606,102,028 on new-disp vs 330,702,914,420 on master 11:21
Nicholas cool. Quick, TODO the remaining failing tests and ship it! :-) 11:23
MasterDuke but if the wallclock is still slightly high, then they must be slower instructions? or more branch misses? 11:25
jnthnwrthngtn Not quite so fast, alas. Since we don't yet translate dispatch programs that set up resumption init state, we'll not be running Raku programs too optimally yet
MasterDuke: Yes, something along those lines. One obvious target for improvement is the dispatch program running, which currently is done with a `switch`, but could be a computed goto 11:26
Well, using the same "if it's available" trick
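A minimal sketch of the `switch` versus computed goto idea mentioned above, assuming a toy three-op program rather than MoarVM's real dispatch program ops. All names here (`toy_op`, `toy_run`, `TOY_CGOTO`) are hypothetical. The "if it's available" part is modelled with a preprocessor check for the GNU labels-as-values extension, falling back to a plain `switch` elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a toy program; not MoarVM's dispatch program ops. */
enum toy_op { TOY_INC, TOY_DOUBLE, TOY_END };

/* Computed goto relies on the GNU labels-as-values extension, so only
 * enable it on compilers known to provide it. */
#if defined(__GNUC__) || defined(__clang__)
#define TOY_CGOTO 1
#else
#define TOY_CGOTO 0
#endif

/* Interprets `ops` until TOY_END and returns the accumulator. */
long toy_run(const unsigned char *ops) {
    long acc = 0;
    size_t pc = 0;
#if TOY_CGOTO
    /* One label address per opcode, indexed by opcode value. */
    static void *const labels[] = { &&do_inc, &&do_double, &&do_end };
#define DISPATCH() goto *labels[ops[pc++]]
    assert(ops != NULL);
    /* Each handler jumps straight to the next one; there is no shared
     * loop head. Note: an out-of-range opcode is not checked here. */
    DISPATCH();
do_inc:
    acc += 1;
    DISPATCH();
do_double:
    acc *= 2;
    DISPATCH();
do_end:
    return acc;
#undef DISPATCH
#else
    assert(ops != NULL);
    /* Portable fallback: every op goes back through the same switch. */
    for (;;) {
        switch (ops[pc++]) {
            case TOY_INC:    acc += 1; break;
            case TOY_DOUBLE: acc *= 2; break;
            case TOY_END:    return acc;
            default:         return -1; /* unknown op */
        }
    }
#endif
}
```

Calling `toy_run` on the sequence `TOY_INC, TOY_INC, TOY_DOUBLE, TOY_INC, TOY_END` takes the same path through either variant; only the way control reaches the next handler differs.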
12:03 reportable6 left
nine I guess in the long term we want to JIT the dispatch programs? 12:53
jnthnwrthngtn nine: We effectively do for any hot ones: they are translated into guard ops, sp_runbytecode, and similar, and then the JIT processes those 12:55
Well, hot and monomorphic so far, however the goal is to translate polymorphic sites too 12:56
But it's still worth interpreting them faster before we get to that point 12:57
Geth MoarVM/new-disp: 0e53708422 | (Jonathan Worthington)++ | src/disp/program.c
Use interned callsite during dispatch record

Otherwise, if we invoke something once - for example the main body of a program - we end up with an uninterned callsite, which in turn means that OSR's specializations come out far worse.
jnthnwrthngtn timo: ^^ sorts out OSR 13:19
Gets us rather better at the empty range loop again :) 13:20
Geth MoarVM/new-disp: c7efea857e | (Jonathan Worthington)++ | src/disp/program.c
Add missing dispatch program op names
MoarVM/new-disp: a1f52bd276 | (Jonathan Worthington)++ | src/spesh/disp.c
Comment further reasons for DP non-translation
MoarVM/new-disp: 329fa73c64 | (Jonathan Worthington)++ | src/spesh/disp.c
Translate object literal non-match guards in DPs

Also, since we have a guard for object literal, use that rather than producing a sequence of ops.
jnthnwrthngtn That gets us translating typical cases of raku-assign 13:55
14:00 frost left 14:05 reportable6 joined
Geth MoarVM/new-disp: 11a1c9de6f | (Jonathan Worthington)++ | src/spesh/disp.c
Translate DPs with non-tail arguments

That is, those that rewrite the argument capture in a more complex way than just chopping some arguments off the start for the eventual invocation of bytecode or C function.
jnthnwrthngtn That covers raku-capture-lex... 14:45
And now it really is all on the resumption init state 14:48
Which is the hard part, of course
dogbert11 I do get some extra test failures now. Am I the only one? E.g. t/spec/S32-list/categorize-list.t and t/spec/S02-literals/numeric.t 15:15
MasterDuke wow. i was complaining about pypandoc a while ago, just did some testing today. in a windows vm (which admittedly is slow) it takes pypandoc 6s to convert a tiny .md to .rst. on my linux desktop (which admittedly is a bit faster) it takes 0.2s
each pypandoc call executes the pandoc binary three times. i thought windows had gotten much faster at starting processes recently (i know it used to be slow compared to linux), but maybe not? 15:16
dogbert11 the errors stem from the latest commit it seems, i.e. 11a1c9de6f 15:17
jnthnwrthngtn dogbert11: Hm, odd. I can reproduce it, at least 15:33
Got at least the first part of the design (up to enabling spesh linking) for dispatch programs with resumptions figured out. 15:35
Well, and roughly the inlining one too
It's pretty much all deopt-hard
(Not that we actually deopt on resumption, just that looking for resumptions - if we want to keep the cost of a lack of them about zero - means figuring out the invocation chain using the callstack) 15:37
dogbert11 jnthnwrthngtn: golf from one of the files: for (^0x0FFF).grep({ .uniprop eq 'Nd' and .unival == 1|2|3 }).batch(3)>>.chr>>.join -> $string { } 15:45
jnthnwrthngtn Ah, thanks 15:46
Language class, afk for a bit 15:47
nine How on earth can we just somehow forget to call some 0.0077 % of destructors (or leak that many objects, still not sure)? 16:47
Easily reproducible: MVM_SPESH_DISABLE=1 rakudo -e 'my atomicint $t = 0; my atomicint $l = 0; class Foo { submethod BUILD() { $t⚛++; $l⚛++; }; submethod DESTROY() { $l⚛--; note "$t $l" } }; loop { Foo.new; put "."; }'
The second number is the live counter. It goes up, up, up, then GC kicks in, we get output of the total number and the live number and from GC run to GC run the lowest live number gets larger and larger and stays at about 0.0077 % of the total counter 16:49
lizmat nursery migration leakage ? 16:52
nine Actually....making the nursery smaller (e.g. 4K) raises the leakage rate to about 2 % 16:55
lizmat which means more migration, no ? 16:58
nine yes 16:59
18:02 reportable6 left 18:03 reportable6 joined
jnthnwrthngtn I have no guesses, alas 18:14
I'm not too happy with how finalization running is scheduled at the moment, tbh 18:15
It uses the special return mechanism I'd like to boot out in the future...in a way that doesn't really map neatly to the new callstack model 18:16
I think I've mentioned the idea of a finalizer thread in the past, but iirc there were concerns about the concurrency introduced by that
nine A finalizer thread would pretty much kill multi threaded Inline::Perl5 18:20
lizmat and that's one of the USPs of Inline::Perl5 :-) 18:22
or any other Inline:: of a system that doesn't have a good concurrency model
jnthnwrthngtn nine: Kill as in "make it impossible to implement" or "kill it given the current implementation"? 18:24
nine What I currently have is automatic support for use of Perl modules in e.g. Cro applications or the like, basically for non-communicating worker threads. Everything is fine as long as we only ever call methods on Perl objects on the thread that they were created on. Behind the scenes I create an Inline::Perl5 instance and attach it to $*THREAD 18:26
The current finalizer implementation with finalizers called on the owning threads works very well with that. 18:27
I wouldn't know how to deal with DESTROY methods in any other way. And they are essential for reference counting 18:28
jnthnwrthngtn OK, hmm.
Guess we'll have to find some kinda way to deal with it, then.
nine We're definitely running the finalize_handler_caller much less often than we set it up 18:29
jnthnwrthngtn Maybe something else steals the special return slot?
nine Can't find anything. But even if. The objects to finalize stay in the queue and the next handler run should pick them up. It is quite common to have objects in tc->finalizing at the start of walk_thread_finalize_queue 18:35
Btw. we're kinda relying on NQP never gaining a finalize_handler there 18:40
18:42 rakugest joined
nine I'm pretty sure now that we're not actually leaking those objects, but fail to run their finalizers. And we do so because we do not even find them in the finalize list 18:45
18:45 discord-raku-bot left 18:46 discord-raku-bot joined
Geth MoarVM/new-disp: dc4f49e3d0 | (Jonathan Worthington)++ | src/spesh/disp.c
Correct translation of copy args in DPs

The source argument index calculation needs to use the number of arguments to the original dispatch.
jnthnwrthngtn That seems to fix the regressions for 11a1c9de6f
The case I debugged is one of the cute things new-disp lets us do: a multiple dispatch with a Junction argument is specialized into a runbytecode of Junction.AUTOTHREAD(&code, arg1, arg2) 18:48
That is, we call directly to the junction auto-threader 18:49
No intermediate closure slurping and whatever
lizmat cool, so we don't need a lot of these global SUBS anymore either, we can make them class methods easily ?
like &DYNAMIC ? 18:50
jnthnwrthngtn Well, or stick them under Rakudo::Internals or some such
Though &DYNAMIC is already changed in rakuast, fwiw
lizmat ah, ok, cool! 18:51
jnthnwrthngtn Though I think just to a different sub
But I code-gen the op to do the normal lookup right off, and only call the sub form if we need to do global fallback
And pre-calculate the name without the * for that
But yeah, we could dispatcher up the fallback and get smaller bytecode and put the resolver elsewhere 18:52
lizmat as long as dynvar lookup becomes faster :-) 18:54
nine OMG I got it!
github.com/MoarVM/MoarVM/blob/mast...lize.c#L87 18:55
Nicholas I'm glad that you have because I haven't 18:56
nine We're looking at nursery objects and only if we do a gen2 collection we also consider gen2 objects. BUT this loop actually rewrites the finalize list and items we do not even consider will silently get dropped
Nicholas aha.
jnthnwrthngtn oops 18:57
lizmat that could be a significant leak on long running processes, no ?
nine That's why the small nursery intensifies the problem. More objects make it into gen2 and have a chance of falling off the finalizer list 18:58
lizmat: leak memory not so much. But DESTROY methods won't get called when they should. So external resources (like Perl objects) will leak.
Or file handles!
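A rough sketch of the kind of bug nine describes, assuming a simplified model and not the actual MoarVM code. The types and names (`toy_obj`, `toy_walk_finalize_list`, `keep_skipped`) are hypothetical. The list is compacted in place after each GC run; in a nursery-only run the gen2 entries are not examined, and if they are not copied forward they fall off the list and their finalizers never run.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an object with a finalizer; not MoarVM's types. */
typedef struct {
    int in_gen2; /* 1 if the object was promoted to the old generation */
    int alive;   /* 1 if the last GC run found it reachable */
} toy_obj;

/* Compacts `list` in place after a GC run and returns its new length.
 * Dead objects are moved to `dead` (caller-provided, at least n slots)
 * and counted in *num_dead. With keep_skipped == 0, entries that this
 * run did not examine are dropped, which models the bug; with
 * keep_skipped == 1 they are retained. */
size_t toy_walk_finalize_list(toy_obj **list, size_t n, int full_collection,
                              int keep_skipped, toy_obj **dead,
                              size_t *num_dead) {
    size_t keep = 0;
    size_t i;
    assert(list != NULL && dead != NULL && num_dead != NULL);
    *num_dead = 0;
    for (i = 0; i < n; i++) {
        toy_obj *o = list[i];
        if (full_collection || !o->in_gen2) {
            if (o->alive)
                list[keep++] = o;        /* still reachable: stays listed */
            else
                dead[(*num_dead)++] = o; /* unreachable: finalizer must run */
        }
        else if (keep_skipped) {
            list[keep++] = o;            /* not examined: must stay listed */
        }
        /* Otherwise the entry is silently dropped from the list. */
    }
    return keep;
}
```

In this model, a smaller nursery means more objects carry `in_gen2 == 1` by the time a nursery-only run walks the list, so more entries are lost when `keep_skipped` is 0, which matches the higher rate observed with a 4K nursery.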
lizmat indeed.. :-)
MasterDuke huh, i think Xliff has had a couple cases of running out of file handles, maybe this was why 18:59
jnthnwrthngtn Maybe, though if they're seeing it on Mac, it can also be because the default file handle limit can be quite low 19:01
dogbert11 nine: perhaps this will get fixed: github.com/Raku/old-issue-tracker/issues/4420 19:03
Nicholas I think that this one is still current: 19:09
src/vm/moar/ops/perl6_ops.c:15:67: warning: excess elements in struct initializer
static MVMCallsite no_arg_callsite = { NULL, 0, 0, 0, 0, 0, NULL, NULL };
nine Ooooh...such stable memory usage, making me much happy 19:17
jnthnwrthngtn Just toss one of the NULLs 19:27
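For illustration, a small sketch of how this warning arises and goes away, assuming a hypothetical seven-member struct. The real layout of `MVMCallsite` is not shown in this log, so `toy_callsite` and its member names are made up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical seven-member struct; NOT the real MVMCallsite layout. */
typedef struct {
    unsigned char *flags;
    unsigned short flag_count;
    unsigned short arg_count;
    unsigned short num_pos;
    unsigned char  has_flattening;
    unsigned char  is_interned;
    void          *names;
} toy_callsite;

/* Eight initializers for seven members would trigger
 * "excess elements in struct initializer":
 *   static toy_callsite cs = { NULL, 0, 0, 0, 0, 0, NULL, NULL };
 * Dropping the trailing NULL makes the count match the members. */
static toy_callsite toy_no_arg_callsite = { NULL, 0, 0, 0, 0, 0, NULL };

/* Returns the shared no-argument callsite. */
const toy_callsite *toy_get_no_arg_callsite(void) {
    return &toy_no_arg_callsite;
}

/* Returns 1 if the callsite describes a call with no arguments. */
int toy_callsite_is_empty(const toy_callsite *cs) {
    assert(cs != NULL);
    return cs->arg_count == 0 && cs->flag_count == 0;
}
```

A positional initializer like this has to be kept in step with the struct by hand, which is presumably how the extra `NULL` was left behind when a member was removed.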
Nicholas I realise this, but I didn't have permission to do it directly 19:28
MasterDuke committable6: 2021.05,2021.06 use nqp; for ^5_000 { (^100).race(batch=>1).map({ $_ }).List }; my int @a; nqp::getrusage(@a); if @a[4] > 0 { say @a[4] } else { EVAL q|use nqp; say nqp::atpos_i(nqp::getrusage(), 4)| }; # interesting 19:43
committable6 MasterDuke, ¦2021.05: «436328␤» ¦2021.06: «275640␤»
japhb MasterDuke: Any particular reason for that gigantic drop in MAXRSS? (The only thing that occurs to me off the top of my head is the improved scheduling of short tasks, but that's more about time than space) 20:04
Although I suppose not building up a really long work queue could account for that .... 20:05
MasterDuke no idea
github.com/MoarVM/MoarVM/blob/mast...ngeLog#L48 maybe? 20:09
jnthnwrthngtn Nicholas: ah, right, I forgot that 20:29
Pushed 20:31
MasterDuke jnthnwrthngtn: don't know if you've looked at it much yet, but on new-disp `valgrind --leak-check=full raku --full-cleanup -e ''` reports two invalid frees and a whole lot of definitely and indirectly lost bytes 21:15
timo i need something not terribly deep today. i think i will have a quick look to see if i can make a dent in this 21:24
MasterDuke looks like it's all dispatch programs and recordings 21:25
timo fascinating, translate dispatch program allocates something that's not getting freed, maybe we're using alloc where spesh alloc could work, or we store stuff in a spot we're not considering during cleanup 21:26
we must be missing the annotation that keeps the temps to be freed around for after optimizing runbytecode 21:33
got a lead 21:41
yeah, the "end of translated disp program" comment is getting put in front of the "clear these temps please" annotation and the code that attempts to clear that annotation and free the array only looks at the very first spot 22:09
now i see some deopt annotations at the spot we're looking for this annotation 22:14
so just removing the comment was not yet enough
something else may be up 22:17
ha, i got it now 22:20
MasterDuke nice 22:23
Geth MoarVM/new-disp: 370a8ea33d | (Timo Paulssen)++ | src/spesh/disp.c
fix adding clear temps annotation
MoarVM/new-disp: 5180ae0861 | (Timo Paulssen)++ | src/spesh/optimize.c
also clear temps for runcfunc ops
timo can you verify this makes the leaked allocations that came from spesh go away? 22:27
there's still a bunch from process_recording
23:03 rakugest left