[00:02] *** reportable6 left
[00:05] *** reportable6 joined
[01:58] *** greppable6 left
[01:58] *** evalable6 left
[01:58] *** linkable6 left
[01:59] *** greppable6 joined
[01:59] *** evalable6 joined
[02:01] *** nine left
[02:01] *** nine joined
[03:01] *** quotable6 left
[03:01] *** squashable6 left
[03:01] *** evalable6 left
[03:01] *** tellable6 left
[03:01] *** committable6 left
[03:01] *** benchable6 left
[03:01] *** greppable6 left
[03:01] *** nativecallable6 left
[03:01] *** coverable6 left
[03:01] *** sourceable6 left
[03:01] *** reportable6 left
[03:01] *** bisectable6 left
[03:01] *** shareable6 left
[03:01] *** unicodable6 left
[03:01] *** bloatable6 left
[03:01] *** notable6 left
[03:01] *** statisfiable6 left
[03:01] *** releasable6 left
[03:02] *** squashable6 joined
[03:02] *** statisfiable6 joined
[03:02] *** tellable6 joined
[03:02] *** nativecallable6 joined
[03:03] *** coverable6 joined
[03:03] *** sourceable6 joined
[03:03] *** unicodable6 joined
[03:04] *** reportable6 joined
[03:04] *** greppable6 joined
[03:04] *** releasable6 joined
[04:02] *** benchable6 joined
[04:02] *** evalable6 joined
[04:02] *** bisectable6 joined
[04:04] *** shareable6 joined
[05:03] *** committable6 joined
[05:06] *** Util joined
[05:59] *** linkable6 joined
[06:02] *** reportable6 left
[06:03] *** quotable6 joined
[07:02] *** bloatable6 joined
[07:03] *** reportable6 joined
[07:15] *** frost joined
[07:47] *** MasterDuke joined
[08:03] *** notable6 joined
[08:22] *** Altai-man left
[08:28] *** Altai-man joined
[09:06] hm. maybe all those times i saw leaks in --full-cleanup it was because of the SIGABRT from gc_free (ConcBlockingQueue.c:80), but i didn't notice the abort had happened
[09:08] moarning o/
[09:08] timo, nine, jnthnwrthngtn: any suggestions for how to handle those SIGABRTs?
[09:11] \o
[09:11] oh, https://github.com/MoarVM/MoarVM/commit/af5e9158c72511d390a22a9450d874ec9e7516d0 has some work in that area
[09:18] *** TempIRCLogger left
[09:18] *** TempIRCLogger joined
[09:26] *** TempIRCLogger left
[09:26] *** TempIRCLogger joined
[09:33] *** TempIRCLogger left
[09:33] *** TempIRCLogger joined
[09:33] interesting: with that commit cherry-picked and deconflicted with HEAD of master, `raku --full-cleanup -e 'for ^1_000 { (^100).race(batch=>1).map({ $_ }).List }'` now gives a bunch of `No exception handler located for catch` with a rakudo backtrace
[09:38] *** TempIRCLogger left
[09:38] *** TempIRCLogger joined
[09:42] Isn't the abort also in the "the only real fix is to make our own locks" camp?
[09:44] While looking at the profile, I was surprised to see that over a quarter of the time the spesh worker thread spends is on spesh stats cleanup
[09:45] Trying a couple of changes to see if we can get that down, since if we can, we can spend more time doing useful work on the specializer thread
[09:52] jnthnwrthngtn: the alternative fix is to properly clean up all our threads. The more_asan_fixes branches contain my work in that direction. IIRC it works rather well and personally I'm fine with the approach. I figured this is something to request your architect's hat on when new-disp and RakuAST are done
[09:52] re making our own locks i have no idea; i don't think i've heard that mentioned before
[09:53] nine: What's in the scope of "our threads", though? The ones that MoarVM itself creates, such as spesh and event loop?
[09:54] MasterDuke: I've normally mentioned it as "the JVM approach"
[09:54] Which may also ring a bell
[09:54] ah
[09:55] jnthnwrthngtn: also the worker threads the ThreadPoolScheduler creates. I added signalling (queue returns a VMNull) so those know when to exit. Of course if users create their own low-level threads, all bets would be off
[09:56] nine: what would be needed to make that work in Telemetry?
[09:56] (as it creates its own thread)
[09:56] or is that already covered by the END block it has?
[09:57] lizmat: I think the END block already does the trick
[09:57] oki!
[10:05] jnthnwrthngtn: the other "hack" would be to have another thread to dispose of stats (optionally enabled), on the assumption that most code isn't using as many cores as modern CPUs provide. But this feels like admitting defeat.
[10:07] Nicholas: The problem seems perhaps more that it's just really wasteful.
[10:07] yes. that was the "admitting defeat" part.
[10:23] Goodbye to another 4 billion cycles
[10:24] spesh stats cleanup goes from 2.4% of CPU cycles of the process (all on the spesh thread, so a huge portion of its cycles) to just 0.12%
[10:24] \o/
[10:24] I'd never really looked at this before
[10:25] "in parallel universes in constant time" - clearly Damian's solution to this problem would have been use more 'universes';
[10:26] nice
[10:35] ¦ MoarVM/new-disp: 085c05c5ee | (Jonathan Worthington)++ | 3 files
[10:35] ¦ MoarVM/new-disp: Greatly decrease cost of spesh stats cleanup
[10:35] ¦ MoarVM/new-disp:
[10:35] ¦ MoarVM/new-disp: This takes it from around 2.4% of CPU cycles in CORE.setting compilation
[10:35] ¦ MoarVM/new-disp: (all of them on the spesh thread, and so more like 25% of the time that
[10:35] ¦ MoarVM/new-disp: the spesh thread spends) to around 0.12%, measured in instruction
[10:35] ¦ MoarVM/new-disp: fetches using callgrind.
[10:35] ¦ MoarVM/new-disp:
[10:35] ¦ MoarVM/new-disp: <…commit message has 9 more lines…>
[10:35] ¦ MoarVM/new-disp: review: https://github.com/MoarVM/MoarVM/commit/085c05c5ee
[10:35] This wasn't even what I was meant to be improving, just spotted it last night when staring at profile output.
[10:40] And yay, the thing I was hoping would get us another big win on CORE.setting build time does
[10:41] oh!?
[10:41] jnthnwrthngtn: why don't you continue staring for a bit longer :)
[10:41] Stage parse is still longer than master, but optimize and mast and mbc are all faster than master
[10:42] Total 51.818 on new-disp, total 50.395 on master
[10:42] very cool
[10:42] And there's still plenty of things to tune up yet
[10:43] nitrous kits and go faster stripes? :-)
[10:48] ¦ MoarVM/new-disp: 4ddb32cb70 | (Jonathan Worthington)++ | 3 files
[10:48] ¦ MoarVM/new-disp: Add syscall to check if a capture value is literal
[10:48] ¦ MoarVM/new-disp:
[10:48] ¦ MoarVM/new-disp: For use when writing dispatch programs. Capture values may be literal
[10:48] ¦ MoarVM/new-disp: either because they were originally marked that way in the original
[10:48] ¦ MoarVM/new-disp: arguments capture, or because we inserted a literal into the capture
[10:48] ¦ MoarVM/new-disp: (which may have happened in a dispatcher that delegated to us).
[10:48] ¦ MoarVM/new-disp:
[10:48] ¦ MoarVM/new-disp: <…commit message has 10 more lines…>
[10:48] ¦ MoarVM/new-disp: review: https://github.com/MoarVM/MoarVM/commit/4ddb32cb70
[10:52] *** evalable6 left
[10:52] *** linkable6 left
[10:52] *** evalable6 joined
[10:53] That plus the NQP commit make the big difference.
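(Editor's note: nine's remark at 09:55 — the scheduler puts a VMNull on the work queue so worker threads know when to exit — describes a common "shutdown sentinel" pattern. Below is a minimal, self-contained C sketch of that pattern under stated assumptions; the queue, names, and structure are purely illustrative and are not MoarVM's or Rakudo's actual code.)

```c
/* Sketch of the "sentinel in the work queue" shutdown pattern nine describes.
 * Hypothetical names; compile with -pthread. */
#include <pthread.h>
#include <stdio.h>

#define QUEUE_CAP 64
#define SENTINEL  ((void *)-1)   /* stands in for the VMNull the scheduler enqueues */

typedef struct {
    void            *items[QUEUE_CAP];
    int              head, tail, count;
    pthread_mutex_t  lock;
    pthread_cond_t   nonempty;
} work_queue;

static void queue_init(work_queue *q) {
    q->head = q->tail = q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->nonempty, NULL);
}

static void queue_push(work_queue *q, void *item) {
    pthread_mutex_lock(&q->lock);
    q->items[q->tail] = item;               /* assume the queue never overflows here */
    q->tail = (q->tail + 1) % QUEUE_CAP;
    q->count++;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->lock);
}

static void *queue_pop(work_queue *q) {
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->nonempty, &q->lock);
    void *item = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    pthread_mutex_unlock(&q->lock);
    return item;
}

/* Worker loop: process items until the shutdown sentinel arrives, then return,
 * so the thread can be joined cleanly instead of being torn down mid-wait. */
static void *worker(void *arg) {
    work_queue *q = arg;
    for (;;) {
        void *item = queue_pop(q);
        if (item == SENTINEL)
            break;                           /* orderly exit: nothing left dangling */
        printf("processing item %p\n", item);
    }
    return NULL;
}

int main(void) {
    work_queue q;
    queue_init(&q);
    pthread_t t;
    pthread_create(&t, NULL, worker, &q);
    int work[3];
    for (int i = 0; i < 3; i++)
        queue_push(&q, &work[i]);
    queue_push(&q, SENTINEL);                /* tell the worker to finish up */
    pthread_join(&t, NULL);                  /* full cleanup can now proceed safely */
    return 0;
}
```

The point of the pattern is that shutdown becomes an ordinary queue item, so a worker blocked in the queue wakes up and exits on its own rather than having to be killed from outside.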
[10:54] *** linkable6 joined
[11:20] We actually have fewer CPU instruction fetches than master by this point
[11:21] 322,606,102,028 on new-disp vs 330,702,914,420 on master
[11:23] cool. Quick, TODO the remaining failing tests and ship it! :-)
[11:25] but if the wallclock is still slightly high, then they must be slower instructions? or more branch misses?
[11:25] Not quite so fast, alas. Since we don't yet translate dispatch programs that set up resumption init state, we'll not be running Raku programs too optimally yet
[11:26] MasterDuke: Yes, something along those lines. One obvious target for improvement is the dispatch program running, which currently is done with a `switch`, but could be a computed goto
[11:26] Well, using the same "if it's available" trick
[12:03] *** reportable6 left
[12:53] I guess in the long term we want to JIT the dispatch programs?
[12:55] nine: We effectively do for any hot ones: they are translated into guard ops, sp_runbytecode, and similar, and then the JIT processes those
[12:56] Well, hot and monomorphic so far; however, the goal is to translate polymorphic sites too
[12:57] But it's still worth interpreting them faster before we get to that point
[13:14] ¦ MoarVM/new-disp: 0e53708422 | (Jonathan Worthington)++ | src/disp/program.c
[13:14] ¦ MoarVM/new-disp: Use interned callsite during dispatch record
[13:14] ¦ MoarVM/new-disp:
[13:14] ¦ MoarVM/new-disp: Otherwise, if we invoke something once - for example the main body of a
[13:14] ¦ MoarVM/new-disp: program - we end up with an uninterned callsite, which in turn means
[13:14] ¦ MoarVM/new-disp: that OSR's specializations come out far worse.
[13:14] ¦ MoarVM/new-disp: review: https://github.com/MoarVM/MoarVM/commit/0e53708422
[13:19] timo: ^^ sorts out OSR
[13:20] Gets us rather better at the empty range loop again :)
[13:35] ¦ MoarVM/new-disp: c7efea857e | (Jonathan Worthington)++ | src/disp/program.c
[13:35] ¦ MoarVM/new-disp: Add missing dispatch program op names
[13:35] ¦ MoarVM/new-disp: review: https://github.com/MoarVM/MoarVM/commit/c7efea857e
[13:35] ¦ MoarVM/new-disp: a1f52bd276 | (Jonathan Worthington)++ | src/spesh/disp.c
[13:35] ¦ MoarVM/new-disp: Comment further reasons for DP non-translation
[13:35] ¦ MoarVM/new-disp: review: https://github.com/MoarVM/MoarVM/commit/a1f52bd276
[13:53] ¦ MoarVM/new-disp: 329fa73c64 | (Jonathan Worthington)++ | src/spesh/disp.c
[13:53] ¦ MoarVM/new-disp: Translate object literal non-match guards in DPs
[13:53] ¦ MoarVM/new-disp:
[13:53] ¦ MoarVM/new-disp: Also, since we have a guard for object literal, use that rather than
[13:53] ¦ MoarVM/new-disp: producing a sequence of ops.
[13:53] ¦ MoarVM/new-disp: review: https://github.com/MoarVM/MoarVM/commit/329fa73c64
[13:55] That gets us translating typical cases of raku-assign
[14:00] *** frost left
[14:05] *** reportable6 joined
[14:43] ¦ MoarVM/new-disp: 11a1c9de6f | (Jonathan Worthington)++ | src/spesh/disp.c
[14:43] ¦ MoarVM/new-disp: Translate DPs with non-tail arguments
[14:43] ¦ MoarVM/new-disp:
[14:43] ¦ MoarVM/new-disp: That is, those that rewrite the argument capture in a more complex way
[14:43] ¦ MoarVM/new-disp: than just chopping some arguments off the start for the eventual
[14:43] ¦ MoarVM/new-disp: invocation of bytecode or C function.
[14:43] ¦ MoarVM/new-disp: review: https://github.com/MoarVM/MoarVM/commit/11a1c9de6f
[14:45] That covers raku-capture-lex...
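(Editor's note: to illustrate the switch-vs-computed-goto point made at 11:26, here is a generic C sketch of the two interpretation styles, with the computed-goto path guarded by the same "use it if the compiler offers it" check. The ops, labels, and tiny op stream are invented for the example; this is not MoarVM's dispatch program interpreter.)

```c
/* A toy interpreter over a small op stream, dispatched either by computed
 * goto (GCC/Clang extension) or by a portable switch fallback. */
#include <stdio.h>

enum { OP_ADD, OP_PRINT, OP_END };

static void run(const int *ops) {
    long acc = 0;
#if defined(__GNUC__) || defined(__clang__)
    /* Computed goto: one indirect jump per op, no re-dispatch through a common
     * switch head, which tends to predict better on op-heavy workloads. */
    static void *labels[] = { &&do_add, &&do_print, &&do_end };
    #define DISPATCH() goto *labels[*ops++]
    DISPATCH();
  do_add:   acc += 1;               DISPATCH();
  do_print: printf("%ld\n", acc);   DISPATCH();
  do_end:   return;
#else
    /* Portable fallback: a plain switch in a loop. */
    for (;;) {
        switch (*ops++) {
            case OP_ADD:   acc += 1;             break;
            case OP_PRINT: printf("%ld\n", acc); break;
            case OP_END:   return;
        }
    }
#endif
}

int main(void) {
    const int program[] = { OP_ADD, OP_ADD, OP_PRINT, OP_END };
    run(program);   /* prints 2 either way */
    return 0;
}
```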
[14:48] And now it really is all on the resumption init state
[14:48] Which is the hard part, of course
[15:15] I do get some extra test failures now. Am I the only one? E.g. t/spec/S32-list/categorize-list.t and t/spec/S02-literals/numeric.t
[15:15] wow. i was complaining about pypandoc a while ago, just did some testing today. in a windows vm (which admittedly is slow) it takes pypandoc 6s to convert a tiny .md to .rst. on my linux desktop (which admittedly is a bit faster) it takes 0.2s
[15:16] each pypandoc call executes the pandoc binary three times. i thought windows had gotten much faster at starting processes recently (i know it used to be slow compared to linux), but maybe not?
[15:17] the errors stem from the latest commit it seems, i.e. 11a1c9de6f
[15:33] dogbert11: Hm, odd. I can reproduce it, at least
[15:35] Got at least the first part of the design (up to enabling spesh linking) for dispatch programs with resumptions figured out.
[15:35] Well, and roughly the inlining one too
[15:35] It's pretty much all deopt-hard
[15:37] (Not that we actually deopt on resumption, just that looking for resumptions - if we want to keep the cost of a lack of them about zero - means figuring out the invocation chain using the callstack)
[15:45] jnthnwrthngtn: golf from one of the files: for (^0x0FFF).grep({ .uniprop eq 'Nd' and .unival == 1|2|3 }).batch(3)>>.chr>>.join -> $string { }
[15:46] Ah, thanks
[15:47] Language class, afk for a bit
[16:47] How on earth can we just somehow forget to call some 0.0077% of destructors (or leak that many objects, still not sure)?
[16:47] Easily reproducible: MVM_SPESH_DISABLE=1 rakudo -e 'my atomicint $t = 0; my atomicint $l = 0; class Foo { submethod BUILD() { $t⚛++; $l⚛++; }; submethod DESTROY() { $l⚛--; note "$t $l" } }; loop { Foo.new; put "."; }'
[16:49] The second number is the live counter. It goes up, up, up, then GC kicks in, we get output of the total number and the live number, and from GC run to GC run the lowest live number gets larger and larger and stays at about 0.0077% of the total counter
[16:52] nursery migration leakage?
[16:55] Actually... making the nursery smaller (e.g. 4K) raises the leakage rate to about 2%
[16:58] which means more migration, no?
[16:59] yes
[18:02] *** reportable6 left
[18:03] *** reportable6 joined
[18:14] I have no guesses, alas
[18:15] I'm not too happy with how finalization running is scheduled at the moment, tbh
[18:16] It uses the special return mechanism I'd like to boot out in the future... in a way that doesn't really map neatly to the new callstack model
[18:16] I think I've mentioned the idea of a finalizer thread in the past, but iirc there were concerns about the concurrency introduced by that
[18:20] A finalizer thread would pretty much kill multi-threaded Inline::Perl5
[18:22] and that's one of the USPs of Inline::Perl5 :-)
[18:22] or any other Inline:: of a system that doesn't have a good concurrency model
[18:24] nine: Kill as in "make it impossible to implement" or "kill it given the current implementation"?
[18:26] What I currently have is automatic support for use of Perl modules in e.g. Cro applications or the like, basically for non-communicating worker threads. Everything is fine as long as we only ever call methods on Perl objects on the thread that they were created on. Behind the scenes I create an Inline::Perl5 instance and attach it to $*THREAD
[18:27] The current finalizer implementation with finalizers called on the owning threads works very well with that.
[18:28] I wouldn't know how to deal with DESTROY methods in any other way. And they are essential for reference counting
[18:28] OK, hmm.
[18:28] Guess we'll have to find some kinda way to deal with it, then.
[18:29] We're definitely running the finalize_handler_caller much less often than we set it up
[18:29] Maybe something else steals the special return slot?
[18:35] Can't find anything. But even if something did, the objects to finalize stay in the queue and the next handler run should pick them up. It is quite common to have objects in tc->finalizing at the start of walk_thread_finalize_queue
[18:40] Btw. we're kinda relying on NQP never gaining a finalize_handler there
[18:42] *** rakugest joined
[18:45] I'm pretty sure now that we're not actually leaking those objects, but failing to run their finalizers. And we do so because we do not even find them in the finalize list
[18:45] *** discord-raku-bot left
[18:46] *** discord-raku-bot joined
[18:47] ¦ MoarVM/new-disp: dc4f49e3d0 | (Jonathan Worthington)++ | src/spesh/disp.c
[18:47] ¦ MoarVM/new-disp: Correct translation of copy args in DPs
[18:47] ¦ MoarVM/new-disp:
[18:47] ¦ MoarVM/new-disp: The source argument index calculation needs to use the number of
[18:47] ¦ MoarVM/new-disp: arguments to the original dispatch.
[18:47] ¦ MoarVM/new-disp: review: https://github.com/MoarVM/MoarVM/commit/dc4f49e3d0
[18:47] That seems to fix the regressions from 11a1c9de6f
[18:48] The case I debugged is one of the cute things new-disp lets us do: a multiple dispatch with a Junction argument is specialized into a runbytecode of Junction.AUTOTHREAD(&code, arg1, arg2)
[18:49] That is, we call directly to the junction auto-threader
[18:49] No intermediate closure slurping and whatever
[18:49] cool, so we don't need a lot of these global SUBS anymore either, we can make them class methods easily?
[18:50] like &DYNAMIC?
[18:50] Well, or stick them under Rakudo::Internals or some such
[18:50] Though &DYNAMIC is already changed in rakuast, fwiw
[18:51] ah, ok, cool!
[18:51] Though I think just to a different sub
[18:51] But I code-gen the op to do the normal lookup right off, and only call the sub form if we need to do global fallback
[18:51] And pre-calculate the name without the * for that
[18:52] But yeah, we could dispatcher up the fallback and get smaller bytecode and put the resolver elsewhere
[18:54] as long as dynvar lookup becomes faster :-)
[18:54] OMG I got it!
[18:55] https://github.com/MoarVM/MoarVM/blob/master/src/gc/finalize.c#L87
[18:56] I'm glad that you have because I haven't
[18:56] We're looking at nursery objects, and only if we do a gen2 collection do we also consider gen2 objects. BUT this loop actually rewrites the finalize list, so items we do not even consider will silently get dropped
[18:57] aha.
[18:57] oops
[18:57] that could be a significant leak on long-running processes, no?
[18:58] That's why the small nursery intensifies the problem. More objects make it into gen2 and have a chance of falling off the finalizer list
[18:58] lizmat: leak memory not so much. But DESTROY methods won't get called when they should. So external resources (like Perl objects) will leak.
[18:58] Or file handles!
[18:58] indeed.. :-)
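(Editor's note: a minimal C illustration of the bug shape nine describes at 18:55-18:56 — a loop that compacts a worklist in place but only writes back the entries it chose to consider, so everything it skips is silently dropped. The types and the "still alive" test are made up for the example; see src/gc/finalize.c for the real code.)

```c
/* Hypothetical finalize-list compaction: buggy vs. fixed filtering in place. */
#include <stdio.h>

#define IN_GEN2 1

typedef struct { int gen2; int id; } obj;

/* Buggy: when gen2 entries are not being considered this round, they are
 * neither processed nor copied back, so they vanish from the list. */
static int filter_buggy(obj *list, int n, int collecting_gen2) {
    int kept = 0;
    for (int i = 0; i < n; i++) {
        if (list[i].gen2 && !collecting_gen2)
            continue;                 /* skipped AND dropped: never written back */
        if (list[i].id % 2 == 0)      /* pretend even ids still need finalizing */
            list[kept++] = list[i];
    }
    return kept;
}

/* Fixed: entries that are out of scope this round are kept as-is. */
static int filter_fixed(obj *list, int n, int collecting_gen2) {
    int kept = 0;
    for (int i = 0; i < n; i++) {
        if (list[i].gen2 && !collecting_gen2) {
            list[kept++] = list[i];   /* not considered, but must survive */
            continue;
        }
        if (list[i].id % 2 == 0)
            list[kept++] = list[i];
    }
    return kept;
}

int main(void) {
    obj a[] = { {0, 2}, {IN_GEN2, 3}, {0, 4}, {IN_GEN2, 6} };
    obj b[] = { {0, 2}, {IN_GEN2, 3}, {0, 4}, {IN_GEN2, 6} };
    printf("buggy keeps %d, fixed keeps %d\n",
           filter_buggy(a, 4, 0), filter_fixed(b, 4, 0));   /* 2 vs 4 */
    return 0;
}
```

This also matches the observation above that a smaller nursery makes it worse: more objects get promoted to gen2, so more entries fall into the "skipped and dropped" branch on nursery-only runs.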
[18:59] huh, i think Xliff has had a couple of cases of running out of file handles, maybe this was why
[19:01] Maybe, though if they're seeing it on Mac, it can also be because the default file handle limit can be quite low
[19:03] nine: perhaps this will get fixed: https://github.com/Raku/old-issue-tracker/issues/4420
[19:09] I think that this one is still current:
[19:09] src/vm/moar/ops/perl6_ops.c:15:67: warning: excess elements in struct initializer
[19:09] static MVMCallsite no_arg_callsite = { NULL, 0, 0, 0, 0, 0, NULL, NULL };
[19:09] ^~~~
[19:17] Ooooh... such stable memory usage, making me much happy
[19:27] Just toss one of the NULLs
[19:28] I realise this, but I didn't have permission to do it directly
[19:43] committable6: 2021.05,2021.06 use nqp; for ^5_000 { (^100).race(batch=>1).map({ $_ }).List }; my int @a; nqp::getrusage(@a); if @a[4] > 0 { say @a[4] } else { EVAL q|use nqp; say nqp::atpos_i(nqp::getrusage(), 4)| }; # interesting
[19:43] MasterDuke, ¦2021.05: «436328␤» ¦2021.06: «275640␤»
[20:04] MasterDuke: Any particular reason for that gigantic drop in MAXRSS? (The only thing that occurs to me off the top of my head is the improved scheduling of short tasks, but that's more about time than space)
[20:05] Although I suppose not building up a really long work queue could account for that...
[20:05] no idea
[20:09] https://github.com/MoarVM/MoarVM/blob/master/docs/ChangeLog#L48 maybe?
[20:29] Nicholas: ah, right, I forgot that
[20:31] Pushed
[21:15] jnthnwrthngtn: don't know if you've looked at it much yet, but on new-disp `valgrind --leak-check=full raku --full-cleanup -e ''` reports two invalid frees and a whole lot of definitely and indirectly lost bytes
[21:24] i need something not terribly deep today, i think i will have a quick look to see if i can make a dent in this
[21:25] looks like it's all dispatch programs and recordings
[21:26] fascinating, translate dispatch program allocates something that's not getting freed, maybe we're using alloc where spesh alloc could work, or we store stuff in a spot we're not considering during cleanup
[21:33] we must be missing the annotation that keeps the temps to be freed around for after optimizing runbytecode
[21:41] got a lead
[22:09] yeah, the "end of translated disp program" comment is getting put in front of the "clear these temps please" annotation, and the code that attempts to clear that annotation and free the array only looks at the very first spot
[22:14] now i see some deopt annotations at the spot where we're looking for this annotation
[22:14] so just removing the comment was not yet enough
[22:17] something else may be up
[22:20] ha, i got it now
[22:23] nice
[22:26] ¦ MoarVM/new-disp: 370a8ea33d | (Timo Paulssen)++ | src/spesh/disp.c
[22:26] ¦ MoarVM/new-disp: fix adding clear temps annotation
[22:26] ¦ MoarVM/new-disp: review: https://github.com/MoarVM/MoarVM/commit/370a8ea33d
[22:26] ¦ MoarVM/new-disp: 5180ae0861 | (Timo Paulssen)++ | src/spesh/optimize.c
[22:26] ¦ MoarVM/new-disp: also clear temps for runcfunc ops
[22:26] ¦ MoarVM/new-disp: review: https://github.com/MoarVM/MoarVM/commit/5180ae0861
[22:27] can you verify this gets rid of the leaked allocations that came from spesh?
[22:27] there's still a bunch from process_recording
[23:03] *** rakugest left
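(Editor's note: one way to picture the mismatch timo describes at 22:09 — code that only inspects the very first annotation on an instruction misses the "clear these temps" marker once a comment annotation ends up in front of it. The structures and lookup below are hypothetical illustrations, not MoarVM's spesh types; the actual fix landed as commit 370a8ea33d above.)

```c
/* Hypothetical annotation lookup: head-only check vs. walking the whole list. */
#include <stdio.h>

enum ann_kind { ANN_COMMENT, ANN_CLEAR_TEMPS };

typedef struct annotation {
    enum ann_kind       kind;
    struct annotation  *next;
} annotation;

/* Buggy lookup: only ever inspects the first annotation on the instruction. */
static annotation *find_clear_temps_buggy(annotation *head) {
    return head && head->kind == ANN_CLEAR_TEMPS ? head : NULL;
}

/* Fixed lookup: walk the chain until the wanted kind turns up (or doesn't). */
static annotation *find_clear_temps(annotation *head) {
    for (annotation *a = head; a; a = a->next)
        if (a->kind == ANN_CLEAR_TEMPS)
            return a;
    return NULL;
}

int main(void) {
    annotation clear   = { ANN_CLEAR_TEMPS, NULL };
    annotation comment = { ANN_COMMENT, &clear };   /* comment ends up in front */
    printf("buggy: %s, fixed: %s\n",
           find_clear_temps_buggy(&comment) ? "found" : "missed",
           find_clear_temps(&comment)       ? "found" : "missed");
    return 0;
}
```

If the marker is never found, the temporaries it was supposed to release are never freed, which lines up with the dispatch-program allocations valgrind reported as leaked at 21:15.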