#moarvm on 26 August 2021 - Raku Programming Language Log

Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes. Set by lizmat on 24 May 2021.
00:02 reportable6 left 00:05 reportable6 joined 01:58 greppable6 left, evalable6 left, linkable6 left 01:59 greppable6 joined, evalable6 joined 02:01 nine left, nine joined 03:01 quotable6 left, squashable6 left, evalable6 left, tellable6 left, committable6 left, benchable6 left, greppable6 left, nativecallable6 left, coverable6 left, sourceable6 left, reportable6 left, bisectable6 left, shareable6 left, unicodable6 left, bloatable6 left, notable6 left, statisfiable6 left, releasable6 left 03:02 squashable6 joined, statisfiable6 joined, tellable6 joined, nativecallable6 joined 03:03 coverable6 joined, sourceable6 joined, unicodable6 joined 03:04 reportable6 joined, greppable6 joined, releasable6 joined 04:02 benchable6 joined, evalable6 joined, bisectable6 joined 04:04 shareable6 joined 05:03 committable6 joined 05:06 Util joined 05:59 linkable6 joined 06:02 reportable6 left 06:03 quotable6 joined 07:02 bloatable6 joined 07:03 reportable6 joined 07:15 frost joined 07:47 MasterDuke joined 08:03 notable6 joined 08:22 Altai-man left 08:28 Altai-man joined
MasterDuke	hm. maybe all those times i saw leaks in --full-cleanup it was because of the sigabrt from gc_free (ConcBlockingQueue.c:80), but i didn't notice the abort had happened	09:06
jnthnwrthngtn	moarning o/	09:08
MasterDuke	timo, nine, jnthnwrthngtn: any suggestions for how to handle those sigabrts?
Nicholas	\o	09:11
MasterDuke	oh github.com/MoarVM/MoarVM/commit/af...ec9e7516d0 has some work in that area
09:18 TempIRCLogger left, TempIRCLogger joined 09:26 TempIRCLogger left, TempIRCLogger joined 09:33 TempIRCLogger left, TempIRCLogger joined
MasterDuke	interesting, with that commit cherry-picked and deconflicted with HEAD of master, `raku --full-cleanup -e 'for ^1_000 { (^100).race(batch=>1).map({ $_ }).List }'` now gives a bunch of `No exception handler located for catch` with a rakudo backtrace	09:33
09:38 TempIRCLogger left, TempIRCLogger joined
jnthnwrthngtn	Isn't the abort also in the "the only real fix is to make our own locks" camp?	09:42
	While looking at the profile, I was surprised to see that over a quarter of the time the spesh worker thread spends is on spesh stats cleanup	09:44
	Trying a couple of changes to see if we can get that down, since if we can we can spend more time doing useful work on the specializer thread	09:45
nine	jnthnwrthngtn: the alternative fix is to properly clean up all our threads. The more_asan_fixes branches contain my work in that direction. IIRC it works rather well and personally I'm fine with the approach. I figured this is something to request your architect's hat on when new-disp and RakuAST are done	09:52
MasterDuke	re making our own locks i have no idea, don't think i've heard that mentioned before
jnthnwrthngtn	nine: What's in the scope of "our threads", though? The ones that MoarVM itself creates, such as spesh and event loop?	09:53
	MasterDuke: I've normally mentioned it as "the JVM approach"	09:54
	Which may also ring a bell
MasterDuke	ah
nine	jnthnwrthngtn: also the worker threads the ThreadPoolScheduler creates. I added signalling (queue returns a VMNull) so those know when to exit. Of course if users create their own low level threads, all bets would be off	09:55
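The exit signalling nine describes (a queue value that tells a worker to stop) is the usual sentinel pattern. What follows is only a minimal sketch with pthreads, not MoarVM's code: `WorkQueue`, `queue_push`, `queue_pop` and `run_workers_until_sentinel` are hypothetical names, and a plain `NULL` stands in for the VMNull mentioned above.

```c
#include <pthread.h>
#include <stddef.h>

#define QUEUE_CAP   64
#define MAX_WORKERS 8

/* Hypothetical fixed-capacity blocking queue. A NULL item plays the role
 * of the "please exit" sentinel. */
typedef struct {
    void           *items[QUEUE_CAP];
    size_t          head, tail, count;
    int             processed;
    pthread_mutex_t lock;
    pthread_cond_t  not_empty;
} WorkQueue;

/* Sketch only: the caller guarantees that the queue never overflows. */
static void queue_push(WorkQueue *q, void *item) {
    pthread_mutex_lock(&q->lock);
    q->items[q->tail] = item;
    q->tail = (q->tail + 1) % QUEUE_CAP;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Blocks until an item (or a sentinel) is available. */
static void *queue_pop(WorkQueue *q) {
    void *item;
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    item = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    pthread_mutex_unlock(&q->lock);
    return item;
}

static void *worker(void *arg) {
    WorkQueue *q = arg;
    for (;;) {
        void *item = queue_pop(q);
        if (item == NULL)
            break;              /* sentinel: return so we can be joined */
        pthread_mutex_lock(&q->lock);
        q->processed++;         /* stand-in for doing the real work */
        pthread_mutex_unlock(&q->lock);
    }
    return NULL;
}

/* Starts n_workers threads, feeds them n_items tasks, then sends one
 * sentinel per started worker and joins them all.
 * Returns the number of processed items, or -1 on bad input or failure. */
static int run_workers_until_sentinel(int n_workers, int n_items) {
    static int dummy_task;
    WorkQueue  q;
    pthread_t  threads[MAX_WORKERS];
    int        started = 0, i;

    if (n_workers < 1 || n_workers > MAX_WORKERS || n_items < 0
            || n_items + n_workers > QUEUE_CAP)
        return -1;

    q.head = q.tail = q.count = 0;
    q.processed = 0;
    pthread_mutex_init(&q.lock, NULL);
    pthread_cond_init(&q.not_empty, NULL);

    for (i = 0; i < n_workers; i++)
        if (pthread_create(&threads[started], NULL, worker, &q) == 0)
            started++;
    if (started > 0) {
        for (i = 0; i < n_items; i++)
            queue_push(&q, &dummy_task);
        for (i = 0; i < started; i++)
            queue_push(&q, NULL);   /* one sentinel per worker */
        for (i = 0; i < started; i++)
            pthread_join(threads[i], NULL);
    }

    pthread_mutex_destroy(&q.lock);
    pthread_cond_destroy(&q.not_empty);
    return started > 0 ? q.processed : -1;
}
```

Because the queue is FIFO and the sentinels are pushed last, every real item is popped before any worker sees a sentinel, so `run_workers_until_sentinel(4, 20)` should leave no thread blocked at shutdown. As nine notes, this only covers threads that actually read from such a queue.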
lizmat	nine: what would be needed to make that work in Telemetry ?	09:56
	(as it creates its own thread)
	or is that already covered by the END block it has ?
nine	lizmat: I think the END block already does the trick	09:57
lizmat	oki!
Nicholas	jnthnwrthngtn: the other "hack" would be to have another thread to dispose of stats. (Optionally enabled) On the assumption that most code isn't using as many cores as modern CPUs provide. But this feels like admitting defeat.	10:05
jnthnwrthngtn	Nicholas: The problem seems perhaps more that it's just really wasteful.	10:07
Nicholas	yes. that was the "admitting defeat" part.
jnthnwrthngtn	Goodbye to another 4 billion cycles	10:23
	spesh stats cleanup goes from 2.4% of CPU cycles of the process (all on the spesh thread, so a huge portion of its cycles) to just 0.12%	10:24
Nicholas	\o/
jnthnwrthngtn	I'd never really looked at this before
Nicholas	"in parallel universes in constant time" - clearly Damian's solution to this problem would have been to use more 'universes';	10:25
MasterDuke	nice	10:26
Geth	MoarVM/new-disp: 085c05c5ee | (Jonathan Worthington)++ | 3 files Greatly decrease cost of spesh stats cleanup This takes it from around 2.4% of CPU cycles in CORE.setting compilation (all of them on the spesh thread, and so more like 25% of the time that the spesh thread spends) to around 0.12%, measured in instruction fetches using callgrind. ... (9 more lines)	10:35
jnthnwrthngtn	This wasn't even what I was meant to be improving, just spotted it last night when staring at profile output.
	And yay, the thing I was hoping would get us another big win on CORE.setting build time does	10:40
MasterDuke	oh!?	10:41
dogbert11	jnthnwrthngtn: why don't you continue staring for a bit longer :)
jnthnwrthngtn	Stage parse is still longer than master, but optimize and mast and mbc are all faster than master
	Total 51.818 on new-disp, total 50.395 on master	10:42
MasterDuke	very cool
jnthnwrthngtn	And there's still plenty of things to tune up yet
Nicholas	nitrous kits and go faster stripes? :-)	10:43
Geth	MoarVM/new-disp: 4ddb32cb70 | (Jonathan Worthington)++ | 3 files Add syscall to check if a capture value is literal For use when writing dispatch programs. Capture values may be literal either because they were originally marked that way in the original arguments capture, or because we inserted a literal into the capture (which may have happened in a dispatcher that delegated to us). ... (10 more lines)	10:48
10:52 evalable6 left, linkable6 left, evalable6 joined
jnthnwrthngtn	That plus the NQP commit make the big difference.	10:53
10:54 linkable6 joined
jnthnwrthngtn	We actually have fewer CPU instruction fetches than master by this point	11:20
	322,606,102,028 on new-disp vs 330,702,914,420 on master	11:21
Nicholas	cool. Quick, TODO the remaining failing tests and ship it! :-)	11:23
MasterDuke	but if the wallclock is still slightly high, then they must be slower instructions? or more branch misses?	11:25
jnthnwrthngtn	Not quite so fast, alas. Since we don't yet translate dispatch programs that set up resumption init state, we'll not be running Raku programs too optimally yet
	MasterDuke: Yes, something along those lines. One obvious target for improvement is the dispatch program running, which currently is done with a `switch`, but could be a computed goto	11:26
	Well, using the same "if it's available" trick
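For readers who have not seen the `switch` versus computed goto choice, here is a rough sketch of an interpreter loop that uses computed goto when the compiler offers the GNU "labels as values" extension and falls back to `switch` otherwise. The three-op mini-language, `run_program` and `USE_COMPUTED_GOTO` are made up for illustration and have nothing to do with the real dispatch program ops or MoarVM's build configuration.

```c
#include <stddef.h>

/* Hypothetical ops, standing in for dispatch program ops. */
enum { OP_INC = 0, OP_DOUBLE = 1, OP_HALT = 2 };

/* The "if it's available" part: only GCC-compatible compilers support
 * taking the address of a label with &&label. */
#if defined(__GNUC__) || defined(__clang__)
#define USE_COMPUTED_GOTO 1
#else
#define USE_COMPUTED_GOTO 0
#endif

/* Runs ops (which must end with OP_HALT) starting from acc. */
static long run_program(const unsigned char *ops, long acc) {
#if USE_COMPUTED_GOTO
    /* One indirect jump per op, placed at the end of each handler, instead
     * of a single shared jump at the top of a loop. Note that this variant
     * trusts the op stream and does no range check. */
    static void *labels[] = { &&do_inc, &&do_double, &&do_halt };
#define NEXT goto *labels[*ops++]
    NEXT;
do_inc:
    acc += 1;
    NEXT;
do_double:
    acc *= 2;
    NEXT;
do_halt:
    return acc;
#undef NEXT
#else
    for (;;) {
        switch (*ops++) {
            case OP_INC:    acc += 1; break;
            case OP_DOUBLE: acc *= 2; break;
            case OP_HALT:   return acc;
            default:        return acc; /* unknown op: stop */
        }
    }
#endif
}
```

For example, running `{ OP_INC, OP_DOUBLE, OP_INC, OP_HALT }` with a starting value of 1 computes (1 + 1) * 2 + 1. Whether the per-handler jump actually helps branch prediction depends on the CPU and compiler, so it would need measuring.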
12:03 reportable6 left
nine	I guess in the long term we want to JIT the dispatch programs?	12:53
jnthnwrthngtn	nine: We effectively do for any hot ones: they are translated into guard ops, sp_runbytecode, and similar, and then the JIT processes those	12:55
	Well, hot and monomorphic so far, however the goal is to translate polymorphic sites too	12:56
	But it's still worth interpreting them faster before we get to that point	12:57
Geth	MoarVM/new-disp: 0e53708422 | (Jonathan Worthington)++ | src/disp/program.c Use interned callsite during dispatch record Otherwise, if we invoke something once - for example the main body of a program - we end up with an uninterned callsite, which in turn means that OSR's specializations come out far worse.	13:14
jnthnwrthngtn	timo: ^^ sorts out OSR	13:19
	Gets us rather better at the empty range loop again :)	13:20
Geth	MoarVM/new-disp: c7efea857e | (Jonathan Worthington)++ | src/disp/program.c Add missing dispatch program op names	13:35
	MoarVM/new-disp: a1f52bd276 | (Jonathan Worthington)++ | src/spesh/disp.c Comment further reasons for DP non-translation
	MoarVM/new-disp: 329fa73c64 | (Jonathan Worthington)++ | src/spesh/disp.c Translate object literal non-match guards in DPs Also, since we have a guard for object literal, use that rather than producing a sequence of ops.	13:53
jnthnwrthngtn	That gets us translating typical cases of raku-assign	13:55
14:00 frost left 14:05 reportable6 joined
Geth	MoarVM/new-disp: 11a1c9de6f | (Jonathan Worthington)++ | src/spesh/disp.c Translate DPs with non-tail arguments That is, those that rewrite the argument capture in a more complex way than just chopping some arguments off the start for the eventual invocation of bytecode or C function.	14:43
jnthnwrthngtn	That covers raku-capture-lex...	14:45
	And now it really is all on the resumption init state	14:48
	Which is the hard part, of course
dogbert11	I do get some extra test failures now. Am I the only one? E.g. t/spec/S32-list/categorize-list.t and t/spec/S02-literals/numeric.t	15:15
MasterDuke	wow. i was complaining about pypandoc a while ago, just did some testing today. in a windows vm (which admittedly is slow) it takes pypandoc 6s to convert a tiny .md to .rst. on my linux desktop (which admittedly is a bit faster) it takes 0.2s
	each pypandoc call executes the pandoc binary three times. i thought windows had gotten much faster at starting processes recently (i know it used to be slow compared to linux), but maybe not?	15:16
dogbert11	the errors stem from the latest commit it seems, i.e. 11a1c9de6f	15:17
jnthnwrthngtn	dogbert11: Hm, odd. I can reproduce it, at least	15:33
	Got at least the first part of the design (up to enabling spesh linking) for dispatch programs with resumptions figured out.	15:35
	Well, and roughly the inlining one too
	It's pretty much all deopt-hard
	(Not that we actually deopt on resumption, just that looking for resumptions - if we want to keep the cost of a lack of them about zero - means figuring out the invocation chain using the callstack)	15:37
dogbert11	jnthnwrthngtn: golf from one of the files: for (^0x0FFF).grep({ .uniprop eq 'Nd' and .unival == 1|2|3 }).batch(3)>>.chr>>.join -> $string { }	15:45
jnthnwrthngtn	Ah, thanks	15:46
	Language class, afk for a bit	15:47
nine	How on earth can we just somehow forget to call some 0.0077 % of destructors (or leak that many objects, still not sure)?	16:47
	Easily reproducible: MVM_SPESH_DISABLE=1 rakudo -e 'my atomicint $t = 0; my atomicint $l = 0; class Foo { submethod BUILD() { $t⚛++; $l⚛++; }; submethod DESTROY() { $l⚛--; note "$t $l" } }; loop { Foo.new; put "."; }'
	The second number is the live counter. It goes up, up, up, then GC kicks in, we get output of the total number and the live number and from GC run to GC run the lowest live number gets larger and larger and stays at about 0.0077 % of the total counter	16:49
lizmat	nursery migration leakage ?	16:52
nine	Actually....making the nursery smaller (e.g. 4K) raises the leakage rate to about 2 %	16:55
lizmat	which means more migration, no ?	16:58
nine	yes	16:59
18:02 reportable6 left 18:03 reportable6 joined
jnthnwrthngtn	I have no guesses, alas	18:14
	I'm not too happy with how finalization running is scheduled at the moment, tbh	18:15
	It uses the special return mechanism I'd like to boot out in the future...in a way that doesn't really map neatly to the new callstack model	18:16
	I think I've mentioned the idea of a finalizer thread in the past, but iirc there were concerns about the concurrency introduced by that
nine	A finalizer thread would pretty much kill multi-threaded Inline::Perl5	18:20
lizmat	and that's one of the USPs of Inline::Perl5 :-)	18:22
	or any other Inline:: of a system that doesn't have a good concurrency model
jnthnwrthngtn	nine: Kill as in "make it impossible to implement" or "kill it given the current implementation"?	18:24
nine	What I currently have is automatic support for use of Perl modules in e.g. Cro applications or the like, basically for non-communicating worker threads. Everything is fine as long as we only ever call methods on Perl objects on the thread that they were created on. Behind the scenes I create an Inline::Perl5 instance and attach it to $*THREAD	18:26
	The current finalizer implementation with finalizers called on the owning threads works very well with that.	18:27
	I wouldn't know how to deal with DESTROY methods in any other way. And they are essential for reference counting	18:28
jnthnwrthngtn	OK, hmm.
	Guess we'll have to find some kinda way to deal with it, then.
nine	We're definitely running the finalize_handler_caller much less often than we set it up	18:29
jnthnwrthngtn	Maybe something else steals the special return slot?
nine	Can't find anything. But even if. The objects to finalize stay in the queue and the next handler run should pick them up. It is quite common to have objects in tc->finalizing at the start of walk_thread_finalize_queue	18:35
	Btw. we're kinda relying on NQP never gaining a finalize_handler there	18:40
18:42 rakugest joined
nine	I'm pretty sure now that we're not actually leaking those objects, but fail to run their finalizers. And we do so because we do not even find them in the finalize list	18:45
18:45 discord-raku-bot left 18:46 discord-raku-bot joined
Geth	MoarVM/new-disp: dc4f49e3d0 | (Jonathan Worthington)++ | src/spesh/disp.c Correct translation of copy args in DPs The source argument index calculation needs to use the number of arguments to the original dispatch.	18:47
jnthnwrthngtn	That seems to fix the regressions for 11a1c9de6f
	The case I debugged is one of the cute things new-disp lets us do: a multiple dispatch with a Junction argument is specialized into a runbytecode of Junction.AUTOTHREAD(&code, arg1, arg2)	18:48
	That is, we call directly to the junction auto-threader	18:49
	No intermediate closure slurping and whatever
lizmat	cool, so we don't need a lot of these global SUBS anymore either, we can make them class methods easily ?
	like &DYNAMIC ?	18:50
jnthnwrthngtn	Well, or stick them under Rakudo::Internals or some such
	Though &DYNAMIC is already changed in rakuast, fwiw
lizmat	ah, ok, cool!	18:51
jnthnwrthngtn	Though I think just to a different sub
	But I code-gen the op to do the normal lookup right off, and only call the sub form if we need to do global fallback
	And pre-calculate the name without the * for that
	But yeah, we could dispatcher up the fallback and get smaller bytecode and put the resolver elsewhere	18:52
lizmat	as long as dynvar lookup becomes faster :-)	18:54
nine	OMG I got it!
	github.com/MoarVM/MoarVM/blob/mast...lize.c#L87	18:55
Nicholas	I'm glad that you have because I haven't	18:56
nine	We're looking at nursery objects and only if we do a gen2 collection we also consider gen2 objects. BUT this loop actually rewrites the finalize list and items we do not even consider will silently get dropped
Nicholas	aha.
jnthnwrthngtn	oops	18:57
lizmat	that could be a significant leak on long running processes, no ?
nine	That's why the small nursery intensifies the problem. More objects make it into gen2 and have a chance of falling off the finalizer list	18:58
	lizmat: leak memory not so much. But DESTROY methods won't get called when they should. So external resources (like Perl objects) will leak.
	Or file handles!
lizmat	indeed.. :-)
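The failure mode nine describes can be shown with a toy model. This is a sketch under assumptions, not the MoarVM source: `Obj`, `ObjList`, `scan_buggy`, `scan_fixed` and `lost_after_nursery_run` are hypothetical names, and liveness is just a flag rather than something a collector works out.

```c
#include <stddef.h>

#define MAX_OBJS 16

typedef struct {
    int in_gen2;   /* 1 once the object has been promoted out of the nursery */
    int is_live;   /* 1 if still reachable (only known for scanned objects) */
} Obj;

typedef struct {
    Obj   *items[MAX_OBJS];
    size_t count;
} ObjList;

/* Buggy variant: the list is compacted in place, but gen2 objects are
 * skipped entirely on a nursery-only run, so they are neither kept on the
 * finalize list nor queued for finalization. They silently drop off. */
static void scan_buggy(ObjList *finalize, ObjList *finalizing, int gen2_run) {
    size_t kept = 0, i;
    for (i = 0; i < finalize->count; i++) {
        Obj *o = finalize->items[i];
        if (gen2_run || !o->in_gen2) {
            if (o->is_live)
                finalize->items[kept++] = o;
            else
                finalizing->items[finalizing->count++] = o;
        }
    }
    finalize->count = kept;
}

/* Fixed variant: objects this run does not consider are carried over
 * unchanged, so a later gen2 run still gets to see them. */
static void scan_fixed(ObjList *finalize, ObjList *finalizing, int gen2_run) {
    size_t kept = 0, i;
    for (i = 0; i < finalize->count; i++) {
        Obj *o = finalize->items[i];
        if (!gen2_run && o->in_gen2)
            finalize->items[kept++] = o;    /* not considered: keep as is */
        else if (o->is_live)
            finalize->items[kept++] = o;
        else
            finalizing->items[finalizing->count++] = o;
    }
    finalize->count = kept;
}

/* One live nursery object, one dead nursery object, one gen2 object.
 * Returns how many of the three ended up on neither list after a
 * nursery-only run. */
static size_t lost_after_nursery_run(int use_fixed) {
    Obj a = { 0, 1 }, b = { 0, 0 }, c = { 1, 1 };
    ObjList finalize   = { { &a, &b, &c }, 3 };
    ObjList finalizing = { { NULL }, 0 };
    if (use_fixed)
        scan_fixed(&finalize, &finalizing, 0);
    else
        scan_buggy(&finalize, &finalizing, 0);
    return 3 - finalize.count - finalizing.count;
}
```

In this model the buggy scan loses the gen2 object on the first nursery-only run, which matches the observation that a smaller nursery (more promotion into gen2) raises the rate of missed DESTROY calls.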
MasterDuke	huh, i think Xliff has had a couple cases of running out of file handles, maybe this was why	18:59
jnthnwrthngtn	Maybe, though if they're seeing it on Mac, it can also be because the default file handle limit can be quite low	19:01
dogbert11	nine: perhaps this will get fixed: github.com/Raku/old-issue-tracker/issues/4420	19:03
Nicholas	I think that this one is still current:	19:09
	src/vm/moar/ops/perl6_ops.c:15:67: warning: excess elements in struct initializer
	static MVMCallsite no_arg_callsite = { NULL, 0, 0, 0, 0, 0, NULL, NULL };
	^~~~
nine	Ooooh...such stable memory usage, making me much happy	19:17
jnthnwrthngtn	Just toss one of the NULLs	19:27
Nicholas	I realise this, but I didn't have permission to do it directly	19:28
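The warning above is what a positional initializer produces once it has more values than the struct has members. A small sketch of the two usual remedies follows; the seven-member `Callsite` below is invented for illustration and is not the real `MVMCallsite` layout.

```c
#include <stddef.h>

/* Hypothetical seven-member struct, NOT the real MVMCallsite. */
typedef struct {
    unsigned char  *arg_flags;
    unsigned short  flag_count;
    unsigned short  arg_count;
    unsigned short  num_pos;
    unsigned char   has_flattening;
    unsigned char   is_interned;
    void          **arg_names;
} Callsite;

/* Eight positional values for seven members would produce
 * "warning: excess elements in struct initializer":
 *
 *   static Callsite bad = { NULL, 0, 0, 0, 0, 0, NULL, NULL };
 */

/* Remedy 1: drop the surplus value so the counts match again. */
static Callsite no_arg_positional = { NULL, 0, 0, 0, 0, 0, NULL };

/* Remedy 2: C99 designated initializers, which do not depend on the number
 * or order of members. Members not named are zero-initialized. */
static Callsite no_arg_designated = { .arg_flags = NULL, .arg_names = NULL };

/* Returns 1 if both initializers produced the same all-zero callsite. */
static int callsites_equivalent(void) {
    const Callsite *a = &no_arg_positional;
    const Callsite *b = &no_arg_designated;
    return a->arg_flags == b->arg_flags
        && a->flag_count == b->flag_count
        && a->arg_count == b->arg_count
        && a->num_pos == b->num_pos
        && a->has_flattening == b->has_flattening
        && a->is_interned == b->is_interned
        && a->arg_names == b->arg_names;
}
```

The designated form keeps compiling cleanly when a member is later added or removed, which is the situation that usually leaves a stale positional initializer behind.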
MasterDuke	committable6: 2021.05,2021.06 use nqp; for ^5_000 { (^100).race(batch=>1).map({ $_ }).List }; my int @a; nqp::getrusage(@a); if @a[4] > 0 { say @a[4] } else { EVAL q|use nqp; say nqp::atpos_i(nqp::getrusage(), 4)| }; # interesting	19:43
committable6	MasterDuke, ¦2021.05: «436328␤» ¦2021.06: «275640␤»
japhb	MasterDuke: Any particular reason for that gigantic drop in MAXRSS? (The only thing that occurs to me off the top of my head is the improved scheduling of short tasks, but that's more about time than space)	20:04
	Although I suppose not building up a really long work queue could account for that ....	20:05
MasterDuke	no idea
	github.com/MoarVM/MoarVM/blob/mast...ngeLog#L48 maybe?	20:09
jnthnwrthngtn	Nicholas: ah, right, I forgot that	20:29
	Pushed	20:31
MasterDuke	jnthnwrthngtn: don't know if you've looked at it much yet, but on new-disp `valgrind --leak-check=full raku --full-cleanup -e ''` reports two invalid frees and a whole lot of definitely and indirectly lost bytes	21:15
timo	i need something not terribly deep today, i think i will have a quick look to see if i can make a dent in this	21:24
MasterDuke	looks like it's all dispatch programs and recordings	21:25
timo	fascinating, translate dispatch program allocates something that's not getting freed, maybe we're using alloc where spesh alloc could work, or we store stuff in a spot we're not considering during cleanup	21:26
	we must be missing the annotation that keeps the temps to be freed around for after optimizing runbytecode	21:33
	got a lead	21:41
	yeah, the "end of translated disp program" comment is getting put in front of the "clear these temps please" annotation and the code that attempts to clear that annotation and free the array only looks at the very first spot	22:09
	now i see some deopt annotations at the spot we're looking for this annotation	22:14
	so just removing the comment was not yet enough
	something else may be up	22:17
	ha, i got it now	22:20
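The symptom timo describes (a lookup that only inspects the first annotation, so anything placed in front hides the one it wants) can be illustrated with a plain linked list. This is only a guess at the shape of the problem: `Ann`, `find_first_only`, `find_anywhere` and `clear_temps_found` are hypothetical, and the actual fix may well have been on the side that adds the annotation rather than the side that searches for it.

```c
#include <stddef.h>

/* Hypothetical annotation kinds, loosely named after the discussion. */
enum { ANN_COMMENT, ANN_DEOPT, ANN_CLEAR_TEMPS };

typedef struct Ann {
    int         type;
    struct Ann *next;
} Ann;

/* Fragile lookup: only the head of the chain is inspected, so a comment or
 * deopt annotation placed in front hides the one we are after. */
static Ann *find_first_only(Ann *head, int type) {
    return (head != NULL && head->type == type) ? head : NULL;
}

/* Robust lookup: walk the whole chain. */
static Ann *find_anywhere(Ann *head, int type) {
    Ann *a;
    for (a = head; a != NULL; a = a->next)
        if (a->type == type)
            return a;
    return NULL;
}

/* Builds the chain comment -> deopt -> clear-temps and reports whether the
 * chosen lookup manages to find the clear-temps annotation. */
static int clear_temps_found(int walk_all) {
    Ann clear   = { ANN_CLEAR_TEMPS, NULL };
    Ann deopt   = { ANN_DEOPT, &clear };
    Ann comment = { ANN_COMMENT, &deopt };
    Ann *hit = walk_all
        ? find_anywhere(&comment, ANN_CLEAR_TEMPS)
        : find_first_only(&comment, ANN_CLEAR_TEMPS);
    return hit != NULL;
}
```

With the head-only lookup the temps array is never found and so never freed, which would show up as exactly the kind of leak valgrind reported.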
MasterDuke	nice	22:23
Geth	MoarVM/new-disp: 370a8ea33d | (Timo Paulssen)++ | src/spesh/disp.c fix adding clear temps annotation	22:26
	MoarVM/new-disp: 5180ae0861 | (Timo Paulssen)++ | src/spesh/optimize.c also clear temps for runcfunc ops
timo	can you verify this makes the leaked allocations that came from spesh go away?	22:27
	there's still a bunch from process_recording
23:03 rakugest left
