#moarvm on 23 September 2021 - Raku Programming Language Log

Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes. Set by lizmat on 24 May 2021.
00:00 linkable6 joined, evalable6 joined 00:02 reportable6 left
	timo is too bed	00:05	Copy link Message link Add to gist Remove
	i'm not actually in bed	01:01	Copy link Message link Add to gist Remove
01:43 frost joined
timo	hmm, for a callsite transformation cache design, it'll have to not only be threadsafe, but ownership is also important. i guess only interned callsites are allowed to go in anyway	01:48	Copy link Message link Add to gist Remove
	drop_arg is apparently never called on an interned callsite object	02:18	Copy link Message link Add to gist Remove
	oh, or drop_arg is not used instead of drop_args	02:19	Copy link Message link Add to gist Remove
	using directly the intern cache seems to be an okay idea	02:27	Copy link Message link Add to gist Remove
02:44 linkable6 left, evalable6 left 02:49 [Coke] left 02:51 [Coke] joined
timo	MVM_callsite_drop_positionals, even when it looks through the intern cache first and tries to intern at the end if the incoming cs was interned but nothing appropriate was found in the intern cache	03:26	Copy link Message link Add to gist Remove
	is at only 0.16% as per perf		Copy link Message link Add to gist Remove
	callstack_find_topmost_dispatch_recording is at 0.16% but a little further up	03:27	Copy link Message link Add to gist Remove
	0.02% drop_positionals when there's neither the looking through the cache nor the intern attempt at the end	03:30	Copy link Message link Add to gist Remove
	all of this measured for rakudo -e ''		Copy link Message link Add to gist Remove
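A rough sketch of the lookup-then-intern flow described above for MVM_callsite_drop_positionals, modelled in Python rather than MoarVM's C. The `Callsite` class, the `intern_cache` dict and both functions are illustrative stand-ins invented for this example, not MoarVM's actual data structures or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Callsite:
    # Each flag is "pos" or "named:<name>"; a hypothetical stand-in for arg flags.
    flags: tuple
    interned: bool = False


# Hypothetical intern cache, keyed by the flag tuple.
intern_cache = {}


def intern(cs):
    """Return the canonical interned callsite for these flags, adding it if absent."""
    cached = intern_cache.get(cs.flags)
    if cached is None:
        cached = Callsite(cs.flags, interned=True)
        intern_cache[cs.flags] = cached
    return cached


def drop_positionals(cs):
    """Drop all positional args, reusing an interned result where possible."""
    wanted = tuple(f for f in cs.flags if f != "pos")
    # 1. Look through the intern cache first.
    cached = intern_cache.get(wanted)
    if cached is not None:
        return cached
    # 2. Nothing appropriate found: build a fresh callsite.
    result = Callsite(wanted)
    # 3. Only try to intern the result if the incoming callsite was interned.
    return intern(result) if cs.interned else result
```

In this model a second call on the same interned callsite returns the cached object instead of allocating, which is the saving being measured; the numbers above suggest the lookup itself is not free.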
03:45 linkable6 joined 03:47 evalable6 joined
japhb	Exclusive times, I assume?	03:55	Copy link Message link Add to gist Remove
05:00 notable6 left, benchable6 left, nativecallable6 left, committable6 left, bisectable6 left, squashable6 left, releasable6 left, quotable6 left, bloatable6 left, unicodable6 left, evalable6 left, statisfiable6 left, greppable6 left, coverable6 left, shareable6 left, sourceable6 left, tellable6 left, linkable6 left, linkable6 joined, sourceable6 joined, tellable6 joined 05:01 coverable6 joined, squashable6 joined, unicodable6 joined, evalable6 joined 05:02 benchable6 joined, nativecallable6 joined 05:35 codesections left 05:36 codesections joined
Nicholas	good ,	05:43	Copy link Message link Add to gist Remove
	en.wikipedia.org/wiki/X86_calling_...onventions -- Microsoft x64 calling convention ... ... System V AMD64 ABI ... If the callee is a variadic function, then the number of floating point arguments passed to the function in vector registers must be provided by the caller in the AL register.	05:50	Copy link Message link Add to gist Remove
	I thought that there was also some requirement on integer arguments. Anyway, that one bites	05:51	Copy link Message link Add to gist Remove
06:01 releasable6 joined, statisfiable6 joined 06:05 reportable6 joined 07:00 quotable6 joined, notable6 joined 07:02 committable6 joined 07:24 sena_kun joined 07:57 dogbert17 left 08:00 greppable6 joined 08:01 bisectable6 joined 08:25 dogbert17 joined 08:31 discord-raku-bot left 08:32 discord-raku-bot joined 09:00 bloatable6 joined 10:01 shareable6 joined
MasterDuke	huh. on (roughly) master, i got 19s then 14s for m-test, and 130s and 126s for m-spectest	10:26	Copy link Message link Add to gist Remove
lizmat	so faster than master ?	10:38	Copy link Message link Add to gist Remove
MasterDuke	well, i was comparing to yesterday when i was on new-disp and got 19s and 19s for m-test, and 178s and 171s for m-spectest	10:40	Copy link Message link Add to gist Remove
jnthnwrthngtn	MasterDuke: Curious, I get 18s and 15s here	10:47	Copy link Message link Add to gist Remove
	No way is spectest going to be faster than master given spectest is hugely dependent on startup time.		Copy link Message link Add to gist Remove
MasterDuke	well, you do have a much faster machine. but i'm surprised my second run isn't any faster than the first		Copy link Message link Add to gist Remove
jnthnwrthngtn	Yes, that's the part I'm surprised about.	10:48	Copy link Message link Add to gist Remove
	I mean, nativecall is already pre-compiled, so it should be no trouble		Copy link Message link Add to gist Remove
	(For the second run)		Copy link Message link Add to gist Remove
MasterDuke	maybe it was just a bad hash randomization/spesh not being blocking interaction		Copy link Message link Add to gist Remove
jnthnwrthngtn	I figure that's the main slowdown		Copy link Message link Add to gist Remove
MasterDuke	timo++'s prs should help with startup/spectest, correct?	10:49	Copy link Message link Add to gist Remove
lizmat	perhaps it's an effect of my work ?	10:50	Copy link Message link Add to gist Remove
	:-)		Copy link Message link Add to gist Remove
MasterDuke	my new-disp times don't account for your commits from today, so hopefully they'll be faster next time i run it	10:51	Copy link Message link Add to gist Remove
jnthnwrthngtn	lizmat: I highly doubt the setting changes you're doing would have an impact on this, if that's what you're meaning	10:52	Copy link Message link Add to gist Remove
lizmat	ah, ok	10:53	Copy link Message link Add to gist Remove
	jnthnwrthngtn takes a look at timo++'s work	10:54	Copy link Message link Add to gist Remove
11:01 linkable6 left, evalable6 left 11:02 linkable6 joined
jnthnwrthngtn	Hm, I was going to say that fix to args tail is making an assumption that something with no capture is an arg drop, and that could be fragile if we later do a replace arg, although replace would be drops + insert... It'd take an insert multiple to be a problem I guess	11:03	Copy link Message link Add to gist Remove
MasterDuke	there are a bunch of ops where the interpreter implementation is roughly `if (REPR(foo)->ID != MVM_REPR_ID_something || !IS_CONCRETE(foo)) MVM_exception_throw_adhoc(tc, msg) else <do something>`. sometimes the jit version is just <do something>, and sometimes it has that `if`	11:07
lizmat	jnthnwrthngtn: sanity check: use nqp; sub a(Range:D $a) { dd nqp::iscont($a) }; my $r = 1..3; a $r gives 1 on new-disp	11:12
	I thought they were to be deconted?		Copy link Message link Add to gist Remove
MasterDuke	is there an easy way to know if the jitted version does/does not need the repr and/or concreteness checks?	11:13	Copy link Message link Add to gist Remove
lizmat	/they were/$a is/		Copy link Message link Add to gist Remove
MasterDuke	m: use nqp; sub a(Range:D $a) { dd nqp::iscont($a) }; my $r = 1..3; a $r	11:14	Copy link Message link Add to gist Remove Run code
camelia	1		Copy link Message link Add to gist Remove
jnthnwrthngtn	lizmat: Hm, I'd have expected master to do the same given the Range constraint	11:20	Copy link Message link Add to gist Remove
	lizmat: Is the :D significant?		Copy link Message link Add to gist Remove
lizmat	no		Copy link Message link Add to gist Remove
jnthnwrthngtn	One can't rely on the caller side to have decont'd in new-disp, it just may have done it	11:21
	Oh!		Copy link Message link Add to gist Remove
	m: say Range ~~ Iterable		Copy link Message link Add to gist Remove Run code
camelia	True		Copy link Message link Add to gist Remove
jnthnwrthngtn	The container is required here because it's an Iterable		Copy link Message link Add to gist Remove
	Otherwise it would flatten		Copy link Message link Add to gist Remove
	So the signature binder re-wraps it		Copy link Message link Add to gist Remove
	So yeah, it's correct	11:22	Copy link Message link Add to gist Remove
	MasterDuke: If the C thing being called does the checks, then the interpreter (and JIT if needed) can have them removed. If it doesn't, they're needed in both		Copy link Message link Add to gist Remove
	MasterDuke: One of the reasons we'll gradually move towards syscalls, though, is that we can enforce the types outside of the C function and elide them	11:23	Copy link Message link Add to gist Remove
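The rule above (a check is needed in both interpreter and JIT unless the called function does it) can be sketched as follows. This is a Python model, not MoarVM code: `Obj`, `check`, the REPR ID constant and the `elems_*` functions are all hypothetical names chosen for illustration.

```python
class AdhocError(Exception):
    """Stand-in for the error raised by MVM_exception_throw_adhoc."""


REPR_ID_ARRAY = 1  # hypothetical REPR ID


class Obj:
    def __init__(self, repr_id, concrete=True, items=None):
        self.repr_id = repr_id
        self.concrete = concrete
        self.items = items or []


def check(obj, repr_id, what):
    """The REPR and concreteness guard from the interpreter's `if`."""
    if obj.repr_id != repr_id or not obj.concrete:
        raise AdhocError(f"{what} requires a concrete object with the right REPR")


def elems_checked(obj):
    """Callee that performs the check itself: callers may drop theirs."""
    check(obj, REPR_ID_ARRAY, "elems")
    return len(obj.items)


def elems_unchecked(obj):
    """Callee that trusts its caller: every caller must check first."""
    return len(obj.items)


def interp_elems(obj):
    # Interpreter path guarding an unchecked callee.
    check(obj, REPR_ID_ARRAY, "elems")
    return elems_unchecked(obj)


def jit_elems(obj):
    # The JIT path for the same op needs the same guard, as the callee has none.
    check(obj, REPR_ID_ARRAY, "elems")
    return elems_unchecked(obj)
```

If both paths called `elems_checked` instead, the guards in `interp_elems` and `jit_elems` would be redundant and could be removed from both.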
11:23 brrt joined
jnthnwrthngtn	lunch, bbiab	11:23	Copy link Message link Add to gist Remove
MasterDuke	ok. for some reason i thought we could assume in some cases that things were concrete and/or the right REPR when being run by the jit. but i'll make sure any checks done by the interpreter are also done in the jitted versions i'm making	11:26	Copy link Message link Add to gist Remove
lizmat	down to 1.251 / .703	11:47	Copy link Message link Add to gist Remove
12:02 reportable6 left 12:03 evalable6 joined
jnthnwrthngtn	MasterDuke: Yes, though do check the interpreter really needs them also	12:04	Copy link Message link Add to gist Remove
MasterDuke	ah, how do i check that?	12:05	Copy link Message link Add to gist Remove
jnthnwrthngtn	lizmat: With further ops converted to $ ?		Copy link Message link Add to gist Remove
	MasterDuke: Look at the C function called by the interp and see if it repeats the check		Copy link Message link Add to gist Remove
MasterDuke	oh, ha. so far i'm just creating the c funcs, so not repeating anything, but i can check the existing ones i come across	12:06	Copy link Message link Add to gist Remove
Geth	MoarVM/new-disp: edc4fb9d57 | (Timo Paulssen)++ (committed by Jonathan Worthington) | 7 files add dispatcher-drop-n-args to optimize allocations Instead of creating a MVMCapture and MVMCallsite for each step of removing arguments, we now offer a syscall that drops multiple arguments that live at the same index in one go. The result is that the transformations tree can now contain null entries for the capture entry, which we have to interpret and deal with.
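To illustrate the allocation saving this commit message describes, here is a minimal Python model. The `Capture` class and both functions are hypothetical and only count capture allocations; the real change also avoids a callsite per step and lives in MoarVM's C code.

```python
class Capture:
    allocations = 0  # counts every capture created, to compare the two approaches

    def __init__(self, args):
        self.args = tuple(args)
        Capture.allocations += 1


def drop_arg(capture, idx):
    """Drop one argument, allocating a new capture each time."""
    return Capture(capture.args[:idx] + capture.args[idx + 1:])


def drop_n_args(capture, idx, count):
    """Drop `count` arguments starting at `idx` with a single allocation."""
    return Capture(capture.args[:idx] + capture.args[idx + count:])
```

Dropping three leading arguments one at a time goes through `drop_arg` three times and allocates three captures, two of which are thrown away; `drop_n_args(capture, 0, 3)` reaches the same result with one.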
jnthnwrthngtn	timo: I did a small change in order I could get some asserts in that will help us if we try to do further things like this	12:09	Copy link Message link Add to gist Remove
	timo: Merged the NQP and Rakudo ones as is (well, rebased Rakudo one)	12:13
lizmat	jnthnwrthngtn: yes	12:14	Copy link Message link Add to gist Remove
	fwiw, no noticeable change in test-t after timo's work	12:20	Copy link Message link Add to gist Remove
MasterDuke	any changes in spectest?	12:21	Copy link Message link Add to gist Remove
lizmat	I don't time that atm		Copy link Message link Add to gist Remove
MasterDuke	`t/02-rakudo/03-corekeys-6d.t .................................... Dubious, test returned 1 (wstat 256, 0x100)`, doh	12:23	Copy link Message link Add to gist Remove
lizmat	did I break that?	12:30
jnthnwrthngtn	lizmat: It was primarily aimed at startup, and does manage to give sound couple of thousand allocations less at least.	12:31
MasterDuke	no, this is on my branch off of master. it was just a flap, but that test is pretty simple...	12:32	Copy link Message link Add to gist Remove
jnthnwrthngtn	ah, actually more than that, the thousands are only those once profile-compile starts measuring but there are others in NQP setup	12:33
lizmat	argh I didn't pull rakudo itself, so I missed that part of timo's work	13:03	Copy link Message link Add to gist Remove
	new timings in a mo		Copy link Message link Add to gist Remove
13:04 reportable6 joined 13:05 frost left
jnthnwrthngtn	Be sure to pull NQP too	13:05	Copy link Message link Add to gist Remove
lizmat	no noticeable change in test-t		Copy link Message link Add to gist Remove
	perl Configure.pl --force-rebuild --gen-moar=new-disp --gen-nqp=new-disp --make-install	13:06	Copy link Message link Add to gist Remove
	will do that, but not for Rakudo itself :-)		Copy link Message link Add to gist Remove
	afk for a few hours&		Copy link Message link Add to gist Remove
13:36 brrt left
jnthnwrthngtn	Just been doing some comparative measurements of master/new-disp (mostly microbenchmarks, also measuring a Cro app). We're doing a lot better at a bunch of the targeted features, of course, but also a bit better on various things that are effectively just multi/method dispatch based.	13:50
MasterDuke	very cool	13:51	Copy link Message link Add to gist Remove
jnthnwrthngtn	The Cro app gets a couple of hundred more requests per second, around 10% more.	13:52	Copy link Message link Add to gist Remove
Nicholas	./rakudo-m -Ilib t/spec/S32-list/grep.rakudo.moar		Copy link Message link Add to gist Remove
jnthnwrthngtn	What's worse, other than the obvious (startup)		Copy link Message link Add to gist Remove
Nicholas	MoarVM oops in spesh thread: Spesh: failed to fix up inline 1 () -1 -1		Copy link Message link Add to gist Remove
	that was MoarVM edc4fb9d57d245929ee5d4d013b22bef1a63bf9b		Copy link Message link Add to gist Remove
	(not sure if the rest matters)		Copy link Message link Add to gist Remove
jnthnwrthngtn	Are almost entirely I/O benchmarks (reading lines, writing lines)		Copy link Message link Add to gist Remove
Nicholas	ASAN made no comment		Copy link Message link Add to gist Remove
jnthnwrthngtn	The reason for the I/O ones is seemingly that OSR does not function	13:54	Copy link Message link Add to gist Remove
MasterDuke	ah. that might explain why my spesh log processing one-liner is slower		Copy link Message link Add to gist Remove
jnthnwrthngtn	This is due to a7c0cc8d2b, which is a bug fix		Copy link Message link Add to gist Remove
	Somewhere on the I/O path we do a role composition, and do a `ctx` op. We actually only want it for the current context, not for traversal.	13:55	Copy link Message link Add to gist Remove
Nicholas	my assertion failure was with rakudo back at 9c587d92d0cdb2aa86c2ca70ed15b5c478443b02 -- Use new dispatcher-drop-n-args syscall	13:56	Copy link Message link Add to gist Remove
jnthnwrthngtn	However, we don't have a way to indicate that, and so it assumes it's wanted for traversal and marks everything in the caller chain		Copy link Message link Add to gist Remove
	We then are unable to OSR		Copy link Message link Add to gist Remove
	m: say 0.6814 / 0.4247	13:57	Copy link Message link Add to gist Remove Run code
camelia	1.604427		Copy link Message link Add to gist Remove
Nicholas	"obviously" (it seems to me, as a poor quality teddy bear) that the brute force solution to this is a second op that is just "the current context". But is there a better way?
jnthnwrthngtn	A 60% slowdown. That's not nice.	13:58	Copy link Message link Add to gist Remove
	So that probably needs a solution		Copy link Message link Add to gist Remove
	The other case that I don't have an explanation for yet is an object creation benchmark	13:59
MasterDuke	is `ctxlexpad` sort of "the current context"?		Copy link Message link Add to gist Remove
jnthnwrthngtn	ctxlexpad turns out to be the identity function :/	14:01	Copy link Message link Add to gist Remove
	I suspect it hadn't used to be		Copy link Message link Add to gist Remove
	Nowadays the thing from ctx is just directly indexable for the current context	14:02	Copy link Message link Add to gist Remove
Nicholas	"in spesh thread" - this might be the first "win" from commit 998ea76a17cb8dbafc6dc392d15d40a487d236c3	14:04	Copy link Message link Add to gist Remove
14:04 linkable6 left
jnthnwrthngtn	Nicholas: I've been happy about that at least a couple of times before recently	14:05
	s/before//		Copy link Message link Add to gist Remove
14:05 linkable6 joined
Nicholas	ah OK. It's the first that I noticed. I slack more.	14:05	Copy link Message link Add to gist Remove
	(a lot more)		Copy link Message link Add to gist Remove
timo	hm looks like the discord bridge works only one-way at the moment	14:12	Copy link Message link Add to gist Remove
	cdn.discordapp.com/attachments/633...004322.jpg	14:20	Copy link Message link Add to gist Remove
jnthnwrthngtn	Nicholas: I just did a spectest with blocking/nodelay to verify my change to get OSR back and also see that inline fixup exception	15:01	Copy link Message link Add to gist Remove
Geth	MoarVM/new-disp: 6f2b01c275 | (Jonathan Worthington)++ | 8 files Introduce non-traversable contexts These are for when we will only read the lexicals of the exact frame we obtained it in, and thus can avoid marking the whole callstack up as needing caller position information preserved.	15:06
	MoarVM/new-disp: baf1423327 | (Jonathan Worthington)++ | 3 files Be more precise about OSR caller positions There are two situations in which we set the caller info needed flag: one when we throw an exception and want to produce a backtrace, and another when we need to do context introspection. Only the latter is in absolute need of accurate position information, and thus must poison OSR. This, together with non-traversable contexts, lets us get OSR back in various situations, including some common cases of I/O, fixing a performance regression relative to `master`.
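The two commit messages above can be read as the following simplified model, written in Python. Everything here (`CallerInfo`, `Frame`, `take_context`, `osr_allowed`) is a made-up illustration of the described behaviour, not MoarVM's implementation, and it assumes that only callers of the frame get marked.

```python
from enum import Flag, auto


class CallerInfo(Flag):
    NONE = 0
    BACKTRACE = auto()      # set when throwing an exception that wants a backtrace
    INTROSPECTION = auto()  # set when a traversable context is taken


class Frame:
    def __init__(self, name, caller=None):
        self.name = name
        self.caller = caller
        self.caller_info = CallerInfo.NONE


def take_context(frame, traversable):
    """Take a context for `frame`; only traversable ones mark the caller chain."""
    if traversable:
        f = frame.caller
        while f is not None:
            f.caller_info |= CallerInfo.INTROSPECTION
            f = f.caller


def osr_allowed(frame):
    """Only the introspection need rules out OSR in this model."""
    return not (frame.caller_info & CallerInfo.INTROSPECTION)
```

Under this model a non-traversable context leaves every caller eligible for OSR, and a backtrace request alone does not poison it either; only a traversable context does.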
Nicholas	running that spectest with a non-ASAN build with valgrind produced quite a bit of excitement at optimize_bb_switch (optimize.c:2299) and optimize_bb_switch (optimize.c:2280)	15:07	Copy link Message link Add to gist Remove
	Conditional jump or move depends on uninitialised value(s)		Copy link Message link Add to gist Remove
	and once at 0x4B82E97: build_cfg (graph.c:487)
Geth	MoarVM/new-disp: ea63d91730 | (Jonathan Worthington)++ | src/core/ext.c Fix uninitialized read in spesh graph building	15:09
jnthnwrthngtn	Nicholas: The second of those was easy, the first I've spent a while trying to figure out and can't		Copy link Message link Add to gist Remove
	(A while before now, that is)		Copy link Message link Add to gist Remove
Geth	MoarVM: MasterDuke17++ created pull request #1550: Add '.new()' suggestion to type object errors		Copy link Message link Add to gist Remove
Nicholas	jnthnwrthngtn: the ones you can't figure out - is there (at least) a short(er) way to trigger then?	15:11	Copy link Message link Add to gist Remove
	them		Copy link Message link Add to gist Remove
jnthnwrthngtn	Nicholas: I didn't repro them, just went through the code involved a few times	15:12	Copy link Message link Add to gist Remove
	Hm, or if I did it didn't give me any extra clues...		Copy link Message link Add to gist Remove
Geth	MoarVM/attempt_use_intern_cache_for_drop_positionals: 23ebdab1ce | (Timo Paulssen)++ | src/core/callsite.c If possible, use the intern cache for transforms Doesn't actually seem faster than allocating them every time we do transformations. I have only measured using an empty raku program, however, since I was hoping to make startup cheaper.	15:18
timo	ah, yes. sometimes you ctx, but sometimes you ctxn't	15:19	Copy link Message link Add to gist Remove
jnthnwrthngtn	Well, there really are no annotations for inline 1 in the spesh graph...	15:21	Copy link Message link Add to gist Remove
MasterDuke	i just switched to new-disp and pulled all three repos, built, and ran two `make m-test m-spectest`. got 22s and 19s for m-test, and 176s and 171s for m-spectest	15:22	Copy link Message link Add to gist Remove
Nicholas	timo: reason why you spotted that PyPy blog post and I didn't - it's not on the front page.	15:24	Copy link Message link Add to gist Remove
timo	that's odd	15:25	Copy link Message link Add to gist Remove
Nicholas	not totally.		Copy link Message link Add to gist Remove
timo	you think it's not "common interest" or whatever?		Copy link Message link Add to gist Remove
Nicholas	I didn't dig into how they made the site, but it looked like it might have been that it required "manual" work to update the front page. (No idea if that's a script to bake a new front page, or what)	15:26
	I think that this was oversight. But I failed to be helpful and try to create a decent bug report		Copy link Message link Add to gist Remove
MasterDuke	hm, does look like maybe my spesh log processing one-liner is a bit faster after that OSR fix though...	15:27	Copy link Message link Add to gist Remove
timo	that's the code that uses -n that you mentioned the other day, yes?	15:30	Copy link Message link Add to gist Remove
MasterDuke	yeah		Copy link Message link Add to gist Remove
jnthnwrthngtn	OK, I figured out the inline fixup bug and it's terrible	15:31	Copy link Message link Add to gist Remove
Nicholas	um, like "headdesk, how did I make that mistake?" or "oh, erk, this is gnarly to get right?"	15:32	Copy link Message link Add to gist Remove
jnthnwrthngtn	It occurs when all of the following happens:		Copy link Message link Add to gist Remove
	1. We are doing a nested inline	15:33	Copy link Message link Add to gist Remove
	2. The thing we are inlining, which has its own inlines, has an inline that shrank to zero instructions		Copy link Message link Add to gist Remove
	3. The annotations about it end up on an sp_bindcomplete, which we delete as part of inlining		Copy link Message link Add to gist Remove
	It processes the annotations on the bindcomplete instruction and fixes them up. We then delete said instruction. The annotations then move onto the next instruction so as not to get lost.	15:34	Copy link Message link Add to gist Remove
	We then fix them up again		Copy link Message link Add to gist Remove
	Making them bogus		Copy link Message link Add to gist Remove
timo	and then they bug us		Copy link Message link Add to gist Remove
MasterDuke	am i correct in thinking that if possible, it's better to jit something via writing some asm in emit.dasc than moving it to a function and calling that from the interpreter and the jit?	15:37	Copy link Message link Add to gist Remove
timo	we're essentially making the same trade-off the compiler does when deciding whether to inline a given function	15:39	Copy link Message link Add to gist Remove
Geth	MoarVM/new-disp: 75560fd2ec | (Jonathan Worthington)++ | src/spesh/inline.c Correct deletion of sp_bindcomplete We cannot do it immediately, as annotation motion might cause us to fix up the same annotation twice, which is wrong. Thus do the deletion after all fixups of annotations are completed.	15:40
timo	if the code to Do The Thing is about as short as the stuff to call the function and the parts of the function that deal with being called, then we can probably prefer emit.dasc		Copy link Message link Add to gist Remove
jnthnwrthngtn	Nicholas: That seems to do it.		Copy link Message link Add to gist Remove
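A toy reproduction of the double fixup described above, and of the deferred deletion that the fix commit describes. This is a Python model under simplifying assumptions: annotations are reduced to a list of inline indices, a "fixup" just adds an offset, and `Ins`, `delete_ins`, `fixup_buggy` and `fixup_deferred` are hypothetical names, not functions from src/spesh/inline.c.

```python
class Ins:
    def __init__(self, op, inline_ids=()):
        self.op = op
        # Annotations reduced to the inline index they refer to.
        self.inline_ids = list(inline_ids)


def delete_ins(code, ins):
    """Delete an instruction, moving its annotations onto the next one."""
    i = code.index(ins)
    if i + 1 < len(code):
        code[i + 1].inline_ids = ins.inline_ids + code[i + 1].inline_ids
    del code[i]


def fixup_buggy(code, offset):
    """Fix up and delete in one pass: moved annotations get fixed up twice."""
    i = 0
    while i < len(code):
        ins = code[i]
        ins.inline_ids = [n + offset for n in ins.inline_ids]
        if ins.op == "sp_bindcomplete":
            delete_ins(code, ins)  # annotations land on code[i], visited next
        else:
            i += 1


def fixup_deferred(code, offset):
    """Fix up everything first, then delete, so each annotation is fixed once."""
    for ins in code:
        ins.inline_ids = [n + offset for n in ins.inline_ids]
    for ins in [x for x in code if x.op == "sp_bindcomplete"]:
        delete_ins(code, ins)
```

With an annotation for inline 0 on an `sp_bindcomplete` and an offset of 1, the buggy pass leaves the annotation pointing at inline 2, while the deferred pass leaves it at inline 1.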
MasterDuke	i'm going to guess they are here github.com/MoarVM/MoarVM/blob/mast...3010-L3027	15:45	Copy link Message link Add to gist Remove
jnthnwrthngtn	With OSR reinstated we now beat master at the I/O benchmarks, and have caught up with Ruby in a "write a million lines of utf-8" one	15:49	Copy link Message link Add to gist Remove
timo	in theory spesh could put markcode* into the repr-speshed ops and allow MVMCode REPR to optimize it into sp_get_i* and sp_bind_i* or whatever	15:50	Copy link Message link Add to gist Remove
jnthnwrthngtn	timo: Yes, I'd already figured we want something like that, just didn't quite figure out how	15:51	Copy link Message link Add to gist Remove
	(As in, a nice way to factor it)		Copy link Message link Add to gist Remove
timo	getstaticcode and getcodecuid could also
	do you mean how i described it isn't that nice way to factor it?	15:53	Copy link Message link Add to gist Remove
japhb	jnthnwrthngtn: Nice to hear we're caught up with Ruby on that benchmark, but where is Ruby on the utf-8 I/O efficiency scale? Is this a major achievement?		Copy link Message link Add to gist Remove
MasterDuke	somehow i missed that comma 2021.08 was released, i'll have to give its profile viewer a try	15:54	Copy link Message link Add to gist Remove
jnthnwrthngtn	japhb: More efficient than Python, less than Perl.	15:55	Copy link Message link Add to gist Remove
	japhb: Although I should add: less than recent Perl.		Copy link Message link Add to gist Remove
	(I think there were UTF-8 I/O speedups there)	15:56	Copy link Message link Add to gist Remove
	japhb: Major achievement only really in so far as our I/O handle impl and coordination of encoding is all in Raku, whereas I suspect in Ruby one ends up far more quickly in C code.	15:57	Copy link Message link Add to gist Remove
	OK, so about the object creation benchmark we've lost something on from master: profiler says we JIT 99.95% of frames on master but 36.35% of frames on new-disp. 4% less inlining.	15:58	Copy link Message link Add to gist Remove
timo	oof		Copy link Message link Add to gist Remove
jnthnwrthngtn	Though oddly, I can't find any "bailed completely" in the spesh log	15:59	Copy link Message link Add to gist Remove
	lesson, bbl	16:01	Copy link Message link Add to gist Remove
timo	are there any prof_enter that should have become enterspesh in the spesh log?		Copy link Message link Add to gist Remove
MasterDuke	that sounds like exactly what a profile of my one-liner shows	16:02	Copy link Message link Add to gist Remove
16:02 AlexDaniel left, psydroid left 16:04 AlexDaniel joined 16:14 psydroid joined
japhb	jnthnwrthngtn: Ah, interesting re: UTF-8 I/O efficiency. That all gives me good context, thanks.	16:28	Copy link Message link Add to gist Remove
MasterDuke	timo: how can i know if a prof_enter should have been prof_enterspesh? if it's in the 'after'?	16:30
timo	yeah		Copy link Message link Add to gist Remove
MasterDuke	they're all in the 'Before', don't see any in an 'After'	16:36
timo	OK		Copy link Message link Add to gist Remove
16:36 Altai-man joined 16:37 Altai-man left
MasterDuke	any other ideas?	16:45	Copy link Message link Add to gist Remove
timo	i'd perhaps perf record and see if there's actually a big portion of samples in interp_run rather than jitted frames which would be identified from having the perf map on	16:47	Copy link Message link Add to gist Remove
	except i've seen a boatload of 0x000000asdfgh frames in perf report results as well even with the perf map turned on	16:48	Copy link Message link Add to gist Remove
	MasterDuke: could you give me your -n code right quick? i thought i had it but i don't	16:59	Copy link Message link Add to gist Remove
MasterDuke	raku -ne 'BEGIN my (%h, $f); if .starts-with(q|Spesh of |) and /^"Spesh of " $<func>=(<-[\ ]>+)/ { $f = ~$<func> } elsif .contains(q|JIT: bailed completely because of <|) and /"JIT: bailed completely because of <" $<op>=(<-[>]>+)/ { %h{q|l_|~$<op>}.push($f) } elsif .contains(q|expr bail: Cannot get template for: |) and /"expr bail: Cannot get template for: " $<op>=(\w+)/ { %h{q|t_|~$<op>}.push($f) }; END for %h.keys.sort -> $k { say qq|$k: %h{$k}.Bag()| }'
timo	haha that's long		Copy link Message link Add to gist Remove
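For readers who find the one-liner hard to follow, here is an approximate Python equivalent. It is a sketch, not a drop-in replacement: it assumes the three marker strings appear in the spesh log exactly as quoted in the one-liner, and the names `tally_bails`, `SPESH_OF`, `JIT_BAIL` and `TEMPLATE_BAIL` are made up for this example.

```python
import re
from collections import Counter, defaultdict

# Patterns copied from the one-liner above; the log format itself is assumed.
SPESH_OF = re.compile(r"^Spesh of ([^ ]+)")
JIT_BAIL = re.compile(r"JIT: bailed completely because of <([^>]+)")
TEMPLATE_BAIL = re.compile(r"expr bail: Cannot get template for: (\w+)")


def tally_bails(lines):
    """Group bail reasons by op, counting the frames they occurred in."""
    bails = defaultdict(Counter)
    current = None  # name captured from the most recent "Spesh of" line
    for line in lines:
        if m := SPESH_OF.match(line):
            current = m.group(1)
        elif m := JIT_BAIL.search(line):
            bails["l_" + m.group(1)][current] += 1
        elif m := TEMPLATE_BAIL.search(line):
            bails["t_" + m.group(1)][current] += 1
    # Sorted by key, like the END block of the one-liner.
    return {key: bails[key] for key in sorted(bails)}
```

The `Counter` plays the role of the Raku `Bag`: keys prefixed `l_` collect complete JIT bails per op, keys prefixed `t_` collect missing expression templates, and each maps to how often every frame name was seen.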
MasterDuke	23.75% MVM_interp_run	17:00	Copy link Message link Add to gist Remove
	10.01% MVM_string_utf8_decodestream		Copy link Message link Add to gist Remove
	yeah, guess i could pull those strings out into a variable	17:01	Copy link Message link Add to gist Remove
	of course i only duplicated them when it was too slow with just the regex	17:03	Copy link Message link Add to gist Remove
timo	hehe.	17:05	Copy link Message link Add to gist Remove
MasterDuke	23% interp_run is way higher than usual, seems to suggest stuff actually isn't getting jitted. but why can't we tell why?	17:07	Copy link Message link Add to gist Remove
timo	<anon> from -e:1 here has 1.3 mega entries and 0% jit	17:08	Copy link Message link Add to gist Remove
	hum.	17:11	Copy link Message link Add to gist Remove
	there's no complete bail		Copy link Message link Add to gist Remove
	but it does say "jit not successful"		Copy link Message link Add to gist Remove
jnthnwrthngtn	oh	17:13	Copy link Message link Add to gist Remove
timo	it does succeed jitting in my non-profiled version here	17:14	Copy link Message link Add to gist Remove
	wonder what's wrong there		Copy link Message link Add to gist Remove
jnthnwrthngtn	I'd missed the "jit was not successful"
timo	oh, is it normal to have more than one return_o in a resulting frame?		Copy link Message link Add to gist Remove
jnthnwrthngtn	timo: "resulting"?	17:16	Copy link Message link Add to gist Remove
	It's OK for there to be more than one return_o in general		Copy link Message link Add to gist Remove
timo	"After:"		Copy link Message link Add to gist Remove
jnthnwrthngtn	Oh	17:17	Copy link Message link Add to gist Remove
	Well, did the before have it?	17:18	Copy link Message link Add to gist Remove
timo	ok i searched further, there's more than one -e:1 and the longer one is also not successfully jitted without profile
nine	Darn.... the "Type check failed for return value; expected CompUnit::Handle:D but got BOOTIO (BOOTIO)" is still here. Will have a look at this on Saturday I guess		Copy link Message link Add to gist Remove
jnthnwrthngtn	nine: I was gonna see how new-disp did on agrammon and it also blew up with that		Copy link Message link Add to gist Remove
nine	Oooh...so it's not just this one application.	17:19	Copy link Message link Add to gist Remove
	Gives hope for a reduced test case		Copy link Message link Add to gist Remove
jnthnwrthngtn	Agrammon is kinda the opposite of a reduced test case, but I dunno how big your application is :D		Copy link Message link Add to gist Remove
nine	tree says 47 directories, 201 files		Copy link Message link Add to gist Remove
jnthnwrthngtn	Hm, it may be smaller	17:20	Copy link Message link Add to gist Remove
	(Agrammon, that is)		Copy link Message link Add to gist Remove
	Wonder if it's a deopt-o		Copy link Message link Add to gist Remove
nine	I guess the deciding factor is just: load tons of modules so load-precomp-file gets speshed		Copy link Message link Add to gist Remove
	not sensitive to inlining for a change	17:21	Copy link Message link Add to gist Remove
jnthnwrthngtn	I couldn't imagine there would be as many ways to screw up deopt as I've managed to create...		Copy link Message link Add to gist Remove
nine	MVM_JIT_DISABLE=1 MVM_SPESH_OSR_DISABLE=1 MVM_SPESH_PEA_DISABLE=1 MVM_SPESH_INLINE_DISABLE=1 and it's still there
jnthnwrthngtn	timo: I shoved in some debug prints and it turns out that we make a JIT graph but fail to compile it	17:22	Copy link Message link Add to gist Remove
	nine ought to start making dinner though		Copy link Message link Add to gist Remove
jnthnwrthngtn	timo: And it happens without profiling, so I suspect that really is the problem
timo	right, i see it regardless of profiling or not as well		Copy link Message link Add to gist Remove
	so somewhere in the final jitting step it's failing but not bailing	17:23	Copy link Message link Add to gist Remove
jnthnwrthngtn	I made a nice batch of potato salad yesterday and so dinner preparation is easy today :)		Copy link Message link Add to gist Remove
timo	if you want i'll reverse-step in rr to see where exactly it stops		Copy link Message link Add to gist Remove
jnthnwrthngtn	I was trying to do exactly that, set a breakpoint on printf, and it didn't hit it, wat.	17:25	Copy link Message link Add to gist Remove
17:25 Xliff joined
timo	we vsprintf	17:26
	that may not go through printf
jnthnwrthngtn	ah	17:34
	ah, with MVM_JIT_DEBUG I get:
	JIT ERROR: Negative offset for dynamic label 18
	Ohh. With MVM_JIT_EXPR_DISABLE=1 it goes away	17:43
	And we get 99.9% JIT
	So it's apparently about the expression JIT	17:44
	Though despite that it still doesn't really get back all the perf...
	Ah, the bigger discrepancy may well be that `new` doesn't get inlined	17:46
	Ah, just a "bytecode too large". Guess I need to look at why	17:48
	But first, food
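The flag-by-flag elimination in the exchange above (nine's set of MVM_*_DISABLE variables, then MVM_JIT_EXPR_DISABLE) could be automated roughly as follows. This is only a sketch: the flag names come from the log, while the helper names and the idea of timing each run are assumptions made for illustration.

```python
import os
import subprocess
import sys
import time

# Environment variable names as they appear in the log.
DISABLE_FLAGS = [
    "MVM_JIT_DISABLE",
    "MVM_JIT_EXPR_DISABLE",
    "MVM_SPESH_OSR_DISABLE",
    "MVM_SPESH_PEA_DISABLE",
    "MVM_SPESH_INLINE_DISABLE",
]

def env_with(flags, base=None):
    """Return a copy of the environment with each named flag set to "1"."""
    env = dict(os.environ if base is None else base)
    for flag in flags:
        env[flag] = "1"
    return env

def timed_run(cmd, flags):
    """Run cmd with the given flags set; return (exit code, elapsed seconds)."""
    start = time.perf_counter()
    proc = subprocess.run(
        cmd,
        env=env_with(flags),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return proc.returncode, time.perf_counter() - start

def try_each_flag(cmd):
    """Time cmd once with no flags, then once per flag, one flag at a time."""
    results = {"(none)": timed_run(cmd, [])}
    for flag in DISABLE_FLAGS:
        results[flag] = timed_run(cmd, [flag])
    return results

if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python try_flags.py raku my-script.raku
    for name, (code, secs) in try_each_flag(sys.argv[1:]).items():
        print(f"{name:28s} exit={code} {secs:.2f}s")
```

A run whose timing changes sharply under exactly one flag points at that subsystem, which is how MVM_JIT_EXPR_DISABLE stood out here.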
dogbert17	so, it's time for brrt to make an appearance ...	18:01
18:02 reportable6 left 18:05 reportable6 joined
MasterDuke	wow, with MVM_JIT_EXPR_DISABLE=1 i get 12.36% MVM_string_utf8_decodestream and 9.25% MVM_interp_run	18:13
Xliff	Anybody wanna play around on a 64-core Google Engine instance?	18:38
	Looks like Raku may have a parallelism bug ... or maybe my script does.	18:39
	Working script conks out when parallel compiling my p6-ICal project with no reason.
	I have no idea what to look for.
	If no takers, I'll pack it up and shut it down. It's costing me $3/hour
	I'll check again in a couple of hours
Nicholas	I suggest that you shut it down for now, assuming it doesn't cost you lots more than $3 to set it all up again	18:40
	I'm going AFK soon, and I think most folks are not really awake
[Coke]	I'm in the right time zone, but also am zonking.
Nicholas	Thanks to the GCC compiler farm I have access to 32 core x86_64 machines. Which, sure, aren't 64 cores. But aren't costing me.	18:42
	(and insane PPC machines, I think thanks to IBM. But if the bug is in the JIT, they won't help)
Xliff	I just tried at concurrency level 32 and the hangup did NOT occur.	18:50
	I will try my whole set of projects at level 32 and see what I get! ;q	18:51
Nicholas	jnthnwrthngtn: yes, you fixed the failure in t/spec/S32-list/grep.rakudo.moar	18:54
	(forgot to confirm)
19:05 linkable6 left, evalable6 left 19:06 evalable6 joined
timo	i wonder if i should try exposing "percentage of calls that aren't jitted because they were inlined calls from a non-jitted frame" in moarperf?	19:09
	at the moment you can open the "callers" table in the routines list, then you see one with 99.4% inlined, 0.0757% jitted and one with - inlined but 99.5% jit	19:10
MasterDuke	huh, could be useful	19:14
timo	we don't have separate nodes in the call graph for inlined calls vs regular calls, so we can't "follow" inlined calls to the original inliner so to speak	19:16
	anybody feel like we should maybe statically determine what branches have not been taken at all, throw them out of our spesh graphs and put an unconditional deopt there?	19:37
nine	timo: sounds like it would help with inlining by getting the bytecode size under the limit. Also doesn't sound like something very common?	19:45
timo	can search for "never dispatched"
	it's common for subs that have a path that throws an exception in some cases
	like division that has to check for zero for example	19:46
jnthnwrthngtn	Exception paths often aren't taken and could indeed be handled with deopt	20:00
	And yeah, lack of inline cache entries is a really good hint.	20:01
	We don't even have to record branch stats that way
timo	since we have a dispatch_* every few meters now anyway ... :) :)	20:03
jnthnwrthngtn	Indeed :)	20:05
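The idea floated above, using "never dispatched" inline cache entries as a hint that a branch was never taken, might look something like this toy model. It is a sketch only: `Dispatch`, `Block`, and `cache_entries` are hypothetical names invented for the example and do not correspond to MoarVM's spesh data structures.

```python
from dataclasses import dataclass, field

@dataclass
class Dispatch:
    name: str
    cache_entries: int  # hypothetical: inline cache entries recorded at this site

@dataclass
class Block:
    label: str
    dispatches: list = field(default_factory=list)

def deopt_candidates(blocks):
    """Return labels of blocks whose dispatch sites were all never dispatched.

    Blocks without any dispatch give no evidence either way, so they are
    left alone. Requiring *all* sites to be empty is the conservative choice.
    """
    return [
        b.label
        for b in blocks
        if b.dispatches and all(d.cache_entries == 0 for d in b.dispatches)
    ]

# The division example from the log: the zero check rarely (here: never) fails.
division = [
    Block("entry", [Dispatch("is-zero-check", 1)]),
    Block("divide", [Dispatch("do-divide", 1)]),
    Block("throw_zero", [Dispatch("throw-div-by-zero", 0)]),
]
print(deopt_candidates(division))
```

In this model only the exception path is flagged, so it alone would be replaced by an unconditional deopt, shrinking the specialized code without recording separate branch statistics.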
	Hm, this is weird. `method new` is not getting any type tuples recorded in its stats	20:06
	(Mu.new, that is)	20:07
	omg	20:17
	I missed updating a spot in the spesh stats code for the change to the way named parameters are handled	20:21
	As a result, we lost type info for everything with named args	20:22
timo	:D
jnthnwrthngtn	Anyway, that gets things much better :)	20:31
	Will do a blocking + nodelay test to make sure the extra opportunities don't shake out new problems
Geth	MoarVM/new-disp: 7d3cba4e2d | (Jonathan Worthington)++ | src/spesh/stats.c Correct handling of named arg type stats. This wasn't updated for the new calling conventions, and thus we would consider type tuples with named arguments to have incomplete type info, and thus specialize them suboptimally.	20:47
21:37 jgaz joined
MasterDuke	whoops	21:58
22:11 jgaz left
jnthnwrthngtn	Good for my private benchmark set though; it's caught two regressions that have both been fixed today.	22:13
22:15 jgaz joined
jnthnwrthngtn	I'm content so far as performance goes with the merge now. The startup hit is the only significant regression I'm aware of. Guess we'll see what blin's verdict is.	22:16
MasterDuke	i still see the problem where my script is only 55% jitted	22:19
22:19 jgaz left
jnthnwrthngtn	Was it more JITted on master?	22:23
	And does the rate go up with MVM_JIT_EXPR_DISABLE=1 ?
MasterDuke	yes, it looks like the rate goes up with MVM_JIT_EXPR_DISABLE=1. i didn't get a profile like that, but interp_run goes from ~23% to ~10%	22:24
jnthnwrthngtn	OK, probably it's the thing I discovered earlier then
lizmat	down to 1.241 / .719 # all within noise, but feels a bit faster
jnthnwrthngtn	MasterDuke: If you run MVM_JIT_DEBUG=1 does it spit out a message about a negative label?	22:25
MasterDuke	let me see...	22:26
	JIT ERROR: Negative offset for dynamic label 185
	JIT ERROR: Negative offset for dynamic label 65
jnthnwrthngtn	That's the one
	I do wonder if we can isolate it to a particular template
	Also wonder how widespread this is	22:27
	Ah, I see a bunch of them during the Rakudo build if I set it while doing that	22:28
	Several of them in Test::CSV too	22:29
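To get a rough answer to "how widespread is this", the error lines quoted above can be tallied from captured output. A minimal sketch, assuming the output of a run with MVM_JIT_DEBUG=1 has already been saved as text; the message format is copied from the log, and the function name is made up for this example.

```python
import re
from collections import Counter

# Message format as quoted in the log.
JIT_ERROR = re.compile(r"JIT ERROR: Negative offset for dynamic label (\d+)")

def negative_labels(log_text):
    """Return a Counter mapping label number to how often it was reported."""
    return Counter(int(m.group(1)) for m in JIT_ERROR.finditer(log_text))

sample = """\
JIT ERROR: Negative offset for dynamic label 185
some unrelated line
JIT ERROR: Negative offset for dynamic label 65
"""
counts = negative_labels(sample)
print(sum(counts.values()), "failures across", len(counts), "distinct labels")
```

Comparing the totals for the Rakudo build and for a module such as Test::CSV would show whether a handful of frames or a large share of them hit the error.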
MasterDuke	ha. profile with no env variables is 2mb. profile with MVM_JIT_EXPR_DISABLE=1 is 11mb
	and 89% jitted instead of 55%
	both have 35k deopts	22:30
	whoops, without is 44% jitted, not 55%
jnthnwrthngtn	Yeah, this is worth trying to hunt down.	22:33
MasterDuke	interesting. on master, the profile says it's 20s slower (than new-disp with MVM_JIT_EXPR_DISABLE=1), but 93% jitted instead of 89%. still ~33k deopts	22:42
jnthnwrthngtn	MasterDuke: What are the wallclock times on master/new-disp?	22:51
MasterDuke	master is ~58s	22:52
	building new-disp...	22:54
timo	were you able to rr it by breakpointing vsnprintf or whatever, jnthnwrthngtn?	22:55
MasterDuke	new-disp is ~55s	22:56
	new-disp + MVM_JIT_EXPR_DISABLE=1 is ~45s
	hot damn	22:57
jnthnwrthngtn	MasterDuke: Ah, so the issue is not that new-disp is slower, but that it should be even faster. OK, that's a nice problem. :)
	(I'd misunderstood it as new-disp being slower)	22:58
MasterDuke	a couple days ago it was ~15s slower, so there have been some good improvements recently
timo	worst case ever would be: when this problem from exprjit happens, start again from the start but turn exprjit off, either completely, or after the given BB or ins or whatever that caused trouble
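The "worst case" fallback timo describes, retrying the whole compilation with the expression JIT turned off, has roughly this shape. This is a toy illustration: `ExprJitFailure`, `compile_with_fallback`, and the `use_expr_jit` parameter are hypothetical, and MoarVM's C code would signal the failure differently.

```python
class ExprJitFailure(Exception):
    """Hypothetical signal that the expression JIT produced unusable output."""

def compile_with_fallback(compile_fn, graph):
    """Try with the expression JIT first; on failure, start over without it."""
    try:
        return compile_fn(graph, use_expr_jit=True)
    except ExprJitFailure:
        # Restart from scratch rather than bailing out to the interpreter.
        return compile_fn(graph, use_expr_jit=False)

def toy_compile(graph, use_expr_jit):
    """Stand-in compiler: fails only when exprjit meets a 'bad' node."""
    if use_expr_jit and "bad" in graph:
        raise ExprJitFailure("negative offset for dynamic label")
    mode = "exprjit" if use_expr_jit else "legacy"
    return f"{mode}:{len(graph)} nodes"

print(compile_with_fallback(toy_compile, ["add", "bad", "ret"]))
print(compile_with_fallback(toy_compile, ["add", "ret"]))
```

The finer-grained variant from the message (disabling exprjit only from the offending basic block or instruction onward) would need the failure to report where it happened, which this sketch does not model.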
jnthnwrthngtn	timo: I just put a breakpoint on the line where printf was. I did trace it back a bit further, but then spotted there's a jit debug option, ran with that, and it told me where it was bailing out	22:59
timo	i wonder if i can find anything interesting by finding the exact spot where it happens tho	23:00
jnthnwrthngtn	At a wild guess, we create a label but never emit it, so it never gets fixed up by dynasm
timo	did JIT_DEBUG also spit out the graphs and tile lists in the spesh log? perhaps that only comes after the error, so wouldn't show anything in our troubled case	23:01
jnthnwrthngtn	Hm, not sure...I think that's another option?
	The error is really late, fwiw
	It's storing the labels produced by dynasm into the spesh candidate and notices a negative one	23:02
timo	yeah, after all the nodes have been put down, which includes one node for each exprjit graph
jnthnwrthngtn	Yeah, we've even produced machine code by that point
	I wonder if labels get negative offsets before dynasm emits code and fixes them up as it does so	23:03
	Thus the "not emitted label" theory
	I don't know the expr jit well enough to know how plausible/likely that is
timo	right. same, really	23:06
23:07 linkable6 joined
timo	13: (branch (label $name))	23:13
	and then the label is nowhere to be seen!
	there is only (label $name) and (label :fail) in the tile list logs? ok that just means that's the tile that implements a piece of the tree, that's why it says $name there	23:16
	but it's definitely missing another appearance of the (label ...) tile	23:17
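The "label created but never emitted" theory can be modelled in a few lines. This is a toy assembler, not dynasm: the tile tuples, the one-unit branch size, and the use of -1 for an unplaced label are all assumptions chosen to mirror the symptom in the error message.

```python
def resolve_labels(tiles):
    """Walk a tile list and return {label: offset}.

    tiles is a list of ("op", size), ("branch", label) or ("label", label).
    A label that is branched to but never placed keeps the offset -1.
    """
    offsets = {}
    pos = 0
    for kind, arg in tiles:
        if kind == "label":
            offsets[arg] = pos           # placing the label fixes its offset
        elif kind == "branch":
            offsets.setdefault(arg, -1)  # referenced, possibly placed later
            pos += 1                     # assume a branch occupies one unit
        else:
            pos += arg                   # ordinary op of the given size
    return offsets

def unplaced(tiles):
    """Labels that still have a negative offset after the whole list is walked."""
    return sorted(name for name, off in resolve_labels(tiles).items() if off < 0)

tiles = [
    ("op", 2),
    ("branch", "L1"),   # branch to a label that never appears as a tile
    ("op", 3),
    ("label", "L2"),
    ("branch", "L2"),
]
print(resolve_labels(tiles))
print(unplaced(tiles))
```

In this model the check can only run after every tile has been laid down, which matches the observation that the error shows up very late, once machine code already exists.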
	it's a bit tricky to navigate the huge spesh logs we have, especially when there's thousands and thousands of lines just for updating stats but no specializations	23:23
	i'll print out some pointers or something to help me find the spot where actually a thing happens	23:25
	yeah i don't actually know where to look here to see what's going on	23:51
	i can dump the compiled bytecode before it tries to do the dynamic label fixup	23:53
	but i think i still need brrt to make sense of this problem	23:57
