Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes.
Set by lizmat on 24 May 2021.
00:02 reportable6 left 00:04 reportable6 joined 03:32 linkable6 left, evalable6 left 06:02 reportable6 left 06:06 reportable6 joined 07:34 linkable6 joined 07:35 evalable6 joined 07:56 cognominal_ joined 08:00 cognominal left
Nicholas good *, #moarvm 08:14
MasterDuke releasable6: status 08:23
releasable6 MasterDuke, Next release in ≈15 days and ≈10 hours. 3 blockers. Changelog for this release was not started yet
MasterDuke, Details: gist.github.com/5403529993f6bb901d...8fabfc4930
MasterDuke the last two blockers might've already been fixed? 08:25
any objections to merging github.com/MoarVM/MoarVM/pull/1555 ?
Nicholas I'm not competent to review it, so I can't usefully comment. (But obviously, d'oh, I can't really object either. Which was your actual question) 08:38
MasterDuke those commits have had quite a large number of spectests run, with no (new) problems. however, if people want to wait until after the release since we did already have the large new-disp merge that's fine 08:42
08:55 MasterDuke left 09:35 MasterDuke joined
jnthnwrthngtn moarning o/ 09:41
Nicholas \o
jnthnwrthngtn MasterDuke: I think 15 days is plenty of time to shake out issues, and running with JIT disabled is a good way to see if any issues might relate to them. 09:42
MasterDuke: I assume you've done spectest with blocking + nodelay also?
MasterDuke no, but i can run that now
jnthnwrthngtn OK, do nqp and rakudo build and test with that; if no regressions in those, I'd say merge it. 09:47
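The blocking + nodelay run being suggested here uses standard MoarVM debug environment variables; a sketch (the make targets are the usual nqp/rakudo ones, and the exact invocation may vary per setup):

```shell
# MVM_SPESH_NODELAY makes spesh kick in immediately rather than after a
# threshold; MVM_SPESH_BLOCKING makes the main thread wait for the
# specializer, giving deterministic (but much slower) runs.
MVM_SPESH_BLOCKING=1 MVM_SPESH_NODELAY=1 make test        # in the nqp checkout
MVM_SPESH_BLOCKING=1 MVM_SPESH_NODELAY=1 make spectest    # in the rakudo checkout
```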
MasterDuke wow, i don't usually run full spectests with those. so much slower! 10:05
Geth MoarVM/master: 16 commits pushed by (Daniel Green)++, Unknown++, MasterDuke17++
review: github.com/MoarVM/MoarVM/compare/6...33aef886e7
MasterDuke i guess probably a good time for nqp+rakudo bumps to help with any bisecting if needed 10:11
lizmat shall I do the honours then?
MasterDuke sure 10:12
lizmat 2021.09-624-ge733aef88 # wow, that's a high number of commits since the release :-)
hmmm... not sure if it's something to do with my MBP, but test-t times appear to have almost doubled for me? 10:34
jnthnwrthngtn lizmat: Hm, can you isolate it to a particular change? 10:36
lizmat it was a few days ago that I last ran it... :-(
I thought: let's run it again, see if MasterDuke's changes helped
MasterDuke there haven't been all that many changes after the new-disp merge, right? so mine probably caused it? 10:39
jnthnwrthngtn oh, gah, I was about to say "I don't see much change" but was running the MQTT test instead of test-t 10:40
lizmat MasterDuke: I'm not sure
please let someone else confirm my numbers
it could well be something on my machine...
seems I have a Spotlight indexing run atm 10:42
will check again in 30 mins 10:43
yeah, it was something local: 11:22
1.105 as a new lowest for me
sorry for the noise 11:23
11:30 sena_kun joined
MasterDuke does anybody have any idea how to diagnose/debug why the expr jit currently can make things slower? 11:43
lizmat what was the way to disable it again? 11:46
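For reference, the JIT layers are toggled with environment variables; a sketch, assuming the usual MoarVM switches (the script name is illustrative):

```shell
MVM_JIT_EXPR_DISABLE=1 raku script.raku   # lego JIT only, expression JIT off
MVM_JIT_DISABLE=1      raku script.raku   # no JIT at all, spesh still runs
MVM_SPESH_DISABLE=1    raku script.raku   # no spesh (and therefore no JIT)
```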
but, uh, i now get a segv in that mqtt test if i disable it 11:47
Thread 1 "raku" received signal SIGSEGV, Segmentation fault.
0x00007ffff78db71b in compose (tc=0x55555555a110, st=0x48000000c8ec8148, info_hash=0x7fffefa0dde8) at src/6model/reprs/P6opaque.c:691
691 if (st->REPR_data)
lizmat lowest test-t with expr jit disabled: 1.043 11:49
m: say 1.105 / 1.043
camelia 1.059444
lizmat so 5% faster ?
MasterDuke and i just jitted newtype, newmixintype, and composetype (which calls compose in the emit.dasc implementation i added)
think i see the problem 11:51
lizmat ah?
Geth MoarVM: a6ff2c031b | (Daniel Green)++ | src/jit/x64/emit.dasc
Fix segfault in lego jit of composetype

FUNCTION is aliased with TMP5, so TMP5 was being overwritten and that meant we were getting the wrong STABLE later.
12:02 reportable6 left 12:03 reportable6 joined
lizmat MasterDuke: another bump warranted ? 12:03
MasterDuke it's unlikely that people are running with the expr jit disabled (the template for composetype is fine), so i wouldn't say it's vital, but it couldn't hurt 12:05
jnthnwrthngtn MasterDuke: It'd help to figure out if the slowdown we see is either a) because the machine code produced is worse, or b) because we spend more time producing said machine code, and so spend more time interpreting
If it's b) then we'd expect to see the difference fade away by increasing the amount of time the benchmark runs for.
lizmat is it easy to switch off the actual deployment of machine code ?
to find out how much overhead it is 12:06
jnthnwrthngtn lizmat: Don't know of an easy way. We can probably somewhat see the effect in profiles of MoarVM though (by looking at functions involved in the expr JIT)
The spesh log also has times taken to JIT things. 12:07
We could grep those out and sum them
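One way to grep-and-sum those timings; a sketch that assumes spesh log lines of the form "JIT was successful and compilation took 123us":

```shell
# Capture a spesh log for the run, then total the per-frame JIT
# compilation times it reports.
MVM_SPESH_LOG=spesh.log raku script.raku
grep -o 'compilation took [0-9]*us' spesh.log \
  | grep -o '[0-9]*' \
  | awk '{ total += $1 } END { printf "%dus total\n", total }'
```

Running it once with the expr JIT enabled and once with MVM_JIT_EXPR_DISABLE=1 gives the two numbers to compare.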
lizmat ah... but is that all of jitting, or just the expr jit ? 12:08
jnthnwrthngtn All 12:09
But you could still compare the numbers with it enabled and disabled
MasterDuke just built everything and ran all tests with the expr jit disabled, no problems 12:10
jnthnwrthngtn I can believe there's a situation where the machine code produced is worse, but taking longer to produce the machine code in the first place is worth investigating.
Analyzing what's going on if code quality is worse will be much harder, so it'd be better to not do that if it's not really to blame. 12:11
bbi10 12:12
MasterDuke 637077us total for with the expr jit 12:19
344700us total for without the expr jit
from a spesh log of running the mqtt test 12:20
874 instances of 'JIT was successful and compilation took' with the expr jit 12:22
870 instances without the expr jit
longest individual time with the expr jit was 26658us 12:23
longest individual time without the expr jit was 18840us
dogbert17 I have a program which runs in 34s, without the expr-jit it's 26s
lizmat test-t on a 20x larger files shows with / without expr jit *ENABLED* as: 15.020 / 14.447 12:25
so even on a longer running process, not using expr jit is faster
MasterDuke dogbert17: that would seem to indicate bad code being generated. can you check the compilation times in spesh logs
lizmat which to me indicates the generated code is not an advantage?
MasterDuke the routine that took the longest to compile with the expr jit was 'lexical_vars_to_locals' 12:27
316 BBs, Frame size: 9180 bytes (1568 from inlined frames), Specialization took 39093us (total 71578us), Bytecode size: 43270 byte
jnthnwrthngtn m: say 15.020 / 14.447 12:39
camelia 1.039662
jnthnwrthngtn So around 4% rather than 5% after some time, so we could maybe interpret that as "compilation time is a factor but not the dominating one" 12:40
dogbert17: That's a really interesting case. Are there any indications in a spesh log of JIT being unsuccessful?
Or alternatively can profile and see percent JITted or not 12:41
lizmat: Percent JITted in a comparative profile of test-t is also interesting, also any difference in deopt rates. 12:44
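The comparative profile suggested here can be produced with Rakudo's built-in profiler; a sketch (output file names are illustrative):

```shell
# Produce HTML profiles for both configurations, then compare the
# "JIT" percentages and the deopt counts between the two reports.
raku --profile=with-expr.html script.raku
MVM_JIT_EXPR_DISABLE=1 raku --profile=without-expr.html script.raku
```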
lizmat feels all within noise levels for the standard test-t run 12:49
only significant difference I see is 2 On Stack Replacements with the expr jit enabled, and none with it disabled 12:50
jnthnwrthngtn Hm, and deopts?
lizmat both 7 deopts
and no global deopts
jnthnwrthngtn Curious.
No smoking gun there, then. 12:51
Although the extra OSRs are a little curious
lizmat oddly enough, disabling the expr jit results in *more* jit compiled frames
98.19% with it disabled, 98.08% with it enabled 12:52
but that feels like noise
jnthnwrthngtn That's frames in the dynamic sense, not the static one, so it's showing that we spend more time before the JITted version is available 12:54
I'd expect it's repeatedly observable rather than noise, but it's also a small effect. 12:55
MasterDuke dogbert17: can you share that program?
dogbert17 jnthnwrthngtn: I can check
jnthnwrthngtn So that's another hint that we're looking at a machine code quality issue 12:56
(I asked about deopts in case there's a bug in the expr JIT guard generation that sees us deopt in cases we should not.)
(But no evidence so far.)
MasterDuke fwiw, 39 'JIT was not successful' with the expr jit, 47 without (still for the mqtt test) 12:57
dogbert17 Masterduke, jnthnwrthngtn: since I'm a nice guy :) I'll share the code. gist.github.com/dogbert17/7099a67e...3b6ac0a08f 12:58
12:58 brrt joined
dogbert17 hello brrt 13:01
# [012] dispatch not compiled: op MVMDispOpcodeBindFailureToResumption NYI 13:02
brrt ohai dogbert17 13:03
dogbert17 we're trying to figure out why a program runs faster when the expr jit is turned off
brrt ah, that's... a good thing 13:05
and the answer is 'we don't have a benchmarking suite'
or if we do, we don't have a systematic way to run it 13:06
dogbert17 this is a bizarre case, from 34s with exprjit to 26s without
in case you're intrigued the src gist is about ten lines up in the irc log 13:08
brrt I am
(I am also chronically short in time)
jnthnwrthngtn dogbert17: Thanks for the script; I can reproduce the difference too (29.9 with, 22.3 without) 13:11
dogbert17 cool but now I'm getting envious of your hardware :)
brrt hmmm... if it were bad code generation, would it be that much worse? 13:13
there's an obvious fix though
disable the expression JIT :-)
imo the register allocator is suspect... 13:15
and consider: the expr jit needs to be 'clever' about function calls, the lego jit does not
13:17 sena_kun left
MasterDuke interesting, i can't repro the time difference 13:24
jnthnwrthngtn Uhhh...did I mess something up or does --profile make the difference vanish?
Or at least produce identical profiles 13:25
dogbert17 FWIW, there are two 'JIT was not successful and compilation took 123us' messages when the exprjit is enabled but three such messages when it's disabled 13:28
MasterDuke what about sum of jit compilation times? 13:29
dogbert17 normal, i.e. with exprjit I get 133756 and without 42342 13:30
MasterDuke so almost triple, but the actual time it took wouldn't explain the runtime difference 13:32
dogbert17 and as jnthnwrthngtn wrote above, the profiles look remarkably similar 13:34
MasterDuke what about perf, does it show any noticeable differences? 13:35
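The perf comparison asked about here would look something like the following sketch (standard Linux perf invocations; file names illustrative):

```shell
# Sample both configurations with call graphs, then diff the hot functions.
perf record -g -o perf-with.data -- raku script.raku
MVM_JIT_EXPR_DISABLE=1 perf record -g -o perf-without.data -- raku script.raku
perf report -i perf-with.data
perf report -i perf-without.data

# Or watch live while the benchmark runs (as dogbert17 does below):
perf top
```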
dogbert17 MasterDuke: strange that you couldn't repro though
MasterDuke it does look like i'm seeing a difference now, it's just pretty small. ~24s with expr jit, ~22.5 without 13:38
wild thought, but what if you clear out your precomp directory? i just had to do that to fix a problem after i tested building nqp/rakudo with the expr jit disabled 13:41
jnthnwrthngtn This is a bit odd: if I make it 400 rather than 500 then the difference is pretty small 13:44
dogbert17 if I run the program normally I get many runs taking 27s (more or less the same as with exprjit disabled) but all of a sudden runtime jumps to 34s 13:45
what could cause the runtime to differ so much between executions
and no, my system isn't loaded
jnthnwrthngtn m: say 6.085 / 5.228
camelia 1.163925
jnthnwrthngtn m: say 29.2 / 22.3 13:46
camelia 1.309417
jnthnwrthngtn m: say 1.070 / 0.837
camelia 1.278375
brrt that is very odd yes 13:47
MasterDuke dogbert17: what if you disable hash randomization and/or run with spesh blocking?
dogbert17 MasterDuke: let me try with spesh blocking 13:48
with MVM_SPESH_BLOCKING=1 all runs, 10 atm, take 28s 13:54
is it just a coincidence 13:55
MasterDuke i'm seeing about the same 1.5s difference with MVM_SPESH_BLOCKING=1 13:57
and still the same if i disable hash randomization 14:04
jnthnwrthngtn Did a callgrind run; 88,387,766,589 IR with expr JIT, 81,491,356,897 without 14:06
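A callgrind run like this counts instructions retired rather than wall-clock time, so it is slow but stable across runs; a sketch (output file names illustrative):

```shell
valgrind --tool=callgrind --callgrind-out-file=cg.with raku script.raku
MVM_JIT_EXPR_DISABLE=1 \
  valgrind --tool=callgrind --callgrind-out-file=cg.without raku script.raku

# Summarize and compare the top entries of each run:
callgrind_annotate cg.with | head -30
callgrind_annotate cg.without | head -30
```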
dogbert17 ha, perf top shows that when a run is slow MVM_fixed_size_alloc is on top of the chart, but when the program suddenly runs fast it's in like third or fourth place 14:07
jnthnwrthngtn: about ten percent difference
nah; i was mistaken, MVM_fixed_size_alloc is always on top regardless 14:08
MasterDuke so now we just dump all generated machine code for with/without and compare, should just take a min or two, right? 14:11
jnthnwrthngtn The callgrind output is a bit odd 14:12
It shows 53.45% under MVM_jit_code_enter with the expr JIT and 71.55 without it 14:13
And oddly 35 million calls to MVM_frame_dispatch with it, 41 million calls without? 14:15
dogbert17 how is that possible? 14:16
jnthnwrthngtn 40 million calls to dispatch_monomorphic with expr JIT, only 21 million without
brrt that is indeed very, very odd
jnthnwrthngtn That last one is...wat
4285 calls to deopt_one with, 9,128 without 14:17
None of this makes ense
MasterDuke would it be any easier to debug this before the new-disp merge?
jnthnwrthngtn *sense
Other question: did this discrepancy exist before the new-disp merge?
MasterDuke i'm pretty sure we were talking about it before the merge. don't remember if everybody was on the branch though 14:18
who has a 2021.09 lying around...
dogbert17 jnthnwrthngtn: I believe that it did
although I'm not 100% certain 14:19
MasterDuke shareable6: 2021.09 14:20
shareable6 MasterDuke, whateverable.6lang.org/2021.09
MasterDuke so it's much slower overall with ^^^, and the numbers vary quite a bit 14:30
but with expr jit the numbers were consistently ~42s. without had greater variation, as low as 33s once, but usually ~40s 14:32
dogbert17 this is so bizarre 14:42
MasterDuke www.youtube.com/watch?v=C2cMG33mWVY 14:43
brrt :-D 15:08
timo oh, could y'all try commenting out reprops from jit/graph.c 15:13
since the exprjit doesn't do devirt of reprops yet, removing that from the lego jit could get us an idea how much we save from that feature
i can't work right now, a cat is sitting right in front of my monitor making the bottom half pretty much unusable 15:14
MasterDuke just comment out the cases in consume_reprop()? 15:15
timo i'm not sure if that causes trouble 15:16
actually, there's one spot in consume_reprop where we can turn devirt off
by making sure the facts near the top don't give us the type
so just null it out or skip looking at the facts or something
Nicholas timo: the cat doesn't have some sort of icon you can use to minimise it? Or it does, but your mouse is scared of it? 15:17
timo cdn.discordapp.com/attachments/557...715493.jpg 15:19
MasterDuke ok, commented out all but the default case at the top of consume_reprop() and ran dogbert17's script with MVM_SPESH_BLOCKING=1 15:24
~25s with expr jit, ~27s without 15:25
timo praise the devirtualization
MasterDuke so without slows down by ~4-5s 15:26
jnthnwrthngtn I compared MVM_SPESH_INLINE_LOG output between the two of them and there are some curious differences there 15:27
For example:
-Can inline slip-all (1003) with bytecode size 180 into push-all (2091)
-Can inline push (4911) with bytecode size 28 into push-all (2091)
+Can NOT inline slip-all (1003) with bytecode size 416 into push-all (2091): no spesh candidate available and bytecode too large to produce an inline
+Can inline unspecialized push (4911) with bytecode size 124 into push-all (2091)
Notice how the dependent things aren't specialized yet in the second case
timo hm, max stack depth getting updated at unlucky spots during deep recursion? 15:28
jnthnwrthngtn Maybe yes, given that's the sort order 15:29
Turning on the spesh log seems to hide the issue though 15:30
timo does spesh blocking help for that particular part of the issue? 15:31
jnthnwrthngtn It gets me a much smaller difference 15:33
Which is perhaps the reprops one you just mentioned?
But is much smaller in magnitude than the whole difference
So it seems the repr ops thing is one part of it
But also that the spesh thread working for longer causes longer periods where we don't record stats, in turn leading to instability 15:35
timo true 15:36
jnthnwrthngtn I note that this code probably does gather/take and wonder if that makes issues more likely 15:38
I wonder what'd happen if we did something like blocking in normal execution, except only do it when we've run out of log buffers 15:39
15:39 brrt left
jnthnwrthngtn So we get concurrent specialization and execution to a point 15:39
But stop and wait if we get too ahead 15:40
Hm, quick impl of that fixes it 15:44
MasterDuke fixes it == you don't see a speedup disabling expr jit? 15:46
jnthnwrthngtn Uhh...I thought so but in fact it only makes it less likely, so there must be something about log boundary handling that makes it interesting. 15:52
oops, gotta go for lesson, bbiab 16:01
16:20 ilogger2 left 16:30 brrt joined 16:32 ilogger2 joined 17:30 rypervenche left
nine Ok, got NativeCall callbacks up and running :) 17:42
lizmat whee!
nine++ 17:43
brrt \o/ 17:49
Nicholas "dispatch all the things" 17:51
17:53 squashable6 left 17:57 squashable6 joined
brrt so, if I get it correctly, the current hypothesis is that it's reprops which are slow in expr JIT 17:58
... did we by any chance have an optimization there, that we don't have in the expr jit
timotimo: do I recall correctly that you had a devirtualization for reprops in the lego jit but not in the expr jit? 17:59
timo: ^
MasterDuke well, lego jit does stuff like github.com/MoarVM/MoarVM/blob/mast...#L799-L856 but the template for atpos_i is just github.com/MoarVM/MoarVM/blob/mast...1041-L1050 18:02
18:02 reportable6 left
timo correct, the exprjit only has a tiny start of devirt in a branch, but it's really just a new repr method that the jit calls 18:05
nine Ok, fixed the segfault in t/04-nativecall/00-misc.t which actually wasn't because of the dispatcher but was a pre-existing issue with cloned native subs and serialization. Of course no idea why this hasn't been an issue so far. 18:13
18:14 brrt left
MasterDuke m: my int @a = (^100); my int $b; for ^10_000_000 -> int $i { $b = @a[$i % 64] }; say now - INIT now; say $b 18:29
camelia 0.409680642
MasterDuke m: my int @a = (^100); my int $b; my int $i = (^100).pick; say $i; for ^10_000_000 { $b = @a[$i] }; say now - INIT now; say $b
camelia 52
nine jnthnwrthngtn: I think there's a bug in pass-decontainerized: my $track-arg := nqp::dispatch('boot-syscall', 'dispatcher-track-arg', $args, $i); runs only if in the first run the arg is in a Scalar container. But what if it is not, and instead is in one on a following run? Then no guard would trigger and we wouldn't run the dispatcher again and wouldn't decontainerize.
MasterDuke ^^^ seems very counterintuitive
huh. postcircumfix:<[]> is the third most expensive for both according to a profile, and has essentially the same time. but <unit> is twice as long for the version with .pick and <anon> is also longer (both are 1 and 2 when sorted by exclusive time) 18:35
ha. mod version enters 19k frames, but the pick version enters 50m 18:38
nine jnthnwrthngtn: also it's missing a nqp::dispatch('boot-syscall', 'dispatcher-guard-type', $track-arg); in any case! 18:42
Sadly just restoring tc->stack_top after a callback doesn't seem to be enough. 18:46
rakudo: src/core/callstack.c:472: MVM_callstack_unwind_frame: Assertion `(char *)tc->stack_top < tc->stack_current_region->alloc' failed. 18:47
camelia 5===SORRY!5=== Error while compiling <tmp>
at <tmp>:1
------> 3src/core/callstack.c:7⏏05472: MVM_callstack_unwind_frame: Asserti
expecting any of:
colon pair
19:04 brrt joined 19:05 reportable6 joined
brrt then I think that's a direction to investigate 19:05
MasterDuke ha 19:21
m: my int @a = (^100); my int $b; my Int $i = (^100).pick; say $i; for ^50_000_000 -> int $n { $b = @a[$i] }; say now - INIT now; say $b
camelia 88
MasterDuke m: my int @a = (^100); my int $b; my int $i = (^100).pick; say $i; for ^50_000_000 -> int $n { $b = @a[$i] }; say now - INIT now; say $b
camelia 59
MasterDuke i noticed a `inline-preventing instruction: getlexref_i` in the spesh log of that ^^^ second version 19:23
m: my int @a = (^100); my int $b; my int $c; for ^50_000_000 -> int $i { $c = $i % 64; $b = @a[$c] }; say now - INIT now; say $b # and now we can make the mod version slow 19:24
camelia 8.875664986
MasterDuke really a dramatic difference
timo: weren't you talking recently about how to do better with lexrefs? 19:26
19:33 brrt left
timo more like how we have to do better :P 19:54
21:24 leont left, tbrowder left, SmokeMachine left 21:26 SmokeMachine joined 21:28 tbrowder joined 21:38 leont joined
jnthnwrthngtn nine: When one does track-attr, it implies both a type and concreteness guard on the thing we're reading from. 22:13
nine: Since it actually doesn't store attribute name/class handle, just an offset
If we add explicit guards they are deduplicated, but it's wasteful. 22:14
(Wasteful to make the two syscalls when the track-attr one does the same job anyway)
23:29 squashable6 left
jnthnwrthngtn The experiment to make spesh only somewhat concurrent seems to be a failure: there's a (somewhat mitigatable) startup penalty, 10% spectest time penalty, a minor but negative effect across microbenchmarks...and to top it off, it doesn't even reliably fix the instability anyway. 23:29
A change to how stack depth is tracked provides a slightly improved chance of the triangle number script running in a better time with expr JIT enabled. Increasing the spesh buffer sizes has a bigger chance of doing that, but we still sometimes see the worse result. 23:32
However, the latter two make it clear that discrepancies in timing and log buffer send points between runs (likely aided by hash randomization) are a dominating factor. 23:34
The expr JIT probably does carry some "blame" (the repr op devirt missing), but it seems the primary factor is that it being enabled causes us to fill spesh log buffers, stop logging for a while, and end up with some problems as a result. 23:35
(Where the problems seem to be sub-optimal specialization, perhaps due to wrong ordering)
Sleep time, will poke at it a bit more tomorrow. 23:38