Geth MoarVM: a017f61472 | (Samantha McVey)++ | docs/ChangeLog
Remove some unneeded items from the ChangeLog
00:25
MoarVM: 752879fcd6 | (Samantha McVey)++ | tools/release.sh
Include 3rdparty/cmp in the release .tar.gz
MoarVM: b88799fb6a | (Samantha McVey)++ | VERSION
Release 2018.03
MasterDuke samcv++ 00:39
dogbert11 any objections to closing github.com/MoarVM/MoarVM/issues/791? 18:02
timotimo no objections 18:17
dogbert11 closed :) 18:24
timotimo: I have a question for you
timotimo please go ahead 18:25
dogbert11 I have a program with a loop like this
my int $max = 2_000_000; for (2..($max div 2)) -> int $i { ...
the ($max div 2) could be exchanged for the constant 1_000_000 18:26
doing that cuts runtime by 50%, does that seem reasonable?
or do you need to see the entire script (quite short btw) 18:27
timotimo we don't yet do something like that on anything that doesn't have a compile-time-known value, and "my int $max" isn't a constant so it could be changed 18:28
dogbert11 so we retrieve the value of max and do the div calc each iteration? 18:29
gist.github.com/dogbert17/b864cc4d...c85f0f76e3 18:30
timotimo no, the range object gets created once and then iterated over 18:31
the difference is when we find a range with constant end points in a for loop we turn it into a loop loop
dogbert11 these loop loops seem to be a lot faster 18:34
timotimo yes 18:38
they don't have to go through pull-one calls at all, they just do the calculations inline immediately
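For illustration, a minimal sketch of the two loop shapes being compared (loop bodies elided; the real code is in the gist linked above):

    # End point computed from a variable: a Range object is built and iterated,
    # so (per the explanation above) it is not turned into a plain counting loop.
    my int $max = 2_000_000;
    for 2 .. ($max div 2) -> int $i {
        # ... work ...
    }

    # Literal, compile-time-known end points: this shape can become a native
    # counting loop that never goes through the iterator's pull-one at all.
    for 2 .. 1_000_000 -> int $i {
        # ... work ...
    }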
oh, huh, there's a getspeshslot here followed by a prof_allocated 18:43
i wonder how that came to be, that seems wrong
a simple for loop benchmark shows that the pull-one calls are entirely inlined, that's good 18:45
haha, oh that's precious
somehow we forgot to rewrite a prof_enterspesh into a prof_enterinlined or what have you 18:46
leading to <unit> calling <unit> in the call graph (rather than more correctly adding the calls to pull-one)
aha, we expect the enterspesh op to be in an exact position, which perhaps isn't the case here 18:47
dogbert11 how can we figure that out? 18:49
timotimo indeed, the null instructions go there now 18:53
which also means we're dropping tiny amounts of wallclock time on the floor
excellent. 18:58
dogbert11 it is ? 18:59
timotimo so it'd appear that ever since we added the "create null instructions to make object registers clean" feature inlining has been b0rked in the profiler :D
but here's the fix
dogbert11 cool 19:00
Geth MoarVM: ed4ed0e947 | (Timo Paulssen)++ | src/debug/debugserver.c
metadata for ReentrantMutex and Semaphore
MoarVM: 63348dce08 | (Timo Paulssen)++ | src/spesh/graph.c
insert null-out-instructions before prof_enter

this caused us to miss the prof_enterspesh instruction when inlining because we expected it to be the first instruction of the second BB.
timotimo Calls (Inlined)
9176 + 90824 (90.82%)
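A toy Raku sketch of the mismatch the commit message describes (not MoarVM's actual data structures or fix, just the idea): a check that only inspects the first instruction of a block misses prof_enterspesh once null-out instructions sit in front of it.

    my @bb = <null null prof_enterspesh other>;          # toy representation of a basic block's opcodes

    say @bb[0] eq 'prof_enterspesh';                     # False: a first-instruction-only check misses it
    say @bb.first(* ne 'null') eq 'prof_enterspesh';     # True: skipping past the null-outs finds it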
dogbert11 should this change affect program speed ? 19:01
timotimo no
dogbert11 but the profiler will be happy :) 19:02
timotimo yup 19:06
profile files may also become a bit smaller 19:07
right now you'd get one "wrong" self-call for every frame that has inlines, and the self-call would have inside itself all calls that any inlined function would have done 19:08
timotimo *snrt*, the race-is-prime profile is still >50 megs 19:11
dogbert11 :( 19:22
timotimo did you see the thing with the call graph getting deeper and deeper?
dogbert11 in the profiler you mean 19:23
timotimo well, you can see it in the profiler
but the corresponding stack traces match up
thing is, when we await and get resumed by the Thread Pool Scheduler later, it seems like we keep adding frames to the call stack, where perhaps we should be re-using previous frames or something? 19:25
if this is actually the case, we'll keep growing our memory usage, and GCs will get a little slower over time as well 19:27
hack.p6c.org/~timo/exponential_framecounts.png 19:28
this is a random screenshot i took the other day
oh, no it isn't
dogbert11 looking at the profile it seems as if postcircumfix [] took most of the time, 58% 19:34
timotimo what's your code? 19:40
dogbert11 the gist above 19:43
timotimo ah 19:44
yeah, postcircumfix:<[ ]> can be expensive; try replacing it with ASSIGN-POS and measure again 19:45
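For reference, a hedged sketch of the substitution being suggested (whether it actually wins is measured a few lines further down):

    my int @a;
    my int $i = 5;

    @a[$i] = 1;             # the sugar: dispatches through postcircumfix:<[ ]>
    @a.ASSIGN-POS($i, 1);   # the underlying positional-assignment method, called directly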
dogbert11 the line with $max div 2 comes second with 24%
timotimo that'd be the block that starts on that line
dogbert11 ah, of course
timotimo it spends about a third of its time in postcircumfix:<[ ]> indeed 19:46
dogbert11 I'm on 32 bit atm, that might possibly account for the difference
timotimo oh, huh, ASSIGN-POS is actually a bunch slower 19:48
even though ASSIGN-POS is 99.98% inlined into postcircumfix:<[ ]>, it only accounts for 46% of time spent 19:49
dogbert11 I get the impression that something odd is going on, can't put my finger on it though
dogbert11 now profiling the fast version, i.e. '$max div 2' => '1_000_000' 19:51
in the fast version postcircumfix [] takes 40% followed by ASSIGN-POS with 22% 19:55
timotimo anyway, we're getting the Int candidate rather than int for the assignment here 19:56
you get a few % speedup if you @a[...] = (my int $ = 1) instead of @a[...] = 1 19:58
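In code, the two assignments being compared (the "few %" figure is the measurement mentioned above, not a general guarantee):

    my int @a;
    @a[3] = 1;               # right-hand side is an Int literal, so the Int candidate is chosen
    @a[3] = (my int $ = 1);  # right-hand side is already a native int, so the int candidate is used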
dogbert11 timotimo: this line 'my int @a = (0..$max);' is this array set up with a single malloc?
timotimo likely, check out if it calls push-all 19:59
dogbert11 can't see it 20:00
timotimo if i see it right, we're taking like 300 msec to get it stored
it'd be under a call to STORE
dogbert11 STORE is there 20:01
timotimo one store is from native_array, i believe that's the line you pasted
dogbert11 ok, cool 20:02
timotimo the other STOREā€“ the one from regular array, is probably from the grep at the end
yes, indeed, it calls into <anon> on line 15
dogbert11 here's an odd thing. I changed the main loop to 'for (2..10) -> int $i {' in my last profile and removed the grep line at the end
still, according to the profiler, there are 3857926 calls to ASSIGN-POS (looking at the entries column) 20:03
and equally many entries for postcircumfix[] 20:04
I must be misunderstanding what the 'entries' column means
timotimo so only 2..10 runs of the outer, yeah? 20:05
don't forget that you'll still go from 0 to $max with your inner loop
dogbert11 ah, got it, missed that one (oops)
yup, that was it 20:06
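To make the 'entries' point concrete, here is a purely hypothetical sieve-shaped loop (not necessarily the gist's actual code): even with only 9 outer iterations, every inner iteration is one entry for postcircumfix:<[ ]> and ASSIGN-POS.

    my int $max = 2_000_000;
    my int @a = (0 .. $max);

    for 2 .. 10 -> int $i {                                  # only 9 outer iterations
        loop (my int $j = 2 * $i; $j <= $max; $j += $i) {    # ~($max div $i) inner iterations each
            @a[$j] = 1;                                      # one 'entry' per element assignment
        }
    }

Summed over $i = 2..10 that comes to roughly 3.86 million assignments, which is the order of magnitude the profiler reported.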
timotimo huh, what the hell is this bytecode in Int:D + Int:D 20:07
am i looking at this right? it looks so wrong ?!?! 20:08
i mean it only takes 0.41% of total run time
gist.github.com/timo/ed27b750840ed...37ba4f925f
in the "after" section, it's grabbing the same attribute from the object (i assume this is unboxing the bigint from inside the Int object) into r5 and r8 20:09
and then it overwrites both r8 and r5 with the same thing from the other argument
dogbert11 it does look strange 20:10
timotimo i should use my trusty "trace spesh optimization" script for that 20:11
but i have to go grocery shopping so we can has some dinner
dogbert11 dinner is important :)
thx for the help though 20:12
at least you found a bug :)
timotimo yup! 21:07
graphviz is still fiddling around with the 104k-node callgraph from hyper is-prime 21:08
lizmat And another Perl 6 Weekly hits the Net: p6weekly.wordpress.com/2018/03/19/...y-edument/ 22:30