Geth MoarVM: a017f61472 | (Samantha McVey)++ | docs/ChangeLog
Remove some unneeded items from the ChangeLog
00:25
MoarVM: 752879fcd6 | (Samantha McVey)++ | tools/release.sh
Include 3rdparty/cmp in the release .tar.gz
MoarVM: b88799fb6a | (Samantha McVey)++ | VERSION
Release 2018.03
MasterDuke samcv++ 00:39
dogbert11 any objections to closing github.com/MoarVM/MoarVM/issues/791? 18:02
timotimo no objections 18:17
dogbert11 closed :) 18:24
timotimo: I have a question for you
timotimo please go ahead 18:25
dogbert11 I have a program with a loop like this
my int $max = 2_000_000; for (2..($max div 2)) -> int $i { ...
the ($max div 2) could be exchanged for the constant 1_000_000 18:26
doing that cuts runtime by 50%, does that seem reasonable?
or do you need to see the entire script (quite short btw) 18:27
timotimo we don't yet do something like that on anything that doesn't have a compile-time-known value, and "my int $max" isn't a constant so it could be changed 18:28
dogbert11 so we retrieve the value of max and do the div calc each iteration? 18:29
gist.github.com/dogbert17/b864cc4d...c85f0f76e3 18:30
timotimo no, the range object gets created once and then iterated over 18:31
the difference is when we find a range with constant end points in a for loop we turn it into a loop loop
dogbert11 these loop loops seem to be a lot faster 18:34
timotimo yes 18:38
they don't have to go through pull-one calls at all, they just do the calculations inline immediately
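For illustration, a minimal sketch of the two loop shapes being compared (loop bodies elided; the real code is in the gist linked above):

    # End point computed from a variable: a Range object is built and iterated,
    # so (per the explanation above) it is not turned into a plain counting loop.
    my int $max = 2_000_000;
    for 2 .. ($max div 2) -> int $i {
        # ... work ...
    }

    # Literal, compile-time-known end points: this shape can become a native
    # counting loop that never goes through the iterator's pull-one at all.
    for 2 .. 1_000_000 -> int $i {
        # ... work ...
    }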
oh, huh, there's a getspeshslot here followed by a prof_allocated 18:43
i wonder how that came to be, that seems wrong
a simple for loop benchmark shows that the pull-one calls are entirely inlined, that's good 18:45
haha, oh that's precious
somehow we forgot to rewrite a prof_enterspesh into a prof_enterinlined or what have you 18:46
leading to <unit> calling <unit> in the call graph (rather than more correctly adding the calls to pull-one)
aha, we expect the enterspesh op to be in an exact position, which perhaps isn't the case here 18:47
dogbert11 how can we figure that out? 18:49
timotimo indeed, the null instructions go there now 18:53
which also means we're dropping tiny amounts of wallclock time on the floor
excellent. 18:58
dogbert11 it is ? 18:59
timotimo so it'd appear that ever since we added the "create null instructions to make object registers clean" feature inlining has been b0rked in the profiler :D
but here's the fix
dogbert11 cool 19:00
Geth MoarVM: ed4ed0e947 | (Timo Paulssen)++ | src/debug/debugserver.c
metadata for ReentrantMutex and Semaphore
MoarVM: 63348dce08 | (Timo Paulssen)++ | src/spesh/graph.c
insert null-out-instructions before prof_enter

this caused us to miss the prof_enterspesh instruction when inlining because we expected it to be the first instruction of the second BB.
timotimo Calls (Inlined)
9176 + 90824 (90.82%)
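A toy Raku sketch of the mismatch the commit message describes (not MoarVM's actual data structures or fix, just the idea): a check that only inspects the first instruction of a block misses prof_enterspesh once null-out instructions sit in front of it.

    my @bb = <null null prof_enterspesh other>;          # toy representation of a basic block's opcodes

    say @bb[0] eq 'prof_enterspesh';                     # False: a first-instruction-only check misses it
    say @bb.first(* ne 'null') eq 'prof_enterspesh';     # True: skipping past the null-outs finds it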
dogbert11 should this change affect program speed ? 19:01
timotimo no
dogbert11 but the profiler will be happy :) 19:02
timotimo yup 19:06
profile files may also become a bit smaller 19:07
right now you'd get one "wrong" self-call for every frame that has inlines, and the self-call would have inside itself all calls that any inlined function would have done 19:08
timotimo *snrt*, the race-is-prime profile is still >50 megs 19:11
dogbert11 :( 19:22
timotimo did you see the thing with the call graph getting deeper and deeper?
dogbert11 in the profiler you mean 19:23
timotimo well, you can see it in the profiler
but the corresponding stack traces match up
thing is, when we await and get resumed by the Thread Pool Scheduler later, it seems like we keep adding frames to the call stack, where perhaps we should be re-using previous frames or something? 19:25
if this is actually the case, we'll keep growing our memory usage, and GCs will get a little slower over time as well 19:27
hack.p6c.org/~timo/exponential_framecounts.png 19:28
this is a random screenshot i took the other day
oh, no it isn't
dogbert11 looking at the profile it seems as if postcircumfix [] took most of the time, 58% 19:34
timotimo what's your code? 19:40
dogbert11 the gist above 19:43
timotimo ah 19:44
yeah, postcircumfix:<[ ]> can be expensive; try replacing it with ASSIGN-POS and measure again 19:45
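For reference, a hedged sketch of the substitution being suggested (whether it actually wins is measured a few lines further down):

    my int @a;
    my int $i = 5;

    @a[$i] = 1;             # the sugar: dispatches through postcircumfix:<[ ]>
    @a.ASSIGN-POS($i, 1);   # the underlying positional-assignment method, called directly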
dogbert11 the line with $max div 2 comes second with 24%
timotimo that'd be the block that starts on that line
dogbert11 ah, of course
timotimo it spends about a third of its time in postcircumfix:<[ ]> indeed 19:46
dogbert11 I'm on 32 bit atm, that might possibly account for the difference
timotimo oh, huh, ASSIGN-POS is actually a bunch slower 19:48
even though ASSIGN-POS is 99.98% inlined into postcircumfix:<[ ]>, it only accounts for 46% of time spent 19:49
dogbert11 I get the impression that something odd is going on, can't put my finger on it though
dogbert11 now profiling the fast version, i.e. '$max div 2' => '1_000_000' 19:51
in the fast version postcircumfix [] takes 40% followed by ASSIGN-POS with 22% 19:55
timotimo anyway, we're getting the Int candidate rather than int for the assignment here 19:56
you get a few % speedup if you @a[...] = (my int $ = 1) instead of @a[...] = 1 19:58
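In code, the two assignments being compared (the "few %" figure is the measurement mentioned above, not a general guarantee):

    my int @a;
    @a[3] = 1;               # right-hand side is an Int literal, so the Int candidate is chosen
    @a[3] = (my int $ = 1);  # right-hand side is already a native int, so the int candidate is used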
dogbert11 timotimo: this line 'my int @a = (0..$max);' is this array set up with a single malloc?
timotimo likely, check out if it calls push-all 19:59
dogbert11 can't see it 20:00
timotimo if i see it right, we're taking like 300 msec to get it stored
it'd be under a call to STORE
dogbert11 STORE is there 20:01
timotimo one store is from native_array, i believe that's the line you pasted
dogbert11 ok, cool 20:02
timotimo the other STOREā€“ the one from regular array, is probably from the grep at the end
yes, indeed, it calls into <anon> on line 15
dogbert11 here's an odd thing. I changed the main loop to 'for (2..10) -> int $i {' in my last profile and removed the grep line at the end
still, according to the profiler, there are 3857926 calls to ASSIGN-POS (looking at the entries column) 20:03
and equally many entries for postcircumfix[] 20:04
I must be misunderstanding what the 'entries' column means
timotimo so only 2..10 runs of the outer, yeah? 20:05
don't forget that you'll still go from 0 to $max with your inner loop
dogbert11 ah, got it, missed that one (oops)
yup, that was it 20:06
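To make the 'entries' point concrete, here is a purely hypothetical sieve-shaped loop (not necessarily the gist's actual code): even with only 9 outer iterations, every inner iteration is one entry for postcircumfix:<[ ]> and ASSIGN-POS.

    my int $max = 2_000_000;
    my int @a = (0 .. $max);

    for 2 .. 10 -> int $i {                                  # only 9 outer iterations
        loop (my int $j = 2 * $i; $j <= $max; $j += $i) {    # ~($max div $i) inner iterations each
            @a[$j] = 1;                                      # one 'entry' per element assignment
        }
    }

Summed over $i = 2..10 that comes to roughly 3.86 million assignments, which is the order of magnitude the profiler reported.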
timotimo huh, what the hell is this bytecode in Int:D + Int:D 20:07
am i looking at this right? it looks so wrong ?!?! 20:08
i mean it only takes 0.41% of total run time
gist.github.com/timo/ed27b750840ed...37ba4f925f
in the "after" section, it's grabbing the same attribute from the object (i assume this is unboxing the bigint from inside the Int object) into r5 and r8 20:09
and then it overwrites both r8 and r5 with the same thing from the other argument
dogbert11 it does look strange 20:10
timotimo i should use my trusty "trace spesh optimization" script for that 20:11
but i have to go grocery shopping so we can has some dinner
dogbert11 dinner is important :)
thx for the help though 20:12
at least you found a bug :)
timotimo yup! 21:07
graphviz is still fiddling around with the 104k-node callgraph from hyper is-prime 21:08
lizmat And another Perl 6 Weekly hits the Net: p6weekly.wordpress.com/2018/03/19/...y-edument/ 22:30