Geth | MoarVM: a017f61472 | (Samantha McVey)++ | docs/ChangeLog Remove some unneeded items from the ChangeLog | 00:25 | |
MoarVM: 752879fcd6 | (Samantha McVey)++ | tools/release.sh Include 3rdparty/cmp in the release .tar.gz |
MoarVM: b88799fb6a | (Samantha McVey)++ | VERSION Release 2018.03 |
MasterDuke | samcv++ | 00:39 | |
02:56
ilbot3 joined
03:05
dogbert11 joined
04:47
releasable6 joined
06:53
brrt joined
07:04
domidumont joined
07:10
domidumont joined
07:28
brrt joined,
robertle joined
09:39
AlexDaniel joined
10:48
dalek joined,
synopsebot joined,
p6lert joined,
Geth joined,
SourceBaby joined
10:49
releasable6 joined
10:52
scovit joined
11:58
dogbert2_ joined
12:57
AlexDaniel joined
13:47
greppable6 joined
13:57
domidumont joined
14:54
domidumont joined
15:43
zakharyas joined
16:44
domidumont joined
17:12
unicodable6 joined
17:34
domidumont joined
dogbert11 | any objections for closing github.com/MoarVM/MoarVM/issues/791 ? | 18:02 | |
timotimo | no objections | 18:17 | |
dogbert11 | closed :) | 18:24 | |
timotimo: I have a question for you | |||
timotimo | please go ahead | 18:25 | |
dogbert11 | I have a program with a loop like this | ||
my int $max = 2_000_000; for (2..($max div 2)) -> int $i { ... | |||
the ($max div 2) could be exchanged for the constant 1_000_000 | 18:26 | ||
doing that cuts the runtime by 50%, does that seem reasonable? | |||
or do you need to see the entire script (quite short btw) | 18:27 | ||
timotimo | we don't yet do something like that on anything that doesn't have a compile-time-known value, and "my int $max" isn't a constant so it could be changed | 18:28 | |
dogbert11 | so we retrieve the value of max and do the div calc each iteration? | 18:29 | |
gist.github.com/dogbert17/b864cc4d...c85f0f76e3 | 18:30 | ||
timotimo | no, the range object gets created once and then iterated over | 18:31 | |
the difference is when we find a range with constant end points in a for loop we turn it into a loop loop | |||
dogbert11 | these loop loops seem to be a lot faster | 18:34 | |
timotimo | yes | 18:38 | |
they don't have to go through pull-one calls at all, they just do the calculations inline immediately | |||
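A minimal sketch of the rewrite timotimo describes, done by hand: when the upper bound is a runtime `my int` (as with `$max div 2` above), the `for` over a Range can be written directly as a C-style `loop`, so no Range object is iterated through pull-one calls. The sieve-style body and the small `$max` are assumptions for illustration only; the actual script in the gist is not reproduced here.

```raku
# Hypothetical sieve-style body; the real script in the gist may differ.
my int $max = 2_000;
my int @a = 0..$max;                 # native int array holding 0..$max

# Hand-written equivalent of: for 2..($max div 2) -> int $i { ... }
# The bound is computed once, then compared against a native int counter.
my int $limit = $max div 2;
loop (my int $i = 2; $i <= $limit; $i++) {
    if @a[$i] {                      # $i has not been crossed out yet
        my int $j = $i * 2;
        while $j <= $max {
            @a[$j] = 0;              # cross out every multiple of $i
            $j += $i;
        }
    }
}

say @a[2..$max].grep(* != 0).elems;  # number of primes up to $max
```

Alternatively, declaring the bound as a `constant` (e.g. `constant HALF = 1_000_000;`) gives the Range compile-time-known end points, which is the case timotimo says the optimizer already turns into a loop; whether a particular Rakudo version applies that rewrite is worth confirming with a profile.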
oh, huh, there's a getspeshslot here followed by a prof_allocated | 18:43 | ||
i wonder how that came to be, that seems wrong | |||
a simple for loop benchmark shows that the pull-one calls are entirely inlined, that's good | 18:45 | ||
haha, oh that's precious | |||
somehow we forgot to rewrite a prof_enterspesh into a prof_enterinlined or what have you | 18:46 | ||
leading to <unit> calling <unit> in the call graph (rather than more correctly adding the calls to pull-one) | |||
aha, we expect the enterspesh op to be in an exact position, which perhaps isn't the case here | 18:47 | ||
dogbert11 | how can we figure that out? | 18:49 | |
timotimo | indeed, the null instructions go there now | 18:53 | |
which also means we're dropping tiny amounts of wallclock time on the floor | |||
excellent. | 18:58 | ||
dogbert11 | it is? | 18:59 | |
timotimo | so it'd appear that ever since we added the "create null instructions to make object registers clean" feature inlining has been b0rked in the profiler :D | ||
but here's the fix | |||
dogbert11 | cool | 19:00 | |
Geth | MoarVM: ed4ed0e947 | (Timo Paulssen)++ | src/debug/debugserver.c metadata for ReentrantMutex and Semaphore |
MoarVM: 63348dce08 | (Timo Paulssen)++ | src/spesh/graph.c insert null-out-instructions before prof_enter. This caused us to miss the prof_enterspesh instruction when inlining because we expected it to be the first instruction of the second BB. |
timotimo | Calls (Inlined) | ||
9176 + 90824 (90.82%) | |||
dogbert11 | should this change affect program speed? | 19:01 | |
timotimo | no | ||
dogbert11 | but the profiler will be happy :) | 19:02 | |
timotimo | yup | 19:06 | |
profile files may also become a bit smaller | 19:07 | ||
right now you'd get one "wrong" self-call for every frame that has inlines, and the self-call would have inside itself all calls that any inlined function would have done | 19:08 | ||
19:09
FROGGS joined
timotimo | *snrt*, the race-is-prime profile is still >50 megs | 19:11 | |
dogbert11 | :( | 19:22 | |
timotimo | did you see the thing with the call graph getting deeper and deeper? | ||
dogbert11 | in the profiler you mean | 19:23 | |
timotimo | well, you can see it in the profiler | ||
but the corresponding stack traces match up | |||
thing is, when we await and get resumed by the Thread Pool Scheduler later, it seems like we keep adding frames to the call stack, where perhaps we should be re-using previous frames or something? | 19:25 | ||
if this is actually the case, we'll keep growing our memory usage, and GCs will get a little slower over time as well | 19:27 | ||
hack.p6c.org/~timo/exponential_framecounts.png | 19:28 | ||
this is a random screenshot i took the other day | |||
oh, no it isn't | |||
dogbert11 | looking at the profile it seems as if postcircumfix [] took most of the time, 58% | 19:34 | |
timotimo | what's your code? | 19:40 | |
19:41
bisectable6 joined
dogbert11 | the gist above | 19:43 | |
timotimo | ah | 19:44 | |
yeah, postcircumfix:<[ ]> can be expensive; try replacing it with ASSIGN-POS and measure again | 19:45 | ||
dogbert11 | the line with $max div 2 comes second with 24% | ||
timotimo | that'd be the block that starts on that line | ||
dogbert11 | ah, of course | ||
timotimo | it spends about a third of its time in postcircumfix:<[ ]> indeed | 19:46 | |
dogbert11 | I'm on 32 bit atm, that might possibly account for the difference | ||
timotimo | oh, huh, ASSIGN-POS is actually a bunch slower | 19:48 | |
even though ASSIGN-POS is 99.98% inlined into postcircumfix:<[ ]>, it only accounts for 46% of time spent | 19:49 | ||
dogbert11 | I get the impression that something odd is going on, can't put my finger on it though | ||
dogbert11 now profiling the fast version, i.e. '$max div 2' => '1_000_000' | 19:51 | |
in the fast version postcircumfix [] takes 40% followed by ASSIGN-POS with 22% | 19:55 | ||
timotimo | anyway, we're getting the Int candidate rather than int for the assignment here | 19:56 | |
you get a few % speedup if you @a[...] = (my int $ = 1) instead of @a[...] = 1 | 19:58 | ||
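A small sketch of the workaround suggested above, assuming a native `int` array like the one in the gist: per timotimo, assigning a plain literal picks the `Int` candidate, while assigning from an anonymous native `int` container lets the `int` candidate be chosen. The "few %" speedup is his observation from the profile; this snippet only shows the two forms side by side and does not measure anything.

```raku
my int @a = 0..10;

# A plain literal: per the log, the Int assignment candidate gets picked here.
@a[3] = 1;

# An anonymous native int container on the right-hand side; per the
# suggestion in the log, this lets the int candidate be chosen instead.
@a[4] = (my int $ = 1);

say @a[3], ' ', @a[4];
```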
dogbert11 | timotimo: this line 'my int @a = (0..$max);' is this array setup with a single malloc? | ||
timotimo | likely, check out if it calls push-all | 19:59 | |
dogbert11 | can't see it | 20:00 | |
timotimo | if i see it right, we're taking like 300 msec to get it stored | ||
it'd be under a call to STORE | |||
dogbert11 | STORE is there | 20:01 | |
timotimo | one store is from native_array, i believe that's the line you pasted | ||
dogbert11 | ok, cool | 20:02 | |
timotimo | the other STOREā the one from regular array, is probably from the grep at the end | ||
yes, indeed, it calls into <anon> on line 15 | |||
dogbert11 | here's an odd thing. I changed the main loop to 'for (2..10) -> int $i {' in my last profile and removed the grep line at the end | ||
still according to the profiler, there are 3857926 calls to ASSIGN-POS (looking at the entries column) | 20:03 | ||
and equally many entries for postcircumfix[] | 20:04 | ||
I must be misunderstanding what the 'entries' column means | |||
timotimo | so only 2..10 runs of the outer, yeah? | 20:05 | |
don't forget that you'll still go from 0 to $max with your inner loop | |||
dogbert11 | ah, got it, missed that one (oops) | ||
yup, that was it | 20:06 | ||
timotimo | huh, what the hell is this bytecode in Int:D + Int:D | 20:07 | |
am i looking at this right? it looks so wrong ?!?! | 20:08 | ||
i mean it only takes 0.41% of total run time | |||
gist.github.com/timo/ed27b750840ed...37ba4f925f | |||
in the "after" section, it's grabbing the same attribute from the object (i assume this is unboxing the bigint from inside the Int object) into r5 and r8 | 20:09 | ||
and then it overwrites both r8 and r5 with the same thing from the other argument | |||
dogbert11 | it does look strange | 20:10 | |
timotimo | i should use my trusty "trace spesh optimization" script for that | 20:11 | |
but i have to go grocery shopping so we can have some dinner | |||
dogbert11 | dinner is important :) | ||
thx for the help though | 20:12 | ||
at least you found a bug :) | |||
timotimo | yup! | 21:07 | |
graphviz is still fiddling around with the 104k-node call graph from hyper is-prime | 21:08 | ||
lizmat | And another Perl 6 Weekly hits the Net: p6weekly.wordpress.com/2018/03/19/...y-edument/ | 22:30 | |
22:47
notable6 joined
23:22
Kaiepi joined