github.com/moarvm/moarvm | IRC logs at colabti.org/irclogger/irclogger_logs/moarvm Set by AlexDaniel on 12 June 2018. |
|||
00:44
hoelzro joined
00:45
p6bannerbot sets mode: +v hoelzro
00:49
hoelzro left,
annieslmaos joined
00:50
p6bannerbot sets mode: +v annieslmaos
00:56
annieslmaos left
01:13
avar left
|
|||
timotimo | jeez, my moar is a bit unstable at the moment | 01:14 | |
01:16
avar joined,
avar left,
avar joined,
p6bannerbot sets mode: +v avar
01:17
p6bannerbot sets mode: +v avar
|
|||
timotimo | so, ovid had this benchmark, summing all 1 / n from 1 through 50_000_000 | 01:17 | |
on his machine, perl5 did it with a "sub reciprocal" in like 9 seconds, inlining it brought it down to 2.5 | |||
on my machine, it takes about 7.5 seconds manually inlined and 9.3 seconds not inlined manually | 01:18 | ||
anyway, turning the profiler on made it run in a minute instead of the 10 seconds, so ... that's fine, right? :D | 01:25 | ||
MasterDuke | manually inlined or not? | 01:35 | |
timotimo | manually inlined | 01:43 | |
MasterDuke | perf shows the top 2 functions are mp_mul_2d and mp_set_long at 10% and 4.25% | 01:49 | |
for 'my $s = 0e0; $s += 1/$_ for ^10_000_000+1; say $s' | |||
01:50
Alex`21 joined
01:51
p6bannerbot sets mode: +v Alex`21
|
|||
timotimo | yeah, use 1e0 / $_ | 01:51 | |
that makes a major difference, unsurprisingly | |||
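A rough analogy for why `1e0 / $_` helps, sketched in Python rather than Perl 6: the first loop builds an exact rational for every term before converting it to a float, the second divides floats directly. This is only an illustration of the extra work per term, not a model of how Rakudo implements Rat or Num; the function names are made up for the example.

```python
from fractions import Fraction


def harmonic_via_rationals(n):
    """Sum 1/k for k in 1..n, building an exact rational for each term first."""
    total = 0.0
    for k in range(1, n + 1):
        # Allocates a Fraction object, then converts it to a float.
        total += float(Fraction(1, k))
    return total


def harmonic_via_floats(n):
    """Sum 1/k for k in 1..n using plain floating-point division."""
    total = 0.0
    for k in range(1, n + 1):
        total += 1.0 / k
    return total


if __name__ == "__main__":
    # Both loops add the same float terms; only the per-term cost differs.
    print(harmonic_via_rationals(1000))
    print(harmonic_via_floats(1000))
```

Both functions return the same sum; the rational version simply does more allocation and arithmetic per iteration.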
01:52
Alex`21 left
|
|||
MasterDuke | ah, now it's get_num and MVM_gc_collect_free_nursery_uncopied at 5% and 4.3% (and total time was much shorter) | 01:53 | |
timotimo | cool, something went wrong with the graphs on the gc page | ||
MasterDuke | ugh, why is it calling both Num and Real's infix:</> ? | 01:55 | |
timotimo | perhaps it's only calling one of them only once to do dispatch checking? | 01:56 | |
anyway, i'll go to bed now | |||
less than 1ms on average per GC run, beautiful | 01:57 | ||
MasterDuke | nope, 10m for each | ||
timotimo | OK | ||
MasterDuke | Real's is just `a.Bridge / b.Bridge` | ||
so there's also 10m calls to Int's and Num's Bridge | 01:58 | ||
timotimo | those get a tiny bit better when you write 1e0 / $_.Num | 01:59 | |
seeya! | |||
my next step would have been to look at the spesh bytecode | |||
probably some boxing/unboxing wins to be had later on | |||
MasterDuke | later... | 02:00 | |
02:22
Humbedooh3 joined
02:23
p6bannerbot sets mode: +v Humbedooh3
02:27
Humbedooh3 left
02:36
Peng5 joined,
Peng5 left
02:41
mist23 joined,
p6bannerbot sets mode: +v mist23
02:46
mist23 left
02:53
hoelzro joined,
p6bannerbot sets mode: +v hoelzro
03:10
Kaiepi left
03:20
Guest60854 joined,
p6bannerbot sets mode: +v Guest60854
03:21
Guest60854 left
03:37
belak2 joined,
p6bannerbot sets mode: +v belak2
03:38
belak2 left
03:45
matze6 joined,
p6bannerbot sets mode: +v matze6
03:47
jrslepak12 joined,
jrslepak12 left
03:48
matze6 left
03:49
avar left
03:54
avar joined,
avar left,
avar joined,
p6bannerbot sets mode: +v avar
03:55
p6bannerbot sets mode: +v avar
04:36
orb joined
04:37
orb left
05:13
linear8 joined
05:14
p6bannerbot sets mode: +v linear8,
linear8 left
05:29
avar left
05:30
qassim0 joined,
p6bannerbot sets mode: +v qassim0
05:31
qassim0 left
05:42
Erynnn8 joined
05:43
p6bannerbot sets mode: +v Erynnn8
05:45
avar joined,
avar left,
avar joined,
p6bannerbot sets mode: +v avar
05:46
p6bannerbot sets mode: +v avar
05:50
Erynnn8 left
06:56
supercool21 joined
06:57
p6bannerbot sets mode: +v supercool21,
BruceS6 joined
06:58
p6bannerbot sets mode: +v BruceS6
06:59
BruceS6 left
07:00
supercool21 left
07:38
macky joined
07:39
p6bannerbot sets mode: +v macky
07:41
macky left
07:48
irc-5225225 joined
07:49
p6bannerbot sets mode: +v irc-5225225
07:52
irc-5225225 left
08:16
deltam3 joined,
p6bannerbot sets mode: +v deltam3
08:21
deltam3 left,
Kaiepi joined
08:22
p6bannerbot sets mode: +v Kaiepi
08:24
phillid joined,
phillid left
08:39
lizmat joined,
p6bannerbot sets mode: +v lizmat
08:49
MasterDuke left
09:11
CurryWurst joined
09:12
p6bannerbot sets mode: +v CurryWurst
09:15
CurryWurst left
09:37
supercool20 joined
09:38
p6bannerbot sets mode: +v supercool20,
supercool20 left
09:49
CoJaBo24 joined
09:50
p6bannerbot sets mode: +v CoJaBo24
09:53
OwenBarfield joined
09:54
p6bannerbot sets mode: +v OwenBarfield
09:55
CoJaBo24 left
09:57
OwenBarfield left
10:33
verm1n9 joined
10:34
verm1n9 left
10:46
Kaypie joined
10:47
p6bannerbot sets mode: +v Kaypie
10:48
Kaiepi left
10:56
Guest738 joined
10:57
p6bannerbot sets mode: +v Guest738
10:58
Guest738 left
11:11
Kaypie left
11:14
Kaiepi joined
11:15
p6bannerbot sets mode: +v Kaiepi
11:29
ski_ joined
11:30
p6bannerbot sets mode: +v ski_
11:31
ski_ left
|
|||
timotimo | the reciprocal function, which is just 'sub reciprocal(num $int) { 1e0 / $int }' is 570 bytes big, 410 of that from inlined frames | 11:54 | |
that's infix:</>, which was 404 bytes big before | 11:55 | ||
the majority of it comes from the test against division-by-zero | |||
jnthn | I'm surprised / didn't statically inline... | ||
Ohh...that. | |||
timotimo | at least that's my guess | ||
jnthn | *nod* | ||
Note that under my postrelease-opts branch, it would subtract the size of the inline when considering whether reciprocal itself should be possible to inline | 11:56 | ||
Where is the code involved here, and is there a Perl 5 and a Perl 6 version? | |||
timotimo | that came from a slide in ovid's talk about perl 5 and perl 6 | ||
he showed how the code in perl5 went from 9 seconds to 2.5 seconds by manually inlining the reciprocal function into the hot loop | 11:57 | ||
jnthn | Ah | ||
timotimo | m: my $s = 0e0; $s += 1/$_ for ^10_000_000+1; say $s | ||
jnthn | Yeah, we suffer from expensive calling too, which is why automated inlining saves us :) | ||
timotimo | this is basically the code, but in my case it was wordier | ||
camelia | 16.695311365857272 | ||
timotimo | right before it enters the infix:</> inline it grabs three spesh slots that aren't visibly used, so another case of deopt forcing us to keep stuff around | 11:58 | |
jnthn | It should tell which deopt points cause that though | ||
timotimo | one second | 11:59 | |
jnthn | Also note that if there are inlines, there's at the moment always one register holding the code object kept around | ||
Which we do so that we can reconstruct the callstack should there be an exception | |||
Or so we can deopt from the inline | |||
If we know an inlinee can never possibly throw or cause deopt in any situation whatsoever (like identity, which optimizes away entirely) then we can avoid that, but spesh slot loads are cheap so I didn't make that a priority :) | 12:00 | ||
timotimo | for one it's deopt=1, for the next it's also deopt=1, the next is -1, and it also sets the value from the last one into a register on the inside of the inline, but that doesn't have a deopt printed in the facts list | ||
ah | |||
that's the one you were referring to just now | |||
holding the code for uninline/stack reconstruct | |||
so deopt=-1 refers to "kept around because inline" | 12:01 | ||
BB 3 is the one that has the getspeshslots; it starts FH Start (3), Logged, Ins Deopt One idx=0, then the two spesh slots, then INS deopt one idx=1, INS deopt one idx=2, getspeshslot again | 12:02 | ||
12:02
MasterDuke joined,
p6bannerbot sets mode: +v MasterDuke
|
|||
timotimo | it doesn't seem like we should keep deopts around when they're on a getspeshslot, though? | 12:02 | |
12:03
MasterDuke left,
MasterDuke joined,
herbert.freenode.net sets mode: +v MasterDuke,
p6bannerbot sets mode: +v MasterDuke,
danielhuman joined
|
|||
timotimo | or is that for "all instructions up to the next deopt annotation"? | 12:03 | |
12:03
p6bannerbot sets mode: +v danielhuman
12:04
danielhuman left
12:05
drakythe joined,
drakythe left
|
|||
jnthn | -1 means "unconditionally retained" | 12:10 | |
Also...you need to look at the original code to see what the deopt point was originally one | |||
*on | |||
Because the annotations shift during instruction deletions etc. | |||
timotimo | OK | 12:11 | |
jnthn | It's keeping a *lot* less for deopt these days, though, and when I've looked carefully at a few cases where I thought it should not be, then - aside from the case I already mentioned where an inline could never deopt or throw - it's turned out to have been correct. | ||
timotimo | the first was on decont from getlexstatic of infix:</>, next one for prepargs, then a One and an All on the invoke_o | ||
i'm not sure how exactly to look at the deopt situation | 12:12 | ||
a tool that puts original and optimized side-by-side and matches up parts automatically would surely be super nice | |||
jnthn | *nod* | ||
Yeah, though that will get harder with time too | |||
(When we add code motion to move stuff out of loops for example) | 12:13 | ||
timotimo | ah, indeed | ||
as long as we keep re-using the actual ins struct, we can totally output the addresses in the msgpack version of the spesh log and match those up | 12:14 | ||
jnthn | But yeah, stuff is increasingly being lowered to the point where it's hard to look at the optimized output and know that it maybe used to be :) | ||
timotimo | if we want to, we can be extra sneaky and output the starting addresses of the spesh alloc blocks and figure out the order of allocations of things :) | ||
jnthn | This is a good sign overall :) | 12:15 | |
timotimo | it is! | ||
jnthn | lunch, bb :) | 12:17 | |
*bbl | |||
12:19
d__b joined
12:20
d__b left,
MasterDuke left
12:27
MasterDuke joined,
p6bannerbot sets mode: +v MasterDuke
12:29
MasterDuke left,
MasterDuke joined,
herbert.freenode.net sets mode: +v MasterDuke,
p6bannerbot sets mode: +v MasterDuke
|
|||
timotimo | kind of looks like the failure creation and returning is keeping the decontrv from being inlined, i.e. it stays as an sp_speshresolve in the reciprocal code body | 12:31 | |
the profiler may want to learn about speshresolve in particular | 12:33 | ||
any objections to giving the spesh plugin subs names? that way they'll show up clearly in the call graph and routine overview | 12:35 | ||
oh my, i just now see that jitting wasn't even successful for the reciprocal sub | 12:39 | ||
that'll be interesting | |||
ah, param_rp_n bails it | |||
8.9s instead of 24.5s when switching reciprocal's parameter from num $int (haha) to just $int | 12:42 | ||
MasterDuke | whoops | 12:44 | |
12:45
acerbic joined
12:46
p6bannerbot sets mode: +v acerbic
12:47
casdr8 joined,
p6bannerbot sets mode: +v casdr8
|
|||
jnthn | Yeah, native/non-native boundary cases can go pretty badly at the moment. | 12:48 | |
12:48
casdr8 left
12:53
acerbic left,
ThiefMaster20 joined
|
|||
timotimo | wow, haha | 12:54 | |
12:54
p6bannerbot sets mode: +v ThiefMaster20
|
|||
timotimo | that seems weird | 12:55 | |
what inline am i looking at here … | |||
12:56
ThiefMaster20 left
|
|||
timotimo | ooh it's pull-one | 12:56 | |
now it makes total sense | |||
mhhh, let's put spesh comments on p6obind_* and friends that tell us what the attribute's name was | 13:00 | ||
13:08
Selfsigned joined,
p6bannerbot sets mode: +v Selfsigned
13:10
Selfsigned left
|
|||
MasterDuke | huh. changing `for ^10_000_000+1 { ... }` to `for 1..10_000_000 { ... }` is a bit faster. and the 10m calls to pull-one in Rakudo::Iterator is completely gone from the profile | 13:21 | |
13:22
Sitri joined
13:23
p6bannerbot sets mode: +v Sitri
13:24
Sitri left
13:36
Shrooms18 joined
13:37
p6bannerbot sets mode: +v Shrooms18
13:39
Shrooms18 left
|
|||
timotimo | that means the range optimization takes hold in that case, right? | 13:40 | |
my benchmark has a sub sum_reciprocals_to($int) and for 1..$int | |||
13:54
Awesomecase joined
13:55
p6bannerbot sets mode: +v Awesomecase
|
|||
timotimo | time perl6 -e 'my num @parts = 1e0 / ++$ xx 5_000_000; say @parts.sum' | 13:59 | |
4.56user 0.23system 0:04.23elapsed 113%CPU (0avgtext+0avgdata 376736maxresident)k | |||
if you have loads of ram, this is also a possibility %) | |||
14:00
FuzzySockets joined,
Awesomecase left,
p6bannerbot sets mode: +v FuzzySockets
14:04
FuzzySockets left
|
|||
MasterDuke | m: for ^1_000_00+1 { Nil for ^100+1 }; say now - INIT now | 14:20 | |
camelia | 1.9058206 | ||
MasterDuke | m: for 1..1_000_00 { Nil for 1..100 }; say now - INIT now | ||
camelia | 0.6517606 | ||
MasterDuke | could the first be optimized into the second? | ||
timotimo | /* getattr_o of '$!do' in Code of a Block */ | 14:21 | |
[Annotation: Logged (bytecode offset 72)] | |||
jnthn | Umm....I think so, but note that prec really is (^100) + 1 | ||
timotimo | sp_p6ogetvc_o r10(15), r1(2), liti16(8), sslot(3) | ||
that's the intent here | |||
to go from 1 to 100 instead of 0 to 99 | |||
jnthn | But in theory it can constant fold, I think | ||
timotimo | i'd assume ^(100 + 1) already constfolds | ||
jnthn: would you like to see that kind of comment in the spesh log? | 14:22 | ||
jnthn | timotimo: Yeah, though the off indentation will probably drive me nuts :P :P | ||
timotimo | that was intentional, but can just as easily be adjusted | ||
jnthn | ah :) | ||
timotimo | oh, i see that we're only doing getattr_i lowering if the bits are 64; you think that's something worthwhile to expand to other sizes? | 14:24 | |
(also, no check for signed vs unsigned) | |||
MasterDuke | ah. `for ^10` after optimization is a while, but `for ^10+1` is a p6forstmt and a Range | 14:27 | |
timotimo | yup | 14:28 | |
the optimization looks directly for a range operator as first child | |||
but here it's a + operator instead | |||
MasterDuke | so we could add a check if it's + a constant, just add that constant to the initial value and condition of the while? | 14:30 | |
jnthn | timotimo: Yeah, that can be extended to the other sizes, they're just less common so less to win | 14:31 | |
timotimo | right | ||
MasterDuke: that's right. check for the range in both the first or second argument and maybe also support - and *? | 14:32 | ||
MasterDuke | and / ? | 14:33 | |
timotimo | perhaps, but that's kind of likely to get us into Rats and then we no longer optimize the thing | 14:34 | |
MasterDuke | true | 14:35 | |
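The rewrite MasterDuke proposes (folding a constant offset into the start value and limit of the lowered while loop) could look roughly like this. It is a Python sketch of the idea only, covering just `^stop + offset`; the function names are hypothetical and nothing here reflects the actual Rakudo optimizer code.

```python
def lower_shifted_range(stop, offset):
    """Fold `^stop + offset` into (initial value, exclusive limit) of a while loop."""
    return offset, stop + offset


def iterate_lowered(stop, offset):
    """Run the lowered while loop and collect the values it would visit."""
    i, limit = lower_shifted_range(stop, offset)
    visited = []
    while i < limit:
        visited.append(i)
        i += 1
    return visited


if __name__ == "__main__":
    # `for ^10+1` should visit 1 through 10 instead of 0 through 9.
    print(iterate_lowered(10, 1))
```

Division is left out on purpose, for the reason timotimo gives: it is likely to produce Rats, at which point the loop is no longer a simple integer counter.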
timotimo | gist.github.com/timo/92101baccc059...d2f48af1d8 - looks pretty good i'd say | 14:56 | |
+/- indentation of the comments | 14:57 | ||
indentation is changed now | 14:58 | ||
jnthn | m: say 0.963 / 1.323 | 15:03 | |
camelia | 0.727891 | ||
jnthn | That's for `my $a = 0; for ^10_000_000 { $a = $a + 2 }; say $a` | ||
timotimo | nice! | 15:04 | |
jnthn | Second number is after I add lowering and JIT of add_I, with doing the calculation directly in the JIT output if the inputs are smallint | ||
timotimo | *nice* | ||
jnthn | We don't have to range check the result in assembly either, we just do it in a 32-bit register and jump on overflow :) | 15:05 | |
timotimo | m: say (2 ** 32 - 1) - 10_000_000 | ||
camelia | 4284967295 | ||
timotimo | m: say (2 ** 32 - 1) | ||
camelia | 4294967295 | ||
timotimo | ah, that fits very comfortably into 32bit, too | ||
at some point i really should develop an intuition for these literal values | 15:06 | ||
jnthn | The allocation of the result is fastcreate'd too | ||
Which no doubt helps | |||
It's another 7% off the utf8 million line reading benchmark that adds up the number of chars too :) | 15:07 | ||
timotimo | that sounds very good | ||
jnthn | Yeah. Will clean up the patch a bit later and push. I stubbed in sub_I and mul_I lowering too, but still need to fill them out | ||
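The fast path jnthn describes for add_I (do the addition directly when both inputs are small, bail out on overflow) can be sketched as follows. This is a Python illustration under the assumption that "smallint" means a value that fits in a signed 32-bit register, as the remark about the 32-bit register suggests; the real lowering lives in MoarVM's spesh and JIT, and `add_I_sketch` and `fits_smallint` are names invented for this example.

```python
INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1


def fits_smallint(value):
    """True if the value fits in a signed 32-bit integer (assumed smallint range)."""
    return INT32_MIN <= value <= INT32_MAX


def add_I_sketch(a, b):
    """Return (sum, path), where path says whether the fast or slow route was used."""
    if fits_smallint(a) and fits_smallint(b):
        result = a + b
        # Stands in for "jump on overflow" after a 32-bit add.
        if fits_smallint(result):
            return result, "fast"
    # Slow path: Python ints are arbitrary precision, standing in for bigint math.
    return a + b, "slow"


if __name__ == "__main__":
    print(add_I_sketch(20_000_000, 2))   # stays on the fast path
    print(add_I_sketch(INT32_MAX, 1))    # overflows 32 bits, takes the slow path
```

For the `$a = $a + 2` loop above, the running total never exceeds 20_000_000, so under these assumptions every addition would stay on the fast path.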
timotimo | how do you feel about annotating lots and lots of getspeshslot ins's with comments saying what it's for? | 15:08 | |
jnthn | Could we just put that on the same line but after the instruction? | ||
timotimo | there's surely some point where adding more comments is just extra noise | ||
very possible; what if there's multiple comments on one instruction? | |||
jnthn | There is, but this one could save a lot of cross-referencing | ||
Oh, I meant that we could do this as a special case in the dumper for sp_getspeshslot :) | 15:09 | ||
timotimo | oh | ||
jnthn | But yeah, maybe we could do it generally for comments too | ||
timotimo | yeah, could do that | ||
jnthn | Comment on the line when it's just one comment | ||
Comments before when multiple | |||
Like #= vs #| in Perl 6 ;) | |||
timotimo | would it be fine to put all comments after all annotations in that case? | ||
jnthn | Yeah | 15:10 | |
timotimo | then i don't have to do a pre-scan for comments | ||
is /* ... */ fine with you? or perhaps use # instead? ;) | |||
jnthn | I guess # is 3 less characters of clutter :) | 15:11 | |
Even more with whitespace not considered | 15:12 | ||
Time for a break | 15:14 | ||
timotimo | will update the gist soon | ||
there it is | 15:16 | ||
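The placement rule agreed on above (annotations first; a single comment goes on the instruction's own line after `#`; several comments go on separate lines before the instruction) can be sketched like this. It is a Python mock-up of the formatting rule only, not the C code of the spesh dumper, and `dump_instruction` is a hypothetical name.

```python
def dump_instruction(text, annotations=(), comments=()):
    """Return the output lines for one instruction.

    Annotations are printed first. One comment is appended to the instruction
    line; two or more comments are printed on their own lines before it.
    """
    lines = [f"[Annotation: {a}]" for a in annotations]
    if len(comments) == 1:
        lines.append(f"{text} # {comments[0]}")
    else:
        lines.extend(f"# {c}" for c in comments)
        lines.append(text)
    return lines


if __name__ == "__main__":
    for line in dump_instruction(
        "sp_p6ogetvc_o r10(15), r1(2), liti16(8), sslot(3)",
        annotations=["Logged (bytecode offset 72)"],
        comments=["getattr_o of '$!do' in Code of a Block"],
    ):
        print(line)
```

Printing comments after annotations means the dumper never has to pre-scan an instruction's annotation list for comments before it starts emitting lines.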
15:18
Vorpal26 joined
15:19
Vorpal26 left
|
|||
timotimo | not bad. i accidentally left /* */ for more-than-one, but somehow i like it, too. i'll turn it into # soon, though | 15:19 | |
i wonder if i should go to the trouble of looking up the attribute name for the unboxes and output that in a spesh comment, too | 15:33 | ||
probably not quite as useful, though if you can just search for an attribute name and find every actual use of it in the spesh log, that could be good, too | |||
sp_p6oget_i r8(3), r0(2), liti16(8) # getattr_i of '$!i' in <anon|19> | 15:35 | ||
sp_fastbox_bi_ic r6(3), liti16(40), sslot(5), liti16(32), r8(3), liti16(1) # box_i into a Int | |||
and also: | |||
sp_fastcreate r9(2), liti16(40), sslot(10) # box_n into a Num | |||
sp_bind_n r9(2), liti16(32), r8(2) | |||
the comment there could go either on the fastcreate or on the bind, don't really have a preference there. | |||
15:43
reportable6 joined
15:44
p6bannerbot sets mode: +v reportable6
15:46
ZofBot left,
ZofBot joined
15:47
p6bannerbot sets mode: +v ZofBot
15:58
lizmat left
16:03
fake_space_whale joined
16:04
p6bannerbot sets mode: +v fake_space_whale
16:12
zakharyas joined
16:13
p6bannerbot sets mode: +v zakharyas
16:19
nullrouted joined
16:20
p6bannerbot sets mode: +v nullrouted
16:22
nullrouted left
16:32
Fleet21 joined
16:33
Fleet21 left
17:05
zakharyas left
17:06
zakharyas joined
17:07
p6bannerbot sets mode: +v zakharyas
17:10
Kaiepi left
17:23
Ambroisie joined
17:24
p6bannerbot sets mode: +v Ambroisie
17:26
Ambroisie left
|
|||
jnthn | timotimo: On the fastcreate is probably fair enough | 17:29 | |
17:30
MikeoftheEast7 joined
17:31
p6bannerbot sets mode: +v MikeoftheEast7
17:34
MikeoftheEast7 left
17:45
Erynnn19 joined
17:46
zakharyas left,
Erynnn19 left
17:55
acronix14 joined,
p6bannerbot sets mode: +v acronix14
18:00
acronix14 left
18:09
BrianBlaze21 joined
18:10
p6bannerbot sets mode: +v BrianBlaze21
18:14
BrianBlaze21 left
18:15
metax joined
18:16
p6bannerbot sets mode: +v metax
18:17
TingPing4 joined,
p6bannerbot sets mode: +v TingPing4
18:18
TingPing4 left
18:19
metax left
18:34
zakharyas joined
18:35
p6bannerbot sets mode: +v zakharyas
|
|||
timotimo | OK, i need to sort out this mess of commits i've spread out between spesh_comments and postrelease-opts | 18:38 | |
18:48
Alex`16 joined,
Alex`16 left
|
|||
Geth | MoarVM/spesh_comments: 6 commits pushed by (Timo Paulssen)++ | 19:03 | |
timotimo | i think this branch is clean to be merged | ||
jnthn | After release ;) | 19:05 | |
Geth | MoarVM: jstuder-gh++ created pull request #942: Improve exception msg for slice op on VMArray |
||
timotimo | i meant into the postrelease-opts branch, which i rebased it onto :) | 19:06 | |
Geth | MoarVM/postrelease-opts: 477dc4cf4c | (Jonathan Worthington)++ | 13 files Lower add_I, sub_I, and mul_I where possible When the input and output types are consistent (which should be the overwhelmingly common case) we JIT-compile these into code that tries to do the operation directly if we are dealing with two smallint input values, and provided it doesn't overflow stores it back. If either of those two conditions isn't met, it falls back to a slow path. Since we ... (8 more lines) |
19:08 | |
jnthn | timotimo: oh, that's OK :) | ||
walk & | 19:10 | ||
timotimo | i'll merge :) | 19:11 | |
19:13
domidumont joined
|
|||
Geth | MoarVM/postrelease-opts: 7 commits pushed by (Timo Paulssen)++ | 19:13 | |
19:13
p6bannerbot sets mode: +v domidumont
19:19
domidumont left
19:20
domidumont joined
19:21
p6bannerbot sets mode: +v domidumont,
JSharp16 joined
19:22
p6bannerbot sets mode: +v JSharp16
19:23
JSharp16 left
19:24
domidumont left
19:47
Kaiepi joined
19:48
p6bannerbot sets mode: +v Kaiepi
19:52
alphor20 joined
19:53
p6bannerbot sets mode: +v alphor20
19:56
alphor20 left
20:05
zakharyas left
20:08
JustTheDoctor2 joined,
p6bannerbot sets mode: +v JustTheDoctor2
20:13
JustTheDoctor2 left
20:24
zakharyas joined
20:25
p6bannerbot sets mode: +v zakharyas
20:31
zakharyas left
20:35
deedra13 joined
20:36
p6bannerbot sets mode: +v deedra13,
deedra13 left,
Soni22 joined,
p6bannerbot sets mode: +v Soni22
20:38
Soni22 left
20:40
chaoscon14 joined
20:41
p6bannerbot sets mode: +v chaoscon14
20:46
zakharyas joined
20:47
p6bannerbot sets mode: +v zakharyas
20:48
chaoscon14 left
|
|||
timotimo | goto BB(224) # throwcatdyn of category 16 for handler 9 | 20:49 | |
that could be helpful and interesting? | |||
the reason why a frame couldn't be inlined can also go in a spesh comment on one of the inliner's instructions | 21:00 | ||
21:04
zakharyas left
21:05
zakharyas joined,
p6bannerbot sets mode: +v zakharyas
21:07
bungle0 joined,
p6bannerbot sets mode: +v bungle0
|
|||
timotimo | cool. | 21:08 | |
21:10
lizmat joined
21:11
p6bannerbot sets mode: +v lizmat
21:13
bungle0 left
|
|||
timotimo | nice. | 21:18 | |
MasterDuke | ? | ||
timotimo | now it also puts a comment "inline of 'foo' (123) candidate 99" on the first instruction of an inline | ||
21:19
catfuneral joined
|
|||
MasterDuke | cool. you also added the reason things couldn't be inlined? | 21:19 | |
timotimo | yup! | ||
21:19
catfuneral left
|
|||
timotimo | sp_fastinvoke_o r5(23), r45(0), liti16(0) # could not inline 'symbol' (157) candidate 0: bytecode is too large to inline | 21:19 | |
MasterDuke | does that remove the need for MVM_SPESH_INLINE_LOG? | 21:20 | |
timotimo | the inline log is much denser and maybe better for some use cases | ||
MasterDuke | ah | ||
timotimo | sp_getspeshslot r33(3), sslot(9) # method lookup of '!sort_dispatchees_internal' on a Method | 21:21 | |
^- also nice, i think | |||
Geth | MoarVM/postrelease-opts: 4efe1b3b2e | (Timo Paulssen)++ | src/spesh/optimize.c comment for result of optimize_method_lookup will put a "method lookup of '$name'" after the resulting getspeshslot instruction |
21:32 | |
MoarVM/postrelease-opts: a091eb6cc8 | (Timo Paulssen)++ | src/spesh/optimize.c comment on inline success/failure on success: puts the name, cuuid, and spesh candidate id on the first instruction of the inlined code (potentially after the inlined code or not into the spesh graph at all if it was reduced to nothingness?) on failure: the same as above, plus the failure reason. |
|||
MoarVM/postrelease-opts: 74b219bc2f | (Timo Paulssen)++ | src/spesh/optimize.c comment on throwcat* with category and handler id if it's optimized to a goto |
|||
timotimo | jnthn: do you think any of the changes made inside inline.c deserve a comment added to the spesh log? | 21:45 | |
jnthn | Hm, like "rewritten return" or "rewritten arg" or something? | 21:49 | |
Maybe | |||
timotimo | hm, i guess "arg 0", "arg 1", "named arg foo" could be interesting; do we even still have the info about named args at that point? | 21:50 | |
jnthn | No | ||
Not easily | |||
in args.c we do the transform | |||
But by the time we inline it we've formed and re-parsed bytecode | |||
bbs | |||
timotimo | OK, so perhaps the spesh candidate is also gone already | 21:51 | |
21:54
brrt joined,
p6bannerbot sets mode: +v brrt
|
|||
brrt | jnthn++ | 21:56 | |
timotimo | oh hey brrt! | ||
how often do you use the graphviz stuff in the jit log? i'm a little annoyed i have to constantly skip past it :D | |||
brrt | nine: re: devbranch, releasebranch, master - I'm also in favor of having a release-branch+master, mostly so we can continue doing what we always do whenever the release process is underway | 21:57 | |
timotimo: when debugging | |||
i find it invaluable | |||
timotimo | OK, i'll just have to come up with something :) | ||
brrt | hm | ||
timotimo | maybe i'll just keep using grep for Constructing, Entering, BAIL | ||
brrt | I was actually thinking of killing the JIT log entirely | ||
and folding it into the spesh log | 21:58 | ||
timotimo | oh, spesh logs are already often in the hundreds of megabytes %) | ||
brrt | that way, we get spesh info + JIT info in the same place | ||
timotimo | that's true, it'll be in the right spot immediately | ||
brrt | 'disk is cheap' | ||
:-) | |||
21:58
zakharyas left
|
|||
timotimo | reading a million lines with perl6 is getting faster and faster, too ;) | 21:59 | |
brrt | significantly, even | 22:00 | |
timotimo | is that so? | ||
well, add to it a check or two, like "contains" or "starts-with" and suddenly it's much more expensive :) | 22:01 | ||
brrt | :-( | ||
timotimo | i don't have actual numbers to back this up | ||
brrt | re: the reciprocal benchmark | ||
on my machine, the naive perl5 version, 0.6s | |||
timotimo | whoa | ||
brrt | perl6 runs the same code, 26s | ||
timotimo | what kind of potato does ovid have? :) | ||
it already gets lots cheaper if you remove "return" ;) | 22:02 | ||
brrt | well, what was the number of iterations of his version? | ||
timotimo | 50_000_000 | ||
brrt | ah | ||
i have 10_000_000 | |||
let me try that out too... | |||
2.6s | 22:03 | ||
for perl5 | |||
timotimo | that is the one without manual inlining? | ||
brrt | that is the one with manual inlining | 22:04 | |
timotimo | OK | ||
the reciprocal speshlog is only 165k lines | 22:05 | ||
so the 0.4 seconds it takes to count all lines starting with "Total" isn't saying so much | 22:06 | ||
brrt | you know, in an ideal world, we'd have both a single textual debug log, potentially with a bunch of flags, and a structured way of getting the same from the debug server | 22:07 | |
gist.github.com/bdw/42819001c1a083...818acf99b6 | 22:18 | ||
anyway | |||
if i write it in nqp, I get 0.636s of runtime | 22:19 | ||
if i use non-native objects, this increases to 18s | 22:20 | ||
so. our boxing and unboxing is quite costly | |||
fwiw, the same code in C, on my machine, runs in 0.22s | 22:24 | ||
so | |||
the long story very short | |||
MoarVM is within a factor of three of C, including asynchronous specialization jit compilation, when using native types | 22:25 | |
perl6 is a factor of 1000 off | |||
the lesson here is that there is about 70% gain to be expected, at most, from better JIT compilation | 22:27 | ||
java is 0.4s | 22:32 | ||
so better than MoarVM, but not by all that much | |||
lizmat | brrt: isn't that a factor of 100 compared to C ? | ||
brrt | lizmat: i'm talking about this one specifically: gist.github.com/bdw/42819001c1a083...procal-nqp | ||
that reliably runs in 0.6s on my machine | 22:33 | ||
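The ratios behind this exchange can be worked out from the timings quoted above. The sketch below assumes all of brrt's numbers refer to the same workload on the same machine, which the log does not state explicitly; the dictionary keys are labels made up for the example.

```python
# Timings in seconds, as quoted by brrt in the log.
timings = {
    "c": 0.22,
    "java": 0.4,
    "nqp_native": 0.636,
    "nqp_boxed": 18.0,
    "perl6": 26.0,
}


def slowdown_vs_c(name):
    """How many times slower than the C version a given run was."""
    return timings[name] / timings["c"]


def max_gain_from_matching_c(name):
    """Fraction of runtime saved if the run became as fast as the C version."""
    return 1.0 - timings["c"] / timings[name]


if __name__ == "__main__":
    for name in ("nqp_native", "nqp_boxed", "perl6"):
        print(name, round(slowdown_vs_c(name), 1))
    print(round(max_gain_from_matching_c("nqp_native"), 2))
```

Under that assumption, native-typed NQP is a little under 3x slower than C, the Perl 6 version is on the order of 100x slower, and matching C would save roughly two thirds of the NQP runtime, which is in the neighbourhood of the 70% figure.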
lizmat | m: for ^1_000_000 { }; say now - INIT now | 22:34 | |
camelia | 0.0703057 | ||
lizmat | m: for ^10_000_000 { }; say now - INIT now | ||
camelia | 0.50295888 | ||
lizmat | m: for ^10_000_000 -> int $_ { }; say now - INIT now | ||
camelia | 0.1287333 | ||
lizmat | that's boxing for you | ||
brrt | :-( | 22:35 | |
jnthn | It's worse than that. | ||
lizmat | ? | ||
jnthn | The good news is that this one is quite a bit faster in postrelease-opts | ||
Because boxing got a good bit faster | |||
brrt | :-) | 22:36 | |
jnthn | In that branch if you write the equivalent code in Python we're faster, and if you write the equivalent code in Ruby we're only a little slower. Perl 5 still beats us, but within a factor of 2, for the for ^10_000_000 { } case | ||
lizmat | jnthn: also, I was thinking that -> int $_ could be the default signature in 6.d ? | ||
jnthn | But what's *really* annoying about this case is that $_ is dynamic | 22:37 | |
brrt is not seeing any reasonable code in which that'd break, so is not against it | |||
jnthn | If it were just a boring old lexical it'd already be lowered | ||
And then the box would be thrown out | |||
But because $_ is declared `is dynamic` then we can't do that | |||
In fact, thanks to anything anywhere any number of levels deep being able to do CALLER::CALLER::blah, we can't do much | 22:38 | ||
I've been pondering how to deal with this for the last month | |||
And it's really icky | |||
brrt | CALLER makes many things impossible | ||
jnthn | In hot loops if we inline everything we can sorta do away with it, if we learn to analyze lexicals better | ||
brrt | or well, much harder than they ought to be | 22:39 | |
jnthn | It's not that bad, because most things aren't dynamic | ||
The problem is that $_ *is* and tons of things use it | |||
As in, lots of common idioms | |||
I don't think `int $_` helps, because 1) it's probably an inconsistency and 2) it doesn't do anything for the "we consider $_ dynamic" case | 22:40 | ||
lizmat | .oO( torturing the core developers ) |
||
jnthn | I'm very tempted to submit an RFC for $_ to no longer be dynamic | ||
But I don't think lizmat would receive this too well ;) | |||
As the heaviest user I'm aware of of this feature :-) | 22:41 | ||
lizmat | well, if that would mean that CALLERS::<$_> wouldn't work anymore | ||
well, actually, if there could be a *si | 22:42 | ||
brrt | btw, what's stopping us from implementing dynamics in a single (thread)global table, and pushing, popping them on overrides | ||
(which is how i understand them to be implemented in perl5) | |||
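The scheme brrt floats here (one table of dynamic variables per thread, pushed on override and popped on scope exit) can be sketched as below. This is a Python illustration of the idea only; it does not describe how Perl 5 or MoarVM actually store dynamic variables, and `dynamic` and `lookup` are hypothetical helper names.

```python
import threading
from contextlib import contextmanager

# One table per thread: variable name -> stack of values.
_dynamics = threading.local()


def _table():
    """Return this thread's table, creating it on first use."""
    if not hasattr(_dynamics, "stacks"):
        _dynamics.stacks = {}
    return _dynamics.stacks


@contextmanager
def dynamic(name, value):
    """Override a dynamic variable for the duration of a `with` block."""
    stack = _table().setdefault(name, [])
    stack.append(value)
    try:
        yield
    finally:
        stack.pop()  # restore the previous value on scope exit


def lookup(name):
    """Return the innermost value currently bound to the name."""
    stack = _table().get(name)
    if not stack:
        raise LookupError(f"no dynamic variable {name!r} in scope")
    return stack[-1]


if __name__ == "__main__":
    with dynamic("$_", "outer"):
        with dynamic("$_", "inner"):
            print(lookup("$_"))
        print(lookup("$_"))
```

In this sketch a lookup is a table access rather than a walk up the call stack, at the price of extra work on every scope entry and exit that overrides a variable.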
lizmat | *signature* that would indicate "lift the $_ from the callers scope" | ||
jnthn | lizmat: Yes, CALLERS::<$_> not working any more would be the implication | ||
lizmat | that would take care of 100% of my usage of CALLERS::<$_> | ||
brrt | we.... could.. hack that together.... | 22:43 | |
jnthn | It did occur to me that if we could find an alternative solution we might be able to push such a change through | ||
If it's just a hack then it can be a symbol that we export that makes the compiler treat $_ as dynamic within the scope | |||
Then a use of a P5quotemeta or whatever would cause $_ to be dynamic | 22:44 | ||
lizmat | similar to the "_" prototype in Perl 5 | ||
jnthn | I feel kinda like I'm just not being creative/smart enough when I start pushing for a lang change because it makes optimization too hard... :P | ||
brrt is getting the feeling that the whirlpool is swirling a bit faster again | |||
Hmm | 22:45 | ||
Here's my take on it. | |||
lizmat | everything becomes fluid under enough pressure | ||
jnthn | But I've been pondering this one for a long time and I'm struggling on anything that seems like a good way to deal with it. | ||
brrt | Perl6 can't be 10 times slower on naive code, than perl5 is | 22:46 | |
If I see that nqp reaches pretty close in a 'good benchmark' to pre-compiled c, what with my naive JIT implementation and all, then I think that there's not *that* much more to expect there | 22:47 | ||
I mean, a factor 2 improvement would be nice, but not worldshattering | |||
and a factor 2 improvement is, I think, as good as we can be expected to do | 22:48 | ||
hell, a factor two improvement is substantial. And I'm not ruling out that the JIT can make a bigger impact on other benchmarks | 22:49 | ||
jnthn | I suspect it can do better than my hand-written bits of assembly :) | ||
(The expr JIT optimizing things some, that is :)) | |||
lizmat | jnthn: the idea of a special signature | ||
does that make sense? | 22:50 | |
jnthn | lizmat: It goes against the grain a bit much, I think | ||
lizmat | ah? syntax wise ? | ||
jnthn | No, just that we've not had signatures of callees determining caller semantics | 22:51 | |
Because multi-dispatch, and because it falls apart once you get any kind of later binding. | |||
22:51
fake_space_whale left
|
|||
lizmat | ok, I see | 22:51 | |
jnthn | my &a = $foo ?? &foo !! &bar; a() | 22:52 | |
brrt | especially when we can inline object accesses | ||
It's just a far cry from a factor 100, and we're going to have to look elsewhere for that | |||
anyway, /me will sleep | |||
jnthn | 'night, brrt o/ | ||
lizmat | 'gnight! | ||
jnthn | Thus why I suggested some kind of exportable pragma or some such | 22:53 | |
brrt | 'night | 22:54 | |
22:54
brrt left
|
|||
lizmat | in which scope? | 22:54 | |
jnthn | The scope that does the `use` | ||
That's the scope that'd be affected, I mean | |||
lizmat | and that would not be the quotemeta scope, right? | 22:55 | |
jnthn | The alternative is a more boring pragma and folks are expected to `use dynamic-var <$_>;` | ||
Well, my idea is the module providing P5quotemeta would do <insert whatever here> that causes the scope that does a `use` of that module to compile $_ as `is dynamic` | |||
lizmat | my $_ is dynamic # no new syntax needed? | ||
jnthn | So then you can CALLERS::<$_> as you do today | 22:56 | |
I was hoping to make a bit less boilerplate than that. | |||
lizmat | that scope and all scopes within it ? | ||
jnthn | Yes, just like any other pragma | ||
(scopes *lexically* within it, to be precise) | |||
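The mechanism under discussion can be sketched with an ordinary variable. This is an assumption-laden illustration, not the proposed pragma: `peek-caller`, `caller-scope` and `$x` are hypothetical names, and `$x` stands in for `$_` because how `$_` itself is declared is exactly what is being debated here.

```raku
# Hypothetical helper standing in for something like P5quotemeta.
sub peek-caller() {
    # The lookup through CALLERS:: can see $x only because the
    # caller declared it with `is dynamic`.
    CALLERS::<$x>
}

sub caller-scope() {
    my $x is dynamic = 'a.b';   # opt in to dynamic visibility
    peek-caller()
}

say caller-scope();
```

The idea floated above is that an exported pragma would apply the equivalent of `is dynamic` to `$_` in the scope doing the `use`, so only scopes that need this lookup would pay for it.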
lizmat | jnthn: FWIW, I think we need a mechanism for exportable pragmas more generally as well | 22:59 | |
23:00
Kazuto joined
|
|||
jnthn | Yes, true | 23:00 | |
Eventually that'd be solvable with macro/quasi stuff, but that's a bit further out | |||
23:00
p6bannerbot sets mode: +v Kazuto
23:01
Randy28 joined,
Randy28 left
|
|||
lizmat | we don't want to tell users of module X that they should also do a "use foo" pragma in that scope to make that module work properly | 23:02 | |
23:02
jim20 joined
|
|||
jnthn | Sure. So, provided we had some mechanism to make the user experience of your P5 modules the same as it is today under 6.d, would you be good with a change to make $_ not be `is dynamic` by default? | 23:02 | |
23:03
p6bannerbot sets mode: +v jim20,
avar left,
avar joined,
avar left,
avar joined,
p6bannerbot sets mode: +v avar,
Kazuto left,
p6bannerbot sets mode: +v avar
|
|||
lizmat | jnthn: as long as the module has a way to find out what $_ of the caller is | 23:04 | |
jnthn | Yes, that'd work with CALLERS::<$_> as today | ||
The only thing you'd need to do differently is export some pragma (and we may get a pragma export mechanism out of this) | 23:05 | ||
Heck, I'm willing to implement a pragma export mechanism in return for this :P | |||
lizmat | then sure: I mean, this is not about performance, this is about ease of migration | ||
jnthn | Yeah, and my feeling is that $_ was made dynamic by default precisely to aid such things | 23:06 | |
I'm not currently aware of a use of this feature outside of that | |||
And it wasn't until more recently that I realized just how much it costs us | |||
lizmat | ok | 23:07 | |
23:07
jim20 left
|
|||
jnthn | So if we can make it only cost something where it's used, that's nice. | 23:07 | |
I can think of some possible ways to try and deal with it in spesh without such a change but...the complexity (and so potential fragility) worries me | |||
23:08
Fieldy2 joined
23:09
p6bannerbot sets mode: +v Fieldy2
|
|||
jnthn | I'll see if I can draft something up tomorrow | 23:11 | |
23:13
Fieldy2 left
23:28
ManyRaptors16 joined
23:29
p6bannerbot sets mode: +v ManyRaptors16
|
|||
lizmat | jnthn: ok | 23:29 | |
23:29
ManyRaptors16 left
|
|||
lizmat | jnthn: meanwhile: should I take care of the other closed-over classes, specifically wrt iterators, before the release? | 23:29 | |
jnthn | lizmat: Yes, it seems fairly safe to do that :) | 23:31 | |
23:35
l4z4i joined
23:36
p6bannerbot sets mode: +v l4z4i
23:37
l4z4i left
|
|||
Geth | MoarVM/postrelease-opts: e105024646 | (Jonathan Worthington)++ | src/jit/x64/emit.dasc | Use defined symbol rather than magic number (brrt++ for suggesting) | 23:37 | |
timotimo | $/ being dynamic isn't a problem like $_ being dynamic because it's not the default parameter of blocks and such? | 23:53 | |
AlexDaniel | by the way, changelog draft for MoarVM is also a thing: github.com/MoarVM/MoarVM/wiki/ChangeLog-Draft | 23:57 | |
I just realized that moarvm also has 400+ commits since the last release | |||
23:58
avar left
|
|||
timotimo | .tell brrt adding up consecutive reciprocals, isn't that a very, very bad case for rationals? making $x num doesn't help because it'll still first do rational for 1/$x and then turn it into Num; using $x += 1/$_.Num is loads faster | 23:59 | |
yoleaux | timotimo: I'll pass your message to brrt. | ||
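A small sketch of the difference timotimo describes, using a much shorter range than the benchmark so that it runs instantly. The variable names are made up for illustration. Dividing two Ints produces a Rat, so the first loop does rational arithmetic throughout; converting the divisor with `.Num` keeps the second loop in floating point.

```raku
my $rat-sum = 0;
$rat-sum += 1 / $_ for 1..20;       # Int / Int yields a Rat each time

my $num-sum = 0e0;
$num-sum += 1 / $_.Num for 1..20;   # dividing by a Num yields a Num

say $rat-sum.^name;   # Rat
say $num-sum.^name;   # Num
```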
23:59
avar joined,
avar left,
avar joined,
p6bannerbot sets mode: +v avar
|