github.com/moarvm/moarvm | IRC logs at colabti.org/irclogger/irclogger_logs/moarvm
Set by AlexDaniel on 12 June 2018.
00:44 hoelzro joined 00:45 p6bannerbot sets mode: +v hoelzro 00:49 hoelzro left, annieslmaos joined 00:50 p6bannerbot sets mode: +v annieslmaos 00:56 annieslmaos left 01:13 avar left
timotimo jeez, my moar is a bit unstable at the moment 01:14
01:16 avar joined, avar left, avar joined, p6bannerbot sets mode: +v avar 01:17 p6bannerbot sets mode: +v avar
timotimo so, ovid had this benchmark, summing all 1 / n from 1 through 50_000_000 01:17
on his machine, perl5 did it with a "sub reciprocal" in like 9 seconds, inlining it brought it down to 2.5
on my machine, it takes about 7.5 seconds manually inlined and 9.3 seconds not inlined manually 01:18
anyway, turning the profiler on made it run in a minute instead of the 10 seconds, so ... that's fine, right? :D 01:25
MasterDuke manually inlined or not? 01:35
timotimo manually inlined 01:43
MasterDuke perf shows the top 2 functions are mp_mul_2d and mp_set_long at 10% and 4.25% 01:49
for 'my $s = 0e0; $s += 1/$_ for ^10_000_000+1; say $s'
01:50 Alex`21 joined 01:51 p6bannerbot sets mode: +v Alex`21
timotimo yeah, use 1e0 / $_ 01:51
that makes a major difference, unsurprisingly
01:52 Alex`21 left
MasterDuke ah, now it's get_num and MVM_gc_collect_free_nursery_uncopied at 5% and 4.3% (and total time was much shorter) 01:53
timotimo cool, something went wrong with the graphs on the gc page
MasterDuke ugh, why is it calling both Num and Real's infix:</> ? 01:55
timotimo perhaps it's calling one of them only once to do dispatch checking? 01:56
anyway, i'll go to bed now
less than 1ms on average per GC run, beautiful 01:57
MasterDuke nope, 10m for each
timotimo OK
MasterDuke Real's is just `a.Bridge / b.Bridge`
so there's also 10m calls to Int's and Num's Bridge 01:58
timotimo those get a tiny bit better when you write 1e0 / $_.Num 01:59
seeya!
my next step would have been to look at the spesh bytecode
probably some boxing/unboxing wins to be had later on
MasterDuke later... 02:00
02:22 Humbedooh3 joined 02:23 p6bannerbot sets mode: +v Humbedooh3 02:27 Humbedooh3 left 02:36 Peng5 joined, Peng5 left 02:41 mist23 joined, p6bannerbot sets mode: +v mist23 02:46 mist23 left 02:53 hoelzro joined, p6bannerbot sets mode: +v hoelzro 03:10 Kaiepi left 03:20 Guest60854 joined, p6bannerbot sets mode: +v Guest60854 03:21 Guest60854 left 03:37 belak2 joined, p6bannerbot sets mode: +v belak2 03:38 belak2 left 03:45 matze6 joined, p6bannerbot sets mode: +v matze6 03:47 jrslepak12 joined, jrslepak12 left 03:48 matze6 left 03:49 avar left 03:54 avar joined, avar left, avar joined, p6bannerbot sets mode: +v avar 03:55 p6bannerbot sets mode: +v avar 04:36 orb joined 04:37 orb left 05:13 linear8 joined 05:14 p6bannerbot sets mode: +v linear8, linear8 left 05:29 avar left 05:30 qassim0 joined, p6bannerbot sets mode: +v qassim0 05:31 qassim0 left 05:42 Erynnn8 joined 05:43 p6bannerbot sets mode: +v Erynnn8 05:45 avar joined, avar left, avar joined, p6bannerbot sets mode: +v avar 05:46 p6bannerbot sets mode: +v avar 05:50 Erynnn8 left 06:56 supercool21 joined 06:57 p6bannerbot sets mode: +v supercool21, BruceS6 joined 06:58 p6bannerbot sets mode: +v BruceS6 06:59 BruceS6 left 07:00 supercool21 left 07:38 macky joined 07:39 p6bannerbot sets mode: +v macky 07:41 macky left 07:48 irc-5225225 joined 07:49 p6bannerbot sets mode: +v irc-5225225 07:52 irc-5225225 left 08:16 deltam3 joined, p6bannerbot sets mode: +v deltam3 08:21 deltam3 left, Kaiepi joined 08:22 p6bannerbot sets mode: +v Kaiepi 08:24 phillid joined, phillid left 08:39 lizmat joined, p6bannerbot sets mode: +v lizmat 08:49 MasterDuke left 09:11 CurryWurst joined 09:12 p6bannerbot sets mode: +v CurryWurst 09:15 CurryWurst left 09:37 supercool20 joined 09:38 p6bannerbot sets mode: +v supercool20, supercool20 left 09:49 CoJaBo24 joined 09:50 p6bannerbot sets mode: +v CoJaBo24 09:53 OwenBarfield joined 09:54 p6bannerbot sets mode: +v OwenBarfield 09:55 CoJaBo24 left 09:57 OwenBarfield left 10:33 verm1n9 joined 
10:34 verm1n9 left 10:46 Kaypie joined 10:47 p6bannerbot sets mode: +v Kaypie 10:48 Kaiepi left 10:56 Guest738 joined 10:57 p6bannerbot sets mode: +v Guest738 10:58 Guest738 left 11:11 Kaypie left 11:14 Kaiepi joined 11:15 p6bannerbot sets mode: +v Kaiepi 11:29 ski_ joined 11:30 p6bannerbot sets mode: +v ski_ 11:31 ski_ left
timotimo the reciprocal function, which is just 'sub reciprocal(num $int) { 1e0 / $int }' is 570 bytes big, 410 of that from inlined frames 11:54
that's infix:</>, which was 404 bytes big before 11:55
the majority of it comes from the test against division-by-zero
jnthn I'm surprised / didn't statically inline...
Ohh...that.
timotimo at least that's my guess
jnthn *nod*
Note that under my postrelease-opts branch, it would subtract the size of the inline when considering whether reciprocal itself should be possible to inline 11:56
Where is the code involved here, and is there a Perl 5 and a Perl 6 version?
timotimo that came from a slide in ovid's talk about perl 5 and perl 6
he showed how the code in perl5 went from 9 seconds to 2.5 seconds by manually inlining the reciprocal function into the hot loop 11:57
jnthn Ah
timotimo m: my $s = 0e0; $s += 1/$_ for ^10_000_000+1; say $s
jnthn Yeah, we suffer from expensive calling too, which is why automated inlining saves us :)
timotimo this is basically the code, but in my case it was wordier
camelia 16.695311365857272
timotimo right before it enters the infix:</> inline it grabs three spesh slots that aren't visibly used, so another case of deopt forcing us to keep stuff around 11:58
jnthn It should tell which deopt points cause that though
timotimo one second 11:59
jnthn Also note that if there are inlines, there's at the moment always one register holding the code object kept around
Which we do so that we can reconstruct the callstack should there be an exception
Or so we can deopt from the inline
If we know an inlinee can never possibly throw or cause deopt in any situation whatsoever (like identity, which optimizes away entirely) then we can avoid that, but spesh slot loads are cheap so I didn't make that a priority :) 12:00
timotimo for one it's deopt=1, for the next it's also deopt=1, the next is -1, and it also sets the value from the last one into a register on the inside of the inline, but that doesn't have a deopt printed in the facts list
ah
that's the one you were referring to just now
holding the code for uninline/stack reconstruct
so deopt=-1 refers to "kept around because inline" 12:01
BB 3 is the one that has the getspeshslots; it starts FH Start (3), Logged, Ins Deopt One idx=0, then the two spesh slots, then INS deopt one idx=1, INS deopt one idx=2, getspeshslot again 12:02
12:02 MasterDuke joined, p6bannerbot sets mode: +v MasterDuke
timotimo it doesn't seem like we should keep deopts around when they're on a getspeshslot, though? 12:02
12:03 MasterDuke left, MasterDuke joined, herbert.freenode.net sets mode: +v MasterDuke, p6bannerbot sets mode: +v MasterDuke, danielhuman joined
timotimo or is that for "all instructions up to the next deopt annotation"? 12:03
12:03 p6bannerbot sets mode: +v danielhuman 12:04 danielhuman left 12:05 drakythe joined, drakythe left
jnthn -1 means "unconditionally retained" 12:10
Also...you need to look at the original code to see what the deopt point was originally one
*on
Because the annotations shift during instruction deletions etc.
timotimo OK 12:11
jnthn It's keeping a *lot* less for deopt these days, though, and when I've looked carefully at a few cases where I thought it should not be, then - aside from the case I already mentioned where an inline could never deopt or throw - it's turned out to have been correct.
timotimo the first was on decont from getlexstatic of infix:</>, next one for prepargs, then a One and an All on the invoke_o
i'm not sure how exactly to look at the deopt situation 12:12
a tool that puts original and optimized side-by-side and matches up parts automatically would surely be super nice
jnthn *nod*
Yeah, though that will get harder with time too
(When we add code motion to move stuff out of loops for example) 12:13
timotimo ah, indeed
as long as we keep re-using the actual ins struct, we can totally output the addresses in the msgpack version of the spesh log and match those up 12:14
jnthn But yeah, stuff is increasingly being lowered to the point where it's hard to look at the optimized output and know what it maybe used to be :)
timotimo if we want to, we can be extra sneaky and output the starting addresses of the spesh alloc blocks and figure out the order of allocations of things :)
jnthn This is a good sign overall :) 12:15
timotimo it is!
jnthn lunch, bb :) 12:17
*bbl
12:19 d__b joined 12:20 d__b left, MasterDuke left 12:27 MasterDuke joined, p6bannerbot sets mode: +v MasterDuke 12:29 MasterDuke left, MasterDuke joined, herbert.freenode.net sets mode: +v MasterDuke, p6bannerbot sets mode: +v MasterDuke
timotimo kind of looks like the failure creation and returning is keeping the decontrv from being inlined, i.e. it stays as an sp_speshresolve in the reciprocal code body 12:31
the profiler may want to learn about speshresolve in particular 12:33
any objections to giving the spesh plugin subs names? that way they'll show up clearly in the call graph and routine overview 12:35
oh my, i just now see that jitting wasn't even successful for the reciprocal sub 12:39
that'll be interesting
ah, param_rp_n bails it
8.9s instead of 24.5s when switching reciprocal's parameter from num $int (haha) to just $int 12:42
MasterDuke whoops 12:44
12:45 acerbic joined 12:46 p6bannerbot sets mode: +v acerbic 12:47 casdr8 joined, p6bannerbot sets mode: +v casdr8
jnthn Yeah, native/non-native boundary cases can go pretty badly at the moment. 12:48
12:48 casdr8 left 12:53 acerbic left, ThiefMaster20 joined
timotimo wow, haha 12:54
12:54 p6bannerbot sets mode: +v ThiefMaster20
timotimo that seems weird 12:55
what inline am i looking at here …
12:56 ThiefMaster20 left
timotimo ooh it's pull-one 12:56
now it makes total sense
mhhh, let's put spesh comments on p6obind_* and friends that tell us what the attribute's name was 13:00
13:08 Selfsigned joined, p6bannerbot sets mode: +v Selfsigned 13:10 Selfsigned left
MasterDuke huh. changing `for ^10_000_000+1 { ... }` to `for 1..10_000_000 { ... }` is a bit faster. and the 10m calls to pull-one in Rakudo::Iterator are completely gone from the profile 13:21
13:22 Sitri joined 13:23 p6bannerbot sets mode: +v Sitri 13:24 Sitri left 13:36 Shrooms18 joined 13:37 p6bannerbot sets mode: +v Shrooms18 13:39 Shrooms18 left
timotimo that means the range optimization takes hold in that case, right? 13:40
my benchmark has a sub sum_reciprocals_to($int) and for 1..$int
13:54 Awesomecase joined 13:55 p6bannerbot sets mode: +v Awesomecase
timotimo time perl6 -e 'my num @parts = 1e0 / ++$ xx 5_000_000; say @parts.sum' 13:59
4.56user 0.23system 0:04.23elapsed 113%CPU (0avgtext+0avgdata 376736maxresident)k
if you have loads of ram, this is also a possibility %)
14:00 FuzzySockets joined, Awesomecase left, p6bannerbot sets mode: +v FuzzySockets 14:04 FuzzySockets left
MasterDuke m: for ^1_000_00+1 { Nil for ^100+1 }; say now - INIT now 14:20
camelia 1.9058206
MasterDuke m: for 1..1_000_00 { Nil for 1..100 }; say now - INIT now
camelia 0.6517606
MasterDuke could the first be optimized into the second?
timotimo /* getattr_o of '$!do' in Code of a Block */ 14:21
[Annotation: Logged (bytecode offset 72)]
jnthn Umm....I think so, but note that the precedence really is (^100) + 1
timotimo sp_p6ogetvc_o r10(15), r1(2), liti16(8), sslot(3)
that's the intent here
to go from 1 to 100 instead of 0 to 99
jnthn But in theory it can constant fold, I think
timotimo i'd assume ^(100 + 1) already constfolds
jnthn: would you like to see that kind of comment in the spesh log? 14:22
jnthn timotimo: Yeah, though the off indentation will probably drive me nuts :P :P
timotimo that was intentional, but can just as easily be adjusted
jnthn ah :)
timotimo oh, i see that we're only doing getattr_i lowering if the bits are 64; you think that's something worthwhile to expand to other sizes? 14:24
(also, no check for signed vs unsigned)
MasterDuke ah. `for ^10` after optimization is a while, but `for ^10+1` is a p6forstmt and a Range 14:27
timotimo yup 14:28
the optimization looks directly for a range operator as first child
but here it's a + operator instead
MasterDuke so we could add a check if it's + a constant, just add that constant to the initial value and condition of the while? 14:30
jnthn timotimo: Yeah, that can be extended to the other sizes, they're just less common so less to win 14:31
timotimo right
MasterDuke: that's right. check for the range in both the first or second argument and maybe also support - and *? 14:32
MasterDuke and / ? 14:33
timotimo perhaps, but that's kind of likely to get us into Rats and then we no longer optimize the thing 14:34
MasterDuke true 14:35
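A rough C sketch of the rewrite discussed above, assuming the optimizer sees a range with a constant added to it: instead of building a Range and iterating it, the constant is folded into the loop's start value and end condition. `IntRange`, `range_upto`, `range_add_const` and `sum_range` are hypothetical names for illustration only, not Rakudo or MoarVM code.

```c
#include <stdint.h>

/* Hypothetical half-open integer range [from, to_exclusive). */
typedef struct {
    int64_t from;
    int64_t to_exclusive;
} IntRange;

/* ^n: the range 0 .. n-1. */
IntRange range_upto(int64_t n) {
    IntRange r = { 0, n };
    return r;
}

/* Fold "range + constant" by shifting both bounds by the constant. */
IntRange range_add_const(IntRange r, int64_t k) {
    IntRange shifted = { r.from + k, r.to_exclusive + k };
    return shifted;
}

/* The plain counting loop that remains after folding; the body just sums. */
int64_t sum_range(IntRange r) {
    int64_t total = 0;
    for (int64_t i = r.from; i < r.to_exclusive; i++)
        total += i;
    return total;
}
```

Here `range_add_const(range_upto(100), 1)` covers 1 through 100, the same values as `1..100`, without any per-element addition inside the loop.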
timotimo gist.github.com/timo/92101baccc059...d2f48af1d8 - looks pretty good i'd say 14:56
+/- indentation of the comments 14:57
indentation is changed now 14:58
jnthn m: say 0.963 / 1.323 15:03
camelia 0.727891
jnthn That's for `my $a = 0; for ^10_000_000 { $a = $a + 2 }; say $a`
timotimo nice! 15:04
jnthn Second number is after I add lowering and JIT of add_I, with doing the calculation directly in the JIT output if the inputs are smallint
timotimo *nice*
jnthn We don't have to range check the result in assembly either, we just do it in a 32-bit register and jump on overflow :) 15:05
timotimo m: say (2 ** 32 - 1) - 10_000_000
camelia 4284967295
timotimo m: say (2 ** 32 - 1)
camelia 4294967295
timotimo ah, that fits very comfortably into 32bit, too
at some point i really should develop an intuition for these literal values 15:06
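The fast path jnthn describes can be sketched in portable C. This is only an analogue under stated assumptions: the real JIT output does the add in a 32-bit register and jumps on the CPU's overflow flag, whereas this sketch widens to 64 bits and range-checks. `try_add_smallint` is a hypothetical name, and treating the smallint as a signed 32-bit value is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fast path: add two smallints. Returns true and stores the sum when it
 * fits in 32 bits; returns false (leaving *out untouched) when the caller
 * should take the slow big-integer path instead. */
bool try_add_smallint(int32_t a, int32_t b, int32_t *out) {
    int64_t wide = (int64_t)a + (int64_t)b;   /* cannot overflow in 64 bits */
    if (wide < INT32_MIN || wide > INT32_MAX)
        return false;
    *out = (int32_t)wide;
    return true;
}
```

The benchmark loop ends at 20_000_000, which stays on the fast path even with a signed 32-bit limit of 2147483647.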
jnthn The allocation of the result is fastcreate'd too
Which no doubt helps
It's another 7% off the utf8 million line reading benchmark that adds up the number of chars too :) 15:07
timotimo that sounds very good
jnthn Yeah. Will clean up the patch a bit later and push. I stubbed in sub_I and mul_I lowering too, but still need to fill them out
timotimo how do you feel about annotating lots and lots of getspeshslot ins's with comments saying what it's for? 15:08
jnthn Could we just put that on the same line but after the instruction?
timotimo there's surely some point where adding more comments is just extra noise
very possible; what if there's multiple comments on one instruction?
jnthn There is, but this one could save a lot of cross-referencing
Oh, I meant that we could do this as a special case in the dumper for sp_getspeshslot :) 15:09
timotimo oh
jnthn But yeah, maybe we could do it generally for comments too
timotimo yeah, could do that
jnthn Comment on the line when it's just one comment
Comments before when multiple
Like #= vs #| in Perl 6 ;)
timotimo would it be fine to put all comments after all annotations in that case?
jnthn Yeah 15:10
timotimo then i don't have to do a pre-scan for comments
is /* ... */ fine with you? or perhaps use # instead? ;)
jnthn I guess # is 3 less characters of clutter :) 15:11
Even more with whitespace not considered 15:12
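The placement rule agreed on above (a single comment goes on the instruction's line, several comments go on their own lines before it, all introduced by `#`) could look roughly like this. It is a standalone sketch, not the actual spesh dumper; `dump_ins` and `append_piece` are hypothetical names.

```c
#include <stddef.h>
#include <stdio.h>

/* Append prefix + text + suffix to buf, tracking how much has been used. */
static size_t append_piece(char *buf, size_t size, size_t used,
                           const char *prefix, const char *text,
                           const char *suffix) {
    if (used >= size)
        return used;                 /* no room left; output is truncated */
    int n = snprintf(buf + used, size - used, "%s%s%s", prefix, text, suffix);
    return n < 0 ? used : used + (size_t)n;
}

/* One comment: "ins # comment". Several: one "# comment" line each,
 * followed by the instruction. None: just the instruction. */
void dump_ins(char *buf, size_t size, const char *ins,
              const char **comments, size_t count) {
    size_t used = 0;
    if (size > 0)
        buf[0] = '\0';
    if (count == 1) {
        used = append_piece(buf, size, used, "", ins, "");
        append_piece(buf, size, used, " # ", comments[0], "\n");
        return;
    }
    for (size_t i = 0; i < count; i++)
        used = append_piece(buf, size, used, "# ", comments[i], "\n");
    append_piece(buf, size, used, "", ins, "\n");
}
```

Because multi-comment output is written before the instruction, the dumper needs to know the comment count up front, which is the pre-scan timotimo mentions wanting to avoid.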
Time for a break 15:14
timotimo will update the gist soon
there it is 15:16
15:18 Vorpal26 joined 15:19 Vorpal26 left
timotimo not bad. i accidentally left /* */ for more-than-one, but somehow i like it, too. i'll turn it into # soon, though 15:19
i wonder if i should go to the trouble of looking up the attribute name for the unboxes and output that in a spesh comment, too 15:33
probably not quite as useful, though if you can just search for an attribute name and find every actual use of it in the spesh log, that could be good, too
sp_p6oget_i r8(3), r0(2), liti16(8) # getattr_i of '$!i' in <anon|19> 15:35
sp_fastbox_bi_ic r6(3), liti16(40), sslot(5), liti16(32), r8(3), liti16(1) # box_i into a Int
and also:
sp_fastcreate r9(2), liti16(40), sslot(10) # box_n into a Num
sp_bind_n r9(2), liti16(32), r8(2)
the comment there could go either on the fastcreate or on the bind, don't really have a preference there.
15:43 reportable6 joined 15:44 p6bannerbot sets mode: +v reportable6 15:46 ZofBot left, ZofBot joined 15:47 p6bannerbot sets mode: +v ZofBot 15:58 lizmat left 16:03 fake_space_whale joined 16:04 p6bannerbot sets mode: +v fake_space_whale 16:12 zakharyas joined 16:13 p6bannerbot sets mode: +v zakharyas 16:19 nullrouted joined 16:20 p6bannerbot sets mode: +v nullrouted 16:22 nullrouted left 16:32 Fleet21 joined 16:33 Fleet21 left 17:05 zakharyas left 17:06 zakharyas joined 17:07 p6bannerbot sets mode: +v zakharyas 17:10 Kaiepi left 17:23 Ambroisie joined 17:24 p6bannerbot sets mode: +v Ambroisie 17:26 Ambroisie left
jnthn timotimo: On the fastcreate is probably fair enough 17:29
17:30 MikeoftheEast7 joined 17:31 p6bannerbot sets mode: +v MikeoftheEast7 17:34 MikeoftheEast7 left 17:45 Erynnn19 joined 17:46 zakharyas left, Erynnn19 left 17:55 acronix14 joined, p6bannerbot sets mode: +v acronix14 18:00 acronix14 left 18:09 BrianBlaze21 joined 18:10 p6bannerbot sets mode: +v BrianBlaze21 18:14 BrianBlaze21 left 18:15 metax joined 18:16 p6bannerbot sets mode: +v metax 18:17 TingPing4 joined, p6bannerbot sets mode: +v TingPing4 18:18 TingPing4 left 18:19 metax left 18:34 zakharyas joined 18:35 p6bannerbot sets mode: +v zakharyas
timotimo OK, i need to sort out this mess of commits i've spread out between spesh_comments and postrelease-opts 18:38
18:48 Alex`16 joined, Alex`16 left
Geth MoarVM/spesh_comments: 6 commits pushed by (Timo Paulssen)++ 19:03
timotimo i think this branch is clean to be merged
jnthn After release ;) 19:05
Geth MoarVM: jstuder-gh++ created pull request #942:
Improve exception msg for slice op on VMArray
timotimo i meant into the postrelease-opts branch, which i rebased it onto :) 19:06
Geth MoarVM/postrelease-opts: 477dc4cf4c | (Jonathan Worthington)++ | 13 files
Lower add_I, sub_I, and mul_I where possible

When the input and output types are consistent (which should be the overwhelmingly common case) we JIT-compile these into code that tries to do the operation directly if we are dealing with two smallint input values, and provided it doesn't overflow stores it back. If either of those two conditions isn't met, it falls back to a slow path. Since we ... (8 more lines)
19:08
jnthn timotimo: oh, that's OK :)
walk & 19:10
timotimo i'll merge :) 19:11
19:13 domidumont joined
Geth MoarVM/postrelease-opts: 7 commits pushed by (Timo Paulssen)++ 19:13
19:13 p6bannerbot sets mode: +v domidumont 19:19 domidumont left 19:20 domidumont joined 19:21 p6bannerbot sets mode: +v domidumont, JSharp16 joined 19:22 p6bannerbot sets mode: +v JSharp16 19:23 JSharp16 left 19:24 domidumont left 19:47 Kaiepi joined 19:48 p6bannerbot sets mode: +v Kaiepi 19:52 alphor20 joined 19:53 p6bannerbot sets mode: +v alphor20 19:56 alphor20 left 20:05 zakharyas left 20:08 JustTheDoctor2 joined, p6bannerbot sets mode: +v JustTheDoctor2 20:13 JustTheDoctor2 left 20:24 zakharyas joined 20:25 p6bannerbot sets mode: +v zakharyas 20:31 zakharyas left 20:35 deedra13 joined 20:36 p6bannerbot sets mode: +v deedra13, deedra13 left, Soni22 joined, p6bannerbot sets mode: +v Soni22 20:38 Soni22 left 20:40 chaoscon14 joined 20:41 p6bannerbot sets mode: +v chaoscon14 20:46 zakharyas joined 20:47 p6bannerbot sets mode: +v zakharyas 20:48 chaoscon14 left
timotimo goto BB(224) # throwcatdyn of category 16 for handler 9 20:49
that could be helpful and interesting?
the reason why a frame couldn't be inlined can also go in a spesh comment on one of the inliner's instructions 21:00
21:04 zakharyas left 21:05 zakharyas joined, p6bannerbot sets mode: +v zakharyas 21:07 bungle0 joined, p6bannerbot sets mode: +v bungle0
timotimo cool. 21:08
21:10 lizmat joined 21:11 p6bannerbot sets mode: +v lizmat 21:13 bungle0 left
timotimo nice. 21:18
MasterDuke ?
timotimo now it also puts a comment "inline of 'foo' (123) candidate 99" on the first instruction of an inline
21:19 catfuneral joined
MasterDuke cool. you also added the reason things couldn't be inlined? 21:19
timotimo yup!
21:19 catfuneral left
timotimo sp_fastinvoke_o r5(23), r45(0), liti16(0) # could not inline 'symbol' (157) candidate 0: bytecode is too large to inline 21:19
MasterDuke does that remove the need for MVM_SPESH_INLINE_LOG? 21:20
timotimo the inline log is much denser and maybe better for some use cases
MasterDuke ah
timotimo sp_getspeshslot r33(3), sslot(9) # method lookup of '!sort_dispatchees_internal' on a Method 21:21
^- also nice, i think
Geth MoarVM/postrelease-opts: 4efe1b3b2e | (Timo Paulssen)++ | src/spesh/optimize.c
comment for result of optimize_method_lookup

will put a "method lookup of '$name'" after the resulting getspeshslot instruction
21:32
MoarVM/postrelease-opts: a091eb6cc8 | (Timo Paulssen)++ | src/spesh/optimize.c
comment on inline success/failure

on success: puts the name, cuuid, and spesh candidate id on the first instruction of the inlined code
  (potentially after the inlined code or not into
the spesh graph at all if it was reduced to nothingness?)
on failure: the same as above, plus the failure reason.
MoarVM/postrelease-opts: 74b219bc2f | (Timo Paulssen)++ | src/spesh/optimize.c
comment on throwcat* with category and handler id

if it's optimized to a goto
timotimo jnthn: do you think any of the changes made inside inline.c deserve a comment added to the spesh log? 21:45
jnthn Hm, like "rewritten return" or "rewritten arg" or something? 21:49
Maybe
timotimo hm, i guess "arg 0", "arg 1", "named arg foo" could be interesting; do we even still have the info about named args at that point? 21:50
jnthn No
Not easily
in args.c we do the transform
But by the time we inline it we've formed and re-parsed bytecode
bbs
timotimo OK, so perhaps the spesh candidate is also gone already 21:51
21:54 brrt joined, p6bannerbot sets mode: +v brrt
brrt jnthn++ 21:56
timotimo oh hey brrt!
how often do you use the graphviz stuff in the jit log? i'm a little annoyed i have to constantly skip past it :D
brrt nine: re: devbranch, releasebranch, master - I'm also in favor of having a release-branch+master, mostly so we can continue doing what we always do whenever the release process is underway 21:57
timotimo: when debugging
i find it invaluable
timotimo OK, i'll just have to come up with something :)
brrt hm
timotimo maybe i'll just keep using grep for Constructing, Entering, BAIL
brrt I was actually thinking of killing the JIT log entirely
and folding it into the spesh log 21:58
timotimo oh, spesh logs are already often in the hundreds of megabytes %)
brrt that way, we get spesh info + JIT info in the same place
timotimo that's true, it'll be in the right spot immediately
brrt 'disk is cheap'
:-)
21:58 zakharyas left
timotimo reading a million lines with perl6 is getting faster and faster, too ;) 21:59
brrt significantly, even 22:00
timotimo is that so?
well, add to it a check or two, like "contains" or "starts-with" and suddenly it's much more expensive :) 22:01
brrt :-(
timotimo i don't have actual numbers to back this up
brrt re: the reciprocal benchmark
on my machine, the naive perl5 version, 0.6s
timotimo whoa
brrt perl6 runs the same code, 26s
timotimo what kind of potato does ovid have? :)
it already gets lots cheaper if you remove "return" ;) 22:02
brrt well, what was the number of iterations of his version?
timotimo 50_000_000
brrt ah
i have 10_000_000
let me try that out too...
2.6s 22:03
for perl5
timotimo that is the one without manual inlining?
brrt that is the one with manual inlining 22:04
timotimo OK
the reciprocal speshlog is only 165k lines 22:05
so the 0.4 seconds it takes to count all lines starting with "Total" isn't saying so much 22:06
brrt you know, ideal world, we'd have both a single textual debug log, potentially with a bunch of flags, and a structured way of getting the same from the debug server 22:07
gist.github.com/bdw/42819001c1a083...818acf99b6 22:18
anyway
if i write it in nqp, I get 0.636s of runtime 22:19
if i use non-native objects, this increases to 18s 22:20
so. our boxing and unboxing is quite costly
fwiw, the same code in C, on my machine, runs in 0.22s 22:24
so
the long story very short
MoarVM is within a factor of three of C, including asynchronous specialization and jit compilation, when using native types 22:25
perl6 is a factor of 1000 off
the lesson here is that there is about 70% gain to be expected, at most, from better JIT compilation 22:27
java is 0.4s 22:32
so better than MoarVM, but not by all that much
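For reference, the native-typed form of the benchmark might look like this in C. It is a sketch only: brrt's actual C source is not shown in the log, and `reciprocal` and `sum_reciprocals_to` simply reuse names mentioned earlier in the discussion.

```c
#include <stdint.h>

/* The small helper that the compiler (or the programmer) may inline. */
static double reciprocal(double n) {
    return 1.0 / n;
}

/* Sum 1/1 + 1/2 + ... + 1/limit using unboxed doubles throughout. */
double sum_reciprocals_to(int64_t limit) {
    double sum = 0.0;
    for (int64_t i = 1; i <= limit; i++)
        sum += reciprocal((double)i);
    return sum;
}
```

Nothing in this loop allocates, which is the contrast with the boxed version being measured.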
lizmat brrt: isn't that a factor of 100 compared to C ?
brrt lizmat: i'm talking about this one specifically: gist.github.com/bdw/42819001c1a083...procal-nqp
that reliably runs in 0.6s on my machine 22:33
lizmat m: for ^1_000_000 { }; say now - INIT now 22:34
camelia 0.0703057
lizmat m: for ^10_000_000 { }; say now - INIT now
camelia 0.50295888
lizmat m: for ^10_000_000 -> int $_ { }; say now - INIT now
camelia 0.1287333
lizmat that's boxing for you
brrt :-( 22:35
jnthn It's worse than that.
lizmat ?
jnthn The good news is that this one is quite a bit faster in postrelease-opts
Because boxing got a good bit faster
brrt :-) 22:36
jnthn In that branch if you write the equivalent code in Python we're faster, and if you write the equivalent code in Ruby we're only a little slower. Perl 5 still beats us, but within a factor of 2, for the for ^10_000_000 { } case
lizmat jnthn: also, I was thinking that -> int $_ could be the default signature in 6.d ?
jnthn But what's *really* annoying about this case is that $_ is dynamic 22:37
brrt is not seeing any reasonable code in which that'd break, so is not against it
jnthn If it were just a boring old lexical it'd already be lowered
And then the box would be thrown out
But because $_ is declared `is dynamic` then we can't do that
In fact, thanks to anything anywhere any number of levels deep being able to do CALLER::CALLER::blah, we can't do much 22:38
I've been pondering how to deal with this for the last month
And it's really icky
brrt CALLER makes many things impossible
jnthn In hot loops if we inline everything we can sorta do away with it, if we learn to analyze lexicals better
brrt or well, much harder than they ought to be 22:39
jnthn It's not that bad, because most things aren't dynamic
The problem is that $_ *is* and tons of things use it
As in, lots of common idioms
I don't think `int $_` helps, because 1) it's probably an inconsistency and 2) it doesn't do anything for the "we consider $_ dynamic" case 22:40
lizmat
.oO( torturing the core developers )
jnthn I'm very tempted to submit an RFC for $_ to no longer be dynamic
But I don't think lizmat would receive this too well ;)
As the heaviest user I'm aware of of this feature :-) 22:41
lizmat well, if that would mean that CALLERS::<$_> wouldn't work anymore
well, actually, if there could be a *si 22:42
brrt btw, what's stopping us from implementing dynamics in a single (thread)global table, and pushing, popping them on overrides
(which is how i understand them to be implemented in perl5)
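brrt's suggestion can be sketched as a per-variable stack of bindings. This is purely illustrative and assumes one such table per thread; it is not how MoarVM implements dynamic variables, and `DynVar`, `dyn_push`, `dyn_pop` and `dyn_get` are hypothetical names.

```c
#include <stddef.h>

#define DYN_MAX_DEPTH 64   /* arbitrary limit for this sketch */

/* One dynamic variable: a stack of bindings, innermost override on top. */
typedef struct {
    const void *bindings[DYN_MAX_DEPTH];
    size_t depth;
} DynVar;

/* Entering a scope that overrides the variable: push the new value.
 * Returns 0 on success, -1 if the stack is full. */
int dyn_push(DynVar *var, const void *value) {
    if (var->depth == DYN_MAX_DEPTH)
        return -1;
    var->bindings[var->depth++] = value;
    return 0;
}

/* Leaving that scope: drop the override, exposing the outer value again. */
void dyn_pop(DynVar *var) {
    if (var->depth > 0)
        var->depth--;
}

/* Lookup reads the top of the stack, or NULL when nothing is bound. */
const void *dyn_get(const DynVar *var) {
    return var->depth ? var->bindings[var->depth - 1] : NULL;
}
```

In this scheme a lookup is constant-time, while every scope that rebinds the variable pays for a push and a pop.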
lizmat *signature* that would indicate "lift the $_ from the callers scope"
jnthn lizmat: Yes, CALLERS::<$_> not working any more would be the implication
lizmat that would take care of 100% of my usage of CALLERS::<$_>
brrt we.... could.. hack that together.... 22:43
jnthn It did occur to me that if we could find an alternative solution we might be able to push such a change through
If it's just a hack then it can be a symbol that we export that makes the compiler treat $_ as dynamic within the scope
Then a use of a P5quotemeta or whatever would cause $_ to be dynamic 22:44
lizmat similar to the "_" prototype in Perl 5
jnthn I feel kinda like I'm just not being creative/smart enough when I start pushing for a lang change because it makes optimization too hard... :P
brrt is getting the feeling that the whirlpool is swirling a bit faster again
Hmm 22:45
Here's my take on it.
lizmat everything becomes fluid under enough pressure
jnthn But I've been pondering this one for a long time and I'm struggling on anything that seems like a good way to deal with it.
brrt Perl6 can't be 10 times slower on naive code, than perl5 is 22:46
If I see that nqp reaches pretty close in a 'good benchmark' to pre-compiled c, what with my naive JIT implementation and all, then I think that there's not *that* much more to expect there 22:47
I mean, a factor 2 improvement would be nice, but not worldshattering
and a factor 2 improvement is, I think, as good as we can be expected to do 22:48
hell, a factor two improvement is substantial. And I'm not ruling out that the JIT can make a bigger impact on other benchmarks 22:49
jnthn I suspect it can do better than my hand-written bits of assembly :)
(The expr JIT optimizing things some, that is :))
lizmat jnthn: the idea of a special signature
does that make sense ? 22:50
jnthn lizmat: It goes against the grain a bit much, I think
lizmat ah? syntax wise ?
jnthn No, just that we've not had signatures of callees determining caller semantics 22:51
Because multi-dispatch, and because it falls apart once you get any kind of later binding.
22:51 fake_space_whale left
lizmat ok, I see 22:51
jnthn my &a = $foo ?? &foo !! &bar; a() 22:52
brrt especially when we can inline object accesses
It's just a far cry from a factor 100, and we're going to have to look elsewhere for that
anyway, /me will sleep
jnthn 'night, brrt o/
lizmat 'gnight!
jnthn Thus why I suggested some kind of exportable pragma or some such 22:53
brrt 'night 22:54
22:54 brrt left
lizmat in which scope? 22:54
jnthn The scope that does the `use`
That's the scope that'd be affected, I mean
lizmat and that would not be the quotemeta scope, right? 22:55
jnthn The alternative is a more boring pragma and folks are expected to `use dynamic-var <$_>;`
Well, my idea is the module providing P5quotemeta would do <insert whatever here> that causes the scope that does a `use` of that module to compile $_ as `is dynamic`
lizmat my $_ is dynamic # no new syntax needed?
jnthn So then you can CALLERS::<$_> as you do today 22:56
I was hoping to make a bit less boilerplate than that.
lizmat that scope and all scopes within it ?
jnthn Yes, just like any other pragma
(scopes *lexically* within it, to be precise)
lizmat jnthn: FWIW, I think we need a mechanism for exportable pragmas more generally as well 22:59
23:00 Kazuto joined
jnthn Yes, true 23:00
Eventually that'd be solvable with macro/quasi stuff, but that's a bit further out
23:00 p6bannerbot sets mode: +v Kazuto 23:01 Randy28 joined, Randy28 left
lizmat we don't want to tell users of module X that they should also do a "use foo" pragma in that scope to make that module work properly 23:02
23:02 jim20 joined
jnthn Sure. So, provided we had some mechanism to make the user experience of your P5 modules the same as it is today under 6.d, would you be good with a change to make $_ not be `is dynamic` by default? 23:02
23:03 p6bannerbot sets mode: +v jim20, avar left, avar joined, avar left, avar joined, p6bannerbot sets mode: +v avar, Kazuto left, p6bannerbot sets mode: +v avar
lizmat jnthn: as long as the module has a way to find out what $_ of the caller is 23:04
jnthn Yes, that'd work with CALLERS::<$_> as today
The only thing you'd need to do differently is export some pragma (and we may get a pragma export mechanism out of this) 23:05
Heck, I'm willing to implement a pragma export mechanism in return for this :P
lizmat then sure: I mean, this is not about performance, this is about ease of migration
jnthn Yeah, and my feeling is that $_ was made dynamic by default precisely to aid such things 23:06
I'm not currently aware of a use of this feature outside of that
And it wasn't until more recently that I realized just how much it costs us
lizmat ok 23:07
23:07 jim20 left
jnthn So if we can make it only cost something where it's used, that's nice. 23:07
I can think of some possible ways to try and deal with it in spesh without such a change but...the complexity (and so potential fragility) worries me
23:08 Fieldy2 joined 23:09 p6bannerbot sets mode: +v Fieldy2
jnthn I'll see if I can draft something up tomorrow 23:11
23:13 Fieldy2 left 23:28 ManyRaptors16 joined 23:29 p6bannerbot sets mode: +v ManyRaptors16
lizmat jnthn: ok 23:29
23:29 ManyRaptors16 left
lizmat jnthn: meanwhile: should I take care of the other closed over classes, specifically wrt iterators before the release ? 23:29
jnthn lizmat: Yes, it seems fairly safe to do that :) 23:31
23:35 l4z4i joined 23:36 p6bannerbot sets mode: +v l4z4i 23:37 l4z4i left
Geth MoarVM/postrelease-opts: e105024646 | (Jonathan Worthington)++ | src/jit/x64/emit.dasc
Use defined symbol rather than magic number

  brrt++ for suggesting
23:37
timotimo $/ being dynamic isn't a problem like $_ being dynamic because it's not the default parameter of blocks and such? 23:53
AlexDaniel by the way, changelog draft for MoarVM is also a thing: github.com/MoarVM/MoarVM/wiki/ChangeLog-Draft 23:57
I just realized that moarvm also has 400+ commits from the last release
23:58 avar left
timotimo .tell brrt adding up consecutive reciprocals, isn't that a very, very bad case for rationals? making $x num doesn't help because it'll still first do rational for 1/$x and then turn it into Num; using $x += 1/$_.Num is loads faster 23:59
yoleaux timotimo: I'll pass your message to brrt.
23:59 avar joined, avar left, avar joined, p6bannerbot sets mode: +v avar