»ö« Welcome to Perl 6! | perl6.org/ | evalbot usage: 'perl6: say 3;' or rakudo:, niecza:, std:, or /msg camelia perl6: ... | irclog: irc.perl6.org | UTF-8 is our friend!
Set by sorear on 25 June 2013.
timotimo t.h8.lv/p6bench/2014-04-08-spesh.html - got some graphs for y'all 00:06
did i really build these with spesh? :\
it seems like i did; it must be that the first few benchmarks hardly benefit from it 00:10
but forest fire already got a few percent faster
timotimo interesting to see: the apparent regression of for loops 00:11
from 2014.02 to 2014.03
timotimo oh well. 00:15
timotimo it's still pretty discouraging to see the performance difference between rakudo and nqp 00:15
timotimo oh, huh 00:21
didn't i have something i wanted to offer as low hanging fruit for the weekly this week?
still quite a lot of work to do in the optimization department then ... 00:24
that just means i'll have sufficient amounts of stuff to do in the coming time
jnthn timotimo: Remember that spesh only kicks in after things are *invoked* a bunch of times. If the main body of the program is just a loop, there's not a lot for it to do, I'd guess... 06:45
lizmat it's windy today 10:00
dalek ast: baf5685 | (Elizabeth Mattijsen)++ | S17-concurrency/lock.t:
Add a Thread/lock stress test

Which so far fails all the time. Not sure whether the test is ok.
10:02
jnthn lizmat: fails all the time on just Moar? 10:29
jnthn lizmat: If it fails on JVM too it's probably the test. 10:29
dalek osystem: cb659e0 | dagurval++ | META.list:
Added WebService::Justcoin
10:35
lizmat jnthn: the test fails on the JVM as well, but generally later, as in up to 12 times (rather than the first time already in the moarvm case) 10:59
lizmat I'm not completely sure the code is ok, but then again, Lock.condition and the ConditionVariable class are not specced 11:00
so spec-wise (as opposed to technically), I'm not sure what they should do 11:01
commuting& 11:02
timotimo o/ 11:14
jnthn: that's a good point that i hadn't considered! 11:16
jnthn lizmat: They ain't spec'd yet, but they have the semantics you'd expect condvars to have anywhere. 11:20
(provided you have an expectation ;))
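[editor's note: jnthn's point — that these condvars behave the way condition variables behave anywhere — can be sketched with Python's threading.Condition. This is an illustrative analogue only, not the (unspecced, per lizmat above) Perl 6 Lock.condition/ConditionVariable API.]

```python
import threading

queue = []
cond = threading.Condition()  # a lock paired with a condition variable

def producer():
    with cond:                # acquire the underlying lock
        queue.append(42)
        cond.notify()         # wake one waiter

def consumer(results):
    with cond:
        while not queue:      # loop guards against spurious wakeups
            cond.wait()       # releases the lock while blocked
        results.append(queue.pop())

results = []
t = threading.Thread(target=consumer, args=(results,))
t.start()
producer()
t.join()
print(results)  # [42]
```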
timotimo: While my students do exercises I've been looking at some of the benchmarks and the code they call. 11:21
timotimo: The worst offender is the push one
jnthn Which is 2000 times slower o.O 11:22
Then I found why: we always slurpy push
Even for the single value case.
I'll probably fix that one tonight.
The other huge pain point that makes assigning to arrays and hashes expensive is:
%h<a> = 42;
timotimo oops, slurpy push is certainly expensive for single values 11:23
jnthn This creates a scalar with a whence, which in turn means taking a closure, which means a Block and CodeRef allocation.
timotimo oh! oof! :)
jnthn timotimo: yes, and in push @a, ...; the push sub is slurpy and so is the push method!
Anyway, the whence then triggers the lambda and binds the scalar 11:24
But if we know full well we're going to be assigning...which we often can syntactically, that is an insane amount of overhead.
timotimo ah, i get it now
jnthn The reason for all this is that: 11:25
r: my $a := %h<a>; say %h<a>:exists; $a = 42; say %h<a>:exists;
timotimo it's all autoviv
camelia rakudo-jvm 2b8977: OUTPUT«(timeout)»
..rakudo-parrot 2b8977, rakudo-moar 2b8977: OUTPUT«===SORRY!=== Error while compiling /tmp/tmpfile␤Variable '%h' is not declared␤at /tmp/tmpfile:1␤------> my $a := %h<a>⏏; say %h<a>:exists; $a = 42; say %h<a>:e␤ expecting any of:␤ …»
jnthn m: my %h; my $a := %h<a>; say %h<a>:exists; $a = 42; say %h<a>:exists; 11:26
camelia rakudo-moar 2b8977: OUTPUT«False␤True␤»
timotimo oh!
jnthn That has to work.
But that's *not* the common case
timotimo yeah, that's certainly something you don't see that often
jnthn But we pessimize every darn thing.
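[editor's note: the autovivification pattern jnthn describes — binding %h&lt;a&gt; yields a container whose "whence" closure installs it into the hash only on first assignment — can be sketched in Python. Scalar, hash_at_key, and the whence attribute are hypothetical names for illustration; the point is the closure-per-lookup allocation cost.]

```python
class Scalar:
    """A container with an optional 'whence' closure run on first assignment."""
    def __init__(self):
        self.value = None
        self.whence = None

    def assign(self, value):
        if self.whence is not None:
            self.whence()       # vivify: install this container in its parent
            self.whence = None
        self.value = value

def hash_at_key(h, key):
    """Binding-style lookup: a missing key yields a fresh container whose
    whence closure enters it into the hash only if it is assigned to.
    Allocating that closure (and container) on every miss is the overhead
    being discussed."""
    if key in h:
        return h[key]
    scalar = Scalar()
    scalar.whence = lambda: h.__setitem__(key, scalar)
    return scalar

h = {}
a = hash_at_key(h, 'a')  # bind: the key does not exist yet
print('a' in h)          # False
a.assign(42)             # assignment fires the whence, vivifying the entry
print('a' in h)          # True
```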
timotimo that's what we do :)
it may seem easy to syntactically find out, but how do we signal it downwards?
also, will the same kind of depessimization help slurpy assignments like %h<a b c> = 1, 2, 3? 11:27
jnthn I'm pondering how to do it 11:28
I know full well whatever I come up with, Pm and/or TimToady probably won't like it. 11:29
We already have bind_key and bind_pos; I'm tempted to do assign_key and assign_pos...
timotimo that's new public api?
jnthn I guess they would be, yeah :S 11:30
I think we can't afford to not do something like this, though.
timotimo it'll definitely give us a nice speed boost in the general case 11:32
dalek kudo-star-daily: bb44ecd | coke++ | log/ (5 files):
today (automated commit)
11:34
jnthn Overall, I plan to make a pass through CORE.setting and look carefully at the bytecode we're generating 11:45
For each of the benchmarks.
Finishing up multispec will also help. 11:47
nwc10 what's multispec? 11:48
[Coke] we're having enough trouble with a monospec! 11:50
jnthn nwc10: In a multi-dispatch we currently always make a call to the proto first 11:51
nwc10: If the proto simply says "look in the cache", then that's a huge waste of a callframe and other things.
nwc10: multispec lets us tell the VM how to find the cache out of the code object 11:52
nwc10 ah right. Yes, that does sound like a place to win speed
jnthn So it can completely skip that.
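[editor's note: a toy model of the multispec idea described above — instead of invoking a proto frame whose only job is to consult a dispatch cache, let the caller reach the cache directly from the code object. Proto, dispatch_cache, and the candidate layout are hypothetical illustration, not MoarVM's actual data structures.]

```python
class Proto:
    """A multi's entry point; the dispatch cache hangs off the code object."""
    def __init__(self, candidates):
        self.candidates = candidates      # list of (type-tuple, function)
        self.dispatch_cache = {}

    def call_via_proto(self, *args):
        # today: an extra call frame whose only job is to consult the cache
        return self.dispatch(*args)

    def dispatch(self, *args):
        key = tuple(type(a) for a in args)
        if key not in self.dispatch_cache:
            for types, fn in self.candidates:
                if types == key:
                    self.dispatch_cache[key] = fn
                    break
        return self.dispatch_cache[key](*args)

def add_ints(a, b): return a + b
plus = Proto([((int, int), add_ints)])

print(plus.call_via_proto(1, 2))   # 3 — via the proto frame
# "multispec": the VM knows where the cache lives, so it can skip the proto
fn = plus.dispatch_cache[(int, int)]
print(fn(40, 2))                   # 42 — direct cache hit, no extra frame
```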
nwc10 which will benefit all 3 backends?
oh, how goes the JS backend work?
jnthn Well, in the case of things like infix:<+>, I think the proto is at least as costly as the operation itself.
It needs to be implemented per backend... 11:53
nwc10 yes, but how often do we call + :-) 11:53
jnthn I'll likely also do it for JVM.
Somebody else can take care of it for Parrot if they want things faster there.
guess we can PIC it on those two also 11:55
to abuse the term PIC a little :) 11:56
timotimo i don't know what that term means :(
[Coke] position independent code? 12:00
nwc10 aha 12:04
Polymorphic Inline Cache
jnthn yes, that 12:06
findmethod calls after spesh either are resolved statically or get a monomorphic cache
We use such a technique on JVM too. 12:07
timotimo ah, yes
nwc10 who is doing Star this month? 12:08
jnthn I'm pondering a similar inline cache for multi-dispatch.
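[editor's note: the Polymorphic Inline Cache being discussed can be sketched as a lookup cache keyed on the receiver's type — the hit path skips the full method-resolution walk. This is a toy global-keyed cache for illustration, not a real per-callsite PIC and not MoarVM's findmethod implementation.]

```python
def make_cached_findmethod():
    cache = {}  # maps (type, name) -> resolved method

    def findmethod(obj, name):
        key = (type(obj), name)
        if key in cache:
            return cache[key]            # cache hit: skip the lookup walk
        meth = getattr(type(obj), name)  # slow path: full method resolution
        cache[key] = meth
        return meth
    return findmethod

class A:
    def greet(self): return "A"

class B:
    def greet(self): return "B"

findmethod = make_cached_findmethod()
print(findmethod(A(), 'greet')(A()))  # "A" — slow path first, then cached
print(findmethod(B(), 'greet')(B()))  # "B" — the cache grows per type seen
```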
timotimo hmm, spesh is not far-reaching enough to help any with the hash assignment problem, aye?
because it doesn't inline?
jnthn timotimo: I dunno if it can ever really be in general.
It's not just inlining, it's a lot of things.
timotimo mhm 12:09
it'd be a complicated pattern to match against
jnthn The mere presence of the closure prevents compile-time lex to loc of self and so forth
timotimo ah
jnthn You'd need to thus teach spesh to also do such things
timotimo anything more simple you could offload onto me for today? :) 12:10
oh, and what invocation did you use to figure out what exact bytecodes were called? moarvm --dump?
jnthn yeah but I'm actually reading the spesh log
cus it tells me the bytecode on hot paths for free :)
timotimo ah :) 12:11
well ... "hot" :)
jnthn more hot than not :) 12:11
timotimo since 0K is kind of unreachable ... yeah, you get that property for free :P 12:11
jnthn Did you get anywhere with the NQP opt stuff?
So we can lower and spesh the grammar rules?
timotimo no :| 12:13
timotimo i had absolutely no clue where to look for the cause of the problem 12:13
jnthn timotimo: ah, ok... 13:28
Guess I'll have to look at that then.
timotimo: adding a single-item candidate to sub push should be easy to do and test 13:29
multi sub push(@a, \thing) { @a.push(thing) } 13:30
unshift too
cognominal it seems that rakudo builds twice as fast using spesh 15:16
benabik Woo! 15:18
cognominal just a feeling, I have no hard numbers. Setting parses in less than 1 min on my macbook. I think it was around 2 minutes 15:21
jnthn cognominal: Don't think it's that dramatic, but yeah, I get the whole setting built in less than a minute on this machine. 15:46
cognominal indeed, but it was two minutes not long ago, so I don't know what made the difference. 15:47
at least, the spesh machinery did not slow it down :) 15:49
dalek kudo/nom: 9eaf468 | jonathan++ | src/core/List.pm:
Greatly cheapen single-item push/unshift subs.
16:04
timotimo oh, i seem to have been under the impression that the methods were also pessimized for single element pushes/unshifts 16:24
jnthn timotimo: They are, but that's a harder fix, so I picked the easy thing first. :) 16:25
timotimo ah! 16:25
but the improvement stacks?
timotimo did you check if our accessor methods for the node classes and such get improved much by spesh or should i have a quick look? 16:26
JimmyZ Was accessor methods inlined yet? :) 16:27
timotimo we don't inline methods yet 16:27
jnthn timotimo: Yes, the improvements will stack 16:28
[Coke] jnthn++
timotimo jnthn: would you be okay with me implementing eqaddr on known values? 16:31
that opcode is the result of the "unless $value =:= NO_VALUE" things we have in the accessors
jnthn timotimo: Yeah, I'm totally fine with that, but I *think* we may be missing one thing for it to trigger 16:33
timotimo: But I totally forget what, so try it :)
timotimo hehe.
jnthn thinks he just made @a eqv @b about 3 times faster 16:35
timotimo \o/ 16:36
jnthn Mighta got several percent improvement on each .new too 16:38
for the default candidate anyway
dalek kudo/nom: 6b80fe2 | jonathan++ | src/core/Mu.pm:
Optimize new's delegation to make it cheaper.

Saves around 5% off Foo.new.
16:46
kudo/nom: 3e609e0 | jonathan++ | src/core/Mu.pm:
Make @a eqv @b around 3 times faster.
kudo/nom: 29bc5f6 | jonathan++ | src/core/Any.pm:
Use = in auto-viv, not &infix:<=>.

The latter forces a sub call; the former is compiled inline.
timotimo was &infix:<=> used to work around some kind of bug perhaps? 16:49
ribasushi sorry for the annoying joinflood earlier 16:50
fixed now
vendethiel yay ! 16:50
that's a lot
jnthn timotimo: Maybe, but it's a gone bug now 16:51
At least, so says spectest 16:52
timotimo very well
timotimo hmm. given how many times we call .new on things for basically everything, a 5% win there should translate to a very good win for pretty much everything we do 16:55
timotimo fwiw, my desktop is down to 38.7 seconds for stage parse on moar :) 16:58
1:32 for a whole make m-install after a clean 16:59
dalek kudo/nom: dd1f4fa | jonathan++ | src/core/Any.pm:
Add non-slurping candidates for various list subs.

Prevents an extra layer of wrapping, and is way more likely to inline. For
  map on a 5-element list in a loop, around a 5% saving.
17:21
kudo/nom: ad962d0 | jonathan++ | src/core/ (3 files):
Optimize coercion to Order enum.

In the long run, it'd be good to emit better code for coercion to an enum type. For now, this hot-paths it in cmp and <=>, which are used in a whole range of operations. Knocked 15% off a sort benchmark.
kudo/nom: b37ef4f | jonathan++ | src/core/Any.pm:
Micro-opt on sort sub.
vendethiel Optimization time o/ 17:22
timotimo ah, i remember what the LHF was i wanted to put into the p6weekly 17:23
colomon jnthn++ 17:39
japhb jnthn: Why no non-slurping version of join? 17:41
yoleaux 2 Apr 2014 08:41Z <jnthn> japhb: yes, the panda p6doc thing is known to fail install on both JVM and MoarVM. It's not entirely clear why yet, but the failures are likely related.
jnthn japhb: optional $sep parameter cannot be followed by slurpy 17:42
japhb: Though, could find another way :)
The array/hash slicing code really, really needs an optimization pass now it's had a correctness one :) 17:45
japhb jnthn: I was thinking: multi join($sep, @values) 17:48
Yeah, I believe that!
timotimo recently did some informal performance measurements in that area and really has to concur 17:49
jnthn japhb: oh, yeah...duh :) 17:50
[Coke] jnthn++ again 17:51
jnthn japhb: Adding it
jnthn slept awfully last night, taught all day, and so isn't the sharpest two short planks in the jar tonight... :P
jnthn spectests a few more changes 17:52
japhb "sharpest two short planks in the jar" -- well that's ... evocative. :-) 17:54
timotimo benchmarks the recent rakudo optimizations
jnthn hehe 17:55
I should probably get some dinner :) 17:56
japhb timotimo: It occurs to me that the visit_2d_indices_* microbenchmarks should have both loops doing the sqrt of the usual SCALE, but then it further occurs to me that you'd want to then increase SCALE by 4x each run, rather than 2x, and further it might be good to generalize this so that you can specify a factoring of SCALE (e.g. SCALE_1 * SCALE_2 * SCALE_3) that will DTRT. 18:00
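[editor's note: japhb's suggestion sketched: given a benchmark SCALE, derive per-dimension loop bounds whose product stays close to SCALE, so 2-d (or n-d) microbenchmarks scale fairly against 1-d ones. factor_scale is a hypothetical helper, not part of the actual benchmark harness.]

```python
def factor_scale(scale, dims):
    """Split `scale` into `dims` per-loop bounds whose product is close to
    `scale`: each bound is the rounded dims-th root, with any remainder
    folded into the last dimension."""
    root = round(scale ** (1.0 / dims))
    bounds = [root] * (dims - 1)
    bounds.append(scale // (root ** (dims - 1)))
    return bounds

print(factor_scale(10000, 2))    # [100, 100]
print(factor_scale(1000000, 3))  # [100, 100, 100]
```

Note that growing SCALE by 4x per run then keeps each of the two loop bounds doubling, matching the usual 2x-per-run feel of a 1-d benchmark.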
timotimo hmm
i'm still not quite getting why the "please do at least 3 runs in every case" code didn't work out 18:01
it's quite annoying to only see a single data point for the rakudos for example for the parse-json benchmark
japhb And on a different case, we might want to do e.g. push performance tests for 1, 8, 64 ... length things to push, to see how scaling and micro-optimizations work out for operations on collections of various sorts. 18:03
timotimo: Hmmm, I'd have to go spelunking again to figure out what went wrong there.
Maybe if I can get a little time later, that would be nice.
(nice to work on, I mean) 18:04
timotimo i would like that, thanks!
jnthn going for dinner...bbl 18:05
ajr_ jnthn appears to be going for a world record in metaphor-garbling; that's >= 3 in 7 words. :-)* 18:14
timotimo ajr_: so *that*'s why that idiom seemed foreign to me! 18:16
ajr_ It's at least a combination of "not the sharpest knife in the drawer/pencil in the jar" and "thick as two short planks", ("thick" = stupid) compounded with the meta message of confusion. 18:19
timotimo thanks for clearing that up :) 18:21
vendethiel .u 0x2297 19:17
yoleaux No characters found
vendethiel .u 2297
yoleaux U+2297 CIRCLED TIMES [Sm] (⊗)
timotimo ⊕_⊕ 19:25
t.h8.lv/p6bench/2014-04-08-rakudo_opt.html - the optimization of push is clearly visible, giving 2x speed in one of the two benchmarks. the other optimizations are not so visible it seems
man, our empty for loops have some significant catching up to do 19:26
jnthn timotimo: Well, the others came more from me looking through CORE.setting rather than analyzing the benchmarks 19:30
dalek kudo/nom: 5d74bce | jonathan++ | src/core/Int.pm:
Further optimization of postfix ++ and --.
19:32
timotimo yeah, that's fine ;)
i had expected to see a good difference in the bigger benchmarks, though 19:33
forest fire in particular
vendethiel jnap: what are those # XXX for ?
lizmat has arrived in Zürich and is pleasantly surprised by jnthn++ avalanche of commits today 19:34
timotimo jnthn: do we have any clue why the empty for loop seems to have such massive overhead compared to perl5? 19:35
jnthn lizmat: Given I slept 3 hours last night, so am I ;)
timotimo: Yeah, at least somewhat. We don't flatten the block in. 19:36
timotimo: There is an opt for that, but it's level 3.
timotimo i run rakudo-moar with --optimize=3 in the benchmarks 19:38
jnthn ah
and yeah, it makes a small difference
So the cost is elsewhere
In Perl 5, iiuc, when you do: 19:39
for (my $i = 1; $i < 100000; $i++) { }
Then $i is allocated once, and you're fiddling with the same scalar
Whereas in Perl 6 that is a scalar that gets ++'d, creating a new Int each time. 19:40
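[editor's note: the contrast jnthn draws can be sketched as a reused mutable cell (Perl-5-style scalar, mutated in place) versus an immutable boxed value that allocates a fresh object per increment (Perl-6 Int semantics). Both classes are hypothetical illustrations.]

```python
class MutableCell:
    """Perl-5-style: one scalar, mutated in place for the whole loop."""
    __slots__ = ('value',)
    def __init__(self, value): self.value = value
    def inc(self): self.value += 1

class BoxedInt:
    """Perl-6-style: immutable value object; ++ allocates a new one."""
    __slots__ = ('value',)
    def __init__(self, value): self.value = value
    def inc(self): return BoxedInt(self.value + 1)

cell = MutableCell(0)
for _ in range(5):
    cell.inc()            # same object every iteration

box = BoxedInt(0)
for _ in range(5):
    box = box.inc()       # five allocations, one per increment

print(cell.value, box.value)  # 5 5
```

The "cheat" jnthn mentions is proving the extra allocations are unobservable, so the compiler may reuse one cell anyway.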
timotimo oh, i see
jnthn So if we want to get the same we need to work out that we're allowed to cheat :) 19:41
Not having to support binding *and* assignment turns out to be rather useful.
Perl 6 wants both so...we have "fun"
dalek ast: 6dd6ced | (David Warring [email@hidden.address] | integration/advent2013-day2 (2 files):
adding advent 2013 days 22 & 23
19:45
lizmat dinner& 19:46
pippo Hi perl6! 20:04
vendethiel o/
PerlJam hello pippo
pippo m: $b=0; my @a; for ^10_000 {@a[$_]=[0,1]}; say time; for ^10_000 {$b+=@a[$_][1]}; say time; 20:05
camelia rakudo-moar b37ef4: OUTPUT«===SORRY!=== Error while compiling /tmp/pViob_i21Z␤Variable '$b' is not declared␤at /tmp/pViob_i21Z:1␤------> $b⏏=0; my @a; for ^10_000 {@a[$_]=[0,1]}; s␤ expecting any of:␤ postfix␤»
pippo m: my $b=0; my @a; for ^10_000 {@a[$_]=[0,1]}; say time; for ^10_000 {$b+=@a[$_][1]}; say time; 20:06
camelia rakudo-moar b37ef4: OUTPUT«1396987562␤1396987562␤»
pippo m: my $b=0; my @a; for ^10_000 {@a[$_]="1,2".split(',')}; say time; for ^10_000 {$b+=@a[$_][1]}; say time;
camelia rakudo-moar b37ef4: OUTPUT«(timeout)1396987604␤»
pippo ^^anybody know why this takes so much time?? 20:07
vendethiel creating an array for 2 elems and splitting a string? 20:08
PerlJam pippo: at a guess, all the memory allocations
moritz one problem is that split compiles a regex, and iirc doesn't cache it 20:09
flussence m: my $b=0; my @a; for ^10_000 {@a[$_]=[0,1]}; say time; for ^10_000 {$b+=@a[$_][1]}; say time; say $b
vendethiel helped a friend fix a performance problem in a Game of Life in Ruby: moved [" ", "x"] outside of the loop (for display) and went from 3 fps to 60+ fps
camelia rakudo-moar b37ef4: OUTPUT«1396987747␤1396987747␤10000␤»
flussence okay, so it's not just optimising that out...
flussence split has a non-regex special case though, so it's not that problem either. There's a lot of code in there though... 20:11
dalek rl6-roast-data: 0fd5db1 | coke++ | / (6 files):
today (automated commit)
pippo So when the first "say time" is executed @a is not yet constructed? 20:14
timotimo ... huh? 20:15
in which code exactly?
vendethiel pippo: trick =>
r: say time - BEGIN time
camelia rakudo-jvm b37ef4: OUTPUT«Unhandled exception: java.lang.RuntimeException: Missing or wrong version of dependency 'src/Perl6/Grammar.nqp'␤ in (gen/jvm/main.nqp)␤␤»
..rakudo-parrot b37ef4, rakudo-moar b37ef4: OUTPUT«0␤»
vendethiel what do you waaaaaaant from me, rakudo-jvm ! 20:16
pippo timotimo: my $b=0; my @a; for ^10_000 {@a[$_]="1,2".split(',')}; say time; for ^10_000 {$b+=@a[$_][1]}; say time;
vendethiel r: my int $i = 0; for ^100000 { $i += $u }; say time - BEGIN time;
camelia rakudo-parrot b37ef4, rakudo-moar b37ef4: OUTPUT«===SORRY!=== Error while compiling /tmp/tmpfile␤Variable '$u' is not declared␤at /tmp/tmpfile:1␤------> my int $i = 0; for ^100000 { $i += $u⏏ }; say time - BEGIN time;␤ expecting any …»
..rakudo-jvm b37ef4: OUTPUT«Unhandled exception: java.lang.RuntimeException: Missing or wrong version of dependency 'src/Perl6/Grammar.nqp'␤ in (gen/jvm/main.nqp)␤␤»
vendethiel r: my int $i = 0; for ^100000 { $i += $_ }; say time - BEGIN time; 20:17
camelia rakudo-moar b37ef4: OUTPUT«No such method 'STORE' for invocant of type 'Int'␤ in block at src/gen/m-CORE.setting:16846␤ in block at /tmp/tmpfile:1␤␤»
..rakudo-jvm b37ef4: OUTPUT«Unhandled exception: java.lang.RuntimeException: Missing or wrong version of dependency 'src/Perl6/Grammar.nqp'␤ in (gen/jvm/main.nqp)␤␤»
..rakudo-parrot b37ef4: OUTPUT«Cannot modify an immutable value␤ in block at gen/parrot/CORE.setting:17045␤ in block at /tmp/tmpfile:1␤␤»
vendethiel r: my int $i = 0; for ^100000 { $i = $i + $_ }; say time - BEGIN time; # forgot that ..
camelia rakudo-parrot b37ef4: OUTPUT«3␤»
..rakudo-jvm b37ef4: OUTPUT«Unhandled exception: java.lang.RuntimeException: Missing or wrong version of dependency 'src/Perl6/Grammar.nqp'␤ in (gen/jvm/main.nqp)␤␤»
..rakudo-moar b37ef4: OUTPUT«0␤»
[Coke] (if you don't need all of rakudo, just use p or m)
pippo vendethiel: I did not want to time the whole proggy. Just the last "for" loop part. :-) 20:18
timotimo also, BEGIN is when the parser hits the code, CHECK is after all code has been compiled 20:22
so if you compare BEGIN now with CHECK now, you'll get the time between the parser hitting the BEGIN now and the parser finishing and stuff being compiled 20:23
pippo timotimo: any clue on why when I construct the array with split it takes an eternity? 20:24
flussence something is seriously broken there actually... I've been letting that line of code run for several minutes now. 20:25
pippo flussence: I also think so. 20:26
timotimo hum. 20:27
moritz pippo: try to split on rx/\,/ instead 20:31
pippo moritz: sorry. How? 20:32
timotimo i can get 100_000 times the split and put stuff into the array in 45 seconds
moritz pippo: .split(rx/\,/) instead of .split(',') 20:33
pippo timotimo: the block that takes time is not the array construction but the array manipulation one, i.e. "for ^10_000 {$b+=@a[$_][1]};" 20:34
timotimo ah, hm.
pippo moritz: trying
moritz: trying... 20:35
flussence having said that, the first loop takes some indefinite amount of time in perl6-p too... 20:36
(only 8s in moar)
jnthn so, I'm re-writing Str.split taking a string delim... 20:38
pippo moritz: it is immensely faster!! Thank you! How did you know? 20:40
flussence there's only two code paths to choose from there :) 20:41
moritz pippo: split compiles the separator into a regex internally 20:45
pippo: and that's slow
timotimo well, it's only slow because it does that every time anew :)
moritz aye 20:45
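[editor's note: the caching pattern moritz and timotimo describe — compile the separator once per distinct separator instead of on every call — sketched with Python's re module. Whether Rakudo's split actually recompiled per call is questioned just below, so treat this purely as an illustration of the memoization idea; cached_split and compiled are hypothetical names.]

```python
import functools
import re

@functools.lru_cache(maxsize=None)
def compiled(sep):
    # compile the separator into a pattern once per distinct separator
    return re.compile(re.escape(sep))

def cached_split(s, sep):
    return compiled(sep).split(s)

print(cached_split("1,2", ","))      # ['1', '2'] — compiles on first use
print(cached_split("3,4", ","))      # ['3', '4'] — pattern comes from cache
print(compiled.cache_info().hits)    # 1
```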
jnthn uh, the split I'm looking at doesn't... 20:46
moritz oh 20:47
then the regex version is much faster than the Cool version :/
and my information likely comes from an outdated version
PerlJam So ... how does this affect the *second* loop? 20:48
moritz does it? 20:49
somehow I read conflicting statements
PerlJam Assuming pippo's statement is accurate ... <pippo> timotimo: th block that takes time is not the array construction but array manipulation one
dalek ast: fe727fc | (David Warring [email@hidden.address] | integration/advent2013-day22.t:
typo
timotimo right. good question.
moritz r: say 'a,b'.split(',').^name 20:50
camelia rakudo-parrot 5d74bc, rakudo-jvm 5d74bc, rakudo-moar 5d74bc: OUTPUT«List␤»
moritz m: say 'a,b'.split(rx/\,/).^name
camelia rakudo-moar 5d74bc: OUTPUT«List␤»
pippo moritz: yes it does. On my machine the first loop is always done quickly (the first "say time" is executed and time is printed) The second one takes tooooooo long to appear. 20:54
moritz: with your suggestion my CSV manipulation program went from 20 min to 2 min to execute !! Yep! :-) 20:55
moritz: ty. 20:57
PerlJam Here's what I get on my computer: gist.github.com/perlpilot/10191112 20:58
sergot night night! o/ 20:59
PerlJam (that's Moar btw) 21:00
though perl6-j appears to exhibit the same behavior 21:02
pippo PerlJam: Nice! Also here on perl6-{j,m} 21:04
jnthn r: say "".split(':') 21:09
camelia rakudo-parrot 5d74bc, rakudo-jvm 5d74bc, rakudo-moar 5d74bc: OUTPUT«␤»
jnthn r: say "".split(':').perl
camelia rakudo-parrot 5d74bc, rakudo-jvm 5d74bc, rakudo-moar 5d74bc: OUTPUT«("",).list␤»
PerlJam pippo: btw, are you actually using Text::CSV? 21:13
pippo PerlJam: Nope. Is it fast? 21:13
PerlJam Dunno how fast it is comparatively. I just thought you should know about it if you didn't :) 21:14
pippo PerlJam: Thank you. I'll try it on my next proggy :-)) 21:15
mdiei hello from finland 21:22
hows monks doing
timotimo hm? 21:26
jnthn pippo, PerlJam: I've now got a version here where gist.github.com/perlpilot/10191112 runs faster with Version A than Version B. :) 21:30
spectesting at the moment 21:31
pippo jnthn: \o/ !!! 21:33
jnthn: jnthn++ 21:34
lizmat r: multi a (int $a) { say "signed $a" }; a(42) # works 21:39
camelia rakudo-parrot 5d74bc, rakudo-jvm 5d74bc, rakudo-moar 5d74bc: OUTPUT«signed 42␤»
lizmat r: multi a (uint $a) { say "unsigned $a" }; multi a (int $a) { say "signed $a" }; a(42) # expected to work as well, but doesn't :-(
camelia rakudo-parrot 5d74bc: OUTPUT«Cannot call 'a'; none of these signatures match:␤:(uint $a)␤:(int $a)␤ in any at gen/parrot/BOOTSTRAP.nqp:1219␤ in sub a at /tmp/tmpfile:1␤ in block at /tmp/tmpfile:1␤␤»
..rakudo-moar 5d74bc: OUTPUT«Cannot call 'a'; none of these signatures match:␤:(uint $a)␤:(int $a)␤ in sub a at /tmp/tmpfile:1␤ in block at /tmp/tmpfile:1␤␤»
..rakudo-jvm 5d74bc: OUTPUT«Cannot call 'a'; none of these signatures match:␤:(uint $a)␤:(int $a)␤ in any at gen/jvm/BOOTSTRAP.nqp:1212␤ in sub a at /tmp/tmpfile:1␤ in block at /tmp/tmpfile:1␤␤»
lizmat if we could MMD on uint, then we could make a candidate for [] that doesn't check the index 21:40
jnthn lizmat: You've broken its ability to compile-time dispatch it by introducing ambiguity.
jnthn lizmat: So it leaves it to runtime and calls it with the boxed Int instead. 21:41
And that doesn't match int/uint in a multi-dispatch.
Otherwise in int vs. Int we'd never reach the Int one.
lizmat hmmm...
m: multi a (uint $a) { say "unsigned $a" }; multi a (Int $a) { say "int $a" }; a(-42); a(42) 21:42
camelia rakudo-moar 5d74bc: OUTPUT«int -42␤unsigned 42␤»
dalek kudo/nom: da1ef6e | jonathan++ | src/core/Str.pm:
Optimize split on a literal string.

Use native str/int and nqp:: ops to get something of a speedup. Also, don't use an infinite range, since that makes evaluation of the map too lazy, causing other performance issues.
21:43
lizmat but uint / Int MMD works apparently
jnthn lizmat: Yeah, though...I worry it's by accident ;) 21:43
pippo jnthn: pulling...
jnthn pippo: Hopefully it helps a bit. 21:44
I'm still not too happy with it.
lizmat jnthn: I could try and run a spectest
jnthn But should be an improvement.
lizmat: spectest on...?
pippo jnthn: I'll test it immediately and let you know how it is here :-)
lizmat creating uint / Int candidates for []
jnthn crazedpsyc: We already have an int candidate afaik 21:45
oops
lizmat: ^^
how on earth did I end up with a c instead of an l...
lizmat well, if we change the int candidate to a uint, then the negative indexes would be caught by the generic Num case
and bomb there, while the simple [0] cases would not need to check whether the index is < 0 21:46
jnthn They...won't.
lizmat ?? 21:47
jnthn int and uint are currently treated identically
uint isn't really implemented in general; it's only really native arrays that know what to do with it.
There's no sense in which uint in a signature is doing any kind of checking. 21:48
lizmat m: multi a (uint $a) { say "unsigned $a" }; multi a (Int $a) { say "int $a" }; a(-42); a(42) # then why does this work ?
camelia rakudo-moar 5d74bc: OUTPUT«int -42␤unsigned 42␤»
jnthn You so don't want to know. :)
42 as a literal is allomorphic 21:49
The - defeats that and leaves us with an Int, so far as the optimizer is concerned.
At present, the only way you ever reach a native multi candidate is if the dispatch is resolved at compile time.
Worse, it's done in the optimizer. 21:50
lizmat but then the negative indexes are caught at run time
pippo jnthn: lightning fast! Thank you!! :-))
jnthn Hm, that may not actually be true...
m: multi a (uint $a) { say "oops" }; my int $x = -5; a($x) 21:51
camelia rakudo-moar 5d74bc: OUTPUT«oops␤»
lizmat m: multi a (uint $a) { say "oops" }; multi a (Int $a) { say "whoopie" }; my int $x = -5; a($x) 21:52
camelia rakudo-moar 5d74bc: OUTPUT«oops␤»
lizmat m: multi a (uint $a) { say "oops" }; multi a (Int $a) { say "whoopie" }; my int $x = -5; a($x); a(-5) 21:52
camelia rakudo-moar 5d74bc: OUTPUT«oops␤whoopie␤»
jnthn It really, really, doesn't know about this. It knows enough that it can get $a + $b to the (int, int) candidate if $a and $b are declared as int.
lizmat ack, got you now 21:53
so for now, we still need the <0 check in the int candidate, because of the [$x] case
jnthn The whole area is really...icky. Especially as we probably shouldn't be sending $a + $b to the (int, int) candidate
unless a pragma is in force 21:54
Generally, we do about enough that native ints are worth using if you're careful.
And can be a notable speedup. 21:55
Same with num.
But it's sure as heck not polished.
lizmat ack, got ya 21:56
jnthn The thing that really needs opt in that area is basic array and hash assignment, though. 21:57
lizmat Files=801, Tests=31033, 189 wallclock secs ( 8.18 usr 3.58 sys + 1257.33 cusr 90.99 csys = 1360.08 CPU) 21:58
that is significantly down from 200+ wallclock yesterday! 21:59
jnthn \o/
lizmat that's at least a 5% improvement!
wow!
jnthn Yeah, it's faster on my laptop too :)
Down from 464 to 443
Poor thing only has 2 cores.
Well, 2 physical, 4 virtual.
Anyway, we almost got you a 3 minute spectest :) 22:01
lizmat yup
what's even better: running a spectest on parrot in the day would cost me 20% of my battery
now it's down to something like 5% 22:02
jnthn :)
How fast is the core setting build for you, ooc?
lizmat 1:10 last I checked 22:04
jnthn oh
51.67s on my laptop for the lot 22:05
lizmat hmmm....
jnthn 81.59s for full Rakudo build.
pippo good night perl6! 22:07
jnthn oh, but I was running with spesh
pippo exit
timotimo at some point i'm hopeful we'll be able to propagate knowledge about integers somewhat deep into the innards of ... stuff 22:08
so that perhaps spesh or jit will be able to remove checks like "is the index < 0 here?"
i think that needs either inlining or more facts known at the callsite
jnthn Just did it without. 53.90s for CORE.setting and 84.63s for the whole build.
lizmat real1m40.408s 22:09
user1m38.560s
sys0m1.563s
jnthn r: say 51.67 / 53.90
lizmat for the whole build
camelia rakudo-parrot 5d74bc, rakudo-jvm 5d74bc, rakudo-moar 5d74bc: OUTPUT«0.958627␤»
jnthn r: say 81.59 / 84.63
camelia rakudo-parrot 5d74bc, rakudo-jvm 5d74bc, rakudo-moar 5d74bc: OUTPUT«0.964079␤»
lizmat I guess that jnthn has fewer but faster CPUs
jnthn i7 :)
Anyway, seems spesh currently wins about 4%-5% saving. Not bad given it basically can't analyze too deeply into regexes yet. 22:10
lizmat 2.8GHz i7 for me
jnthn Or deal with named args which show up all over. 22:11
timotimo do you already have an idea what the named args are going to require for us to handle them?
jnthn yeah
We need to get the names made part of the callsite. 22:12
It's mildly tricky.
But not awfully bad.
lizmat: 2.9 :) 22:13
timotimo will that just be a MVMString **?
lizmat that explains then :-)
jnthn timotimo: After bytecode loading, I guess yes
timotimo: In the mbc file I suspect they are just stored as string heap indexes. 22:14
timotimo bytecode loading? i seem to be missing something
oh, of course, the callsites are in the mbc file
jnthn timotimo: Well, callsites are one segment of the mbc file, which is all handled in bytecode.c.
lizmat also, I'm running on battery now, so probably not getting overclocked
jnthn There is one other nice consequence of this refactor, btw.
Right now if you call, say, foo(bar => baz)
Then it's a
prepargs [cs idx] 22:15
argconst_s 0, "foo"
jnthn arg_o 1, rX 22:15
invoke_o # or whatever
jnthn And one of those instructions can go away after this :) 22:15
timotimo mhm, but that's still at least an index into the literals heap?
jnthn Well, we resolve the index at load time... 22:16
In the spesh case, though, we know for a given callsite what arg buffer location holds a given name.
So we can just use the unsafe sp_getarg_o
Which once we can JIT will probably end up being a few instructions... 22:17
timotimo which part am i going to optimize right now? the caller side or the callee side? 22:18
timotimo well, maybe not "going to", but "supposed to" :P 22:19
jnthn Well, it's the callee that's specialized 22:20
But the caller can have an instruction less per named arg it'll pass too after this.
timotimo oh, so we have a non-specializer-related opt, which is moving the argconst_s from the bytecode into the callsite storage 22:21
and after that, the specializer can continue specializing even if it sees named arguments, because it knows about the named parameters from the callsite 22:22
jnthn right
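[editor's note: a toy model of the refactor just summarized — argument names move out of the per-call instruction stream (the argconst_s above) and into the interned callsite descriptor, so a specializer can resolve a name to a fixed argument-buffer slot once and emit a direct sp_getarg-style fetch. Callsite and slot_of are hypothetical names, not MoarVM's actual structures.]

```python
class Callsite:
    """Interned description of how a call site passes its arguments."""
    def __init__(self, num_pos, names):
        self.num_pos = num_pos
        self.names = tuple(names)   # named-arg names live on the callsite,
                                    # not as per-call argconst_s instructions

    def slot_of(self, name):
        # a specializer resolves this once, at specialization time
        return self.num_pos + self.names.index(name)

cs = Callsite(num_pos=1, names=('sep',))
args = ['hello,world', ',']         # flat argument buffer for one call
print(args[cs.slot_of('sep')])      # ',' — no runtime name scan needed
```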
timotimo i'll have a further look into the code before i decide whether or not i'll take that off of your plate :) 22:23
timotimo do you have a comment on my uthash padding code? i'm not very confident in it, but i've patched all usages of HASH_ functions to decide whether or not padding is needed 22:23
unfortunately, it now crashes and burns almost immediately
jnthn timotimo: It's hard to say at a casual glance... 22:25
timotimo: I'm a bit surprised to see modulo show up in there though. 22:26
timotimo well, it's three bytes of 0 and then one with data
but we have blocks that go up to 12 bytes
jnthn ah
hm
timotimo so i could either "if index == 3 || index == 7 || index == 11"
or use modulo
jnthn yeah, I'll need to look more closely. 22:27
timotimo i went for the shorter code for now 22:27
jnthn I'm really uncofortable how the change leaks out in github.com/MoarVM/MoarVM/commit/846d59e8b2 too 22:28
timotimo yes.
so am i
didn't have a better idea yet
jnthn time for me to try and sleep...hopefully more than last night 22:32
o/
timotimo gnite and good luck!
oh, huh
if an arg is named, there's actually two args in the callsite and the first is the name and the second is the value, eh? 22:33
oh, no, it's not "in the callsite", it's passed along
i think i get it
timotimo i seem to be somewhat tired as well 22:49
maybe i'll end up getting a decent sleep rhythm again if i get some sleep now? 22:50
lizmat as am I
yes, good rhythm is good
so good night, #perl6!
timotimo good rhythm lhyzmhat! :)
lizmat and you, timotimo timotimo timotimo :-) 22:51
timotimo i have found code that deserializes callsites from the bytecode file (apparently) and i've found a place in the bytecode verifier where it expects a named arg to be followed by its parameter 23:11
so these things i'll have to change. but i'm not seeing the code that writes out the callsites