timotimo couldn't fall asleep just yet, tried to hack up a little prototype 00:17
this one gets pretty unhappy while compiling nqp.
STable conflict detected during deserialization. ← code-gen bugs always lead to funny results >_< 00:19
i'm seeing some code where a wval + decont gets turned into a wval targeting the register the decont used to target 00:28
it *seems* correct
dalek arVM/spesh_remove_set_op: be182ce | (Timo Paulssen)++ | src/spesh/facts. (2 files):
if an op writes to its first arg, remember it as the assigner.
00:35
arVM/spesh_remove_set_op: db5e8e4 | (Timo Paulssen)++ | src/spesh/optimize.c:
sometimes deconts can turn into nothing instead of a set.
timotimo here's some code for you to try or look at tomorrow; i'll try to get some sleep now. again.
01:18 FROGGS_ joined 02:08 jnap joined 02:10 jnap1 joined 02:26 donaldh joined 03:11 jnap joined 03:12 jnap1 joined 03:13 jnap joined 03:23 jnap joined 06:24 brrt joined 06:55 FROGGS joined 07:14 zakharyas joined
brrt gotta love intel x86 07:15
64 bit operations generate a 64 bit result into the register 07:16
32 bit operations generate a 32 bit result that is zero extended to 64 bits
however
16 and 8 bit operations generate a 16 / 8 bit result - and do not zero-extend it (i.e. the register is not overwritten) 07:17
FROGGS :/ 07:18
brrt so the following sequence: mov64 rax, 0xffff0000 ; mov8 rax, 0xff gives you 0xffff00ff
FROGGS I guess they had to keep the existing behaviour for 8/16 bit operations?
brrt i suppose 07:19
no, i rather suppose that back in the 80s when they designed 386, they were just lazy
FROGGS that's why it is called back*warts*
brrt or perhaps cheap with transistors
FROGGS I bet on cheap :o) 07:20
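The register behaviour brrt describes can be modelled in a few lines. This is only a sketch in Python, not MoarVM or JIT code: `write_reg` is a hypothetical helper, and the 8-bit case models a write to the low byte only (as with `al`).

```python
MASK64 = (1 << 64) - 1


def write_reg(old, value, bits):
    """Model writing `value` with an operand size of `bits` into a
    64-bit x86-64 register that currently holds `old`.
    Hypothetical helper for illustration only."""
    if bits == 64:
        # A 64-bit write replaces the whole register.
        return value & MASK64
    if bits == 32:
        # A 32-bit write zero-extends into the upper 32 bits.
        return value & 0xFFFFFFFF
    if bits in (8, 16):
        # 8- and 16-bit writes leave the remaining upper bits untouched.
        mask = (1 << bits) - 1
        return (old & ~mask & MASK64) | (value & mask)
    raise ValueError("unsupported operand size")


rax = write_reg(0, 0xFFFF0000, 64)   # mov64 rax, 0xffff0000
rax = write_reg(rax, 0xFF, 8)        # mov8  rax, 0xff
print(hex(rax))                      # 0xffff00ff
```

The last two lines replay the sequence from the log and give the same 0xffff00ff.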
07:27 domidumont joined 07:41 brrt left
tadzik cheap 07:50
yesterday I found that since 286 you can't do i << 32 07:51
because they only have 5 bits for the shift amount
timotimo m)
tadzik because performance, they say
but it's even sillier now that we have 64 bits
FROGGS tadzik: I guess we need that to be able to run older programs 08:00
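A small model of the shift-count masking tadzik mentions, as a sketch: `shl32` and `shl64` are hypothetical helper names, and the code assumes the x86 rule that only the low 5 bits of the count are used for 32-bit operands and the low 6 bits for 64-bit operands.

```python
def shl32(value, count):
    """Model a 32-bit x86 left shift: the count is masked to 5 bits."""
    count &= 0x1F
    return (value << count) & 0xFFFFFFFF


def shl64(value, count):
    """Model a 64-bit x86 left shift: the count is masked to 6 bits."""
    count &= 0x3F
    return (value << count) & 0xFFFFFFFFFFFFFFFF


print(shl32(1, 32))        # 1, because 32 & 0x1F == 0
print(shl32(1, 33))        # 2, because 33 & 0x1F == 1
print(hex(shl64(1, 32)))   # 0x100000000
print(shl64(1, 64))        # 1, because 64 & 0x3F == 0
```

So a shift by the full operand width behaves like a shift by zero instead of clearing the register.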
jnthn Wow...I must have been a lot more exhausted than I realized... 10:04
nwc10 good no-longer morning?
jnthn aye :) 10:06
FROGGS o/ 10:26
jnthn Well, I worked 7 hours on Sunday, and felt I might be about to get ill the last 3 days, so probably no bad thing that I got some good rest. 10:28
timotimo oh, yeah 10:30
resting for $longtime is much better than being sick for $evenlonger
jnthn aye 10:33
10:51 lee__ joined 11:02 bonsaikitten joined 11:54 tgt joined 12:18 flussence joined, _sri joined 12:55 LLamaRider joined
jnthn OK, time for some more spesh hacking :) 13:05
Today's hacking music: new albums from Insomnium and Epica \o/ 13:08
Hmm...deopt of guards is actually trickier than I realized... 13:41
hm, but mebbe this cute hack works... 13:54
JimmyZ :-) 13:55
13:55 btyler joined 13:56 jnap joined
jnthn bother, segv 13:56
masak .oO( a SEGV is C's way to say "I love you, flaws and all" ) :P 14:00
lizmat .oO( I thought it indicated a lack of DWIM? ) 14:01
jnthn oh, duh...
14:25 donaldh joined
jnthn wtf 14:36
jnthn is seeing something really insane happening under the C debugger...
It appears to jump from the goto NEXT; into the body of the next op?! 14:37
14:37 brrt joined
jnthn ah, not in a debug build, though. 14:38
nwc10 so, something uninitialised? Because under the debug build the C compiler sets everything to 0? 14:39
14:41 jnap joined
jnthn nwc10: The behavior externally is the same under both. I think it's just some mess-up in writing the debug locations file... 14:42
nwc10: I think it's done a code-size opt and then written a debug file that reveals what it did.
Just looked very confusing.
dalek arVM/spesh_trace: 3810ab3 | jnthn++ | src/spesh/dump.c:
Fix dumping of deopt annotations.
15:10
arVM/spesh_trace: 1d69712 | jnthn++ | src/spesh/facts.c:
Read correct log entries.
15:23 brrt left 15:27 vendethiel joined
dalek arVM/spesh_trace: e7ac215 | jnthn++ | / (4 files):
Break out different kinds of de-opt point.

Can have "current frame only" for failed guards, and "all" for when mixins happen that may invalidate many assumptions.
15:53
arVM/spesh_trace: 905d410 | jnthn++ | src/spesh/ (4 files):
Record deopt annotations per deopt kind.
arVM/spesh_trace: 5de47f6 | jnthn++ | src/spesh/log.c:
Migrate deopt-1 annotations to sp_logs.

The log will be replaced with the guard, which is what will trigger deopt.
16:34
arVM/spesh_trace: 0c4324b | jnthn++ | src/ (3 files):
Implement all vs. one deopt.

Also fixes a bug in looping over the deopt table, so we don't fail to de-optimize half the time.
16:36 domidumont joined 16:38 FROGGS joined 16:40 domidumont joined
dalek arVM/spesh_trace: d1b551f | jnthn++ | src/spesh/manipulate.c:
Fix moving of handler annotations.

We accidentally dropped some of them, leading code-gen to fail.
16:51
arVM/spesh_trace: 6aa4a3a | jnthn++ | src/spesh/facts.c:
Don't log -> guard things we needn't guard.
17:22 zakharyas joined
dalek arVM/spesh_trace: 3ff9edd | jnthn++ | src/spesh/optimize.c:
Delete any leftover log instructions in optimize.
17:23
18:19 timo joined
jnthn With latest work, spesh seems to give a 10% win on "my @a; my $i = 0; while (++$i <= 1000000) { @a[ $i ] = $i }" 18:22
FROGGS ohh! 18:26
jnthn+++
nwc10 nice 18:28
timotimo oh, neat. where does it come from?
timotimo fell off the 'net for a bit :(
ah, i saw it in the clogs 18:30
jnthn timotimo: I suspect at the moment avoiding method dispatches by resolving them, and avoiding type checks
timotimo did you have a chance to look at my not quite working set-removal-thingie? 18:31
jnthn Not yet, but will after dinner
timotimo cool :)
jnthn Figured I'll take the above benchmark as an example and work through getting spesh to do nice things for it. 18:32
btyler jnthn: perhaps my build is borked, but when I switch to spesh_trace and Configure.pl/make install, perl6 -e 'say "hi"' dies with a moar error: 'Unknown flag --execname=(path to perl6 bin)'
switching back to master and Configuring/building everything is a-ok
FROGGS execname is new 18:33
jnthn btyler: spesh_trace is missing patches from master
btyler oh ok, gotcha
I'll wait then :)
jnthn It may also explode in the NQP build. It's a branch for a reason ;)
btyler for sure, I just figured I'd take a poke and see if the win was visible here as well
jnthn :) 18:34
lee__ mine does explode when building nqp/rakudo if i pass --optimize=3 to spesh_trace 18:35
works fine if i don't, though
(sorry if that is annoying, just excited about fast perl6 :) 18:38
timotimo is that the optimize flag for rakudo or for configure.pl? 19:06
lee__ for Configure.pl on MoarVM 19:27
timotimo ah 19:37
i've never been able to use more than --optimize=1 19:38
without b0rking the build during nqp
nwc10 jnthn: seems that >99% of the values loaded by GET_I64() would fit in 16 bits 19:45
jnthn nwc10: Yeah; I'd figured that'd be an easy opt some day :)
nwc10 and 0% are >32 bit 19:46
jnthn Good to know. It'll be worthwhile then :)
nwc10 also, there are about 3 opcodes where the value read is I64, but it's immediately cast to 32 bits
for those ones, why is it even 64 bits in the opcode stream?
for the others, do they need to keep 64 bit variants? 19:47
jnthn Which ops, ooc?
nwc10 I was sort of assuming that the ops get 2 more variants, with 16 and 32
throwcatdyn, throwcatlex, throwcatlexotic
jnthn oh, hm 19:48
nwc10 no urgency
jnthn aye
Those are not common ops
iconst_64 is, though
nwc10 my assumption was (not checked this yet) that the bytecode generating code in C would substitute the smallest opcode variant that could hold the value 19:49
rather than have the MAST generator do it
or is that conceptually wrong?
jnthn I think having the bytecode gen do it sounds sane enough 19:50
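One way the "smallest opcode variant" idea could look, as a sketch only. The op names `const_i16`, `const_i32` and `const_i64` and the helper `encode_int_const` are made up for illustration and are not claimed to match MoarVM's op list or its bytecode writer.

```python
import struct

# (hypothetical op name, struct format, inclusive min, exclusive max)
VARIANTS = [
    ("const_i16", "<h", -(1 << 15), 1 << 15),
    ("const_i32", "<i", -(1 << 31), 1 << 31),
    ("const_i64", "<q", -(1 << 63), 1 << 63),
]


def encode_int_const(value):
    """Pick the narrowest variant whose operand can hold `value` and
    return (op name, little-endian operand bytes)."""
    for name, fmt, lo, hi in VARIANTS:
        if lo <= value < hi:
            return name, struct.pack(fmt, value)
    raise OverflowError("value does not fit in 64 bits")


for v in (42, -1, 100000, 1 << 40):
    name, operand = encode_int_const(v)
    print(v, name, len(operand))
```

With the distribution nwc10 reports (over 99% of values fitting in 16 bits), nearly every constant would take the 2-byte form, and the front end would never need to know the variants exist.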
nwc10 argconst_i seems to be unused currently 19:51
19:59 domidumont joined 20:28 domidumont joined
jnthn aha... 21:27
Turns out we really wanted a fixed point analysis on killing dead instructions.
Relying on reverse dominance tree ordering to catch enough didn't catch enough. :) 21:28
dalek arVM/spesh_trace: f72aefb | jnthn++ | src/core/interp.c:
Fix guard clause checks.
21:34
arVM/spesh_trace: f4d2c85 | jnthn++ | src/spesh/dump.c:
Basic dumping of facts.
arVM/spesh_trace: 755fb71 | jnthn++ | src/spesh/facts.c:
Use guard ops involving container checks.
jnthn The local patch I have to do that produces much better code. Sadly, it also seems to lose handlers... 21:37
FROGGS /o\ 21:42
do we need 'em?
jnthn yes.
Otherwise we don't know what region exception handlers cover in the specialized bytecode. 21:43
FROGGS bah, optimize it out :o)
jnthn oh dammit
Typically the one it chokes on has a HUGE graph
(EXPR)
FROGGS hehe, I tend to keep a distance between EXPR and me :o) 21:44
jnthn It has 127 basic blocks 21:46
oh... 22:03
dalek arVM/spesh_trace: 21689f4 | jnthn++ | src/spesh/codegen.c:
Cope with annotations being moved to a phi.

Before, the code-gen ignored them completely, which meant we could end up losing them. This resolves that issue. While annotations may never start on a phi, they may be moved there by other optimizations.
22:09
timotimo yay things 22:14
jnthn: got a clue what'd help you figure out big graphs like that? 22:16
the pypy people have a block explorer thingie built on top of graphviz and/or sdl i think
jnthn timotimo: Well, the problem here wasn't visualizing it really 22:18
timotimo fair enough
jnthn timotimo: It was more finding the end handlers 22:19
And seeing what on earth was going on.
timotimo OK
jnthn now we....manage to get a stack overflow. d'oh.
timotimo i didn't know we even had those :) 22:20
jnthn A C-level one! o.O
timotimo well, that ought to be an infinite recursion
couldn't explain it any other way
jnthn yeah, it is 22:21
but wtf :)
oh...
oh, I can guess... 22:23
yeah, got it... 22:30
timotimo yey
dalek arVM/spesh_trace: c1065ae | jnthn++ | src/spesh/facts.c:
Make sure we don't clobber block handler setting.

Block exception handlers live in a register. We must be careful not to clobber the set instruction, assuming its result is unused. This is handled by bumping the usage count on such registers in the facts.
22:35
jnthn Unfortunately, various things in NQP/Rakudo build blow up on spesh_trace 22:39
Missing block at line 425, near "-> @values"
timotimo oh, meh
jnthn I thought at first it was due to my improved instruction death 22:40
But it's not; it's something else I've done earlier, I guess.
timotimo mhh
if you need a break from hard stuff, try having a look at my branch? :P
dalek arVM/spesh_trace: f38c767 | jnthn++ | src/spesh/optimize.c:
Typo fix.
22:48
arVM/spesh_trace: 0d353ce | jnthn++ | src/spesh/optimize.c:
Iterate to fixed point when finding unused instrs.

Before we just did a backwards propagation as we went back up the dominator tree. But that didn't catch enough. Switch to just keeping iterating doing removals until we're done. May be more optimal ways to walk BBs in order to speed up convergence. Either way, the results with doing the analysis this way are better.
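A toy version of "keep removing until nothing changes", to show why a single sweep can miss instructions. This is a sketch and not the code in src/spesh/optimize.c. The tuple layout `(dest, op, operands, has_side_effect)`, the function `remove_dead` and the `pinned` parameter are assumptions made for the example; `pinned` stands in for bumping the usage count of a register that is used implicitly.

```python
from collections import Counter


def remove_dead(instrs, pinned=()):
    """Drop instructions whose result is never read, repeating until a
    full pass removes nothing (a fixed point).

    instrs: list of (dest, op, operands, has_side_effect) tuples, where
            operands is a tuple of register names.
    pinned: registers to treat as used even if nothing reads them.
    """
    instrs = list(instrs)
    changed = True
    while changed:
        changed = False
        # Recount register reads over the instructions that are left.
        uses = Counter(pinned)
        for _dest, _op, operands, _side_effect in instrs:
            uses.update(operands)
        kept = []
        for ins in instrs:
            dest, _op, _operands, side_effect = ins
            if dest is not None and not side_effect and uses[dest] == 0:
                changed = True   # removing this may make its inputs dead
                continue
            kept.append(ins)
        instrs = kept
    return instrs


prog = [
    ("r0", "const", (), False),
    ("r1", "add", ("r0", "r0"), False),
    ("r2", "mul", ("r1", "r1"), False),   # r2 is never read
    ("r3", "const", (), False),
    (None, "say", ("r3",), True),
]
print([ins[1] for ins in remove_dead(prog)])   # ['const', 'say']
```

The first pass removes only the `mul`. That makes `r1` unused, then `r0`, so it takes three removing passes and a fourth to confirm nothing is left to delete. Passing `pinned=("r2",)` keeps all five instructions.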
timotimo jnthn: can that patch easily be cherry-picked to master? 23:03
jnthn timotimo: I'd rather not, given the branch is intended to be merged to master 23:05
And I'm aiming to do that at the weekend. 23:06
So it's not long to wait.
timotimo all right :)
how much better are we talking here?
jnthn Well, not sure yet 23:07
I haven't actually done the big opt that this was all in aid of yet. 23:08
(Which is using spesh type knowledge to resolve multi-dispatches.)
timotimo oooooh 23:09
jnthn Which of course is a pre-req for any useful inlining in Perl 6 code, but that'll be a separate, later thing.
timotimo righto
how much cheaper will that be than hitting the caches
?
jnthn Well, it means you just grab the invokee out of a spesh slot and invoke it 23:10
So, cheap
timotimo we really ought to profile some time :\
jnthn Well, that's partly what led me to look into this. I could see that we were spending a lot of time in the multi-dispatch cache in some of the benchmarks. 23:11
We're spending even more on invocation, though, which inlining can help with.
timotimo ah. well, that does sound good then :) 23:14
continue :)
jnthn My overall path here is spesh time dispatch resolution in 2014.05, inlining in 2014.06, escape analysis in 2014.07, and on-stack-replacement by 2014.08 (though if it's easy it may happen sooner) 23:16
And to give a talk on performance stuff at YAPC::EU :)
Of course, I need to co-ordinate this with brrt so it fits well with the JIT work. 23:17
I'll also this weekend put an opt into Rakudo that tries to avoid creating the %_ if we can see it's not used. 23:18
Then just let spesh take out the creation of the low-level hash. 23:19
timotimo righto
jnthn Anyway, I need some rest :) 23:20
'night
timotimo gnite jnthn! 23:21
23:31 timo joined 23:32 timo1 joined 23:35 timo1 joined