timotimo | couldn't fall asleep just yet, tried to hack up a little prototype | 00:17 | |
this one gets pretty unhappy while compiling nqp. | |||
STable conflict detected during deserialization. ← code-gen bugs always lead to funny results >_< | 00:19 | ||
i'm seeing some code where a wval + decont gets turned into a wval targeting the register the decont used to target | 00:28 | ||
it *seems* correct | |||
dalek | arVM/spesh_remove_set_op: be182ce | (Timo Paulssen)++ | src/spesh/facts. (2 files): if an op writes to its first arg, remember it as the assigner. |
00:35 | |
arVM/spesh_remove_set_op: db5e8e4 | (Timo Paulssen)++ | src/spesh/optimize.c: sometimes deconts can turn into nothing instead of a set. |
|||
timotimo | here's some code for you to try or look at tomorrow; i'll try to get some sleep now. again. | ||
01:18
FROGGS_ joined
02:08
jnap joined
02:10
jnap1 joined
02:26
donaldh joined
03:11
jnap joined
03:12
jnap1 joined
03:13
jnap joined
03:23
jnap joined
06:24
brrt joined
06:55
FROGGS joined
07:14
zakharyas joined
|
|||
brrt | gotta love intel x86 | 07:15 | |
64 bit operations generate a 64 bit result into the register | 07:16 | ||
32 bit operations generate a 32 bit result that is zero extended to 64 bits | |||
however | |||
16 and 8 bit operations generate a 16 / 8 bit result - and do not zero-extend it (i.e. the register is not overwritten) | 07:17 | ||
FROGGS | :/ | 07:18 | |
brrt | so the following sequence: mov64 rax, 0xffff0000 ; mov8 rax, 0xff gives you 0xffff00ff | ||
FROGGS | I guess they had to keep the existing behaviour for 8/16 bit operations? | ||
brrt | i suppose | 07:19 | |
no, i rather suppose that back in the 80s when they designed 386, they were just lazy | |||
FROGGS | that's why it is called back*warts* | ||
brrt | or perhaps cheap with transistors | ||
FROGGS | I bet on cheap :o) | 07:20 | |
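The register behaviour brrt describes above can be modelled in a few lines. This is only a sketch: `write_reg` is a made-up helper, not anything from MoarVM or its JIT, and it ignores the high-byte registers (ah, bh, ...). It reproduces the mov64/mov8 example from the log.

```python
MASK64 = 0xFFFF_FFFF_FFFF_FFFF

def write_reg(old: int, value: int, width: int) -> int:
    """Return the 64-bit register contents after a `width`-bit write.

    Hypothetical model of x86-64 general purpose register writes.
    """
    if width == 64:
        return value & MASK64
    if width == 32:
        # 32-bit results are zero-extended, so the upper half is cleared.
        return value & 0xFFFF_FFFF
    if width in (16, 8):
        # Narrow writes replace only the low bits; the rest of the
        # register keeps whatever it held before.
        low = (1 << width) - 1
        return (old & ~low & MASK64) | (value & low)
    raise ValueError(f"unsupported width: {width}")

rax = write_reg(0, 0xFFFF0000, 64)   # mov64 rax, 0xffff0000
rax = write_reg(rax, 0xFF, 8)        # mov8  rax, 0xff
print(hex(rax))                      # 0xffff00ff
```

A 32-bit write to the same register would instead leave just the new 32-bit value, which is why narrow loads usually need an explicit zero- or sign-extending move.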
07:27
domidumont joined
07:41
brrt left
|
|||
tadzik | chaep | 07:50 | |
yesterday I found that since 286 you can't do i << 32 | 07:51 | ||
because they only have 5 bits for the shift amount | |||
timotimo | m) | ||
tadzik | because performance, they say | ||
but it's even sillier now that we have 64 bits | |||
FROGGS | tadzik: I guess we need that to be able to run older programs | 08:00 | |
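A small sketch of the shift-count masking tadzik mentions above. `shl32` and `shl64` are illustrative helpers that imitate the hardware rule (count masked to 5 bits for 32-bit operands, 6 bits for 64-bit operands); they are not taken from any project discussed here.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF

def shl32(value: int, count: int) -> int:
    """Imitate a 32-bit x86 SHL: only the low 5 bits of the count are used."""
    return (value << (count & 0x1F)) & MASK32

def shl64(value: int, count: int) -> int:
    """Imitate a 64-bit x86 SHL: only the low 6 bits of the count are used."""
    return (value << (count & 0x3F)) & MASK64

print(shl32(1, 31))  # 2147483648
print(shl32(1, 32))  # 1, because 32 & 0x1F == 0 and nothing is shifted
print(shl64(1, 32))  # 4294967296
print(shl64(1, 64))  # 1, the same wrap-around one size up
```

This is also why C leaves shifting by the full width of the type undefined: the "obvious" answer of 0 is not what the instruction produces.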
jnthn | Wow...I must have been a lot more exhausted than I realized... | 10:04 | |
nwc10 | good no-longer morning? | ||
jnthn | aye :) | 10:06 | |
FROGGS | o/ | 10:26 | |
jnthn | Well, I worked 7 hours on Sunday, and felt I might be about to get ill the last 3 days, so probably no bad thing that I got some good rest. | 10:28 | |
timotimo | oh, yeah | 10:30 | |
resting for $longtime is much better than being sick for $evenlonger | |||
jnthn | aye | 10:33 | |
10:51
lee__ joined
11:02
bonsaikitten joined
11:54
tgt joined
12:18
flussence joined,
_sri joined
12:55
LLamaRider joined
|
|||
jnthn | OK, time for some more spesh hacking :) | 13:05 | |
Today's hacking music: new albums from Insomnium and Epica \o/ | 13:08 | |
Hmm...deopt of guards is actually trickier than I realized... | 13:41 | |
hm, but mebbe this cute hack works... | 13:54 | ||
JimmyZ | :-) | 13:55 | |
13:55
btyler joined
13:56
jnap joined
|
|||
jnthn | bother, segv | 13:56 | |
masak .oO( a SEGV is C's way to say "I love you, flaws and all" ) :P | 14:00 | ||
lizmat | .oO( I thought it indicated a lack of DWIM? ) |
14:01 | |
jnthn | oh, duh... | ||
14:25
donaldh joined
|
|||
jnthn | wtf | 14:36 | |
jnthn is seeing something really insane happening under the C debugger... | |||
It appears to jump from the goto NEXT; into the body of the next op?! | 14:37 | ||
14:37
brrt joined
|
|||
jnthn | ah, not in a debug build, though. | 14:38 | |
nwc10 | so, something uninitialised? Because under the debug build the C compiler sets everything to 0? | 14:39 | |
14:41
jnap joined
|
|||
jnthn | nwc10: The behavior externally is the same under both. I think it's just some mess-up in writing the debug locations file... | 14:42 | |
nwc10: I think it's done a code-size opt and then written a debug file that reveals what it did. | |||
Just looked very confusing. | |||
dalek | arVM/spesh_trace: 3810ab3 | jnthn++ | src/spesh/dump.c: Fix dumping of deopt annotations. |
15:10 | |
arVM/spesh_trace: 1d69712 | jnthn++ | src/spesh/facts.c: Read correct log entries. |
|||
15:23
brrt left
15:27
vendethiel joined
|
|||
dalek | arVM/spesh_trace: e7ac215 | jnthn++ | / (4 files): Break out different kinds of de-opt point. Can have "current frame only" for failed guards, and "all" for when mixins happen that may invalidate many assumptions. |
15:53 | |
arVM/spesh_trace: 905d410 | jnthn++ | src/spesh/ (4 files): Record deopt annotations per deopt kind. |
|||
arVM/spesh_trace: 5de47f6 | jnthn++ | src/spesh/log.c: Migrate deopt-1 annotations to sp_logs. The log will be replaced with the guard, which is what will trigger deopt. |
16:34 | ||
arVM/spesh_trace: 0c4324b | jnthn++ | src/ (3 files): Implement all vs. one deopt. Also fixes a bug in looping over the deopt table, so we don't fail to de-optimize half the time. |
|||
16:36
domidumont joined
16:38
FROGGS joined
16:40
domidumont joined
|
|||
dalek | arVM/spesh_trace: d1b551f | jnthn++ | src/spesh/manipulate.c: Fix moving of handler annotations. We accidentally dropped some of them, leading code-gen to fail. |
16:51 | |
arVM/spesh_trace: 6aa4a3a | jnthn++ | src/spesh/facts.c: Don't log -> guard things we needn't guard. |
|||
17:22
zakharyas joined
|
|||
dalek | arVM/spesh_trace: 3ff9edd | jnthn++ | src/spesh/optimize.c: Delete any leftover log instructions in optimize. |
17:23 | |
18:19
timo joined
|
|||
jnthn | With latest work, spesh seems to give a 10% win on "my @a; my $i = 0; while (++$i <= 1000000) { @a[ $i ] = $i }" | 18:22 | |
FROGGS | ohh! | 18:26 | |
jnthn+++ | |||
nwc10 | nice | 18:28 | |
timotimo | oh, neat. where does it come from? | ||
timotimo fell off the 'net for a bit :( | |||
ah, i saw it in the clogs | 18:30 | ||
jnthn | timotimo: I suspect at the moment avoiding method dispatches by resolving them, and avoiding type checks | ||
timotimo | did you have a chance to look at my not quite working set-removal-thingie? | 18:31 | |
jnthn | Not yet, but will after dinner | ||
timotimo | cool :) | ||
jnthn | Figured I'll take the above benchmark as an example and work through getting spesh to do nice things for it. | 18:32 | |
btyler | jnthn: perhaps my build is borked, but when I switch to spesh_trace and Configure.pl/make install, perl6 -e 'say "hi"' dies with a moar error: 'Unknown flag --execname=(path to perl6 bin)' | ||
switching back to master and Configuring/building everything is a-ok | |||
FROGGS | execname is new | 18:33 | |
jnthn | btyler: spesh_trace is missing patches from master | ||
btyler | oh ok, gotcha | ||
I'll wait then :) | |||
jnthn | It may also explode in the NQP build. It's a branch for a reason ;) | ||
btyler | for sure, I just figured I'd take a poke and see if the win was visible here as well | ||
jnthn | :) | 18:34 | |
lee__ | mine does explode when building nqp/rakudo if i pass --optimize=3 to spesh_trace | 18:35 | |
works fine if i don't, though | |||
(sorry if that is annoying, just excited about fast perl6 :) | 18:38 | ||
timotimo | is that the optimize flag for rakudo or for configure.pl? | 19:06 | |
lee__ | for Configure.pl on MoarVM | 19:27 | |
timotimo | ah | 19:37 | |
i've never been able to use more than --optimize=1 | 19:38 | ||
without b0rking the build during nqp | |||
nwc10 | jnthn: seems that >99% of the values loaded by GET_I64() would fit in 16 bits | 19:45 | |
jnthn | nwc10: Yeah; I'd figured that'd be an easy opt some day :) | ||
nwc10 | and 0% are >32 bit | 19:46 | |
jnthn | Good to know. It'll be worthwhile then :) | ||
nwc10 | also, there are about 3 opcodes where the value read is I64, but it's immediately cast to 32 bits | ||
for those ones, why is it even 64 bits in the opcode stream? | |||
for the others, do they need to keep 64 bit variants? | 19:47 | ||
jnthn | Which ops, ooc? | ||
nwc10 | I was sort of assuming that the ops get 2 more variants, with 16 and 32 | ||
throwcatdyn, throwcatlex, throwcatlexotic | |||
jnthn | oh, hm | 19:48 | |
nwc10 | no urgency | ||
jnthn | aye | ||
Those are not common ops | |||
iconst_64 is, though | |||
nwc10 | my assumption was (not checked this yet) that the bytecode generating code in C would substitute the smallest opcode variant that could hold the value | 19:49 | |
rather than have the MAST generator do it | |||
or is that conceptually wrong? | |||
jnthn | I think having the bytecode gen do it sounds sane enough | 19:50 | |
nwc10 | argconst_i seems to be unused currently | 19:51 | |
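The idea nwc10 and jnthn discuss above, letting the bytecode writer pick the smallest variant that can hold a constant, might look roughly like this. The op names `iconst_16`, `iconst_32` and `iconst_64` and the little-endian encoding are assumptions for illustration only; this is not MoarVM's actual bytecode writer.

```python
import struct

# Hypothetical op variants, narrowest first: (bit width, struct format).
VARIANTS = ((16, "<h"), (32, "<i"), (64, "<q"))

def smallest_int_width(value: int) -> int:
    """Return 16, 32 or 64: the narrowest signed width that holds `value`."""
    for bits, _ in VARIANTS:
        if -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1:
            return bits
    raise OverflowError(f"{value} does not fit in 64 bits")

def encode_const(value: int):
    """Return (op name, operand bytes) using the narrowest variant."""
    bits = smallest_int_width(value)
    fmt = dict(VARIANTS)[bits]
    return f"iconst_{bits}", struct.pack(fmt, value)

for v in (0, 1000000, 1 << 40):
    name, operand = encode_const(v)
    print(v, name, len(operand))
# 0 iconst_16 2
# 1000000 iconst_32 4
# 1099511627776 iconst_64 8
```

Doing the selection at this level keeps the MAST generator unaware of operand widths, which matches the direction suggested in the conversation.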
19:59
domidumont joined
20:28
domidumont joined
|
|||
jnthn | aha... | 21:27 | |
Turns out we really wanted a fixed point analysis on killing dead instructions. | |||
Relying on reverse dominance tree ordering to catch enough didn't catch enough. :) | 21:28 | |
dalek | arVM/spesh_trace: f72aefb | jnthn++ | src/core/interp.c: Fix guard clause checks. |
21:34 | |
arVM/spesh_trace: f4d2c85 | jnthn++ | src/spesh/dump.c: Basic dumping of facts. |
|||
arVM/spesh_trace: 755fb71 | jnthn++ | src/spesh/facts.c: Use guard ops involving container checks. |
|||
jnthn | The local patch I have to do that produces much better code. Sadly, it also seems to lose handlers... | 21:37 | |
FROGGS | /o\ | 21:42 | |
do we need 'em? | |||
jnthn | yes. | ||
Otherwise we don't know what region exception handlers cover in the specialized bytecode. | 21:43 | |
FROGGS | bah, optimize it out :o) | ||
jnthn | oh dammit | ||
Typically the one it chokes on has a HUGE graph | |||
(EXPR) | |||
FROGGS | hehe, I tend to keep a distance between EXPR and me :o) | 21:44 | |
jnthn | It has 127 basic blocks | 21:46 | |
oh... | 22:03 | ||
dalek | arVM/spesh_trace: 21689f4 | jnthn++ | src/spesh/codegen.c: Cope with annotations being moved to a phi. Before, the code-gen ignored them completely, which meant we could end up losing them. This resolves that issue. While annotations may never start on a phi, they may be moved there by other optimizations. |
22:09 | |
timotimo | yay things | 22:14 | |
jnthn: got a clue what'd help you figure out big graphs like that? | 22:16 | ||
the pypy people have a block explorer thingie built on top of graphviz and/or sdl i think | |||
jnthn | timotimo: Well, the problem here wasn't visualizing it really | 22:18 | |
timotimo | fair enough | ||
jnthn | timotimo: It was more finding the end handlers | 22:19 | |
And seeing what on earth was going on. | |||
timotimo | OK | ||
jnthn | now we....manage to get a stack overflow. d'oh. | ||
timotimo | i didn't know we even had those :) | 22:20 | |
jnthn | A C-level one! o.O | ||
timotimo | well, that ought to be an infinite recursion | ||
couldn't explain it any other way | |||
jnthn | yeah, it is | 22:21 | |
but wtf :) | |||
oh... | |||
oh, I can guess... | 22:23 | ||
yeah, got it... | 22:30 | ||
timotimo | yey | ||
dalek | arVM/spesh_trace: c1065ae | jnthn++ | src/spesh/facts.c: Make sure we don't clobber block handler setting. Block exception handlers live in a register. We must be careful not to clobber the set instruction, assuming its result is unused. This is handled by bumping the usage count on such registers in the facts. |
22:35 | |
jnthn | Unfortunately, various things in NQP/Rakudo build blow up on spesh_trace | 22:39 | |
Missing block at line 425, near "-> @values" | |||
timotimo | oh, meh | ||
jnthn | I thought at first it was due to my improved instruction death | 22:40 | |
But it's not; it's something else I've done earlier, I guess. | |||
timotimo | mhh | ||
if you need a break from hard stuff, try having a look at my branch? :P | |||
dalek | arVM/spesh_trace: f38c767 | jnthn++ | src/spesh/optimize.c: Typo fix. |
22:48 | |
arVM/spesh_trace: 0d353ce | jnthn++ | src/spesh/optimize.c: Iterate to fixed point when finding unused instrs. Before we just did a backwards propagation as we went back up the dominator tree. But that didn't catch enough. Switch to just keeping iterating doing removals until we're done. May be more optimal ways to walk BBs in order to speed up convergence. Either way, the results with doing the analysis this way are better. |
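A toy version of the fixed-point approach described in the commit message above. It is a sketch under simplifying assumptions: one flat instruction list instead of basic blocks, a made-up `Instr` type, and a `pinned` set standing in for the "bump the usage count" trick used for block handler registers in c1065ae. It is not the code in src/spesh/optimize.c.

```python
from collections import Counter
from typing import NamedTuple, Optional, Tuple

class Instr(NamedTuple):
    op: str
    dest: Optional[str]      # register written, or None
    uses: Tuple[str, ...]    # registers read
    pure: bool = True        # False if the instruction has side effects

def eliminate_dead(instrs, pinned=frozenset()):
    """Remove pure instructions with unused results until nothing changes."""
    live = list(instrs)
    changed = True
    while changed:
        # Recount usages each round: removing one instruction can make
        # the instructions that fed it dead as well.
        usage = Counter(r for ins in live for r in ins.uses)
        kept = [
            ins for ins in live
            if not ins.pure
            or ins.dest is None
            or usage[ins.dest] > 0
            or ins.dest in pinned
        ]
        changed = len(kept) != len(live)
        live = kept
    return live

code = [
    Instr("const", "a", ()),
    Instr("add", "b", ("a", "a")),
    Instr("add", "c", ("b", "a")),        # c is never read
    Instr("set", "h", ()),                # handler register, pinned below
    Instr("say", None, ("a",), pure=False),
]
print([i.op for i in eliminate_dead(code, pinned={"h"})])
# ['const', 'set', 'say']
```

The first round removes only the instruction writing `c`; the one writing `b` becomes dead in the second round, which is the kind of case a single pass can miss.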
|||
timotimo | jnthn: can that patch easily be cherry-picked to master? | 23:03 | |
jnthn | timotimo: I'd rather not, given the branch is intended to be merged to master | 23:05 | |
And I'm aiming to do that at the weekend. | 23:06 | ||
So it's not long to wait. | |||
timotimo | all right :) | ||
how much better are we talking here? | |||
jnthn | Well, not sure yet | 23:07 | |
I haven't actually done the big opt that this was all in aid of yet. | 23:08 | ||
(Which is using spesh type knowledge to resolve multi-dispatches.) | |||
timotimo | oooooh | 23:09 | |
jnthn | Which of course is a pre-req for any useful inlining in Perl 6 code, but that'll be a separate, later thing. | ||
timotimo | righto | ||
how much cheaper will that be than hitting the caches | |||
? | |||
jnthn | Well, it means you just grab the invokee out of a spesh slot and invoke it | 23:10 | |
So, cheap | |||
timotimo | we really ought to profile some time :\ | ||
jnthn | Well, that's partly what led me to look into this. I could see that we were spending a lot of time in the multi-dispatch cache in some of the benchmarks. | 23:11 | |
We're spending even more on invocation, though, which inlining can help with. | |||
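To make the difference jnthn describes above concrete, here is a toy comparison of a cached multi-dispatch with a call site that was resolved once, at specialization time. Everything in it (`MultiDispatcher`, `specialize_call`, the `spesh_slots` list) is a hypothetical stand-in; the real mechanism relies on guards to make sure the assumed types still hold, which this sketch leaves out.

```python
class MultiDispatcher:
    """Toy multi-dispatch with a per-type-tuple cache."""

    def __init__(self):
        self.candidates = []
        self.cache = {}
        self.cache_lookups = 0

    def add(self, types, fn):
        self.candidates.append((types, fn))

    def resolve(self, arg_types):
        # Full (slow) resolution: first candidate whose signature matches.
        for types, fn in self.candidates:
            if len(types) == len(arg_types) and all(
                issubclass(a, t) for a, t in zip(arg_types, types)
            ):
                return fn
        raise TypeError("no matching candidate")

    def dispatch(self, *args):
        # Ordinary call: every invocation goes through the cache.
        key = tuple(type(a) for a in args)
        self.cache_lookups += 1
        fn = self.cache.get(key)
        if fn is None:
            fn = self.cache[key] = self.resolve(key)
        return fn(*args)

def specialize_call(dispatcher, arg_types):
    """Resolve once for known argument types and keep the result in a slot."""
    spesh_slots = [dispatcher.resolve(arg_types)]

    def call(*args):
        # Specialized call: grab the invokee from the slot and invoke it.
        return spesh_slots[0](*args)

    return call

plus = MultiDispatcher()
plus.add((int, int), lambda a, b: a + b)
plus.add((str, str), lambda a, b: a + " " + b)

print(plus.dispatch(1, 2))           # 3
fast_plus = specialize_call(plus, (int, int))
print(fast_plus(3, 4))               # 7
print(plus.cache_lookups)            # 1
```

The specialized call never touches the dispatch cache, which is the saving being discussed; the remaining invocation overhead is what inlining is meant to address.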
timotimo | ah. well, that does sound good then :) | 23:14 | |
continue :) | |||
jnthn | My overall path here is spesh time dispatch resolution in 2014.05, inlining in 2014.06, escape analysis in 2014.07, and on-stack-replacement by 2014.08 (though if it's easy it may happen sooner) | 23:16 | |
And to give a talk on performance stuff at YAPC::EU :) | |||
Of course, I need to co-ordinate this with brrt so it fits well with the JIT work. | 23:17 | ||
I'll also this weekend put an opt into Rakudo that tries to avoid creating the %_ if we can see it's not used. | 23:18 | ||
Then just let spesh take out the creation of the low-level hash. | 23:19 | ||
timotimo | righto | ||
jnthn | Anyway, I need some rest :) | 23:20 | |
'night | |||
timotimo | gnite jnthn! | 23:21 | |
23:31
timo joined
23:32
timo1 joined
23:35
timo1 joined
|