00:17
cognominal__ joined
01:32
FROGGS joined
01:54
FROGGS joined
05:17
cognominal__ joined
06:39
lizmat joined
07:14
brrt joined
|
|||
brrt | .tell jnthn if he could, would he care to test moar-jit on win64 | 07:15 | |
i /think/ i've implemented the abi correctly | |||
but how can you tell? :-) | |||
08:02
lizmat joined
08:03
woolfy joined
|
|||
brrt | is MVMArgProcContext * args different from MVMFrame -> args | 08:09 | |
? | |||
brrt is already figuring it out | 08:11 | ||
its different | 08:12 | ||
ok, good to know | |||
08:18
lizmat joined
09:37
brrt joined
12:08
cognominal joined
|
|||
jnthn | .tell brrt it appears the JIT works out nicely on win64 \o/ | 12:27 | |
12:30
brrt joined
|
|||
lizmat | jnthn: so what was jitted (so I can mention in my talk in 15 mins) | 12:32 | |
timotimo | something like "1 + 3" | ||
brrt | lizmat: the routine 2 + 2 = 4 | ||
timotimo | er, that's not legit code :) | 12:33 | |
brrt | fair enough | ||
return 2 + 2; | |||
lizmat | so beyond constant folding ? | ||
brrt | and apparently it works on win64 :-D | ||
timotimo | there's also a "say "OH HAI"" in foo.nqp, does that get jitted, too? | 12:34 | |
jnthn | lizmat: The takeaway isn't so much "we can JIT 2 + 2" as "we have all the infrastructure in place to JIT stuff, and have a first trivial example working" :) | ||
brrt | just that, yes | ||
timotimo | does it get jitted into a machine-code add instruction or into an invocation of moarvm's "add two integers" function? | ||
lizmat | so this would be an adequate description: "First JITted code execution already seen!" | ||
brrt | i think so, yes | ||
timotimo - into an add instruction, yes | 12:35 | ||
timotimo | sounds good :) | ||
brrt | oh, i should've sent the 'bytecode with explanation' on my blog, too | 12:36 | |
timotimo | aye | ||
brrt: would a wild sub test() { 2 + 2 } become jitted, too? or does that contain some tricky ops you can't do yet? | 12:38 | ||
brrt | that contains checkarity, iirc | ||
timotimo | ah, could be. | 12:39 | |
brrt | and... i haven't checked that out really | ||
timotimo | well, after it's spesh'd, it shouldn't have checkarity any more | ||
and jit comes after spesh, no? | |||
brrt | yes | 12:40 | |
well, try it out, i'd say :-) | |||
timotimo | could do :3 | ||
jnthn | After it's spesh'd, the checkarity is gone. :) | 12:43 | |
yeah, it spesh's to just: | 12:45 | ||
const_i64_16 r0(1), liti16(2) | |||
const_i64_16 r1(1), liti16(2) | |||
add_i r1(2), r0(1), r1(1) | |||
return_i r1(2) | |||
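A minimal sketch of what those four instructions compute, written as a toy interpreter. This is not MoarVM code: the `run` function and the tuple encoding are assumptions made for illustration, and the `(1)`/`(2)` suffixes from the dump (read here as SSA version numbers) are dropped.

```python
# Toy model of the four spesh'd instructions quoted above.
# Only the three opcodes that appear in the dump are handled.

def run(program):
    regs = {}
    for op, *args in program:
        if op == "const_i64_16":      # load a small integer literal into a register
            dst, literal = args
            regs[dst] = literal
        elif op == "add_i":           # integer add of two registers
            dst, a, b = args
            regs[dst] = regs[a] + regs[b]
        elif op == "return_i":        # return the integer held in a register
            (src,) = args
            return regs[src]
        else:
            raise ValueError(f"unsupported op: {op}")
    raise RuntimeError("program ended without return_i")


PROGRAM = [
    ("const_i64_16", "r0", 2),
    ("const_i64_16", "r1", 2),
    ("add_i", "r1", "r0", "r1"),
    ("return_i", "r1"),
]

print(run(PROGRAM))
```

A JIT for this sequence only has to cover two loads, one add and one return, which is why it makes a convenient first test case.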
brrt | actually, it does | ||
jnthn | :) | ||
brrt | so | ||
TimToady | so after spesh the intermediate language is SpeshIL? :) | ||
brrt | timotimo - the answer is yes | ||
timotimo | \o/ | ||
TimToady: *groan* :) | 12:46 | ||
jnthn | :P | ||
timotimo | except if you have code like sub a() { 2 +2 }, the optimizer will turn it into sub a() { 4 } already :) | 12:47 | |
the jit doesn't do "get arguments" yet, right? not even the spesh'd variants? | |||
brrt | it doesn't on moar-jit, not yet | ||
timotimo | right; should be simple to do, once it bubbles to the top of your todo list | 12:48 | |
jnthn | timotimo: The NQP optimizer doesn't yet. | ||
brrt | well, i think it should be done on the spesh level tbh | ||
timotimo | oh! | 12:49 | |
brrt: what exactly? | |||
brrt | constant folding | ||
timotimo | ah, fair enough | ||
brrt | basically, i don't know (yet) if the registers that store the constants will be used another time | 12:50 | |
in theory that is possible (although unlikely) | |||
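The concern about registers being read again can be sketched as a tiny constant-folding pass over a toy instruction list. Everything here is hypothetical: the tuple encoding, the SSA-style register names such as `r1_2`, and the `print_i` op are illustration only, not spesh data structures.

```python
from collections import Counter


def reads(ins):
    """Registers an instruction reads. 'const' reads none; 'add_i' reads its
    two sources; any other op is assumed to read all of its operands."""
    op, *args = ins
    if op == "const":
        return []
    if op == "add_i":
        return args[1:]
    return args


def fold_constants(program):
    """Fold add_i of two known constants, then drop constant loads whose
    register is never read by a remaining instruction."""
    consts = {}
    out = []
    for ins in program:
        if ins[0] == "const":
            consts[ins[1]] = ins[2]
        elif ins[0] == "add_i" and ins[2] in consts and ins[3] in consts:
            ins = ("const", ins[1], consts[ins[2]] + consts[ins[3]])
            consts[ins[1]] = ins[2]
        out.append(ins)
    used = Counter(reg for ins in out for reg in reads(ins))
    return [ins for ins in out if ins[0] != "const" or used[ins[1]] > 0]


two_plus_two = [
    ("const", "r0_1", 2),
    ("const", "r1_1", 2),
    ("add_i", "r1_2", "r0_1", "r1_1"),
    ("return_i", "r1_2"),
]
# Same code, but r0_1 is read once more by a hypothetical print_i op.
with_extra_use = two_plus_two[:3] + [("print_i", "r0_1"), ("return_i", "r1_2")]

print(fold_constants(two_plus_two))
print(fold_constants(with_extra_use))
```

In the second program the load of `r0_1` has to stay because something still reads it, which is the "used another time" case mentioned above.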
timotimo | i'm getting "emit" debug output from my sub a() { 2 + 2 } example in nqp \o/ | ||
jnthn | brrt: Now something basic is working, what's your plan from here? | ||
timotimo | gist.github.com/timo/e435363b9ecc25bb9495 | ||
brrt | uhm, i'd point you to my recent blog entry :-) but in short | 12:51 | |
timotimo : thats what i get too :-) | |||
jnthn | oh, did I miss a post? :) | ||
brrt | o it hasn't bubbled through yet | ||
jnthn | why yes, I did... | ||
brrt | i've just written it this morning | ||
jnthn | I see it now :) | ||
timotimo | oh! | ||
brrt | in short, - positional parameters | ||
- more arithmetic | |||
- branching and conditionals | 12:52 | ||
- phi node reduction | |||
jnthn | +1 to an MVM_JIT_LOG env var | 12:53 | |
timotimo | how come branching seems like the hardest part? | ||
brrt | because it means linearizing the tree, in effect | ||
jnthn | I suggest that dumping the JIT tree could be a good idea. | ||
brrt | that, or it moves the block navigation down to the compiler | 12:54 | |
i agree | |||
jnthn | I've got a lot out of spesh_log being full of spesh dumps | ||
brrt | ok, that moves up, too, then :-) | ||
jnthn | The other thing you could do is have an env var where you give it a directory and it dumps the JIT compiled output for disasm-ing. | ||
And name the files according to some identifier that ends up in the jit tree log, so it's possible to correlate the two. | 12:55 | ||
I've found building small logging-ish things like this has been a huge time-saver overall, as it helped me debug all sorts with spesh. | 12:56 | ||
brrt | atomically incrementing integer? or do we still have the cu-uuid at spesh-graph time? | ||
jnthn | You can find the cuuid but we might specialize the same one multiple times. | ||
brrt | fair enough | 12:57 | |
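A rough sketch of the logging idea discussed here: one identifier per compilation, used both in the tree log and as the dump file name so the two can be correlated. `MVM_JIT_LOG` is the variable name proposed in the conversation; `JIT_DUMP_DIR`, `next_dump_id` and `dump_jit_output` are made-up names for illustration. Because the same cuuid may be specialized several times, a locked counter is appended to keep identifiers unique.

```python
import os
import threading

_lock = threading.Lock()
_next_id = 0


def next_dump_id(cuuid):
    """Return an identifier that stays unique even when the same cuuid
    is specialized more than once."""
    global _next_id
    with _lock:
        _next_id += 1
        return f"{cuuid}-{_next_id:04d}"


def dump_jit_output(cuuid, tree_text, code_bytes, environ=os.environ):
    """Write the JIT tree to the log and the compiled bytes to a file,
    both labelled with the same identifier. Does nothing when the
    corresponding environment variable is unset."""
    dump_id = next_dump_id(cuuid)
    log_path = environ.get("MVM_JIT_LOG")
    if log_path:
        with open(log_path, "a") as log:
            log.write(f"== {dump_id} ==\n{tree_text}\n")
    dump_dir = environ.get("JIT_DUMP_DIR")  # hypothetical variable name
    if dump_dir:
        with open(os.path.join(dump_dir, f"{dump_id}.bin"), "wb") as out:
            out.write(code_bytes)
    return dump_id
```

The `.bin` files could then be fed to a disassembler, and the matching `== id ==` header in the log shows which tree produced them.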
jnthn | brrt++ for the post | ||
brrt | i kind of want to move the staticframebody into the spesh graph | ||
jnthn | I suspect "set" will be an important, and easy, opcode. | 12:58 | |
brrt | i've done set | ||
jnthn | ? | ||
you already have g->sf | |||
brrt | or a pointer-to-the-staticframebody :-) | ||
i do? | |||
oh | |||
i said /nothing/ :-D | |||
jnthn | g->sf.body # gets you the body. | ||
brrt | ok, that will be awesome | 12:59 | |
more reason for me, though, to push through with my next change | |||
create a 'copy' (or 'store', or 'load-and-store', or 'move') node | 13:00 | ||
why do that? because moving stuff back and forth is basically what i do all day, and creating a node has me move the decisions of what to move upward to graph creating | 13:01 | ||
jnthn | Sounds sensible. | 13:02 | |
dalek | arVM/inline: ed2a763 | jnthn++ | src/core/ext.c: Make sure extops don't leave noinline as junk. |
13:04 | |
brrt | i noticed param_op_i isn't yet speshed to sp_getarg_i | ||
jnthn | It *may* be. | ||
brrt | hmmm | ||
jnthn | It depends on an arg_i being passed | ||
At some point it should learn to spesh an object being passed into sp_getarg_o followed by unbox_i | 13:05 | ||
dalek | arVM/moar-jit: 5fff8c3 | (Bart Wiegmans)++ | / (5 files): Implement a few more opcodes. |
||
jnthn | Where unbox_i will be able to further spesh into sp_get_i I guess... | ||
brrt | i hope | 13:06 | |
:-) | |||
timotimo | you forgot to add a label for set into the jit.c dispatch switch thingie, no? | 13:08 | |
jnthn | brrt: About /* I basically don't really want to use the TC's CompUnit ... | 13:09 | |
You have the sf, and you can sf->body.cu to get to it safely. | 13:10 | ||
brrt | right :-) that was exactly my question | ||
i may have forgotten that timotimo | |||
:-) | |||
timotimo | as you can see, your mentors are following your commits closely ;) | ||
jnthn | It's almost like they care the project is successful :P | 13:11 | |
timotimo | mentors work in mysterious ways | 13:12 | |
brrt | :-D | ||
timotimo | jnthn: as soon as OSR + jit land, the benchmarks will look incredible for perl6-m (or at least nqp-m) | ||
brrt | on-stack-replacement? | ||
timotimo | yes | 13:13 | |
so that we can spesh single-layer for loops even if they don't invoke some sub inside | |||
also i imagine inlining will cause less "entry points" for going from the regular bytecode to the spesh'd bytecode? | 13:14 | ||
brrt | i hope so | 13:15 | |
jnthn | Yeah, well, I gotta get inlining in shape before any OSR :) | ||
timotimo | of course | ||
why don't i see any jnthn commits? :P :P | 13:16 | ||
.o( i should also be working, rather than complain that others aren't ) | |||
13:18
cognominal joined
|
|||
TimToady is just a tor-mentor | 13:18 | ||
jnthn | hey, there was a fix for one of the inline valgrind failures just 15 mins ago :P | ||
Or, I hope it's a fix for it :) | 13:19 | ||
TimToady | jnthn: are we planning to handle int -> Int overflow via spesh/deopt? | 13:20 | |
timotimo | okay okay :) | ||
TimToady: i didn't know we ever do int -> Int? | 13:21 | ||
TimToady | I mean Int -> int -> Int :) | ||
timotimo | i don't think we do that yet, either | ||
TimToady | spesh to int, deopt back to Int | ||
well, maybe spesh to int64 | |||
timotimo | would that be "strength reduction"? "boxing elimination"? | 13:22 | |
TimToady | I just think of it as using native ints rather than Int objects :) | ||
timotimo | we already have a "fast path" both for data storage and for calculations that makes Int better if the values fit into 32 bit (on moarvm) | ||
TimToady | right, I forgot that | 13:23 | |
brrt | the whole 64 bit mess is messy | ||
timotimo | not sure how we could make that better using knowledge gained from spesh | ||
jnthn | TimToady: I was more thinking initially to do those just by JITting the fast path and if it spots an overflow (or an input to the operation is already overflowed) then it falls back to the slower path instead. | 13:26 | |
timotimo | jnthn: how easy is it to spot overflows in all the ops we support? | ||
can we just tell the cpu to trap overflows and set some register or something? | |||
jnthn | Well, for the common ones that map to CPU operations, I *think* it's checking a flag. | ||
timotimo | in that case, branch prediction probably makes things cheap-ish if we're long-running | 13:27 | |
jnthn | aye. | 13:28 | |
hmm, this read-past-end-of-buffer of the lexicals marking is a curiosity... | |||
We always get the locals marked OK. | 13:29 | ||
brrt | yes, integer overflow is a flag iirc | 13:31 | |
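The fast-path-with-fallback approach described above can be sketched as follows. Python integers never overflow, so an explicit range check stands in for the CPU overflow flag; the 64-bit bound and the `slow_add` callback are assumptions for illustration.

```python
INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1


def fits_int64(value):
    return INT64_MIN <= value <= INT64_MAX


def add_with_fallback(a, b, slow_add):
    """Try the native-sized add first; use slow_add when an input is
    already out of range or the result would overflow."""
    if fits_int64(a) and fits_int64(b):
        result = a + b
        if fits_int64(result):      # stands in for "overflow flag not set"
            return result, "fast"
    return slow_add(a, b), "slow"


def bigint_add(a, b):
    # Placeholder for the slower big-integer path.
    return a + b


print(add_with_fallback(2, 2, bigint_add))
print(add_with_fallback(INT64_MAX, 1, bigint_add))
```

In machine code the check after the add would be a single conditional jump on the flag, which is where the branch-prediction remark comes in.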
brrt afk | 13:33 | ||
13:33
brrt left
|
|||
jnthn | Yay, I think I have a fix for all those buffer overruns when marking env. | 14:07 | |
lizmat | jnthn++ # can't be said enough | ||
dalek | arVM/inline: 212b75d | jnthn++ | src/gc/roots.c: Don't use spesh'd lexical map for logging frames. A frame may have a spesh_cand, but only because it was being used for doing logging. The spesh cand will later be updated with a type map, but the logging frames will not have had anything inlined, so won't have allocated that much environment storage. |
14:10 | |
jnthn | Managed to re-create the issue by tracking allocation size, adding an extra check, then getting it to GC every 32KB allocated. :) | ||
timotimo | ouch, that sounds slow :) | 14:20 | |
jnthn | Wasn't so bad. | 14:21 | |
:) | |||
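The stress technique mentioned above (count allocated bytes and force a collection every 32KB) might look roughly like this. `StressAllocator` and its `collect` callback are hypothetical names for illustration, not MoarVM's allocator.

```python
class StressAllocator:
    """Counts bytes handed out and triggers a collection whenever the
    running total reaches the threshold."""

    def __init__(self, collect, threshold=32 * 1024):
        self.collect = collect          # callback standing in for a GC run
        self.threshold = threshold
        self.since_gc = 0
        self.collections = 0

    def allocate(self, size):
        self.since_gc += size
        if self.since_gc >= self.threshold:
            self.collect()
            self.collections += 1
            self.since_gc = 0
        return bytearray(size)


allocator = StressAllocator(collect=lambda: None)
for _ in range(64):
    allocator.allocate(1024)
print(allocator.collections)
```

Collecting this often makes a bug that depends on GC timing far more likely to show up in a short run.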
Next issue is why a couple of tests fail because they fail to deopt_one. | 14:22 | ||
nwc10 | jnthn: ASAN failures down to t/spec/S05-metasyntax/litvar.t t/spec/S05-transliteration/trans.rakudo.moar t/spec/S11-modules/require.rakudo.moar t/spec/S17-lowlevel/lock.rakudo.moar t/spec/integration/advent2010-day21.t t/spec/integration/advent2012-day10.t | 14:41 | |
t/spec/S11-modules/require.rakudo.moar doesn't look to be an inline thing | |||
jnthn | Hm, I thought t/spec/S05-transliteration/trans.rakudo.moar was one of the ones I got clean... | 14:42 | |
And litvar.t wasn't failing ASAN so much as just failing due to a deopt_one issue | |||
(Which I seem to have resolved locally) | |||
lizmat | t/spec/S11-modules/require seems to be a result of some S11 refactoring I did | 14:43 | |
investigating it atm | |||
jnthn | nwc10: Are you already working on valgrind'ing them for more details? | 14:45 | |
nwc10 | was about to | ||
jnthn | nwc10: If not, hold on a bit for one more patch. | ||
nwc10 | OK | ||
jnthn: this is what ASAN says right now: paste.scsys.co.uk/394679 | 14:49 | ||
dalek | arVM/inline: ff869d0 | jnthn++ | src/spesh/ (3 files): Improve/correct DEOPT_INLINE annotations handling. Place them on the instructions that would originally have carried a DEOPT_ONE or DEOPT_ALL, rather than on the instruction afterwards. |
||
jnthn | litvar.t there didn't fail due to ASAN, though? | ||
Ah, trans.rakudo.moar does fail with something though | |||
But that seems to be the only ASAN one that could relate to inlines. | 14:51 | ||
ah, and I know what that one is. | 14:53 | ||
dalek | arVM/inline: adaf646 | jnthn++ | src/core/interp.c: Teach another lexical_types access about inlines. Hopefully resolves another of the ASAN complaints. |
15:13 | |
nwc10 | before that commit it's down to t/spec/S05-transliteration/trans.rakudo.moar t/spec/S11-modules/require.rakudo.moar t/spec/S17-lowlevel/lock.rakudo.moar t/spec/integration/advent2010-day21.t | 15:18 | |
one of which is bogus | |||
dalek | arVM/inline: 785be4f | jnthn++ | src/6model/reprs/MVMStaticFrame.c: GC mark the inline code ref. It'll almost always be gen 2, but can't be certain of that. |
15:19 | |
lizmat | S11-modules/require should be fixed now | 15:20 | |
jnthn | adaf646 probably moves the S05 and integration error further in rather than completely solving it. | 15:21 | |
dalek | arVM/inline: 7a52289 | jnthn++ | src/core/frame.c: Make lexical auto-viv inline-aware. |
15:45 | |
jnthn | nwc10: OK, hopefully this nails all the inline-related fails. | ||
nwc10 | might be a delay in getting results - my laptop is being attacked by a one toothed monster | 15:47 | |
timotimo | asan fails or all fails? :) | ||
nwc10 | at least it's only being grabbed. Not drooled on. | ||
lizmat | a fork gone wild? | ||
jnthn | timotimo: All afaik | ||
timotimo | oh! | 15:48 | |
... as in: ready to merge into master?! | |||
jnthn | timotimo: Probably, though I was going to work on the improved allocation thing first, so we actually see the benefit... | ||
timotimo | great! :) | ||
lizmat | I would settle for fewer spurious S17 failures anytime | ||
timotimo | what should be the initial value for the supplies that haven't more'd yet? | 15:50 | |
oh, mischan | |||
16:37
woolfy left
|
|||
nwc10 | jnthn: only t/spec/S17-lowlevel/lock.rakudo.moar (with ASAN) | 16:38 | |
jnthn | OK, great. Which isn't an inline issue. | 16:41 | |
nwc10 | the files that valgrind previously found errors for are now clean. (except t/spec/S17-lowlevel/lock.rakudo.moar ) | 17:51 | |
18:20
benabik joined
18:28
zakharyas joined
|
|||
FROGGS | hi everybody | 18:46 | |
japhb | o/ FROGGS | 18:47 | |
19:36
colomon joined
|
|||
dalek | arVM/inline: 7bcdf18 | jnthn++ | / (7 files): Add a thread-safe fixed-size allocator. To be used in place of malloc/calloc/free for certain things. To get malloc/free used again (for debugging), just set it to have zero buckets. |
19:41 | |
arVM/inline: 677812e | jnthn++ | src/spesh/candidate. (2 files): Pre-calculate spesh-candidate work/env sizes. |
|||
arVM/inline: e592f83 | jnthn++ | src/core/frame.c: Used fixed size allocator for frames/work/env. |
19:42 | ||
arVM/inline: af4b7ad | jnthn++ | src/6model/reprs/MVMHash. (2 files): Use FixedSizeAllocator for hash entries. |
|||
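As a loose illustration of the fixed-size allocator idea in these commits: sizes are rounded up into buckets, freed blocks go onto a per-bucket free list, and with zero buckets every request falls through to the plain allocator. The class below is a toy model under those assumptions; the bucket count, granularity and method names are invented, not MoarVM's.

```python
import threading


class FixedSizeAllocator:
    """Toy size-class allocator with per-bucket free lists."""

    def __init__(self, num_buckets=8, granularity=8):
        self.num_buckets = num_buckets
        self.granularity = granularity
        self.free_lists = [[] for _ in range(num_buckets)]
        self.lock = threading.Lock()

    def _bucket(self, size):
        index = (size + self.granularity - 1) // self.granularity - 1
        return index if 0 <= index < self.num_buckets else None

    def alloc(self, size):
        bucket = self._bucket(size)
        if bucket is None:
            return bytearray(size)          # stands in for plain malloc
        with self.lock:
            if self.free_lists[bucket]:
                return self.free_lists[bucket].pop()
        return bytearray((bucket + 1) * self.granularity)

    def free(self, block, size):
        bucket = self._bucket(size)
        if bucket is None:
            return                          # stands in for plain free
        with self.lock:
            self.free_lists[bucket].append(block)


fsa = FixedSizeAllocator()
block = fsa.alloc(24)
fsa.free(block, 24)
reused = fsa.alloc(20)      # 20 and 24 bytes round up to the same bucket
print(reused is block)
```

Reusing blocks of a common size avoids a round trip through the general-purpose allocator for short-lived things such as frames.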
19:55
vendethiel joined
|
|||
nwc10 | jnthn: business as usual - only t/spec/S17-lowlevel/lock.rakudo.moar fails | 20:05 | |
jnthn | nwc10: Oh. I get exit-time SEGVs in a few tests, which I can catch under the debugger. | 20:29 | |
dalek | arVM/inline: 0f83217 | jnthn++ | src/core/threadcontext.c: Frame cleanup needs StaticFrame; re-order cleanup. |
21:06 | |
jnthn | And that fixes it. :) | 21:07 | |
timotimo | jnthn: with the allocator, do we get much improved performance already? or do you still need to build a cache on top of that? | ||
jnthn | timotimo: It seems to help a bit. | ||
timotimo | i can't reach my desktop at the moment, otherwise i'd offer to run some benchmarks | 21:09 | |
jnthn | Well, inline doesn't help Rakudo a whole lot yet, 'cus I didn't yet teach it about frame handlers, and Rakudo pretty much always seems to spit one out at the moment for the return handler... | 21:12 | |
timotimo | frame handlers are everything for exceptions and such? | 21:14 | |
hm, should we try to eliminate return handlers if we know we don't have a "return" statement? | |||
jnthn | I already looked into it and it's a little tricky | 21:15 | |
japhb | jnthn: Have you done any segv fixes on master in the last few days? | 21:16 | |
timotimo | how hard will frame handler merging be? | 21:17 | |
jnthn | japhb: no | 21:19 | |
japhb: Though almost top of my todo list is to look at one Panda tests SEGV | |||
timotimo | ah | ||
release is ~1 week in the future? | 21:20 | ||
jnthn | Thursday, I guess | 21:28 | |
timotimo | that's the first day of the GPN :) | 21:29 | |
japhb | GPN? | 21:30 | |
jnthn | Hm. I just got my first sub-80s Rakudo build... | 21:32 | |
japhb | Oooh | ||
dalek | arVM/inline: 8bb202e | jnthn++ | src/ (2 files): Use the fixed size allocator for named used flags. |
21:33 | |
jnthn thinks inline is looking ok to merge | 22:29 | ||
japhb | \o/ | 22:30 | |
22:33
mj41 joined
|
|||
timotimo | that makes me a bit happy | 22:58 | |
though the missing handler handling leaves something to be desired | 22:59 | ||
jnthn | Well, it's one of various todos. | 23:03 | |
But better to bring improvements in incrementally. | 23:04 | ||
timotimo | 83.08user 1.27system 1:11.12elapsed 118%CPU (0avgtext+0avgdata 1127920maxresident)k | ||
jnthn | Rather than ever-lasting branches doing everything, but leaving opportunity for feedback to the end. | ||
timotimo | yays :) | ||
jnthn | grrr. I was wondering why a patch to eliminate more namesused and similar slowed things down. | 23:05 | |
So I pulled it out again to check and...still slower. Turns out iTunes is doing something expensive. :) | |||
timotimo | m) | 23:06 | |
in that case: feel free to push that patch :) | 23:07 | ||
jnthn | oh, and chrome tab with the latest England vs Italy info is chewing some too :) | ||
timotimo | 87.00user 1.05system 1:14.77elapsed 117%CPU (0avgtext+0avgdata 804048maxresident)k | 23:13 | |
without inline | |||
jnthn | ooh, so it's an improvement :) | ||
dalek | arVM/inline: 03e9cdf | jnthn++ | src/spesh/args.c: Improve named arguments spesh. Can toss namesused checking op and thus the setting of the used flags. This cuts down on ops, but should also allow more inlining. |
23:14 | |
timotimo | i don't really know what's up with the difference in maxresident; probably due to using -j3 and different things ending up at the same time | ||
jnthn | Yeah...hmm | 23:15 | |
timotimo: Could you try with 03e9cdf? | |||
timotimo | sure | 23:16 | |
was just about to | |||
83.82user 1.21system 1:11.40elapsed 119%CPU (0avgtext+0avgdata 1129300maxresident)k | 23:18 | ||
that's less elapsed, but more user | |||
i should redo it with -j1 | 23:19 | ||
this is with inline, -j1: | 23:21 | ||
78.13user 1.16system 1:19.60elapsed 99%CPU (0avgtext+0avgdata 1129540maxresident)k | |||
(waiting for laptop to cool off) | 23:22 | ||
jnthn | Hm, that's a real memory increase. Odd. | 23:23 | |
timotimo | 81.69user 0.96system 1:22.96elapsed 99%CPU (0avgtext+0avgdata 803884maxresident)k | 23:24 | |
master with -j1 | |||
are our bytecode segments heavy on memory usage? | |||
jnthn | Not especially | ||
I mean, the extra inlining can't account for *that* much difference. | 23:25 | ||
timotimo | yeah :\ | ||
jnthn | oh...hmm | ||
I think the graphs it makes when inlining maybe don't get freed up properly | 23:26 | ||
timotimo | that could surely explain things | ||
if we free those up, we're going to lose some performance :P | 23:27 | ||
jnthn | Doubt it...it'll give us a smaller working set. | 23:28 | |
Leaking is never good cache wise. | |||
timotimo | oh, hm. | ||
jnthn | Working on a patch | 23:31 | |
timotimo: Think I got one :) | 23:41 | ||
dalek | arVM/inline: bfd79a7 | jnthn++ | src/ (4 files): Don't leak graphs of spesh inlinees. |
23:43 | |
23:43
woolfy joined
|
|||
jnthn | Don't think it's the whole story, though... | 23:44 | |
Well, enough for today. 'night :) | 23:56 |