00:17 cognominal__ joined 01:32 FROGGS joined 01:54 FROGGS joined 05:17 cognominal__ joined 06:39 lizmat joined 07:14 brrt joined
brrt .tell jnthn if he could, would he care to test moar-jit on win64 07:15
i /think/ i've implemented the abi correctly
but how can you tell? :-)
08:02 lizmat joined 08:03 woolfy joined
brrt is MVMArgProcContext * args different from MVMFrame -> args 08:09
?
brrt is already figuring it out 08:11
its different 08:12
ok, good to know
08:18 lizmat joined 09:37 brrt joined 12:08 cognominal joined
jnthn .tell brrt it appears the JIT works out nicely on win64 \o/ 12:27
12:30 brrt joined
lizmat jnthn: so what was jitted (so I can mention in my talk in 15 mins) 12:32
timotimo something like "1 + 3"
brrt lizmat: the routine 2 + 2 = 4
timotimo er, that's not legit code :) 12:33
brrt fair enough
return 2 + 2;
lizmat so beyond constant folding ?
brrt and apparently it works on win64 :-D
timotimo there's also a "say "OH HAI"" in foo.nqp, does that get jitted, too? 12:34
jnthn lizmat: The takeaway isn't so much "we can JIT 2 + 2" as "we have all the infrastructure in place to JIT stuff, and have a first trivial example working" :)
brrt just that, yes
timotimo does it get jitted into a machine-code add instruction or into an invocation of moarvm's "add two integers" function?
lizmat so this would be an adequate description: "First JITted code execution already seen!"
brrt i think so, yes
timotimo - into an add instruction, yes 12:35
timotimo sounds good :)
brrt oh, i should've sent the 'bytecode with explanation' on my blog, too 12:36
timotimo aye
brrt: would a wild sub test() { 2 + 2 } become jitted, too? or does that contain some tricky ops you can't do yet? 12:38
brrt that contains checkarity, iirc
timotimo ah, could be. 12:39
brrt and... i haven't checked that out really
timotimo well, after it's spesh'd, it shouldn't have checkarity any more
and jit comes after spesh, no?
brrt yes 12:40
well, try it out, i'd say :-)
timotimo could do :3
jnthn After it's spesh'd, the checkarity is gone. :) 12:43
yeah, it spesh's to just: 12:45
const_i64_16 r0(1), liti16(2)
const_i64_16 r1(1), liti16(2)
add_i r1(2), r0(1), r1(1)
return_i r1(2)
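To make the four-instruction dump above concrete, here is a minimal sketch in C of a toy register interpreter that runs the same sequence. Everything named Toy* or OP_* is hypothetical and only mirrors the op names in the paste; it is not MoarVM's instruction encoding. The parenthesised numbers in the dump look like SSA versions of each register, and the sketch ignores them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NUM_REGS 4

/* Hypothetical opcodes mirroring the names in the spesh dump. */
typedef enum { OP_CONST_I64_16, OP_ADD_I, OP_RETURN_I } ToyOp;

typedef struct {
    ToyOp   op;
    int     dst;  /* destination register (unused by return_i) */
    int     a;    /* first source register (unused by const) */
    int     b;    /* second source register (only used by add_i) */
    int16_t lit;  /* 16-bit literal (only used by const_i64_16) */
} ToyIns;

/* Executes instructions in order and returns the value named by return_i. */
int64_t toy_run(const ToyIns *code, size_t n) {
    int64_t reg[TOY_NUM_REGS] = {0};
    for (size_t i = 0; i < n; i++) {
        const ToyIns *ins = &code[i];
        assert(ins->dst >= 0 && ins->dst < TOY_NUM_REGS);
        assert(ins->a >= 0 && ins->a < TOY_NUM_REGS);
        assert(ins->b >= 0 && ins->b < TOY_NUM_REGS);
        switch (ins->op) {
        case OP_CONST_I64_16: reg[ins->dst] = ins->lit;                   break;
        case OP_ADD_I:        reg[ins->dst] = reg[ins->a] + reg[ins->b];  break;
        case OP_RETURN_I:     return reg[ins->a];
        }
    }
    return 0; /* fell off the end without a return_i */
}

/* The program from the dump: r0 = 2; r1 = 2; r1 = r0 + r1; return r1. */
int64_t toy_two_plus_two(void) {
    static const ToyIns code[] = {
        { OP_CONST_I64_16, 0, 0, 0, 2 },
        { OP_CONST_I64_16, 1, 0, 0, 2 },
        { OP_ADD_I,        1, 0, 1, 0 },
        { OP_RETURN_I,     0, 1, 0, 0 },
    };
    return toy_run(code, sizeof code / sizeof code[0]);
}
```

A JIT for this sequence would replace the switch dispatch with emitted machine code; per the discussion above, add_i becomes a single machine add instruction rather than a call into the VM.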
brrt actually, it does
jnthn :)
brrt so
TimToady so after spesh the intermediate language is SpeshIL? :)
brrt timotimo - the answer is yes
timotimo \o/
TimToady: *groan* :) 12:46
jnthn :P
timotimo except if you have code like sub a() { 2 +2 }, the optimizer will turn it into sub a() { 4 } already :) 12:47
the jit doesn't do "get arguments" yet, right? not even the spesh'd variants?
brrt it doesn't on moar-jit, not yet
timotimo right; should be simple to do, once it bubbles to the top of your todo list 12:48
jnthn timotimo: The NQP optimizer doesn't yet.
brrt well, i think it should be done on the spesh level tbh
timotimo oh! 12:49
brrt: what exactly?
brrt constant folding
timotimo ah, fair enough
brrt basically, i don't know (yet) if the registers that store the constants will be used another time 12:50
in theory that is possible (although unlikely)
timotimo i'm getting "emit" debug output from my sub a() { 2 + 2 } example in nqp \o/
jnthn brrt: Now something basic is working, what's your plan from here?
timotimo gist.github.com/timo/e435363b9ecc25bb9495
brrt uhm, i'd point you to my recent blog entry :-) but in short 12:51
timotimo : thats what i get too :-)
jnthn oh, did I miss a post? :)
brrt o it hasn't bubbled through yet
jnthn why yes, I did...
brrt i've just written it this morning
jnthn I see it now :)
timotimo oh!
brrt in short, - positional parameters
- more arithmetic
- branching and conditionals 12:52
- phi node reduction
jnthn +1 to an MVM_JIT_LOG env var 12:53
timotimo how come branching seems like the hardest part?
brrt because it means linearizing the tree, in effect
jnthn I suggest that dumping the JIT tree could be a good idea.
brrt that, or it moves the block navigation down to the compiler 12:54
i agree
jnthn I've got a lot out of spesh_log being full of spesh dumps
brrt ok, that moves up, too, then :-)
jnthn The other thing you could do is have an env var where you give it a directory and it dumps the JIT compiled output for disasm-ing.
And name the files according to some identifier that ends up in the jit tree log, so it's possible to correlate the two. 12:55
I've found building small logging-ish things like this has been a huge time-saver overall, as it helped me debug all sorts with spesh. 12:56
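The dump-to-directory idea above could look roughly like the following C sketch. The environment variable name TOY_JIT_DUMP_DIR and both function names are assumptions made for illustration; the log only settles on MVM_JIT_LOG for the tree log. The file name combines an identifier with a sequence number, so the same pair could be printed in the JIT tree log to correlate the two.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable name; not an existing MoarVM setting. */
#define TOY_JIT_DUMP_DIR_ENV "TOY_JIT_DUMP_DIR"

/* Builds "<dir>/<cuuid>-<seq>.bin" into out.
 * Returns 0 on success, -1 if the path does not fit in cap bytes. */
int toy_dump_path(const char *dir, const char *cuuid, unsigned seq,
                  char *out, size_t cap) {
    assert(dir != NULL && cuuid != NULL && out != NULL);
    int n = snprintf(out, cap, "%s/%s-%04u.bin", dir, cuuid, seq);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}

/* Writes len bytes of compiled code to the dump directory, if one is set.
 * Returns 0 if written, 1 if dumping is disabled, -1 on error.
 * The counter is a plain static, so this sketch is not thread-safe. */
int toy_dump_code(const char *cuuid, const void *code, size_t len) {
    static unsigned seq = 0;
    char path[512];
    const char *dir = getenv(TOY_JIT_DUMP_DIR_ENV);
    if (dir == NULL || strlen(dir) == 0)
        return 1;
    if (toy_dump_path(dir, cuuid, seq, path, sizeof path) != 0)
        return -1;
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    size_t written = fwrite(code, 1, len, f);
    if (fclose(f) != 0 || written != len)
        return -1;
    seq++;
    return 0;
}
```

The sequence number is there because, as noted just below, the same compilation unit may be specialized more than once, so the cuuid alone would not give unique file names.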
brrt atomically incrementing integer? or do we still have the cu-uuid at spesh-graph time?
jnthn You can find the cuuid but we might specialize the same one multiple times.
brrt fair enough 12:57
jnthn brrt++ for the post
brrt i kind of want to move the staticframebody into the spesh graph
jnthn I suspect "set" will be an important, and easy, opcode. 12:58
brrt i've done set
jnthn ?
you already have g->sf
brrt or a pointer-to-the-staticframebody :-)
i do?
oh
i said /nothing/ :-D
jnthn g->sf.body # gets you the body.
brrt ok, that will be awesome 12:59
more reason for me, though, to push through with my next change
create a 'copy' (or 'store', or 'load-and-store', or 'move') node 13:00
why do that? because moving stuff back and forth is basically what i do all day, and creating a node has me move the decisions of what to move upward to graph creating 13:01
jnthn Sounds sensible. 13:02
dalek MoarVM/inline: ed2a763 | jnthn++ | src/core/ext.c:
Make sure extops don't leave noinline as junk.
13:04
brrt i noticed param_op_i isn't yet speshed to sp_getarg_i
jnthn It *may* be.
brrt hmmm
jnthn It depends on an arg_i being passed
At some point it should learn to spesh an object being passed into sp_getarg_o followed by unbox_i 13:05
dalek MoarVM/moar-jit: 5fff8c3 | (Bart Wiegmans)++ | / (5 files):
Implement a few more opcodes.
jnthn Where unbox_i will be able to further spesh into sp_get_i I guess...
brrt i hope 13:06
:-)
timotimo you forgot to add a label for set into the jit.c dispatch switch thingie, no? 13:08
jnthn brrt: About /* I basically don't really want to use the TC's CompUnit ... 13:09
You have the sf, and you can sf->body.cu to get to it safely. 13:10
brrt right :-) that was exactly my question
i may have forgotten that timotimo
:-)
timotimo as you can see, your mentors are following your commits closely ;)
jnthn It's almost like they care the project is successful :P 13:11
timotimo mentors work in mysterious ways 13:12
brrt :-D
timotimo jnthn: as soon as OSR + jit land, the benchmarks will look incredible for perl6-m (or at least nqp-m)
brrt on-stack-replacement?
timotimo yes 13:13
so that we can spesh single-layer for loops even if they don't invoke some sub inside
also i imagine inlining will cause less "entry points" for going from the regular bytecode to the spesh'd bytecode? 13:14
brrt i hope so 13:15
jnthn Yeah, well, I gotta get inlining in shape before any OSR :)
timotimo of course
why don't i see any jnthn commits? :P :P 13:16
.o( i should also be working, rather than complain that others aren't )
13:18 cognominal joined
TimToady is just a tor-mentor 13:18
jnthn hey, there was a fix for one of the inline valgrind failures just 15 mins ago :P
Or, I hope it's a fix for it :) 13:19
TimToady jnthn: are we planning to handle int -> Int overflow via spesh/deopt? 13:20
timotimo okay okay :)
TimToady: i didn't know we ever do int -> Int? 13:21
TimToady I mean Int -> int -> Int :)
timotimo i don't think we do that yet, either
TimToady spesh to int, deopt back to Int
well, maybe spesh to int64
timotimo would that be "strength reduction"? "boxing elimination"? 13:22
TimToady I just think of it as using native ints rather than Int objects :)
timotimo we already have a "fast path" both for data storage and for calculations that makes Int better if the values fit into 32 bit (on moarvm)
TimToady right, I forgot that 13:23
brrt the whole 64 bit mess is messy
timotimo not sure how we could make that better using knowledge gained from spesh
jnthn TimToady: I was more thinking initially to do those just by JITting the fast path and if it spots an overflow (or an input to the operation is already overflowed) then it falls back to the slower path instead. 13:26
timotimo jnthn: how easy is it to spot overflows in all the ops we support?
can we just tell the cpu to trap overflows and set some register or something?
jnthn Well, for the common ones that map to CPU operations, I *think* it's checking a flag.
timotimo in that case, branch prediction probably makes things cheap-ish if we're long-running 13:27
jnthn aye. 13:28
hmm, this read-past-end-of-buffer of the lexicals marking is a curiosity...
We always get the locals marked OK. 13:29
brrt yes, integer overflow is a flag iirc 13:31
brrt afk 13:33
13:33 brrt left
jnthn Yay, I think I have a fix for all those buffer overruns when marking env. 14:07
lizmat jnthn++ # can't be said enough
dalek MoarVM/inline: 212b75d | jnthn++ | src/gc/roots.c:
Don't use spesh'd lexical map for logging frames.

A frame may have a spesh_cand, but only because it was being used for doing logging. The spesh cand will later be updated with a type map, but the logging frames will not have had anything inlined, so won't have allocated that much environment storage.
14:10
jnthn Managed to re-create the issue by tracking allocation size, adding an extra check, then getting it to GC every 32KB allocated. :)
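The stress technique described here, forcing a collection after every 32KB allocated so that GC-timing bugs show up reliably, can be sketched in C like this. The ToyHeap type and the functions are hypothetical stand-ins, and the "collection" only bumps a counter; this is not MoarVM's allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_GC_STRESS_BYTES (32 * 1024)

typedef struct {
    size_t   bytes_since_gc;  /* bytes handed out since the last collection */
    unsigned collections;     /* number of forced collections so far */
} ToyHeap;

/* Stand-in for a real collection: just records that one happened. */
static void toy_collect(ToyHeap *h) {
    h->collections++;
    h->bytes_since_gc = 0;
}

/* Allocates size bytes, forcing a collection first whenever this request
 * would push the running total past the stress threshold. */
void *toy_alloc(ToyHeap *h, size_t size) {
    assert(h != NULL);
    if (h->bytes_since_gc + size > TOY_GC_STRESS_BYTES)
        toy_collect(h);
    h->bytes_since_gc += size;
    return malloc(size);
}
```

With a threshold this small, a bug that depends on a GC run landing between two particular allocations is hit far more often than with normal collection intervals, at the cost of run time.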
timotimo ouch, that sounds slow :) 14:20
jnthn Wasn't so bad. 14:21
:)
Next issue is why a couple of tests fail because they fail to deopt_one. 14:22
nwc10 jnthn: ASAN failures down to t/spec/S05-metasyntax/litvar.t t/spec/S05-transliteration/trans.rakudo.moar t/spec/S11-modules/require.rakudo.moar t/spec/S17-lowlevel/lock.rakudo.moar t/spec/integration/advent2010-day21.t t/spec/integration/advent2012-day10.t 14:41
t/spec/S11-modules/require.rakudo.moar doesn't look to be an inline thing
jnthn Hm, I thought t/spec/S05-transliteration/trans.rakudo.moar was one of the ones I got clean... 14:42
And litvar.t wasn't failing ASAN so much as just failing due to a deopt_one issue
(Which I seem to have resolved locally)
lizmat t/spec/S11-modules/require seems to be a result of some S11 refactoring I did 14:43
investigating it atm
jnthn nwc10: Are you already working on valgrind'ing them for more details? 14:45
nwc10 was about to
jnthn nwc10: If not, hold on a bit for one more patch.
nwc10 OK
jnthn: this is what ASAN says right now: paste.scsys.co.uk/394679 14:49
dalek MoarVM/inline: ff869d0 | jnthn++ | src/spesh/ (3 files):
Improve/correct DEOPT_INLINE annotations handling.

Place them on the instructions that would originally have carried a DEOPT_ONE or DEOPT_ALL, rather than on the instruction afterwards.
jnthn litvar.t there didn't fail due to ASAN, though?
Ah, trans.rakudo.moar does fail with something though
But that seems to be the only ASAN one that could relate to inlines. 14:51
ah, and I know what that one is. 14:53
dalek MoarVM/inline: adaf646 | jnthn++ | src/core/interp.c:
Teach another lexical_types access about inlines.

Hopefully resolves another of the ASAN complaints.
15:13
nwc10 before that commit it's down to t/spec/S05-transliteration/trans.rakudo.moar t/spec/S11-modules/require.rakudo.moar t/spec/S17-lowlevel/lock.rakudo.moar t/spec/integration/advent2010-day21.t 15:18
one of which is bogus
dalek MoarVM/inline: 785be4f | jnthn++ | src/6model/reprs/MVMStaticFrame.c:
GC mark the inline code ref.

It'll almost always be gen 2, but can't be certain of that.
15:19
lizmat S11-modules/require should be fixed now 15:20
jnthn adaf646 probably moves the S05 and integration error further in rather than completely solving it. 15:21
dalek MoarVM/inline: 7a52289 | jnthn++ | src/core/frame.c:
Make lexical auto-viv inline-aware.
15:45
jnthn nwc10: OK, hopefully this nails all the inline-related fails.
nwc10 might be a delay in getting results - my laptop is being attacked by a one toothed monster 15:47
timotimo asan fails or all fails? :)
nwc10 at least it's only being grabbed. Not drooled on.
lizmat a fork gone wild?
jnthn timotimo: All afaik
timotimo oh! 15:48
... as in: ready to merge into master?!
jnthn timotimo: Probably, though I was going to work on the improved allocation thing first, so we actually see the benefit...
timotimo great! :)
lizmat I would settle for fewer spurious S17 failures anytime
timotimo what should be the initial value for the supplies that haven't more'd yet? 15:50
oh, mischan
16:37 woolfy left
nwc10 jnthn: only t/spec/S17-lowlevel/lock.rakudo.moar (with ASAN) 16:38
jnthn OK, great. Which isn't an inline issue. 16:41
nwc10 the files that valgrind previously found errors for are now clean. (except t/spec/S17-lowlevel/lock.rakudo.moar ) 17:51
18:20 benabik joined 18:28 zakharyas joined
FROGGS hi everybody 18:46
japhb o/ FROGGS 18:47
19:36 colomon joined
dalek MoarVM/inline: 7bcdf18 | jnthn++ | / (7 files):
Add a thread-safe fixed-size allocator.

To be used in place of malloc/calloc/free for certain things. To get malloc/free used again (for debugging), just set it to have zero buckets.
19:41
MoarVM/inline: 677812e | jnthn++ | src/spesh/candidate. (2 files):
Pre-calculate spesh-candidate work/env sizes.
MoarVM/inline: e592f83 | jnthn++ | src/core/frame.c:
Used fixed size allocator for frames/work/env.
19:42
MoarVM/inline: af4b7ad | jnthn++ | src/6model/reprs/MVMHash. (2 files):
Use FixedSizeAllocator for hash entries.
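For readers unfamiliar with the technique behind these commits, a fixed-size allocator keeps one free list per size class, so freeing and re-allocating a frame or hash entry is a couple of pointer moves instead of a trip through malloc. The C sketch below is a simplified, single-threaded illustration; the Toy* names, bucket count, and granularity are assumptions, and the allocator in the commit is additionally thread-safe, which this sketch is not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_BUCKET_SHIFT 3   /* size classes are multiples of 8 bytes */
#define TOY_NUM_BUCKETS  32  /* classes cover requests of 1..256 bytes */

/* A freed block is reused as a link in its size class's free list. */
typedef struct ToyFreeNode { struct ToyFreeNode *next; } ToyFreeNode;

typedef struct { ToyFreeNode *free_list[TOY_NUM_BUCKETS]; } ToyFSA;

/* Maps a request size to its size class: 1..8 -> 0, 9..16 -> 1, ... */
static size_t toy_bucket(size_t bytes) {
    return (bytes - 1) >> TOY_BUCKET_SHIFT;
}

void *toy_fsa_alloc(ToyFSA *al, size_t bytes) {
    assert(al != NULL && bytes > 0);
    size_t b = toy_bucket(bytes);
    if (b >= TOY_NUM_BUCKETS)
        return malloc(bytes);             /* too large: plain malloc */
    ToyFreeNode *n = al->free_list[b];
    if (n != NULL) {                      /* reuse a previously freed block */
        al->free_list[b] = n->next;
        return n;
    }
    return malloc((b + 1) << TOY_BUCKET_SHIFT);  /* round up to class size */
}

/* The caller passes the original request size so the class can be found. */
void toy_fsa_free(ToyFSA *al, size_t bytes, void *p) {
    assert(al != NULL && bytes > 0 && p != NULL);
    size_t b = toy_bucket(bytes);
    if (b >= TOY_NUM_BUCKETS) {
        free(p);
        return;
    }
    ToyFreeNode *n = p;                   /* push onto the class free list */
    n->next = al->free_list[b];
    al->free_list[b] = n;
}
```

Blocks returned to a free list are never handed back to the system in this sketch, which trades some memory for speed on allocation-heavy paths such as frame setup.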
19:55 vendethiel joined
nwc10 jnthn: business as usual - only t/spec/S17-lowlevel/lock.rakudo.moar fails 20:05
jnthn nwc10: Oh. I get exit-time SEGVs in a few tests, which I can catch under the debugger. 20:29
dalek MoarVM/inline: 0f83217 | jnthn++ | src/core/threadcontext.c:
Frame cleanup needs StaticFrame; re-order cleanup.
21:06
jnthn And that fixes it. :) 21:07
timotimo jnthn: with the allocator, do we get much improved performance already? or do you still need to build a cache on top of that?
jnthn timotimo: It seems to help a bit.
timotimo i can't reach my desktop at the moment, otherwise i'd offer to run some benchmarks 21:09
jnthn Well, inline doesn't help Rakudo a whole lot yet, 'cus I didn't yet teach it about frame handlers, and Rakudo pretty much always seems to spit one out at the moment for the return handler... 21:12
timotimo frame handlers are everything for exceptions and such? 21:14
hm, should we try to eliminate return handlers if we know we don't have a "return" statement?
jnthn I already looked into it and it's a little tricky 21:15
japhb jnthn: Have you done any segv fixes on master in the last few days? 21:16
timotimo how hard will frame handler merging be? 21:17
jnthn japhb: no 21:19
japhb: Though almost top of my todo list is to look at one Panda tests SEGV
timotimo ah
release is ~1 week in the future? 21:20
jnthn Thursday, I guess 21:28
timotimo that's the first day of the GPN :) 21:29
japhb GPN? 21:30
jnthn Hm. I just got my first sub-80s Rakudo build... 21:32
japhb Oooh
dalek MoarVM/inline: 8bb202e | jnthn++ | src/ (2 files):
Use the fixed size allocator for named used flags.
21:33
jnthn thinks inline is looking ok to merge 22:29
japhb \o/ 22:30
22:33 mj41 joined
timotimo that makes me a bit happy 22:58
though the missing handler handling leaves something to be desired 22:59
jnthn Well, it's one of various todos. 23:03
But better to bring improvements in incrementally. 23:04
timotimo 83.08user 1.27system 1:11.12elapsed 118%CPU (0avgtext+0avgdata 1127920maxresident)k
jnthn Rather than ever-lasting branches doing everything, but leaving opportunity for feedback to the end.
timotimo yays :)
jnthn grrr. I was wondering why a patch to eliminate more namesused and similar slowed things down. 23:05
So I pulled it out again to check and...still slower. Turns out iTunes is doing something expensive. :)
timotimo m) 23:06
in that case: feel free to push that patch :) 23:07
jnthn oh, and chrome tab with the latest England vs Italy info is chewing some too :)
timotimo 87.00user 1.05system 1:14.77elapsed 117%CPU (0avgtext+0avgdata 804048maxresident)k 23:13
without inline
jnthn ooh, so it's an improvement :)
dalek MoarVM/inline: 03e9cdf | jnthn++ | src/spesh/args.c:
Improve named arguments spesh.

Can toss namesused checking op and thus the setting of the used flags. This cuts down on ops, but should also allow more inlining.
23:14
timotimo i don't really know what's up with the difference in maxresident; probably due to using -j3 and different things ending up at the same time
jnthn Yeah...hmm 23:15
timotimo: Could you try with 03e9cdf?
timotimo sure 23:16
was just about to
83.82user 1.21system 1:11.40elapsed 119%CPU (0avgtext+0avgdata 1129300maxresident)k 23:18
that's less elapsed, but more user
i should redo it with -j1 23:19
this is with inline, -j1: 23:21
78.13user 1.16system 1:19.60elapsed 99%CPU (0avgtext+0avgdata 1129540maxresident)k
(waiting for laptop to cool off) 23:22
jnthn Hm, that's a real memory increase. Odd. 23:23
timotimo 81.69user 0.96system 1:22.96elapsed 99%CPU (0avgtext+0avgdata 803884maxresident)k 23:24
master with -j1
are our bytecode segments heavy on memory usage?
jnthn Not especially
I mean, the extra inlining can't account for *that* much difference. 23:25
timotimo yeah :\
jnthn oh...hmm
I think the graphs it makes when inlining maybe don't get freed up properly 23:26
timotimo that could surely explain things
if we free those up, we're going to lose some performance :P 23:27
jnthn Doubt it...it'll give us a smaller working set. 23:28
Leaking is never good cache wise.
timotimo oh, hm.
jnthn Working on a patch 23:31
timotimo: Think I got one :) 23:41
dalek MoarVM/inline: bfd79a7 | jnthn++ | src/ (4 files):
Don't leak graphs of spesh inlinees.
23:43
23:43 woolfy joined
jnthn Don't think it's the whole story, though... 23:44
Well, enough for today. 'night :) 23:56