timotimo the word "macro" seems like a good fit here 00:14
02:03 jnap joined 05:41 brrt joined
brrt \o #moarvm 05:41
www.freelists.org/post/luajit/runti...n-dynasm,5 may be of interest
.tell jnthn that i can see how macros would work, and that for x64 it would mean a matter of preprocessing 05:42
also, i've tested it out, and for the first few registers - rax up to rsp - dynamic register access actually works, but it brings us right back into register starvation :-) 05:43
nwc10 jnthn: not broken. 06:05
brrt which would be enough to prove the point, though 06:38
hmm
06:49 FROGGS joined 07:08 zakharyas joined
nebuchadnezzar nwc10: thanks, proposing patches was my first intention, I was confident that #moarvm will be happy with them ;-) 08:23
09:07 brrt joined
jnthn urgh 09:13
jnthn slept awfully...
brrt how come :-) 09:23
morning, though
brrt is wondering if it would be more difficult to write an x86-64 assembler or to patch up dynasm, but suspects the former 09:30
nwc10 code is a liability 09:31
if it's not a core differentiator, it's a liability, not an asset
ie, if you can get patching back upstream, that's best 09:32
brrt i agree 09:33
jnthn brrt: How bad is the preprocessing route, and to what degree can it wait and we do the naive thing first? 09:41
brrt the preprocessing route is bad 09:42
jnthn suspected so
brrt to give you an idea, there are 16 general-purpose 64-bit size registers, and 3 addressing modes of interest, so i can have the combination of 16x16 registers x 3 addressing modes, per opcode 09:43
there are 8 or so floating point registers, and much the same story applies to them
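To put a number on that: a minimal sketch of the count, assuming the figures quoted above (16 general-purpose registers, 3 addressing modes, two register operands per opcode). The helper name `variants_per_opcode` is made up for illustration and is not part of dynasm or MoarVM.

```c
#include <assert.h>

/* Figures taken from the discussion; treat them as assumptions. */
enum { GP_REGS = 16, ADDR_MODES = 3 };

/* Number of statically preprocessed variants one opcode would need
 * if every register operand must be spelled out ahead of time.
 * Hypothetical helper, for illustration only. */
static unsigned long variants_per_opcode(unsigned operands) {
    unsigned long n = ADDR_MODES;
    assert(operands <= 4);              /* keep the product small */
    for (unsigned i = 0; i < operands; i++)
        n *= GP_REGS;                   /* one factor of 16 per register operand */
    return n;
}
```

With two register operands this comes to 16 x 16 x 3 = 768 variants per opcode, which is why the preprocessing route looks unattractive.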
jnthn On "will never lead to much better performance than what the interpreter can do" - it would still be quite a lot better due to (1) not needing to do any instruction decoding, (2) not needing to do the switch thing, (3) do way better on cache and branch prediction. 09:44
brrt that is fair
jnthn It depends on overhead.
For ops like set, add, etc. the decoding overhead dominates the operation cost.
brrt uhuh 09:45
jnthn And since spesh turns many attribute accesses into cheap pointer operations, those are dominated too
Same with sp_getarg_*
And arg_*
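A rough sketch of where that overhead sits, using a toy three-op bytecode. The opcodes, layout, and function names here are invented for illustration and are not MoarVM's. The interpreter pays for fetch, operand decoding, and switch dispatch on every op; the straight-line version is roughly what naive per-instruction code generation would amount to for one fixed program.

```c
#include <assert.h>
#include <stdint.h>

enum { OP_SET, OP_ADD, OP_HALT };   /* hypothetical opcodes */

/* Interpreter: every op pays for fetch, operand decode and dispatch. */
static int64_t run_interp(const uint16_t *pc, int64_t *reg) {
    assert(pc && reg);
    for (;;) {
        uint16_t op = *pc++;                      /* instruction fetch */
        switch (op) {                             /* dispatch */
        case OP_SET: {
            uint16_t dst = *pc++, src = *pc++;    /* operand decode */
            reg[dst] = reg[src];
            break;
        }
        case OP_ADD: {
            uint16_t dst = *pc++, a = *pc++, b = *pc++;
            reg[dst] = reg[a] + reg[b];
            break;
        }
        default:                                  /* OP_HALT or unknown */
            return reg[0];
        }
    }
}

/* The same program, "set r2, r0; add r0, r2, r1; halt", with operands
 * fixed at compile time: no fetch, no decode, no switch. */
static int64_t run_compiled(int64_t *reg) {
    reg[2] = reg[0];
    reg[0] = reg[2] + reg[1];
    return reg[0];
}
```

For ops this cheap, the work left in `run_compiled` is a small fraction of what `run_interp` executes per instruction, which is the point being made about set, add, and the spesh-produced pointer operations.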
brrt well, for compiling instruction-per-instruction, i don't need a jit tree
the naive route is admittedly tremendously simpler 09:46
i.e. it /could/ be feasible to take interp.c, have gcc compile it to assembler, do some transformations to reduce the indirection caused by operand decoding, and stitch it together in a dasc file 09:47
jnthn Well, you don't *need* one, but (a) having one - even a basic one - will provide a way to move beyond that approach in the future, and (b) the "this op is just this C function call" decision has to be made once and that knowledge can be re-used over different CPUs. 09:48
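One way to picture point (b): a CPU-independent table that records, per op, whether it lowers to a plain C call. This is only a sketch. The op names, the `do_say` / `do_force_gc` helpers and the table itself are hypothetical, not the actual jit graph design.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*op_func)(void);

/* Hypothetical runtime helpers that some ops would simply call. */
static void do_say(void) {}
static void do_force_gc(void) {}

enum { OP_SAY, OP_FORCE_GC, OP_ADD_I, OP_COUNT };  /* hypothetical ops */

/* Decided once, shared by every backend:
 * non-NULL = "this op is just this C function call",
 * NULL     = the backend emits machine code for it directly. */
static const op_func op_c_call[OP_COUNT] = {
    [OP_SAY]      = do_say,
    [OP_FORCE_GC] = do_force_gc,
};

static op_func c_call_for_op(int op) {
    assert(op >= 0 && op < OP_COUNT);
    return op_c_call[op];
}
```

A backend for another CPU would reuse the table unchanged and only supply its own "emit a C call" and "emit this op inline" routines.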
brrt hmm, yes, thats true 09:51
in case it wasn't apparent, my idea in the jit tree was to take all indirections out into the open and use tree rewriting to write the code 09:52
i can suppose it wasn't, it was an unclear post i think
jnthn Yeah, I can see that may be a step too far without the dynamic register stuff... 09:54
brrt to rephrase what i really meant, is: making indirections visible will help nothing in code generation if i can't select registers dynamically 09:55
that, yes
:-)
ultimately the optimum would be patching dynasm 09:56
the 'near peak' is static register allocation and perhaps stuff like operation merging (peephole-style transformations) 09:57
i ... think we all agree we should go for the near peak first
FROGGS Please send general questions to the LuaJIT mailing list. You can also send any questions you have directly to me:
E-Mail: [email@hidden.address]
brrt: I suppose a small chat with Michael Pall makes sense here 09:58
jnthn brrt: Yes, I think so.
brrt i suppose so too FROGGS :-)
jnthn Well, yeah, his estimate on the amount of work needed to patch dynasm to support dynamic registers may be worth having 09:59
Like, is it an O(day), O(week), etc task.
brrt ok, i'll mail him for that :-)
jnthn But yeah, that aside, near peak working would still be a very meaningful improvement. 10:04
brrt sent, i'll see what i hear 10:09
brrt &lunch
10:34 brrt joined
brrt is having tea at 27 degrees c mad? 10:41
jnthn Yes. Tea should either be ice-tea and cold, or not ice tea and thus much hotter.
Oh wait, you didn't mean the temperature of the tea... 10:42
I'd say it's mad but I'm sipping coffee in similar temperatures here in Sweden :)
brrt :-) i've had coffee too, but i've run out of it 10:43
jnthn Which also seemed like a good idea at the time... :)
brrt why so hot? and to think i have to get out, too 10:44
jnthn It's even hot in Sweden, dammit. 10:47
Well, this bit of Sweden.
Clearly I need to find a further north place to move ot.
*to
nwc10 jnthn: you wouldn't like Vienna 10:52
jnthn nwc10: I used to live an hour away from there, and am well aware it can touch 35C or so in those parts in the summer. 10:53
nwc10 I suspect that the swimming is better here. Lots of bonus bits of Danube 10:54
jnthn True :)
My bit had the cheaper beer, though. :) 10:59
brrt wow, vienna is 33 degrees
jnthn Which is another cooling strategy :)
brrt (according to the google)
jnthn Prague, another city I sometimes feel a vague temptation to move to, is hitting 32 today also. 11:01
I guess if I want to do central Europe again I'll just have to learn to cope with heat. Or find a place with aircon :P
brrt or flee the summer
jnthn Or that, yes.
Summer home in Svalbard or something. :)
nwc10 if Scotland votes for independence, will it qualify for your grand tour of S? 11:02
brrt nice
jnthn
.oO( Summer in a place you can bear )
nwc10: hah, I'd not thought of that. But yes. :P
brrt wonders what is up with all those regions wanting to be states 11:03
moritz well, Scotland has a long and bloody history of that 11:06
when you consider that, the proceedings are the most civilized of all the recent ones
brrt that is true 11:12
moritz also my impression of Scotland (from 2006/2007) was that they love to hate the English, but that they love other stuff even more, like the English health care system, the common currency, and similar stuff 11:14
FROGGS brrt: can you join #luajit please? 11:18
brrt yes, of course
brrt wonders why he hadn't thought of that
helpful discussion at #luajit indeed 12:12
they suggest using llvm as a codegen
not sure if that helps all that much
nwc10 it gets you a lot more platforms 12:13
brrt that is true
nwc10 it costs you, um, a need for C++
brrt among other things :-) 12:14
nwc10 and possibly more resources in other places
and yes, I'm not sure
brrt people typically use a special thread just for llvm
nwc10 but all the cool kids are using llvm currently, so you get a lot of other people's work for free
brrt at the cost of magic 12:15
(yes, my mood has a cynical turn today, i'm sorry, i'll be more optimistic some other day)
(when it is cooler, probably)
nwc10 or when it's a public holiday again? :-)
brrt well, i don't really care for public holidays right now as i'm totally freeeeee :-) 12:16
for the summer
anyway, &appointment 12:18
12:21 brrt left 12:23 jnap joined
lizmat
.oO( totally freeee != free from appointments )
12:39
jnthn
.oO( or writing a JIT compiler :P )
I really don't think dragging in llvm is the way to go. Heck, it's huge... And I really don't want to have to deal with C++. 12:40
I'm not quite sure why "build the best imaginable JIT" is becoming a goal. We don't have one *at all* yet. Dealing with all the places in the VM that aren't JIT-ready is going to be tough enough. 12:42
Getting deopt+JIT figured out for a simple one will be hard enough too.
And code-gen is just one small part of the overall performance story. 12:43
More of the kinds of things spesh does, plus inlining, OSR, and escape analysis are also huge factors.
What most VMs call their JIT is actually what we're calling spesh+JIT, which obscures things a bit. Eliminating the interpreter overhead *in combination* with the other things spesh does/will do (noting many of the transforms it does are turning costly operations into cheap ones that suffer interp overhead more) will help a good bit already. 12:46
nwc10 jnthn: meaning that LLVM is the wrong choice in the near term, because there's a lot more win to be had from a KISS approach to the actual codegen part, because that's not actually where the hard stuff is? 12:57
and maybe the wrong choice for the foreseeable future, not just the near term
jnthn nwc10: Well, I'm a lot more confident we'll get something viable and helpful (as in, actually lets people run Perl 6 programs faster than they can today) with such an approach to the code-gen itself, yes. 13:08
nwc10: For the longer term: harder to say. But I don't think we can work out the right longer-term options without doing the simple things first. 13:09
nwc10 that was what I'd sort of figured, but didn't say explicitly 13:11
jnthn Code-gen can be a big and complex area, but trying to do that in a highly clever way *and* having to tackle all the integration points between code-gen and the rest of the VM feels like a lot to bite off in a summer. 13:12
nwc10 that figuring out how to use llvm well needs the experience of starting with something simpler.
jnthn (As in, more than I think is reasonably possible.)
And I'm quite sure our downstream users would much prefer a sufficiently complete and useful JIT today that gives a 2x-3x improvement by eliminating interpreter overhead, than a half-complete but will-be-super-awesome-someday JIT. 13:13
The additional factors will come from other opts, like inlining, and escape analysis, and more specialization, which turn costly operations into cheap ones. 13:15
(like, turning "invoke this method, creating it a callframe" into "goto" 'cus we've now gone and inlined that method) 13:16
13:16 donaldh joined 13:17 donaldh left 13:39 ggoebel111116 joined 13:55 ggoebel111116 joined 14:56 brrt joined
brrt lizmat: doctors appointments trump everything, i'm afraid :-) 14:56
jnthn, what happens if i use the mvm jit graph construction to do the following: linearize the bytecode stream, assign labels, convert some ops into c calls 15:06
anyway, i have a more important issue to solve first 15:07
'where to insert the jit code and have it called' 15:08
and fwiw, i agree on llvm being far too big 15:09
jnthn brrt: (jit graph) that sounds fairly sane as a starting point. Can desugar some ops a bit, I'd guess. 15:12
brrt: On "where to put it to have it called" - yes, that's interesting indeed. I wrote up a gist on some ideas on that a while back :) 15:13
brrt if you have it, i'd like to read it 15:14
jnthn Oh noes, you losted it :P
jnthn goes to find it
gist.github.com/jnthn/c1b88756121f0525ff28 15:15
brrt the secret jit op, that solution, i recall 15:16
jnthn yeah
And hang the generated code off MVMSpeshCandidate
brrt not on the frame? 15:18
jnthn Oh, two levels hee 15:19
*here
Well, actually
Yeah, off the candidate would do
Why? Because cur_frame->spesh_cand->jitted will do it
brrt :-) ok 15:22
so a whole frame has to be compiled right?
i think you need a whole frame to form a graph
jnthn Yes 15:24
1 spesh graph = 1 frame
Well, apart from I'm busy ruining that in a branch called inline
But even so, the difference doesn't matter to you even then - except deopt. :)
brrt after inline, it will reform to a single frame, so i don't care
oh that is true
jnthn Well, in deopt we call back into the interpreter 15:25
Oh, I worked out how you can do it really easily :)
You'll just call into the deopt code path and explicitly pass it the deopt-point index and then fall out of the JITted code back into the interpreter, and deopt will have stuck it in the right place. :)
And the deopt point index hangs off a spesh annotation. 15:26
So it's readily available.
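A minimal sketch of that deopt hand-off, assuming a per-candidate table that maps a deopt-point index to the original bytecode offset. The struct layouts and the name `deopt_from_jit` are invented for illustration; the real deopt code works on MoarVM's own frame and candidate structures.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the real structures. */
typedef struct {
    const uint32_t *deopt_pc;   /* deopt_pc[idx] = offset in unspecialized bytecode */
    uint32_t        num_deopts;
} Candidate;

typedef struct {
    uint32_t interp_pc;         /* where the interpreter resumes */
    int      in_jit;            /* nonzero while running JIT-compiled code */
} Frame;

/* Called from JIT-compiled code when a guard fails. The JIT knows the
 * deopt-point index statically and passes it explicitly; afterwards the
 * JIT-compiled code just returns and the interpreter picks up at
 * interp_pc. */
static void deopt_from_jit(Frame *f, const Candidate *c, uint32_t deopt_idx) {
    assert(f && c && deopt_idx < c->num_deopts);
    f->interp_pc = c->deopt_pc[deopt_idx];
    f->in_jit    = 0;
}
```

The JIT-compiled code needs no knowledge of interpreter state beyond the index, which it can bake in as a constant at each guard.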
brrt i think things are starting to come together
because i've also figured out pretty much how to emit a 'call-to-interpreter-and-return-here' scheme 15:27
basically, emit a c call to MVM_frame_invoke() or something like that
that sets up the interpreter 15:28
also emit a label
return the reference to that label
and on the next entry, pass the same label again
we might not return it but rather store it in a frame variable / work register
that might even be better 15:29
jnthn Yeah, that sounds along the right lines.
I think that what we actually want is a MVM_frame_invoke_from_jit()
brrt that's quite alright to me too 15:30
that is indeed probably better
jnthn Where if you're calling to another thing that happens to be JITTed, it returns you an address of the JITted code to jump to
brrt because you can have that function receive the label
jnthn And if it returns NULL you know you have to fall back to the interpreter.
It *does* mean you'll be essentially doing a tail call
Return needs the same thing, fwiw. Apart from in the return to JITted code from JITted code case it's just jumping to the label, like you said. 15:31
And if it returns NULL then again, you know to fall back into the interpreter.
brrt uhuh
jnthn Which leaves the fourth quadrant - interpreted code returning to JITted code - which just gets pointed to some (maybe different) enter JIT op. 15:32
brrt yes
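The four cases can be written down as a small decision table. This is a sketch of the idea only. The enum and function names are made up, and the real MVM_frame_invoke_from_jit() discussed above would return a code address or NULL rather than an enum.

```c
#include <assert.h>

/* How control moves between two pieces of code (hypothetical names). */
typedef enum {
    XFER_JUMP_TO_JIT,      /* JIT -> JIT: got an address, jump to it      */
    XFER_FALL_TO_INTERP,   /* JIT -> interp: got NULL, leave JITted code  */
    XFER_VIA_ENTER_OP,     /* interp -> JIT: pointed at an enter-JIT op   */
    XFER_PLAIN_INTERP      /* interp -> interp: nothing special           */
} Transfer;

/* from_jit: the code transferring control is JIT-compiled.
 * to_jit:   the call or return target has JIT-compiled code. */
static Transfer transfer_kind(int from_jit, int to_jit) {
    assert((from_jit == 0 || from_jit == 1) && (to_jit == 0 || to_jit == 1));
    if (from_jit)
        return to_jit ? XFER_JUMP_TO_JIT : XFER_FALL_TO_INTERP;
    return to_jit ? XFER_VIA_ENTER_OP : XFER_PLAIN_INTERP;
}
```

The same table applies to both invoke and return, which is why the two share the "address or NULL" convention.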
we might need to add a flag to the frame, to tell the interpreter that it is running jitted code
jnthn We might, though we can also infer it from f->spesh_cand->jitcode not being NULL, maybe, and then just hide it behind a function. 15:33
brrt that works, too 15:34
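Hiding the check behind a function could look roughly like this. The struct definitions are cut-down stand-ins, and `frame_is_jitted` is a hypothetical name; only the `spesh_cand->jitcode` test comes from the discussion.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-ins for the real structures (assumed layout). */
typedef struct { void *jitcode; } SpeshCand;
typedef struct { SpeshCand *spesh_cand; } Frame;

/* No extra flag on the frame: a frame counts as "running JITted code"
 * when it has a specialization candidate whose jitcode is non-NULL. */
static int frame_is_jitted(const Frame *f) {
    assert(f != NULL);
    return f->spesh_cand != NULL && f->spesh_cand->jitcode != NULL;
}
```

Callers then never touch the field directly, so switching to an explicit flag later would be a one-function change.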
ok, i'm going to write some stuff down now 15:35
brrt feels like this is the most productive discussion he has had all day
jnthn :) 15:44
Aye, well, it seems things are less blocked now :)
brrt yes, that is how it feels to me too 15:59
some branches of the tree of possibilities have been culled
dalek MoarVM/moar-jit: 9aefa41 | (Bart Wiegmans)++ | src/ (5 files):
Used macros and type maps to clean up jit codegen.

Minor change, but I've also moved MVMJitCode to types.h so that MVMSpeshCandidate can now refer to it (doesn't yet).
16:10
16:15 FROGGS joined 16:36 jnap joined 16:38 jnap1 joined 16:55 vendethiel joined
dalek MoarVM/inline: e738af3 | jnthn++ | src/spesh/deopt.c:
Implement multi-frame uninlining.
17:15
MoarVM/inline: 478b111 | jnthn++ | src/spesh/deopt.c:
Tweak deopt debugging help.

Could do something more formal than this, like write to the spesh log, or start keeping a deopt log.
17:16
17:16 brrt joined 17:18 brrt left
jnthn With that, first 2 NQP tests are passing again :) 17:32
Seems the next issue is missing spesh slot values..hm. 17:44
nwc10 jnthn: just to check - is your fridge big enough for all the beer you deserve after you get this working? 17:46
TimToady fortunately, deserved beer can be lazily evaluated much of the time 17:47
jnthn :) 17:54
Well, the fridge is big enough to hold the www.ratebeer.com/beer/great-divide-...out/85174/ that should be a sufficient reward. :) 17:55
nwc10 jnthn: origin/inline && master && nom not broken. 18:06
jnthn Darn, when I fix the speshslot bug the first 2 NQP tests hang. wat. 18:13
BB 6: 18:17
Instructions:
...blah...
goto BB(6)
...that may explain it. wtf :)
Well, dinner first, but it's a case where we inlined 3 things, but one of those things already had something inlined into it. 18:20
FROGGS ewww 18:21
jnthn Well, it's an *awesome* thing to be doing. I just checked into it. 18:22
Will debug why it's going wrong after food, but just to share what it's trying to do here, this is append:
method append(MAST::InstructionList $other) {
push_ilist(@!instructions, $other);
$!result_reg := $other.result_reg; 18:23
$!result_kind := $other.result_kind;
}
It figures it can inline all of result_reg, result_kind, and push_ilist
The latter two are trivial 'cus they just read attributes
push_ilist is as follows:
sub push_ilist(@dest, $src) is export {
    nqp::splice(@dest, $src.instructions, +@dest, 0);
}
Which is tiny, so again it makes sense to inline. But notice it calls .instructions - another cheap accessor.
FROGGS nice 18:24
jnthn And earlier in the log we find:
Can inline instructions (cuid_31_1402147152.67041) into push_ilist (cuid_57_1402147152.67041)
So basically it flattens away 4 call frame creations once it's doing it right. Which is very cool. 18:25
Now I just need to work out why the push_ilist nested inline comes out looking utterly broken.
But I'm hungry. So, after noms :)
FROGGS nom well :o) 18:26
nwc10 jnthn: nice. 18:37
jnthn [Annotation: INS Deopt All (idx 1 -> pc 52)] 20:06
goto BB(3)
[Annotation: INS Deopt One (idx 0 -> pc 52)]
sp_guardconc r2(2), <nyi(lit)>
Successors: 2
oh dear. :)
FROGGS <nyi(lit)>? 20:07
jnthn no
That the goto has something after it
Oh, hmm...but that maybe isn't actually it.
oh, yeah, it's wrong.
Not sure if it's *the* issue, but it's pretty clear the goto should end the basic block :) 20:08
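The invariant being violated is easy to state as a check: within a basic block, a goto may only be the last instruction. A minimal sketch over an invented instruction list follows. The types and `bb_goto_is_terminal` are hypothetical and much simpler than the real spesh graph structures.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { INS_OTHER, INS_GOTO } InsKind;

/* Simplified instruction and basic-block types (assumed, not spesh's). */
typedef struct Ins {
    InsKind     kind;
    struct Ins *next;
} Ins;

typedef struct {
    Ins *first_ins;
} BB;

/* Returns 1 if every goto in the block is its final instruction,
 * 0 if some goto still has instructions after it. */
static int bb_goto_is_terminal(const BB *bb) {
    assert(bb != NULL);
    for (const Ins *ins = bb->first_ins; ins != NULL; ins = ins->next)
        if (ins->kind == INS_GOTO && ins->next != NULL)
            return 0;
    return 1;
}
```

Running a check like this over every block after inlining would flag the dump above, where sp_guardconc follows the goto.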
20:11 woolfy1 joined 20:13 lue joined 20:29 oetiker joined 20:33 nwc10 joined, xiaomiao joined, cxreg joined, ashleydev joined
dalek MoarVM/moar-jit: be6071c | (Bart Wiegmans)++ | src/ (4 files):
Try JIT compilation in candidate specialization.

This adds a JITCode field to MVMSpeshCandidate, and if applicable compiles the jit graph. The jitcode_size field is for when we want to remove the code.
20:37
20:58 lizmat_ joined 21:00 woolfy joined
dalek MoarVM/inline: ba6b4e3 | jnthn++ | src/spesh/ (2 files):
Don't put sp_log/guard in same BB as its invoke.

This means that the invoke would fail to be the last thing in the BB, but worse meant that we'd end up not checking the return guard when doing an inline.
21:03
MoarVM/inline: 3472f4d | jnthn++ | src/spesh/graph.c:
Make sure spesh slots are conveyed from inlinee.
21:03 brrt joined
jnthn Sadly, doesn't fix the issue, but would run into it soon enough anyway, so still worth fixing.
brrt what is 'the issue' here? :-) 21:04
21:06 donaldh joined
jnthn brrt: There's some case where we inline a thing that also carried inlines 21:07
brrt: And end up messing up the graph. 21:08
brrt oh, hasty
nasty
if you have brainspace left, i've been meaning to ask 21:09
where do we enter the jit code? 21:10
(i know i asked this before)
wait
nm... i didn't ask it again, it just seems that way 21:11
:-)
jnthn Well, in the end by executing a magical op like 21:12
MVM_OP_jit_enter: 21:13
tc->cur_frame->spesh_cand->jitcode(tc, tc->cur_frame);
goto NEXT;
And invoke just points the interp there instead of at the specialized bytecode.
brrt yes, and i just recalled how that should work :-) so that is why it was a dumb question 21:14
jnthn :)
I was thinking...I thought we did that ;)
brrt actually, the 'invoke points the interp there' was the part that was missing from my thought process
jnthn ah
brrt so thanks for filling that up :-)
jnthn See MVM_frame_invoke
brrt possibly i could create a static bytecode segment that simply runs MVM_OP_jit_invoke 21:15
jnthn Yeah, that'd do it
When you add jit_invoke in oplist, make sure to mark it .s 21:16
And put it at the end of the file
That way it can't appear in normal code
(.s means "spesh op")
(and after editing oplist, perl6 tools/update_ops.p6) 21:17
brrt thinks we should add that to the makefile
jnthn I didn't because I never know quite which perl6 I'm gonna need to run it with.
brrt fair enough :-) 21:18
jnthn I've had situations where my install/bin/moar was busted enough I had to use perl6-p instead :)
(yes, I normally use bleeding-edge Moar to re-generate the Moar ops stuff :P)
brrt and why not? dogfood is nutritious
jnthn and tasty...wait... :P 21:19
brrt :-) 21:20
jnthn goes for a stroll before it gets completely dark
timotimo it's already completely dark here :| 21:21
jnthn maybe I'll magically realize what's bust during it :)
timotimo that would be nice :)
brrt wishes a good stroll
(ugh, i should teach emacs not to use tabs in whitespace) 21:22
lizmat yes, please, no TABs :-) 21:29
timotimo put it on my tab
21:39 woolfy1 joined
brrt will have to check out the many cases of tabs emacs has inserted by now 21:39
dalek MoarVM/moar-jit: e46bde6 | (Bart Wiegmans)++ | / (10 files):
Add magic bytecode to invoke the JIT compiled function.

During specialization, if we can compile the jit graph into a function, we now insert a special opcode that will actually invoke that function.
21:46
brrt that's it for today
brrt off
21:47 brrt left
jnthn back 21:56
I'm guessing it could be dead instruction cleanup gone wild... 22:00
oh...the successors aren't updated correctly. 22:07
22:18 lizmat joined
lizmat Mouq: hmmm... "use" is compile time: does it make sense to assume a directory will appear out of thin air during runtime? 22:19
and if so, wouldn't that be a serious potential security hole
jnthn lizmat: mischan? 22:20
lizmat mischan ? 22:21
oops, yes 22:22
timotimo jnthn: does that indirectly cause the goto to have code after it still? 22:31
jnthn timotimo: Already fixed that kind of bug earlier :) 22:35
timotimo ah ok 22:43
ah, i see that now
the jit stuff can now call into a "jit compiled function", but no compilation is done so far, right? 22:45
jnthn didn't read the patches yet.. 22:48
Ah...seems the reason it was still busted after I fixed the succ/pred setting was that there was a leftover hack... 22:52
yup. 22:56
dalek MoarVM/inline: 40042be | jnthn++ | src/spesh/inline.c:
Properly fix up BB succ/pred.

Also remove a hack that kept things falsely alive before.
23:01
jnthn Some more NQP tests passing again now. 23:08
Still a good bunch of failures to deal with. 23:09
Time for some rest...'night 23:28
timotimo gnite jnthn :) 23:29