timotimo | the word "macro" seems like a good fit here | 00:14 | |
02:03
jnap joined
05:41
brrt joined
brrt | \o #moarvm | 05:41 | |
www.freelists.org/post/luajit/runti...n-dynasm,5 may be of interest | |||
.tell jnthn that i can see how macros would work, and that for x64 it would mean a matter of preprocessing | 05:42 | ||
also, i've tested it out, and for the first few registers - rax up to rsp - dynamic register access actually works, but it brings us right back into register starvation :-) | 05:43 | ||
nwc10 | jnthn: not broken. | 06:05 | |
brrt | which would be enough to prove the point, though | 06:38 | |
hmm | |||
06:49
FROGGS joined
07:08
zakharyas joined
nebuchadnezzar | nwc10: thanks, proposing patches was my first intention, I was confident that #moarvm will be happy with them ;-) | 08:23 | |
09:07
brrt joined
jnthn | urgh | 09:13 | |
jnthn slept awfully... | |||
brrt | how come :-) | 09:23 | |
morning, though | |||
brrt is wondering if it would be more difficult to write an x86-64 assembler or to patch up dynasm, but suspects the former | 09:30 | ||
nwc10 | code is a liability | 09:31 | |
if it's not a core differentiator, it's a liability, not an asset | |||
ie, if you can get patching back upstream, that's best | 09:32 | ||
brrt | i agree | 09:33 | |
jnthn | brrt: How bad is the preprocessing route, and to what degree can it wait and we do the naive thing first? | 09:41 | |
brrt | the preprocessing route is bad | 09:42 | |
jnthn suspected so | |||
brrt | to give you an idea, there are 16 general-purpose 64-bit registers, and 3 addressing modes of interest, so i can have the combination of 16x16 registers x 3 addressing modes, per opcode | 09:43 | |
there are 8 or so floating point registers, and much the same story applies to them | |||
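To make the size of the preprocessing route concrete, the count brrt describes can be worked out directly. This is only a sketch in C: the helper name is made up for illustration, and the inputs are the figures quoted in the discussion (16 general-purpose registers, 3 addressing modes).

```c
#include <assert.h>

/* Hypothetical helper, not part of MoarVM or DynASM: the number of
 * statically expanded variants one two-operand opcode needs when register
 * numbers cannot be supplied at run time. */
static unsigned variants_per_opcode(unsigned num_regs, unsigned addr_modes) {
    assert(num_regs > 0 && addr_modes > 0);
    /* every (first, second) register pair, once per addressing mode */
    return num_regs * num_regs * addr_modes;
}
```

With the numbers above, `variants_per_opcode(16, 3)` comes to 768 expansions per opcode, before the floating point registers are counted at all.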
jnthn | On "will never lead to much better performance than what the interpreter can do" - it would still be quite a lot better due to (1) not needing to do any instruction decoding, (2) not needing to do the switch thing, (3) do way better on cache and branch prediction. | 09:44 | |
brrt | that is fair | ||
jnthn | It depends on overhead. | ||
For ops like set, add, etc. the decoding overhead dominates the operation cost. | |||
brrt | uhuh | 09:45 | |
jnthn | And since spesh turns many attribute accesses into cheap pointer operations, those are dominated too | ||
Same with sp_getarg_* | |||
And arg_* | |||
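The overhead jnthn lists can be seen in a toy interpreter. The sketch below is an assumption-laden illustration in C, not MoarVM code: the opcodes, the register file, and both functions are invented. It contrasts a switch-based loop, which fetches, dispatches and decodes operands for every op, with the straight-line code a naive per-instruction JIT would effectively produce for one fixed program.

```c
#include <assert.h>
#include <stdint.h>

enum { TOY_SET, TOY_ADD, TOY_HALT };   /* toy opcodes, not MoarVM's */

/* Interpreter: every op pays for a fetch, a switch, and operand decoding. */
static void toy_interp(const uint16_t *pc, int64_t *reg) {
    for (;;) {
        switch (*pc++) {                          /* dispatch */
        case TOY_SET: {
            uint16_t dst = *pc++, src = *pc++;    /* decode operands */
            reg[dst] = reg[src];                  /* the actual work */
            break;
        }
        case TOY_ADD: {
            uint16_t dst = *pc++, a = *pc++, b = *pc++;
            reg[dst] = reg[a] + reg[b];
            break;
        }
        case TOY_HALT:
            return;
        default:
            assert(0);                            /* unknown opcode */
            return;
        }
    }
}

/* What a naive JIT amounts to for { SET r1,r0; ADD r2,r1,r0; HALT }:
 * the operand indices are baked in, so only the work itself remains. */
static void toy_jitted(int64_t *reg) {
    reg[1] = reg[0];
    reg[2] = reg[1] + reg[0];
}
```

For ops this cheap, the dispatch and decode lines in `toy_interp` outnumber the single line of real work, which is the point being made about `set` and `add`.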
brrt | well, for compiling instruction-per-instruction, i don't need a jit tree | ||
the naive route is admittedly tremendously simpler | 09:46 | ||
i.e. it /could/ be feasible to take interp.c, have gcc compile it to assembler, do some transformations to reduce the indirection caused by operand decoding, and stitch it together in a dasc file | 09:47 | ||
jnthn | Well, you don't *need* one, but (a) having one - even a basic one - will provide a way to move beyond that approach in the future, and (b) the "this op is just this C function call" decision has to be made once and that knowledge can be re-used over different CPUs. | 09:48 | |
brrt | hmm, yes, thats true | 09:51 | |
in case it wasn't apparent, my idea in the jit tree was to take all indirections out into the open and use tree rewriting to write the code | 09:52 | ||
i can suppose it wasn't, it was an unclear post i think | |||
jnthn | Yeah, I can see that may be a step too far without the dynamic register stuff... | 09:54 | |
brrt | to rephrase what i really meant, is: making indirections visible will help nothing in code generation if i can't select registers dynamically | 09:55 | |
that, yes | |||
:-) | |||
ultimately the optimum would be patching dynasm | 09:56 | ||
the 'near peak' is static register allocation and perhaps stuff like operation merging (peephole-style transformations) | 09:57 | ||
i ... think we all agree we should go for the near peak first | |||
FROGGS | Please send general questions to the LuaJIT mailing list. You can also send any questions you have directly to me: | ||
E-Mail: [email@hidden.address] | |||
brrt: I suppose a small chat with Michael Pall makes sense here | 09:58 | ||
jnthn | brrt: Yes, I think so. | ||
brrt | i suppose so too FROGGS :-) | ||
jnthn | Well, yeah, his estimate on the amount of work needed to patch dynasm to support dynamic registers may be worth having | 09:59 | |
Like, is it an O(day), O(week), etc task. | |||
brrt | ok, i'll mail him for that :-) | ||
jnthn | But yeah, that aside, near peak working would still be a very meaningful improvement. | 10:04 | |
brrt | sent, i'll see what i hear | 10:09 | |
brrt &lunch | |||
10:34
brrt joined
brrt | is having tea at 27 degrees c mad? | 10:41 | |
jnthn | Yes. Tea should either be ice-tea and cold, or not ice tea and thus much hotter. | ||
Oh wait, you didn't mean the temperature of the tea... | 10:42 | ||
I'd say it's mad but I'm sipping coffee in similar temperatures here in Sweden :) | |||
brrt | :-) i've had coffee too, but i've run out of it | 10:43 | |
jnthn | Which also seemed like a good idea at the time... :) | ||
brrt | why so hot? and to think i have to get out, too | 10:44 | |
jnthn | It's even hot in Sweden, dammit. | 10:47 | |
Well, this bit of Sweden. | |||
Clearly I need to find a further north place to move ot. | |||
*to | |||
nwc10 | jnthn: you wouldn't like Vienna | 10:52 | |
jnthn | nwc10: I used to live an hour away from there, and am well aware it can touch 35C or so in those parts in the summer. | 10:53 | |
nwc10 | I suspect that the swimming is better here. Lots of bonus bits of Danube | 10:54 | |
jnthn | True :) | ||
My bit had the cheaper beer, though. :) | 10:59 | ||
brrt | wow, vienna is 33 degrees | ||
jnthn | Which is another cooling strategy :) | ||
brrt | (according to the google) | ||
jnthn | Prague, another city I sometimes feel a vague temptation to move to, is hitting 32 today also. | 11:01 | |
I guess if I want to do central Europe again I'll just have to learn to cope with heat. Or find a place with aircon :P | |||
brrt | or flee the summer | ||
jnthn | Or that, yes. | ||
Summer home in Svalbard or something. :) | |||
nwc10 | if Scotland votes for independence, will it qualify for your grand tour of S? | 11:02 | |
brrt | nice | ||
jnthn | .oO( Summer in a place you can bear ) | ||
nwc10: hah, I'd not thought of that. But yes. :P | |||
brrt wonders what is up with all those regions wanting to be states | 11:03 | ||
moritz | well, Scotland has a long and bloody history of that | 11:06 | |
when you consider that, the proceedings are the most civilized of all the recent ones | |||
brrt | that is true | 11:12 | |
moritz | also my impression of Scotland (from 2006/2007) was that they love to hate the English, but that they love other stuff even more, like the English health care system, the common currency, and similar stuff | 11:14 | |
FROGGS | brrt: can you join #luajit please? | 11:18 | |
brrt | yes, of course | ||
brrt wonders why he hadn't thought of that | |||
helpful discussion at #luajit indeed | 12:12 | ||
they suggest using llvm as a codegen | |||
not sure if that helps all that much | |||
nwc10 | it gets you a lot more platforms | 12:13 | |
brrt | that is true | ||
nwc10 | it costs you, um, a need for C++ | ||
brrt | among other things :-) | 12:14 | |
nwc10 | and possibly more resources in other places | ||
and yes, I'm not sure | |||
brrt | people typically use a special thread just for llvm | ||
nwc10 | but all the cool kids are using llvm currently, so you get a lot of other people's work for free | ||
brrt | at the cost of magic | 12:15 | |
(yes, my mood has a cynical turn today, i'm sorry, i'll be more optimistic some other day) | |||
(when it is cooler, probably) | |||
nwc10 | or when it's a public holiday again? :-) | ||
brrt | well, i don't really care for public holidays right now as i'm totally freeeeee :-) | 12:16 | |
for the summer | |||
anyway, &appointment | 12:18 | ||
12:21
brrt left
12:23
jnap joined
lizmat | .oO( totally freeee != free from appointments ) | 12:39 | |
jnthn | .oO( or writing a JIT compiler :P ) | ||
I really don't think dragging in llvm is the way to go. Heck, it's huge... And I really don't want to have to deal with C++. | 12:40 | ||
I'm not quite sure why "build the best imaginable JIT" is becoming a goal. We don't have one *at all* yet. Dealing with all the places in the VM that aren't JIT-ready is going to be tough enough. | 12:42 | ||
Getting deopt+JIT figured out for a simple one will be hard enough too. | |||
And code-gen is just one small part of the overall performance story. | 12:43 | ||
More of the kinds of things spesh does, plus inlining, OSR, and escape analysis are also huge factors. | |||
What most VMs call their JIT is actually what we're calling spesh+JIT, which obscures things a bit. Eliminating the interpreter overhead *in combination* with the other things spesh does/will do (noting many of the transforms it does are turning costly operations into cheap ones that suffer interp overhead more) will help a good bit already. | 12:46 | ||
nwc10 | jnthn: meaning that LLVM is the wrong choice in the near term, because there's a lot more win to be had from a KISS approach to the actual codegen part, because that's not actually where the hard stuff is? | 12:57 | |
and maybe the wrong choice for the foreseeable future, not just the near term | |||
jnthn | nwc10: Well, I'm a lot more confident we'll get something viable and helpful (as in, actually lets people run Perl 6 programs faster than they can today) with such an approach to the code-gen itself, yes. | 13:08 | |
nwc10: For the longer term: harder to say. But I don't think we can work out the right longer-term options without doing the simple things first. | 13:09 | ||
nwc10 | that was what I'd sort of figured, but didn't say explicitly | 13:11 | |
jnthn | Code-gen can be a big and complex area, but trying to do that in a highly clever way *and* having to tackle all the integration points between code-gen and the rest of the VM feels like a lot to bite off in a summer. | 13:12 | |
nwc10 | that figuring out how to use llvm well needs the experience of starting with something simpler. | ||
jnthn | (As in, more than I think is reasonably possible.) | ||
And I'm quite sure our downstream users would much prefer a sufficiently complete and useful JIT today that gives a 2x-3x improvement by eliminating interpreter overhead, than a half-complete but will-be-super-awesome-someday JIT. | 13:13 | ||
The additional factors will come from other opts, like inlining, and escape analysis, and more specialization, which turn costly operations into cheap ones. | 13:15 | ||
(like, turning "invoke this method, creating it a callframe" into "goto" 'cus we've now gone and inlined that method) | 13:16 | ||
13:16
donaldh joined
13:17
donaldh left
13:39
ggoebel111116 joined
13:55
ggoebel111116 joined
14:56
brrt joined
brrt | lizmat: doctors appointments trump everything, i'm afraid :-) | 14:56 | |
jnthn, what happens if i use the mvm jit graph construction to do the following: linearize the bytecode stream, assign labels, convert some ops into c calls | 15:06 | ||
anyway, i have a more important issue to solve first | 15:07 | ||
'where to insert the jit code and have it called' | 15:08 | ||
and fwiw, i agree on llvm being far too big | 15:09 | ||
jnthn | brrt: (jit graph) that sounds fairly sane as a starting point. Can desugar some ops a bit, I'd guess. | 15:12 | |
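The starting point brrt proposes (linearize the bytecode, assign labels, turn some ops into C calls) could be sketched roughly as follows. Everything here is hypothetical: the type and function names are invented, and the rule that opcodes below 100 are "simple" is an arbitrary stand-in for whatever classification the real JIT graph would use.

```c
#include <assert.h>
#include <stddef.h>

/* All names are illustrative; none of this is MoarVM's actual API. */
typedef enum { JG_LABEL, JG_INLINE_OP, JG_C_CALL } JGKind;

typedef struct {
    JGKind kind;
    int    value;        /* label number, or the opcode emitted/called */
} JGNode;

typedef struct {
    const int *ops;      /* opcodes of one basic block, in order */
    size_t     num_ops;
} JGBlock;

/* Assumption for the sketch: opcodes below 100 are simple enough to emit
 * inline; everything else becomes a call to a C function. */
static int jg_op_is_simple(int op) { return op < 100; }

/* Flatten basic blocks into one node list: a label per block, then one
 * node per op. Returns the number of nodes written to out. */
static size_t jg_linearize(const JGBlock *bbs, size_t num_bbs,
                           JGNode *out, size_t cap) {
    size_t n = 0;
    for (size_t i = 0; i < num_bbs; i++) {
        assert(n < cap);
        out[n].kind  = JG_LABEL;
        out[n].value = (int)i;
        n++;
        for (size_t j = 0; j < bbs[i].num_ops; j++) {
            int op = bbs[i].ops[j];
            assert(n < cap);
            out[n].kind  = jg_op_is_simple(op) ? JG_INLINE_OP : JG_C_CALL;
            out[n].value = op;
            n++;
        }
    }
    return n;
}
```

A flat list like this needs no tree rewriting, which fits the "naive route first" consensus from earlier in the day; the node kinds leave room to desugar ops later.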
brrt: On "where to put it to have it called" - yes, that's interesting indeed. I wrote up a gist on some ideas on that a while back :) | 15:13 | ||
brrt | if you have it, i'd like to read it | 15:14 | |
jnthn | Oh noes, you losted it :P | ||
jnthn goes to find it | |||
gist.github.com/jnthn/c1b88756121f0525ff28 | 15:15 | ||
brrt | the secret jit op, that solution, i recall | 15:16 | |
jnthn | yeah | ||
And hang the generated code off MVMSpeshCandidate | |||
brrt | not on the frame? | 15:18 | |
jnthn | Oh, two levels hee | 15:19 | |
*here | |||
Well, actually | |||
Yeah, off the candidate would do | |||
Why? Because cur_frame->spesh_cand->jitted will do it | |||
brrt | :-) ok | 15:22 | |
so a whole frame has to be compiled right? | |||
i think you need a whole frame to form a graph | |||
jnthn | Yes | 15:24 | |
1 spesh graph = 1 frame | |||
Well, apart from I'm busy ruining that in a branch called inline | |||
But even so, the difference doesn't matter to you even then - except deopt. :) | |||
brrt | after inline, it will reform to a single frame, so i don't care | ||
oh that is true | |||
jnthn | Well, in deopt we call back into the interpreter | 15:25 | |
Oh, I worked out how you can do it really easily :) | |||
You'll just call into the deopt code path and explicitly pass it the deopt-point index and then fall out of the JITted code back into the interpreter, and deopt will have stuck it in the right place. :) | |||
And the deopt point index hangs off a spesh annotation. | 15:26 | ||
So it's readily available. | |||
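The deopt scheme jnthn describes can be pictured with a small model. This is a sketch under stated assumptions, not the MoarVM deopt code: the frame layout, the index-to-pc table, and both function names are invented. It shows JITted code (written here as plain C) handing a known deopt index to the deopt path and then signalling that control must fall back to the interpreter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much simplified frame: a table mapping deopt index to the
 * pc in the unoptimized bytecode, plus the pc the interpreter resumes at. */
typedef struct {
    const size_t *deopt_pcs;
    size_t        num_deopts;
    size_t        interp_pc;
} DeoptFrame;

/* Stand-in for the real deopt path: point the interpreter at the
 * unoptimized code belonging to this deopt point. */
static void sketch_deopt_one(DeoptFrame *f, size_t deopt_idx) {
    assert(deopt_idx < f->num_deopts);
    f->interp_pc = f->deopt_pcs[deopt_idx];
}

/* What the JIT would emit around a guard, written as C. Returns 1 to keep
 * running JITted code, 0 to fall out into the interpreter. */
static int sketch_jitted_guard(DeoptFrame *f, int guard_holds,
                               size_t deopt_idx) {
    if (guard_holds)
        return 1;
    sketch_deopt_one(f, deopt_idx);  /* index is known at compile time */
    return 0;
}
```

Because the index hangs off a spesh annotation, the JIT can bake it into the generated call as a constant, which is what the `deopt_idx` parameter stands for here.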
brrt | i think things are starting to come together | ||
because i've also figured out pretty much how to emit a 'call-to-interpreter-and-return-here' scheme | 15:27 | ||
basically, emit a c call to MVM_frame_invoke() or something like that | |||
that sets up the interpreter | 15:28 | ||
also emit a label | |||
return the reference to that label | |||
and on the next entry, pass the same label again | |||
we might not return it but rather store it in a frame variable / work register | |||
that might even be better | 15:29 | ||
jnthn | Yeah, that sounds along the right lines. | ||
I think that what we actually want is a MVM_frame_invoke_from_jit() | |||
brrt | that's quite alright to me too | 15:30 | |
that is indeed probably better | |||
jnthn | Where if you're calling to another thing that happens to be JITTed, it returns you an address of the JITted code to jump to | ||
brrt | because you can have that function receive the label | ||
jnthn | And if it returns NULL you know you have to fall back to the interpreter. | ||
It *does* mean you'll be essentially doing a tail call | |||
Return needs the same thing, fwiw. Apart from in the return to JITted code from JITted code case it's just jumping to the label, like you said. | 15:31 | ||
And if it returns NULL then again, you know to fall back into the interpreter. | |||
brrt | uhuh | ||
jnthn | Which leaves the fourth quadrant - interpreted code returning to JITted code - which just gets pointed to some (maybe different) enter JIT op. | 15:32 | |
brrt | yes | ||
we might need to add a flag to the frame, to tell the interpreter that it is running jitted code | |||
jnthn | We might, though we can also infer it from f->spesh_cand->jitcode not being NULL, maybe, and then just hide it behind a function. | 15:33 | |
brrt | that works, too | 15:34 | |
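The two ideas from this exchange, an invoke helper that returns either a code address or NULL, and an "is this frame JITted" test hidden behind a function, might look roughly like this. It is a sketch only: `MVM_frame_invoke_from_jit()` was merely proposed above, so the functions below are invented stand-ins with made-up types, and `sketch_dummy_code` is a placeholder for a chunk of generated machine code.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*SketchJitCode)(void);

typedef struct { SketchJitCode jitcode; } SketchCandidate;
typedef struct { SketchCandidate *spesh_cand; } SketchFrame;

/* Placeholder for generated machine code. */
static void sketch_dummy_code(void) { }

/* Infer "running JITted code" from the candidate instead of a frame flag. */
static int sketch_frame_is_jitted(const SketchFrame *f) {
    assert(f != NULL);
    return f->spesh_cand != NULL && f->spesh_cand->jitcode != NULL;
}

/* Stand-in for the proposed invoke-from-JIT helper: the address to jump
 * to if the callee is JITted, NULL if the caller must fall back to the
 * interpreter. */
static SketchJitCode sketch_invoke_from_jit(const SketchFrame *callee) {
    return sketch_frame_is_jitted(callee) ? callee->spesh_cand->jitcode
                                          : NULL;
}
```

The same NULL-or-address convention would serve for returns, which covers three of the four quadrants; the fourth (interpreted code returning into JITted code) goes through an enter-JIT op instead.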
ok, i'm going to write some stuff down now | 15:35 | ||
brrt feels like this is the most productive discussion he has had all day | |||
jnthn | :) | 15:44 | |
Aye, well, it seems things are less blocked now :) | |||
brrt | yes, that is how it feels to me too | 15:59 | |
some branches of the tree of possibilities have been culled | |||
dalek | MoarVM/moar-jit: 9aefa41 | (Bart Wiegmans)++ | src/ (5 files): Used macros and type maps to clean up jit codegen. Minor change, but I've also moved MVMJitCode to types.h so that MVMSpeshCandidate can now refer to it (doesn't yet). | 16:10 | |
16:15
FROGGS joined
16:36
jnap joined
16:38
jnap1 joined
16:55
vendethiel joined
dalek | MoarVM/inline: e738af3 | jnthn++ | src/spesh/deopt.c: Implement multi-frame uninlining. | 17:15 | |
MoarVM/inline: 478b111 | jnthn++ | src/spesh/deopt.c: Tweak deopt debugging help. Could do something more formal that this, like write to the spesh log, or start keeping a deopt log. | 17:16 | ||
17:16
brrt joined
17:18
brrt left
jnthn | With that, first 2 NQP tests are passing again :) | 17:32 | |
Seems the next issue is missing spesh slot values..hm. | 17:44 | ||
nwc10 | jnthn: just to check - is your fridge big enough for all the beer you deserve after you get this working? | 17:46 | |
TimToady | fortunately, deserved beer can be lazily evaluated much of the time | 17:47 | |
jnthn | :) | 17:54 | |
Well, the fridge is big enough to hold the www.ratebeer.com/beer/great-divide-...out/85174/ that should be a sufficient reward. :) | 17:55 | ||
nwc10 | jnthn: origin/inline && master && nom not broken. | 18:06 | |
jnthn | Darn, when I fix the speshslot bug the first 2 NQP tests hang. wat. | 18:13 | |
BB 6: | 18:17 | ||
Instructions: | |||
...blah... | |||
goto BB(6) | |||
...that may explain it. wtf :) | |||
Well, dinner first, but it's a case where we inlined 3 things, but one of those things already had something inlined into it. | 18:20 | ||
FROGGS | ewww | 18:21 | |
jnthn | Well, it's an *awesome* thing to be doing. I just checked into it. | 18:22 | |
Will debug why it's going wrong after food, but just to share what it's trying to do here, this is append: | |||
method append(MAST::InstructionList $other) { | |||
push_ilist(@!instructions, $other); | |||
$!result_reg := $other.result_reg; | 18:23 | ||
$!result_kind := $other.result_kind; | |||
} | |||
It figures it can inline all of result_reg, result_kind, and push_ilist | |||
The latter two are trivial 'cus they just read attributes | |||
push_ilist is as follows: | |||
sub push_ilist(@dest, $src) is export { nqp::splice(@dest, $src.instructions, +@dest, 0); | |||
} | |||
Which is tiny, so again it makes sense to inline. But notice it calls .instructions - another cheap accessor. | |||
FROGGS | nice | 18:24 | |
jnthn | And earlier in the log we find: | ||
Can inline instructions (cuid_31_1402147152.67041) into push_ilist (cuid_57_1402147152.67041) | |||
So basically it flattens away 4 call frame creations once it's doing it right. Which is very cool. | 18:25 | ||
Now I just need to work out why the push_ilist nested inline comes out looking utterly broken. | |||
But I'm hungry. So, after noms :) | |||
FROGGS | nom well :o) | 18:26 | |
nwc10 | jnthn: nice. | 18:37 | |
jnthn | [Annotation: INS Deopt All (idx 1 -> pc 52)] | 20:06 | |
goto BB(3) | |||
[Annotation: INS Deopt One (idx 0 -> pc 52)] | |||
sp_guardconc r2(2), <nyi(lit)> | |||
Successors: 2 | |||
oh dear. :) | |||
FROGGS | <nyi(lit)>? | 20:07 | |
jnthn | no | ||
That the goto has something after it | |||
Oh, hmm...but that maybe isn't actually it. | |||
oh, yeah, it's wrong. | |||
Not sure if it's *the* issue, but it's pretty clear the goto should end the basic block :) | 20:08 | ||
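The invariant jnthn spots being violated, that a goto must be the last instruction of its basic block, is easy to state as a check. A minimal sketch with invented types (this is not the spesh graph's real layout):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified instruction and basic block. */
typedef struct CheckIns {
    int              is_goto;
    struct CheckIns *next;
} CheckIns;

typedef struct { CheckIns *first_ins; } CheckBB;

/* Returns 1 if no instruction follows a goto in this block, else 0. */
static int bb_goto_is_last(const CheckBB *bb) {
    assert(bb != NULL);
    for (const CheckIns *ins = bb->first_ins; ins != NULL; ins = ins->next)
        if (ins->is_goto && ins->next != NULL)
            return 0;
    return 1;
}
```

Run over the dump above, a check of this shape would flag the block where `sp_guardconc` follows `goto BB(3)`.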
20:11
woolfy1 joined
20:13
lue joined
20:29
oetiker joined
20:33
nwc10 joined,
xiaomiao joined,
cxreg joined,
ashleydev joined
dalek | MoarVM/moar-jit: be6071c | (Bart Wiegmans)++ | src/ (4 files): Try JIT compilation in candidate specialization. This adds a JITCode field to MVMSpeshCandidate, and if applicable compiles the jit graph. The jitcode_size field is for when we want to remove the code. | 20:37 | |
20:58
lizmat_ joined
21:00
woolfy joined
dalek | MoarVM/inline: ba6b4e3 | jnthn++ | src/spesh/ (2 files): Don't put sp_log/guard in same BB as its invoke. This means that the inovke would fail to be the last thing in the BB, but worse meant that we'd end up not checking the return guard when doing an inline. | 21:03 | |
MoarVM/inline: 3472f4d | jnthn++ | src/spesh/graph.c: Make sure spesh slots are conveyed from inlinee. | |||
21:03
brrt joined
jnthn | Sadly, doesn't fix the issue, but would run into it soon enough anyway, so still worth fixing. | ||
brrt | what is 'the issue' here? :-) | 21:04 | |
21:06
donaldh joined
jnthn | brrt: There's some case where we inline a thing that also carried inlines | 21:07 | |
brrt: And end up messing up the graph. | 21:08 | ||
brrt | oh, hasty | ||
nasty | |||
if you have brainspace left, i've been meaning to ask | 21:09 | ||
where do we enter the jit code? | 21:10 | ||
(i know i asked this before) | |||
wait | |||
nm... i didn't ask it again, it just seems that way | 21:11 | ||
:-) | |||
jnthn | Well, in the end by executing a magical op like | 21:12 | |
MVM_OP_jit_enter: | 21:13 | ||
tc->cur_frame->spesh_cand->jitcode(tc, tc->cur_frame); | |||
goto NEXT; | |||
And invoke just points the interp there instead of at the specialized bytecode. | |||
brrt | yes, and i just recalled how that should work :-) so that is why it was a dumb question | 21:14 | |
jnthn | :) | ||
I was thinking...I thought we did that ;) | |||
brrt | actually, the 'invoke points the interp there' was the part that was missing from my thought process | ||
jnthn | ah | ||
brrt | so thanks for filling that up :-) | ||
jnthn | See MVM_frame_invoke | ||
brrt | possibly i could create a static bytecode segment that simply runs MVM_OP_jit_invoke | 21:15 | |
jnthn | Yeah, that'd do it | ||
When you add jit_invoke in oplist, make sure to mark it .s | 21:16 | ||
And put it at the end of the file | |||
That way it can't appear in normal code | |||
(.s means "spesh op") | |||
(and after editing oplist, perl6 tools/update_ops.p6) | 21:17 | ||
brrt thinks we should add that to the makefile | |||
jnthn | I didn't because I never know quite which perl6 I'm gonna need to run it with. | ||
brrt | fair enough :-) | 21:18 | |
jnthn | I've had situations where my install/bin/moar was busted enough I had to use perl6-p instead :) | ||
(yes, I normally use bleeding-edge Moar to re-generate the Moar ops stuff :P) | |||
brrt | and why not? dogfood is nutritious | ||
jnthn | and tasty...wait... :P | 21:19 | |
brrt | :-) | 21:20 | |
jnthn goes for a stroll before it gets completely dark | |||
timotimo | it's already completely dark here :| | 21:21 | |
jnthn | maybe I'll magically realize what's bust during it :) | ||
timotimo | that would be nice :) | ||
brrt wishes a good stroll | |||
(ugh, i should teach emacs not to use tabs in whitespace) | 21:22 | ||
lizmat | yes, please, no TABs :-) | 21:29 | |
timotimo | put it on my tab | ||
21:39
woolfy1 joined
brrt will have to check out the many cases of tabs emacs has inserted by now | 21:39 | ||
dalek | MoarVM/moar-jit: e46bde6 | (Bart Wiegmans)++ | / (10 files): Add magic bytecode to invoke the JIT compiled function. During specialization, if we can compile the jit graph into a function, we now insert special a special opecode that will actually invoke that function. | 21:46 | |
brrt | that's it for today | ||
brrt off | |||
21:47
brrt left
jnthn back | 21:56 | ||
I'm guessing it could be dead instruction cleanup gone wild... | 22:00 | ||
oh...the successors aren't updated correctly. | 22:07 | ||
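Keeping successor and predecessor lists in step is the kind of bookkeeping at stake here. As a rough illustration only (fixed-size edge arrays and invented names, nothing taken from `src/spesh/inline.c`), retargeting an edge has to touch three places: the source's successor list and both targets' predecessor lists.

```c
#include <assert.h>
#include <stddef.h>

#define SK_MAX_EDGES 4

/* Hypothetical basic block with small fixed-size edge lists. */
typedef struct EdgeBB {
    struct EdgeBB *succ[SK_MAX_EDGES];
    size_t         num_succ;
    struct EdgeBB *pred[SK_MAX_EDGES];
    size_t         num_pred;
} EdgeBB;

/* Drop p from bb's predecessor list, if present (order not preserved). */
static void edge_remove_pred(EdgeBB *bb, const EdgeBB *p) {
    for (size_t i = 0; i < bb->num_pred; i++) {
        if (bb->pred[i] == p) {
            bb->pred[i] = bb->pred[--bb->num_pred];
            return;
        }
    }
}

static void edge_add_pred(EdgeBB *bb, EdgeBB *p) {
    assert(bb->num_pred < SK_MAX_EDGES);
    bb->pred[bb->num_pred++] = p;
}

/* Redirect the edge from -> old_to so that it goes from -> new_to,
 * updating the predecessor lists on both ends. */
static void edge_retarget(EdgeBB *from, EdgeBB *old_to, EdgeBB *new_to) {
    for (size_t i = 0; i < from->num_succ; i++) {
        if (from->succ[i] == old_to) {
            from->succ[i] = new_to;
            edge_remove_pred(old_to, from);
            edge_add_pred(new_to, from);
        }
    }
}
```

Forgetting either predecessor update leaves a block that looks reachable (or unreachable) when it is not, which is one way dead-code cleanup can end up removing the wrong thing.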
22:18
lizmat joined
lizmat | Mouq: hmmm... "use" is compile time: does it make sense to assume a directory will appear out of thin air during runtime? | 22:19 | |
and if so, wouldn't that be a serious potential security hole | |||
jnthn | lizmat: mischan? | 22:20 | |
lizmat | mischan ? | 22:21 | |
oops, yes | 22:22 | ||
timotimo | jnthn: does that indirectly cause the goto to have code after it still? | 22:31 | |
jnthn | timotimo: Already fixed that kind of bug earlier :) | 22:35 | |
timotimo | ah ok | 22:43 | |
ah, i see that now | |||
the jit stuff can now call into a "jit compiled function", but no compilation is done so far, right? | 22:45 | ||
jnthn didn't read the patches yet.. | 22:48 | ||
Ah...seems the reason it was still busted after I fixed the succ/pred setting was that there was a leftover hack... | 22:52 | ||
yup. | 22:56 | ||
dalek | MoarVM/inline: 40042be | jnthn++ | src/spesh/inline.c: Properly fix up BB succ/pred. Also remove a hack that kept things falsely alive before. | 23:01 | |
jnthn | Some more NQP tests passing again now. | 23:08 | |
Still a good bunch of failures to deal with. | 23:09 | ||
Time for some rest...'night | 23:28 | ||
timotimo | gnite jnthn :) | 23:29 |