00:47
jnap joined
01:34
FROGGS_ joined
04:33
FROGGS_ joined
05:15
FROGGS_ joined
06:07
oetiker joined,
oetiker_ joined
06:08
_sri joined
06:31
brrt joined
brrt | i'm still in dumb-question mode this morning, but | 06:42 | |
an MVMCallSite is made by the caller, and has to be processed by the callee, right? | 06:43 | ||
so the callee has to be 'interpretative' of the MVMCallSite | |||
would it be plausible / possible at all to have the callee supply the MVMCallSite and have the caller build a 'correct' one? | 06:44 | ||
bbi30 | |||
07:08
zakharyas joined
07:59
oetiker joined,
oetiker_ joined
08:07
FROGGS_ joined
jnthn | brrt: Callsites are static (and even interned) rather than made, the exception being when we flatten. | 08:13 | |
So the caller has nothing to build. | 08:14 | ||
Also, spesh actually does specialization partly by the shape of the incoming callsite. | 08:15 | ||
So once you're looking at a spesh'd thing, in the common cases today things like checkarity and coercions are gone. | 08:16 | ||
uh, checking if we need coercion, even. | |||
brrt | ok, but they're still a caller-supplied thing, right? | 08:19 | |
i sometimes feel as if moarvm is overwhelmingly large | 08:23 | ||
jnthn | Yes, a call sequence is | 08:24 | |
prepargs <callsite> | 08:25 | ||
arg instructions | |||
invoke <invokee> | |||
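A sketch of that sequence for a hypothetical call like foo(42, $obj); the opcode names follow the prepargs/arg/invoke shape described above, but the exact spellings, register numbers, and callsite label are illustrative, not taken from a real compile:

```
prepargs callsite_0     # callsite_0: static, interned, describes (int, obj)
arg_i    0, loc_2       # positional 0: native int from local 2
arg_o    1, loc_3       # positional 1: object from local 3
invoke_o loc_4, loc_1   # invoke code object in loc_1, result into loc_4
```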
brrt | ok | ||
i'll write that down | |||
it's /probably/ a good idea to make a single node of that | 08:32 | ||
jnthn | Anyway, the caller doesn't always know what the callee will be. | ||
Whereas callees know what they want. | |||
brrt | yes, that is true | 08:33 | |
hmmm | |||
does spesh spesh on callsite? i.e. when compiling, can we (after a guard) assume the args to have a certain format? | |||
or is that something that is being worked on | 08:34 | ||
08:34
donaldh joined
brrt afk for an hour | 08:35 | ||
jnthn | Yes | 08:38 | |
A specialization is always callsite + guards | |||
That's why we intern callsites: to make the "does the callsite match" just a pointer comparison. | |||
09:24
brrt joined
brrt | ok, i get it :-) | 09:30 | |
dalek | arVM/moar-jit: d0bff6d | (Bart Wiegmans)++ | / (2 files): Add jit.h headers to installation, so including moar.h works when installed. |
09:34 | |
timotimo | so i'm thinking about strength reduction again | 09:36 | |
doing a whole bunch of iterations of * 8 and then div 8 is only marginally slower than +< 3 and then +> 3 | |||
but * 8 and / 8 is about 10x slower; unsurprisingly, as / gives us rational number semantics | 09:37 | ||
i'm not sure how to make such an optimization work properly, since it requires an integer value to be present, rather than a rat | |||
and strength reduction at the bytecode level (or in spesh) would probably only hit places where we have already-int-typed stuff :\ | 09:38 | ||
at least, if the $i is int-typed, it's a bit faster than without it, but it requires "div" explicitly again | 09:40 | ||
oh, wow, with a native-typed int, +< 3 and +> 3 is more than 2x faster than * 8 and div 8 | 09:45 | ||
jnthn | Doing it with native types is probably a good place to start | 09:47 | |
timotimo | aye, it also seems to be worth it | 09:49 | |
would you prefer i do it in spesh or in optimizer? | |||
i wonder how often we have a known int value in spesh that we don't have in the optimizer | 09:51 | ||
brrt lunching | 10:09 | ||
jnthn | timotimo: Not sure, really... | 10:10 | |
timotimo: Putting it in spesh means we may well get more out of it because other things may reduce to things we can strength-reduce... | 10:11 | ||
brrt | how is the order of optimizations determined, in spesh? | 10:18 | |
jnthn | There's essentially 3 passes: fact addition/propagation, do the type-driven optimizations, do dead instruction elimination. So, just 3 passes at the moment. | 10:20 | |
Oh, and a 4th before all of that which is args-based. | 10:22 | ||
(The one that looks at the callsite) | |||
That's the piece I was patching yesterday. | |||
brrt | ok | 10:30 | |
good for me to know :-) | |||
10:32
donaldh joined
brrt afk | 10:35 | ||
dalek | arVM: cc338be | jnthn++ | src/io/asyncsocket.c: Implement cancelling listening on a socket. |
10:51 | |
arVM: 3a69c1b | jnthn++ | / (3 files): Correct asyncreadbytes op signature. |
11:11 | ||
11:15
brrt joined
brrt | i'm wondering how low the level of a jit graph should be | 11:43 | |
lower is better | |||
but lower is also more work / memory | |||
jnthn, timotimo, i'd like some advice on what should be in the jit graph and how it differs from the spesh graph | 11:44 | ||
i /think/ i'd like to abstract operands to values, but i'm not sure if that is a move 'up' or 'down' as it were | |||
jnthn | brrt: If there's anything I've learned about AST-like things, it's that you tend not to need a huge number of nodes. | 11:45 | |
brrt: In this case, really it's mostly about what kinds of things happen at CPU level. | |||
brrt | ok, but that's much smaller than what happens at the moarvm level | ||
arithmetic, branching, load, store, that's most of it | 11:46 | ||
jnthn | Yup | ||
I think the exception is probably CallCFunction | |||
Which is a bit higher level | |||
brrt | ok so the idea is to have the speshgraph -> jitgraph do the heavy lifting of 'lowering' the code to jittable size | 11:47 | |
jnthn | Yes. The stuff beyond JIT graph is architecture specific knowledge. | ||
If we don't put the heavy lifting in speshgraph -> jitgraph, we'll have to repeat it for other architectures. | 11:48 | ||
So it feels like the right place. | |||
brrt | i agree | ||
dalek | arVM: 6b19b4b | jnthn++ | src/io/asyncsocket.c: Implement async bytes reads from sockets. |
||
brrt | and i'm also thinking of abstracting the architecture-specific stuff into functions called 'jit_arithmetic', 'jit_branch', etc | 11:49 | |
and have a header between the 'high level' jit and the 'architecture' jit | |||
brrt is afk again :-) | 11:51 | ||
12:44
jnap joined
13:26
harrow joined
jnthn | Errands done...time for more work :) | 13:36 | |
nwc10 | work's work, or this work? | ||
jnthn | Moar work :) | 13:38 | |
FROGGS_ | +1 from me then :o) | 13:41 | |
13:41
brrt joined
13:48
btyler joined
brrt has no idea what i'm doing.... | 15:00 | ||
or rather | |||
i don't have a good story wrt memory access yet | |||
not sure whether its necessary | |||
FROGGS_ | memory... what was that again? | 15:01 | |
jnthn | I forgot. | 15:02 | |
brrt: Can you be more specific? | |||
brrt | basically, i have no way to represent 'i want to access this variable' in the jit graph | ||
15:03
oetiker joined
brrt | i want to do clever things with that, but i don't know how to present that yet, because i have no way to declare a 'block' - only single instructions | 15:03 | |
15:04
oetiker_ joined
brrt | which is another way of saying 'my jit graph is much too simplistic' | 15:04 | |
and i /want/ simplistic right now | 15:05 | ||
jnthn | I guess you could have an "access this MVM register" node, but won't that typically be a pointer displacement off the register case? | 15:06 | |
*base? | |||
brrt | yes | 15:07 | |
it will be | |||
(for most things) | 15:08 | ||
jnthn | It *may* be worth conveying "it's a register access" lower down, since on less register-starved architectures we can probably keep ->work in a register for fast indexing off, but maybe we can't afford that on x86... | ||
nwc10 | x86_64 overtook x86 | 15:09 | |
brrt | wiki says 64 general-purpose registers | 15:10 | |
arm iirc has 32, of which only 8 are available at any given time | |||
nwc10 | (at least, that was for debian installs, a year or so back) | ||
32 bit ARM has 16 mapped in, of which (if you push it as an assembler programmer) you can get to 14 in use | |||
and certainly the C compiler should be able to find 12 | 15:11 | ||
brrt | but the ******** thing about x86 registers is that they are /all/ to be considered overwritten after a function call | ||
nwc10 | anyway, my point was "does ia32 matter enough to optimise for *it*?" | 15:12 | |
I would have thought that x86_64 and ARM (possibly ARM v8) are the two that matter currently | |||
v8 is the new 64 bit thingy | |||
timotimo | do ia32 get sold at all? | ||
nwc10 | v7 and earlier are 32 bit | ||
brrt | probably, yes, but not for consumer electronics anymore | 15:13 | |
nwc10 | timotimo: I don't know. I'd guess so, for embedded | ||
oh, what he said | |||
timotimo | OK | ||
embedded isn't our target at the moment | |||
as the empty perl6 program still takes ~123 megabytes of ram ;) | |||
brrt | and if it were we'd target arm | ||
nwc10 | I doubt anyone will be deploying Perl 6 code in revenue earning servers on anything other than x86_64 and ARM | ||
brrt | thats nothing | ||
:-p | |||
timotimo | and the setting is still going to grow | ||
brrt | don't you want to run perl6 on your 1998 powerpc imac? :-p | 15:14 | |
nwc10 | there was "optimising" and there was "it runs at all" | ||
jnthn: can you give an arm-wavy number to what you mean by "register starved" ? | 15:15 | ||
in that, I think it may be reasonable to assume that >90% of everyone who matters is no longer register starved | |||
jnthn | nwc10: x86 32-bit | 15:16 | |
nwc10: But agree it matters less and less. | |||
nwc10 | making me think that for now, go with KISS and assume "a sufficiency of registers" | 15:17 | |
(?) | |||
jnthn | works for me. | ||
brrt | oh, 16 registers on amd64 | 15:18 | |
actually assuming an /insufficiency/ of registers is simpler :-) | |||
because it means that every operation is a sequence of load, op, store | 15:19 | ||
whereas having many registers means you have to wonder where they were last defined etc | |||
jnthn | brrt: Note that you need to do that not only due to register starvation, but also because deopt. | ||
brrt | true, but ultimately only when deopting | 15:20 | |
jnthn | brrt: Unless you can prove you're in a no-deopt zone. :) | ||
brrt | i.e. i want to move to the situation where we /only/ store (spill) registers when deopt is in effect | ||
jnthn | Sure. But one of the things we have to be most careful with is that when you fall out of optimized code, enough is in place that the deopt'd code has what it needs. | ||
This works for deopt_one (you know where they are and it's local), a bit less so for deopt_all. | 15:21 | ||
But those are only really on invokes. | |||
brrt | hmm | ||
i haven't looked at the deopt ops enough yet | 15:22 | ||
i'm off for dinner, be back in an hour (or two) | |||
jnthn | k :) | ||
dalek | arVM/moar-jit: 2a500f4 | (Bart Wiegmans)++ | src/ (5 files): Very simplistic and naive jit graph building. Which is kind of the purpose. |
15:23 | |
arVM: ba1937b | jnthn++ | / (13 files): Resolve "which spesh cand" statically if possible. This means we don't need to check the guards if we have enough to prove they will always be met. |
15:24 | ||
timotimo | oh cool | 15:38 | |
coming close to inlining it seems? | 15:39 | ||
less guards is more better :) | |||
can also mean less deopt points later on, eh? | 15:40 | ||
jnthn | true | ||
[Coke] | brrt: no, I never ever ever want to run perl6 on a powerpc imac. | ||
TimToady | what if you place your phone on the imac? | 15:41 | |
jnthn | timotimo: Well, that's one bit of logic we need for it at least. | 15:45 | |
dalek | arVM: 6b45de6 | jnthn++ | / (6 files): Pick a spesh threshold by bytecode size. This means large things that take more time to spesh will need to do more work to prove they are hot. |
15:58 | |
16:11
FROGGS joined
dalek | arVM/inline: 25680a8 | jnthn++ | / (4 files): Empty stub of can-we-inline check. |
16:35 | |
jnthn | Well, here starts the next round of hard things... :) | 16:36 | |
Dinner first :) | |||
17:44
brrt joined
brrt | ok, my MVMJitValue type is pretty much wrong | 17:50 | |
i /might/ need to represent all forms of floats and integers but i really need to represent literals, pointers, registers, lexicals first | 17:52 | ||
japhb_ | jnthn, How did you pick those new spesh thresholds? Are they based on measurements, analysis of algorithmic complexity, gut feeling, ... ? | 17:54 | |
brrt is almost out of energy :-( | 18:08 | ||
FROGGS | :/ | 18:12 | |
brrt: take a walk, that helps :o) | |||
jnthn | japhb_: Gut feeling, with a glance at the spesh_log afterwards, and a hope that somebody will jump in to do some measuring/tuning. | 18:22 | |
TimToady | could go as far as to have an 'is hot' to spesh it from the getgo | 18:28 | |
[Coke] | I don't think I'd trust anyone crazy enough to write perl 6 with that power. | 18:29 | |
TimToady | that leaves out most of us :) | ||
jnthn | Well, the thing we'd really like it to go inlining are operators. | 18:30 | |
And accessors | |||
And identity methods | 18:31 | ||
All of which are quite tiny :) | |||
18:37
zakharyas joined
18:54
brrt joined
brrt | yes, it does :-) | 18:54 | |
does anyone think it makes a lot of sense to distinguish between lvalues and rvalues? (on the machine level) | 19:02 | ||
except for literals, both lvalues and rvalues can be the same | |||
i.e. register, lexical, pointer | |||
also | |||
... are we always guaranteed to be able to resolve lexicals? | 19:03 | ||
jnthn | Depends which lookup | ||
timotimo | yes | ||
jnthn | The by name ones are very late bound | ||
timotimo | well, yeah | ||
jnthn | The index ones are not. | ||
And should always work out. | 19:04 | ||
timotimo | but regular ones are done at compile time already | ||
jnthn | brrt: Is your JitValue thing a bit like a SpeshOperand? | ||
brrt | ehm... kind-of | ||
jnthn | brrt: Except carrying type info too? | ||
brrt | i'm not sure about carrying type info, although probably yes | 19:05 | |
(i don't seem to be sure of anything today) | |||
jnthn | Well, I saw a slot in there for what kind of value it was. :) | ||
brrt | true, but ehm... that was only literals, and i've come to the conclusion that literals aren't enough | 19:06 | |
jnthn | JitInstruction with an array of JitOperands (probably a better name) could work out. | ||
In Spesh we don't track the type because we have data on the ops | |||
So we know how to interpret the union. | |||
hoelzro | is there a reason certain operations (ie. symlink) are implemented as operations in bytecode instead of simply functions one can call? | 19:08 | |
brrt | ok. but is the distinction between literal / pointer + offset / lexical / register good enough? | ||
calling functions isn't simple? :-p i don't know. for the jit they're functions | |||
jnthn | hoelzro: Because then you need to design a *second* mechanism for dispatching those built-in functions. Why do that? | 19:09 | |
The interp is already perfectly good at dispatching VM-supported operations. | |||
hoelzro | I dunno, I was just curious =) | 19:10 | |
jnthn | :) | ||
brrt: I'd probably stick to "current frame lexicals" as JITting directly and call a function for outer ones for now... | |||
brrt: Though both work, of course. | 19:11 | ||
brrt: I'd say pointer and offset are two operands. | |||
brrt | and.. i'd say that depends on the architecture | 19:12 | |
jnthn | brrt: And it's the instruction specifying that there's an offset there... | ||
brrt | but fair enough | ||
jnthn | brrt: Well, but the tree is there to translate into any arch | ||
brrt | hmm... | ||
i guess what i'm saying is | |||
a 'register' to the jit is just an address from which to load, relative to a base | 19:13 | ||
similarly, a pointer + offset is just a memory location, hopefully a known memory location | 19:14 | ||
hmmm | |||
wait | |||
and i'd like to have instructions whose addressing translates naturally from moar-level to machine-level | |||
if i made the offset to the pointer another operand, i'd have to 'know' that, it wouldn't be a single thing anymore | 19:15 | ||
on the other hand, any actual use of a fixed pointer and offset would indeed be two operands, as they could vary independently | |||
that doesn't make any sense, what i just said | |||
ok, i'm wrong | 19:16 | ||
:-) | |||
basically if i'm accessing a fixed offset from a fixed pointer, then i'd just need a single pointer | |||
if pointer or offset can vary, then they need to be two operands | 19:17 | ||
jnthn | In the common case the pointer comes from a CPU register, and the offset is a constant... | 19:18 | |
19:19
lue joined
TimToady | depends on array vs struct | 19:19 | |
brrt | hmm | ||
jnthn | True, though array access is done inside the REPR funcs at the moment, which we don't yet inline. | 19:25 | |
But yes, later on the array case will matter. | |||
TimToady wants efficient arrays of ints/nums to matter very much :) | 19:27 | ||
jnthn | TimToady: Just looking at it from a "what we'll get most value out of JITting in the next month or two" perspective. | 19:28 | |
So long as we don't box ourself in, we're good :) | |||
The array repr stuff really will want to learn about multi-dim in the end, I think... | |||
19:35
lue joined
brrt | i'm just going to steal ideas, how about that | 19:39 | |
jnthn | I think that's called "research" :) | 19:40 | |
dalek | arVM/inline: 40179c7 | jnthn++ | src/spesh/inline.c: Add inline bytecode size check. |
||
arVM/inline: 3c0ffaf | jnthn++ | src/spesh/graph. (2 files): Make spesh graph build more flexible wrt bytecode. This is a step towards being able to graph a specialized form of the bytecode, which we'll need to do to get a graph to merge while doing inlining. |
|||
arVM/inline: 2d9c6ff | jnthn++ | src/spesh/graph. (2 files): Decouple handler in graph build from original too. If building handlers for spesh'd bytecode for inline, we'll need to use the fixed up handler addresses. |
19:55 | ||
arVM/inline: bd4d7a3 | jnthn++ | src/spesh/ (3 files): Build inline graph for specialized bytecode. |
|||
[Coke] | test | 20:01 | |
%9test | |||
whoops, wrong window. | |||
japhb | Though if your test was to determine if you had network access to the IRC server, then you passed. :-) | 20:02 | |
brrt | wiki.luajit.org/SSA-IR-2.0 is interesting | 20:10 | |
also | |||
they use only a single ir | |||
dalek | arVM/inline: 6df0e6e | jnthn++ | src/core/oplist: Mark some ops a not suitable to inline. |
20:11 | |
arVM/inline: e63acc4 | jnthn++ | tools/update_ops.p6: Fix oplist parser bug. An op with no operands but an adverb was mis-parsed. |
|||
arVM/inline: 4c53a61 | jnthn++ | src/core/ (2 files): Re-generate ops.c, carrying no-inline data. |
|||
jnthn | bbi15 | 20:20 | |
brrt | luajit uses seemingly recursive c preprocessor macros | 20:35 | |
wow | |||
jnthn back | 20:43 | ||
brrt | hi :-) | 20:45 | |
jnthn figures he'll nudge preparations for inlining a little further along. :) | 20:46 | ||
brrt wonders how you have so much energy | 20:48 | ||
lizmat | I think it's called "being in the flow" | 20:49 | |
brrt | well then... i'd like to have that more | 20:50 | |
anyway.. luajit never has pointers, so doesn't have my issue | 20:53 | ||
although its perfectly plausible just to copy their design | |||
(the IR doesn't have pointers, that is) | |||
i'm starting to see why that's clever | 20:54 | ||
jnthn | :) | 20:56 | |
jnthn wonders if it can be replicated | |||
brrt | hmm... probably not | 20:57 | |
or | |||
hmmm | |||
some aspects of it can be replicated, i'm sure | 20:58 | ||
i'm going to sleep again | 21:00 | ||
see you tomorrow | |||
21:01
brrt left
jnthn | 'night o/ | 21:03 | |
dalek | arVM/inline: c0813c1 | jnthn++ | src/spesh/inline.c: Refuse inline if non-inlinable ops encountered. |
21:10 | |
arVM/inline: d502a23 | jnthn++ | src/core/op (2 files): For now, capturelex and takeclosure are :noinline. May be able to relax this later. |
|||
lizmat | gnight jnthn | 21:14 | |
jnthn | lizmat: oh, was saying night to brrt :) | ||
lizmat | ah, ok :-) | 21:15 | |
dalek | arVM/inline: 5e8eb90 | jnthn++ | src/ (3 files): Things with free lexicals can't be inlined. May be able to relax this later with more careful analysis of outer relationships; for now keep things simple. |
21:16 | |
arVM/inline: aeb1f12 | jnthn++ | src/spesh/optimize.c: Try building inline graph; don't use it yet. |
21:35 | ||
jnthn | Well, there's the easy initial analysis bits done. :) | 21:41 | |
Need to work on the graph merge next. That'll be trickier. :) | 21:42 | ||
(So I'll save it for a new day. :)) | 21:47 | ||
23:30
cognominal joined