00:47 jnap joined 01:34 FROGGS_ joined 04:33 FROGGS_ joined 05:15 FROGGS_ joined 06:07 oetiker joined, oetiker_ joined 06:08 _sri joined 06:31 brrt joined
brrt i'm still in dumb-question mode this morning, but 06:42
an MVMCallSite is made by the caller, and has to be processed by the callee, right? 06:43
so the callee has to be 'interpretative' of the MVMCallSite
would it be plausible / possible at all to have the callee supply the MVMCallSite and have the caller build a 'correct' one? 06:44
bbi30
07:08 zakharyas joined 07:59 oetiker joined, oetiker_ joined 08:07 FROGGS_ joined
jnthn brrt: Callsites are static (and even interned) rather than made, the exception being when we flatten. 08:13
So the caller has nothing to build. 08:14
Also, spesh actually does specialization partly by the shape of the incoming callsite. 08:15
So once you're looking at a spesh'd thing, in the common cases today things like checkarity and coercions are gone. 08:16
uh, checking if we need coercion, even.
brrt ok, but they're still a caller-supplied thing, right? 08:19
i sometimes feel as if moarvm is overwhelmingly large 08:23
jnthn Yes, a call sequence is 08:24
prepargs <callsite> 08:25
arg instructions
invoke <invokee>
brrt ok
i'll write that down
it's /probably/ a good idea to make a single node of that 08:32
jnthn Anyway, the caller doesn't always know what the callee will be.
Whereas callees know what they want.
brrt yes, that is true 08:33
hmmm
does spesh spesh on callsite? i.e. when compiling, can we (after a guard) assume the args to have a certain format?
or is that something that is being worked on 08:34
08:34 donaldh joined
brrt afk for an hour 08:35
jnthn Yes 08:38
A specialization is always callsite + guards
That's why we intern callsites: to make the "does the callsite match" just a pointer comparison.
09:24 brrt joined
brrt ok, i get it :-) 09:30
dalek arVM/moar-jit: d0bff6d | (Bart Wiegmans)++ | / (2 files):
Add jit.h headers to installation, so including moar.h works when installed.
09:34
timotimo so i'm thinking about strength reduction again 09:36
doing a whole bunch of iterations of * 8 and then div 8 is only marginally slower than +< 3 and then +> 3
but * 8 and / 8 is about 10x slower; unsurprisingly, as / gives us rational number semantics 09:37
i'm not sure how to make such an optimization work properly, since it requires an integer value to be present, rather than a rat
and strength reduction at the bytecode level (or in spesh) would probably only hit places where we have already-int-typed stuff :\ 09:38
at least, if the $i is int-typed, it's a bit faster than without it, but it requires "div" explicitly again 09:40
oh, wow, with a native-typed int, +< 3 and +> 3 is more than 2x faster than * 8 and div 8 09:45
jnthn Doing it with native types is probably a good place to start 09:47
timotimo aye, it also seems to be worth it 09:49
would you prefer i do it in spesh or in optimizer?
i wonder how often we have a known int value in spesh that we don't have in the optimizer 09:51
brrt lunching 10:09
jnthn timotimo: Not sure, really... 10:10
timotimo: Putting it in spesh means we may well get more out of it because other things may reduce to things we can strength-reduce... 10:11
brrt how is the order of optimizations determined, in spesh? 10:18
jnthn There's essentially 3 passes: fact addition/propagation, do the type-driven optimizations, do dead instruction elimination. So, just 3 passes at the moment. 10:20
Oh, and a 4th before all of that which is args-based. 10:22
(The one that looks at the callsite)
That's the piece I was patching yesterday.
brrt ok 10:30
good for me to know :-)
10:32 donaldh joined
brrt afk 10:35
dalek arVM: cc338be | jnthn++ | src/io/asyncsocket.c:
Implement cancelling listening on a socket.
10:51
arVM: 3a69c1b | jnthn++ | / (3 files):
Correct asyncreadbytes op signature.
11:11
11:15 brrt joined
brrt i'm wondering how low the level of a jit graph should be 11:43
lower is better
but lower is also more work / memory
jnthn, timotimo, i'd like some advice on what should be in the jit graph and how it differs from the spesh graph 11:44
i /think/ i'd like to abstract operands to values, but i'm not sure if that is a move 'up' or 'down' as it were
jnthn brrt: If there's anything I've learned about AST-like things, it's that you tend not to need a huge number of nodes. 11:45
brrt: In this case, really it's mostly about what kinds of things happen at CPU level.
brrt ok, but thats much smaller than what happens at the moarvm level
arithmetic, branching, load, store, thats most of it 11:46
jnthn Yup
I think the exception is probably CallCFunction
Which is a bit higher level
brrt ok so the idea is to have the speshgraph -> jitgraph do the heavy lifting of 'lowering' the code to jittable size 11:47
jnthn Yes. The stuff beyond JIT graph is architecture specific knowledge.
If we don't put the heavy lifting in speshgraph -> jitgraph, we'll have to repeat it for other architectures. 11:48
So it feels like the right place.
brrt i agree
dalek arVM: 6b19b4b | jnthn++ | src/io/asyncsocket.c:
Implement async bytes reads from sockets.
brrt and i'm also thinking of abstracting the architecture-specific stuff into functions called 'jit_arithmetic', 'jit_branch', etc 11:49
and have a header between the 'high level' jit and the 'architecture' jit
brrt is afk again :-) 11:51
12:44 jnap joined 13:26 harrow joined
jnthn Errands done...time for more work :) 13:36
nwc10 work's work, or this work?
jnthn Moar work :) 13:38
FROGGS_ +1 from me then :o) 13:41
13:41 brrt joined 13:48 btyler joined
brrt has no idea what i'm doing.... 15:00
or rather
i don't have a good story wrt memory access yet
not sure whether it's necessary
FROGGS_ memory... what was that again? 15:01
jnthn I forgot. 15:02
brrt: Can you be more specific?
brrt basically, i have no way to represent 'i want to access this variable' in the jit graph
15:03 oetiker joined
brrt i want to do clever things with that, but i don't know how to present that yet, because i have no way to declare a 'block' - only single instructions 15:03
15:04 oetiker_ joined
brrt which is another way of saying 'my jit graph is much too simplistic' 15:04
and i /want/ simplistic right now 15:05
jnthn I guess you could have an "access this MVM register" node, but won't that typically be a pointer displacement off the register case? 15:06
*base?
brrt yes 15:07
it will be
(for most things) 15:08
jnthn It *may* be worth conveying "it's a register access" lower down, since on less register-starved architectures we can probably keep ->work in a register for fast indexing off, but maybe we can't afford that on x86...
nwc10 x86_64 overtook x86 15:09
brrt wiki says 64 general-purpose registers 15:10
arm iirc has 32, of which only 8 are available at any given time
nwc10 (at least, that was for debian installs, a year or so back)
32 bit ARM has 16 mapped in, of which (if you push it as an assembler programmer) you can get to 14 in use
and certainly the C compiler should be able to find 12 15:11
brrt but the ******** thing about x86 registers is that they are /all/ to be considered overwritten after a function call
nwc10 anyway, my point was "does ia32 matter enough to optimise for *it*?" 15:12
I would have thought that x86_64 and ARM (possibly ARM v8) are the two that matter currently
v8 is the new 64 bit thingy
timotimo do ia32 get sold at all?
nwc10 v7 and earlier are 32 bit
brrt probably, yes, but not for consumer electronics anymore 15:13
nwc10 timotimo: I don't know. I'd guess so, for embedded
oh, what he said
timotimo OK
embedded isn't our target at the moment
as the empty perl6 program still takes ~123 megabytes of ram ;)
brrt and if it were we'd target arm
nwc10 I doubt anyone will be deploying Perl 6 code in revenue earning servers on anything other than x86_64 and ARM
brrt that's nothing
:-p
timotimo and the setting is still going to grow
brrt don't you want to run perl6 on your 1998 powerpc imac? :-p 15:14
nwc10 there was "optimising" and there was "it runs at all"
jnthn: can you give an arm-wavy number to what you mean by "register starved" ? 15:15
in that, I think it may be reasonable to assume that >90% of everyone who matters is no longer register starved
jnthn nwc10: x86 32-bit 15:16
nwc10: But agree it matters less and less.
nwc10 making me think that for now, go with KISS and assume "a sufficiency of registers" 15:17
(?)
jnthn works for me.
brrt oh, 16 registers on amd64 15:18
actually assuming an /insufficiency/ of registers is simpler :-)
because it means that every operation is a sequence of load, op, store 15:19
whereas having many registers means you have to wonder where they were last defined etc
jnthn brrt: Note that you need to do that not only due to register starvation, but also because deopt.
brrt true, but ultimately only when deopting 15:20
jnthn brrt: Unless you can prove you're in a no-deopt zone. :)
brrt i.e. i want to move to the situation where we /only/ store (spill) registers when deopt is in effect
jnthn Sure. But one of the things we have to be most careful with is that when you fall out of optimzied code, enough is in place that the deopt'd code has what it needs.
This works for deopt_one (you know where they are and it's local), a bit less so far deopt_all. 15:21
But those are only really on invokes.
brrt hmm
i haven't looked at the deopt ops enough yet 15:22
i'm off for dinner, be back in an hour (or two)
jnthn k :)
dalek arVM/moar-jit: 2a500f4 | (Bart Wiegmans)++ | src/ (5 files):
Very simplistic and naive jit graph building. Which is kind of the purpose.
15:23
arVM: ba1937b | jnthn++ | / (13 files):
Resolve "which spesh cand" statically if possible.

This means we don't need to check the guards if we have enough to prove they will always be met.
15:24
timotimo oh cool 15:38
coming close to inlining it seems? 15:39
less guards is more better :)
can also mean less deopt points later on, eh? 15:40
jnthn true
[Coke] brrt: no, I never ever ever want to run perl6 on a powerpc imac.
TimToady what if you place your phone on the imac? 15:41
jnthn timotimo: Well, that's one bit of logic we need for it at least. 15:45
dalek arVM: 6b45de6 | jnthn++ | / (6 files):
Pick a spesh threshold by bytecode size.

This means large things that take more time to spesh will need to do more work to prove they are hot.
15:58
16:11 FROGGS joined
dalek arVM/inline: 25680a8 | jnthn++ | / (4 files):
Empty stub of can-we-inline check.
16:35
jnthn Well, here starts the next round of hard things... :) 16:36
Dinner first :)
17:44 brrt joined
brrt ok, my MVMJitValue type is pretty much wrong 17:50
i /might/ need to represent all forms of floats and integers but i really need to represent literals, pointers, registers, lexicals first 17:52
japhb_ jnthn, How did you pick those new spesh thresholds? Are they based on measurements, analysis of algorithmic complexity, gut feeling, ... ? 17:54
brrt is almost out of energy :-( 18:08
FROGGS :/ 18:12
brrt: take a walk, that helps :o)
jnthn japhb_: Gut feeling, with a glance at the spesh_log afterwards, and a hope that somebody will jump in to do some measuring/tuning. 18:22
TimToady could go as far as to have an 'is hot' to spesh it from the getgo 18:28
[Coke] I don't think I'd trust anyone crazy enough to write perl 6 with that power. 18:29
TimToady that leaves out most of us :)
jnthn Well, the thing we'd really like it to go inlining are operators. 18:30
And accessors
And identity methods 18:31
All of which are quite tiny :)
18:37 zakharyas joined 18:54 brrt joined
brrt yes, it does :-) 18:54
does anyone think it makes a lot of sense to distinguish between lvalues and rvalues? (on the machine level) 19:02
except for literals, both lvalues and rvalues can be the same
i.e. register, lexical, pointer
also
... are we always guaranteed to be able to resolve lexicals? 19:03
jnthn Depends on which lookup
timotimo yes
jnthn The by name ones are very late bound
timotimo well, yeah
jnthn The index ones are not.
And should always work out. 19:04
timotimo but regular ones are done at compile time already
jnthn brrt: Is your JitValue thing a bit like a SpeshOperand?
brrt ehm... kind-of
jnthn brrt: Except carrying type info too?
brrt i'm not sure about carrying type info, although probably yes 19:05
(i don't seem to be sure of anything today)
jnthn Well, I saw a slot in there for what kind of value it was. :)
brrt true, but ehm... that was only literals, and i've come to the conclusion that literals aren't enough 19:06
jnthn JitInstruction with an array of JitOperands (probably a better name) could work out.
In Spesh we don't track the type because we have data on the ops
So we know how to interpret the union.
hoelzro is there a reason certain operations (ie. symlink) are implemented as operations in bytecode instead of simply functions one can call? 19:08
brrt ok. but is the distinction betwee literal / pointer + offset / lexical / register good enough?
calling functions isn't simple? :-p i don't know. for the jit they're functions
jnthn hoelzro: Because then you need to design a *second* mechanism for dispatching those built-in functions. Why do that? 19:09
The interp is already perfectly good at dispatching VM-supported operations.
hoelzro I dunno, I was just curious =) 19:10
jnthn :)
brrt: I'd probably stick to "current frame lexicals" as JITting directly and call a function for outer ones for now...
brrt: Though both work, of course. 19:11
brrt: I'd say pointer and offset are two operands.
brrt and.. i'd say that depends on the architecture 19:12
jnthn brrt: And it's the instruction specifying that there's an offset there...
brrt but fair enough
jnthn brrt: Well, but the tree is there to translate into any arch
brrt hmm...
i guess what i'm saying is
a 'register' to the jit is just an address from which to load, relative to a base 19:13
similarly, a pointer + offset is just a memory location, hopefully a known memory location 19:14
hmmm
wait
and i'd like to have instructions whose addressing translates naturally from moar-level to machine-level
if i made the offset to the pointer another operand, i'd have to 'know' that, it wouldn't be a single thing anymore 19:15
on the other hand, any actual use of a fixed pointer and offset would indeed be two operands, as they could vary independently
that doesn't make any sense, what i just said
ok, i'm wrong 19:16
:-)
basically if i'm accessing a fixed offset from a fixed pointer, then i'd just need a single pointer
if pointer or offset can vary, then they need to be two operands 19:17
jnthn In the common case the pointer comes from a CPU register, and the offset is a constant... 19:18
19:19 lue joined
TimToady depends on array vs struct 19:19
brrt hmm
jnthn True, though array access is done inside the REPR funcs at the moment, which we don't yet inline. 19:25
But yes, later on the array case will matter.
TimToady wants efficient arrays of ints/nums to matter very much :) 19:27
jnthn TimToady: Just looking at it from a "what we'll get most value out of JITting in the next month or two" perspective. 19:28
So long as we don't box ourself in, we're good :)
The array repr stuff really will want to learn about multi-dim in the end, I think...
19:35 lue joined
brrt i'm just going to steal ideas, how about that 19:39
jnthn I think that's called "research" :) 19:40
dalek arVM/inline: 40179c7 | jnthn++ | src/spesh/inline.c:
Add inline bytecode size check.
arVM/inline: 3c0ffaf | jnthn++ | src/spesh/graph. (2 files):
Make spesh graph build more flexible wrt bytecode.

This is a step towards being able to graph a specialized form of the bytecode, which we'll need to do to get a graph to merge while doing inlining.
arVM/inline: 2d9c6ff | jnthn++ | src/spesh/graph. (2 files):
Decouple handler in graph build from original too.

If building handlers for spesh'd bytecode for inline, we'll need to use the fixed up handler addresses.
19:55
arVM/inline: bd4d7a3 | jnthn++ | src/spesh/ (3 files):
Build inline graph for specialized bytecode.
[Coke] test 20:01
%9test
whoops, wrong window.
japhb Though if your test was to determine if you had network access to the IRC server, then you passed. :-) 20:02
brrt wiki.luajit.org/SSA-IR-2.0 is interesting 20:10
also
they use only a single ir
dalek arVM/inline: 6df0e6e | jnthn++ | src/core/oplist:
Mark some ops as not suitable to inline.
20:11
arVM/inline: e63acc4 | jnthn++ | tools/update_ops.p6:
Fix oplist parser bug.

An op with no operands but an adverb was mis-parsed.
arVM/inline: 4c53a61 | jnthn++ | src/core/ (2 files):
Re-generate ops.c, carrying no-inline data.
jnthn bbi15 20:20
brrt luajit uses seemingly recursive c preprocessor macros 20:35
wow
jnthn back 20:43
brrt hi :-) 20:45
jnthn figures he'll nudge preparations for inlining a little further along. :) 20:46
brrt wonders how you have so much energy 20:48
lizmat I think it's called "being in the flow" 20:49
brrt well then... i'd like to have that more 20:50
anyway.. luajit never has pointers, so doesn't have my issue 20:53
although it's perfectly plausible just to copy their design
(the IR doesn't have pointers, that is)
i'm starting to see why that's clever 20:54
jnthn :) 20:56
jnthn wonders if it can be replicated
brrt hmm... probably not 20:57
or
hmmm
some aspects of it can be replicated, i'm sure 20:58
i'm going to sleep again 21:00
see you tomorrow
21:01 brrt left
jnthn 'night o/ 21:03
dalek arVM/inline: c0813c1 | jnthn++ | src/spesh/inline.c:
Refuse inline if non-inlinable ops encountered.
21:10
arVM/inline: d502a23 | jnthn++ | src/core/op (2 files):
For now, capturelex and takeclosure are :noinline.

May be able to relax this later.
lizmat gnight jnthn 21:14
jnthn lizmat: oh, was saying night to brrt :)
lizmat ah, ok :-) 21:15
dalek arVM/inline: 5e8eb90 | jnthn++ | src/ (3 files):
Things with free lexicals can't be inlined.

May be able to relax this later with more careful analysis of outer relationships; for now keep things simple.
21:16
arVM/inline: aeb1f12 | jnthn++ | src/spesh/optimize.c:
Try building inline graph; don't use it yet.
21:35
jnthn Well, there's the easy initial analysis bits done. :) 21:41
Need to work on the graph merge next. That'll be trickier. :) 21:42
(So I'll save it for a new day. :)) 21:47
23:30 cognominal joined