jnthn danluu.com/new-cpu-features/ is a kinda interesting read, except I'm a bit too full of cold to concentrate long enough to read it all the way through right now... 20:05
nwc10 is reading it a bit rapidly. It references www-plan.cs.colorado.edu/klipto/myt...plos09.pdf which used perlbench. Yay! 20:34
(and many other things)
(the significant question I remain curious about is "why does link order affect speed?" - I found this even when I forced all code generation options to align to L1 cache lines) 20:35
brrt interesting article 20:45
brrt jnthn, you mean the pointer to the how obj in the spesh slot will be updated if / when the how obj is relocated? 21:22
jnthn brrt: Yes.
brrt oh, awesome
jnthn brrt: The GC visits spesh slots.
timotimo that's neat 21:27
so having a pointer as a spesh slot is the way to go
i seem to recall i had wanted something like an object literal that i could just load into an object register
but it seems like spesh slot would be much more sensible
jnthn Object literals are basically what spesh slots exist for :) 21:28
"I resolved this thingy and want to make it easy to grab"
timotimo i thought spesh slots were something that code could, in the future, put some value into to reference later; kind of like for caches 21:31
jnthn It can be used for that too 21:32
And actually they are MVMCollectable *
So you can safely stick an STable in 'em too
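A toy model of the point above, assuming a moving collector: a slot list the GC visits gets rewritten when the object is relocated, while a reference the GC never sees goes stale. The names `Obj`, `ToyGC`, `spesh_slots` and `raw_pointer` are hypothetical stand-ins for illustration, not MoarVM's actual API.

```python
class Obj:
    """Stand-in for a collectable object (hypothetical, not MVMCollectable)."""
    def __init__(self, name):
        self.name = name
        self.forwarded_to = None  # set once the toy GC has moved the object


class ToyGC:
    """Minimal moving collector: it only fixes up slot lists it was told about."""
    def __init__(self):
        self.roots = []

    def add_root(self, slots):
        self.roots.append(slots)

    def collect(self):
        # "Move" each object found in a registered slot list and rewrite the slot.
        for slots in self.roots:
            for i, obj in enumerate(slots):
                if obj.forwarded_to is None:
                    obj.forwarded_to = Obj(obj.name)  # the object's new location
                slots[i] = obj.forwarded_to


gc = ToyGC()
how = Obj("HOW")
spesh_slots = [how]   # the collector visits this list
raw_pointer = how     # models an address baked into machine code, unseen by the GC
gc.add_root(spesh_slots)
gc.collect()

print(spesh_slots[0] is raw_pointer)  # the slot was updated, the raw pointer was not
```

In this model the only way for `raw_pointer` to stay valid is for the object never to move, which is the role that the gen2 guard plays for raw pointers in the discussion that follows.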
brrt jit sometimes puts raw pointers into the bytecode 21:40
which, i'm told, is pretty evil
jnthn Yes, that *does* need guarding with gen2
brrt hmm
timotimo so if you have a cached value, it definitely has to be boxed. good to know
brrt now that i think of it, i can probably do better than that
if i ever figure out how to make separate sections with dynasm 21:45
linus doesn't like cmov it appears: yarchive.net/comp/linux/cmov.html
timotimo one more thing i've been thinking about or worried about: 21:49
if we don't make the deoptimization strategy from jit code to other code better, we won't be able to really get away from loads before instruction, code for instruction, stores after instruction
because we need to have the data in the WORK at the points where we deoptimize
jnthn I don't know that there's a problem with the strategy. You statically know where deopt can happen. 21:50
It's not like absolutely any place can trigger it. 21:51
And between those points, cheating on WORK is just fine.
Conditionally writing back to WORK only if deopt happens is also legit.
timotimo oh 21:52
well, that does sound helpful
jnthn WORK only has to be in good enough shape at the point deopt actually happens, if it happens.
timotimo would we have a section in the assembly code for each deopt point that knows what registers need to be put into what WORK slots?
jnthn That's a possible strategy, yes 21:53
timotimo since if we do that inside the deopt function that the C compiler builds, we'll most probably lose register content
jnthn Could even emit it at the end of the JITted output and branch to it, rather than jump over it, if we want to avoid some branches and clutter the I-Cache for never-hit deopts less.
brrt you'll have to be conservative in WORK writing anyway because of gc barriers
jnthn Writes to WORK aren't barriered :) 21:54
'cus MVMFrame is not an MVMCollectable
brrt yeah timo, that's impossible
timotimo please explain the "end of the JITted output"?
jnthn timotimo: As in
timotimo i was thinking we'd be putting all the deopt "bridges" at the end anyway
jnthn blah blah machine code blah
timotimo not directly after the deopt point
brrt i mean not barriers. but anytime you may trigger gc, you need to write to WORK
jnthn if guard_failed goto deopt_42
timotimo i'm hoping deopt is rare :)
jnthn blah blah more code 21:55
brrt and anyway, that's all future
jnthn ...end of normal code...
deopt_42:
...code to get WORK in order...
timotimo right, that's what i had in mind, too
brrt nods, that's basically how i want to do it
timotimo great
we are all in agreement in that case
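The layout agreed on above can be sketched as a small simulation: values live in "registers" between deopt points, and WORK is only brought up to date inside an out-of-line bridge that runs when a guard fails. Everything here (`run_jitted`, `deopt_42`, the `rax` key, slot 0) is a hypothetical stand-in; real JIT output would be machine code, not Python.

```python
def run_jitted(work, guard_ok):
    """Toy model of JIT output with an out-of-line deopt bridge.

    `work` stands in for the frame's WORK registers and `regs` for machine
    registers. All names are made up for illustration.
    """
    regs = {}
    regs["rax"] = work[0]      # load once, before the straight-line section
    regs["rax"] += 1           # compute without touching WORK
    if not guard_ok:           # "if guard_failed goto deopt_42"
        return deopt_42(work, regs)
    regs["rax"] *= 2           # more code; WORK is still stale here, which is fine
    work[0] = regs["rax"]      # normal exit writes the result back
    return "finished"


def deopt_42(work, regs):
    """Bridge placed after the normal code: gets WORK in order, then leaves."""
    work[0] = regs["rax"]      # this deopt point knows rax maps to WORK slot 0
    return "deopt"


fast = [5]
print(run_jitted(fast, guard_ok=True), fast)
slow = [5]
print(run_jitted(slow, guard_ok=False), slow)
```

The bridge carries the per-deopt-point knowledge of which register belongs in which WORK slot, so the common path pays nothing for a deopt that never happens.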
brrt but keep in mind you also need to do this in the case of invokes
jnthn brrt: Yes, that's true. But we should be able to have a good idea when that can happen.
timotimo right, we already have invoke guards anyway
jnthn Yes, invokes are also deopt points.
timotimo and probably also invokish guards?
ah, good.
jnthn OTOH, for small invoked things, we can inline, and then the invoke is gone. 21:56
timotimo but in case of invocations we'd have to tidy up WORK before the invocation?
brrt and for C callouts (except for callee-saved registers, but i've already filled them)
timotimo right, that's something spesh could do before the jit even triggers
jnthn Yes, WORK'd need syncing before it
Spesh already handles inline before JIT triggers. :)
Anyway, I think we have a reasonable amount of wiggle room. 21:57
brrt this is a problem common to all JITs I know of
jnthn Well, it's the cost of speculative optimization: you gotta deal with having speculated wrongly. 21:58
timotimo oh, i meant to write "would", not "could"
actually "will"
brrt my roadmap was as follows 21:59
- first add register selection to x64 dynasm 22:00
- then write a pretty much separate expression compiler 22:01
- i think expressions should be pretty simple in order to be sufficiently generic?
- then add an expression as a node to the jit graph
- do writeouts whenever we 'leave' the expression 22:02
so you'd have a sequence: callout, expression, callout, expression, invoke
ultimately, add more and more jit nodes into the expression compiler 22:03
how to do all this? i don't know yet exactly
timotimo oh?
i thought the SSA form was sufficiently expressiony to work like that
but what do i know :)
you're the expert for a reason :)
jnthn brrt: GSoC 2015? :D ;) 22:04
brrt hardly an expert :-)
timotimo hear hear
brrt well... i'm certainly not against it :-)
(wrt SSA: it /is/ but it is for expressions on the moarvm level)
jnthn Or I suspect alternative funding can be found if that fails. It'd be a significant and valuable piece of work. 22:05
brrt well.. we'll discuss it when the time comes :-) 22:06
timotimo i'm apparently not close enough to the material to understand the need for the expression compiler
but i very much do understand what we need register selection for
brrt the expression compiler is a term i made up for the idea that i have about it
it probably has another name
the main point of it is to put the load / stores / operations that are done in a tree so we can perform CSE on it 22:08
you can't really do that at moarvm level since it doesn't have this concept of load/stores
timotimo ah, ok
you want to reify loads and stores :)
brrt you /can/ build such a tree from the ssa form
and other things
conditionals and tests
the dynasm input could be used as a template then 22:10
(but i'm not sure i'm making sense now) 22:11
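A minimal sketch of the idea, assuming explicit load and store nodes and CSE done by sharing structurally identical nodes (hash-consing). This is one common way to get CSE on such a tree, not necessarily what the eventual expression compiler would do; `ExprBuilder`, the op names and the slot numbers are all made up for illustration.

```python
class ExprBuilder:
    """Builds expression nodes and shares structurally identical ones."""
    def __init__(self):
        self.nodes = []   # node id -> (op, operands)
        self.index = {}   # (op, operands) -> node id

    def node(self, op, *operands):
        key = (op, operands)
        if key not in self.index:
            # First time this exact expression is seen: create a new node.
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]


b = ExprBuilder()
# Two VM-level ops that both read register slot 1, roughly:
#   r2 = r1 + r1 ; r3 = r1 * r2
load_a = b.node("load", "WORK", 1)
load_b = b.node("load", "WORK", 1)   # same load requested again by the second op
add = b.node("add", load_a, load_b)
mul = b.node("mul", load_b, add)
store = b.node("store", "WORK", 3, mul)

print(load_a == load_b, len(b.nodes))
```

Sharing a load like this is only valid while nothing stores to the same slot in between; a real implementation would have to invalidate cached loads on stores and at the points where control leaves the expression.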
lua is funky, btw 22:29
hoelzro it *does* part the fun in funky 22:30
brrt part or put? 22:50
anyway, startrek& :-)