dalek MoarVM/jit_devirtualize_reprops_2: a185b28 | timotimo++ | src/jit/emit_x64.dasc:
special case initial TC and STable, obj, objbody cases in jit
00:36
02:45 ggoebel joined 07:03 brrt joined 07:09 FROGGS joined 07:53 Ven joined 08:58 Ven joined 09:09 kjs_ joined 10:07 kjs_ joined 10:24 brrt joined
brrt 6 10:24
damnit erc
\o
timotimo
i do not agree
i mean, yeah, this probably works, but is it worth it? 10:25
heck, i'd rather have you generate a single JIT_REG_REPROP, and check for that; but again
not worth it (yet)
11:46 kjs_ joined 11:52 zakharyas joined
timotimo as soon as the expression thingie is in, this will be completely unnecessary 12:42
the problem with JIT_REG_REPROP is that it'd have to be counted as three arguments, which i'm not completely sure how to do, etc etc
but i haven't looked into it too deeply
13:08 brrt joined
timotimo exprcomp, was that the name? 13:09
brrt: i didn't mean to undermine your pretty code in the jit :)
also, for some reason i'm getting crashes with valgrind, but nothing at all with asan
so i'll likely revert it anyway 13:10
brrt timotimo - i'm not saying my jit code is necessarily pretty
just that this was a hack, and likely to be overlooked in the future
weird though 13:11
timotimo :) 13:14
brrt also timotimo++ for the weekly 13:34
timotimo thank you :)
do you have an idea why devirtualizing reprops might not give a significant improvement at all? 13:35
oh, already dc'd
13:35 brrt joined
timotimo you're back! :) 13:35
brrt yeah, i had to restart polari because it was... annoying
it's a shame when otherwise well-designed software acts annoying like this
timotimo i don't even know what polari is :) 13:36
brrt it's a graphical irc client for gnome3
i use that or erc typically :-) 13:37
timotimo OK 13:39
are you pleased with gnome3?
i was doubtful about it for a very long time, but the new screenshots i've been seeing have some charm to them
nwc10 that is an interesting choice of name, given en.wikipedia.org/wiki/Polari 13:42
or maybe that was the point 13:43
brrt maybe.. i hadn't really considered that :-) 13:45
timotimo i've never heard of that, interesting!
brrt in general i'm pretty happy
a big benefit for me is the 'accessibility' button on the top bar 13:46
timotimo ah? what's in it for you?
brrt and just pressing the 'super' button and type-to-search is really handy and well-implemented
timotimo if you don't mind me asking, of course
brrt large text and zooming
alternatively :-) no of course
timotimo ah 13:47
brrt also i don't really care about advanced window management
i typically have one or at most two windows open at the same time
timotimo i was surprised gnome and other environments weren't in a fabulous shape to handle HiDPI displays yet, since they've had accessibility for a long time anyway
yeah, these days i mostly do left/right halves of the screen for my windows, but i do sometimes miss having quarters
the newest version of xfce, which is what i use, will have that, though 13:48
brrt i never miss it, and when i do, there's emacs
timotimo hehe
brrt so yeah, i'm happy 13:49
i do understand where some of the hate is coming from
but i find that it doesn't really apply to my usecase, so why care?
timotimo right. i've been hating on it a bit myself, but i should have realized that it's a very self-centered thing to do
"gnome3 doesn't work well for me, so i'll proclaim it sucks!" 13:50
silly timo.
do you have tips for where i could spend my energy (related to jit and spesh) better than optimizing argument passing? 13:51
also, would you also rather see the optimization for TC in arg1 go?
brrt my hypothesis is that the tmp6 -> correct place dance doesn't really cost a lot 13:52
maybe i'm wrong about that
timotimo mhh, could be
well, modern processors are magical
brrt i'm preparing a blog post on future JIT improvements
well... as for what you can do.. i haven't really looked at why the hyper test breaks 13:53
timotimo cool
oh? i'm not sure i know about a hyper test breaking 13:54
13:58 kjs_ joined
brrt with the elems repr devirt patch? you asked me a few days ago 14:00
timotimo oh 14:01
aye
super weird if you ask me.
the only thing i can think of is relying on something spesh doesn't provide, but i put an unconditional "use facts" into the "optimize_repr_op" function of spesh
brrt hmmm 14:03
the main difference is that the JIT devirtualizes something while spesh never does this, right?
it could be a spesh bug, is what i'm saying
but i guess that's what you're saying too
timotimo well, the jit devirtualizes a repr op only if spesh didn't do it yet 14:05
so a getattr, for example, or a decont would probably have turned into a p6oget_* by that point
however, sometimes we have getattr ops that have a known type but an unknown attribute name
in that case, a devirt is helpful 14:06
and of course there's all the array functions like push and pop that don't get touched by spesh, because the underlying code is too complex to be turned into a simple operation (resizing and such)
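A minimal Python sketch of the devirtualization idea timotimo describes above: when the type is known at compile time, the call through the representation's function table can be replaced by a direct call. The names `ArrayRepr`, `Obj`, `virtual_push` and `compile_push` are hypothetical and are not MoarVM's API; the real JIT emits machine code rather than closures.

```python
# Hypothetical model: every object points at a "REPR" whose function table
# implements ops such as push. None of these names are MoarVM's real API.

class ArrayRepr:
    """Stand-in for a representation with its own push implementation."""

    @staticmethod
    def push(body, value):
        body.append(value)  # a real REPR would also handle resizing here


class Obj:
    def __init__(self, repr_):
        self.repr = repr_
        self.body = []


def virtual_push(obj, value):
    # Generic path: look the function up through the object on every call.
    obj.repr.push(obj.body, value)


def compile_push(known_repr=None):
    """Return the callable a compiler would emit for a push op.

    If the REPR is known at compile time, bind its function directly;
    otherwise fall back to the virtual lookup.
    """
    if known_repr is not None:
        direct = known_repr.push  # resolved once, at compile time
        return lambda obj, value: direct(obj.body, value)
    return virtual_push


arr = Obj(ArrayRepr)
compile_push()(arr, 1)           # virtual dispatch
compile_push(ArrayRepr)(arr, 2)  # devirtualized call
print(arr.body)
```

Both paths have the same effect on the object; the only difference is when the lookup happens, which may be part of why the measured improvement was small.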
brrt hmm right 14:07
oh, i know how you can spend your time Very Fruitfully
if you have it to spend
timotimo i guess i do :)
i'm still feeling kind of sad about the fact that i'm not internals-savvy or design-savvy enough to make nativecall as awesome as it could be, JIT-wise 14:08
brrt well, remember the param_[ro][pn]_[inso] business? 14:10
timotimo yes
brrt these return a struct
returning a struct is... annoying
there are two fields i think of relevance 14:11
the result value, and if it is present
14:11 Ven joined
timotimo oh, that seems like a good place to use VMNull vs C-level null instead 14:11
brrt hmm really? i don't see why? 14:12
timotimo well, result value + presence yes/no can be encoded into a simple pointer in that case
if the value was "a null", it'll just be VMNull, if it wasn't present it'll be a null pointer instead
or am i misunderstanding?
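A small Python sketch of the encoding timotimo proposes here, assuming `None` stands in for a C-level null pointer and a hypothetical singleton `VM_NULL` stands in for VMNull. The helper names are invented for illustration. Note the trick relies on the result being a pointer, so it would seem to cover only object-typed arguments.

```python
class _VMNull:
    """Hypothetical singleton playing the role of VMNull."""

    def __repr__(self):
        return "VMNull"


VM_NULL = _VMNull()


def get_pos_arg(args, pos):
    """Return the argument at pos, or None if it was not passed.

    A caller that wants to pass "a null" passes VM_NULL, so None can be
    reserved to mean "absent" and no separate presence flag is needed.
    """
    if pos < len(args):
        return args[pos]
    return None  # plays the role of a C-level null pointer


def describe(args, pos):
    value = get_pos_arg(args, pos)
    if value is None:
        return "missing"
    if value is VM_NULL:
        return "present, null"
    return f"present, {value!r}"


call_args = [VM_NULL, 42]
for i in range(3):
    print(i, describe(call_args, i))
```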
brrt i dunno if that's going to cause too many semantic changes? 14:13
timotimo aren't we using that internally exclusively? 14:14
brrt but anyway, i'd suggest returning the presence, passing a register, and setting the value conditionally
then the required ops can throw if not present, and optional ops can branch if not present 14:15
timotimo this would be a refactoring of how moarvm does these operators and it'd mainly affect interp.c, right?
(well, usage-wise, that is)
brrt e.g. currently we have MVMArgInfo MVM_args_pos_obj(tc, ctx, pos, required)
i'd change that to MVMint32 MVM_args_pos_obj(tc, ctx, pos, &register) 14:16
but one of the complexities is that these ops do automagic transformations
timotimo hmm, where do we actually use the flags entry to that struct? can that just be thrown out? 14:17
transformations, as in vivification and such? 14:18
brrt yes 14:19
autobox/unbox
timotimo oh?
brrt and i-to-n transformations and the like 14:20
a lot of things
timotimo hm, ok
brrt a map of these transformations would be very helpful
it's not strictly necessary
timotimo ah, the autounbox define
brrt i'll be off, have to go to the vet 14:21
timotimo that's where the flags are used
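The shape brrt suggests above (return presence, pass a register, set the value conditionally) can be sketched in Python as follows. This is an illustration only: `args_pos_obj`, `MissingArgument` and the register list are assumptions, the two `param_*` functions merely mimic the required and optional variants, and the autobox/unbox transformations mentioned in the discussion are left out.

```python
class MissingArgument(Exception):
    """Raised when a required positional argument was not passed."""


def args_pos_obj(args, pos, registers, dest):
    """Write args[pos] into registers[dest] if present; return presence."""
    if pos < len(args):
        registers[dest] = args[pos]
        return True
    return False


def param_rp_o(args, pos, registers, dest):
    # Required positional: absence is an error, so throw.
    if not args_pos_obj(args, pos, registers, dest):
        raise MissingArgument(f"required positional {pos} not passed")


def param_op_o(args, pos, registers, dest):
    # Optional positional: hand the presence flag back so the caller
    # can branch to its default-value code.
    return args_pos_obj(args, pos, registers, dest)


registers = [None, None]
call_args = ["a"]
param_rp_o(call_args, 0, registers, 0)
if not param_op_o(call_args, 1, registers, 1):
    registers[1] = "default"
print(registers)
```

With this shape nothing struct-like crosses the call boundary: the callee returns a single integer-like flag and writes through the register it was given.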
oh, i hope your pet will get better! unless it's just a routine inspection or something
brrt no, rather sick
timotimo uh oh
get well soon, brrt's pet!
brrt she thanks you :-)
brrt afk
timotimo i still have some feature work i was interested in building ... udp sockets 14:25
14:34 FROGGS[mobile] joined 15:10 Ven joined 18:18 FROGGS joined 18:34 kjs_ joined 19:14 brrt joined
brrt pet is not getting well soon at all, it appears 19:14
timotimo oh no! :( 19:15
what kind of pet is it? for some reason i seem to recall you have a ferret or something
but i could just have dreamt that
brrt i have 3 rats 19:19
timotimo ah 19:20
brrt dinner & 19:21
well, ferrets and rats sound alike, in name, so that is not crazy :-) 19:34
timotimo :) 19:35
do they also sound alike in squeaks?
brrt no. rats hardly ever squeak, except for social purposes 19:42
:-)
like the way they do in the movies? rats don't do that
19:42 mj41 joined
timotimo yeah, just like cats and dogs 19:42
as soon as they enter a frame they have to make a sound 19:43
brrt right
timotimo currently i'm spending a lot of time over at a friend's house where there's two cats 19:49
one of them is very easily made to make a sound 19:51
brrt some cats are, yes 20:00
many cats in my neighbourhood are really eager for petting and will meow at you just for walking by
timotimo this one does a dove-like sound whenever she gets surprised, even if the surprise is very slight 20:11
it's quite adorable
hm, maybe more like a pigeon sound rather than a dove
FROGGS timotimo: I managed to get at the 'is inlined' information from within the CStruct 20:21
timotimo oh, great! 20:23
i'm eager to see how you implemented it
dalek MoarVM/union: 09746e9 | FROGGS++ | src/ (4 files):
inline CUnions when attr's 'inlined' flag is set

Before the default was to inline CUnions. Now a trait on the attribute can set this flag.
20:38
FROGGS timotimo: pushed the rakudo part too now 20:40
timotimo mhm 20:49
good :)
brrt anyone care to review my next blog post while i'm afk? brrt-to-the-future.blogspot.com/b/...;type=POST 21:15
jnthn brrt: I'm tired, but I'll have a look :)
21:17 colomon joined, kjs_ joined
TimToady optiizations 21:17
jnthn "Many more optimizations are forbidden by Perl 6 semantics" - I'd put it more like "Many optimizations can not be proven safe at compile time in Perl 6" 21:21
TimToady and the link-time guarantees in the design document are largely NYI 21:22
but we're supposed to be able to close and finalize classes once we're to link time 21:23
jnthn TimToady: But...I don't really believe in that in a sense. 21:24
TimToady: If we already pre-comp'd modules to bytecode as we installed them, then the top-level application is allowed to close things...the ship has already sailed. 21:25
TimToady might still provide facts like "I'll never need this guard" 21:26
jnthn That's true
Pushing the info down VM-wards may help
Further, if we forbid mixing in, we can save a bit of memory per object
brrt: "transforming tight loops into SIMD" - yes that's hard, but hyperops on native arrays actually are explicit SIMD 21:27
TimToady plus you might be able to avoid the need for some deopt logic
jnthn Indeed.
I do worry a bit that we're going to find it trick in so far as an application may well use dozens of modules, and one of them somewhere may happen to use mixins on its insides. 21:28
And closing stuff at application level will bust that module
*tricky
brrt: It may be worth noting why the JIT we have today is the way it is: was done in GSoC time period, and needed to cope with deoptimization, OSR, and so forth. 21:29
TimToady +1 blame Google :) 21:30
jnthn brrt: And that deopt was certainly needed for it to actually be useful for the kinds of optimizations we want to do for Perl 6, and so that had to take priority over wonderful code-gen - which is what the aim is now.
brrt: Given you note we'll keep a lot of the infrastructure (especially around OSR and deopt) in place, I'd word it more like "The expression JIT functions as an extra node type for the existing JIT - though very many things will likely end up using it." 21:33
brrt: One other opportunity we may get is being able to JIT native calls more easily also :) 21:34
Anyway, good post. And nice plans. brrt++
FROGGS brrt++ 21:36
I understood parts of it \o/
21:39 kjs_ joined 21:44 retupmoca joined 21:49 leedo joined
TimToady another possible optimization is to identify idempotent sequences of code that can increase the distance between deopt sequence points, at the expense of repeating some calculations 21:50
22:18 colomon joined 22:20 kjs_ joined 22:28 brrt joined
brrt jnthn++ for critical review :-) 22:32
also TimToady++ :-) 22:35
hmm... what would be the real benefit of increasing deopt point distance? aside from moving deopt points out of loops etc 22:36
in general calculation is cheap and memory is expensive though :-)
TimToady not having to sync things to memory in between mostly 22:40
alternately, you could have points that say "if you have to deopt here, write these registers to these locations first" 22:41
how to commit the current transaction, as it were, rather than how to rollback
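A toy Python sketch of the "commit" idea TimToady describes: each deopt point carries a table saying which registers must be written to which frame locations before control returns to the interpreter. The register names, the frame layout and `compiled_body` are all invented for illustration and say nothing about how the MoarVM JIT actually lays out its state.

```python
def compiled_body(frame, guard_holds, commit_map):
    """Run a tiny 'compiled' fragment that keeps its values in registers.

    commit_map maps register name -> frame slot and belongs to the single
    deopt point in this fragment.
    """
    regs = {"r0": frame["i"] + 1}
    regs["r1"] = regs["r0"] * 2

    if not guard_holds:
        # Deopt point: commit exactly the registers the interpreter needs,
        # then bail out. Nothing was synced to memory before this.
        for reg, slot in commit_map.items():
            frame[slot] = regs[reg]
        return "deopt"

    # Fast path: the usual write-back at the end of the fragment.
    frame["i"] = regs["r0"]
    frame["total"] = regs["r1"]
    return "done"


commit_map = {"r0": "i", "r1": "total"}

fast = {"i": 1, "total": 0}
slow = {"i": 1, "total": 0}
print(compiled_body(fast, True, commit_map), fast)
print(compiled_body(slow, False, commit_map), slow)
```

Either way the frame ends up in the same state, which is the property the interpreter relies on when it takes over after a deopt.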
brrt that is actually what i'd rather do 22:42
timotimo having fewer deopt points active reduces the amount of registers we have to keep alive in spesh
brrt eliminating deopt points is at least partly spesh's responsibility i'd think 22:43
TimToady and since idempotence is related to lack of side effects, escape analysis will also play into that
timotimo sure
TimToady anyway, just talking in general terms like I know what I'm sayin' :)
brrt :-) it makes sense 22:45
TimToady my math always had more vigor than rigor, I fear... :)
brrt and in the jit i really do want to do active restoration sequences (optimistically)
but that's.. hard, in general, and wasn't necessary until now 22:46
TimToady careful, or you'll invent STM :)
for my part, I think we can do a much better job of caching dynvars than we do 22:48
brrt it's not really the same as STM, is it? i'm not at all familiar with that
TimToady well, it is transactional, and it has something to do with software and memory :) 22:49
brrt right
i sometimes wonder whether transactions aren't more difficult to reason about than just the random storm of writes that we'd otherwise have 22:50
TimToady well, I think you'll need to view the random storm that happens between possible deopt points as some kind of transaction, or you'll get erroneous deopts 22:51
well, or the random storm of register writes that *aren't memory writes, anyway 22:52
one could view every memory write as a kind of transaction commit
but then your registers are just a write-through cache
anyway, looking forward to seeing your work in the future, for sure 22:53
brrt i'm looking forward to working on it again, too :-) 22:54
TimToady goes back to thinking more about dynvar caches
thing is, we really only need to store a cache starting at the deepest $*FOO, and deeper frames can just dup a pointer to that cache 22:57
and if the cache were somehow authoritative, we would never even have to look in the lexpads
well, perhaps dup a pointer to the frame that holds the cache, rather, esp if the actual cache were the root of a linked list of authoritative dynvar locations, then if you have to search too far for $*FOO, you just link it into the front of the list 22:59
also, since there are relatively few dynvar names, they could very easily profit from string interning, even if the language in general doesn't use it 23:01
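One way to picture the scheme TimToady is thinking out loud about, as a Python sketch: only frames that declare dynamic variables hold a table, and every deeper frame just copies a pointer to the nearest such frame, so a lookup hops between declaring frames instead of walking every caller. `Frame`, `holder` and `lookup` are hypothetical names; the move-to-front relinking he mentions is not implemented here, and `sys.intern` only hints at the string-interning point.

```python
import sys


class Frame:
    def __init__(self, caller=None, dynvars=None):
        self.caller = caller
        # Only frames that declare dynamic variables carry a table.
        # Names are interned so equal names share one string object.
        self.dynvars = {sys.intern(k): v for k, v in (dynvars or {}).items()}
        if self.dynvars:
            self.holder = self  # this frame owns a table
        else:
            # Dup the caller's pointer to the nearest declaring frame.
            self.holder = caller.holder if caller else None


def lookup(frame, name):
    """Return (value, hops), visiting only frames that declare dynvars."""
    name = sys.intern(name)
    hops = 0
    holder = frame.holder
    while holder is not None:
        hops += 1
        if name in holder.dynvars:
            return holder.dynvars[name], hops
        holder = holder.caller.holder if holder.caller else None
    raise KeyError(name)


root = Frame(dynvars={"$*OUT": "stdout"})
f = root
for _ in range(50):  # fifty frames that declare nothing
    f = Frame(caller=f)
inner = Frame(caller=f, dynvars={"$*FOO": 1})
leaf = Frame(caller=inner)

print(lookup(leaf, "$*FOO"))
print(lookup(leaf, "$*OUT"))
```

In this sketch the fifty intermediate frames cost nothing at lookup time; the price is one extra pointer per frame, set when the frame is created.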
timotimo mhm 23:02
i'd like dynvars to become faster, if only for $*OUT
TimToady indeed
that's one of the reasons our IO is so slow 23:03
using the frames as a linked list of cache entries doesn't seem very...cache friendly 23:05
a direct link to a dynvar cache attached only to frames that actually declare dynvars seems like it could be more CPU cache friendly, though one would have to take into account allocation costs 23:08
a resizeable cache in one chunk of memory is probably more CPU cache friendly these days than a linked list 23:09
timotimo if that linked list only links chunks that have been allocated in close proximity, maybe that helps? 23:10
TimToady the degenerate case of that is to just copy all the "environmental" locations down as Unix processes do, but that's probably insufficiently lazy for our purposes
a cache chunk pool might work, yes
timotimo i was wondering if mmapping a chunk of memory for every piece of code we jit is a bad idea 23:11
TimToady doesn't follow the reasoning 23:12
or were you going back to the deopt thing? 23:13
timotimo nah
just running perl6 -e 'say 1' calls mmap 360 times
TimToady oh, just a new subject then :)
this after the lazy deserialization? 23:14
timotimo yes, sorry
but i'm now doing something else entirely :P
TimToady maybe figure out what has to get deserialized, and clump it in one mmap somehow?
brrt hmmm 23:15
TimToady should probably do more measurement of dynvar overhead anyway first...
brrt i hadn't considered mmapping to be such a memory or cpu cost
TimToady well, syscalls are known to be atrociously slow on most unixen
brrt that is true. but we execute a lot more code than just 360 mmaps before we get to say 23:16
(although i'll be the first to admit that it is a lot) 23:17
TimToady time perl6-m -e 'say 42' 23:18
42
real 0m0.190s
user 0m0.135s
sys 0m0.055s
sys time is significant
brrt fair enough 23:19
but that's startup
and still it's less than 50%
TimToady I thought that's what timotimo++ was talking about 23:20
but a 10% win, say, for coalescing mmaps would be significant 23:21
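For scale, the numbers from the `time` run above can be worked through in a few lines of Python. This assumes the "10% win" refers to wall-clock time; the conversation does not say so explicitly.

```python
# Figures copied from the `time perl6-m -e 'say 42'` output in the log.
real, user, sys_time = 0.190, 0.135, 0.055

sys_share = sys_time / real   # fraction of wall time spent in the kernel
ten_percent_win = real * 0.10  # what a 10% wall-clock saving would be

print(f"sys share of wall time: {sys_share:.1%}")
print(f"a 10% win would save about {ten_percent_win * 1000:.0f} ms")
```

The sys share comes out a little under 30%, consistent with brrt's remark that it is less than 50%.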
brrt right
that's pretty much opposite to what the python folks think, btw :-) 23:22
TimToady well, maybe jnthn++ has already thought about which things should load eagerly vs lazily, but it's just an idea
japhb brrt: What is "opposite to what the python folks think"? They don't believe in mmap coalescing? 23:23
brrt they don't believe in making the interpreter more than n% complex for less than n% speedup 23:24
TimToady well, there's something to be said for that point of view too :)
brrt which is a ... reasonable rule, i'd guess, but also the reason (aside from missing jnthn :-P) they don't have spesh
TimToady otoh, they don't really believe in tormenting the implementors on behalf of the users quite as much as we do :) 23:25
TimToady just always wonders about any particular tradeoff whether the OR-ness of it is intrinsic or extrinsic to the actual problem... 23:26
brrt OR or XOR :-P
TimToady OR xor XOR :P
brrt but in general if things are hard it's nature's fault, i find 23:27
some things are human problems, i try to avoid these
TimToady then by all means avoid me :)
brrt hah, i didn't say humans are problems 23:28
TimToady well, some of us are... 23:29
brrt sleep & 23:32
TimToady o/
brrt o/