dalek | arVM/jit_devirtualize_reprops_2: a185b28 | timotimo++ | src/jit/emit_x64.dasc: special case initial TC and STable, obj, objbody cases in jit | 00:36
02:45 ggoebel joined
07:03 brrt joined
07:09 FROGGS joined
07:53 Ven joined
08:58 Ven joined
09:09 kjs_ joined
10:07 kjs_ joined
10:24 brrt joined
brrt | 6 | 10:24 | |
damnit erc | |||
\o | |||
timotimo | i do not agree | ||
i mean, yeah, this probably works, but is it worth it? | 10:25 | ||
heck, i'd rather have you generate a single JIT_REG_REPROP, and check for that; but again | |||
not worth it (yet) | |||
11:46 kjs_ joined
11:52 zakharyas joined
timotimo | as soon as the expression thingie is in, this will be completely unnecessary | 12:42 | |
the problem with JIT_REG_REPROP is that it'd have to be counted as three arguments, which i'm not completely sure how to do, etc etc | |||
but i haven't looked into it too deeply | |||
13:08 brrt joined
timotimo | exprcomp, was that the name? | 13:09 | |
brrt: i didn't mean to undermine your pretty code in the jit :) | |||
also, for some reason i'm getting crashes with valgrind, but nothing at all with asan | |||
so i'll likely revert it anyway | 13:10 | ||
brrt | timotimo - i'm not saying my jit code is necessarily pretty | ||
just that this was a hack, and likely to be overlooked at some future time | |||
weird though | 13:11 | |
timotimo | :) | 13:14 | |
brrt | also timotimo++ for the weekly | 13:34 | |
timotimo | thank you :) | ||
do you have an idea why devirtualizing reprops might not give a significant improvement at all? | 13:35 | ||
oh, already dc'd | |||
13:35 brrt joined
timotimo | you're back! :) | 13:35 | |
brrt | yeah, i had to restart polari because it was... annoying | ||
it's a shame when otherwise well-designed software acts annoying like this | |||
timotimo | i don't even know what polari is :) | 13:36 | |
brrt | it's a graphical irc client for gnome3 | ||
i use that or erc typically :-) | 13:37 | |
timotimo | OK | 13:39 | |
are you pleased with gnome3? | |||
i was doubtful about it for a very long time, but the new screenshots i've been seeing have some charm to them | |||
nwc10 | that is an interesting choice of name, given en.wikipedia.org/wiki/Polari | 13:42 | |
or maybe that was the point | 13:43 | ||
brrt | maybe.. i hadn't really considered that :-) | 13:45 | |
timotimo | i've never heard of that, interesting! | ||
brrt | in general i'm pretty happy | ||
a big benefit for me is the 'accessibility' button on the top bar | 13:46 | ||
timotimo | ah? what's in it for you? | ||
brrt | and just pressing the 'super' button and type-to-search is really handy and well-implemented | ||
timotimo | if you don't mind me asking, of course | ||
brrt | large text and zooming | ||
alternatively :-) no of course | |||
timotimo | ah | 13:47 | |
brrt | also i don't really care about advanced window management | ||
i typically have one or at most two windows open at the same time | |||
timotimo | i was surprised gnome and other environments weren't in a fabulous shape to handle HiDPI displays yet, since they've had accessibility for a long time anyway | ||
yeah, these days i mostly do left/right halves of the screen for my windows, but i do sometimes miss having quarters | |||
the newest version of xfce, which is what i use, will have that, though | 13:48 | ||
brrt | i never miss it, and when i do, there's emacs | ||
timotimo | hehe | ||
brrt | so yeah, i'm happy | 13:49 | |
i do understand where some of the hate is coming from | |||
but i find that it doesn't really apply to my usecase, so why care? | |||
timotimo | right. i've been hating on it a bit myself, but i should have realized that it's a very self-centered thing to do | ||
"gnome3 doesn't work well for me, so i'll proclaim it sucks!" | 13:50 | ||
silly timo. | |||
do you have tips for where i could spend my energy (related to jit and spesh) better than optimizing argument passing? | 13:51 | ||
also, would you also rather see the optimization for TC in arg1 go? | |||
brrt | my hypothesis is that the tmp6 -> correct place dance doesn't really cost a lot | 13:52 | |
maybe i'm wrong about that | |||
timotimo | mhh, could be | ||
well, modern processors are magical | |||
brrt | i'm preparing a blog post on future JIT improvements | ||
well... as for what you can do.. i haven't really looked at why the hyper test breaks | 13:53 | ||
timotimo | cool | ||
oh? i'm not sure i know about a hyper test breaking | 13:54 | ||
13:58 kjs_ joined
brrt | with the elems repr devirt patch? you asked me a few days ago | 14:00 | |
timotimo | oh | 14:01 | |
aye | |||
super weird if you ask me. | |||
the only thing i can think of is relying on something spesh doesn't provide, but i put an unconditional "use facts" into the "optimize_repr_op" function of spesh | |||
brrt | hmmm | 14:03 | |
the main difference is that the JIT devirtualizes something while spesh never does this, right? | |||
it could be a spesh bug, is what i'm saying | |||
but i guess that's what you're saying too | |||
timotimo | well, the jit devirtualizes a repr op only if spesh didn't do it yet | 14:05 | |
so a getattr, for example, or a decont would probably have turned into a p6oget_* by that point | |||
however, sometimes we have getattr ops that have a known type but an unknown attribute name | |||
in that case, a devirt is helpful | 14:06 | ||
and of course there's all the array functions like push and pop that don't get touched by spesh, because the underlying code is too complex to be turned into a simple operation (resizing and such) | |||
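[Editor's note: the devirtualization described above boils down to replacing an indirect call through the REPR's function table with a direct call once spesh facts pin the object's type down. A minimal C sketch of that idea follows; every name in it is an illustrative stand-in, not MoarVM's actual internals.]

    #include <stdio.h>

    /* Every REPR exposes its operations through a table of function pointers. */
    typedef struct {
        void (*push)(void *tc, void *st, void *obj, void *body, long value);
    } PosFuncs;

    typedef struct { PosFuncs pos_funcs; } ReprOps;
    typedef struct { const ReprOps *repr; void *st; void *body; } Obj;

    /* A concrete REPR's push implementation (stand-in for, e.g., an array REPR). */
    static void array_push(void *tc, void *st, void *obj, void *body, long value) {
        (void)tc; (void)st; (void)obj; (void)body;
        printf("pushed %ld\n", value);
    }

    static const ReprOps array_repr = { { array_push } };

    /* Generic path: the callee is only known at run time, so the call is
     * indirect and cannot be specialized. */
    static void push_virtual(void *tc, Obj *obj, long value) {
        obj->repr->pos_funcs.push(tc, obj->st, obj, obj->body, value);
    }

    /* Devirtualized path: the type (and thus the REPR) is known, so a direct
     * call to array_push can be emitted instead of loading the pointer. */
    static void push_devirtualized(void *tc, Obj *obj, long value) {
        array_push(tc, obj->st, obj, obj->body, value);
    }

    int main(void) {
        Obj o = { &array_repr, NULL, NULL };
        push_virtual(NULL, &o, 1);
        push_devirtualized(NULL, &o, 2);
        return 0;
    }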
brrt | hmm right | 14:07 | |
oh, i know how you can spend your time Very Fruitfully | |||
if you have it to spend | |||
timotimo | i guess i do :) | ||
i'm still feeling kind of sad about the fact that i'm not internals-savvy or design-savvy enough to make nativecall as awesome as it could be, JIT-wise | 14:08 | ||
brrt | well, remember the param_[ro][pn]_[inso] business? | 14:10 | |
timotimo | yes | ||
brrt | these return a struct | ||
returning a struct is... annoying | |||
there are two fields i think of relevance | 14:11 | ||
the result value, and if it is present | |||
14:11 Ven joined
timotimo | oh, that seems like a good place to use VMNull vs C-level null instead | 14:11 | |
brrt | hmm really? i don't see why? | 14:12 | |
timotimo | well, result value + presence yes/no can be encoded into a simple pointer in that case | ||
if the value was "a null", it'll just be VMNull, if it wasn't present it'll be a null pointer instead | |||
or am i misunderstanding? | |||
brrt | i dunno if that's going to cause too many semantic changes? | 14:13 | |
timotimo | aren't we using that internally exclusively? | 14:14 | |
brrt | but anyway, i'd suggest returning the presence, passing a register, and setting the value conditionally | |||
then the required ops can throw if not present, and optional ops can branch if not present | 14:15 | ||
timotimo | this would be a refactoring of how moarvm does these operators and it'd mainly affect interp.c, right? | ||
(well, usage-wise, that is) | |||
brrt | e.g. currently we have MVMArgInfo MVM_args_ops_obj(tc, ctx, pos, required) | ||
i'd change that to MVMint32 MVM_args_pos_obj(tc, ctx, pos, &register) | 14:16 | |
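[Editor's note: a hedged sketch of the two calling conventions being compared here; the types and names are simplified stand-ins, not MoarVM's real MVMArgInfo/MVM_args_* definitions.]

    #include <stdio.h>

    typedef union { long i64; double n64; void *o; } Register;

    /* Current style: the argument fetcher returns a small struct by value,
     * carrying the value, a presence flag, and type flags. */
    typedef struct {
        Register arg;
        int      exists;
        int      flags;
    } ArgInfo;

    /* Proposed style: return only the presence flag and write the value
     * through an out-pointer, which is easier for JIT-compiled code to call. */
    static int args_pos_obj(void *tc, void *ctx, int pos, Register *out) {
        (void)tc; (void)ctx;
        if (pos != 0)        /* pretend only argument 0 was passed */
            return 0;
        out->i64 = 42;
        return 1;
    }

    /* A required op throws when the flag is 0; an optional op just branches. */
    static void example(void *tc, void *ctx) {
        Register r;
        if (args_pos_obj(tc, ctx, 0, &r))
            printf("got %ld\n", r.i64);
        else
            printf("argument missing: a required op would throw here\n");
    }

    int main(void) { example(NULL, NULL); return 0; }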
but one of the complexities is that these ops do automagic transformations | |||
timotimo | hmm, where do we actually use the flags entry to that struct? can that just be thrown out? | 14:17 | |
transformations, as in vivification and such? | 14:18 | ||
brrt | yes | 14:19 | |
autobox/unbox | |||
timotimo | oh? | ||
brrt | and i-to-n transformations and the like | 14:20 | |
a lot of things | |||
timotimo | hm, ok | ||
brrt | a map of these transformations would be very helpful | ||
it's not strictly necessary | |||
timotimo | ah, the autounbox define | ||
brrt | i'll be off, have to go to the vet | 14:21 | |
timotimo | that's where the flags are used | ||
oh, i hope your pet will get better! unless it's just a routine inspection or something | |||
brrt | no, rather sick | ||
timotimo | uh oh | ||
get well soon, brrt's pet! | |||
brrt | she thanks you :-) | ||
brrt afk | |||
timotimo | i still have some feature work i was interested in building ... udp sockets | 14:25 | |
14:34 FROGGS[mobile] joined
15:10 Ven joined
18:18 FROGGS joined
18:34 kjs_ joined
19:14 brrt joined
brrt pet is not getting well soon at all, it appears | 19:14 | ||
timotimo | oh no! :( | 19:15 | |
what kind of pet is it? for some reason i seem to recall you have a ferret or something | |||
but i could just have dreamt that | |||
brrt | i have 3 rats | 19:19 | |
timotimo | ah | 19:20 | |
brrt | dinner & | 19:21 | |
well, ferrets and rats sound alike, in name, so that is not crazy :-) | 19:34 | ||
timotimo | :) | 19:35 | |
do they also sound alike in squeaks? | |||
brrt | no. rats hardly ever squeak, except for social purposes | 19:42 | |
:-) | |||
like the way they do in the movies? rats don't do that | |||
19:42 mj41 joined
timotimo | yeah, just like cats and dogs | 19:42 | |
as soon as they enter a frame they have to make a sound | 19:43 | ||
brrt | right | ||
timotimo | currently i'm spending a lot of time over at a friend's house where there's two cats | 19:49 | |
one of them is very easily made to make a sound | 19:51 | ||
brrt | some cats are, yes | 20:00 | |
many cats in my neighbourhood are really eager for petting and will meow at you just for walking by | |||
timotimo | this one does a dove-like sound whenever she gets surprised, even if the surprise is very slight | 20:11 | |
it's quite adorable | |||
hm, maybe more like a pigeon sound rather than a dove | |||
FROGGS | timotimo: I managed to get at the 'is inlined' information from within the CStruct | 20:21 | |
timotimo | oh, great! | 20:23 | |
i'm eager to see how you implemented it | |||
dalek | arVM/union: 09746e9 | FROGGS++ | src/ (4 files): inline CUnions when attr's 'inlined' flag is set. Before the default was to inline CUnions. Now a trait on the attribute can set this flag. | 20:38
FROGGS | timotimo: pushed the rakudo part too now | 20:40 | |
timotimo | mhm | 20:49 | |
good :) | |||
brrt | anyone care to review my next blog post while i'm afk? brrt-to-the-future.blogspot.com/b/...;type=POST | 21:15 | |
jnthn | brrt: I'm tired, but I'll have a look :) | ||
21:17 colomon joined, kjs_ joined
TimToady | optiizations | 21:17 | |
jnthn | "Many more optimizations are forbidden by Perl 6 semantics" - I'd put it more like "Many optimizations can not be proven safe at compile time in Perl 6" | 21:21 | |
TimToady | and the link-time guarantees in the design document are largely NYI | 21:22 | |
but we're supposed to be able to close and finalize classes once we're to link time | 21:23 | ||
jnthn | TimToady: But...I don't really believe in that in a sense. | 21:24 | |
TimToady: If we already pre-comp'd modules to bytecode as we installed them, then the top-level application is allowed to close things...the ship has already sailed. | 21:25 | ||
TimToady | might still provide facts like "I'll never need this guard" | 21:26 | |
jnthn | That's true | ||
Pushing the info down VM-wards may help | |||
Further, if we forbid mixing in, we can save a bit of memory per object | |||
brrt: "transforming tight loops into SIMD" - yes that's hard, but hyperops on native arrays actually are explicit SIMD | 21:27 | ||
TimToady | plus you might be able to avoid the need for some deopt logic | ||
jnthn | Indeed. | ||
I do worry a bit that we're going to find it trick in so far as an application may well use dozens of modules, and one of them somewhere may happen to use mixins on its insides. | 21:28 | ||
And closing stuff at application level will bust that module | |||
*tricky | |||
brrt: It may be worth noting why the JIT we have today is the way it is: was done in GSoC time period, and needed to cope with deoptimization, OSR, and so forth. | 21:29 | ||
TimToady | +1 blame Google :) | 21:30 | |
jnthn | brrt: And that deopt was certainly needed for it to actually be useful for the kinds of optimizations we want to do for Perl 6, and so that had to take priority over wonderful code-gen - which is what the aim is now. | ||
brrt: Given you note we'll keep a lot of the infrastructure (especially around OSR and deopt) in place, I'd word it more like "The expression JIT functions as an extra node type for the existing JIT - though very many things will likely end up using it." | 21:33 | ||
brrt: One other opportunity we may get is being able to JIT native calls more easily also :) | 21:34 | ||
Anyway, good post. And nice plans. brrt++ | |||
FROGGS | brrt++ | 21:36 | |
I understood parts of it \o/ | |||
21:39 kjs_ joined
21:44 retupmoca joined
21:49 leedo joined
TimToady | another possible optimization is to identify idempotent sequences of code that can increase the distance between deopt sequence points, at the expense of repeating some calculations | 21:50 | |
22:18 colomon joined
22:20 kjs_ joined
22:28 brrt joined
brrt | jnthn++ for critical review :-) | 22:32 | |
also TimToady++ :-) | 22:35 | ||
hmm... what would be the real benefit of increasing deopt point distance? aside from moving deopt points out of loops etc | 22:36 | ||
in general calculation is cheap and memory is expensive though :-) | |||
TimToady | not having to sync things to memory in between mostly | 22:40 | |
alternately, you could have points that say "if you have to deopt here, write these registers to these locations first" | 22:41 | ||
how to commit the current transaction, as it were, rather than how to rollback | |||
brrt | that is actually what i'd rather do | 22:42 | |
timotimo | having fewer deopt points active reduces the amount of registers we have to keep alive in spesh | ||
brrt | eliminating deopt points is at least partly spesh's responsibility i'd think | 22:43 | |
TimToady | and since idempotence is related to lack of side effects, escape analysis will also play into that | ||
timotimo | sure | ||
TimToady | anyway, just talking in general terms like I know what I'm sayin' :) | ||
brrt | :-) it makes sense | 22:45 | |
TimToady | my math always had more vigor than rigor, I fear... :) | ||
brrt | and in the jit i really do want to do active restoration sequences (optimistically) | ||
but that's.. hard, in general, and wasn't necessary until now | 22:46 | ||
TimToady | careful, or you'll invent STM :) | ||
for my part, I think we can do a much better job of caching dynvars than we do | 22:48 | ||
brrt | it's not really the same as STM, is it? i'm not at all familiar with that | ||
TimToady | well, it is transactional, and it has something to do with software and memory :) | 22:49 | |
brrt | right | ||
i sometimes wonder whether transactions aren't more difficult to reason about than just the random storm of writes that we'd otherwise have | 22:50 | ||
TimToady | well, I think you'll need to view the random storm that happens between possible deopt points as some kind of transaction, or you'll get erroneous deopts | 22:51 | |
well, or the random storm of register writes that *aren't memory writes, anyway | 22:52 | ||
one could view every memory write as a kind of transaction commit | |||
but then your registers are just a write-through cache | |||
anyway, looking forward to seeing your work in the future, for sure | 22:53 | ||
brrt | i'm looking forward to working on it again, too :-) | 22:54 | |
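[Editor's note: a purely hypothetical sketch of the "commit the transaction" idea discussed above — deopt points that record which registers to write back to which frame locals before deoptimizing, so hot code need not keep memory in sync in between. None of these names exist in MoarVM.]

    typedef struct {
        unsigned char machine_reg;   /* register holding the live value */
        unsigned int  local_idx;     /* frame local to write it back to */
    } DeoptWriteBack;

    typedef struct {
        unsigned int    bytecode_offset;  /* where the interpreter resumes */
        unsigned int    num_writebacks;
        DeoptWriteBack *writebacks;       /* the "commit", applied only on deopt */
    } DeoptPoint;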
TimToady goes back to thinking more about dynvar caches | |||
thing is, we really only need to store a cache starting at the deepest $*FOO, and deeper frames can just dup a pointer to that cache | 22:57 | ||
and if the cache were somehow authoritative, we would never even have to look in the lexpads | |||
well, perhaps dup a pointer to the frame that holds the cache, rather, esp if the actual cache were the root of a linked list of authoritative dynvar locations, then if you have to search too far for $*FOO, you just link it into the front of the list | 22:59 | ||
also, since there are relatively few dynvar names, they could very easily profit from string interning, even if the language in general doesn't use it | 23:01 | ||
timotimo | mhm | 23:02 | |
i'd like dynvars to become faster, if only for $*OUT | |||
TimToady | indeed | ||
that's one of the reasons our IO is so slow | 23:03 | ||
using the frames as a linked list of cache entries doesn't seem very...cache friendly | 23:05 | ||
a direct link to a dynvar cache attached only to frames that actually declare dynvars seems like it could be more CPU cache friendly, though one would have to take into account allocation costs | 23:08 | ||
a resizeable cache in one chunk of memory is probably more CPU cache friendly these days than a linked list | 23:09 | ||
timotimo | if that linked list only links chunks that have been allocated in close proximity, maybe that helps? | 23:10 | |
TimToady | the degenerate case of that is to just copy all the "environmental" locations down as Unix processes do, but that's probably insufficiently lazy for our purposes | ||
a cache chunk pool might work, yes | |||
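[Editor's note: a hypothetical C sketch of the dynvar cache being described; none of these names exist in MoarVM. A frame that declares dynamic variables owns a cache node, deeper frames just copy a pointer to the nearest enclosing cache, and entries found only after a long walk get linked to the front of the list.]

    typedef struct DynVarEntry {
        const char         *name;      /* interned name, e.g. "$*OUT" */
        void               *location;  /* authoritative storage for the value */
        struct DynVarEntry *next;
    } DynVarEntry;

    typedef struct {
        DynVarEntry *entries;          /* front = most recently found/promoted */
    } DynVarCache;

    typedef struct Frame {
        struct Frame *caller;
        DynVarCache  *dyn_cache;       /* shared with the caller unless this
                                          frame declares dynvars of its own */
    } Frame;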
timotimo | i was wondering if mmapping a chunk of memory for every piece of code we jit is a bad idea | 23:11 | |
TimToady doesn't follow the reasoning | 23:12 | ||
or were you going back to the deopt thing? | 23:13 | ||
timotimo | nah | ||
just running perl6 -e 'say 1' calls mmap 360 times | |||
TimToady | oh, just a new subject then :) | ||
this after the lazy deserialization? | 23:14 | ||
timotimo | yes, sorry | ||
but i'm now doing something else entirely :P | |||
TimToady | maybe figure out what has to get deserialized, and clump it in one mmap somehow? | ||
brrt | hmmm | 23:15 | |
TimToady should probably do more measurement of dynvar overhead anyway first... | |||
brrt | i hadn't considered mmapping to be such a memory or cpu cost | ||
TimToady | well, syscalls are known to be atrociously slow on most unixen | ||
brrt | that is true. but we execute a lot more code than just 360 mmaps before we get to say | 23:16 | |
(although i'll be the first to admit that it is a lot) | 23:17 | ||
TimToady | time perl6-m -e 'say 42' | 23:18 | |
42 | |||
real    0m0.190s | |||
user    0m0.135s | |||
sys     0m0.055s | |||
sys time is significant | |||
brrt | fair enough | 23:19 | |
but that's startup | |||
and still it's less than 50% | |||
TimToady | I thought that's what timotimo++ was talking about | 23:20 | |
but a 10% win, say, for coalescing mmaps would be significant | 23:21 | ||
brrt | right | ||
that's pretty much opposite to what the python folks think, btw :-) | 23:22 | ||
TimToady | well, maybe jnthn++ has already thought about which things should load eagerly vs lazily, but it's just an idea | ||
japhb | brrt: What is "opposite to what the python folks think"? They don't believe in mmap coalescing? | 23:23 | |
brrt | they don't believe in making the interpreter more than n% complex for less than n% speedup | 23:24 | |
TimToady | well, there's something to be said for that point of view too :) | ||
brrt | which is a ... reasonable rule, i'd guess, but also the reason (aside from missing jnthn :-P) they don't have spesh | ||
TimToady | otoh, they don't really believe in tormenting the implementors on behalf of the users quite as much as we do :) | 23:25 | |
TimToady just always wonders about any particular tradeoff whether the OR-ness of it is intrinsic or extrinsic to the actual problem... | 23:26 | ||
brrt | OR or XOR :-P | ||
TimToady | OR xor XOR :P | ||
brrt | but in general if things are hard it's nature's fault, i find | 23:27 | |
some things are human problems, i try to avoid these | |||
TimToady | then by all means avoid me :) | ||
brrt | hah, i didn't say humans are problems | 23:28 | |
TimToady | well, some of us are... | 23:29 | |
brrt sleep & | 23:32 | ||
TimToady | o/ | ||
brrt | o/ |