00:03
danaj joined
timotimo | a curcode should be speshable immediately into a static pointer set, right? | 01:54 | |
and getcodeobj should also then be constant-foldable? | |||
curcode and getcodeobj appear ~820 times in the setting | 01:55 | ||
always in code that looks very, very similar | |||
not 100% sure what that is, exactly | |||
hmm. if i turn curcode and getcodeobj into a constant pointer, that removes a :noinline instruction from that frame and maybe it'd become inlinable? | 01:57 | ||
the fact that the code_ref is inside the Frame but not in the StaticFrame makes me think that wouldn't work ... | 02:01 | ||
it seems like a proto's dispatcher generates two curcode + getcodeobj calls | 02:04 | ||
02:41
JimmyZ joined
JimmyZ | Is there a reason that it can't move code_ref to StaticFrame? :P | 02:51 | |
timotimo | i don't know much about this part of moarvm | 02:54 | |
if the proto actually does have to call curcode twice and can get different results each time ... that would make that code legit | |||
otherwise, it should at least be possible to unify the values and only curcode + getcodeobj once | |||
JimmyZ | looks like it's for state variables | 02:55 | |
timotimo | oh | 03:00 | |
03:01
FROGGS[mobile] joined
09:29
lizmat joined
09:31
woolfy joined
09:40
camelia joined
09:48
woolfy left
11:04
ggoebel111111114 joined
11:57
zakharyas joined
timotimo | jnthn: this one shouldn't require too much wakefulness: can curcode + getcodeobj be spesh'd into something static? i.e. constant folded? does having two pairs of these close by each other make sense? i.e. will the value that you get from them change through some kind of event? | 13:35 | |
because every single one of our protos seems to be doing curcode + getcodeobj twice in relatively close succession | 13:36 | ||
JimmyZ | oh, good morning, timotimo! | ||
jnthn | No, because curcode = current closure | ||
timotimo | OK | 13:37 | |
good morning JimmyZ :) | |||
jnthn | And since generated proto methods all come from a single static thing... | ||
...it'd surely screw up to constant fold it to...something. | |||
timotimo | OK :) | ||
since you've been gone, i've been grasping at straws regarding performance improvements m) | 13:38 | |
jnthn | For non-generated we might do better but...hmm. | ||
It's not immediately clear how | |||
If we can do better we'd want to just get code-gen to spit out a wval when we know it's OK... | |||
timotimo | OK, that'd be good on all back-ends | ||
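jnthn's objection to constant folding `curcode` can be illustrated with a minimal Python sketch. The classes below (`StaticFrame`, `CodeRef`, `Frame`) are hypothetical stand-ins, not MoarVM's real structures. The only assumption carried over from the discussion is that generated protos share one static frame while each is its own closure, so a value folded per static frame cannot match what `curcode` returns in every invocation.

```python
# Hypothetical, simplified model; these are not MoarVM's real data structures.
class StaticFrame:
    """Compile-time description of a routine body, shared by all its closures."""
    def __init__(self, name):
        self.name = name

class CodeRef:
    """A closure: a static frame plus per-closure identity (here, a label)."""
    def __init__(self, static_frame, label):
        self.static_frame = static_frame
        self.label = label

class Frame:
    """A running invocation; it remembers which closure was called."""
    def __init__(self, code_ref):
        self.code_ref = code_ref
        self.static_frame = code_ref.static_frame

def curcode(frame):
    # The current closure comes from the running frame, not the static frame.
    return frame.code_ref

# Two generated protos share a single static frame...
shared_sf = StaticFrame("generated-proto")
proto_foo = CodeRef(shared_sf, "foo")
proto_bar = CodeRef(shared_sf, "bar")
f1, f2 = Frame(proto_foo), Frame(proto_bar)

# ...so a fold keyed on the static frame can bake in only one answer,
# while curcode must return a different closure for each invocation.
folded = {shared_sf: proto_foo}
print(curcode(f1).label, curcode(f2).label)  # foo bar
print(folded[f2.static_frame].label)         # foo (wrong for f2)
```

Folding would only be safe when the static frame is known to back exactly one closure, which is roughly the "when we know it's OK" case where code-gen could emit a wval instead.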
i found something else that stumped me. in control.pm, we have a "my sub THROW" and we use that in take and take-rw | 13:39 | ||
now the sub THROW ends in "0" explicitly so that there's no return-value overhead | |||
but the users of THROW all inspect the return value for sinkability ... and they also look up "&THROW" indirectly | |||
i don't understand why :\ | |||
is that just something the optimizer should have picked up? or should code-gen have done better? | 13:42 | ||
maybe i should inspect what spesh does with that before going on | |||
jnthn | Not sure without looking | 13:46 | |
The indirect lookup sounds like a code-gen thing though | |||
timotimo | i'll follow that hint | ||
even though take and take-rw are likely to see a big change for glr, right? | 13:47 | ||
jnthn | Well, if those two are the ONLY things using THROW, we may just consider inlining it by hand into both | 13:50 | |
timotimo | mhm | 13:51 | |
jnthn | And get rid of THROW | ||
timotimo | i think 4 things actually use it | ||
or something | |||
jnthn | Ah | ||
timotimo | i'll look | ||
jnthn | That's...a little worse :) | ||
timotimo | last, next, redo, succeed, proceed all use it | 13:52 | |
with a "my sub", is it guaranteed to not change? i would assume so | |||
surely i'll find more meaningful stuff to change for performance improvements, though. | 13:53 | ||
and the code-gen for sink is going to change soon anyway; iirc sinking will become part of p6decontrv or something like that | |||
one other thing i saw: every nativecallinvoke is preceded by a call to a generic "decontall" sub | 13:54 | |
i was thinking we should be able to do better than that, especially since we know the number of arguments to the native call when we spesh | |||
but i'm bothering you too much now :) | 13:55 | ||
jnthn | "my" just means "lexical" | ||
uh | |||
And it's the default | |||
So the "my" is kinda meaningless here | |||
Or at least, superfluous | |||
timotimo | oh | 13:57 | |
it doesn't have "my" there (any more?) | |||
i didn't see the jvm ifdefs, it seems like | |||
jnthn: t.h8.lv/add_core_op_extops_and_dataflow.svg - you might like this :3 | 14:01 | |
jnthn | So diagram! | 14:10 | |
Wow :) | |||
15:30
colomon_ joined
15:38
colomon_ joined
16:20
zakharyas joined
17:28
zakharyas1 joined
17:50
FROGGS_ joined
timotimo | jnthn: we have sp_findmeth, maybe sp_can_s would be interesting to have, too? | 18:59 | |
i've seen multiple cases where we first can_s, then sp_findmeth | |||
jnthn | We'd really like to find a way to turn those lookups into a single hash lookup... | 19:01 | |
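timotimo's `sp_can_s` idea and jnthn's "single hash lookup" goal both come down to not probing the method cache twice for the same name. A minimal Python sketch, assuming a method cache that behaves like a dict and ignoring anything beyond a plain hit or miss; `MethodCache`, `can_then_findmeth` and `find_if_can` are hypothetical names, not MoarVM ops:

```python
class MethodCache:
    """Hypothetical per-type method cache that counts its hash probes."""
    def __init__(self, methods):
        self._methods = dict(methods)
        self.probes = 0

    def lookup(self, name):
        self.probes += 1
        return self._methods.get(name)

def can_then_findmeth(cache, name):
    # The pattern seen in specialized code: check first, then fetch.
    if cache.lookup(name) is None:
        return None
    return cache.lookup(name)

def find_if_can(cache, name):
    # Hypothetical fused op: one probe returns the method or None.
    return cache.lookup(name)

def sink():
    return "sunk"

two_step = MethodCache({"sink": sink})
fused = MethodCache({"sink": sink})
assert can_then_findmeth(two_step, "sink") is find_if_can(fused, "sink")
print(two_step.probes, fused.probes)  # 2 1
```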
timotimo | that's a change that probably goes much deeper than what i feel comfortable trying to do myself :S | 19:15 | |
i see no elegant way to improve the decont_all situation, as we don't know the number of arguments coming in at code-gen time :( | |||
and the lack of working call graph pruning kinda drives me up the wall, but i don't think i can figure out what's wrong with the code i already pushed :\ | 19:16 | ||
interesting | 19:32 | ||
for postinc native spends 15% of its time garbage collecting, but the allocations tab only shows 2 allocations all in all | |||
782 gc runs for 40964096 runs, each taking about 7ms | 19:33 | ||
jnthn | Then I'm guessing the allocations tab is missing something... | 19:36 | |
timotimo | after a whole bunch of GCs where we promote nothing at all, would it be sensible to do a gen2 collection to make future collections cheaper? | 19:37 | |
jnthn | Make them cheaper how? | 19:38 | |
timotimo | maybe having fewer gen2-to-nursery links? | ||
jnthn | We prunse those each nursery collect, no? | ||
Unless you mean, the gen2 objects themselves may go away | 19:39 | ||
In which case, yes, it may be worth having an absolute threshold as well as the promotion based one, but we'd want to set it fairly high. | |||
And I'm still not sure it's worth it | 19:40 | ||
timotimo | i'll try setting it to 200 and see what happens to the gc times after vs before | 19:44 | |
no visible change at all | 19:56 | ||
except it takes 10x as long for a single gc run, which is the full collection | 19:57 | ||
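jnthn's suggestion of an absolute threshold alongside the promotion-based one can be sketched as a trigger with two conditions. A minimal Python sketch; `should_full_collect`, `simulate` and the promotion threshold value are hypothetical, and only the run count of 782 and the threshold of 200 come from the discussion:

```python
def should_full_collect(promoted_bytes, runs_since_full,
                        promoted_threshold, runs_threshold):
    """Hypothetical trigger: promotion-based, plus an absolute run count."""
    if promoted_bytes >= promoted_threshold:
        return True
    return runs_since_full >= runs_threshold

def simulate(nursery_runs, promoted_per_run, promoted_threshold, runs_threshold):
    """Count how many nursery runs would be upgraded to a full collection."""
    full_collections = 0
    promoted = 0
    runs_since_full = 0
    for _ in range(nursery_runs):
        promoted += promoted_per_run
        runs_since_full += 1
        if should_full_collect(promoted, runs_since_full,
                               promoted_threshold, runs_threshold):
            full_collections += 1
            promoted = 0
            runs_since_full = 0
    return full_collections

# 782 nursery runs that promote nothing, absolute threshold at 200.
print(simulate(782, 0, 1 << 20, 200))  # 3
```

With nothing promoted, the promotion trigger never fires, so every full collection comes from the absolute counter; it only pays off if those full collections actually free gen2 objects.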
i'm now analyzing the nursery's contents | 21:12 | ||
jnthn: gist.github.com/timo/1eef31e3be2e7e816bef - this is what the nursery looks like in that benchmark | 21:22 | ||
we shouldn't be putting that many MVMStaticFrame instances into the nursery, aye? | 21:23 | ||
shouldn't we have only a single one per existing frame? | |||
i don't know very much about moarvm, but i do know we shouldn't have 11k static frames being created for each of the gc runs of which this benchmark has 782 | 21:25 | ||
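A nursery census like the one in timotimo's gist boils down to walking every live object in the nursery and tallying by type. A minimal Python sketch, assuming the nursery can be walked as a sequence of (type name, size) records; the record format and the sizes below are placeholders, not MoarVM's layout or the gist's numbers:

```python
from collections import Counter

def nursery_census(objects):
    """Tally (type_name, size) records: object count and total bytes per type."""
    counts = Counter()
    total_bytes = Counter()
    for type_name, size in objects:
        counts[type_name] += 1
        total_bytes[type_name] += size
    # Most frequent types first.
    return [(name, n, total_bytes[name]) for name, n in counts.most_common()]

# Toy input: the sizes are arbitrary placeholders.
toy_nursery = [
    ("MVMStaticFrame", 96), ("MVMCode", 48), ("MVMStaticFrame", 96),
    ("MVMStaticFrame", 96), ("MVMCode", 48),
]
for row in nursery_census(toy_nursery):
    print(row)
```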
22:21
lizmat joined
22:45
woolfy joined
timotimo | so i've breakpointed the gc_allocate for MVMStaticFrame and that didn't get triggered after the first gc run was over | 23:10 | |
so we're just cloning them memcpy-style? | |||
all over the place? | |||
freshcoderef creates static frames by cloning | 23:13 |
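The breakpoint result has a simple analogue: if each new closure gets a copied static frame, the copy never passes through the constructor a breakpoint would watch. This is a Python analogy, not MoarVM code. It uses `copy.copy` as a stand-in for a memcpy-style clone (it builds the new object without calling `__init__`), and `fresh_coderef` is a hypothetical name, not the real op's implementation:

```python
import copy

class StaticFrame:
    constructed = 0  # counts __init__ calls, like a breakpoint on the allocator

    def __init__(self, name):
        StaticFrame.constructed += 1
        self.name = name

def fresh_coderef(static_frame):
    # Hypothetical stand-in for an op that clones the static frame per closure.
    return copy.copy(static_frame)

sf = StaticFrame("loop-body")
clones = [fresh_coderef(sf) for _ in range(1000)]
print(StaticFrame.constructed)       # 1
print(len({id(c) for c in clones}))  # 1000
```

One static frame per closure creation would be consistent with the large MVMStaticFrame counts in the nursery, though the sketch says nothing about why the clone is made.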