00:03 danaj joined
timotimo a curcode should be speshable immediately into a static pointer set, right? 01:54
and getcodeobj should also then be constant-foldable?
curcode and getcodeobj appear ~820 times in the setting 01:55
always in code that looks very, very similar
not 100% sure what that is, exactly
hmm. if i turn curcode and getcodeobj into a constant pointer, that removes a :noinline instruction from that frame and maybe it'd become inlinable? 01:57
the fact that the code_ref is inside the Frame but not in the StaticFrame makes me think that wouldn't work ... 02:01
it seems like a proto's dispatcher generates two curcode + getcodeobj calls 02:04
02:41 JimmyZ joined
JimmyZ Is there a reason that it can't move code_ref to StaticFrame? :P 02:51
timotimo i don't know much about this part of moarvm 02:54
if the proto actually does have to call curcode twice and can get different results each time ... that would make that code legit
otherwise, it should at least be possible to unify the values and only curcode + getcodeobj once
JimmyZ looks like it's for state variables 02:55
timotimo oh 03:00
03:01 FROGGS[mobile] joined
09:29 lizmat joined
09:31 woolfy joined
09:40 camelia joined
09:48 woolfy left
11:04 ggoebel111111114 joined
11:57 zakharyas joined
timotimo jnthn: this one shouldn't require too much wakeness: can curcode + getcodeobj be spesh'd into something static? i.e. constant folded? does having two pairs of these close by each other make sense? i.e. will the value that you get from them change through some kind of event? 13:35
because every single one of our protos seems to be doing curcode + getcodeobj twice in relatively close succession 13:36
JimmyZ oh, good morning, timotimo!
jnthn No, because curcode = current closure
timotimo OK 13:37
good morning JimmyZ :)
jnthn And since generated proto methods all come from a single static thing...
...it'd surely screw up to constant fold it to...something.
timotimo OK :)
since you've been gone, i've been grasping for straws regarding performance improvements m) 13:38
jnthn For non-generated we might do better but...hmm.
It's not immediately clear how
If we can do better we'd want to just get code-gen to spit out a wval when we know it's OK...
timotimo OK, that'd be good on all back-ends
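A rough illustration of jnthn's point, using Python as a stand-in rather than MoarVM itself: closures created from one piece of static code share that code but are distinct objects, so folding "the current closure" to a single constant would pick the wrong one. The names `make_proto` and `dispatch` are hypothetical and exist only for this sketch.

```python
def make_proto(name):
    # Each call creates a new closure over `name`, but the body of
    # `dispatch` (the "static" part) is compiled only once.
    def dispatch():
        return name
    return dispatch


foo = make_proto("foo")
bar = make_proto("bar")

# Same static code, loosely analogous to sharing one static frame...
shares_static_code = foo.__code__ is bar.__code__
# ...but different closures, loosely analogous to curcode differing.
distinct_closures = foo is not bar
```

If the "current closure" were folded to whichever closure was seen first, `bar` would end up answering as `foo`, which is the kind of screw-up described above.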
i found something else that stumped me. in control.pm, we have a "my sub THROW" and we use that in take and take-rw 13:39
now the sub THROW ends in "0" explicitly so that there's no return-value overhead
but the users of THROW all inspect the return value for sinkability ... and they also look up "&THROW" indirectly
i don't understand why :\
is that just something the optimizer should have picked up? or should code-gen have done better? 13:42
maybe i should inspect what spesh does with that before going on
jnthn Not sure without looking 13:46
The indirect lookup sounds like a code-gen thing though
timotimo i'll follow that hint
even though take and take-rw are likely to see a big change for glr, right? 13:47
jnthn Well, if those two are the ONLY things using THROW, we may just consider inlining it by hand into both 13:50
timotimo mhm 13:51
jnthn And get rid of THROW
timotimo i think 4 things actually use it
or something
jnthn Ah
timotimo i'll look
jnthn That's...a little worse :)
timotimo last, next, redo, succeed, proceed all use it 13:52
with a "my sub", is it guaranteed to not change? i would assume so
surely i'll find more meaningful stuff to change for performance improvements, though. 13:53
and the code-gen for sink is going to change soon anyway; iirc sinking will become part of p6decontrv or something like that
one other thing i saw: every nativecallinvoke is preceded by a call to a generic "decontall" sub 13:54
i was thinking we should be able to do better than that, especially since we know the number of arguments to the native call when we spesh
but i'm bothering you too much now :) 13:55
jnthn "my" just means "lexical"
uh
And it's the default
So the "my" is kinda meaningless here
Or at least, surplus
timotimo oh 13:57
it doesn't have "my" there (any more?)
i didn't see the jvm ifdefs, it seems like
jnthn: t.h8.lv/add_core_op_extops_and_dataflow.svg ← you might like this :3 14:01
jnthn So diagram! 14:10
Wow :)
15:30 colomon_ joined
15:38 colomon_ joined
16:20 zakharyas joined
17:28 zakharyas1 joined
17:50 FROGGS_ joined
timotimo jnthn: we have sp_findmeth, maybe sp_can_s would be interesting to have, too? 18:59
i've seen multiple cases where we first can_s, then sp_findmeth
jnthn We'd really like to find a way to turn those lookups into a single hash lookup... 19:01
timotimo that's a change that probably goes much deeper than what i feel comfortable trying to do myself :S 19:15
i see no elegant way to improve the decont_all situation, as we don't know the number of arguments coming in at code-gen time :(
and the lack of working call graph pruning kinda drives me up the wall, but i don't think i can figure out what's wrong with the code i already pushed :\ 19:16
interesting 19:32
for postinc native spends 15% of its time garbage collecting, but the allocations tab only shows 2 allocations all in all
782 gc runs for 40964096 runs, each taking about 7ms 19:33
jnthn Then I'm guessing the allocations tab is missing something... 19:36
timotimo after a whole bunch of GCs where we promote nothing at all, would it be sensible to do a gen2 collection to make future collections cheaper? 19:37
jnthn Make them cheaper how? 19:38
timotimo maybe having fewer gen2-to-nursery links?
jnthn We prune those each nursery collect, no?
Unless you mean, the gen2 objects themselves may go away 19:39
In which case, yes, it may be worth having an absolute threshold as well as the promotion based one, but we'd want to set it fairly high.
And I'm still not sure it's worth it 19:40
timotimo i'll try setting it to 200 and see what happens to the gc times after vs before 19:44
no visible change at all 19:56
except it takes 10x as long for a single gc run, which is the full collection 19:57
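The idea of an absolute threshold alongside the promotion-based one could be sketched roughly like this. It is a toy model in Python, not MoarVM code: `GCPolicy`, its fields, and the 20 MB figure are all hypothetical, and the 200 only echoes the value tried above on the assumption that it counted nursery collections.

```python
from dataclasses import dataclass


@dataclass
class GCPolicy:
    promoted_threshold: int = 20 * 1024 * 1024  # hypothetical byte limit
    max_nursery_runs: int = 200                 # assumed meaning of "200"
    promoted_bytes: int = 0
    nursery_runs: int = 0

    def after_nursery_collect(self, promoted: int) -> bool:
        """Record one nursery collection; return True if a full one is due."""
        self.promoted_bytes += promoted
        self.nursery_runs += 1
        due = (self.promoted_bytes >= self.promoted_threshold
               or self.nursery_runs >= self.max_nursery_runs)
        if due:
            # A full collection resets both counters.
            self.promoted_bytes = 0
            self.nursery_runs = 0
        return due
```

With nothing being promoted, only the run counter can trigger a full collection, which is why jnthn suggests setting that limit fairly high.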
i'm now analyzing the nursery's contents 21:12
jnthn: gist.github.com/timo/1eef31e3be2e7e816bef - this is what the nursery looks like in that benchmark 21:22
we shouldn't be putting that many MVMStaticFrame instances into the nursery, aye? 21:23
shouldn't we have only a single one per existing frame
i don't know very much about moarvm, but i do know we shouldn't have 11k static frames being created for each of the gc runs of which this benchmark has 782 21:25
22:21 lizmat joined
22:45 woolfy joined
timotimo so i've breakpointed the gc_allocate for MVMStaticFrame and that didn't get triggered after the first gc run was over 23:10
so we're just cloning them memcpy-style?
all over the place?
freshcoderef creates static frames by cloning 23:13
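Why a breakpoint on the allocation path can stay silent while copies keep appearing can be shown with a loose Python analogy. This does not model MoarVM internals: the `StaticFrame` class here is a hypothetical stand-in, with `__init__` playing the role of the breakpointed allocator and `copy.copy` playing the role of the memcpy-style clone.

```python
import copy


class StaticFrame:
    allocations = 0  # counts trips through the normal allocation path

    def __init__(self, name):
        StaticFrame.allocations += 1
        self.name = name


original = StaticFrame("proto")

# copy.copy builds the new object without calling __init__, so the
# counter (our stand-in for the breakpoint) never sees these clones.
clones = [copy.copy(original) for _ in range(3)]
```

After this runs there are four objects but `StaticFrame.allocations` is still 1, which mirrors how cloned frames could fill the nursery without the breakpoint firing.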