dalek MoarVM: 898fe2a | (Timo Paulssen)++ | src/spesh/optimize.c:
integrate set with previous instruction if possible

cases where an operation sets an intermediate register that only gets read by a set immediately following that operation occur frequently enough that this should be worth the tiny bit of analysis work.
11:40
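The commit above describes a small peephole rewrite. A minimal sketch of the idea in Python, assuming straight-line code in SSA form where every register is written exactly once; the `Ins` record and the `fuse_sets` and `count_reads` helpers are hypothetical names for illustration, not MoarVM's actual data structures:

```python
from dataclasses import dataclass


@dataclass
class Ins:
    op: str       # opcode name, e.g. "add_i" or "set"
    dst: str      # register written by this instruction
    srcs: tuple   # registers read by this instruction


def count_reads(code, reg):
    """Count how often `reg` is read anywhere in `code`."""
    return sum(ins.srcs.count(reg) for ins in code)


def fuse_sets(code):
    """Fold `op tmp, ...; set dst, tmp` into `op dst, ...` when tmp has no other reader."""
    out = []
    i = 0
    while i < len(code):
        ins = code[i]
        nxt = code[i + 1] if i + 1 < len(code) else None
        if (nxt is not None and nxt.op == "set"
                and nxt.srcs == (ins.dst,)
                and count_reads(code, ins.dst) == 1):
            # The intermediate register is only read by the following set,
            # so the operation can write the final target directly.
            out.append(Ins(ins.op, nxt.dst, ins.srcs))
            i += 2
        else:
            out.append(ins)
            i += 1
    return out


code = [
    Ins("add_i", "t1", ("a", "b")),
    Ins("set", "x", ("t1",)),        # t1 is read only here: fused
    Ins("mul_i", "t2", ("x", "x")),
    Ins("set", "y", ("t2",)),        # t2 is also read below: kept
    Ins("sub_i", "z", ("t2", "y")),
]
fused = fuse_sets(code)
for ins in fused:
    print(ins.op, ins.dst, *ins.srcs)
```

The second `set` survives because its source register has another reader, which is the case the "tiny bit of analysis work" has to rule out.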
MoarVM: 63c5d88 | (Timo Paulssen)++ | src/spesh/args.c:
non-passed optional parameters make some BBs unreachable

they don't get removed yet, perhaps because they are still referred to by the dominance tree?
12:32
timotimo ^- this also wants an accompanying case for the other branch of named optional parameters 12:33
where we turn the instruction into a goto and ignore the "other" successor
and i'm not entirely sure what keeps the BBs alive that no longer get referred to 12:34
oh, i know what's wrong with that 12:38
the potential goto we're kicking out would skip the next block. we're just turning a potential skip into a guaranteed fall-through 12:39
that's fair, then
future optimizations may be happy to see the predecessors disappear for that particular future block
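The two cases discussed above (a branch that becomes a guaranteed goto versus one that becomes a guaranteed fall-through) can be sketched on a toy control-flow graph. This is an illustration in Python under assumed names: the block labels and the `resolve_branch`, `reachable` and `predecessors` helpers are hypothetical and do not mirror spesh's graph structures.

```python
def resolve_branch(succs, bb, taken):
    """Return a copy of the CFG in which `bb` keeps only the `taken` successor."""
    if taken not in succs[bb]:
        raise ValueError(f"{taken!r} is not a successor of {bb!r}")
    new = {name: list(targets) for name, targets in succs.items()}
    new[bb] = [taken]
    return new


def reachable(succs, entry):
    """Collect every block reachable from `entry` by following successors."""
    seen = set()
    stack = [entry]
    while stack:
        bb = stack.pop()
        if bb in seen:
            continue
        seen.add(bb)
        stack.extend(succs.get(bb, []))
    return seen


def predecessors(succs, target):
    """List the blocks that have `target` as a successor."""
    return sorted(bb for bb, targets in succs.items() if target in targets)


# "check" either falls through to "default" or skips it and jumps to "rest".
cfg = {
    "entry": ["check"],
    "check": ["default", "rest"],
    "default": ["rest"],
    "rest": [],
}

# Branch known to be taken: the skipped block becomes unreachable.
as_goto = resolve_branch(cfg, "check", "rest")
print(sorted(set(cfg) - reachable(as_goto, "entry")))

# Branch known not to be taken: nothing dies, but "rest" loses a predecessor.
as_fallthrough = resolve_branch(cfg, "check", "default")
print(predecessors(as_fallthrough, "rest"))
```

In the fall-through case no block is removed, which matches the observation that a potential skip only turns into a guaranteed fall-through; the gain is the smaller predecessor list for the later block.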
timotimo i may have to build a "not quite legit" fact discovery thing that relies on the mechanism we use to turn the SSA back into regular bytecode ... 13:26
by looking at version gaps and building an artificial "set", or alternatively raising the version on the instruction that writes to a too-low version ...
it seems like it's now safe to uncomment optimize_can_op 13:33
timotimo is spec-testing right now
dalek MoarVM: f89f527 | (Timo Paulssen)++ | src/spesh/optimize.c:
re-activate optimize_can_op; survives spec tests now.
13:44
MoarVM: 2aa669d | (Timo Paulssen)++ | src/spesh/optimize.c:
turning stuff into sets should also cause fact copying
timotimo hmpf. 13:57
given a not_i whose source operand's value is known, setting the target operand's known value as well makes the core setting compilation fail 13:58
could it be we're setting a known_value somewhere that's not actually correct?!
dalek MoarVM/spesh_constant_folding: 7f1646f | (Timo Paulssen)++ | src/spesh/optimize.c:
trivial constant folding for not_i

this breaks the rakudo build; maybe there's an optimization somewhere that sets a KNOWN_VALUE that ends up not having the correct value set?
14:13
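A minimal sketch of the folding described above, written in Python rather than C. It assumes `not_i` is a logical not that yields 0 or 1, and it uses a hypothetical `Facts` record, a hypothetical `fold_not_i` helper and `const_i64` as the replacement instruction name; none of these are taken from the actual commit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Facts:
    # None means "value not known at optimization time".
    known_value: Optional[int] = None


def fold_not_i(ins, facts):
    """Fold `not_i dst, src` into a constant load when src's value is known."""
    op, dst, src = ins
    if op != "not_i":
        return ins
    src_facts = facts.get(src)
    if src_facts is None or src_facts.known_value is None:
        return ins  # nothing known about the source: leave the op alone
    result = 0 if src_facts.known_value else 1
    # Record the result so later instructions can fold on dst as well.
    facts[dst] = Facts(known_value=result)
    return ("const_i64", dst, result)


facts = {"r0": Facts(known_value=5), "r2": Facts()}
print(fold_not_i(("not_i", "r1", "r0"), facts))
print(fold_not_i(("not_i", "r3", "r2"), facts))
```

The fold is only as sound as the recorded fact: if some earlier pass stores a known value that does not match what the register holds at run time, every fold built on it inherits the error, which is the suspicion raised in the commit message.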
timotimo is it expected that the benchmark '[+] (1 .. 10_1024).comb>>.Int' will spend a whole lot of time in bind and bind_one_param? (compared to the rest of things, that is) 14:57
13.49% of exclusive time spent in "find_best_dispatchee", 11.55% of x-time in bind and 11.05% in bind_one_param
next one is &return with 9.16% x-time spent 14:58
also the GC activity is fun to look at. first a bunch of stuff gets promoted, then 10 collections long it'll only promote and retain a bit of stuff, almost exactly the same amount each time 14:59
then it basically promotes + retains about 100% each time for 35 runs
then it promotes & retains almost nothing at all for almost 500 runs 15:00
then it promotes and retains wildly fluctuating amounts between 25% and 100%
interestingly, the profiler notices exactly two allocations in total 15:01
one BOOTCode and one Block
huh. 'my int $i = 0; while $i < 1024 { my int $j = 0; while $j < 1024 { $k = $i + $j; $j = $j + 1 }; $i = $i + 1 }; say $k' also spends almost all of its time in find_best_dispatchee + bind_one_param and bind 15:06
23.75% + 23.29% + 21.91% 15:07
then 10% in at_key and 5.6% in a different at_key
japhb That seems ... wildly wrong. All types are known and native; there are no loop controls and the loops are of the simplest non-infinite type. Why is this not optimized out the wazoo, and never dispatching anywhere? 16:36
timotimo good q. 16:46
i would have expected that as well.
something quite weird happens with another of the benchmarks where there's a whole bunch of GC runs that discard almost 100% of their data, but the allocations tab doesn't show much; a manual introspection of the nursery with my gdb helper thingie reveals that the nursery gets filled up with MVMStaticFrame instances over and over and over 16:50
on the one hand, these things annoy me greatly because i have no clue where to look to find out what's wrong 17:08
on the other hand: yay, there's still performance improvements that can be had!
also: why does it look like the loops in that last benchmark don't get inlined?
timotimo is hoping for a little bit of jnthn magic to fix these things up a bit 17:09
i wonder if we do worse than we used to in these benchmarks? i can't really run benchmarks on my desktop at the moment, as its ipv4 is b0rked from a distro upgrade ... 17:11
timotimo i see that our profiler has no way to handle operations that may or may not allocate 18:32
how about adding a "does the object that was passed to prof_allocated reside at the very end of the nursery?" check and setting more extops in rakudo to "ALLOCATING"?
jnthn: sorry for the huge amount of backlog filler %) 18:41
so ... the at_key use comes from Stash 18:51
why the hell would it want to access a Stash object rather than going directly through the lexpad for that code?!
dalek MoarVM: a1a9f88 | (Timo Paulssen)++ | src/core/interp.c:
when bindlex fails, we should report "bindlex", not "getlex"
19:01
MoarVM: e92c6c8 | (Timo Paulssen)++ | src/profiler/instrument.c:
takeclosure is a very popular allocating op.
timotimo ^- this gives us more precise profiles
bind_one_param allocates 4194308 BOOTCode objects in the doubly nested while loop we have up there 19:02
which ... just wow.
is_bindable and at_key give 1048577 each and find_best_dispatchee is responsible for another 2097160 19:03
i *still* don't know why we're going through Stash, though
we really should be inlining these blocks anyway
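The reported counts line up with the loop shape: the doubly nested loop runs its inner body 1024 * 1024 times, so dividing each count by that number gives the allocations per inner iteration plus a small remainder. A quick check in Python; the dictionary just restates the numbers quoted above, and `per_iteration` is a hypothetical helper:

```python
ITERATIONS = 1024 * 1024  # inner loop body executions

# Allocation counts as quoted from the profile above.
reported = {
    "bind_one_param": 4194308,
    "find_best_dispatchee": 2097160,
    "is_bindable": 1048577,
    "at_key": 1048577,
}


def per_iteration(count, iterations=ITERATIONS):
    """Split a total into (allocations per iteration, leftover allocations)."""
    return divmod(count, iterations)


for name, count in reported.items():
    per_iter, extra = per_iteration(count)
    print(f"{name}: {per_iter} per iteration, {extra} extra")
```

So bind_one_param accounts for four allocations per inner iteration, find_best_dispatchee for two, and is_bindable and at_key for one each, with only a handful left over outside the loop.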
m: say "test"
camelia rakudo-moar 3bbf7b: OUTPUT«test␤»
timotimo m: "that benchmark allocates { 8389651 / 4194308 } as many BOOTCode as it does Scalar" 19:04
camelia ( no output )
timotimo m: say "that benchmark allocates { 8389651 / 4194308 } as many BOOTCode as it does Scalar"
camelia rakudo-moar 3bbf7b: OUTPUT«that benchmark allocates 2.00024676 as many BOOTCode as it does Scalar␤»
timotimo that's pretty impressive.
but all of these Scalar allocations happen in at_key and bind_one_param
japhb: see what i did there? for some reason i was missing the declaration of $k as a lexical and that made stuff blow up >_< 19:26
and of course that's already fixed in newest perl6-bench 19:27
grrr. been hunting ghosts again 19:28
jnthn: if you don't have time (or energy (which i would understand)) to backlog over my wall of text, i'd like you to answer at least this simple-ish question: 19:29
for ops (i've only looked at extops here, really) that only sometimes allocate, should there be a check "does the value we got back from that operation look like it was allocated very recently?"
jnthn Well, for simplicity we may want to consider just always putting the check in. 19:31
Since it goes in the profiling code
timotimo that's what i thought; so you think the check sounds like a sane idea?
i'd probably compare if object address + object size is equal to or at least "close to" the current allocation pointer of the nursery 19:32
hm, but that wouldn't cover allocating "directly to gen2"
jnthn To the degree a guy who has been traveling for 20 hours knows what's sane... :P
We don't need to cover "directly to gen2" really
timotimo OK
jnthn It's typically done by things like deserialization or bytecode loading
Which aren't really anything the user can do anything about in their program
timotimo fair enough 19:34
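The check proposed above can be modelled with a toy bump allocator. This is a sketch in Python under stated assumptions: the `Nursery` class, the `looks_freshly_allocated` helper and the `slack` parameter (standing in for "close to") are hypothetical, and objects allocated directly into gen2 are deliberately not covered, as agreed in the discussion.

```python
class Nursery:
    """Toy bump allocator: each allocation advances a single pointer."""

    def __init__(self, size):
        self.size = size
        self.alloc_ptr = 0

    def allocate(self, nbytes):
        if self.alloc_ptr + nbytes > self.size:
            raise MemoryError("nursery full; a real VM would collect here")
        addr = self.alloc_ptr
        self.alloc_ptr += nbytes
        return addr


def looks_freshly_allocated(nursery, addr, size, slack=0):
    """True if the object ends at, or within `slack` bytes of, the allocation pointer."""
    gap = nursery.alloc_ptr - (addr + size)
    return 0 <= gap <= slack


nursery = Nursery(1024)
older = nursery.allocate(32)
newest = nursery.allocate(16)

print(looks_freshly_allocated(nursery, newest, 16))           # ends exactly at the pointer
print(looks_freshly_allocated(nursery, older, 32))            # something was allocated after it
print(looks_freshly_allocated(nursery, older, 32, slack=16))  # accepted with "close to" slack
```

A non-zero slack trades precision for coverage: it catches ops that allocate a small helper object after the one they return, but it can also count an object that was allocated slightly earlier by a different op.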
jnthn: github.com/rakudo/rakudo/blob/nom/...aops.pm#L3 - do you know of a way to make this not allocate a ludicrous amount of BOOTCode? 19:35
could nqp::getstaticcode or something similar be used for that?
maybe a macro would be sensible ... not that we have that working yet %)
jnthn Well, I put a desugars mechanism into Actions with the idea of turning some of these very common meta-ops into just some QAST nodes 19:37
I can live with all the list-processing meta-ops involving a bit of HOP; you're normally dealing with a bunch of data.
timotimo ah 19:38
jnthn But would prefer the assign and not ones just do some code-gen, I think...
timotimo great. what should i grep for to find that?
jnthn But can only do that when you know they are executing immediately 19:39
So it's not so straightforward
timotimo oh, you mean we have to check we're not doing something like my &foo = &[+=]
jnthn Exact.
timotimo i have an idea how to figure that out in the optimizer. in the Actions, however ... not so much
maybe it could be a use case for Want? 19:40
er, no, that doesn't make sense
if += appears in void context, it wouldn't do anything at all
ah, the desugar thing is the very first thing in actions 19:42
jnthn The optimizer may actually be a much easier place than the desugar... 19:46
Since you have the context to hand
timotimo that should decrease the memory pressure on tight loops that use += a *whole* lot 19:47
timotimo 294 collections instead of 938 when i write the += out 19:51
jnthn so collect /o\ 19:55
bbiab
dalek MoarVM/6pe: 5a3f555 | jonathan++ | docs/6model-parametric-extensions.markdown:
Start documenting the parametric 6model design.
20:59
MoarVM/6pe: b9c4ee9 | jonathan++ | / (6 files):
Stub parametricity-related ops.
MoarVM/6pe: 7812954 | jonathan++ | src/6model/6model.h:
STable extensions for parametricity.
MoarVM/6pe: 5edcbb5 | jonathan++ | src/gc/collect.c:
GC marking for STable parametricity bits.
jnthn Hm, and there were more commits, but I overflowed dalek.... 21:00
FROGGS__ jnthn: that's a prep for NSA? 21:02
such abbr
jnthn Amongst other things, yes. 21:03
It's one of the two main VM-level pieces needed for NSA
FROGGS__ what's the other one? 21:04
jnthn Well, or it will be when I get it done. :P
Other one is the native references.
I'm very contented with the 6pe design.
Well, what I have of it so far
FROGGS that sounds good to me :o)
jnthn Still need some more brain cycles on the native references stuff. Something felt a little off last time I was working on those. 21:05
Probably I just need some more concentrated, non-exhausted time.
Train journeys tend to be good thinking time, and I'll be back and forth to Stockholm like a yoyo for the next several weeks... So I think I'll get a design I like straightened out and coded up within the next weeks. :) 21:06
FROGGS what are native refs in one sentence?
jnthn Well, consider the naive compilation of: my $x := @omg-i'm-a-native-int-array[42]; $x = 69; 21:08
There's no Scalar container in a native array. A native reference is an assignable thingy that is a reference to a native location.
FROGGS ohh, understood 21:09
jnthn They're a bit curious to design because you're trying to optimize for being able to kill them off in the earliest possible optimizer. :)
FROGGS thanks :o)
jnthn That is, those that Perl6::Optimizer can kill off, it should. What it can't, spesh + inlining should be able to do something about.
At least, for the kinds of cases people are likely to write. 21:10
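The idea of "an assignable thingy that is a reference to a native location" can be illustrated with a small model. This is only a sketch in Python, assuming a reference is a (storage, index) pair; the `NativeIntRef` class is a hypothetical name and says nothing about how the eventual MoarVM design represents such references.

```python
from array import array


class NativeIntRef:
    """Assignable handle onto one slot of a native int array (no boxed container)."""

    def __init__(self, storage, index):
        self._storage = storage
        self._index = index

    def fetch(self):
        return self._storage[self._index]

    def store(self, value):
        # Writes go straight to the native slot the reference points at.
        self._storage[self._index] = value


# Rough analogue of: my $x := @native-int-array[42]; $x = 69;
ints = array("q", [0] * 64)   # "q" = signed 64-bit integers
x = NativeIntRef(ints, 42)
x.store(69)
print(ints[42])
```

When the optimizer can see both the binding and the assignment, the handle is unnecessary and the store can be emitted directly against the array slot, which is why the design aims to make these references easy to eliminate.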
So jetlag. Very bedtime. zzz 21:12
TimToady o/
FROGGS gnight jnthn :o)
timotimo &infix:<+> is put into the metaop as a QAST::Var lexical 22:02
that's a tiny bit problematic
FROGGS ⁺ <--- just a tiny bit 22:03
timotimo :)
i wonder how i should analyze this to figure out it's not going to change under my feet
m: &infix:<+> = sub ($a, $b) { 1 }; 22:04
camelia rakudo-moar 3bbf7b: OUTPUT«Cannot modify an immutable Sub+{<anon>}+{Precedence}␤ in block <unit> at /tmp/1sBJjsoRIM:1␤␤»
FROGGS m: my &infix:<+> = sub ($a, $b) { 1 };
camelia ( no output )
FROGGS would that hurt also?
timotimo no 22:11
it's just being referenced
oh, i'd just copy the name over into the call's name
and that'd be fine