| IRC logs at
Set by AlexDaniel on 12 June 2018.
00:25 MasterDuke joined 00:27 MasterDuke left, MasterDuke joined 06:10 patrickb joined 06:24 domidumont joined 06:47 squashable6 left 06:48 squashable6 joined 07:48 zakharyas joined 08:33 lizmat left 08:57 lizmat joined 09:01 brrt joined
brrt \o 09:06
yoleaux 15 May 2019 21:04Z <samcv> brrt: I have responded
brrt thank you samcv
fwiw, the malloc_trim manual page claims it is a GNU extension, so maybe we just need an '#ifdef _GNU_SOURCE' 09:07
09:12 squashable6 left 09:15 squashable6 joined 09:33 sena_kun joined 09:37 brrt left 10:01 domidumont left
samcv .tell nope. others have it too 10:22
yoleaux samcv: What kind of a name is "nope."?!
samcv there's no reliable way to do it other than to actually check
nwc10 o/ 10:23
.tell brrt o/ -- I was in a meeting
yoleaux nwc10: I'll pass your message to brrt.
11:01 Voldenet left 11:07 Voldenet joined, Voldenet left, Voldenet joined 11:12 zakharyas left 11:16 squashable6 left 11:21 brrt joined, squashable6 joined
brrt \o 11:48
yoleaux 10:23Z <nwc10> brrt: o/ -- I was in a meeting
brrt hehe
nwc10 manual o/ 11:50
brrt samcv: then a probe is in order, yes 11:54
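A minimal sketch of what guarding the call behind a build-time probe could look like. The macro `HAVE_MALLOC_TRIM` and the function `release_free_heap` are hypothetical names for illustration, not taken from the MoarVM sources; the assumption is that the configure step compiles a tiny test program and defines the macro only when `malloc_trim` links.

```c
#include <stdlib.h>

#ifdef HAVE_MALLOC_TRIM   /* hypothetical macro, defined only if the probe succeeded */
#include <malloc.h>
#endif

/* Ask the allocator to hand unused heap memory back to the OS.
 * Returns 1 if memory was released, 0 if nothing was released
 * or the libc has no malloc_trim at all. */
int release_free_heap(void) {
#ifdef HAVE_MALLOC_TRIM
    return malloc_trim(0);
#else
    return 0;
#endif
}
```

Checking for the function itself, rather than for `_GNU_SOURCE`, matches samcv's point that other libcs may provide it too.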
jnthn o/ 12:15
12:15 domidumont joined 12:21 MasterDuke left
jnthn Today working on the Int EA some more. First is clearing up the now-redundant two-phase optimization of that so we do all the stuff in PEA 12:23
12:39 robertle joined 13:04 robertle left 13:06 robertle joined
Geth MoarVM/more-pea: 14a35bbc14 | (Jonathan Worthington)++ | 10 files
Leave all big integer op devirt to the EA

Previously, we had:
1. A set of devirtualized ops for a few common big integer operations.
   We rewrote big integer ops into these in the post-inline pass.
2. EA over these ops. ... (11 more lines)
jnthn That in isolation does the right thing, but produces something rather explosive, which seems to be because of some missing alias handling in the materialization phase... 13:07
13:12 zakharyas joined 13:26 lizmat left 13:50 brrt left 14:02 robertle left
jnthn Darn, this is all a bit tricky. :) 14:16
14:17 robertle joined
jnthn But I think the SSA form failure that was blowing up the DU check in NQP NODELAY build can be detected (and so a mitigation introduced) with the same machinery we need to track uses of aliases of materialized things 14:17
Or in the case of provisional insertion of materializations at merge points, *any* uses 14:18
(So that we don't insert materializations that are never used)
14:26 brrt joined
brrt who knew, adaptive optimization would be tricky :-P 14:37
14:45 patrickb left
Geth MoarVM/more-pea: b5ec72d71d | (Jonathan Worthington)++ | 2 files
Keep track of inserted materialization transforms

So we can far more easily detect the case where we have a usage of a materialized value and insert it into the list of aliases to write into upon a materialization. This will also come in useful for fixing the SSA form problems encountered when we materialize an object on two different sides of a branch. No functional changes are intended as a result of this; it's just a preparatory refactor.
15:12 domidumont left 15:18 robertle left
brrt jnthn: left a comment; I think memmem isn't a function on windows, but we have an implementation stashed somewhere 15:18
yes, `src/platform/memmem.h` has 15:19
jnthn Hmm...I wonder if we can somehow use that to implement the macro? 15:23
brrt probably yes :-) 15:25
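For readers unfamiliar with `memmem`: it searches a byte buffer for a byte sequence, like `strstr` but without relying on NUL terminators. Below is a naive fallback sketch for platforms that lack it. The name `portable_memmem` is hypothetical, and this is not the implementation in `src/platform/memmem.h`.

```c
#include <stddef.h>
#include <string.h>

/* Find the first occurrence of needle (nlen bytes) in haystack (hlen bytes).
 * Returns a pointer to the match, or NULL if there is none.
 * An empty needle matches at the start, as GNU memmem does. */
void *portable_memmem(const void *haystack, size_t hlen,
                      const void *needle, size_t nlen) {
    const unsigned char *h = haystack;
    if (nlen == 0)
        return (void *)haystack;
    if (hlen < nlen)
        return NULL;
    /* Compare the needle at every position where it still fits. */
    for (size_t i = 0; i + nlen <= hlen; i++) {
        if (memcmp(h + i, needle, nlen) == 0)
            return (void *)(h + i);
    }
    return NULL;
}
```

This is O(hlen * nlen) in the worst case, which is usually acceptable for short needles.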
15:40 brrt left
Geth MoarVM/more-pea: e08f8f8a46 | (Jonathan Worthington)++ | src/spesh/pea.c
Handle uses of materialized objects

We need to track usages of materialized objects and make sure that we install them into all later-used registers that alias the object at the point it is materialized.
jnthn OK, that's one source of potential big explosions sorted out :) 15:52
timotimo on new year's eve we can pull out all those explosive commits again \o/ 15:53
jnthn The SSA fun with materializations still needs sorting out, though will return to the bigint bits that I was primarily planning to work on today before I realized we had this rather sizable problem :) 15:55
All of it needs sorting before I can even really start testing this and hunting regressions 15:56
16:08 robertle joined 16:20 lizmat joined 16:21 dogbert17 left 16:38 Kaiepi left
Geth MoarVM/more-pea: 9f340f5949 | (Jonathan Worthington)++ | 8 files
Use integer cache if materializing a big integer

This re-instates an optimization lost during the changes to the way we devirtualize big integer operations.
16:59 lizmat left 17:19 Kaiepi joined 17:28 lizmat joined
Geth MoarVM/more-pea: e61f028ed6 | (Jonathan Worthington)++ | 2 files
Unmaterialize big integers during PEA

So if we inline something that materializes a big integer to hand back as a result, then we can undo that materialization and optimize it further, hopefully resulting in further eliminated allocations.
timotimo ooh, unmaterialize, eh? 17:41
is that how we call that now
17:41 zakharyas left
jnthn m: say 401 / 496 17:45
camelia 0.808468
jnthn I've got a silly Rat benchmark just adding up big integers, which is actually quite a lot of work. But just having some allocations eliminated on the + and * (not anything else yet) seems to cut out nearly 20% of the GC runs. 17:47
timotimo nice
i'm conflicted on this
on the one hand i'm hoping that GC time is already super low for that
on the other hand i'm hoping getting rid of 20% gc runs makes a big difference :)
jnthn Well, it's not just GC cost 17:48
It's also in theory less memory access
Especially once the expression JIT can handle these things, given it knows how to avoid the memory traffic of VM register writes that aren't needed.
timotimo aye
jnthn Also, when we JIT these ops, we inline the math op and only make a function call if we're doing the big integer case. 17:49
timotimo indeed
jnthn Which also should help a good bit
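The fast-path idea jnthn describes can be sketched in C as follows. This is an illustration under assumptions, not MoarVM's JIT output: `small_add` is a hypothetical name, and the "small" range is taken to be 32 bits as mentioned in the discussion.

```c
#include <stdint.h>

/* Try to add two small integers inline.
 * Returns 1 and stores the sum in *out when it fits in 32 bits.
 * Returns 0 when the caller must take the slow path, that is,
 * call into the real big integer routine. */
int small_add(int32_t a, int32_t b, int32_t *out) {
    int64_t wide = (int64_t)a + (int64_t)b;   /* cannot overflow in 64 bits */
    if (wide < INT32_MIN || wide > INT32_MAX)
        return 0;
    *out = (int32_t)wide;
    return 1;
}
```

Only the rare overflowing case pays for a function call; the common case is an add and a range check.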
timotimo i still kind of sort of wish we'll be able to up the "smallbigint" case to 64bit instead of just 32bit
jnthn Probably do-able with some effort 17:50
timotimo when we've done full unmaterialization, we can perhaps keep calculations with 64bit until we have to materialize and turn it into a realbigint then
if we reach lower numbers before materialization, for example with gcd and rat normalization, we'll potentially end up with 32bit again for those cases 17:51
all in all, sounds pretty great
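A sketch of timotimo's point, assuming denominators are positive and non-zero: the intermediate product may need 64 bits, yet after dividing out the gcd the result can fit in 32 bits again. `Rat64`, `gcd_i64` and `rat_mul_fits32` are hypothetical names used only for this illustration.

```c
#include <stdint.h>

typedef struct { int64_t num; int64_t den; } Rat64;   /* hypothetical type */

static int64_t gcd_i64(int64_t x, int64_t y) {
    if (x < 0) x = -x;
    if (y < 0) y = -y;
    while (y != 0) { int64_t t = x % y; x = y; y = t; }
    return x;
}

/* Multiply an/ad by bn/bd using 64-bit intermediates, normalize,
 * and report whether both parts fit into 32 bits afterwards. */
int rat_mul_fits32(int32_t an, int32_t ad, int32_t bn, int32_t bd, Rat64 *out) {
    int64_t n = (int64_t)an * bn;   /* at most 2^62 in magnitude */
    int64_t d = (int64_t)ad * bd;
    int64_t g = gcd_i64(n, d);
    if (g > 1) { n /= g; d /= g; }
    out->num = n;
    out->den = d;
    return n >= INT32_MIN && n <= INT32_MAX
        && d >= INT32_MIN && d <= INT32_MAX;
}
```

For example, (2000000000/3) * (3/1) has the intermediate numerator 6000000000, which does not fit in 32 bits, but normalizes to 2000000000/1, which does.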
jnthn wowser, div_I is quite involved 17:54
timotimo wasn't it big integer division where a new algorithm was recently discovered 17:55
like, faster than what we had so far, but only if the number is bigger than a couple terabytes 17:56
japhb jnthn: Wouldn't want you to feel like this work was too easy. I mean, clearly it's been nearly effortless so far .... ;-P
timotimo or something like that :D
jnthn japhb: Well, mostly trying to figure out what is worth inlining into the assembly, and if the number of checks involved make it worth it at all...
17:57 brrt joined
timotimo hey brrt, what do you think about jnthn's divisive comments just now? 17:57
japhb jnthn: Meaning, that you lose too much in expanded code size inlining all the checks at the call site rather than having them in one place in the function?
timotimo (just a pun)
jnthn japhb: Pretty much 17:58
japhb timotimo: Look, if you're not part of the quotient, you're part of the remainder.
jnthn I notice that gcd can be written really quite compactly though :)
(when both numbers are in the smallint range)
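The compactness jnthn mentions is easy to see with Euclid's algorithm. A minimal sketch for the small-integer case follows; `small_gcd` is a hypothetical name, and this is not the code MoarVM emits.

```c
#include <stdint.h>

/* Greatest common divisor of two 32-bit values via Euclid's algorithm.
 * Works on magnitudes in 64 bits so that negating INT32_MIN is safe. */
int64_t small_gcd(int32_t a, int32_t b) {
    int64_t x = a < 0 ? -(int64_t)a : a;
    int64_t y = b < 0 ? -(int64_t)b : b;
    while (y != 0) {
        int64_t t = x % y;
        x = y;
        y = t;
    }
    return x;
}
```

The loop body is a remainder and two moves, which is why inlining it at the call site looks attractive.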
japhb I love that algorithm.
jnthn So that one's almost certainly worth inlining :) 17:59
timotimo oh jnthn btw don't forget about libraries that can precompute a little constant + choice of one of a few formulas that will be very quick at computing a quotient or a remainder - for example 18:00
jnthn Probably too much of a diversion Right Now, but likely doable in the future :) 18:01
timotimo ah, sure
jnthn bah, I shoulda reduced the iteration count before callgrinding this rat program... 18:14
I figured I'd leave it run while I did a few admin bits and then glance at it before heading home to make dinner. I've done my admin bits and it's still running...
18:18 brrt left
jnthn Oh, huh, it's actually running into big integer size, where I thought this example was too small for that 18:20
m: say int32.Range
camelia -2147483648..2147483647
jnthn m: say (10_000_000 * 1.5).nude 18:21
camelia (15000000 1)
jnthn m: say (10_000_001 * 1.5)
camelia 15000001.5
jnthn m: say (10_000_001 * 1.5).nude
camelia (30000003 2)
jnthn Well, should be home...will play more later 18:22
samcv the PR for malloc_trim() should now work on Musl and other libc's, whether or not they have malloc_trim() 18:32
18:38 lizmat left 18:44 Kaiepi left
timotimo jnthn: i think callgrind will give you results even if you ctrl-c, of course you can't compare the total number with a different full run 18:51
ugh, my head tho 18:54
18:58 zakharyas joined 19:22 brrt joined
brrt timotimo: I think not so much :-) 19:23
20:02 robertle left 20:34 zakharyas left 21:28 Kaiepi joined 21:32 brrt left 22:38 Kaiepi left 22:41 Kaiepi joined 22:42 Kaiepi left
jnthn I had a useful walk this evening. So for a while I've been wondering how we can replace the attrinit'd thing, because it gets in the way so much. 23:01
Anyway, I realized there's only one combination of cases where we need it: when we have an attribute with a default AND the class has a BUILD submethod 23:02
Meaning we can eliminate it completely for any class where those two are not true
And for classes where it is true, we can restrict it to only apply to attributes that have a default
That'll get us a *really* long way, without having to deal with the really thorny problem.
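The condition jnthn arrives at reduces to a per-attribute predicate. A sketch with hypothetical names (`attr_needs_init_tracking` and its parameters are not from Rakudo or MoarVM):

```c
#include <stdbool.h>

/* Initialization tracking is only needed when both hold:
 * the class declares a BUILD submethod, and the attribute has a default. */
bool attr_needs_init_tracking(bool class_has_build, bool attr_has_default) {
    return class_has_build && attr_has_default;
}
```

Any class without BUILD, and any attribute without a default, can skip the tracking entirely.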
timotimo oh, that does sound very good 23:05
jnthn In fact, if I can nail this, I think the Point benchmark will be able to eliminate every allocation except the VMHash for the named args and the Int that escapes into $total.
And replacing a hash which only ever has constant keys used on it is always a SMOP 23:06
That leaves $total, which means loops, which is not simple in any sense.
23:06 MasterDuke joined
timotimo oh yes 23:06
23:06 MasterDuke left, MasterDuke joined
jnthn I've been designing for the loop thing; I roughly get how to do the fixed point algorithm to decide what escapes or doesn't. 23:08
What I don't have a good feeling for yet is how to calculate the set of things that have to be deconstructed at the OSR point.
Surely it's strongly related to the deopt materialization handling though.
Given that deopt is a kind of OSR 23:09
timotimo yeah, just undeopt, how hard could it be 23:10
jnthn Anyway, I think I have a rough design for how to mostly (except in the case I mentioned) get rid of auto-viv stuff. 23:11
timotimo i like hearing that
jnthn Since it'll NULL fields out, we can finally get most attribute access down to...1 CPU instruction. 23:12
timotimo heck yeah
jnthn Except in the case that PEA gets it down to zero :P
timotimo right
jnthn I guess having a load less branches ("is this thing null?") will potentially help us a lot 23:13
MasterDuke timotimo: btw, it was a new optimized large int multiplication, not division 23:15
timotimo ah 23:16
it wasn't very relevant anyway :D :D
23:42 sena_kun left