github.com/moarvm/moarvm | IRC logs at colabti.org/irclogger/irclogger_logs/moarvm
Set by AlexDaniel on 12 June 2018.
jnthn I suspect merge_bbs is broken. Turns out that my moving it back to where it used to be both fixed something and broke something. 00:00
Geth: On vacation? :) 00:13
Ah well, I pushed stuff
'night o/
00:14 Geth left 00:43 Kaiepi left 01:48 Kingsy9 joined 01:50 Kingsy9 left 01:56 avar left 03:07 avar joined, avar left, avar joined, p6bannerbot sets mode: +v avar 04:20 stmuk_ joined 04:21 p6bannerbot sets mode: +v stmuk_ 04:22 stmuk left 05:30 stmuk joined 05:31 p6bannerbot sets mode: +v stmuk, stmuk_ left 05:42 stmuk_ joined 05:43 p6bannerbot sets mode: +v stmuk_ 05:44 stmuk left 05:51 stmuk joined 05:52 p6bannerbot sets mode: +v stmuk 05:54 stmuk_ left, stmuk_ joined 05:55 p6bannerbot sets mode: +v stmuk_ 05:56 stmuk left 06:25 stmuk joined 06:26 p6bannerbot sets mode: +v stmuk 06:27 stmuk_ left 07:06 fake_space_whale left 07:15 Croepha13 joined 07:16 p6bannerbot sets mode: +v Croepha13 07:20 Croepha13 left 09:19 timotimo left, timotimo joined, asimov.freenode.net sets mode: +v timotimo, p6bannerbot sets mode: +v timotimo 09:26 stmuk_ joined 09:27 p6bannerbot sets mode: +v stmuk_ 09:28 stmuk left 11:21 iDanoo9 joined 11:30 iDanoo9 left 12:26 zakharyas joined, p6bannerbot sets mode: +v zakharyas 12:36 MasterDuke left 12:43 zakharyas left
dogbert17 .tell jnthn have found a way to repro the 'SC index out of range' problem consistently. 12:55
yoleaux dogbert17: I'll pass your message to jnthn.
13:01 brrt joined 13:02 p6bannerbot sets mode: +v brrt
brrt \o 13:05
dogbert17 hello brrt
brrt ohai dogbert17 13:07
dogbert17 .tell jnthn gist.github.com/dogbert17/23b5e847...c582760963 13:08
yoleaux dogbert17: I'll pass your message to jnthn.
dogbert17 it's a bit empty here atm
brrt: are you working on the (expr) JIT? 13:11
jnthn .
yoleaux 12:55Z <dogbert17> jnthn: have found a way to repro the 'SC index out of range' problem consistently.
13:08Z <dogbert17> jnthn: gist.github.com/dogbert17/23b5e847...c582760963
13:13 dalek joined, p6lert_ left, synopsebot_ left, synopsebot joined, Geth joined, p6lert joined 13:14 p6bannerbot sets mode: +v dalek, p6bannerbot sets mode: +v synopsebot, p6bannerbot sets mode: +v Geth, p6bannerbot sets mode: +v p6lert 13:18 brrt left, brrt joined 13:19 p6bannerbot sets mode: +v brrt 13:26 Kaiepi joined 13:27 p6bannerbot sets mode: +v Kaiepi 13:33 brrt left 13:55 brrt joined 13:56 p6bannerbot sets mode: +v brrt 14:51 notable6 joined, p6bannerbot sets mode: +v notable6 15:01 stmuk joined 15:02 p6bannerbot sets mode: +v stmuk 15:03 stmuk_ left 15:29 stmuk_ joined, p6bannerbot sets mode: +v stmuk_ 15:30 stmuk left 15:36 stmuk joined, p6bannerbot sets mode: +v stmuk 15:37 stmuk_ left 15:46 fake_space_whale joined, p6bannerbot sets mode: +v fake_space_whale 15:51 brrt left 16:01 JSharp20 joined 16:02 JSharp20 left
timotimo random observation: the substr method of Str won't get inlined, because it uses getlexref_i to pass $from to SUBSTR-START-OOR 16:54
that's the (Str:D: Int:D \start) candidate 16:55
16:56 zakharyas joined 16:57 p6bannerbot sets mode: +v zakharyas 17:08 zakharyas left, zakharyas joined 17:09 p6bannerbot sets mode: +v zakharyas 17:17 fake_space_whale left 17:53 zakharyas left 18:21 brrt joined 18:24 brrt left, brrt joined 18:25 p6bannerbot sets mode: +v brrt
brrt \o 18:25
blog post notification: brrt-to-the-future.blogspot.com/201...hmark.html
the tl;dr - postrelease-opts can lead to a roughly 5x benchmark improvement 18:27
jnthn++ timotimo++
timotimo oooooh 18:28
brrt: you got the origin of the benchmark right 18:30
brrt: i think the parenthesis in the paragraph below "and yet..." is wrong: "perl (the interpreter) does not"? 18:31
"this sbenchmark"
and the parenthesized sentence after that isn't closed
"c-stype for loops" -> "c-style for loops" 18:32
brrt thank you 18:33
timotimo yw!
thanks for blogging :)
brrt :-) 18:34
timotimo yeah, those numbers at the end of the post are nice 18:35
brrt :-D
yes
timotimo would you say that "natives don't work" isn't as true any more in postrelease-opts? 18:36
oh
you didn't re-run the code for that after the branch 18:37
could you do that, too, real quick? reciprocal-while.pl6 with "num" in there?
brrt yeah, I tried that; it's 6.4s rather than 4.6s 18:38
timotimo instead of the 36s :D
hm
m: say 6.4 / 4.6; say 36 / 26.5
camelia 1.391304
1.358491
timotimo well, it got a little better! 18:39
brrt 6.6s now 18:40
:-)
timotimo the shortest time is always the truth :P
brrt yeah. 18:41
curiously, if you make the native $i an int, we slow it down to about 11s
probably because it has to do a box, then a conversion
timotimo probably; do you also turn 5e7 into an int for that? 18:42
manually, i mean
brrt i'd expect 5e7 is a float?
timotimo right
null r25(1) 18:43
isnull r26(1), r25(1)
if_i r26(1), BB(12)
brrt o.O 18:46
timotimo that's some very good code 18:47
brrt seems there might be an optimization opportunity there
timotimo that comes from infix:</> 18:48
aha, takedispatcher
that if_i will only be jumping over a single bindlex instruction with outers being 0 18:50
i wonder how costly that actually is?
brrt well, we can eliminate the whole thing, including the bindlex 18:52
and removing the goto entirely
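The three instructions quoted above always take the branch, because the register tested by isnull was set to null on the previous line. A minimal Python sketch of folding that pattern into an unconditional jump follows. The tuple encoding of instructions and the function name are made up for illustration and are not MoarVM's spesh data structures; the sketch also assumes the isnull result register is not read anywhere else, which a real pass would have to check.

```python
def fold_null_check(instructions):
    """Rewrite `null rX; isnull rY, rX; if_i rY, BB` as `null rX; goto BB`.

    Assumption: rY is not read elsewhere (a real pass would check usages).
    """
    out = []
    i = 0
    while i < len(instructions):
        window = instructions[i:i + 3]
        if (len(window) == 3
                and window[0][0] == "null"
                and window[1][0] == "isnull"
                and window[1][2] == window[0][1]    # isnull reads the nulled register
                and window[2][0] == "if_i"
                and window[2][1] == window[1][1]):  # if_i tests the isnull result
            out.append(window[0])                   # keep the null; rX may be read later
            out.append(("goto", window[2][2]))      # the branch is always taken
            i += 3
        else:
            out.append(instructions[i])
            i += 1
    return out


# Hypothetical encoding of the quoted sequence plus the bindlex it jumps over.
code = [
    ("null", "r25"),
    ("isnull", "r26", "r25"),
    ("if_i", "r26", "BB12"),
    ("bindlex", "lex0", "r25"),
]
folded = fold_null_check(code)
```

Once the jump is unconditional, the bindlex it skips is unreachable, which is what would allow removing it and then the goto as well.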
timotimo you know, we do have information on how often a branch was taken during logging, if there's a logged instruction close to the branch point
brrt we could potentially use that
but.
I'm thinking it would be not so hard to do this explicitly 18:53
I'm imagining a structure that logs basic block pairs
I.e. trace the graph as an edge list
The nice thing about that is that the size of that graph is bounded 18:54
a basic block can be followed by:
- the linear successor
- a conditional successor 18:55
timotimo don't forget we sometimes have very many successors, in the case of handlers
brrt - one of the N handlers that are active
timotimo right
brrt that is still a tight bound
far fewer than N^2 for N basic blocks
- a callee basic block (potentially many) 18:56
- the callers' basic block (potentially many)
timotimo we don't explicitly have the BB structure in the interpreter stage before spesh comes into the picture, though
brrt restricting to within-routine tracing, we have a nice tight bound
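The edge-list idea can be sketched in a few lines of Python. This is only an illustration of logging basic block pairs within one routine; the class and method names are hypothetical, and nothing here reflects how MoarVM's interpreter or spesh log actually records data.

```python
from collections import Counter


class EdgeProfile:
    """Counts transitions between basic blocks within a single routine."""

    def __init__(self):
        self.edges = Counter()
        self.previous = None

    def enter_block(self, block_id):
        # Called on entry to each basic block; records the (previous, current) pair.
        if self.previous is not None:
            self.edges[(self.previous, block_id)] += 1
        self.previous = block_id

    def end_routine(self):
        # Forget the last block so the next invocation starts a fresh path.
        self.previous = None


profile = EdgeProfile()
# A loop body executed three times: BB0 -> (BB1 -> BB2) x 3 -> BB3
for block in [0, 1, 2, 1, 2, 1, 2, 3]:
    profile.enter_block(block)
profile.end_routine()
```

The table holds one counter per distinct edge that was actually taken (four in this run), so its size is bounded by the number of edges in the routine's control flow graph rather than by how long the routine runs.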
timotimo at that point the bytecode comes straight from the bytecode file, and has only been through the verifier
brrt correct, but we can insert basic block entry logging instructions 18:57
I expect the expense of that is relatively small
timotimo how do we do that if we don't know where basic blocks start and end?
brrt when we insert logs?
timotimo we don't insert logs any more :)
brrt oh
really
hmm
well, then we do it during verification
timotimo yeah, all those ops now just naturally log when we're pre-spesh
and those we don't turn into something better still get turned into a "does not log" variant of the same op 18:58
like sp_getlex_ins is an example of that if i'm not mistaken
brrt hmm
I actually don't expect that inserting logging statements is very expensive during bytecode verification, since we just need the CFG
timotimo bytecode verification is currently an operation that doesn't mutate, except for endian-swapping 18:59
it neither mutates nor allocates something new
brrt hmmm
timotimo we might as well have variants of our if and unless ops and sp_ variants and have the first kind log jumps and the other not log jumps 19:00
brrt yeah
timotimo (no need for a non-logging if_o or unless_o, though, since we always turn that into an if_i or unless_i)
we wouldn't catch throwish things that way, though
brrt you can log that during the exception handling? 19:01
alternatively, we can have the logging statements inserted by the mast compiler itself 19:02
timotimo but we also want non-logging versions of things 19:04
taking into account the many adhoc exceptions we have ... :\
do you have an intuition how much reordering BBs so that the most likely sequence is linear could give us? 19:05
brrt That by itself, not so much 19:07
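For reference, the reordering being asked about can be sketched as a greedy layout over edge counts: after each block, place its most frequently taken successor next. The function name and the input format are assumptions for illustration, and per the reply above this alone is not expected to be a large win.

```python
def linear_layout(entry, edge_counts):
    """Greedy layout: after each block, place its hottest unplaced successor."""
    successors = {}
    for (src, dst), count in edge_counts.items():
        successors.setdefault(src, []).append((count, dst))

    placed, seen, worklist = [], set(), [entry]
    while worklist:
        block = worklist.pop()
        while block is not None and block not in seen:
            placed.append(block)
            seen.add(block)
            candidates = sorted(
                (count, dst) for count, dst in successors.get(block, [])
                if dst not in seen
            )
            # Colder successors wait on the worklist; the hottest continues the chain.
            worklist.extend(dst for _, dst in candidates[:-1])
            block = candidates[-1][1] if candidates else None
    return placed


# Hypothetical counts: the "fast" path is taken 99% of the time.
counts = {
    ("entry", "fast"): 990,
    ("entry", "slow"): 10,
    ("fast", "exit"): 990,
    ("slow", "exit"): 10,
}
layout = linear_layout("entry", counts)
```

With these counts the hot path entry, fast, exit ends up contiguous and the rarely taken block is placed last.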
What I expect to be the big wins, are the collection of 'deep' traces 19:09
That combined with per-trace escape analysis, I expect we can eliminate most of the overhead of perl6 19:10
timotimo i can imagine 19:11
brrt the idea being that if you have a trace through a number of routines and can treat that as one thing, you can aggressively throw out basic blocks based on probability
timotimo whenever there's a :noinline instruction, having a trace instead of a regular spesh, it'd just Not Be A Problem™ 19:12
brrt and (assuming that the generality and indirection is useful for dealing with exceptional cases, rather than the common case), that will allow us to eliminate ever more levels of indirection
right 19:13
the trace will provide optimization information even though we can't inline
(one thing we can do already, is install a specialized candidate for uninlineable attempts at inlines, if such a candidate does not yet exist) 19:14
timotimo not sure how you mean that
brrt so, we currently have a fallback mechanism for when we know the invocant, but there isn't a specialized candidate yet 19:15
see src/spesh/inline.c:230 19:16
MVM_spesh_inline_try_get_graph_from_unspecialized
if that happens, and the graph cannot be inlined, we just throw it away 19:17
what we could do instead, is install a specialized candidate instead for the invocant, and call that directly with some sp_invoke call 19:18
timotimo that sounds like we don't do any argument specialization at all 19:26
brrt Not for this case, no 19:27
we create a specialized graph, then if we find that we can't inline, we just drop
timotimo but if we're inlining it, we've actually got all the facts already, too?
brrt at least some, yes
timotimo i'm not sure what "on the invocant" would mean; you mean we already know the type of the first argument because otherwise how would we have found the right code object? 19:28
brrt I mean invokee, not invocant :-) 19:29
So, we know the code that we're going to invoke
timotimo i'm not entirely sure how that's not what we're already doing; if we weren't able to inline the code we got "from_unspecialized", what are we supposed to specialize? 19:30
if i'm being too dense, feel free to just stop explaining it to me %) 19:31
brrt if we can't inline it, that doesn't mean we can't specialize it altogether 19:32
so the order of operations is this (to the best of my understanding) 19:33
- we start specializing a frame
- the frame wants to call a piece of code
- we can resolve that piece of code during specialization
- we try to inline that piece of code
- we find that there is not already a specialized candidate
- we proceed to generate a specialized candidate and try to inline it 19:34
- if we find that we cannot inline the resulting graph, we drop it
- however, that leaves us with no inlining and no specialized candidate 19:35
timotimo i think i get it now
brrt - I claim that we would be better off generating a specialized candidate and invoking that :-)
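The order of operations listed above, together with the proposed change, can be modelled as a small decision function. This is a sketch only: the function, the `can_inline` callback and the candidate table are hypothetical stand-ins, and the real logic lives in MoarVM's C sources.

```python
def handle_call_site(callee, can_inline, specialized_candidates):
    """Return the action a specializer might take for a resolved callee.

    `can_inline` and `specialized_candidates` are hypothetical stand-ins.
    """
    if callee in specialized_candidates:
        graph = specialized_candidates[callee]
    else:
        # No candidate yet: build a specialized graph from the unspecialized code.
        graph = f"specialized({callee})"
    if can_inline(graph):
        return ("inline", graph)
    # Behaviour described in the discussion: the graph is dropped at this point.
    # Proposal: keep it as a candidate and invoke it directly instead.
    specialized_candidates[callee] = graph
    return ("invoke_specialized", graph)


candidates = {}
rejected = handle_call_site("foo", lambda graph: False, candidates)
accepted = handle_call_site("bar", lambda graph: True, {})
```

In the rejected case the work of building the graph is kept as a callable candidate rather than thrown away, which is the tradeoff discussed in the following lines.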
timotimo but if we just blindly specialize a candidate without much information, we'll not be logging from it at all 19:36
brrt true
but that is a tradeoff we already make when we specialize unspecialized graphs
for inlining
timotimo also true
though when we inline, we can benefit from facts we have in the outside frame 19:37
brrt all those facts ought to be available based on the call info we have though
it's the same information - the subset of facts from the caller graph, that is passed through to the callee graph 19:38
timotimo it'd only be available if we ever called the thing 19:39
not if it's super rare
right?
brrt true
in which case, it'd probably be preferable to drop it altogether, imho 19:40
(which is why we need trace specialization :-))
timotimo i'm not saying we shouldn't go for trace specialization :)
brrt news.ycombinator.com/item?id=17842054 maybe upvote? ;-) 19:41
timotimo don't have an account on that site 19:41
brrt o.O
you are wise
timotimo eh, i've got an account on reddit 19:42
brrt you can post it on perl reddit? 19:45
timotimo i'm not actually posting on reddit
just commenting every now and then 19:46
it seems like reddit no longer splits karma up into post and comment karma
otherwise i could have shown you
i have 14 posts over a period of 4 years 19:47
nine I keep wondering if there is a way to keep an encoded version of some string around attached to a string object. That could speed up any use case where we just get a string from outside and pass it back to the outside later on in the same encoding. 19:48
timotimo either we make MVMString a bunch bigger, or we have a per-thread cache that holds weak references or something ... 19:51
brrt we have some room for making MVMString a bit bigger
nine Or we have an additional repr (maybe a reappropriated CStr) for that. But we'd need to handle this specially in all the necessary places. 19:52
brrt: we do?
timotimo it'd have both a reference to the buffer and an identifier for the encoding used, no?
or maybe an external object that holds encoding name and resulting buffer
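The idea of keeping the encoded form next to the decoded string, tagged with its encoding, can be sketched in Python. The class below is purely illustrative and its names are invented; it says nothing about how MVMString is laid out or how MoarVM would handle invalidation.

```python
class CachedString:
    """A string that remembers one encoded form alongside its decoded value."""

    def __init__(self, value, encoded=None, encoding=None):
        self.value = value
        self._encoded = encoded      # bytes as received from outside, if any
        self._encoding = encoding    # name of the encoding those bytes use

    @classmethod
    def decode(cls, raw, encoding):
        # Decoding keeps the original buffer so it can be handed back unchanged.
        return cls(raw.decode(encoding), encoded=raw, encoding=encoding)

    def encode(self, encoding):
        if self._encoded is not None and self._encoding == encoding:
            return self._encoded     # cache hit: no re-encoding work
        self._encoded = self.value.encode(encoding)
        self._encoding = encoding
        return self._encoded


raw = "h\u00e9llo".encode("utf-8")
s = CachedString.decode(raw, "utf-8")
same_buffer = s.encode("utf-8") is raw   # served from the cache
latin = s.encode("latin-1")              # different encoding, so re-encoded
```

Only the round trip in the same encoding benefits; asking for another encoding pays the normal cost and replaces the cached buffer.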
brrt yeah, based on the theory that you kind of want your object either much smaller than a cache line, or as big as one and aligned on them
so.... if I take a look at the current state....
timotimo personally i use pahole for that 19:53
don't forget strings can be inlined into P6opaque, though
brrt MVMString is 48 bytes big, of which 24 is the header 19:54
we can add 16 bytes, that's two pointers 19:55
if you want to inline the MVMStringBody into P6opaque (didn't know we could do that), then I think we have 8 bytes left still?
because P6opaque has this redirect pointer (that, if at all possible, I'd love to get rid of, but haven't investigated how we might do that) 19:56
and we have plenty of flags space left to indicate that we have a representation
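The size budget quoted above works out as follows. The 48-byte and 24-byte figures are the ones from the discussion; the 64-byte cache line and 8-byte pointer are assumptions typical of a 64-bit x86 build.

```python
CACHE_LINE = 64          # bytes; assumed, typical on x86-64
POINTER = 8              # bytes; assumed 64-bit build

mvmstring_size = 48      # figure quoted in the discussion
header_size = 24         # figure quoted in the discussion
body_size = mvmstring_size - header_size

spare = CACHE_LINE - mvmstring_size      # room left before hitting a full cache line
spare_pointers = spare // POINTER
```

That leaves 16 bytes, or two pointers, before the object grows past one cache line.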
timotimo P6opaque has this flattened_stables thing 19:57
brrt sooooooo... I say go for it
timotimo int, num, str, those at the very least can go in there
brrt there is also still the plan to do in situ strings
timotimo aye
brrt that would probably give you an ascii literal for 90% of strings, right there 19:58
nine But maybe we wouldn't even have to make MVMStringBody larger. It already deals with multiple representations of strings. We could just add one more which looks like struct { MVMString *vmstr; char *encoded_str; MVMuint8 encoding; } 20:12
brrt Hmmm
I'm not convinced I like having one more representation, tbh
Frankly, take that byte. It's cheap
(not byte, 8 bytes) 20:13
nine Can't measure any performance difference in test-t.pl 20:18
(with an additional pointer in MVMStringBody)
brrt hmmm 20:22
maybe the cost of reallocating an encoded string isn't so large
nine To be clear: I just added the field to the struct to see if memory pressure or cache usage may affect performance. But test-t may just not be a good benchmark for that. 20:26
brrt oh, I see 20:27
:-)
20:37 stmuk left, stmuk joined 20:38 p6bannerbot sets mode: +v stmuk 20:40 stmuk_ joined 20:41 p6bannerbot sets mode: +v stmuk_ 20:43 stmuk left, fake_space_whale joined 20:44 p6bannerbot sets mode: +v fake_space_whale 21:01 stmuk joined 21:02 p6bannerbot sets mode: +v stmuk 21:03 stmuk_ left 21:13 stmuk_ joined 21:14 stmuk left, p6bannerbot sets mode: +v stmuk_ 21:18 stmuk_ left
jnthn brrt++ # really interesting post 21:19
Also, postrelease-opts is more dramatic than I'd expected for that benchmark o.O
jnthn timotimo: About the takedispatcher thing - yesterday I put in the smarts to optimize it from takedispatcher to null. However, we do the transform inside of the inline. 21:20
jnthn We currently cannot optimize much inside of inlines, however, because we don't track their deopt points, so we don't know that we're not utterly busting deopt 21:20
That's fixable, but it's - like all deopt-related things - hard, and when I implement it, I know I'm in for 1-2 weeks of debugging upshot. 21:21
And deopt bugs are up there with GC bugs in being annoying to find. Especially as the things that reliably expose them tend to be very large. 21:22
brrt aye 21:31
brrt afk 21:32
21:32 brrt left 21:36 stmuk__ left 21:38 stmuk__ joined 21:39 p6bannerbot sets mode: +v stmuk__
jnthn I see I need to spend some time looking at our Perl 6 natives performance too :) 21:43
22:31 stmuk joined 22:32 p6bannerbot sets mode: +v stmuk 22:33 stmuk__ left 22:36 stmuk_ joined 22:37 p6bannerbot sets mode: +v stmuk_ 22:38 stmuk left 22:39 stmuk joined, p6bannerbot sets mode: +v stmuk 22:41 stmuk_ left 23:55 bladernr5 joined 23:59 bladernr5 left