github.com/moarvm/moarvm | IRC logs at colabti.org/irclogger/irclogger_logs/moarvm Set by AlexDaniel on 12 June 2018. |
jnthn | I suspect merge_bbs is broken. Turns out that my moving it back to where it used to be both fixed something and broke something. | 00:00 | |
Geth: On vacation? :) | 00:13 | ||
Ah well, I pushed stuff | |||
'night o/ | |||
00:14
Geth left
00:43
Kaiepi left
01:48
Kingsy9 joined
01:50
Kingsy9 left
01:56
avar left
03:07
avar joined,
avar left,
avar joined,
p6bannerbot sets mode: +v avar
04:20
stmuk_ joined
04:21
p6bannerbot sets mode: +v stmuk_
04:22
stmuk left
05:30
stmuk joined
05:31
p6bannerbot sets mode: +v stmuk,
stmuk_ left
05:42
stmuk_ joined
05:43
p6bannerbot sets mode: +v stmuk_
05:44
stmuk left
05:51
stmuk joined
05:52
p6bannerbot sets mode: +v stmuk
05:54
stmuk_ left,
stmuk_ joined
05:55
p6bannerbot sets mode: +v stmuk_
05:56
stmuk left
06:25
stmuk joined
06:26
p6bannerbot sets mode: +v stmuk
06:27
stmuk_ left
07:06
fake_space_whale left
07:15
Croepha13 joined
07:16
p6bannerbot sets mode: +v Croepha13
07:20
Croepha13 left
09:19
timotimo left,
timotimo joined,
asimov.freenode.net sets mode: +v timotimo,
p6bannerbot sets mode: +v timotimo
09:26
stmuk_ joined
09:27
p6bannerbot sets mode: +v stmuk_
09:28
stmuk left
11:21
iDanoo9 joined
11:30
iDanoo9 left
12:26
zakharyas joined,
p6bannerbot sets mode: +v zakharyas
12:36
MasterDuke left
12:43
zakharyas left
dogbert17 | .tell jnthn have found a way to repro the 'SC index out of range' problem consistently. | 12:55 | |
yoleaux | dogbert17: I'll pass your message to jnthn. | ||
13:01
brrt joined
13:02
p6bannerbot sets mode: +v brrt
brrt | \o | 13:05 | |
dogbert17 | hello brrt | ||
brrt | ohai dogbert17 | 13:07 | |
dogbert17 | .tell jnthn gist.github.com/dogbert17/23b5e847...c582760963 | 13:08 | |
yoleaux | dogbert17: I'll pass your message to jnthn. | ||
dogbert17 | it's a bit empty here atm | ||
brrt: are you working on the (expr) JIT? | 13:11 | ||
jnthn | . | ||
yoleaux | 12:55Z <dogbert17> jnthn: have found a way to repro the 'SC index out of range' problem consistently. | ||
13:08Z <dogbert17> jnthn: gist.github.com/dogbert17/23b5e847...c582760963 | |||
13:13
dalek joined,
p6lert_ left,
synopsebot_ left,
synopsebot joined,
Geth joined,
p6lert joined
13:14
p6bannerbot sets mode: +v dalek,
p6bannerbot sets mode: +v synopsebot,
p6bannerbot sets mode: +v Geth,
p6bannerbot sets mode: +v p6lert
13:18
brrt left,
brrt joined
13:19
p6bannerbot sets mode: +v brrt
13:26
Kaiepi joined
13:27
p6bannerbot sets mode: +v Kaiepi
13:33
brrt left
13:55
brrt joined
13:56
p6bannerbot sets mode: +v brrt
14:51
notable6 joined,
p6bannerbot sets mode: +v notable6
15:01
stmuk joined
15:02
p6bannerbot sets mode: +v stmuk
15:03
stmuk_ left
15:29
stmuk_ joined,
p6bannerbot sets mode: +v stmuk_
15:30
stmuk left
15:36
stmuk joined,
p6bannerbot sets mode: +v stmuk
15:37
stmuk_ left
15:46
fake_space_whale joined,
p6bannerbot sets mode: +v fake_space_whale
15:51
brrt left
16:01
JSharp20 joined
16:02
JSharp20 left
timotimo | random observation: the substr method of Str won't get inlined, because it uses getlexref_i to pass $from to SUBSTR-START-OOR | 16:54 | |
that's the (Str:D: Int:D \start) candidate | 16:55 | ||
16:56
zakharyas joined
16:57
p6bannerbot sets mode: +v zakharyas
17:08
zakharyas left,
zakharyas joined
17:09
p6bannerbot sets mode: +v zakharyas
17:17
fake_space_whale left
17:53
zakharyas left
18:21
brrt joined
18:24
brrt left,
brrt joined
18:25
p6bannerbot sets mode: +v brrt
brrt | \o | 18:25 | |
blog post notification: brrt-to-the-future.blogspot.com/201...hmark.html | |||
the tl;dr - postrelease-opts can lead to an order-of-5 benchmark improvement | 18:27 | ||
jnthn++ timotimo++ | |||
timotimo | oooooh | 18:28 | |
brrt: you got the origin of the benchmark right | 18:30 | ||
brrt: i think the parenthesis in the paragraph below "and yet..." is wrong: "perl (the interpreter) does not"? | 18:31 | ||
"this sbenchmark" | |||
and the parenthesized sentence after that isn't closed | |||
"c-stype for loops" -> "c-style for loops" | 18:32 | ||
brrt | thank you | 18:33 | |
timotimo | yw! | ||
thanks for blogging :) | |||
brrt | :-) | 18:34 | |
timotimo | yeah, those numbers at the end of the post are nice | 18:35 | |
brrt | :-D | ||
yes | |||
timotimo | would you say that "natives don't work" isn't as true any more in postrelease-opts? | 18:36 | |
oh | |||
you didn't re-run the code for that after the branch | 18:37 | ||
could you do that, too, real quick? reciprocal-while.pl6 with "num" in there? | |||
brrt | yeah, I tried that; it's 6.4s rather than 4.6s | 18:38 | |
timotimo | instead of the 36s :D | ||
hm | |||
m: say 6.4 / 4.6; say 36 / 26.5 | |||
camelia | 1.391304 1.358491 |
timotimo | well, it got a little better! | 18:39 | |
brrt | 6.6s now | 18:40 | |
:-) | |||
timotimo | the shortest time is always the truth :P | ||
brrt | yeah. | 18:41 | |
curiously, if you make the native $i an int, we slow it down to about 11s | |||
probably because it has to do a box, then a conversion | |||
timotimo | probably; do you also turn 5e7 into an int for that? | 18:42 | |
manually, i mean | |||
brrt | i'd expect 5e7 is a float? | ||
timotimo | right | ||
null r25(1) | 18:43 | ||
isnull r26(1), r25(1) | |||
if_i r26(1), BB(12) | |||
brrt | o.O | 18:46 | |
timotimo | that's some very good code | 18:47 | |
brrt | seems there might be an optimization opportunity there | ||
timotimo | that comes from infix:</> | 18:48 | |
aha, takedispatcher | |||
that if_i will only be jumping over a single bindlex instruction with outers being 0 | 18:50 | ||
i wonder how costly that actually is? | |||
brrt | well, we can eliminate the whole thing, including the bindlex | 18:52 | |
and removing the goto entirely | |||
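A minimal sketch of the peephole being discussed, assuming a toy instruction encoding (the `Ins` struct and `OP_*` names are hypothetical, not MoarVM's real spesh structures). Because `r25` was just nulled, `isnull` always yields 1, so `if_i` always jumps and can become an unconditional goto; the `bindlex` it used to skip over then becomes unreachable, and a later dead-code pass could drop it.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified spesh-like instruction. Real spesh
 * uses its own instruction and operand structures; these are stand-ins. */
typedef enum { OP_NULL, OP_ISNULL, OP_IF_I, OP_GOTO, OP_NOP } Op;

typedef struct {
    Op  op;
    int dst;     /* destination register (OP_NULL, OP_ISNULL) */
    int src;     /* source register (OP_ISNULL, OP_IF_I) */
    int target;  /* branch target basic block (OP_IF_I, OP_GOTO) */
} Ins;

/* Fold the window
 *     null   rA
 *     isnull rB, rA
 *     if_i   rB, BB(t)
 * into an unconditional goto BB(t): rA is known to be null, so isnull
 * always produces 1 and the branch is always taken. Returns the number
 * of windows folded. Assumes rB has no other readers; a real pass would
 * consult usage information before deleting the isnull. The null itself
 * is kept, since rA may still be read elsewhere. */
static int fold_null_isnull_if(Ins *code, size_t n) {
    int folded = 0;
    for (size_t i = 0; i + 2 < n; i++) {
        Ins *a = &code[i], *b = &code[i + 1], *c = &code[i + 2];
        if (a->op == OP_NULL && b->op == OP_ISNULL && b->src == a->dst
                && c->op == OP_IF_I && c->src == b->dst) {
            b->op = OP_NOP;   /* the flag register is no longer needed */
            c->op = OP_GOTO;  /* the branch is always taken */
            folded++;
        }
    }
    return folded;
}
```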
timotimo | you know, we do have information on how often a branch was taken during logging, if there's a logged instruction close to the branch point | ||
brrt | we could potentially use that | ||
but. | |||
I'm thinking it would be not so hard to do this explicitly | 18:53 | ||
I'm imagining a structure that logs basic block pairs | |||
I.e. trace the graph as an edge list | |||
The nice thing about that is that the size of that graph is bounded | 18:54 | ||
a basic block can be followed by: | |||
- the linear successor | |||
- a conditional successor | 18:55 | ||
timotimo | don't forget we sometimes have very many successors, in the case of handlers | ||
brrt | - one of the N handlers that are active | ||
timotimo | right | ||
brrt | that is still a tight bound | ||
far fewer than N^2 for N basic blocks | |||
- a callee basic block (potentially many) | 18:56 | ||
- the callers' basic block (potentially many) | |||
timotimo | we don't explicitly have the BB structure in the interpreter stage before spesh comes into the picture, though | ||
brrt | restricting to within-routine tracing, we have a nice tight bound | ||
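The edge-list idea could look roughly like the following sketch. `EdgeLog`, `edge_log_record` and the fixed `MAX_EDGES` capacity are hypothetical; the point is only that, since each block has a handful of possible successors within a routine, a small table of (from, to, count) entries is enough to profile it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-routine edge log: each entry counts how often control
 * flowed from basic block `from` to basic block `to`. A block can only be
 * followed by its linear successor, a conditional successor or an active
 * handler, so the number of distinct edges stays far below N*N and a
 * small fixed-capacity table is assumed to suffice. */
#define MAX_EDGES 64

typedef struct {
    uint16_t from, to;
    uint32_t count;
} Edge;

typedef struct {
    Edge   edges[MAX_EDGES];
    size_t used;
} EdgeLog;

/* Record one traversal of from -> to. Returns 0 on success, or -1 if the
 * table is full (a real implementation might then stop logging). */
static int edge_log_record(EdgeLog *el, uint16_t from, uint16_t to) {
    for (size_t i = 0; i < el->used; i++) {
        if (el->edges[i].from == from && el->edges[i].to == to) {
            el->edges[i].count++;
            return 0;
        }
    }
    if (el->used == MAX_EDGES)
        return -1;
    el->edges[el->used++] = (Edge){ from, to, 1 };
    return 0;
}

/* How often from -> to was taken; 0 if the edge was never seen. */
static uint32_t edge_log_count(const EdgeLog *el, uint16_t from, uint16_t to) {
    for (size_t i = 0; i < el->used; i++)
        if (el->edges[i].from == from && el->edges[i].to == to)
            return el->edges[i].count;
    return 0;
}
```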
timotimo | at that point the bytecode comes straight from the bytecode file, and has only been through the verifier | ||
brrt | correct, but we can insert basic block entry logging instructions | 18:57 | |
I expect the expense of that is relatively small | |||
timotimo | how do we do that if we don't know where basic blocks start and end? | ||
brrt | when we insert logs? | ||
timotimo | we don't insert logs any more :) | ||
brrt | oh | ||
really | |||
hmm | |||
well, then we do it during verification | |||
timotimo | yeah, all those ops now just naturally log when we're pre-spesh | ||
and those we don't turn into something better still get turned into a "does not log" variant of the same op | 18:58 | ||
like sp_getlex_ins is an example of that if i'm not mistaken | |||
brrt | hmm | ||
I actually don't expect that inserting logging statements is very expensive during bytecode verification, since we just need the CFG | |||
timotimo | bytecode verification is currently an operation that doesn't mutate, except for endian-swapping | 18:59 | |
it neither mutates nor allocates something new | |||
brrt | hmmm | ||
timotimo | we might as well have variants of our if and unless ops and sp_ variants and have the first kind log jumps and the other not log jumps | 19:00 | |
brrt | yeah | ||
timotimo | (no need for a non-logging if_o or unless_o, though, since we always turn that into an if_i or unless_i) | ||
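A sketch of the logging/non-logging split for branches, assuming a hypothetical `BranchProfile` counter attached to each branch site; neither function corresponds to an existing MoarVM op. The logging form would run in the interpreter before specialization, and the plain form in specialized code once the profile is no longer needed.

```c
#include <stdint.h>

/* Hypothetical profile for one conditional branch site. */
typedef struct {
    uint32_t taken;
    uint32_t not_taken;
} BranchProfile;

/* "Logging" if_i: record the outcome, then return the index of the next
 * instruction to execute (target if cond is nonzero, else next). */
static int if_i_logging(int64_t cond, int target, int next, BranchProfile *p) {
    if (cond) {
        p->taken++;
        return target;
    }
    p->not_taken++;
    return next;
}

/* Non-logging if_i, in the spirit of an sp_ variant: same control flow,
 * no profiling cost. */
static int if_i_plain(int64_t cond, int target, int next) {
    return cond ? target : next;
}
```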
we wouldn't catch throwish things that way, though | |||
brrt | you can log that during the exception handling? | 19:01 | |
alternatively, we can have the logging statements inserted by the mast compiler itself | 19:02 | ||
timotimo | but we also want non-logging versions of things | 19:04 | |
taking into account the many adhoc exceptions we have ... :\ | |||
do you have an intuition how much reordering BBs so that the most likely sequence is linear could give us? | 19:05 | ||
brrt | That by itself, not so much | 19:07 | |
What I expect to be the big wins, are the collection of 'deep' traces | 19:09 | ||
That combined with per-trace escape analysis, I expect we can eliminate most of the overhead of perl6 | 19:10 | ||
timotimo | i can imagine | 19:11 | |
brrt | the idea being that if you have a trace through a number of routines and can treat that as one thing, you can aggressively throw out basic blocks based on probability | ||
timotimo | whenever there's a :noinline instruction, having a trace instead of a regular spesh, it'd just Not Be A Problem™ | 19:12 | |
brrt | and (assuming that the generality and indirection is useful for dealing with exceptional cases, rather than the common case), that will allow us to eliminate ever more levels of indirection | ||
right | 19:13 | ||
the trace will provide optimization information even though we can't inline | |||
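Given per-edge counts from some profiling mechanism, one simple way to form a trace and "throw out basic blocks based on probability" is to start at an entry block and greedily follow the most frequent successor, stopping once the best edge's share of the block's outgoing traffic drops below a threshold. This is a generic greedy trace-selection sketch, not a design anyone here committed to; `form_trace` and `NBB` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define NBB 8  /* hypothetical number of basic blocks in the routine */

/* counts[a][b]: how often control flowed from block a to block b.
 * Starting at `entry`, repeatedly follow the most frequent successor,
 * but stop if that edge carries less than `min_share` of the block's
 * outgoing traffic, or if it leads back to a block already on the trace.
 * Writes the trace into `out` and returns its length (at most NBB). */
static size_t form_trace(uint32_t counts[NBB][NBB], int entry,
                         double min_share, int out[NBB]) {
    int on_trace[NBB] = {0};
    size_t len = 0;
    int cur = entry;
    while (!on_trace[cur]) {
        on_trace[cur] = 1;
        out[len++] = cur;
        uint32_t total = 0, best_count = 0;
        int best = -1;
        for (int s = 0; s < NBB; s++) {
            total += counts[cur][s];
            if (counts[cur][s] > best_count) {
                best_count = counts[cur][s];
                best = s;
            }
        }
        /* No observed successor, or the hottest one is not likely enough. */
        if (best < 0 || (double)best_count / total < min_share)
            break;
        cur = best;
    }
    return len;
}
```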
(one thing we can do already, is install a specialized candidate for uninlineable attempts at inlines, if such a candidate does not yet exist) | 19:14 | ||
timotimo | not sure how you mean that | ||
brrt | so, we currently have a fallback mechanism for when we know the invocant, but there isn't a specialized candidate yet | 19:15 | |
see src/spesh/inline.c:230 | 19:16 | ||
MVM_spesh_inline_try_get_graph_from_unspecialized | |||
if that happens, and the graph cannot be inlined, we just throw it away | 19:17 | ||
what we could do instead, is install a specialized candidate instead for the invocant, and call that directly with some sp_invoke call | 19:18 | ||
timotimo | that sounds like we don't do any argument specialization at all | 19:26 | |
brrt | Not for this case, no | 19:27 | |
we create a specialized graph, then if we find that we can't inline, we just drop it | ||
timotimo | but if we're inlining it, we've actually got all the facts already, too? | ||
brrt | at least some, yes | ||
timotimo | i'm not sure what "on the invocant" would mean; you mean we already know the type of the first argument because otherwise how would we have found the right code object? | 19:28 | |
brrt | I mean invokee, not invocant :-) | 19:29 | |
So, we know the code that we're going to invoke | |||
timotimo | i'm not entirely sure how that's not what we're already doing; if we weren't able to inline the code we got "from_unspecialized", what are we supposed to specialize? | 19:30 | |
if i'm being too dense, feel free to just stop explaining it to me %) | 19:31 | ||
brrt | if we can't inline it, that doesn't mean we can't specialize it altogether | 19:32 | |
so the order of operations is this (to the best of my understanding) | 19:33 | ||
- we start specializing a frame | |||
- the frame wants to call a piece of code | |||
- we can resolve that piece of code during specialization | |||
- we try to inline that piece of code | |||
- we find that there is not already a specialized candidate | |||
- we proceed to generate a specialized candidate and try to inline it | 19:34 | ||
- if we find that we cannot inline the resulting graph, we drop it | |||
- however, that leaves us with no inlining and no specialized candidate | 19:35 | ||
timotimo | i think i get it now | ||
brrt | - I claim that we would be better off generating a specialized candidate and invoking that :-) | ||
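The order of operations listed above, together with the proposed change, can be sketched as a small decision function. All types and names here (`Graph`, `StaticFrame`, `resolve_known_callee`, `MAX_CANDIDATES`) are hypothetical stand-ins; the real logic lives around `MVM_spesh_inline_try_get_graph_from_unspecialized` in src/spesh/inline.c and is considerably more involved.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_CANDIDATES 4

/* Hypothetical stand-ins; none of these names are MoarVM API. */
typedef struct { bool inlinable; int id; } Graph;
typedef struct { Graph *candidates[MAX_CANDIDATES]; size_t n; } StaticFrame;

typedef enum { CALL_INLINED, CALL_SPECIALIZED, CALL_GENERIC } CallKind;

/* The callee is known but had no specialized candidate, so graph `g` was
 * specialized from the unspecialized code. If it can be inlined, do so.
 * Otherwise the current behaviour drops it (CALL_GENERIC); with
 * install_on_failure set, it is instead kept as a candidate for the
 * callee so the call site can invoke it directly (CALL_SPECIALIZED). */
static CallKind resolve_known_callee(StaticFrame *callee, Graph *g,
                                     bool install_on_failure) {
    if (g->inlinable)
        return CALL_INLINED;
    if (install_on_failure && callee->n < MAX_CANDIDATES) {
        callee->candidates[callee->n++] = g;  /* keep the work already done */
        return CALL_SPECIALIZED;              /* e.g. via an sp_ invoke op */
    }
    return CALL_GENERIC;                      /* the graph is thrown away */
}
```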
timotimo | but if we just blindly specialize a candidate without much information, we'll not be logging from it at all | 19:36 | |
brrt | true | ||
but that is a tradeoff we already make when we specialize unspecialized graphs | |||
for inlining | |||
timotimo | also true | ||
though when we inline, we can benefit from facts we have in the outside frame | 19:37 | ||
brrt | all those facts ought to be available based on the call info we have though | ||
it's the same information - the subset of facts from the caller graph, that is passed through to the callee graph | 19:38 | ||
timotimo | it'd only be available if we ever called the thing | 19:39 | |
not if it's super rare | |||
right? | |||
brrt | true | ||
in which case, it'd probably be preferable to drop it altogether, imho | 19:40 | ||
(which is why we need trace specialization :-)) | |||
timotimo | i'm not saying we shouldn't go for trace specialization :) | ||
brrt | news.ycombinator.com/item?id=17842054 maybe upvote? ;-) | 19:41 | |
19:41
avar left,
avar joined,
avar left,
avar joined,
p6bannerbot sets mode: +v avar
timotimo | don't have an account on that site | 19:41 | |
brrt | o.O | ||
you are wise | |||
19:42
p6bannerbot sets mode: +v avar
timotimo | eh, i've got an account on reddit | 19:42 | |
brrt | you can post it on perl reddit? | 19:45 | |
timotimo | i'm not actually posting on reddit | ||
just commenting every now and then | 19:46 | ||
it seems like reddit no longer splits karma up into post and comment karma | |||
otherwise i could have shown you | |||
i have 14 posts over a period of 4 years | 19:47 | ||
nine | I keep wondering if there is a way to keep an encoded version of some string around attached to a string object. That could speed up any use case where we just get a string from outside and pass it back to the outside later on in the same encoding. | 19:48 | |
timotimo | either we make VMString a bunch bigger, or we have a per-thread cache that holds weak references or something ... | 19:51 | |
brrt | we have some room for making VMString a bit bigger | ||
nine | Or we have an additional repr (maybe a reappropriated CStr) for that. But we'd need to handle this specially in all the necessary places. | 19:52 | |
brrt: we do? | |||
timotimo | it'd have both a reference to the buffer and an identifier for the encoding used, no? | ||
or maybe an external object that holds encoding name and resulting buffer | |||
brrt | yeah, based on the theory that you kind of want your object either much smaller than a cache line, or as big as one and aligned on them | ||
so.... if I take a look at the current state.... | |||
timotimo | personally i use pahole for that | 19:53 | |
don't forget strings can be inlined into P6opaque, though | |||
brrt | MVMString is 48 bytes big, of which 24 is the header | 19:54 | |
we can add 16 bytes, that's two pointers | 19:55 | ||
if you want to inline the MVMStringBody into P6opaque (didn't know we could do that), then I think we have 8 bytes left still? | |||
because P6opaque has this redirect pointer (that, if at all possible, I'd love to get rid of, but haven't investigated how we might do that) | 19:56 | ||
and we have plenty of flags space left to indicate that we have a representation | |||
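The size arithmetic can be checked with mock structs. The layouts below are placeholders that only reproduce the quoted sizes on a 64-bit build (a 24-byte header and a 24-byte body); they are not the real `MVMString` definition. Adding two pointers takes the object from 48 to 64 bytes, one cache line on typical x86-64 hardware.

```c
#include <stdint.h>

/* Mock layouts only: field names and types are placeholders chosen to
 * reproduce the sizes quoted above on an LP64 platform. */
typedef struct { unsigned char bytes[24]; } MockHeader;   /* 24 bytes */

typedef struct {
    void    *storage;   /* placeholder body fields, 24 bytes in total */
    uint64_t meta1;
    uint64_t meta2;
} MockStringBody;

typedef struct { MockHeader hdr; MockStringBody body; } MockString;  /* 48 */

/* The same body plus two pointers (16 bytes), e.g. for a cached encoding. */
typedef struct {
    MockStringBody base;
    void *encoded_buf;    /* hypothetical: cached encoded bytes */
    void *encoding_info;  /* hypothetical: encoding id and length */
} MockStringBodyPlus;

typedef struct { MockHeader hdr; MockStringBodyPlus body; } MockStringPlus;  /* 64 */

#define CACHE_LINE 64  /* typical x86-64 cache line size; an assumption */
```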
timotimo | P6opaque has this flattened_stables thing | 19:57 | |
brrt | sooooooo... I say go for it | ||
timotimo | int, num, str, those at the very least can go in there | ||
brrt | there is also still the plan to do in situ strings | ||
timotimo | aye | ||
brrt | that would probably give you an ascii literal for 90% of strings, right there | 19:58 | |
nine | But maybe we wouldn't even have to make MVMStringBody larger. It already deals with multiple representations of strings. We could just add one more which looks like struct { MVMString *vmstr; char *encoded_str; MVMuint8 encoding; } | 20:12 | |
brrt | Hmmm | ||
I'm not convinced I like having one more representation, tbh | |||
Frankly, take that byte. It's cheap | |||
(not byte, 8 bytes) | 20:13 | ||
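A rough sketch of the encoded-string cache nine describes, assuming a plain C string stands in for `MVMString` and a toy encoder stands in for the VM's real encoders (`CachedStr`, `get_encoded`, `toy_encode` and `encode_calls` are hypothetical). The cached bytes are reused while the requested encoding matches and replaced otherwise; assuming the underlying string is immutable, the cache never goes stale because of content changes.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string that remembers the last encoding it was exported in,
 * loosely following { MVMString *vmstr; char *encoded_str; MVMuint8 encoding; }. */
typedef struct {
    const char *text;         /* stand-in for the VM-level string */
    char       *encoded;      /* cached encoded bytes, or NULL */
    size_t      encoded_len;
    uint8_t     encoding;     /* which encoding `encoded` holds */
} CachedStr;

static int encode_calls = 0;  /* counts real encoder invocations (demo only) */

/* Toy encoder: copies the bytes unchanged; a real VM would dispatch to its
 * UTF-8, latin-1, etc. encoders here. */
static char *toy_encode(const char *text, size_t *len) {
    encode_calls++;
    *len = strlen(text);
    char *buf = malloc(*len + 1);
    if (buf)
        memcpy(buf, text, *len + 1);
    return buf;
}

/* Return the string encoded in `encoding`, reusing the cached buffer when
 * the same encoding was requested last time. */
static const char *get_encoded(CachedStr *s, uint8_t encoding, size_t *len) {
    if (s->encoded && s->encoding == encoding) {  /* cache hit */
        *len = s->encoded_len;
        return s->encoded;
    }
    free(s->encoded);                             /* different encoding: replace */
    s->encoded  = toy_encode(s->text, &s->encoded_len);
    s->encoding = encoding;
    *len = s->encoded_len;
    return s->encoded;
}
```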
nine | Can't measure any performance difference in test-t.pl | 20:18 | |
(with an additional pointer in MVMStringBody) | |||
brrt | hmmm | 20:22 | |
maybe the cost of reallocating an encoded string isn't so large | |||
nine | To be clear: I just added the field to the struct to see if memory pressure or cache usage may affect performance. But test-t may just not be a good benchmark for that. | 20:26 | |
brrt | oh, I see | 20:27 | |
:-) | |||
20:37
stmuk left,
stmuk joined
20:38
p6bannerbot sets mode: +v stmuk
20:40
stmuk_ joined
20:41
p6bannerbot sets mode: +v stmuk_
20:43
stmuk left,
fake_space_whale joined
20:44
p6bannerbot sets mode: +v fake_space_whale
21:01
stmuk joined
21:02
p6bannerbot sets mode: +v stmuk
21:03
stmuk_ left
21:13
stmuk_ joined
21:14
stmuk left,
p6bannerbot sets mode: +v stmuk_
21:18
stmuk_ left
jnthn | brrt++ # really interesting post | 21:19 | |
Also, postrelease-opts is more dramatic than I'd expected for that benchmark o.O | ||
21:19
stmuk joined
jnthn | timotimo: About the takedispatcher thing - yesterday I put in the smarts to optimize it from takedispatcher to null. However, we do the transform inside of the inline. | 21:20 | |
21:20
p6bannerbot sets mode: +v stmuk
jnthn | We currently cannot optimize much inside of inlines, however, because we don't track their deopt points, so we don't know that we're not utterly busting deopt | 21:20 | |
That's fixable, but it's - like all deopt-related things - hard, and when I implement it, I know I'm in for 1-2 weeks of debugging upshot. | 21:21 | ||
And deopt bugs are up there with GC bugs in being annoying to find. Especially as the things that reliably expose them tend to be very large. | 21:22 | ||
21:24
stmuk__ joined,
stmuk left
21:25
p6bannerbot sets mode: +v stmuk__
21:29
stmuk__ left
brrt | aye | 21:31 | |
21:31
stmuk__ joined
21:32
p6bannerbot sets mode: +v stmuk__
brrt afk | 21:32 | ||
21:32
brrt left
21:36
stmuk__ left
21:38
stmuk__ joined
21:39
p6bannerbot sets mode: +v stmuk__
jnthn | I see I need to spend some time looking at our Perl 6 natives performance too :) | 21:43 | |
22:31
stmuk joined
22:32
p6bannerbot sets mode: +v stmuk
22:33
stmuk__ left
22:36
stmuk_ joined
22:37
p6bannerbot sets mode: +v stmuk_
22:38
stmuk left
22:39
stmuk joined,
p6bannerbot sets mode: +v stmuk
22:41
stmuk_ left
23:55
bladernr5 joined
23:59
bladernr5 left