AlexDaniel | releasable6: next | 00:25 | |
releasable6 | AlexDaniel, Next release in 3 days and ≈18 hours. Blockers: github.com/rakudo/rakudo/issues?q=...%9A%A0%22. Unknown changelog format | ||
01:17
unicodable6 joined,
quotable6 joined
02:57
ilbot3 joined
03:41
Kaiepi joined
03:57
bloatable6 joined
04:32
Kaiepi joined
06:45
domidumont joined
07:13
domidumont joined
08:02
zakharyas joined
08:10
domidumont joined
|
|||
nine | YES!! | 08:39 | |
MVM_frame_find_contextual_by_name has a similar issue like uninline: if (ret_offset >= cand->inlines[i].start && ret_offset < cand->inlines[i].end) | 08:42 | ||
08:42
brrt joined
|
|||
nine | There's that off-by-one again in the bounds check. Fixing that fixes the remaining issue at least when the JIT is turned off. | 08:42 | |
So all that's left is finding out why the JITed code path is wrong, too. That one does if (return_label >= labels[inls[i].start_label] && return_label <= labels[inls[i].end_label]) | 08:43 | ||
brrt | ohai… ehm. | 08:47 | |
maybe that needs to be return_label < labels[inls[i].end_label] | 08:48 | ||
but | |||
it interacts with how we fetch the label in the JIT currently, which i intend to change to stack-walking | |||
(still intend to change) | |||
nine | brrt: but that looks exactly like the broken non-JITed path then :) | 08:59 | |
brrt | i'm still not 100% sure I can conclusively tell you what is or isn't a good idea | 09:02 | |
i keep forgetting where the inline end label is placed on | 09:04 | ||
or: we're using these labels to assert a property of the current code that is executing with regards to the inline structure of that code | 09:05 | ||
which is to say, 'our current position in code is within this inline' | |||
now, for the lego jit, we insert a label *after* any and all code that we compiled for the instruction; this means that if we loaded a dynamic label it must've been before the end of that instruction (and after it's begin) | 09:08 | ||
sorry for the english grammar fail, it's bad today | |||
nine | Yes: MVMint32 label = MVM_jit_label_after_ins(tc, jg, bb, ins); | ||
jg_append_label(tc, jg, label); | |||
jg->inlines[ann->data.inline_idx].end_label = label; | |||
So the inline check looks ok. Curiously the frame handler end label is put _before_ the instruction with the FH_END annotation. Even though I think the bounds check should include that instruction. | 09:10 | ||
At least JITed and non-JITed code are similar in that regard | 09:11 | ||
brrt | in the expr JIT, we also get a label after the current instruction, and we insert an after-label (if necessary) for the graph as well. so it is pretty much equivalent | ||
the frame handler end label should be placed *after*… that's what I believed it was doing anyway | |||
nine | The JIT places the label before the instruction and checks with <=, while the non-JITed code records the offset of the instruction itself and checks with < | 09:12 | |
09:17
brrt joined
|
|||
brrt | no, that really shouldn't be what's happening, and that's also not what the expr JIT does (it appends the root for the after label *after* having appended the root for the instruction) | 09:18 | |
if it does then there is a screwup in some lower level | |||
09:23
zakharyas1 joined
|
|||
nine | Huh? I've now run tests and spectests again with enabled JIT but can't reproduce any errors anymore | 09:24 | |
That said, I'm not sure anymore if I checked if the JITed code is faulty at all. Considering how large the broken function is, it could just be that it is never JITed at all, thus it always hit the faulty non-JIT code path. | 09:26 | ||
brrt | hmm, fair enough | 09:27 | |
nine | Could also explain why the bug did not hit us anywhere else | ||
10:12
Kaiepi joined
10:19
jnthn joined
|
|||
Geth | MoarVM: 20a4a7976c | (Stefan Seifert)++ | src/core/frame.c Fix off-by-one error when checking inline boundaries in MVM_frame_find_contextual_by_name The INLINE_END annotation is on the last instruction of the inlined code, so the bounds check needs to include the boundary. Same as commit 7ea08024edfe1cb85c6925be1229324c349dd2c9 |
11:08 | |
brrt | nine++ | 11:11 | |
I suggest that we move the whole 'position-within-inline' thing into a separate function | 11:12 | ||
e.g MVM_spesh_candidate_find_inline_by_code_position(MVMThreadContext*,MVMSpeshCandidate,MVMint32 code_offset); | |||
and an equivalend MVM_spesh_candidate_find_inline_by_code_position_for_jit(MVMThreadContext*,MVMSpeshCandidate*,void*position); | 11:13 | ||
jnthn | nine++ # good find | 11:15 | |
brrt: Yes, that would likely be sensible :) | |||
brrt | (or find_inline_by_jitcode_position) | 11:16 | |
Geth | MoarVM/inline_in_place: 189468e213 | (Timo Paulssen)++ (committed by Stefan Seifert) | 4 files Put inlined blocks between their caller and its succ Previously inlined callees were added to the end of the basic block list. We now put the inlined blocks into the list at the position of the invoke op. However we cannot yet get rid of the goto ops entering and exiting the inlined code as that would lead to odd bugs possibly related to the annotations on these ops. |
11:29 | |
MoarVM/inline_in_place: caa306d237 | (Stefan Seifert)++ | src/spesh/optimize.c Turn inline_end annotated pointless gotos into no_ops We can't remove them completely as that causes weird deopt issues. But turning them into no_ops should give almost the same benefits without the cost. |
|||
MoarVM/inline_in_place: fbdf707c0b | (Stefan Seifert)++ | src/spesh/inline.c Move inlinee's handlers to front instead of duplicating inliner's Previously active handlers at the point of the invoke were copied and the inlinee's code surrounded by annotations pointing at those copies. Instead we now move the inlinee's handlers to the top of the table so they will be searched first. This means we have to adjust the indexes of the inliner's annotations but that's actually simpler than the whole copying business. |
|||
MoarVM/inline_in_place: c4e6a35d5f | (Stefan Seifert)++ | 3 files Remove the pointless goto at the end of an inlined function. |
11:30 | ||
brrt | (this is a rebase, I think?) | 11:31 | |
nine | Except that last commit | 11:32 | |
That was the last 100 hours of debugging :D | |||
11:32
brrt1 joined
|
|||
nine | I guess in terms of performance improvement per invested hour this is the most epic fail ever ;) | 11:32 | |
brrt1 | nah, the expression JIT can compete with that still | 11:33 | |
11:33
domidumont joined
11:46
committable6 joined
|
|||
nine | brrt: there's a rather confusing comment in src/jit/graph.c:646 consdering our discussion this morning | 11:50 | |
brrt | oh, which one? | 11:52 | |
11:52
AlexDani` joined
|
|||
nine | brrt: github.com/MoarVM/MoarVM/blob/mast...aph.c#L646 | 11:53 | |
jnthn | It seems that there's funds for me to apply for another 200 hour extension of my reliability/performance grant. | ||
I've some ideas what I'd like to focus on, but am open to any suggestions. :) | 11:54 | ||
brrt | ehm, let me think if i know anything off hand | ||
what would be your own ideas? | 11:55 | ||
lizmat | jnthn: I would like to see the work on the JIT come to more fruition | ||
brrt | nine: note that that's a frame handler label | ||
so the FH_END instruction notes the first instruction that is beyond the scope of the frame handler | 11:56 | ||
(at least, that is how the JIT does it) | |||
nine | brrt: ah, then that is different from interpreted code, where the FH_END is _on_ the last instruction, therefore including. | 11:57 | |
brrt | … really… | ||
that is very probably a bug, then | |||
but one that has been in there for ages | |||
hmmm | 11:58 | ||
are you sure about that? | |||
jnthn | lizmat: Are you talking about the machine codegen part in particular, or JIT as taking in the whole dynamic optimization stuff? | 12:00 | |
brrt | lunch & | ||
lizmat | I think about the machine codegen part, as that's still mostly not "filled in", right ? | ||
jnthn | There's a bunch more templates to be written there, though I think it's also limited in how good a job it can do by us not fusing basic blocks that had their potential control flow eliminated | 12:02 | |
lizmat | ok, so you're saying we still need to do some / a lot of prerequisite work before the codegen will really deliver > | 12:03 | |
? | |||
nine | Fusing basic blocks doesn't sound that hard, does it? | ||
jnthn | nine: No | ||
Should be an easy way to a win | 12:04 | ||
nine | Sounds to me like if a BB has only one predecessor and is entered via fall through, we'd just have to move its instructions to the predecessor and fix up the indexes. | ||
BB indexes that is. If it's actually necessary, that they are consecutive | 12:05 | ||
jnthn | nine: Well, yes and no; if it ends with an instruction that is invokish still, for example, it needs to be there | ||
But yeah, those kinds of thing aside | |||
It's not hard | |||
(Those cases aren't hard either really) | 12:06 | ||
lizmat: There's still certain common things that we just don't specialize in a way that makes it possible to JIT them into something particularly nice | |||
lizmat: Assignment is the big one | |||
lizmat | that also sounds like a target then :-) | 12:07 | |
jnthn | Yeah, that's one that I had in mind | ||
Also getting rid of the lexical auto-viv | |||
lizmat | lexical auto-viv ? | ||
jnthn | Since that costs us a branch and some code on every lexical read | ||
lizmat | ah, ok | ||
jnthn | Yeah, a while back we started to allocate Scalar containers on first touch | ||
lizmat | also, there's still the elephant in the room wrt finding out in which version your caller is running | 12:08 | |
it feels to me we need something like $/ and friends | |||
jnthn | I don't think that falls much under performance/reliability though :) | ||
lizmat | true, but it *is* an elephant :-) | 12:09 | |
jnthn | Anyway, the lexical-auto-viv made it so that while every sub may well have a $/ and $! and $_, we didn't need to allocate Scalar containers for those | 12:10 | |
Unless they were actually used | |||
lizmat | and you want to get rid of that again ? | ||
in favour of spesh clearing things up at run time ? | |||
jnthn | Considering that, yes, in that it'd make every lexical access cheaper | ||
lizmat | what was the reason for putting in auto-viv in the first place ? | 12:11 | |
jnthn | It gave a big speed-up | ||
Note that back then, there wasn't spesh/inlining | |||
lizmat | ah, ok | ||
so you could argue we've shown that premature optimization is bad :-) | 12:12 | ||
jnthn | Depends. | ||
lizmat | yeah, I know | ||
sometimes you need to put in scaffolding to make other things possible faster | |||
jnthn | It's like the nqp:: stuff in CORE.setting | 12:13 | |
lizmat | right | ||
jnthn | It's not ideal, but it's something that can be done now that gets a win | ||
lizmat | indeed | ||
lizmat is looking forward to removing all the nqp stuff from the core | |||
jnthn | And that was the case with the lazy Scalar viv | ||
It was achievable in a reasonable amount of time and gave a big win | 12:14 | ||
And the cost of everything else was still pretty high so the extra branching was just noise | |||
Whereas now, instead of a lexical lookup JITting into a few instructions, it JITs into that plus a NULL check and a C function call | 12:15 | ||
brrt can probably estimate better than me the cost of that, but I'd guess it makes the size of the machine code 3-4 times bigger | |||
And takes a slot in the CPU branch predictor, and means we can fit less machine code into the instruction cache | 12:16 | ||
lizmat | jnthn: if I understand correctly, you're saying that your work would reduce the size to 33% - 25% of the current size ? | 12:21 | |
jnthn | For lexical accesses, yes. Note that I'd also want to spend time doing more lexical to local lowering, which could reduce it even further | ||
lizmat | sounds like a plan to me then :-) would that also include the basic block fusing ? | 12:22 | |
jnthn | If I'm not beaten to it, I could do it :) | ||
12:57
domidumont joined
12:58
zakharyas joined
13:19
zakharyas joined
13:41
zakharyas joined
|
|||
lizmat clickbaits www.perl.com/article/an-open-lette...community/ | 13:42 | ||
13:43
leedo joined
|
|||
timotimo | jnthn: since we have control flow information built into spesh, we can figure out for any given place if any given lexical has definitely been accessed just now and in that case optimize to an op that doesn't do a check | 13:44 | |
jnthn | timotimo: Perhaps yes | 13:45 | |
timotimo | if we only keep $_, $/ and $! lazy and eagerly vivify all others, that could still work perhaps? maybe even put in a dummy getlex at the beginning of every function to tell the rest of spesh that they aren't supposed to be lazy? kind of like we do the null removal at the beginning of frames for locals | 13:50 | |
14:11
squashable6 joined
14:14
bisectable6 joined
14:39
releasable6 joined,
greppable6 joined,
statisfiable6 joined
15:02
dogbert2 joined
|
|||
dogbert2 | jnthn: how about squashing github.com/rakudo/rakudo/issues/1202 | 15:02 | |
there's also the leaks/crashes in the whateverables | 15:35 | ||
there's also the odd GC problem remaining, anyway congratulations for the grant extension :-) | 15:52 | ||
jnthn | Well, no grant extension granted yet. :) Just that I can apply for it. :) | 15:57 | |
dogbert2 | let's hope you get it | 16:00 | |
github.com/MoarVM/MoarVM/issues/680 is an interesting case as well | 16:01 | ||
dogbert2 is on a roll :-) | |||
moving two variable in MasterDuke's code example lessens memory strain by 300 megs on a 54 bit system | 16:03 | ||
*64 | |||
16:08
brrt joined
17:01
Kaiepi joined
17:23
coverable6 joined,
benchable6 joined
|
|||
nine | What exactly is the purpose of DEOPT_ALL_INS annotations? | 18:26 | |
18:41
reportable6 joined,
nativecallable6 joined
|
|||
nine | Looks like most gotos entering an inline can also be eliminated. In the first crashing test case it's just a single goto of 292 that causes us to fail | 18:51 | |
Intriguing...it breaks when uninlining creates a frame, when it shouldn't | 18:57 | ||
18:59
domidumont joined
19:02
Ven`` joined
|
|||
nine | Possibly another off-by-one error in deciding whether we are in an inline or not. This time though it seems to be a false positive in offset >= cand->inlines[i].start | 19:05 | |
Yep, with this changed to offset > cand->inlines[i].start we pass all tests and spectests. But this time I don't understand, why this would be correct. | 19:43 | ||
Geth | MoarVM/inline_in_place: ce466f9874 | (Stefan Seifert)++ | src/spesh/deopt.c Fix another off-by-one in uninline |
||
MoarVM/inline_in_place: f672ee95b1 | (Stefan Seifert)++ | src/spesh/optimize.c Remove pointless goto for entering an inline |
|||
nine | Kept the fix in the branch as I don't understand why it works. But this is basically it. Inlines are now put in place of the call in the spesh graph and we successfully remove both the goto jumping into the inlined code and the one jumping out :) | 19:44 | |
jnthn | \o/ | ||
nine++ | |||
Not sure I understand it either...hm | 19:45 | ||
lizmat | Last time I had something like this, it turned out to be a bug in the C-compiler | 19:48 | |
but that's well over 30 years ago now :-) | |||
nine | jnthn: all the calls to MVM_spesh_deopt_* happen _after_ advancing cur_op in MVM_interp_run. So the instruction causing the deop is immediately before the one pointed to by offset, isn't it? | 19:52 | |
jnthn | Ah, yes | 20:04 | |
20:05
Ven`` joined
|
|||
nine | Doens't that also mean that the end boundary check was off by 2 and is still off by 1? | 20:05 | |
20:13
domidumont joined
20:14
Kaiepi joined
20:17
zakharyas joined
|
|||
jnthn | nine: I'd think if the instruction has to be > the start address of the inline, and the previous instruction's end falls on the address of the inline, then the > (not >=) should cut it | 20:17 | |
All this said | |||
A while back I started always passing in the index of the deopt point | 20:18 | ||
I wonder if this would be more robust if we kept a map of which deopt points are in which inline | |||
And identified the things to uninline that way | |||
(Absolutely not a requirement for merging your branch, of course, just an idea that might be more robust in the future) | 20:19 | ||
20:33
travis-ci joined
|
|||
travis-ci | MoarVM build errored. Stefan Seifert 'Fix off-by-one error when checking inline boundaries in MVM_frame_find_contextual_by_name | 20:33 | |
travis-ci.org/MoarVM/MoarVM/builds/329822315 github.com/MoarVM/MoarVM/compare/2...a4a7976c5d | |||
20:33
travis-ci left
|
|||
dogbert17 | libuv 1.19 is out, github.com/libuv/libuv/blob/v1.x/ChangeLog | 20:43 | |
20:51
evalable6 joined
21:00
Kaiepi joined
21:07
Kaiepi joined
21:20
zakharyas joined
21:36
klapperl joined
21:41
Kaiepi joined
22:02
Ven`` joined
22:09
leedo joined
|