AlexDaniel releasable6: next 00:25
releasable6 AlexDaniel, Next release in 3 days and ≈18 hours. Blockers: github.com/rakudo/rakudo/issues?q=...%9A%A0%22. Unknown changelog format
01:17 unicodable6 joined, quotable6 joined 02:57 ilbot3 joined 03:41 Kaiepi joined 03:57 bloatable6 joined 04:32 Kaiepi joined 06:45 domidumont joined 07:13 domidumont joined 08:02 zakharyas joined 08:10 domidumont joined
nine YES!! 08:39
MVM_frame_find_contextual_by_name has a similar issue to uninline: if (ret_offset >= cand->inlines[i].start && ret_offset < cand->inlines[i].end) 08:42
08:42 brrt joined
nine There's that off-by-one again in the bounds check. Fixing that fixes the remaining issue at least when the JIT is turned off. 08:42
So all that's left is finding out why the JITed code path is wrong, too. That one does if (return_label >= labels[inls[i].start_label] && return_label <= labels[inls[i].end_label]) 08:43
brrt ohai… ehm. 08:47
maybe that needs to be return_label < labels[inls[i].end_label] 08:48
but
it interacts with how we fetch the label in the JIT currently, which i intend to change to stack-walking
(still intend to change)
nine brrt: but that looks exactly like the broken non-JITed path then :) 08:59
brrt i'm still not 100% sure I can conclusively tell you what is or isn't a good idea 09:02
i keep forgetting where the inline end label is placed 09:04
or: we're using these labels to assert a property of the current code that is executing with regards to the inline structure of that code 09:05
which is to say, 'our current position in code is within this inline'
now, for the lego jit, we insert a label *after* any and all code that we compiled for the instruction; this means that if we loaded a dynamic label it must've been before the end of that instruction (and after its start) 09:08
sorry for the english grammar fail, it's bad today
nine Yes: MVMint32 label = MVM_jit_label_after_ins(tc, jg, bb, ins);
jg_append_label(tc, jg, label);
jg->inlines[ann->data.inline_idx].end_label = label;
So the inline check looks ok. Curiously the frame handler end label is put _before_ the instruction with the FH_END annotation. Even though I think the bounds check should include that instruction. 09:10
At least JITed and non-JITed code are similar in that regard 09:11
brrt in the expr JIT, we also get a label after the current instruction, and we insert an after-label (if necessary) for the graph as well. so it is pretty much equivalent
the frame handler end label should be placed *after*… that's what I believed it was doing anyway
nine The JIT places the label before the instruction and checks with <=, while the non-JITed code records the offset of the instruction itself and checks with < 09:12
09:17 brrt joined
brrt no, that really shouldn't be what's happening, and that's also not what the expr JIT does (it appends the root for the after label *after* having appended the root for the instruction) 09:18
if it does then there is a screwup in some lower level
09:23 zakharyas1 joined
nine Huh? I've now run tests and spectests again with the JIT enabled but can't reproduce any errors anymore 09:24
That said, I'm not sure anymore whether I checked if the JITed code is faulty at all. Considering how large the broken function is, it could just be that it is never JITed, so it always hit the faulty non-JIT code path. 09:26
brrt hmm, fair enough 09:27
nine Could also explain why the bug did not hit us anywhere else
10:12 Kaiepi joined 10:19 jnthn joined
Geth MoarVM: 20a4a7976c | (Stefan Seifert)++ | src/core/frame.c
Fix off-by-one error when checking inline boundaries in MVM_frame_find_contextual_by_name

The INLINE_END annotation is on the last instruction of the inlined code, so the bounds check needs to include the boundary.
Same as commit 7ea08024edfe1cb85c6925be1229324c349dd2c9
11:08
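[Aside: a minimal illustration of the boundary fix described in the commit above; ToyInline and offset_is_in_inline are invented names for illustration only, the real change is in the check nine quoted from MVM_frame_find_contextual_by_name in src/core/frame.c.]
    /* Sketch only: INLINE_END sits on the last instruction of the inlined code,
     * so the end bound must be inclusive. */
    typedef struct { unsigned int start; unsigned int end; } ToyInline;

    static int offset_is_in_inline(unsigned int ret_offset, const ToyInline *inl) {
        /* was: ret_offset < inl->end -- off by one, excluding the final instruction */
        return ret_offset >= inl->start && ret_offset <= inl->end;
    }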
brrt nine++ 11:11
I suggest that we move the whole 'position-within-inline' thing into a separate function 11:12
e.g. MVM_spesh_candidate_find_inline_by_code_position(MVMThreadContext*, MVMSpeshCandidate*, MVMint32 code_offset);
and an equivalent MVM_spesh_candidate_find_inline_by_code_position_for_jit(MVMThreadContext*, MVMSpeshCandidate*, void *position); 11:13
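[Aside: for concreteness, one possible shape of the helper brrt is proposing. This is purely a sketch: the return-an-index-or-minus-one convention is invented here, and the candidate fields used (num_inlines, inlines[i].start/end) are taken from the checks quoted earlier in the discussion.]
    /* Sketch of the proposed helper -- not actual MoarVM code. Returns the index
     * of the inline whose code range contains code_offset, or -1 if the offset
     * belongs to the outer frame. End bound inclusive, matching the fix above. */
    MVMint32 MVM_spesh_candidate_find_inline_by_code_position(MVMThreadContext *tc,
            MVMSpeshCandidate *cand, MVMint32 code_offset) {
        MVMuint32 i;
        for (i = 0; i < cand->num_inlines; i++) {
            if (code_offset >= (MVMint32)cand->inlines[i].start
                    && code_offset <= (MVMint32)cand->inlines[i].end)
                return (MVMint32)i;
        }
        return -1;
    }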
jnthn nine++ # good find 11:15
brrt: Yes, that would likely be sensible :)
brrt (or find_inline_by_jitcode_position) 11:16
Geth MoarVM/inline_in_place: 189468e213 | (Timo Paulssen)++ (committed by Stefan Seifert) | 4 files
Put inlined blocks between their caller and its succ

Previously inlined callees were added to the end of the basic block list. We now put the inlined blocks into the list at the position of the invoke op. However we cannot yet get rid of the goto ops entering and exiting the inlined code as that would lead to odd bugs possibly related to the annotations on these ops.
11:29
MoarVM/inline_in_place: caa306d237 | (Stefan Seifert)++ | src/spesh/optimize.c
Turn inline_end annotated pointless gotos into no_ops

We can't remove them completely as that causes weird deopt issues. But turning them into no_ops should give almost the same benefits without the cost.
MoarVM/inline_in_place: fbdf707c0b | (Stefan Seifert)++ | src/spesh/inline.c
Move inlinee's handlers to front instead of duplicating inliner's

Previously active handlers at the point of the invoke were copied and the inlinee's code surrounded by annotations pointing at those copies. Instead we now move the inlinee's handlers to the top of the table so they will be searched first. This means we have to adjust the indexes of the inliner's annotations but that's actually simpler than the whole copying business.
MoarVM/inline_in_place: c4e6a35d5f | (Stefan Seifert)++ | 3 files
Remove the pointless goto at the end of an inlined function.
11:30
brrt (this is a rebase, I think?) 11:31
nine Except that last commit 11:32
That was the last 100 hours of debugging :D
11:32 brrt1 joined
nine I guess in terms of performance improvement per invested hour this is the most epic fail ever ;) 11:32
brrt1 nah, the expression JIT can compete with that still 11:33
11:33 domidumont joined 11:46 committable6 joined
nine brrt: there's a rather confusing comment in src/jit/graph.c:646 considering our discussion this morning 11:50
brrt oh, which one? 11:52
11:52 AlexDani` joined
nine brrt: github.com/MoarVM/MoarVM/blob/mast...aph.c#L646 11:53
jnthn It seems that there are funds for me to apply for another 200-hour extension of my reliability/performance grant.
I've some ideas what I'd like to focus on, but am open to any suggestions. :) 11:54
brrt ehm, let me think if i know anything off hand
what would be your own ideas? 11:55
lizmat jnthn: I would like to see the work on the JIT come to more fruition
brrt nine: note that that's a frame handler label
so the FH_END instruction notes the first instruction that is beyond the scope of the frame handler 11:56
(at least, that is how the JIT does it)
nine brrt: ah, then that is different from interpreted code, where the FH_END is _on_ the last instruction and therefore inclusive. 11:57
brrt … really…
that is very probably a bug, then
but one that has been in there for ages
hmmm 11:58
are you sure about that?
jnthn lizmat: Are you talking about the machine codegen part in particular, or JIT as taking in the whole dynamic optimization stuff? 12:00
brrt lunch &
lizmat I'm thinking of the machine codegen part, as that's still mostly not "filled in", right ?
jnthn There's a bunch more templates to be written there, though I think it's also limited in how good a job it can do by us not fusing basic blocks that had their potential control flow eliminated 12:02
lizmat ok, so you're saying we still need to do some / a lot of prerequisite work before the codegen will really deliver? 12:03
nine Fusing basic blocks doesn't sound that hard, does it?
jnthn nine: No
Should be an easy way to a win 12:04
nine Sounds to me like if a BB has only one predecessor and is entered via fall through, we'd just have to move its instructions to the predecessor and fix up the indexes.
BB indexes, that is. If it's actually necessary that they are consecutive 12:05
jnthn nine: Well, yes and no; if it ends with an instruction that is still invokish, for example, it needs to be there
But yeah, those kinds of thing aside
It's not hard
(Those cases aren't hard either really) 12:06
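[Aside: a toy sketch of the fusion rule being described: a block can be merged into its sole predecessor when it is reached only by fall-through and, per jnthn's caveat, the predecessor doesn't end in something like an invokish instruction. All types and field names below are invented for illustration; they are not spesh's real data structures.]
    #include <stddef.h>

    typedef struct ToyIns { struct ToyIns *next; int is_invokish; } ToyIns;
    typedef struct ToyBB {
        struct ToyBB  *linear_next;   /* next block in linear order (fall-through) */
        struct ToyBB **preds;         /* predecessor blocks */
        size_t         num_preds;
        ToyIns        *first_ins, *last_ins;
    } ToyBB;

    /* Fuse bb's fall-through successor into bb when it is safe to do so. */
    static void maybe_fuse(ToyBB *bb) {
        ToyBB *succ = bb->linear_next;
        if (!succ || succ->num_preds != 1 || succ->preds[0] != bb)
            return;                           /* not a pure single-predecessor fall-through */
        if (bb->last_ins && bb->last_ins->is_invokish)
            return;                           /* block boundary must stay (invokish caveat) */
        if (bb->last_ins)                     /* append succ's instructions to bb ... */
            bb->last_ins->next = succ->first_ins;
        else
            bb->first_ins = succ->first_ins;
        bb->last_ins = succ->last_ins;
        bb->linear_next = succ->linear_next;  /* ... and splice succ out; index fix-ups omitted */
    }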
lizmat: There are still certain common things that we just don't specialize in a way that makes it possible to JIT them into something particularly nice
lizmat: Assignment is the big one
lizmat that also sounds like a target then :-) 12:07
jnthn Yeah, that's one that I had in mind
Also getting rid of the lexical auto-viv
lizmat lexical auto-viv ?
jnthn Since that costs us a branch and some code on every lexical read
lizmat ah, ok
jnthn Yeah, a while back we started to allocate Scalar containers on first touch
lizmat also, there's still the elephant in the room wrt finding out in which version your caller is running 12:08
it feels to me we need something like $/ and friends
jnthn I don't think that falls much under performance/reliability though :)
lizmat true, but it *is* an elephant :-) 12:09
jnthn Anyway, the lexical-auto-viv made it so that while every sub may well have a $/ and $! and $_, we didn't need to allocate Scalar containers for those 12:10
Unless they were actually used
lizmat and you want to get rid of that again ?
in favour of spesh clearing things up at run time ?
jnthn Considering that, yes, in that it'd make every lexical access cheaper
lizmat what was the reason for putting in auto-viv in the first place ? 12:11
jnthn It gave a big speed-up
Note that back then, there wasn't spesh/inlining
lizmat ah, ok
so you could argue we've shown that premature optimization is bad :-) 12:12
jnthn Depends.
lizmat yeah, I know
sometimes you need to put in scaffolding to make other things possible faster
jnthn It's like the nqp:: stuff in CORE.setting 12:13
lizmat right
jnthn It's not ideal, but it's something that can be done now that gets a win
lizmat indeed
lizmat is looking forward to removing all the nqp stuff from the core
jnthn And that was the case with the lazy Scalar viv
It was achievable in a reasonable amount of time and gave a big win 12:14
And the cost of everything else was still pretty high so the extra branching was just noise
Whereas now, instead of a lexical lookup JITting into a few instructions, it JITs into that plus a NULL check and a C function call 12:15
brrt can probably estimate better than me the cost of that, but I'd guess it makes the size of the machine code 3-4 times bigger
And takes a slot in the CPU branch predictor, and means we can fit less machine code into the instruction cache 12:16
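[Aside: roughly what that difference looks like, in an invented toy form rather than MoarVM's actual JIT output: with lazy vivification every lexical read carries a NULL check and a possible out-of-line C call, while an eagerly vivified lexical is just a load.]
    #include <stdlib.h>

    typedef struct { void **lexicals; } ToyFrame;   /* invented stand-in for a frame */

    /* Stand-in for allocating a fresh Scalar container on first touch. */
    static void *vivify_scalar(ToyFrame *f, int idx) {
        return f->lexicals[idx] = calloc(1, sizeof(int));
    }

    /* Lazy vivification: an extra branch plus a potential C call on every read. */
    static void *read_lexical_lazy(ToyFrame *f, int idx) {
        void *v = f->lexicals[idx];
        if (v == NULL)
            v = vivify_scalar(f, idx);
        return v;
    }

    /* Eager vivification: a plain load, which JITs down to a few instructions. */
    static void *read_lexical_eager(ToyFrame *f, int idx) {
        return f->lexicals[idx];
    }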
lizmat jnthn: if I understand correctly, you're saying that your work would reduce the size to 33% - 25% of the current size ? 12:21
jnthn For lexical accesses, yes. Note that I'd also want to spend time doing more lexical to local lowering, which could reduce it even further
lizmat sounds like a plan to me then :-) would that also include the basic block fusing ? 12:22
jnthn If I'm not beaten to it, I could do it :)
12:57 domidumont joined 12:58 zakharyas joined 13:19 zakharyas joined 13:41 zakharyas joined
lizmat clickbaits www.perl.com/article/an-open-lette...community/ 13:42
13:43 leedo joined
timotimo jnthn: since we have control flow information built into spesh, we can figure out, for any given place, if any given lexical has definitely already been accessed, and in that case optimize to an op that doesn't do a check 13:44
jnthn timotimo: Perhaps yes 13:45
timotimo if we only keep $_, $/ and $! lazy and eagerly vivify all others, that could still work perhaps? maybe even put in a dummy getlex at the beginning of every function to tell the rest of spesh that they aren't supposed to be lazy? kind of like we do the null removal at the beginning of frames for locals 13:50
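[Aside: one invented sketch of that idea: track, per program point, which lexicals are known to have been accessed (and hence vivified) on every path there, and rewrite later reads to a check-free op. Op names and types here are made up; in the real thing the 'seen' facts would come from spesh's control-flow information.]
    #include <stdbool.h>

    enum ToyOp { GETLEX_CHECKED, GETLEX_NO_CHECK };
    typedef struct { enum ToyOp op; int lex_idx; } ToyInsn;

    /* seen[i] is true when lexical i is definitely vivified at this point. */
    static void strength_reduce_getlex(ToyInsn *ins, bool *seen) {
        if (ins->op != GETLEX_CHECKED)
            return;
        if (seen[ins->lex_idx])
            ins->op = GETLEX_NO_CHECK;   /* already vivified: drop the NULL check */
        else
            seen[ins->lex_idx] = true;   /* first access on this path vivifies it */
    }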
14:11 squashable6 joined 14:14 bisectable6 joined 14:39 releasable6 joined, greppable6 joined, statisfiable6 joined 15:02 dogbert2 joined
dogbert2 jnthn: how about squashing github.com/rakudo/rakudo/issues/1202 15:02
there's also the leaks/crashes in the whateverables 15:35
there's also the odd GC problem remaining; anyway, congratulations on the grant extension :-) 15:52
jnthn Well, no grant extension granted yet. :) Just that I can apply for it. :) 15:57
dogbert2 let's hope you get it 16:00
github.com/MoarVM/MoarVM/issues/680 is an interesting case as well 16:01
dogbert2 is on a roll :-)
moving two variables in MasterDuke's code example lessens memory strain by 300 megs on a 64 bit system 16:03
16:08 brrt joined 17:01 Kaiepi joined 17:23 coverable6 joined, benchable6 joined
nine What exactly is the purpose of DEOPT_ALL_INS annotations? 18:26
18:41 reportable6 joined, nativecallable6 joined
nine Looks like most gotos entering an inline can also be eliminated. In the first crashing test case it's just a single goto of 292 that causes us to fail 18:51
Intriguing... it breaks when uninlining creates a frame when it shouldn't 18:57
18:59 domidumont joined 19:02 Ven`` joined
nine Possibly another off-by-one error in deciding whether we are in an inline or not. This time though it seems to be a false positive in offset >= cand->inlines[i].start 19:05
Yep, with this changed to offset > cand->inlines[i].start we pass all tests and spectests. But this time I don't understand why this would be correct. 19:43
Geth MoarVM/inline_in_place: ce466f9874 | (Stefan Seifert)++ | src/spesh/deopt.c
Fix another off-by-one in uninline
MoarVM/inline_in_place: f672ee95b1 | (Stefan Seifert)++ | src/spesh/optimize.c
Remove pointless goto for entering an inline
nine Kept the fix in the branch as I don't understand why it works. But this is basically it. Inlines are now put in place of the call in the spesh graph and we successfully remove both the goto jumping into the inlined code and the one jumping out :) 19:44
jnthn \o/
nine++
Not sure I understand it either...hm 19:45
lizmat Last time I had something like this, it turned out to be a bug in the C-compiler 19:48
but that's well over 30 years ago now :-)
nine jnthn: all the calls to MVM_spesh_deopt_* happen _after_ advancing cur_op in MVM_interp_run. So the instruction causing the deopt is immediately before the one pointed to by offset, isn't it? 19:52
jnthn Ah, yes 20:04
20:05 Ven`` joined
nine Doesn't that also mean that the end boundary check was off by 2 and is still off by 1? 20:05
20:13 domidumont joined 20:14 Kaiepi joined 20:17 zakharyas joined
jnthn nine: I'd think if the instruction has to be > the start address of the inline, and the previous instruction's end falls on the address of the inline, then the > (not >=) should cut it 20:17
All this said
A while back I started always passing in the index of the deopt point 20:18
I wonder if this would be more robust if we kept a map of which deopt points are in which inline
And identified the things to uninline that way
(Absolutely not a requirement for merging your branch, of course, just an idea that might be more robust in the future) 20:19
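[Aside: one possible shape of that idea, again with invented types: record, when each deopt point is created, which inline (if any) it sits in, and look the answer up by deopt point index instead of comparing bytecode offsets. That would sidestep the >= versus > boundary questions entirely.]
    #include <stdint.h>

    typedef struct {
        int32_t  *deopt_point_inline;   /* per deopt point: inline index, or -1 for the outer frame */
        uint32_t  num_deopt_points;
    } ToyDeoptMap;

    /* Filled in at spesh/inline time, when it is still known exactly which
     * inline each deopt point was generated inside. */
    static int32_t inline_for_deopt_point(const ToyDeoptMap *map, uint32_t deopt_idx) {
        return deopt_idx < map->num_deopt_points ? map->deopt_point_inline[deopt_idx] : -1;
    }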
20:33 travis-ci joined
travis-ci MoarVM build errored. Stefan Seifert 'Fix off-by-one error when checking inline boundaries in MVM_frame_find_contextual_by_name 20:33
travis-ci.org/MoarVM/MoarVM/builds/329822315 github.com/MoarVM/MoarVM/compare/2...a4a7976c5d
20:33 travis-ci left
dogbert17 libuv 1.19 is out, github.com/libuv/libuv/blob/v1.x/ChangeLog 20:43
20:51 evalable6 joined 21:00 Kaiepi joined 21:07 Kaiepi joined 21:20 zakharyas joined 21:36 klapperl joined 21:41 Kaiepi joined 22:02 Ven`` joined 22:09 leedo joined