github.com/moarvm/moarvm | IRC logs at colabti.org/irclogger/irclogger_logs/moarvm | Set by AlexDaniel on 12 June 2018.
08:06 zakharyas joined
08:14 AlexDaniel left
08:27 AlexDaniel joined, AlexDaniel left, AlexDaniel joined
08:32 Kaeipi left
08:33 Kaiepi joined
09:26 squashable6 left
09:27 squashable6 joined
MasterDuke | is there a reason ctxouter isn't jitted? the implementation in interp.c looks pretty simple github.com/MoarVM/MoarVM/blob/mast...3356-L3364 | 09:28 | |
09:30 sena_kun joined
MasterDuke | hm. colabti.org/irclogger/irclogger_lo...03-03#l102 "18:37 brrt I don't recall the exact reason, but there was a reason ctxouter didn't work." | 09:31 | |
that was a year ago, and i'm not sure a lot has been done to the jit in the meantime, so whatever the reason was probably still holds? | 09:32 | ||
lizmat | yeah, fraid so, although it might be worth pinging brrt | 09:50 |
.seen brrt | |||
tellable6 | lizmat, I saw brrt 2020-04-08T11:33:08Z in #moarvm: <brrt> \o | ||
lizmat hopes brrt is doing ok | |||
sena_kun | lizmat, I saw his messages an hour ago or so in another place. | 10:04 | |
lizmat | ok, good to hear! | ||
10:56 Altai-man_ joined
10:58 sena_kun left
11:24 zakharyas left
11:45 pamplemousse joined
11:57 MasterDuke left
12:17 MasterDuke joined
12:57 sena_kun joined
12:58 Altai-man_ left
13:07 pamplemousse left
13:09 pamplemousse joined
13:37 farcas1982regreg joined
14:06 robertle joined
MasterDuke | yep, this patch gist.github.com/MasterDuke17/40d52...44a9477cc3 causes `Frame has no lexical with name '$?PACKAGE' at gen/moar/stage2/NQPHLL.nqp:1499 (/home/dan/Source/perl6/install/share/nqp/lib/NQPHLL.moarvm:SET_BLOCK_OUTER_CTX)` when running install-core-dist.raku after successfully building rakudo | 14:09 | |
and lots of rakudo's tests fail | 14:10 | ||
14:43 robertle left
14:56 Altai-man_ joined
14:58 sena_kun left
MasterDuke | hm. v2 of the patch gist.github.com/MasterDuke17/40d52...r_v2-patch has a very similar failure `Frame has no lexical with name '::?CLASS'` | 15:56 | |
i don't have any bash history for jit-bisect anymore, anybody remember how it's supposed to be run? | 16:09 | ||
nine | MasterDuke: are there any other JIT implementations of ops that use contexts and/or the framewalker? | 16:16 | |
Could be that MVM_context_apply_traversal relies on some bookkeeping data that's just not set up by the JIT | |||
MasterDuke | nine: i copied the implementation of ctxcallerskipthunks (its interp.c implementation is identical except for the literal passed to MVM_context_apply_traversal) | 16:17 | |
github.com/MoarVM/MoarVM/blob/mast...1027-L1048 and github.com/MoarVM/MoarVM/blob/mast...4241-L4250 | 16:18 | ||
a bisect is currently running | 16:20 | ||
`JIT Broken Frame/BB: 1 / 91===SORRY!===Frame has no lexical with name '$_'` | 16:22 | ||
nine | Ah, I see. Then I'd guess that the error is actually in another JITted op and implementing ctxouter just unlocks that | 16:23 | |
MasterDuke | nine: care to see the log the jit bisect produced? i've never really understood them enough to find anything in them that points out where to look | 16:24 | |
nine | can take a look | 16:25 | |
MasterDuke | gist.github.com/MasterDuke17/40d52...44a9477cc3 has it | 16:26 | |
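For reference, the interp.c pattern being copied looks roughly like the sketch below. This is a paraphrase from memory rather than the verbatim MoarVM source; the macro names and the exact type guard may differ slightly, and the traversal constant at the end is what distinguishes ctxouter from ctxcallerskipthunks.

    /* Rough sketch of an interp.c context-traversal op (names approximate). */
    OP(ctxouter): {
        MVMObject *this_ctx = GET_REG(cur_op, 2).o;
        if (!IS_CONCRETE(this_ctx) || REPR(this_ctx)->ID != MVM_REPR_ID_MVMContext)
            MVM_exception_throw_adhoc(tc, "ctxouter needs an MVMContext");
        /* ctxcallerskipthunks is identical except it passes a different
         * traversal constant (e.g. MVM_CTX_TRAV_CALLER_SKIP_THUNKS) here. */
        GET_REG(cur_op, 0).o = MVM_context_apply_traversal(tc,
            (MVMContext *)this_ctx, MVM_CTX_TRAV_OUTER);
        cur_op += 4;
        goto NEXT;
    }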
jnthn | So working on dispatch has led me to our calling conventions. | ||
MasterDuke | they need changing? | 16:27 | |
jnthn | And looking at how we can efficiently implement the whole capture tweakery thing | 16:28 | |
Because the naive approach - well, also what we'd do when evaluating a dispatcher to record a guard/transform chain - is just to produce new MVMCaptures each time | |||
nine | So...you're gonna tell us that it will be a lot faster to pass on arguments in the future? | 16:29 | |
jnthn | But we don't want to do that for the real guard chain walk. | ||
Anyway, focusing back on what we do today for a moment | 16:30 | ||
prepargs <callsite> - OK, so the callsite contains the argument register kinds, and also now the named argument names | 16:31 | ||
arg_o 0, r(0) | |||
arg_o 1, r(2) | |||
The integer in the middle there writes into the args buffer. But we always, afaik, emit those in order. That's pretty redundant. | |||
But wait, the information that it's an object argument is redundant too, 'cus that's in the callsite | 16:32 | ||
And in fact, why do we even have an args buffer at all? It means we have to copy twice. | 16:33 | ||
First, register to args buffer | |||
Then in binding, args buffer to parameter | |||
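To make the two copies concrete, here is a minimal illustrative sketch of the flow being described, per object argument. The names are hypothetical, not the actual MoarVM code paths:

    /* Copy 1: arg_o <idx>, r(<reg>) -- caller's work register into the args buffer. */
    args_buffer[arg_idx].o = caller_work[src_reg].o;

    /* Copy 2: during binding -- args buffer entry into the callee's parameter register. */
    callee_work[param_reg].o = args_buffer[arg_idx].o;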
nine | Couldn't the callsite contain the list of work registers that contain the args? They are determined at compile time anyway | 16:38 | |
jnthn | I don't think it should contain the actual work register indices | 16:41 | |
Because we can't intern callsites so widely then | |||
But I think it could contain constants | 16:42 | ||
So then we have | |||
prepargs <callsite> | |||
[list of 16-bit integers identifying registers] | |||
dispatch ... | |||
That way, every arg is 2 bytes instead of 6 bytes today (or 2 bytes instead of 14 bytes for named args) | 16:43 | ||
timotimo | list of integers, like, literally where we'd normally have bytecode? | ||
jnthn | Yes | 16:44 | |
They're effectively "varargs" to the prepargs | |||
nine | Basically a prepargs OP with a variable number of arguments | ||
jnthn | hah! | ||
And what if we take it even further? | 16:45 | ||
dispatch_o r(0), <callsite>, 'dispatcher-name' | |||
And then followed by the list of 16-bit integers | 16:46 | ||
So instead of a 2-argument call today being prepargs (2 + 4 bytes), 2 arg_o instructions (2 * 6 bytes) and one invoke instruction (2 + 2 + 2 bytes), for a total of 24 bytes *and* 4 instructions to interpret | 16:47 | ||
It'd be 2 (dispatch_o instruction code) + 2 (result register) + 4 (callsite) + 4 (dispatcher name) + 3 * 2 registers (one register is the invokee) = 18 bytes | 16:48 | ||
1 instruction to interpret | 16:49 | ||
And no copying into an arg buffer | |||
No arg buffer for the GC to have to collect | |||
In fact, no arg buffer to allocate at all | |||
So every frame takes less ->work too | |||
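A hypothetical decode sketch of the proposed encoding, assuming the byte layout described above (2-byte result register, 4-byte callsite index, 4-byte dispatcher-name string index, then one 16-bit register index per argument). None of this exists yet; the field and variable names are illustrative only:

    /* dispatch_o r(result), <callsite>, 'dispatcher-name', r(a), r(b), ... */
    MVMuint16    result_reg   = GET_UI16(cur_op, 0);
    MVMuint32    callsite_idx = GET_UI32(cur_op, 2);
    MVMuint32    name_idx     = GET_UI32(cur_op, 6);
    MVMCallsite *cs           = cu->body.callsites[callsite_idx];
    /* The trailing "varargs": one 16-bit work-register index per argument. */
    MVMuint16   *arg_regs     = (MVMuint16 *)(cur_op + 10);
    /* The dispatcher would read arguments straight out of reg_base[arg_regs[i]],
     * so nothing is copied into a separate args buffer. */
    cur_op += 10 + 2 * cs->flag_count;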
The other thing I'm thinking to do is move flattening up front | 16:51 | ||
So we do it at the callsite | 16:52 | ||
And for cases where we have, say, up to N positional args flattened in, we resolve it to an interned callsite | |||
Maybe some rule for named ones too | |||
(Need to be careful that a malicious program doesn't explode the memory use :)) | |||
16:57 sena_kun joined
jnthn | And if I hang this new way of doing things off the new `dispatch` instruction, I've got a gradual migration path for implementing this. :) | 16:58 | |
16:59 Altai-man_ left
jnthn | Ok, home time | 17:04 | |
jnthn hopes the time invested in the design work will mean he has an easier/shorter time of the impl work :) | 17:05 | ||
17:25 zakharyas joined
MasterDuke | nine: guess nothing jumped out at you in that bisect log? | 18:50 | |
18:56 Altai-man_ joined
18:57 zakharyas left
18:58 sena_kun left
nwc10 | jnthn: er, hang on, currently each *call* causes allocation? Or "each call site on first call"? | 19:00 | |
timotimo | which allocation are you referring to? | 19:12 | |
19:13 zakharyas joined
nwc10 | 17:49 < jnthn> No arg buffer for the GC to have to collect | 19:13 | |
timotimo | that's more a "have a couple pointers that have to be put into a worklist" thing | ||
nwc10 | timotimo: I'm not familiar (at all) with the MoarVM calling convention, so I can't easily follow from jnthn's long description what is "plan he can rule out now" versus "current" | ||
(sort of clear what "future" is intended to be, but of course "no plan survives contact with the enemy") | 19:14 | ||
timotimo | arg_buffer is actually a pointer into *work, eh? so maybe we're currently just allocating it at the end of the registers area or something? | ||
lizmat | also, will these plans affect the JIT in any way ? | 19:17 | |
will new ops need to be JITted | |||
I assume so | |||
jnthn | nwc10: Currently the registers area for a frame has an area known as the "args buffer"; we keep a pointer into it also. | 19:56 | |
nwc10: The GC needs to walk these registers based on the callsite describing which ones are objects/strings | 19:57 | ||
It's not a big amount of work, but every little helps. | |||
nwc10 | ah OK thanks | 19:58 | |
jnthn | lizmat: Remains to be seen exactly how it works out, but it's unlikely that the op the interpreter uses will be JITted directly. | ||
lizmat | yeah, figured as much | 19:59 | |
by having the ops do more, wouldn't that make it harder to JIT ? | |||
jnthn | We'll be able to do things ranging from inlining (the op disappears), through specialization linking (turning it into a fastinvoke of a specialization), down to a fallback variant that at least avoids some of the overheads. | 20:00 | |
lizmat: Only if we ever let the JIT see it. :) | 20:01 | ||
lizmat | ok, so you're saying the JIT is going to have simpler targets ? | ||
jnthn | Well, in the inlining case it's got no op, in the linked specialization case it's a lot like today. That covers the monomorphic majority without really needing any changes. But yeah, a nicer fallback form for the JIT is possible, perhaps even including JITting the guard tree as it exists at the point we produce the specialization. | 20:03 | |
20:04 MasterDuke left
jnthn | tbh, I'm mostly worried at this point about how badly we'll behave on the megamorphic minority, 'cus as it stands the design hasn't got a great answer to that. | 20:06 | |
nwc10 | "Doctor doctor, it hurts when I do this" "Well, don't do that then" | 20:07 | |
one of the two English meta-jokes that I'm aware of | |||
the other being | |||
"Two Irishmen sitting on the floor. One fell off" | |||
lizmat | megamorphic as "method raku" existing on many types in a single dispatch chain? | 20:08 | |
nwc10 | These will make no sense unless you are aware of various stereotype and set-piece English jokes | ||
jnthn | lizmat: Yes, if you have one particular callsite that encounters many different types, for example. | 20:09 | |
I was always fond of the one where they saw an advert saying "tree fellers wanted" and were like, "darn, there's only two of us"... :) | 20:10 | ||
lizmat: Though the other variant is stable type but many method names (the current factoring of how we invoke action methods looks this way) | 20:11 | ||
And the worst would be $so-many-types."$so-many-names"() :) | 20:12 | ||
20:12 MasterDuke joined
lizmat | couldn't a guard be something like "type seen"? | 20:14 | |
jnthn | Well, normally you'd see a type and a method name and they won't change much, so the approach of "guard on type and name" (if name ain't already a constant) works out fine. | 20:15 | |
But if you see 100 types and 100 method names, you don't want to build a tree of 10,000 entries | 20:16 | ||
At some point you're better off with having a per-target-type hash | 20:17 | ||
lizmat | so why not start out with one? | ||
jnthn | ? | ||
lizmat | a per-target hash ? | ||
or a per-target list ? | 20:18 | ||
jnthn | We do that today. | 20:19 | |
$ perl6 -e 'say X::AdHoc.^methods(:all).elems' | |||
165 | |||
$ cat src/core.c/Exception.pm6 | grep class | wc -l | 20:20 | ||
322 | |||
m: say 165 * 322 | |||
camelia | 53130 | ||
jnthn | Just for that one file, there's 53,000 serialized hash entries in CORE.setting's precomp thanks to this. | ||
Even if we assume we manage to do it compact enough that there's 2 bytes each for the key and hash (it'll be worse in a big comp unit like CORE.setting), that's 200KB. | 20:22 | ||
That's *before* you use the type and we deserialize the per-type method cache hash. | |||
lizmat | I wonder if X::AdHoc needs that many methods | 20:24 | |
maybe Exception should be made outside of Any ? | |||
jnthn | Was just doing the calculation, and I reckon it's 40 bytes just for the hash bucket storage once expanded... | ||
m: say 165 * 40 | 20:25 | ||
camelia | 6600 | ||
jnthn | This only happens for the types you use, but still... | ||
lizmat: It's not really to do with exceptions, it's everything. I just picked it as a file that illustrates that Raku code is quite class-dense. | 20:26 | ||
Or at least, can be. | |||
lizmat | yeah, but this was really outside of this discussion :-) | ||
jnthn | Especially given they have safety/performance benefits over hashes. | ||
Anyway, no, I don't really think Exceptions not being Any would help matters. :) | 20:27 | ||
lizmat | it doesn't break the build, but it does break installing core modules | ||
jnthn | I'm just noting why the pre-calculation of a method cache for every type is costly now we have the size of standard library and people running the size of applications they do :) | ||
And why I'm keen to move away from it as part of this set of changes, so we at least only build it for the cases that really need it. | 20:28 | ||
(The other part of the story here is that I relied on this pre-calc to resolve a bootstrap loop also, and will probably have to find another way to circularity saw that too...) | 20:32 | ||
nwc10 wonders if the circularity saw is related to [Tux]'s chainsaw. (This is probably far too cross-channel an in-joke. Don't cross the streams) | 20:33 |
lizmat | yeah, I got it :-) | 20:34 | |
on p5p, [Tux] would always be ready to rip out code that had become obsolete and removable | 20:35 | ||
nwc10 | and I smile, because formats never met *his* view of these criteria :-) | 20:36 | |
lizmat | well, they may have been obsolete, but definitely not removable ? | ||
nwc10 | my opinion (I stress both of these) is that removing the *implementation* gains little, as it is (relatively) bug free and self-contained. But optionally disabling it lexically would allow all the "magic" variables to be disabled, which would "free up" a lot of "syntax space" | 20:38 | |
all the things like (IIRC) $= $; $- | |||
needing to be treated as scalars | |||
jnthn wanders away for a bit to do homework | 20:41 | ||
nwc10 | I should wander away to do sleep | 20:42 | |
20:56 pamplemousse left
20:57 sena_kun joined
20:58 Altai-man_ left
21:19 zakharyas left
22:01 sena_kun left
22:30 farcas1982regreg left