Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes.
Set by lizmat on 24 May 2021.
00:02 reportable6 left 00:05 reportable6 joined 00:34 frost joined 01:01 CaCode_ joined 01:04 CaCode left 03:29 discord-raku-bot left, gfldex left 03:31 gfldex joined 03:49 evalable6 joined 05:51 CaCode_ left 06:02 reportable6 left 06:04 reportable6 joined 06:11 CaCode joined
nine MasterDuke: I've had similar issues when JITing dispatch ops. You need to store the pointer in a local MVMuint64 variable and then use that in the assembly, like I did here: github.com/MoarVM/MoarVM/blob/52c4...dasc#L3191 07:06
07:33 evalable6 left, linkable6 left 07:35 evalable6 joined 07:54 CaCode_ joined 07:56 CaCode- joined 07:57 CaCode left 07:59 CaCode_ left
MasterDuke nine: ah, but i am! github.com/MoarVM/MoarVM/pull/1582...1688-R1690 08:08
nine MasterDuke: no, you're not. You are casting in the assembly instruction 08:11
MasterDuke oh. interesting. because there are a bunch of other places that do that too... 08:12
nine Maybe that's not the problem anyway 08:13
It's not like I really understand these details. I arrived at the code empirically :)
MasterDuke well, i'll certainly try switching to see if that changes anything. bit easier to test right now with this windows vm spun up, but wow, i haven't used windbg in forever and am not very proficient in it 08:15
nope, same crash 08:21
nine :/
08:34 linkable6 joined 09:34 linkable6 left, evalable6 left 09:36 evalable6 joined
nine 7.823s by using an IterationBuffer instead of Array and micro optimizations like using a local variable to safe repeated lookup of the same attribute. Yes accessing an attribute is very cheap, but we then have to also apply the same guards we've checked before 09:54
And we can't do anything about this in spesh because multi threading means that the attribute can indeed change between those accesses 09:55
jnthnwrthngtn moarning o/ 09:57
lizmat nine++ jnthnwrthngtn o/
.oO( JIT greeting )
MasterDuke github.com/MoarVM/MoarVM/pull/1582 updated to just use jg_append_call_c, which does seem to work 10:01
nine MasterDuke: now that I see it, is hll_name actually a stable pointer, i.e. always allocated in gen2? 10:03
MasterDuke i think so github.com/MoarVM/MoarVM/blob/mast...L995-L1037 10:07
nine MasterDuke: one can side-step that question by noticing that an MVMCompUnit doesn't only have an hll_name, but also an hll_config pointer. So we could actually use that with a to be written MVM_hll_sym_get_from_config 10:08
That would avoid a costly lookup by hll name (which requires taking a lock) 10:09
Oh, no, we can't. The symbols are not part of the hll config, but stored in a different nested hash 10:10
MasterDuke CI is clean, and it's (very slowly) working its way through a rakudo spectest in my vm 10:29
nine Would be nice to find out what the issue is with these JITed calls on Windows 10:35
10:35 linkable6 joined
MasterDuke i agree, but that's way above my paygrade 10:38
nine only if you give up :) 10:39
MasterDuke heh. it's also an -Ofun consideration. i'm not finding developing/debugging in this windows vm the highlight of my days 10:43
nine That I can understand extremely well 10:45
11:30 Altai-man joined, squashable6 left 11:32 squashable6 joined 11:38 squashable6 left 11:39 squashable6 joined
nine MasterDuke: oh, I may have one more thing to try. You were using call &MVM_hll_sym_get, but everywhere else we call function pointers with callp instead of call 11:39
Well, not everywhere. But it's only MVM_OP_newtype and MVM_OP_newmixintype which violate this and they don't sound like terribly hot ops, so we may just have been lucky so far. 11:41
And they are very recent as well
MasterDuke hmmm, i'll give that a try 11:51
nine ++MasterDuke 11:52
MasterDuke somewhat off topic, but wow, we have a ton of failing spectests on windows 11:54
ugexe github.com/Raku/roast/issues/504 12:00
might give some insight into what you can safely ignore
MasterDuke ugexe++ i hadn't seen that before 12:01
12:02 reportable6 left
MasterDuke nine++ looks like call vs callp might be it 12:03
12:03 reportable6 joined
MasterDuke i think i'll leave the PR as is because it means less hand-written asm, but i'll change to two we have now and i'm going to go back and look at that PR from a while ago that had some randomly failing 12:10
12:35 discord-raku-bot joined
Geth MoarVM: MasterDuke17++ created pull request #1583:
Use callp instead of call in jit of new(mixin)type
MoarVM: MasterDuke17++ created pull request #1584:
Another attempt at jitting some not so common ops that are still seen bailing in a spesh log
MoarVM: ba03cb2e5e | (Daniel Green)++ | src/jit/x64/emit.dasc
Use callp instead of call in jit of new(mixin)type

Callp should be used with function pointers, but we've just been lucky so far (probably because these aren't very hot ops).
MoarVM: af2f79a087 | MasterDuke17++ (committed using GitHub Web editor) | src/jit/x64/emit.dasc
Merge pull request #1583 from MasterDuke17/fix_lego_jit_new(mixin)type_implementations
13:35 evalable6 left, linkable6 left
[Coke] (failing windows spectests) oops 13:40
[Coke] kicks off a spectest on a recent build. 13:41
Geth MoarVM: 4c3f4cde46 | (Daniel Green)++ | src/jit/graph.c
Lego JIT of getcurhllsym
MoarVM: 51bff712c2 | MasterDuke17++ (committed using GitHub Web editor) | src/jit/graph.c
Merge pull request #1582 from MasterDuke17/jit_getcurhllsym
[Coke] MasterDuke: it's slow, but only a single failure so far. 13:58
MasterDuke how are in? 13:59
[Coke] ... and it hug. 14:32
last displayed is S17-procasync\encodingt
has a few failures before that.
redoing with TEST_JOBS=1 14:33
japhb So what's the reason for callp? 14:39
timo now that we changed how the callsite intern cache is accessed (i.e. less locking) maybe i should try that branch again that tries to find an existing interned callsite for a given callsite and an operation on it faster than just going through the interning process 14:50
not sure if there's a reason to believe it would be faster now
the "replace one existing argument with a different value of compatible type" system call could be A Thing though. it doesn't need to create a new callsite, after all! 14:53
jnthnwrthngtn Yeah, that could be common enough to be worth the special case. 14:56
timo i see one spot in nqp's dispatchers where it looks usable; that's in nqp-multi
jnthnwrthngtn I think the Rakudo one that replaces a Scalar container with the value that was in it, it'd also be useful 14:59
Also where we replace a code object with the bytecode handle read out of it
timo i'll base it on the insert arg rather than the insert arg literal ones, right?
yeah, for those two cases definitely
jnthnwrthngtn Yeah, that's probably more useful 15:01
Although we could easily find we want both forms
timo will be a SMOP :) 15:03
MasterDuke interesting. setboolspec and settypecache are two of the few remaining bails, both are the only places in interp.c that call MVM_malloc 15:06
jnthnwrthngtn MasterDuke: Extract them to functions, I'd say 15:09
timo i now realize: say we have $cap with $a, $b, $c, and first we replace-arg $x, 1 and then we drop-arg 1, we'd end up with the same capture in our little tree, except the entry already exists from using the drop-one. we'd have skipped creating the callsite for that, though, as that's what we're making replace-arg for anyway
so there'd have to be a kind of strategy for merging here; if the capture entry is NULL, overwrite it with what we just generated 15:10
but how do we actually recognize this is the case ...
MasterDuke jnthnwrthngtn: k
timo alternatively: that's a DIHWIDT and yu should have done it the other way around if you knew you were going to need the intermediate one 15:11
15:31 patrickb joined 15:36 evalable6 joined, linkable6 joined
timo actually, nqp could also use replace-arg for the case of nqp-call where $is-regex, however, it would want to insert a literal-obj arg rather than just a regular arg 15:36
(if the code is a constant) 15:37
(i wonder how often that isn't the case tho, hmm.)
oh, that same shape is in a couple more of the branches of nqp-call 15:38
15:42 Altai-man left
Geth MoarVM/dispatcher-replace-arg-syscall: bcdef5ee74 | (Timo Paulssen)++ | 5 files
implement dispatcher-replace-arg syscall

can save allocating one capture object when a drop-arg is immediately followed by an insert-arg.
timo this has an nqp branch as well, but compilation fails very, very early in nqp build 15:54
can't check it out more closely right away, maybe someone else can have a look
i assume there should also be a check for the callsite being interned or not so it can be cloned for the new capture when replacing, to get the lifetime right 16:01
[Coke] TEST_JOBS=1 I'm up to S26... 16:05
MasterDuke: you have a list of failed tests somewhere? I'll throw these in a gist at least
MasterDuke github.com/Raku/roast/issues/504
16:08 patrickz joined, patrickb left
[Coke] MasterDuke: gist.github.com/coke/f047e704d51be...d1fdee2e4d 16:48
that's with a monday-vintage build 16:49
MasterDuke looks like some overlap with the stuff from before, but not perfectly 16:50
[Coke] Right 16:51
each of us has stuff that passed for the other
Geth MoarVM/new-disp-nativecall: 493a80ed20 | (Stefan Seifert)++ | 17 files
Remove obsolete native call JIT implementation based on invoke protocol
MoarVM/new-disp-nativecall: 731da66b27 | (Stefan Seifert)++ | 11 files
Remove old invoke protocol
MoarVM/new-disp-nativecall: 0462e6e9c5 | (Stefan Seifert)++ | 12 files
Remove old invoke protocol
nine jnthnwrthngtn: don't really know what to write about ^^^.
17:15 patrickz left
jnthnwrthngtn nine: Perhaps something like "Remove leftovers from the previous calling conventions, now that everything using them has been eliminated." 17:19
Also, yay, 2 pointers smaller frames :D 17:20
There's a further cleanup too: github.com/MoarVM/MoarVM/blob/1f98...rame.c#L47 17:29
That `+ static_frame_body->cu->body.max_callsite_size)` can go away (which will mean we have smaller work areas too :))
And this is probably the only reason we have a max_callsite_size field, so it likely also can go away
17:40 patrickb joined
Geth MoarVM/new-disp-nativecall: d363f14d1d | (Stefan Seifert)++ | 16 files
Remove old invoke protocol

Remove leftovers from the previous calling conventions, now that everything using them has been eliminated.
nine likes removing stuff
lizmat chainsaw time! 17:44
so I was looking at "foobarbaz".subst("x",:g) for ^10000
and about 45% of time is spent in the Cool.subst proto 17:45
which consists of: $/ := nqp::getlexcaller('$/');
which is especially a pity since that code doesn't actually do anything with regexes 17:46
or set $/ 17:47
timo looks like discord messages aren't making it over to here 17:54
lizmat I thought only #raku was being bridged ? 17:57
17:58 patrickb left
lizmat m: my multi method rinse(Str:D:) { "" }; say "foobarbaz".rinse; 18:00
camelia No such method 'rinse' for invocant of type 'Str'. Did you mean
in block <unit> at <tmp> line 1
lizmat I sorta hoped that, now that method caching is done at the invocation site, that this would just work :-) 18:01
18:03 reportable6 left 18:04 reportable6 joined
nine Oh nice! Looks like new-disp fulfilled my hopes that multi dispatch with constant args (like multi foo(1) { }) will be as fast as dispatch on types. Let's me get rid of this workaround: github.com/niner/Inline-Perl5/blob...5.pm6#L252 18:10
timo there's a check in emit_args_ops that should have exploded given the new structure a capture path may have, so it's not making it that far yet it seems like 18:12
lizmat Sometime I wonder whether we should introduce a IterationBuffer.new that would take positional args 18:16
looks like it could make that piece of code about 2x as fast 18:37
PR coming up
jnthnwrthngtn lizmat: Is that proto actually required for correctness any more? 18:51
lizmat you mean the Cool.subst one? 18:52
I think so :-(
jnthnwrthngtn lizmat: Under the previous dispatch model a proto was invoked on uncached calls to prime the cache, so one had to defensively assume that it would be.
lizmat: Under new-disp, a simple proto is never invoked, no similar defenses ain't needed 18:53
There's a high chance the $/ trickery there is only working around the pre-new-disp semantics
lizmat so you mean the getlexcaller() stuff can go ?
that would be ... big!
jnthnwrthngtn Yes, so the proto can become just a normal onlystar one. 18:54
lizmat ok, lemme try that and see what breaks :-)
jnthnwrthngtn Worth a try; I may be missing something but I think it maybe just goes away. :)
What is the signature?
Just | ?
lizmat yup
jnthnwrthngtn Yeah, that will satisfy the "simple proto" rule 18:55
lizmat builds and runs a spectest
jnthnwrthngtn If it passes and helps perf, may be worth an audit for similar cases. 18:58
BTW, did you see a slight spectest time reduction after the last bump? 19:00
lizmat yeah... I will :-)
you betcha!
MasterDuke speaking of PRs, any objection to merging github.com/MoarVM/MoarVM/pull/1584 ? these were the ones that had unexplained windows failures before, but nine++ noticed i should have been using callp instead of call with the function pointers and now they look fine 19:02
jnthnwrthngtn It was pointed out to be that test-t also reached a new low :) 19:03
Uh, that sounds less positive than it is... :D
MasterDuke but less positive is good in this case...
19:04 evalable6 left, linkable6 left
lizmat spectest is clean, yay! 19:04
MasterDuke though going negative might mean rakudo is overfitting its optimizations for that particular benchmark...
jnthnwrthngtn lizmat: And...is it a speedup, given the thing that took 45% of the time is gone? :)
lizmat lemme double check that 19:05
19:05 evalable6 joined 19:06 linkable6 joined
jnthnwrthngtn MasterDuke: Not sure there's a massive benefit from inlining what the interpreter does vs. moving the logic out of interp.c into functions and calling them 19:07
(The latter being easier maint)
MasterDuke ok, that's a pretty simply change 19:09
jnthnwrthngtn It's also lower risk (as in, we can trivially reason that the function call in the JIT is correct)
(And then the function is just the C code we already had)
MasterDuke will do that after i finish saving the earth from these aliens that have invaded black mesa
jnthnwrthngtn Have fun :) 19:10
Talking of having fun, I'll be away from tomorrow until mid-Monday.
lizmat .subst now up to 1.9x as fast 19:11
that's for not finding a string in a string :-) 19:12
github.com/rakudo/rakudo/commit/0dae67a60d 19:14
going after the other ones now 19:15
jnthnwrthngtn That's not at all bad :) 19:16
Deleted code is optimized code
lizmat .match is another one :-) 19:17
jnthnwrthngtn: any idea what $/ := nqp::getlexcaller('$/'); is doing in sub callwith and sub nextwith ?? 19:30
and callsame ?
and nextsame ?
and samewith ?
I guess just in case it's needed ? 19:31
jnthnwrthngtn lizmat: I'd guess so the callsame doesn't hide a $/ that was in scope in the thing that used callsame
timo oh the nqp build looks pretty good 19:32
jnthnwrthngtn lizmat: Those probably have a more user-visible effect than the ones you are removing
timo tests pass as well 19:33
nine jnthnwrthngtn: raku-meth-call-mega doesn't have to guard on the invokee because raku-meth-call already did that, right? 19:34
Geth MoarVM/dispatcher-replace-arg-syscall: 0a5b37a17d | (Timo Paulssen)++ | 5 files
implement dispatcher-replace-arg syscall

can save allocating one capture object when a drop-arg is immediately followed by an insert-arg.
nine No, that's not it. It just adds the guards later
timo this fixes the syscall, with the branch from nqp, it already works for the cases we have
jnthnwrthngtn nine: The *point* is not to guard on the invocant 19:36
ah, you said invokee
But I don't think it guards on that either, though probably on its type (e.g. Method) 19:37
timo i find it difficult to come up with tests for the argument shifting ops 19:38
nine Ok, since I may even use the words wrong, better ask directly: we're making decisions based on nqp::how_nd(nqp::captureposarg($capture, 0)). What makes this safe?
jnthnwrthngtn We read the flat method cache attribute out of the HOW. That establishes a type/concreteness guard on the HOW. 19:40
The principle is that if we have a lot of variance of type at the value level, we try and look for stability at the meta level instead.
We are coupling to ClassHOW as a result. 19:41
nine Ah, yes, the implicit guards. Certainly catching a lot of errors but are easy to forget :)
jnthnwrthngtn That's OK because we own it.
They are not only good for correctness, but also save us the cost of multiple syscalls :) 19:42
nine What about the else branch? That's using the $how as well but doesn't track the how's attribute 19:44
jnthnwrthngtn The else branch of raku-meth-call-mega? 19:47
nine yes
jnthnwrthngtn There's two cases there. The second one throws an exception, which means we don't install a dispatch program.
The first one does `nqp::dispatch('boot-syscall', 'dispatcher-do-not-install');`
Since we're not installing the dispatch program, then guards don't matter. 19:48
There may be a better strategy for the first branch than "just end up in the dispatcher again". It probably means we have the double-whammy of a callsite seeing numerous types and method names, combined with a FALLBACK or similar 19:50
timo now how do i get a nice measurement that tells me the replace-arg thing helped 19:51
jnthnwrthngtn timo: --profile-compile -e '' and look at BOOTCapture allocation counts
timo oh, do these actually show up in the profile? 19:52
jnthnwrthngtn Yes.
timo i wasn't sure if they were getting prof_allocate instructions generated for them somehow
jnthnwrthngtn I think we always place them after dispatch_o (as we did with invoke_o) 19:53
nine Would it make sense to add found methods to the $!cached_all_methods_table?
timo you said you were running --profile-compile on the core setting? is that a good idea to try? 19:56
jnthnwrthngtn timo: --profile-complie -e '' (the empty program), which is enough to warm a bunch of callsites anyway 19:57
nine: Certainly not in-place add (hashes ain't threadsafe), but clone + add maybe 19:58
nine: I'm not sure what reasonable situation it would happen in, though
And the copying means it's a costly thing to do if it gets out of hand
timo oh, ok 20:00
nine I guess we could leave that to find_method anyway
timo BOOTCapture 8645, BOOTTracked 5269 20:02
+/- 1 20:03
cool, i literally only need to recompile nqp and rakudo! ... damn, that's almost everything :D
20:06 evalable6 left, linkable6 left 20:07 evalable6 joined, kjp left 20:09 linkable6 joined, kjp joined
lizmat jnthnwrthngtn : with the .subst changes, cro::webapp all of a sudden feels a lot faster :-) 20:09
MasterDuke sadly, building nqp+rakudo takes the same time as just building moarvm on windows. msvc is slow 20:10
timo i was doing something wrong and had to recompile many times 20:11
MasterDuke btw, i just tried doing a --profile-compile of building CORE.c earlier today and got the old behavior of the build completing, but no message and no profile
timo 20 allocations less! 20:12
MasterDuke hm, looks like it might have been the oom killer, but whey didn't it tell me it was doing so 20:13
lizmat timo: per ? or total? 20:18
timo total 20:19
for -e ''
lizmat ah... well, on a total of?
timo m: say 8645 * 100 / 8665
camelia 99.769186
MasterDuke what about for something like that mqtt test?
timo about a third of one percent :P
lizmat well, every little bit helps! 20:20
nine 2021-10-27T15:03:41.8789620Z Type check failed for return value; expected CompUnit:D but got Method (method sink (Mu: *%_...) 20:21
Looks kinda bad: dev.azure.com/MoarVM/6b7ef196-3f6c...9/logs/124
2021-10-27T15:25:53.1335746Z No such method 'defined' for invocant of type 'NQPRoutine' in dev.azure.com/MoarVM/6b7ef196-3f6c...9/logs/212 20:22
MasterDuke random java error/backtrace from comma gist.github.com/MasterDuke17/76bd2...a32c8cde2d
20:23 kjp left
jnthnwrthngtn nine: Which PR do those go with? 20:23
nine jnthnwrthngtn: those messages are from CI runs of "Migrate special return to callstack and simplify return handling". Do you think it's likely to be caused by your changes or do they sound like pre-existing bugs?
lizmat logs.liz.nl/moarvm/live.html now runs on Rakudo master 20:24
timo well, now that i have code for the "replace arg with tracked value", i can relatively easily make "replace arg with literal value" in the same way, and that would go in more spots in the nqp dispatchers 20:25
20:25 kjp joined
timo and after that, the rakudo dispatchers would be next 20:25
20:25 Geth left, Geth joined
jnthnwrthngtn nine: It whiffs of deopt bug, and I'm not immediately sure how I might have caused one of those. It's possible. 20:26
20:26 RakuIRCLogger left, RakuIRCLogger joined
lizmat Geth and the logger now also running on Rakudo master 20:26
jnthnwrthngtn nine: Generally when working on it I either broke things and got massive explosions all over because the things I was touching are heavily used...or things worked.
20:27 TempIRCLogger left
jnthnwrthngtn It is suspect we're only seeing them with that PR. 20:27
20:28 TempIRCLogger joined
jnthnwrthngtn I can look into it after my long weekend away. Too tired to do so now. 20:30
20:31 kjp left
jnthnwrthngtn lizmat: I was about to ask how it helped them, then remembered...escaping :) 20:33
20:33 kjp joined
lizmat yup 20:33
jnthnwrthngtn Always nice when perf improvements show up so nicely in something realistic :) 20:34
lizmat indeed.... and such a simple change :-)
I think test-t will also benefit from these changes :-)
nine jnthnwrthngtn: have a nice time off!
jnthnwrthngtn lizmat: I susepct there'll be a long tail of not-immediately-obvious new-disp payoffs :) 20:38
A further test-t improvement would be welcome
nine: Thanks!
timo m: say 7117 * 100 / 8665 20:49
camelia 82.135026
timo now that's more promising
it's not really got a good place to go, but putting a "there was this much in the nursery already before starting the profile" and "there was this much in the nursery after stopping the profile" could be interesting 20:51
Geth MoarVM/dispatcher-replace-arg-syscall: 4cdce310f3 | (Timo Paulssen)++ | 3 files
implement replace-arg-literal-obj

for when you want to replace an argument with a literal object rather than a tracked value.
MoarVM: timo++ created pull request #1585:
Dispatcher replace arg syscall
21:33 CaCode_ joined 21:36 CaCode- left 21:38 CaCode- joined 21:41 CaCode_ left 22:33 kjp left 22:35 kjp joined 22:49 dogbert17 joined, dogbert11 left 22:56 dogbert17 left 23:03 dogbert17 joined 23:09 dogbert17 left
timo m: say (1500 * 40) / 1024 23:29
camelia 58.59375
timo m: say ((1500 * 40) / 1024) / (4 * 1024 * 1024)
camelia 0.0000139698
timo did i get the scales right? 1.5k less captures, which MVMCapture is 40 bytes big, is about 0.001% of a nursery fill? 23:30
m: say 100 * ((1500 * 40) / 1024) / (4 * 1024 * 1024)
camelia 0.001396984