Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes.
Set by lizmat on 24 May 2021.
00:03 reportable6 left 00:05 reportable6 joined 02:06 squashable6 left
Geth MoarVM/new-disp: 43ddfce367 | (Jimmy Zhuo)++ | 3 files
Fix a bunch of warning on windows.
02:17
03:06 linkable6 left, evalable6 left 03:07 evalable6 joined, squashable6 joined 03:21 frost joined 04:34 releasable6 left, statisfiable6 left, shareable6 left, squashable6 left, unicodable6 left, benchable6 left, committable6 left, quotable6 left, bisectable6 left, notable6 left, sourceable6 left, reportable6 left, nativecallable6 left, bloatable6 left, coverable6 left, evalable6 left, greppable6 left, tellable6 left, nativecallable6 joined 04:35 squashable6 joined, greppable6 joined, reportable6 joined, coverable6 joined 04:36 shareable6 joined, statisfiable6 joined 05:08 linkable6 joined 05:34 evalable6 joined 05:35 committable6 joined, quotable6 joined 05:36 sourceable6 joined
Nicholas good *, #moarvm 05:50
06:02 reportable6 left
Geth MoarVM/new-disp: 33c84b1eaa | (Jimmy Zhuo)++ | 5 files
Fix more warnings
06:17
MoarVM/new-disp: 45a077ecb1 | (Jimmy Zhuo)++ | src/platform/win32/mmap.c
Fix mmap warnings on Windows
06:29
06:36 bisectable6 joined, unicodable6 joined 06:37 bloatable6 joined
nine Done a bunch of compilations. Best stage parse time was 34.361 for new-disp and 32.908 on master. At least mast and mbc times got much better, so in total it's 45.186 vs 45.064s. 06:46
To be fair though, new-disp contains improvements to mast and mbc that are not related to dispatchers and could be applied to master as well. 06:47
Darn...there seems to be another spesh bug. One of our applications fails to start due to "Type check failed for return value; expected CompUnit::Handle:D but got BOOTIO (BOOTIO)" in CompUnit::Loader.load-precompilation-file 07:13
Nicholas it's got some hash improvements which (slightly) reduce work during each GC run. But that might only be measurable with callgrind
07:34 tellable6 joined, releasable6 joined 07:36 benchable6 joined 08:04 reportable6 joined
lizmat latest numbers: 1.329 / .752 08:16
startup not noticeably different 08:17
for reference, on master on a similarly heated up machine: 1.311 / .624 08:23
startup times: master / new-disp: .100 / .160 08:24
08:36 lizmat_ joined, TempIRCLogger__ left 08:37 lizmat left, [Coke]_ joined, TempIRCLogger joined, sena_kun joined, lizmat_ left, lizmat joined 08:38 Geth left 08:39 Geth joined, [Coke] left 09:03 brrt joined
Nicholas good *, brrt 09:27
09:35 notable6 joined
jnthnwrthngtn lizmat: Wow, so if we exclude startup time, new-disp is winning on test-t for the single-threaded case. 09:46
lizmat yeah, looks like :) 09:47
jnthnwrthngtn Nice. If only getting startup down was easy :P
09:48 patrickb joined
patrickb o/ 09:49
Nicholas \o
jnthnwrthngtn: you could declare it SEP (for now). That's easy enough. 09:50
lizmat jnthnwrthngtn: fwiw, I would focus on getting it better after startup atm
Nicholas yes good point. Most slow programs are long lived :-) 09:51
jnthnwrthngtn It's good we're getting closer to the merge point, because I suspect the amount of time I can focus primarily on MoarVM/Raku is going to drop soonish (grant nearly used up). 09:53
patrickb I feel a bit stupid for asking this, but given startup time is below the perceivable delay, why is a longish startup time so bad? (I know long startup time hurts usecases like roast that starts up rakudo very often, but that's a corner case, isn't it?)
jnthnwrthngtn patrickb: The primary reason it bothers me is that precomp spawns a process for each module it pre-compiles. 09:55
Nicholas There's also a "marketing" reason - it can be measured and compared with other "competitor" languages, and being lower looks better. 09:56
I like jnthnwrthngtn's answer better.
patrickb Ah. So it hurts precompilation badly. Understood.
jnthnwrthngtn Nicholas: Yeah, but I already learned "never read the comments" :D
(On Reddit. In the code they're sometimes worth it...) 09:57
patrickb (I still hope nines++ work on in process precompilation will one day become a reality.)
nine gets flashbacks
patrickb (Sorry for that)
ls 09:58
brrt good * Nicholas, nine, patrickb, lizmat, jnthnwrthngtn 10:02
lizmat brrt o/
jnthnwrthngtn o/ brrt 10:04
Geth MoarVM/new-disp-cgoto: e3781f7290 | (Jimmy Zhuo)++ | 2 files
Add GCC computed goto for dispatcher.
10:06
MoarVM: zhuomingliang++ created pull request #1548:
Add GCC computed goto for dispatcher. New disp cgoto
10:07
jnthnwrthngtn oh wow, somebody has saved me a task :D 10:10
Nicholas (I thought somewhat similar - I hoped that someone else would do this before I got near it) 10:11
jnthnwrthngtn Though I'm confused that it seems to be always disabled: github.com/MoarVM/MoarVM/commit/e3...79806R2680 10:13
Nicholas I was *about* to check this - did my build actually build with it?
jnthnwrthngtn ah, nine++ already spotted it 10:14
patrickb o/ brrt 10:15
Nicholas If I change it to 1, it compiles 10:17
(ship it!)
and NQP is now somewhere into the build
have to go AFK for a little bit
10:30 discord-raku-bot left, discord-raku-bot joined
jnthnwrthngtn Seems Routine can become 24 bytes smaller (on MoarVM), which is a nice bit of memory to claw back 10:34
Although I'm about to eat some of that win with a bitfield on Signature, if an opt idea I have works out... :) 10:35
10:42 JimmyZ joined
JimmyZ github.com/MoarVM/MoarVM/commit/e3...79806R2680 was a quick push before I went home, and connecting to GitHub is hard due to the GFW, feel free to remove it. 10:43
:) 10:45
Nicholas JimmyZ: well, it does seem to work with the C code changed to enable it. ASAN hasn't said anything...
JimmyZ thanks for testing it
jnthnwrthngtn JimmyZ++ # thanks for doing this!
10:48 JimmyZ left 11:36 JimmyZ joined
JimmyZ dinner back :) 11:36
jnthnwrthngtn: the i < dp->num_ops check part is what I said: a bit ugly, and I really don't know how to get rid of it ;) 11:41
12:03 reportable6 left 12:04 reportable6 joined
nine JimmyZ: I think what jnthnwrthngtn meant was just do op = dp->ops[i++]; goto *LABELS[op.code]; unconditionally and just replace the NEXT; with return 1; in OP(MVMDispOpcodeResultBytecode) and OP(MVMDispOpcodeResultCFunction) 12:05
JimmyZ nods 12:09
jnthnwrthngtn Also MVMDispOpcodeResultObj and friends for value results 12:53
nine yes 12:56
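The dispatch-loop shape nine describes can be sketched in C as follows. This is a minimal illustration, not MoarVM's actual dispatcher: the `Op` and `Program` types, the three opcodes, and `run_program` are hypothetical names invented for the example. It assumes GCC's "labels as values" extension (also accepted by Clang) and falls back to a plain `switch` elsewhere. The point is that once every result op returns directly, the loop no longer needs the `i < num_ops` check, provided the program always ends in a result op.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcode set; not MoarVM's real MVMDispOpcode list. */
enum { OP_ADD = 0, OP_DOUBLE = 1, OP_RESULT = 2 };

typedef struct { int code; int arg; } Op;
typedef struct { const Op *ops; size_t num_ops; } Program;

/* Runs prog and stores the accumulated value in *out. Returns 1 once a
 * result op is reached. Opcodes must be in range; in the computed-goto
 * branch an out-of-range code would index past the label table. */
int run_program(const Program *prog, int *out) {
    int acc = 0;
    size_t i = 0;
    Op op;
#if defined(__GNUC__)
    /* One label address per opcode, in opcode order. */
    static void *const LABELS[] = { &&do_add, &&do_double, &&do_result };
#endif
    /* The loop has no bounds check, so the program must end in a result op. */
    assert(prog->num_ops > 0 && prog->ops[prog->num_ops - 1].code == OP_RESULT);
#if defined(__GNUC__)
#define NEXT do { op = prog->ops[i++]; goto *LABELS[op.code]; } while (0)
    NEXT;
do_add:    acc += op.arg; NEXT;
do_double: acc *= 2;      NEXT;
do_result: *out = acc;    return 1;   /* return instead of NEXT */
#undef NEXT
#else
    for (;;) {
        op = prog->ops[i++];
        switch (op.code) {
        case OP_ADD:    acc += op.arg; break;
        case OP_DOUBLE: acc *= 2;      break;
        case OP_RESULT: *out = acc;    return 1;
        default:        return 0;
        }
    }
#endif
}
```

Each op body jumps straight to the next op's label, so there is one indirect branch per op rather than a single shared one at the top of a loop.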
jnthnwrthngtn Mmm...red curry. 12:57
nine Oh how I'd love to have that right now... Enjoy :)
13:04 evalable6 left, linkable6 left 13:21 JimmyZ left
jnthnwrthngtn CORE.setting is full of things like: 13:21
multi sub infix:<==>(Num:D \a, Num:D \b --> Bool:D) {
I think at some point maybe it performed better that way 13:22
However, a new opt I've been playing with relies on them being $a (it does caller-side decont, not callee-side, which has many advantages)
(And was pretty much impossible to really do prior to new-disp) 13:23
lizmat jnthnwrthngtn: if you want me to change the core in that respect, then I will do that :-)
jnthnwrthngtn lizmat: Yeah, I've done a few of them to verify the effect of it, but there's a load more. And it's not a totally mechanical change in that some small handful of ops do need the raw thing 13:24
Like infix:<,>
Need to spectest to see how much fallout there will be too
Exactly 1 failing test. 13:30
That's quite good for such a huge change 13:31
And suggests this approach will indeed work out
lizmat jnthnwrthngtn: so do you want me to go through the core and change the \a to $a where it's possible ? 13:34
jnthnwrthngtn lizmat: Yes, maybe wait a moment to not merge conflict with the few I've done 13:36
So, the cost of what I've done is that when we do .VAR we gain a ScalarVAR 13:42
Not just a Scalar. This may actually help us
lizmat I guess you'd want me to do that on the new-disp branch, right ? 13:45
jnthnwrthngtn Yes. I think I fixed the test. Moment. 13:46
m: say 1.21 / t/spec/S02-types/mu.t 13:47
camelia ===SORRY!=== Error while compiling <tmp>
Undeclared name:
S02-types used at line 1
Undeclared routines:
mu used at line 1
spec used at line 1. Did you mean 'sec'?
t used at line 1
jnthnwrthngtn lol
m: say 1.21 / 2.964 13:48
camelia 0.408232
lizmat that feels... impressive ?
jnthnwrthngtn Not sure quite how my change is so beneficial to that benchmark, but... :)
Oh, I see
for ^10_000_000 {
$total = $total + %h<a> + %h<b>;
}
The +s certainly end up in a better place... 13:49
13:59 brrt left
lizmat jnthnwrthngtn: just tell me when to go for it :-) 14:05
14:06 linkable6 joined
jnthnwrthngtn Pushed. Have at it! 14:06
github.com/rakudo/rakudo/commit/14...e8c5eR2623 is recommended reading on why this strategy holds quite some promise, btw 14:08
Assuming the RakuAST-based compiler is better at scope flattening and I get back to working on our EA, this could be a nice win 14:09
Nicholas EA? 14:16
lizmat Escape Analysis
Nicholas aha thanks
jnthnwrthngtn: I hope that your beer fridge is primed
jnthnwrthngtn It is well stocked at present :)
Nicholas But you'll be making a well-deserved dent soon? 14:17
jnthnwrthngtn A small one at least, yes :) 14:20
14:24 frost left
lizmat jnthnwrthngtn: so, by definition $a would be deconted, right ? 14:27
jnthnwrthngtn jnthn.net/tmp/fridge-status.jpg # current status
lizmat: If there's an Int type constraint, yes, so we could lose some nqp::decont in places too
Nicholas "bowmore barrel aged" - that's quite specific... 14:28
lizmat jnthnwrthngtn: looking at infix:<===>(Enumeration,Enumeration)
jnthnwrthngtn lizmat: Given Enumeration isn't Iterable, probably works out 14:29
lizmat ok, so it's only Iterables I need to worry about
jnthnwrthngtn Nicholas: I guess, though "whisky barrel aged" would leave open quite a range of possibilities 14:38
I mean, Jack Daniels and Octomore are both whisky but... :) 14:39
dogbert17 mumbles Ardbeg 14:41
lizmat jnthnwrthngtn: working on it, will be away for a few hours, then will continue :-) 14:44
14:45 patrickb left, patrickb joined
jnthnwrthngtn Cool :) 14:45
14:47 codesections joined 14:49 patrickb left, patrickb joined 14:51 patrickb left, patrickb joined 14:53 patrickb left, patrickb joined 14:55 patrickb left, patrickb joined
dogbert17 There are some very impressive speed gains in new-disp 14:56
jnthnwrthngtn dogbert17: You've been measuring? :)
14:57 patrickb left 14:58 patrickb joined
dogbert17 yeah, I have a bunch of smaller programs, problem solving tasks mostly, and many of them show impressive gains 14:58
jnthnwrthngtn dogbert17: relative to master, or to earlier new-disp?
dogbert17 master
one example, not the best, a program takes 19s on master and 12s on new-disp 14:59
the mest one is 24s on master and 8s on new-disp
*best 15:00
jnthnwrthngtn whoa
dogbert17 :)
jnthnwrthngtn Are they using multi dispatch with `where`, or callsame, or such things? Or more "boring"?
dogbert17 I would say that they're plenty boring :) 15:01
jnthnwrthngtn Well, nice
dogbert17 Indeed 15:02
have you tried any cro apps?
jnthnwrthngtn Not yet.
15:04 patrickb left, patrickb joined 15:06 patrickb left, patrickb joined 15:07 evalable6 joined 15:08 patrickb left 15:09 patrickb joined 15:10 patrickb left 15:11 patrickb joined 15:13 patrickb left, patrickb joined 15:15 patrickb left, patrickb joined 15:19 patrickb left, patrickb joined
Geth MoarVM/new-disp-cgoto: e743b1bc43 | (Jimmy Zhuo)++ (committed by Jonathan Worthington) | 2 files
Add GCC computed goto for dispatcher.
15:23
15:23 patrickb left
timo i've been using moarperf under new-disp a little bit and it seems to run fine, i did not compare performance 15:24
Geth MoarVM/new-disp: e743b1bc43 | (Jimmy Zhuo)++ (committed by Jonathan Worthington) | 2 files
Add GCC computed goto for dispatcher.
15:31
jnthnwrthngtn Seems to help a little on the CORE.setting stage parse at least 15:32
timo the computed goto support? 15:38
jnthnwrthngtn Yes
timo so a c-level perf record should put the dispatcher_run somewhere near the top
we could also have computed goto for the code that translates disp programs to spesh, not sure if that'd be noticeable tho since the individual pieces of code there do more work than the ops in the run version would 15:39
is there a good number to shoot for in this regard? how big do the labelled bodies have to be for the computed goto benefits to be worth the hassle? 15:40
jnthnwrthngtn Given that we run that quite rarely anyway, probably not quite worth it 15:42
timo also interesting to see the type of "op" went from MVMDispProgramOp* to MVMDispProgramOp, which means we copy every time around the loop, but don't have pointer deref, do we know how that trades off, exactly?
MVMDispProgramOp is relatively small, thankfully 15:43
jnthnwrthngtn I spotted that and guessed it was measured, but yeah, I'm not totally sure. 15:44
timo ok my thinking is: this is equivalent to having all fields from the struct as local variables taken from the structs, yeah? since the struct is smaller than a cache line, we'd be reading all of it anyway. we read all the fields at most once, i think. some ops only have one of the two args from the union, in which case we pay a minuscule cost i think? but also, the compiler would already have turned multiple accesses via the pointer to a single deref up front anyway 15:51
15:52 [Coke]_ is now known as [Cke, [Cke is now known as [Coke]
timo so, um, my assessment is: a shrug + "the compiler is already smart enough to pick whatever is the fastest here anyway" 15:53
jnthnwrthngtn Yeah, that's about my figuring too 15:56
16:05 discord-raku-bot left, linkable6 left 16:06 discord-raku-bot joined
timo MVMDispProgramRecordingValue can be made 8 bytes smaller by re-ordering sayeth pahole 16:24
MVMDispProgramRecordingCapture has a 4 byte hole before the *captures pointer member that could also be moved to the end so it becomes a bit smaller
though, if i remember my FSA size bins correctly (i don't) that may not change anything? 16:25
MVMDispResumptionData could get arg_source moved to the very end, that would close a 4 byte hole and make it go from 48 to 44
by swapping the pointer and code number in MVMNFGTrieNodeEntry it can go from 16 to 12 bytes, though if we align to 16 anyway that's obviously useless 16:29
MVMSpeshCandidateBody has 3 holes summing up to 11 bytes 16:31
MVMSpeshInline has a 1 byte hole and a 4 byte hole
MVMSpeshSimStackFrame has one 4 byte hole, offering a reduction from 88 to 84 bytes 16:32
MVMStaticFrameBody has one 1 byte hole and one 7 byte hole, for 232 -> 224 bytes 16:34
that's most of the interesting findings 16:39
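The kind of hole pahole reports comes from alignment padding, and reordering members can close it. The sketch below uses two made-up structs (`WithHole` and `Reordered` are illustrative, not real MoarVM types) and assumes a typical ABI where pointers are aligned to their own size. On a common 64-bit target the first layout is 24 bytes and the second 16; on a 32-bit target there may be no difference at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout with a 4-byte member ahead of a pointer. */
struct WithHole {
    uint32_t index;    /* typically followed by padding on 64-bit targets */
    void    *capture;  /* wants pointer alignment */
    uint32_t flags;    /* typically followed by tail padding */
};

/* Same members, pointer first, so the two 4-byte fields pack together. */
struct Reordered {
    void    *capture;
    uint32_t index;
    uint32_t flags;
};

/* Size of the gap between the end of `index` and the start of `capture`. */
size_t hole_before_capture(void) {
    return offsetof(struct WithHole, capture)
         - (offsetof(struct WithHole, index) + sizeof(uint32_t));
}

/* Bytes saved per instance by the reordering on this platform. */
size_t bytes_saved(void) {
    assert(sizeof(struct WithHole) >= sizeof(struct Reordered));
    return sizeof(struct WithHole) - sizeof(struct Reordered);
}
```

As timo notes, whether the saving matters in practice depends on the allocator: if both sizes round up to the same size bin, nothing changes.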
jnthnwrthngtn Managed to get a bit off startup (with latest NQP push) 17:07
17:07 linkable6 joined
timo beautiful 17:12
MasterDuke (is|was)n't there some problem with `constant` for hashes and some combination of not-moarvm backend and/or not-linux os? 17:15
jnthnwrthngtn NQP didn't actually have a `my constant` for hashes until quite recently, when I added them for the sake for rakuast 17:16
well, `my constant` at all
MasterDuke ah, ok
jnthnwrthngtn I don't see why they'd not work off MoarVM, but even if they don't, this is MoarVM-specific code anyway
17:16 brrt joined 17:17 brrt joined
MasterDuke ha, yeah, ignore what i just said 17:17
timo impressed to see process_bb_for_deopt_usage very close to the top at 2.88% samples sampled by perf 17:42
jnthnwrthngtn Yeah, that's not a terribly cheap routine 17:45
otoh it runs on the spesh thread
timo indeed 17:46
we don't have terribly much during just startup that gets very hot
serialization_read_ref is above process_bb_for_ and serialization_demand_object is below it 17:47
jnthnwrthngtn We currently allocate enough during Rakudo startup to hit a GC run, and a lot of those are MVMCapture and MVMTracked
(Part of the dispatch setup mechanism)
timo asm_exc_nmi sits in between, not sure what that encompasses. possibly related to memory management like brk and mprotect?! 17:48
oh no! we used to not have to do even one run
jnthnwrthngtn Dunno
I think if we exposed a dispatcher-replace-arg or some such we could avoid some insert/drop dances, or make it possible to drop n things 17:49
timo oh, it's possible that that's from perf's collection frequency timeout
codesections congrats on all the perf wins. As expected, though, new-disp does pay for it with a slower warm up; I'm measuring about a 55% penalty for a plain "Hello, world!" script (106.9±1.5ms versus 165.3±1.2ms)
timo i set it to -F max for this
jnthnwrthngtn codesections: Yes; don't know if that includes the NQP commit I pushed or not, though at most it's worth 5% or so 17:50
timo the lookup structure / cache for callsite transformations, that isn't going to help with GC pressure at all i imagine
i mean, it's still to be written, but wouldn't help that particular metric 17:51
jnthnwrthngtn timo: No, but being able to drop multiple args or replace an arg might help us
timo is that an additional syscall that we'd implement and use in our dispatchers? 17:52
jnthnwrthngtn timo: In that it gets rid of an intermediate capture
Yes
timo OK. i think i can build that
jnthnwrthngtn ++timo
timo replace-arg is for "in-place" putting a value (and its primspec / flag) in a spot in the capture and callsite? 17:53
jnthnwrthngtn The nqp-meth-call does a multi-drop like this:
my $args := nqp::dispatch('boot-syscall', 'dispatcher-drop-arg',
nqp::dispatch('boot-syscall', 'dispatcher-drop-arg', $capture, 0),
0);
Yeah, though we could even restrict it to something of the same primspec
Then we know the callsite is already fine 17:54
When I did --profile-compile -e '' then the nqp-meth-call dispatcher was one of the highest CPU users.
It also does this: 17:55
nqp::dispatch('boot-syscall', 'dispatcher-guard-type',
nqp::dispatch('boot-syscall', 'dispatcher-track-arg', $capture, 0));
nqp::dispatch('boot-syscall', 'dispatcher-guard-literal',
nqp::dispatch('boot-syscall', 'dispatcher-track-arg', $capture, 1));
codesections Oh, I thought that was an up-to-date build, but it looks like it didn't include the nqp fix; I'll re-run (though I know it'll be a minor effect – just curious, though)
jnthnwrthngtn But there's no reason we can't expose a dispatcher-guard-arg-type that takes an index
That's 2 less allocations every single time too 17:56
codesections: Probably minor, about 5ms for me
timo MVM_disp_program_record_capture_drop_arg as well as MVM_capture_drop_arg will both want a sibling for n args, if i'm not mistaken?
jnthnwrthngtn timo: Yes, I think so 17:57
timo optionally change both to take an n argument, but at least the MVM_capture_ one is kind of public API
jnthnwrthngtn timo: I think that the capture transform tree doesn't need to model this though
Kind of public? Don't think so, this is all new code in new-disp :) 17:58
timo capture transform tree is our supposed transformation cache thing?
jnthnwrthngtn No, it's part of the dispatch program recording
The insert/drop structure
I don't think we need to actually store an MVMCapture object at every level of it, however
(we do today because there's no multiple drops) 17:59
codesections yep, about a 5ms speedup for me too. Definitely something
jnthnwrthngtn Might need to harden it up a bit to cope with the new possibility of a null there
I should go home to sort out dinner. bbl o/
timo ok, there'd be one entry per dropped arg, but only the last one actually has a Capture to go with it 18:00
jnthnwrthngtn timo: Yes, exactly that
o/
18:02 reportable6 left
bartolin_ MasterDuke: just for the record: There was indeed a problem with using 'constant' for hashes on the JVM backend. But it should work nowadays: github.com/Raku/nqp/pull/717 18:09
MasterDuke i'm not crazy - cool. it's no longer a concern - even cooler. bartolin_++ 18:10
lizmat hmmm... can .isNaN ever be true on an Int ? 18:11
m: dd Int.isNaN
camelia Bool::False
timo yeah we don't hold NaN in Int i think 18:12
lizmat right...
this one strikes me as odd: multi sub prefix:<~>(Str:D $a --> Str:D) { $a.Str } 18:16
why coerce to Str when it is already a Str
?
timo could be a subtype of Str
dunno, really 18:17
lizmat well, but Str.Str is self 18:19
last I checked
multi method Str(Str:D:) { self } # indeed
jnthnwrthngtn: ^^ wonder how much special casing a method just returning self would bring 18:21
MasterDuke github.com/rakudo/rakudo/commit/e0...d6474be0d4 18:22
lizmat interesting 18:23
MasterDuke++ # research :-) 18:25
MasterDuke heh and then github.com/rakudo/rakudo/commit/66...a017bae876
lizmat well, I think it's ok for a subclass of Str to require it returning a subclass of Str on .Str 18:26
timo i must admit i don't really grasp how the CapturePath and its updating work exactly 18:27
MasterDuke yeah, removing the return constraint seems a little overly cautious
lizmat ok, with preliminary adaptations, I got test-t down to 1.278 18:39
was 1.329 18:40
note, this is still without the computed goto
this is encouraging :-)
timo "Can only use manipulate a capture known in this dispatch" wonder what the exact typo here is 18:46
i guess an "or" between use and manipulate would be safe
nine indeed 18:47
timo oh hey nine have you looked at the capture manipulation stuff at all? 18:48
nine a bit 18:52
fixing segfaults n'stuff
timo haha, gen-cat fails under "make" but not when i run the same commandline myself 18:57
19:05 reportable6 joined, brrt left
MasterDuke if a ctx->arg_info.callsite has arg_count of 1, why would its arg_names be 0x0? 19:13
timo no named args in the callsite?
MasterDuke right, they're all positional. but we don't store the names of the positional args? 19:14
timo positional args don't have names 19:17
MasterDuke huh. i had no idea 19:19
timo if they did have names they would be named rather than positional
MasterDuke well, obviously at the nqp/rakudo level they have names, those can't be accessed by moarvm at all? 19:22
timo oh 19:23
that'd be a level higher, you'd have to get and introspect the Signature object for that i think
what is it you're doing?
MasterDuke colabti.org/irclogger/irclogger_lo...09-22#l231 19:25
the error comes from github.com/MoarVM/MoarVM/blob/mast...rgs.c#L129 19:26
so i was playing around with passing the ctx in and adding the arg_names to the message, but as you pointed out, arg_names won't work 19:27
nine Ah, yes, there's just no way to do that in MoarVM
timo yeah, something "outside" of that will have to deal with the error
nine A great example of the downside of these adhoc string exceptions 19:28
timo checkarity is an op that is put into bytecode somewhere, and it'll throw that exception
well, we could have hll symbols that can be invoked to throw structured exceptions right?
it'd be more expensive to throw those than to just throw an adhoc 19:29
MasterDuke yeah, it took me a while to even find where it was coming from, i kept looking through the rakudo exceptions at first, but the message didn't exactly match any of them
well, it's a failure path, i'd be less concerned about it being slightly slower if we got better errors in result 19:30
timo i want to rubberduck a little 19:34
Nicholas Go for it!
timo i'm getting an error that a capture is passed to some op where the capture has not been properly registered by whatever op derived it from another one 19:36
what i changed in my branch is that there's now an extra op that drops multiple arguments from a capture in one go
but we still put multiple "dropped an arg" entries in the recording 19:37
just because we don't want to deal with adding that everywhere? probably
so the records in between don't get to have an actual Capture object, that's the saving we're looking for 19:38
the derivation path thingie works like this:
starting at the capture the dispatcher was initially invoked with (or alternatively the last one seen in a resumption op) 19:39
there will be an entry for every capture that was derived from the given one 19:40
we do a depth-first-search through the tree that is created from that
this is where we're not finding the Capture object 19:41
ah, maybe i see the problem now 19:42
oh i think it's now working? 19:46
aha. until the core setting comes up 19:47
so, what i had to change to make it work until then, was that i can only compute this path once, then have to handle pushing newly made records to the end of the path, since otherwise finding the right node depends on the Capture object, which we're skipping here 19:51
some of these capture transformation chains confuse me 20:00
one chain is -> drop arg 0 -> drop arg 0 -> insert a val at 0 -> drop arg 0 -> insert val at 0 -> drop argument 0 -> use that 20:02
like, shouldn't the "insert a val" ones hang off of the same previous entry?
Nicholas (I'm not doing very well at nodding or saying "yes")
and I don't know the answer to your question 20:03
timo hehe. 20:04
i'm adding more debug output 20:05
fprintf(stderr, "%sUnknown transforamtion\n", indent_str);
loving this typo
i think i'm dum 20:18
jnthnwrthngtn lizmat: "with preliminary adaptations" - to Rakudo or to test-t itself? 20:19
lizmat that's the 11 commits I did about an hour ago
nowhere complete yet, but it covered a lot of Str operators 20:20
especially Rational is a lot of work :-)
running a spectest for that now
timo ah jnthnwrthngtn, how do you debug issues with capture transform chain and such? 20:23
and why do these chains look so suspicious to me? adding arguments at 0, then dropping arguments at 0
jnthnwrthngtn lizmat: ah, cool, just pulled a bunch of those :) 20:25
timo: DUMP_RECORDINGS=1 or so; if you've got things right, then your "drop 2 at once" should produce exactly the same chain output as dropping one and then the other 20:27
So if 2 things are dropped you'd need to add 2 entries, they'd also both have the same index I guess, because if you drop two things starting at index 1, it's like you drop at index 1, and then drop again at index 1 20:28
Ah, the DUMP_RECORDINGS thing is a #define
At the top of program.c
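The equivalence jnthnwrthngtn describes (dropping n arguments at an index gives the same result as dropping one argument at that same index n times) can be checked with a small array model. This is only a sketch on a plain `int` array; `drop_args` is a hypothetical helper and does not model MVMCapture, callsites, or the recording tree.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Removes n entries starting at idx from args[0..count), shifting the
 * tail down. Returns the new count, or the old count unchanged if the
 * requested range does not fit inside the array. */
size_t drop_args(int *args, size_t count, size_t idx, size_t n) {
    assert(args != NULL);
    if (idx > count || n > count - idx)
        return count;
    memmove(&args[idx], &args[idx + n], (count - idx - n) * sizeof(int));
    return count - n;
}
```

Calling `drop_args(a, 5, 1, 2)` on `{10, 11, 12, 13, 14}` leaves `{10, 13, 14}`, the same as two successive single drops at index 1. The multi-drop simply skips materializing the intermediate 4-element state, which is the allocation being saved in the real dispatcher.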
timo i'm doing that already 20:31
comparing with/without is an idea
jnthnwrthngtn bbi30 20:35
20:37 linkable6 left, evalable6 left 20:39 linkable6 joined
timo i'm getting the feeling the record that's b0rking isn't being dumped before the exception happens, maybe i need to do something there as well 20:40
wow i'm nt smart 20:46
MasterDuke well, nt was much better than win98 20:47
timo ah, i'm not actually sure what exactly to put into emit_args_ops when the Capture is null 20:58
it wants to use the MVMCallsite at that step
i wonder if i can just skip in that case?! 20:59
oh look it works nao 21:00
the problem was perhaps that i was accidentally reusing the same MVMDispProgramRecordingCapture because it was on the stack in the loop, so after i pulled it out of the MVM_VECTOR where it was copied to, that made things improve a lot 21:03
otherwise the capture would end up in multiple nodes in the capture transformation tree i imagine
just skipping over the entry in the path is probably wrong, in order to calculate the untouched tail 21:11
since at the moment we only ever have null Capture in a path when we're dropping multiple arguments in a row, perhaps it's easy-ish to handle by counting the nulls before a capture and working from there 21:17
also the same index a couple times in a row 21:19
MoarVM oops: Impossible untouched arg tail length calculated in dispatch program 21:25
oops indeed
the code might be correct now 21:28
21:29 discord-raku-bot left, discord-raku-bot joined
lizmat jnthnwrthngtn: did another 8 files that have infix:< in them, getting too tired now 21:29
still 34 files to go... but that will be tomorrow
sena_kun lizmat++ 21:30
tellable6 2021-09-21T13:09:06Z #moarvm <jnthnwrthngtn> sena_kun For Test::Base I've opened github.com/tokuhirom/p6-Test-Base/pull/2
lizmat test-t at 1.275 now
lizmat calls it a night
MasterDuke nice 21:31
jnthnwrthngtn lizmat: Cool, that opt seems to be quite nice once we tweak our operators to take advantage :) 21:34
MasterDuke just got a fail in t/02-rakudo/15-gh_1202.t 21:35
but i think that's a known flapper
yep, passed on a re-run
jnthnwrthngtn Hm, curious, I've not seen that one flap 21:36
MasterDuke it is mentioned in github.com/rakudo/rakudo/issues/4212
jnthnwrthngtn ah, so pre-dates new-disp 21:37
MasterDuke yep
Geth MoarVM/disp_drop_multiple_args_at_once: 45d334b8e6 | (Timo Paulssen)++ | 7 files
add dispatcher-drop-n-args to optimize allocations

Instead of creating a MVMCapture and MVMCallsite for each step of removing arguments, we now offer a syscall that drops multiple arguments that live at the same index in one go.
The result is that the transformations tree can now contain null entries for the capture entry, which we have to interpret and deal with.
21:38
MasterDuke does anyone else have comparison numbers for make m-test? it was just 19s for me, but if i remember correctly (might not be) it's 15s on master
jnthnwrthngtn MasterDuke: 19s for first run, closer to 16s or so for second
Geth MoarVM: timo++ created pull request #1549:
add dispatcher-drop-n-args to optimize allocations
21:39
MasterDuke hm, and spectest was 178, i think master is ~135
21:39 evalable6 joined
MasterDuke got 19s for second run of m-test also 21:40
timo jnthnwrthngtn: have two pull requests for you to look over if you feel like it :3
MasterDuke second m-spectest was 171 though 21:42
timo i can't see a difference in maxrss 21:52
jnthnwrthngtn timo: I was more expecting less allocations than different maxrss 21:54
Will probably review them in the morning, 'cus I'm feeling rather sleepy
timo OK, i don't have a one-liner to count that right away
having to recompile rakudo is also not the funnest :)
MasterDuke timo: startup any faster?
timo don't notice a difference i don't think 21:55
jnthnwrthngtn timo: Did you patch the dispatcher in NQP also? That'd be the bigger win at startup 21:56
timo ooh only in rakudo 21:57
MasterDuke there's a `throw_adhoc` macro in emit.dasc, but it assumes the message doesn't have any arguments. any reason not to create some `throw_adhoc(1|2|3)` macros that do expect the message to have arguments? assuming they'd just be put in ARG(3|4|5)? 22:00
Geth MoarVM/disp_drop_multiple_args_at_once: 814085079d | (Timo Paulssen)++ | 7 files
add dispatcher-drop-n-args to optimize allocations

Instead of creating a MVMCapture and MVMCallsite for each step of removing arguments, we now offer a syscall that drops multiple arguments that live at the same index in one go.
The result is that the transformations tree can now contain null entries for the capture entry, which we have to interpret and deal with.
22:01
timo forgot a "return". wonder how it still worked tbh
maybe thanks to optimization the return value remained in the right register or so
22:05 sena_kun left
timo ok could be around maybe a third of a meg less maxrss? 22:07
i'll have to take a few more measurements of just new-disp rakudo 22:08
jnthnwrthngtn raku --profile-compile -e '' should hopefully show less allocations 22:10
timo right
i'll give that a try next 22:11
not going to do that for core setting compilation for now i don't think :) :) 22:13
any reason to use malloc over FSA for new_callsite stuff in core/callsite.c? 22:18
oh yeah 22:23
-e '' goes from 10392 MVMCapture to 8556 22:24
MasterDuke nice. still does a gc run?
timo BOOTCapture*
-e 'say("hello world")' goes from 13085 BOOTCapture to 10741 22:25
-e '' doesn't GC on new-disp nor on my branch for me
-e 'say("hello world")' does one gc run on both
MasterDuke ah 22:26
jnthnwrthngtn timo: No reason, I suspect it pre-dates the FSA :)
timo 757KB retained down to 754KB - wonder if there's anything really interesting in there
but this is a single profile run, not sure how noisy it is 22:27
i'll quickly look into fsa for callsite flags and nameds 22:34
could also do it for the callsite itself
MasterDuke oh, i guess the problem with the multiple argument throw_adhoc macro is that the arguments could be any type... 22:42
Geth MoarVM/new-disp: df6d3c945a | (Jonathan Worthington)++ | src/spesh/osr.c
Log more info about OSR when it doesn't work out
22:54
MoarVM/new-disp: 366cd2252a | (Jonathan Worthington)++ | src/spesh/osr.c
Log more info about OSR when it doesn't work out
22:56
MasterDuke but if the cases i actually care about are just strings, then `mov ARG3, throw_arg_1` should be fine, right?
jnthnwrthngtn MasterDuke: I don't know, one'd have to look up the ABI for varargs (and then make sure it's the same for POSIX and Windows). 22:58
MasterDuke seems like it'd make sense to PR any changes against master, so CI can actually test it 23:00
jnthnwrthngtn I'm sure I've seen folks use NativeCall to call varargs things by just declaring functions that pass extra args which implies that the ABI is the same in...their situation.
The answer may be "it's way unportable in general but OK for x64"
MasterDuke yeah, my rough googling suggests it's cleaner for x64 than it was for x86 23:01
jnthnwrthngtn docs.microsoft.com/en-us/cpp/build...60#varargs
MasterDuke and i may get away with it for a small number of args
jnthnwrthngtn That makes it look like you get away with it on Windows for x64 23:02
MasterDuke yeah, was looking at that earlier. glad to know my interpretation seems to line up
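For context on why the argument types matter here: on the C side, a message with arguments goes through a variadic function, and the callee reads each argument according to the format string. The sketch below shows that shape with a hypothetical `format_adhoc` helper (not a MoarVM function) built on the standard `vsnprintf`. With only `%s` arguments every extra argument is a pointer, which is the simple case MasterDuke is considering for the JIT.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for an adhoc-exception message formatter.
 * Formats fmt plus its variadic arguments into buf and returns the
 * number of characters the full message needs (as vsnprintf does). */
int format_adhoc(char *buf, size_t size, const char *fmt, ...) {
    va_list ap;
    int written;
    assert(buf != NULL && fmt != NULL);
    va_start(ap, fmt);
    written = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    return written;
}
```

A call such as `format_adhoc(buf, sizeof buf, "expected %s but got %s", "1", "2")` passes two pointer arguments after the format string. How a hand-written call site has to place them is ABI-specific, which is the portability question raised above.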
Geth MoarVM/put_callsites_in_fsa: 2c3a99703e | (Timo Paulssen)++ | 5 files
Store callsites and its arrays in the FSA
23:16
timo huh. Ir went from 715m to 850m from without to with callsites in fsa 23:24
i didn't set spesh to blocking or anything tho, maybe i should've done that. and hash randomization off as well
that at least keeps the number stable while running the same branch 23:26
oh, no, i got it backwards apparently
MasterDuke it would be nice to be able to turn hash randomization off via env variable 23:27
timo ok it goes from like 889.2 mil to 887.7 mill? 23:30
MasterDuke 2m fewer, that's good 23:31
timo m: say 889.2 * 100 / 887.7 23:32
camelia 100.168976
timo m: say 889.2 R/ 887.7 * 100
camelia 99.831309
MasterDuke what if you make the size of the fsa bins bigger? i had a pr that made just the first page bigger, but istr discussion that maybe they should just all be bigger as we increased the use of the fsa 23:35
timo what number would i get for master tho? 23:36
i don't know in how far that would help things 23:37
MasterDuke what do you mean what number for master? 23:39
timo Ir for -e ''
MasterDuke '822,514,653 instructions' from `MVM_SPESH_BLOCKING=1 perf stat ./install/bin/raku -e ''` on master 23:40
timo well, that's not great
MasterDuke but we already know master is faster at startup, right? 23:41
i wonder how comparable the numbers are across possible cpu/os/compiler differences? 23:42
timo right
interestingly, the difference between amount of calls to fixed_size_alloc seems very small
oh, i looked at the wrong comparison file 23:43
77k to 111k
MasterDuke interesting, callgrind reports 772m 23:45
and cachegrind reports 773m
perf stat numbers are much more variable, as low as 799m and as high as 827 23:46
but i was only using MVM_SPESH_BLOCKING=1, i hadn't disabled hash randomization
timo 18k calls from callsite_drop_positional, 365.3k calls to _int_malloc (which both malloc and calloc call) in one, 396.8k in the other 23:47
̵ʼʼʼʼʼʼʼʼʼʼʼ 23:51
MasterDuke is that a raku interpreter in befunge? 23:53
timo maybe we should invent hq9+r with the "r" command that starts a rakudo 23:54
23:59 evalable6 left, linkable6 left
MasterDuke is to bed 23:59