Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes. Set by lizmat on 24 May 2021. |
|||
00:03
reportable6 left
00:05
reportable6 joined
02:06
squashable6 left
|
|||
Geth | MoarVM/new-disp: 43ddfce367 | (Jimmy Zhuo)++ | 3 files Fix a bunch of warning on windows. |
02:17 | |
03:06
linkable6 left,
evalable6 left
03:07
evalable6 joined,
squashable6 joined
03:21
frost joined
04:34
releasable6 left,
statisfiable6 left,
shareable6 left,
squashable6 left,
unicodable6 left,
benchable6 left,
committable6 left,
quotable6 left,
bisectable6 left,
notable6 left,
sourceable6 left,
reportable6 left,
nativecallable6 left,
bloatable6 left,
coverable6 left,
evalable6 left,
greppable6 left,
tellable6 left,
nativecallable6 joined
04:35
squashable6 joined,
greppable6 joined,
reportable6 joined,
coverable6 joined
04:36
shareable6 joined,
statisfiable6 joined
05:08
linkable6 joined
05:34
evalable6 joined
05:35
committable6 joined,
quotable6 joined
05:36
sourceable6 joined
|
|||
Nicholas | good *, #moarvm | 05:50 | |
06:02
reportable6 left
|
|||
Geth | MoarVM/new-disp: 33c84b1eaa | (Jimmy Zhuo)++ | 5 files Fix more warnings |
06:17 | |
MoarVM/new-disp: 45a077ecb1 | (Jimmy Zhuo)++ | src/platform/win32/mmap.c Fix mmap warnings on Windows |
06:29 | ||
06:36
bisectable6 joined,
unicodable6 joined
06:37
bloatable6 joined
|
|||
nine | Done a bunch of compilations. Best stage parse time was 34.361 for new-disp and 32.908 on master. At least mast and mbc times got much better, so in total it's 45.186 vs 45.064s. | 06:46 | |
To be fair though, new-disp contains improvements to mast and mbc that are not related to dispatchers and could be applied to master as well. | 06:47 | ||
Darn...there seems to be another spesh bug. One of our applications fails to start due to "Type check failed for return value; expected CompUnit::Handle:D but got BOOTIO (BOOTIO)" in CompUnit::Loader.load-precompilation-file | 07:13 | ||
Nicholas | it's got some hash improvements which (slightly) reduce work during each GC run. But that might only be measurable with callgrind | ||
07:34
tellable6 joined,
releasable6 joined
07:36
benchable6 joined
08:04
reportable6 joined
|
|||
lizmat | latest numbers: 1.329 / .752 | 08:16 | |
startup not noticeably different | 08:17 | ||
for reference, on master on a similarly heated up machine: 1.311 / .624 | 08:23 | ||
startup times: master / new-disp: .100 / .160 | 08:24 | ||
08:36
lizmat_ joined,
TempIRCLogger__ left
08:37
lizmat left,
[Coke]_ joined,
TempIRCLogger joined,
sena_kun joined,
lizmat_ left,
lizmat joined
08:38
Geth left
08:39
Geth joined,
[Coke] left
09:03
brrt joined
|
|||
Nicholas | good *, brrt | 09:27 | |
09:35
notable6 joined
|
|||
jnthnwrthngtn | lizmat: Wow, so if we exclude startup time, new-disp is winning on test-t for the single-threaded case. | 09:46 | |
lizmat | yeah, looks like :) | 09:47 | |
jnthnwrthngtn | Nice. If only getting startup down was easy :P | ||
09:48
patrickb joined
|
|||
patrickb | o/ | 09:49 | |
Nicholas | \o | ||
jnthnwrthngtn: you could declare it SEP (for now). That's easy enough. | 09:50 | ||
lizmat | jnthnwrthngtn: fwiw, I would focus on getting it better after startup atm | ||
Nicholas | yes good point. Most slow programs are long lived :-) | 09:51 | |
jnthnwrthngtn | It's good we're getting closer to the merge point, because I suspect the amount of time I can focus primarily on MoarVM/Raku is going to drop soonish (grant nearly used up). | 09:53 | |
patrickb | I feel a bit stupid for asking this, but given startup time is below the perceivable delay, why is a longish startup time so bad? (I know long startup time hurts usecases like roast that starts up rakudo very often, but that's a corner case, isn't it?) | ||
jnthnwrthngtn | patrickb: The primary reason it bothers me is that precomp spawns a process for each module it pre-compiles. | 09:55 | |
Nicholas | There's also a "marketing" reason - it can be measured and compared with other "competitor" languages, and being lower looks better. | 09:56 | |
I like jnthnwrthngtn's answer better. | |||
patrickb | Ah. So it hurts precompilation badly. Understood. | ||
jnthnwrthngtn | Nicholas: Yeah, but I already learned "never read the comments" :D | ||
(On Reddit. In the code they're sometimes worth it...) | 09:57 | ||
patrickb | (I still hope nines++ work on in process precompilation will one day become a reality.) | ||
nine gets flashbacks | |||
patrickb | (Sorry for that) | ||
ls | 09:58 | ||
brrt | good * Nicholas, nine, patrickb, lizmat, jnthnwrthngtn | 10:02 | |
lizmat | brrt o/ | ||
jnthnwrthngtn | o/ brrt | 10:04 | |
Geth | MoarVM/new-disp-cgoto: e3781f7290 | (Jimmy Zhuo)++ | 2 files Add GCC computed goto for dispatcher. |
10:06 | |
MoarVM: zhuomingliang++ created pull request #1548: Add GCC computed goto for dispatcher. New disp cgoto |
10:07 | ||
jnthnwrthngtn | oh wow, somebody has saved me a task :D | 10:10 | |
Nicholas | (I thought somewhat similar - I hoped that someone else would do this before I got near it) | 10:11 | |
jnthnwrthngtn | Though I'm confused that it seems to be always disabled: github.com/MoarVM/MoarVM/commit/e3...79806R2680 | 10:13 | |
Nicholas | I was *about* to check this - did my build actually build with it? | ||
jnthnwrthngtn | ah, nine++ already spotted it | 10:14 | |
patrickb | o/ brrt | 10:15 | |
Nicholas | If I change it to 1, it compiles | 10:17 | |
(ship it!) | |||
and NQP is now somewhere into the build | |||
have to go AFK for a little bit | |||
10:30
discord-raku-bot left,
discord-raku-bot joined
|
|||
jnthnwrthngtn | Seems Routine can become 24 bytes smaller (on MoarVM), which is a nice bit of memory to claw back | 10:34 | |
Although I'm about to eat some of that win with a bitfield on Signature, if an opt idea I have works out... :) | 10:35 | ||
10:42
JimmyZ joined
|
|||
JimmyZ | github.com/MoarVM/MoarVM/commit/e3...79806R2680 was a quick push before I went home, and connecting to github is hard due to GFW, feel free to remove it. | 10:43 | |
:) | 10:45 | ||
Nicholas | JimmyZ: well, it does seem to work with the C code changed to enable it. ASAN hasn't said anything... | ||
JimmyZ | thanks for testing it | ||
jnthnwrthngtn | JimmyZ++ # thanks for doing this! | ||
10:48
JimmyZ left
11:36
JimmyZ joined
|
|||
JimmyZ | dinner back :) | 11:36 | |
jnthnwrthngtn: the i < dp->num_ops check part is what I said: a bit ugly, and I really don't know how to get rid of it ;) | 11:41 | ||
12:03
reportable6 left
12:04
reportable6 joined
|
|||
nine | JimmyZ: I think what jnthnwrthngtn meant was just do op = dp->ops[i++]; goto *LABELS[op.code]; unconditionally and just replace the NEXT; with return 1; for in OP(MVMDispOpcodeResultBytecode) and OP(MVMDispOpcodeResultCFunction) | 12:05 | |
s/for in/in/ | |||
JimmyZ nods | 12:09 | ||
jnthnwrthngtn | Also MVMDispOpcodeResultObj and friends for value results | 12:53 | |
nine | yes | 12:56 | |
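The suggestion above (dispatch to the next op unconditionally, and have the result ops return instead of calling NEXT) can be illustrated with a small standalone sketch. This is not the MoarVM dispatcher: the opcodes, the `Op` struct and `run_program` are hypothetical stand-ins, and only the `LABELS`/`NEXT` shape follows what was quoted in the channel. It relies on the GCC/Clang "labels as values" extension.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes; the real MVMDispOpcode* set is much larger. */
typedef enum { OP_ADD, OP_MUL, OP_RESULT } OpCode;

typedef struct {
    OpCode code;
    int    arg;
} Op;

/* Runs a tiny program using computed goto (GCC/Clang extension).
 * There is no per-iteration `i < num_ops` check: the program must end
 * in OP_RESULT, which returns instead of dispatching to the next op. */
int run_program(const Op *ops, size_t num_ops) {
    static void *LABELS[] = { &&do_add, &&do_mul, &&do_result };
    size_t i = 0;
    int acc = 0;
    Op op;

    /* Checked once up front rather than on every op. */
    assert(num_ops > 0 && ops[num_ops - 1].code == OP_RESULT);

#define NEXT do { op = ops[i++]; goto *LABELS[op.code]; } while (0)

    NEXT;
do_add:
    acc += op.arg;
    NEXT;
do_mul:
    acc *= op.arg;
    NEXT;
do_result:
    /* A "result" op ends the run, so it never falls off the op array. */
    return acc;
#undef NEXT
}
```

The bounds check can only be dropped if every program is guaranteed to end in an op that returns. In this sketch that assumption is asserted once before the first dispatch.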
jnthnwrthngtn | Mmm...red curry. | 12:57 | |
nine | Oh how I'd love to have that right now... Enjoy :) | ||
13:04
evalable6 left,
linkable6 left
13:21
JimmyZ left
|
|||
jnthnwrthngtn | CORE.setting is full of things like: | 13:21 | |
multi sub infix:<==>(Num:D \a, Num:D \b --> Bool:D) { | |||
I think at some point maybe it performed better that way | 13:22 | ||
However, a new opt I've been playing with relies on them being $a (it does caller-side decont, not callee-side, which has many advantages) | |||
(And was pretty much impossible to really do prior to new-disp) | 13:23 | ||
lizmat | jnthnwrthngtn: if you want me to change the core in that respect, then I will do that :-) | ||
jnthnwrthngtn | lizmat: Yeah, I've done a few of them to verify the effect of it, but there's a load more. And it's not a totally mechanical change in that some small handful of ops do need the raw thing | 13:24 | |
Like infix:<,> | |||
Need to spectest to see how much fallout there will be too | |||
Exactly 1 failing test. | 13:30 | ||
That's quite good for such a huge hange | 13:31 | ||
*change | |||
And suggests this approach will indeed work out | |||
lizmat | jnthnwrthngtn: so do you want me to go through the core and change the \a to $a where it's possible ? | 13:34 | |
jnthnwrthngtn | lizmat: Yes, maybe wait a moment to not merge conflict with the few I've done | 13:36 | |
So, the cost of what I've done is that when we do .VAR we gain a ScalarVAR | 13:42 | ||
Not just a Scalar. This may actually help us | |||
lizmat | I guess you'd want me to do that on the new-disp branch, right ? | 13:45 | |
jnthnwrthngtn | Yes. I think I fixed the test. Moment. | 13:46 | |
m: say 1.21 / t/spec/S02-types/mu.t | 13:47 | ||
camelia | 5===SORRY!5=== Error while compiling <tmp> Undeclared name: S02-types used at line 1 Undeclared routines: mu used at line 1 spec used at line 1. Did you mean 'sec'? t used at line 1 |
||
jnthnwrthngtn | lol | ||
m: say 1.21 / 2.964 | 13:48 | ||
camelia | 0.408232 | ||
lizmat | that feels... impressive ? | ||
jnthnwrthngtn | Not sure quite how my change is so beneficial to that benchmark, but... :) | ||
Oh, I see | |||
for ^10_000_000 { | |||
$total = $total + %h<a> + %h<b>; | |||
} | |||
The +s certainly end up in a better place... | 13:49 | ||
13:59
brrt left
|
|||
lizmat | jnthnwrthngtn: just tell me when to go for it :-) | 14:05 | |
14:06
linkable6 joined
|
|||
jnthnwrthngtn | Pushed. Have at it! | 14:06 | |
github.com/rakudo/rakudo/commit/14...e8c5eR2623 is recommended reading on why this strategy holds quite some promise, btw | 14:08 | ||
Assuming the RakuAST-based compiler is better at scope flattening and I get back to working on our EA, this could be a nice win | |||
Nicholas | EA? | 14:16 | |
lizmat | Escape Analysis | ||
Nicholas | aha thanks | ||
jnthnwrthngtn: I hope that your beer fridge is primed | |||
jnthnwrthngtn | It is well stocked at present :) | ||
Nicholas | But you'll be making a well-deserved dent soon? | 14:17 | |
jnthnwrthngtn | A small one at least, yes :) | 14:20 | |
14:24
frost left
|
|||
lizmat | jnthnwrthngtn: so, by definition $a would be deconted, right ? | 14:27 | |
jnthnwrthngtn | jnthn.net/tmp/fridge-status.jpg # current status | ||
lizmat: If there's an Int type constraint, yes, so we could lose some nqp::decont in places too | |||
Nicholas | "bowmore barrel aged" - that's quite specific... | 14:28 | |
lizmat | jnthnwrthngtn: looking at infix:<===>(Enumeration,Enumeration) | ||
jnthnwrthngtn | lizmat: Given Enumeration isn't Iterable, probably works out | 14:29 | |
lizmat | ok, so it's only Iterables I need to worry about | ||
jnthnwrthngtn | Nicholas: I guess, though "whisky barrel aged" would leave open quite a range of possibilities | 14:38 | |
I mean, Jack Daniels and Octomore are both whisky but... :) | 14:39 | ||
dogbert17 mumbles Ardbeg | 14:41 | ||
lizmat | jnthnwrthngtn: working on it, will be away for a few hours, then will continue :-) | 14:44 | |
14:45
patrickb left,
patrickb joined
|
|||
jnthnwrthngtn | Cool :) | 14:45 | |
14:47
codesections joined
14:49
patrickb left,
patrickb joined
14:51
patrickb left,
patrickb joined
14:53
patrickb left,
patrickb joined
14:55
patrickb left,
patrickb joined
|
|||
dogbert17 | There are some very impressive speed gains in new-disp | 14:56 | |
jnthnwrthngtn | dogbert17: You've been measuring? :) | ||
14:57
patrickb left
14:58
patrickb joined
|
|||
dogbert17 | yeah, I have a bunch of smaller programs, problem solving tasks mostly, and many of them show impressive gains | 14:58 | |
jnthnwrthngtn | dogbert17: relative to master, or to earlier new-disp? | ||
dogbert17 | master | ||
one example, not the best, a program takes 19s on master and 12s on new-disp | 14:59 | ||
the mest one is 24s on master and 8s on new-disp | |||
*best | |||
jnthnwrthngtn | whoa | ||
dogbert17 | :) | ||
jnthnwrthngtn | Are they using multi dispatch with `where`, or callsame, or such things? Or more "boring"? | ||
dogbert17 | I would say that they're plenty boring :) | 15:01 | |
jnthnwrthngtn | Well, nice | ||
dogbert17 | Indeed | 15:02 | |
have you tried any cro apps? | |||
jnthnwrthngtn | Not yet. | ||
15:04
patrickb left,
patrickb joined
15:06
patrickb left,
patrickb joined
15:07
evalable6 joined
15:08
patrickb left
15:09
patrickb joined
15:10
patrickb left
15:11
patrickb joined
15:13
patrickb left,
patrickb joined
15:15
patrickb left,
patrickb joined
15:19
patrickb left,
patrickb joined
|
|||
Geth | MoarVM/new-disp-cgoto: e743b1bc43 | (Jimmy Zhuo)++ (committed by Jonathan Worthington) | 2 files Add GCC computed goto for dispatcher. |
15:23 | |
15:23
patrickb left
|
|||
timo | i've been using moarperf under new-disp a little bit and it seems to run fine, i did not compare performance | 15:24 | |
Geth | MoarVM/new-disp: e743b1bc43 | (Jimmy Zhuo)++ (committed by Jonathan Worthington) | 2 files Add GCC computed goto for dispatcher. |
15:31 | |
jnthnwrthngtn | Seems to help a little on the CORE.setting stage parse at least | 15:32 | |
timo | the computed goto support? | 15:38 | |
jnthnwrthngtn | Yes | ||
timo | so a c-level perf record should put the dispatcher_run somewhere near the top | ||
we could also have computed goto for the code that translates disp programs to spesh, not sure if that'd be noticeable tho since the individual pieces of code there do more work than the ops in the run version would | 15:39 | ||
is there a good number to shoot for in this regard? how big do the labelled bodies have to be for the computed goto benefits to be worth the hassle? | 15:40 | ||
jnthnwrthngtn | Given that we run that quite rarely anyway, probably not quite worth it | 15:42 | |
timo | also interesting to see the type of "op" went from MVMDispProgramOp* to MVMDispProgramOp, which means we copy every time around the loop, but don't have pointer deref, do we know how that trades off, exactly? | ||
MVMDispProgramOp is relatively small, thankfully | 15:43 | ||
jnthnwrthngtn | I spotted that and guessed it was measured, but yeah, I'm not totally sure. | 15:44 | |
timo | ok my thinking is: this is equivalent to having all fields from the struct as local variables taken from the structs, yeah? since the struct is smaller than a cache line, we'd be reading all of it anyway. we read all the fields at most once, i think. some ops only have one of the two args from the union, in which case we pay a minuscule cost i think? but also, the compiler would already have turned | 15:51 | |
multiple accesses via the pointer to a single deref up front anyway | |||
15:52
[Coke]_ is now known as [Cke,
[Cke is now known as [Coke]
|
|||
timo | so, um, my assessment is: a shrug + "the compiler is already smart enough to pick whatever is the fastest here anyway" | 15:53 | |
jnthnwrthngtn | Yeah, that's about my figuring too | 15:56 | |
16:05
discord-raku-bot left,
linkable6 left
16:06
discord-raku-bot joined
|
|||
timo | MVMDispProgramRecordingValue can be made 8 bytes smaller by re-ordering sayeth pahole | 16:24 | |
MVMDispProgramRecordingCapture has a 4 byte hole before the *captures pointer member that could also be moved to the end so it becomes a bit smaller | |||
though, if i remember my FSA size bins correctly (i don't) that may not change anything? | 16:25 | ||
MVMDispResumptionData could get arg_source moved to the very end, that would close a 4 byte hole and make it go from 48 to 44 | |||
by swapping the pointer and code number in MVMNFGTrieNodeEntry it can go from 16 to 12 bytes, though if we align to 16 anyway that's obviously useless | 16:29 | ||
MVMSpeshCandidateBody has 3 holes summing up to 11 bytes | 16:31 | ||
MVMSpeshInline has a 1 byte hole and a 4 byte hole | |||
MVMSpeshSimStackFrame has one 4 byte hole, offering a reduction from 88 to 84 bytes | 16:32 | ||
MVMStaticFrameBody has one 1 byte hole and one 7 byte hole, for 232 -> 224 bytes | 16:34 | ||
that's most of the interesting findings | 16:39 | ||
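The kind of hole pahole reports here can be reproduced with a toy struct. The layouts below are hypothetical and are not the real MoarVM structs; they only show how a 4-byte member placed before a pointer leaves padding on a typical 64-bit ABI, and how moving the pointer first removes it. Exact sizes depend on the platform ABI, so the numbers in the comments are assumptions for common LP64/LLP64 targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout with holes (not a real MoarVM struct). */
typedef struct {
    uint32_t  index;    /* 4 bytes, then typically a 4-byte hole */
    void     *capture;  /* pointer, 8-byte aligned on 64-bit targets */
    uint8_t   flag;     /* 1 byte, then tail padding */
} WithHoles;

/* Same members, widest first, so the small ones pack together. */
typedef struct {
    void     *capture;
    uint32_t  index;
    uint8_t   flag;
} Reordered;

/* Number of bytes the reordering saves on the current platform. */
size_t bytes_saved_by_reordering(void) {
    assert(sizeof(Reordered) <= sizeof(WithHoles));
    return sizeof(WithHoles) - sizeof(Reordered);
}
```

On a typical 64-bit target this would be 24 versus 16 bytes. As noted in the channel, whether such a saving matters in practice depends on the allocator's size bins and alignment: a reduction that stays inside the same bin changes nothing.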
jnthnwrthngtn | Managed to get a bit off startup (with latest NQP push) | 17:07 | |
17:07
linkable6 joined
|
|||
timo | beautiful | 17:12 | |
MasterDuke | (is|was)n't there some problem with `constant` for hashes and some combination of not-moarvm backend and/or not-linux os? | 17:15 | |
jnthnwrthngtn | NQP didn't actually have a `my constant` for hashes until quite recently, when I added them for the sake for rakuast | 17:16 | |
well, `my constant` at all | |||
MasterDuke | ah, ok | ||
jnthnwrthngtn | I don't see why they'd not work off MoarVM, but even if they don't, this is MoarVM-specific code anyway | ||
17:16
brrt joined
17:17
brrt joined
|
|||
MasterDuke | ha, yeah, ignore what i just said | 17:17 | |
timo | impressed to see process_bb_for_deopt_usage very close to the top at 2.88% samples sampled by perf | 17:42 | |
jnthnwrthngtn | Yeah, that's not a terribly cheap routine | 17:45 | |
otoh it runs on the spesh thread | |||
timo | indeed | 17:46 | |
we don't have terribly much during just startup that gets very hot | |||
serialization_read_ref is above process_bb_for_ and serialization_demand_object is below it | 17:47 | ||
jnthnwrthngtn | We currently allocate enough during Rakudo startup to hit a GC run, and a lot of those are MVMCapture and MVMTracked | ||
(Part of the dispatch setup mechanism) | |||
timo | asm_exc_nmi sits in between, not sure what that encompasses. possibly related to memory management like brk and mprotect?! | 17:48 | |
oh no! we used to not have to do even one run | |||
jnthnwrthngtn | Dunno | ||
I think if we exposed a dispatcher-replace-arg or some such we could avoid some insert/drop dances, or make it possible to drop n things | 17:49 | ||
timo | oh, it's possible that that's from perf's collection frequency timeout | ||
codesections | congrats on all the perf wins. As expected, though, new-disp does pay for it with a slower warm up; I'm measuring about a 55% penalty for a plain "Hello, world!" script (106.9±1.5ms versus 165.3±1.2ms) | ||
timo | i set it to -F max for this | ||
jnthnwrthngtn | codesections: Yes; don't know if that includes the NQP commit I pushed or not, though at most it's worth 5% or so | 17:50 | |
timo | the lookup structure / cache for callsite transformations, that isn't going to help with GC pressure at all i imagine | ||
i mean, it's still to be written, but wouldn't help that particular metric | 17:51 | ||
jnthnwrthngtn | timo: No, but being able to drop multiple args or replace an arg might help us | ||
timo | is that an additional syscall that we'd implement and use in our dispatchers? | 17:52 | |
jnthnwrthngtn | timo: In that it gets rid of an intermediate capture | ||
Yes | |||
timo | OK. i think i can build that | ||
jnthnwrthngtn | ++timo | ||
timo | replace-arg is for "in-place" putting a value (and its primspec / flag) in a spot in the capture and callsite? | 17:53 | |
jnthnwrthngtn | The nqp-meth-call does a multi-drop like this: | ||
my $args := nqp::dispatch('boot-syscall', 'dispatcher-drop-arg', | |||
nqp::dispatch('boot-syscall', 'dispatcher-drop-arg', $capture, 0), | |||
0); | |||
Yeah, though we could even restrict it to something of the same primspec | |||
Then we know the callsite is already fine | 17:54 | ||
When I did --profile-compile -e '' then the nqp-meth-call dispatcher was one of the highest CPU users. | |||
It also does this: | 17:55 | ||
nqp::dispatch('boot-syscall', 'dispatcher-guard-type', | |||
nqp::dispatch('boot-syscall', 'dispatcher-track-arg', $capture, 0)); | |||
nqp::dispatch('boot-syscall', 'dispatcher-guard-literal', | |||
nqp::dispatch('boot-syscall', 'dispatcher-track-arg', $capture, 1)); | |||
codesections | Oh, I thought that was an up-to-date build, but it looks like it didn't include the nqp fix; I'll re-run (though I know it'll be a minor effect – just curious, though) | ||
jnthnwrthngtn | But there's no reason we can't expose a dispatcher-guard-arg-type that takes an index | ||
That's 2 less allocations every single time too | 17:56 | ||
codesections: Probably minor, about 5ms for me | |||
timo | MVM_disp_program_record_capture_drop_arg as well as MVM_capture_drop_arg will both want a sibling for n args, if i'm not mistaken? | ||
jnthnwrthngtn | timo: Yes, I think so | 17:57 | |
timo | optionally change both to take an n argument, but at least the MVM_capture_ one is kind of public API | ||
jnthnwrthngtn | timo: I think that the capture transform tree doesn't need to model this though | ||
Kind of public? Don't think so, this is all new code in new-disp :) | 17:58 | ||
timo | capture transform tree is our supposed transformation cache thing? | ||
jnthnwrthngtn | No, it's part of the dispatch program recording | ||
The insert/drop structure | |||
I don't think we need to actually store an MVMCapture object at every level of it, however | |||
(we do today because there's no multiple drops) | 17:59 | ||
codesections | yep, about a 5ms speedup for me too. Definitely something | ||
jnthnwrthngtn | Might need to harden it up a bit to cope with the new possibility of a null there | ||
I should go home to sort out dinner. bbl o/ | |||
timo | ok, there'd be one entry per dropped arg, but only the last one actually has a Capture to go with it | 18:00 | |
jnthnwrthngtn | timo: Yes, exactly that | ||
o/ | |||
18:02
reportable6 left
|
|||
bartolin_ | MasterDuke: just for the record: There was indeed a problem with using 'constant' for hashes on the JVM backend. But it should work nowadays: github.com/Raku/nqp/pull/717 | 18:09 | |
MasterDuke | i'm not crazy - cool. it's no longer a concern - even cooler. bartolin_++ | 18:10 | |
lizmat | hmmm... can .isNaN ever be true on an Int ? | 18:11 | |
m: dd Int.isNaN | |||
camelia | Bool::False | ||
timo | yeah we don't hold NaN in Int i think | 18:12 | |
lizmat | right... | ||
this one strikes me as odd: multi sub prefix:<~>(Str:D $a --> Str:D) { $a.Str } | 18:16 | ||
why coerce to Str when it is already a Str | |||
? | |||
timo | could be a subtype of Str | ||
dunno, really | 18:17 | ||
lizmat | well, but Str.Str is self | 18:19 | |
last I checked | |||
multi method Str(Str:D:) { self } # indeed | |||
jnthnwrthngtn: ^^ wonder how much special casing a method just returning self would bring | 18:21 | ||
MasterDuke | github.com/rakudo/rakudo/commit/e0...d6474be0d4 | 18:22 | |
lizmat | interesting | 18:23 | |
MasterDuke++ # research :-) | 18:25 | ||
MasterDuke | heh and then github.com/rakudo/rakudo/commit/66...a017bae876 | ||
lizmat | well, I think it's ok for a subclass of Str to require it returning a subclass of Str on .Str | 18:26 | |
timo | i must admit i don't really grasp how the CapturePath and its updating work exactly | 18:27 | |
MasterDuke | yeah, removing the return constraint seems a little overly cautious | ||
lizmat | ok, with preliminary adaptations, I got test-t down to 1.278 | 18:39 | |
was 1.329 | 18:40 | ||
note, this is still without the computed goto | |||
this is encouraging :-) | |||
timo | "Can only use manipulate a capture known in this dispatch" wonder what the exact typo here is | 18:46 | |
i guess an "or" between use and manipulate would be safe | |||
nine | indeed | 18:47 | |
timo | oh hey nine have you looked at the capture manipulation stuff at all? | 18:48 | |
nine | a bit | 18:52 | |
fixing segfaults n'stuff | |||
timo | haha, gen-cat fails under "make" but not when i run the same commandline myself | 18:57 | |
19:05
reportable6 joined,
brrt left
|
|||
MasterDuke | if a ctx->arg_info.callsite has arg_count of 1, why would its arg_names be 0x0? | 19:13 | |
timo | no named args in the callsite? | ||
MasterDuke | right, they're all positional. but we don't store the names of the positional args? | 19:14 | |
timo | positional args don't have names | 19:17 | |
MasterDuke | huh. i had no idea | 19:19 | |
timo | if they did have names they would be named rather than positional | ||
MasterDuke | well, obviously at the nqp/rakudo level they have names, those can't be accessed by moarvm at all? | 19:22 | |
timo | oh | 19:23 | |
that'd be a level higher, you'd have to get and introspect the Signature object for that i think | |||
what is it you're doing? | |||
MasterDuke | colabti.org/irclogger/irclogger_lo...09-22#l231 | 19:25 | |
the error comes from github.com/MoarVM/MoarVM/blob/mast...rgs.c#L129 | 19:26 | ||
so i was playing around with passing the ctx in and adding the arg_names to the message, but as you pointed out, arg_names won't work | 19:27 | ||
nine | Ah, yes, there's just no way to do that in MoarVM | ||
timo | yeah, something "outside" of that will have to deal with the error | ||
nine | A great example of the downside of these adhoc string exceptions | 19:28 | |
timo | checkarity is an op that is put into bytecode somewhere, and it'll throw that exception | ||
well, we could have hll symbols that can be invoked to throw structured exceptions right? | |||
it'd be more expensive to throw those than to just throw an adhoc | 19:29 | ||
MasterDuke | yeah, it took me a while to even find where it was coming from, i kept looking through the rakudo exceptions at first, but the message didn't exactly match any of them | ||
well, it's a failure path, i'd be less concerned about it being slightly slower if we got better errors in result | 19:30 | ||
timo | i want to rubberduck a little | 19:34 | |
Nicholas | Go for it! | ||
timo | i'm getting an error that a capture is passed to some op where the capture has not been properly registered by whatever op derived it from another one | 19:36 | |
what i changed in my branch is that there's now an extra op that drops multiple arguments from a capture in one go | |||
but we still put multiple "dropped an arg" entries in the recording | 19:37 | ||
just because we don't want to deal with adding that everywhere? probably | |||
so the records in between don't get to have an actual Capture object, that's the saving we're looking for | 19:38 | ||
the derivation path thingie works like this: | |||
starting at the capture the dispatcher was initially invoked with (or alternatively the last one seen in a resumption op) | 19:39 | ||
there will be an entry for every capture that was derived from the given one | 19:40 | ||
we do a depth-first-search through the tree that is created from that | |||
this is where we're not finding the Capture object | 19:41 | ||
ah, maybe i see the problem now | 19:42 | ||
oh i think it's now working? | 19:46 | ||
aha. until the core setting comes up | 19:47 | ||
so, what i had to change to make it work until then, was that i can only compute this path once, then have to handle pushing newly made records to the end of the path, since otherwise finding the right node depends on the Capture object, which we're skipping here | 19:51 | ||
some of these capture transformation chains confuse me | 20:00 | ||
one chain is -> drop arg 0 -> drop arg 0 -> insert a val at 0 -> drop arg 0 -> insert val at 0 -> drop argument 0 -> use that | 20:02 | ||
like, shouldn't the "insert a val" ones hang off of the same previous entry? | |||
Nicholas | (I'm not doing very well at nodding or saying "yes") | ||
and I don't know the answer to your question | 20:03 | ||
timo | hehe. | 20:04 | |
i'm adding more debug output | 20:05 | ||
fprintf(stderr, "%sUnknown transforamtion\n", indent_str); | |||
loving this typo | |||
i think i'm dum | 20:18 | ||
jnthnwrthngtn | lizmat: "with preliminary adaptations" - to Rakudo or to test-t itself? | 20:19 | |
lizmat | that's the 11 commits I did about an hour ago | ||
nowhere complete yet, but it covered a lot of Str operators | 20:20 | ||
especially Rational is a lot of work :-) | |||
running a spectest for that now | |||
timo | ah jnthnwrthngtn, how do you debug issues with capture transform chain and such? | 20:23 | |
and why do these chains look so suspicious to me? adding arguments at 0, then dropping arguments at 0 | |||
jnthnwrthngtn | lizmat: ah, cool, just pulled a bunch of those :) | 20:25 | |
timo: DUMP_RECORDINGS=1 or so; if you've got things right, then your "drop 2 at once" should produce exactly the same chain output as dropping one and then the other | 20:27 | ||
So if 2 things are dropped you'd need to add 2 entries, they'd also both have the same index I guess, because if you drop two things starting at index 1, it's like you drop at index 1, and then drop again at index 1 | 20:28 | ||
Ah, the DUMP_RECORDINGS thing is a #define | |||
At the top of program.c | |||
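The equivalence described above (dropping two arguments starting at index 1 is the same as dropping at index 1 and then dropping at index 1 again) can be checked with a toy model. `ToyCapture` and the `toy_*` functions are hypothetical: a flat int array stands in for a capture, and nothing here reflects the real MVMCapture or the recording structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a capture: a flat array of argument values. */
typedef struct {
    int    args[8];
    size_t count;
} ToyCapture;

/* Drop `n` arguments starting at `idx` in a single step. */
void toy_drop_n_args(ToyCapture *c, size_t idx, size_t n) {
    assert(idx + n <= c->count);
    /* Shift the tail left over the dropped range. */
    memmove(&c->args[idx], &c->args[idx + n],
            (c->count - idx - n) * sizeof c->args[0]);
    c->count -= n;
}

/* Drop a single argument at `idx` (the one-at-a-time baseline). */
void toy_drop_arg(ToyCapture *c, size_t idx) {
    toy_drop_n_args(c, idx, 1);
}

/* Returns 1 if both captures hold the same arguments in the same order. */
int toy_captures_equal(const ToyCapture *a, const ToyCapture *b) {
    return a->count == b->count
        && memcmp(a->args, b->args, a->count * sizeof a->args[0]) == 0;
}
```

Starting from the arguments 10, 11, 12, 13, 14, one call to `toy_drop_n_args(&c, 1, 2)` and two calls to `toy_drop_arg(&c, 1)` both leave 10, 13, 14. The multi-drop just skips the intermediate state, which in the real code is where the intermediate capture allocation would be avoided.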
timo | i'm doing that already | 20:31 | |
comparing with/without is an idea | |||
jnthnwrthngtn | bbi30 | 20:35 | |
20:37
linkable6 left,
evalable6 left
20:39
linkable6 joined
|
|||
timo | i'm getting the feeling the record that's b0rking isn't being dumped before the exception happens, maybe i need to do something there as well | 20:40 | |
wow i'm nt smart | 20:46 | ||
MasterDuke | well, nt was much better than win98 | 20:47 | |
timo | ah, i'm not actually sure what exactly to put into emit_args_ops when the Capture is null | 20:58 | |
it wants to use the MVMCallsite at that step | |||
i wonder if i can just skip in that case?! | 20:59 | ||
oh look it works nao | 21:00 | ||
the problem was perhaps that i was accidentally reusing the same MVMDispProgramRecordingCapture because it was on the stack in the loop, so after i pulled it out of the MVM_VECTOR where it was copied to, that made things improve a lot | 21:03 | ||
otherwise the capture would end up in multiple nodes in the capture transformation tree i imagine | |||
just skipping over the entry in the path is probably wrong, in order to calculate the untouched tail | 21:11 | ||
since at the moment we only ever have null Capture in a path when we're dropping multiple arguments in a row, perhaps it's easy-ish to handle by counting the nulls before a capture and working from there | 21:17 | ||
also the same index a couple times in a row | 21:19 | ||
MoarVM oops: Impossible untouched arg tail length calculated in dispatch program | 21:25 | ||
oops indeed | |||
the code might be correct now | 21:28 | ||
21:29
discord-raku-bot left,
discord-raku-bot joined
|
|||
lizmat | jnthnwrthngtn: did another 8 files that have infix:< in them, getting too tired now | 21:29 | |
still 34 files to go... but that will be tomorrow | |||
sena_kun | lizmat++ | 21:30 | |
tellable6 | 2021-09-21T13:09:06Z #moarvm <jnthnwrthngtn> sena_kun For Test::Base I've opened github.com/tokuhirom/p6-Test-Base/pull/2 | ||
lizmat | test-t at 1.275 now | ||
lizmat calls it a night | |||
MasterDuke | nice | 21:31 | |
jnthnwrthngtn | lizmat: Cool, that opt seems to be quite nice once we tweak our operators to take advantage :) | 21:34 | |
MasterDuke | just got a fail in t/02-rakudo/15-gh_1202.t | 21:35 | |
but i think that's a known flapper | |||
yep, passed on a re-run | |||
jnthnwrthngtn | Hm, curious, I've not seen that one flap | 21:36 | |
MasterDuke | it is mentioned in github.com/rakudo/rakudo/issues/4212 | ||
jnthnwrthngtn | ah, so pre-dates new-disp | 21:37 | |
MasterDuke | yep | ||
Geth | MoarVM/disp_drop_multiple_args_at_once: 45d334b8e6 | (Timo Paulssen)++ | 7 files add dispatcher-drop-n-args to optimize allocations Instead of creating a MVMCapture and MVMCallsite for each step of removing arguments, we now offer a syscall that drops multiple arguments that live at the same index in one go. The result is that the transformations tree can now contain null entries for the capture entry, which we have to interpret and deal with. |
21:38 | |
MasterDuke | does anyone else have comparison numbers for make m-test? it was just 19s for me, but if i remember correctly (might not be) it's 15s on master | ||
jnthnwrthngtn | MasterDuke: 19s for first run, closer to 16s or so for second | ||
Geth | MoarVM: timo++ created pull request #1549: add dispatcher-drop-n-args to optimize allocations |
21:39 | |
MasterDuke | hm, and spectest was 178, i think master is ~135 | ||
21:39
evalable6 joined
|
|||
MasterDuke | got 19s for second run of m-test also | 21:40 | |
timo | jnthnwrthngtn: have two pull requests for you to look over if you feel like it :3 | ||
MasterDuke | second m-spectest was 171 though | 21:42 | |
timo | i can't see a difference in maxrss | 21:52 | |
jnthnwrthngtn | timo: I was more expecting less allocations than different maxrss | 21:54 | |
Will probably review them in the morning, 'cus I'm feeling rather sleepy | |||
timo | OK, i don't have a one-liner to count that right away | ||
having to recompile rakudo is also not the funnest :) | |||
MasterDuke | timo: startup any faster? | ||
timo | don't notice a difference i don't think | 21:55 | |
jnthnwrthngtn | timo: Did you patch the dispatcher in NQP also? That'd be the bigger win at startup | 21:56 | |
timo | ooh only in rakudo | 21:57 | |
MasterDuke | there's a `throw_adhoc` macro in emit.dasc, but it assumes the message doesn't have any arguments. any reason not to create some `throw_adhoc(1|2|3)` macros that do expect the message to have arguments? assuming they just be put in ARG(3|4|5)? | 22:00 | |
Geth | MoarVM/disp_drop_multiple_args_at_once: 814085079d | (Timo Paulssen)++ | 7 files add dispatcher-drop-n-args to optimize allocations Instead of creating a MVMCapture and MVMCallsite for each step of removing arguments, we now offer a syscall that drops multiple arguments that live at the same index in one go. The result is that the transformations tree can now contain null entries for the capture entry, which we have to interpret and deal with. |
22:01 | |
timo | forgot a "return". wonder how it still worked tbh | ||
maybe thanks to optimization the return value remained in the right register or so | |||
22:05
sena_kun left
|
|||
timo | ok could be around maybe a third of a meg less maxrss? | 22:07 | |
i'll have to take a few more measurements of just new-disp rakudo | 22:08 | ||
jnthnwrthngtn | raku --profile-compiled -e '' should hopefully show less allocations | 22:10 | |
timo | right | ||
i'll give that a try next | 22:11 | ||
not going to do that for core setting compilation for now i don't think :) :) | 22:13 | ||
any reason to use malloc over FSA for new_callsite stuff in core/callsite.c? | 22:18 | ||
oh yeah | 22:23 | ||
-e '' goes from 10392 MVMCapture to 8556 | 22:24 | ||
MasterDuke | nice. still does a gc run? | ||
timo | BOOTCapture* | ||
-e 'say("hello world")' goes from 13085 BOOTCapture to 10741 | 22:25 | ||
-e '' doesn't GC on new-disp nor on my branch for me | |||
-e 'say("hello world")' does one gc run on both | |||
MasterDuke | ah | 22:26 | |
jnthnwrthngtn | timo: No reason, I suspect it pre-dates the FSA :) | ||
timo | 757KB retained down to 754KB - wonder if there's anything really interesting in there | ||
but this is a single profile run, not sure how noisy it is | 22:27 | ||
i'll quickly look into fsa for callsite flags and nameds | 22:34 | ||
could also do it for the callsite itself | |||
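For readers unfamiliar with the malloc versus FSA question, here is a minimal sketch of a size-binned allocator with free lists. It is a toy under assumptions, not MoarVM's fixed size allocator: `ToyFSA`, `toy_fsa_alloc` and `toy_fsa_free` are hypothetical names, each new chunk comes straight from malloc rather than being carved out of pages, and the caller is assumed to pass the size back on free.

```c
#include <stddef.h>
#include <stdlib.h>

#define BIN_BITS   3                              /* bins are multiples of 8 bytes */
#define NUM_BINS   16                             /* bins cover sizes 1..128 */
#define MAX_BINNED ((size_t)NUM_BINS << BIN_BITS)

typedef struct FreeNode { struct FreeNode *next; } FreeNode;

typedef struct {
    FreeNode *free_list[NUM_BINS];  /* one free list per size class */
    size_t    malloc_calls;         /* how often we fell through to malloc */
} ToyFSA;

/* Map a byte count (>= 1) to its size class. */
size_t bin_for(size_t bytes) { return (bytes - 1) >> BIN_BITS; }

void *toy_fsa_alloc(ToyFSA *fsa, size_t bytes) {
    if (bytes == 0)
        bytes = 1;
    if (bytes > MAX_BINNED) {       /* too big for any bin: plain malloc */
        fsa->malloc_calls++;
        return malloc(bytes);
    }
    size_t bin = bin_for(bytes);
    FreeNode *node = fsa->free_list[bin];
    if (node) {                     /* reuse a previously freed chunk */
        fsa->free_list[bin] = node->next;
        return node;
    }
    fsa->malloc_calls++;
    return malloc((bin + 1) << BIN_BITS);  /* round up to the bin size */
}

void toy_fsa_free(ToyFSA *fsa, size_t bytes, void *ptr) {
    if (bytes == 0)
        bytes = 1;
    if (bytes > MAX_BINNED) {
        free(ptr);
        return;
    }
    /* Push the chunk onto its bin's free list instead of calling free. */
    FreeNode *node = ptr;
    size_t bin = bin_for(bytes);
    node->next = fsa->free_list[bin];
    fsa->free_list[bin] = node;
}
```

The point for short-lived, similarly sized objects such as callsite flag arrays is that a freed chunk is handed back to the next request in the same size class without going through malloc again.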
MasterDuke | oh, i guess the problem with the multiple argument throw_adhoc macro is that the arguments could be any type... | 22:42 | |
Geth | MoarVM/new-disp: df6d3c945a | (Jonathan Worthington)++ | src/spesh/osr.c Log more info about OSR when it doesn't work out |
22:54 | |
MoarVM/new-disp: 366cd2252a | (Jonathan Worthington)++ | src/spesh/osr.c Log more info about OSR when it doesn't work out |
22:56 | ||
MasterDuke | but if the cases i actually care about are just strings, then `mov ARG3, throw_arg_1` should be fine, right? | ||
jnthnwrthngtn | MasterDuke: I don't know, one'd have to look up the ABI for varargs (and then make sure it's the same for POSIX and Windows). | 22:58 | |
MasterDuke | seems like it'd make sense to PR any changes against master, so CI can actually test it | 23:00 | |
jnthnwrthngtn | I'm sure I've seen folks use NativeCall to call varargs things by just declaring functions that pass extra args which implies that the ABI is the same in...their situation. | ||
The answer may be "it's way unportable in general but OK for x64" | |||
MasterDuke | yeah, my rough googling suggests it's cleaner for x64 than it was for x86 | 23:01 | |
jnthnwrthngtn | docs.microsoft.com/en-us/cpp/build...60#varargs | ||
MasterDuke | and i may get away with it for a small number of args | ||
jnthnwrthngtn | That makes it look like you get away with it on Windows for x64 | 23:02 | |
MasterDuke | yeah, was looking at that earlier. glad to know my interpretation seems to line up | ||
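The C side of the varargs question can be sketched as follows. `format_adhoc` is a hypothetical stand-in for a printf-style throw function, not the macro in emit.dasc or any MoarVM function; it only shows a variadic callee consuming extra string arguments. Whether JIT-emitted code can simply place those strings in the next argument registers depends on the platform ABI, as discussed above. On the System V x86-64 ABI, for instance, a variadic call also expects %al to hold an upper bound on the number of vector registers used.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical printf-style helper: formats a message with extra
 * arguments taken from the variadic list into buf. */
int format_adhoc(char *buf, size_t len, const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int written = vsnprintf(buf, len, fmt, ap);
    va_end(ap);
    return written;
}
```

A call such as `format_adhoc(buf, sizeof buf, "no such op '%s' in %s", "foo", "bar")` passes only pointer-sized arguments, which is the simple case the conversation is concerned with.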
Geth | MoarVM/put_callsites_in_fsa: 2c3a99703e | (Timo Paulssen)++ | 5 files Store callsites and its arrays in the FSA |
23:16 | |
timo | huh. Ir went from 715m to 850m from without to with callsites in fsa | 23:24 | |
i didn't set spesh to blocking or anything tho, maybe i should've done that. and hash randomization off as well | |||
that at least keeps the number stable while running the same branch | 23:26 | ||
oh, no, i got it backwards apparently | |||
MasterDuke | it would be nice to be able to turn hash randomization off via env variable | 23:27 | |
timo | ok it goes from like 889.2 mil to 887.7 mill? | 23:30 | |
MasterDuke | 2m fewer, that's good | 23:31 | |
timo | m: say 889.2 * 100 / 887.7 | 23:32 | |
camelia | 100.168976 | ||
timo | m: say 889.2 R/ 887.7 * 100 | ||
camelia | 99.831309 | ||
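The two evaluations above are the same ratio taken in both directions. A small C sketch of that arithmetic, with `percent_of` as a hypothetical helper name:

```c
/* Express value as a percentage of baseline. */
double percent_of(double value, double baseline) {
    return value * 100.0 / baseline;
}
```

With the figures quoted, `percent_of(889.2, 887.7)` corresponds to the first result and `percent_of(887.7, 889.2)` to the second, so the branch runs roughly 0.17% fewer instructions.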
MasterDuke | what if you make the size of the fsa bins bigger? i had a pr that made just the first page bigger, but istr discussion that maybe they should just all be bigger as we increased the use of the fsa | 23:35 | |
timo | what number would i get for master tho? | 23:36 | |
i don't know how far that would help things | 23:37 | ||
MasterDuke | what do you mean what number for master? | 23:39 | |
timo | Ir for -e '' | ||
MasterDuke | '822,514,653 instructions' from `MVM_SPESH_BLOCKING=1 perf stat ./install/bin/raku -e ''` on master | 23:40 | |
timo | well, that's not great | ||
MasterDuke | but we already know master is faster at startup, right? | 23:41 | |
i wonder how comparable the numbers are across possible cpu/os/compiler differences? | 23:42 | ||
timo | right | ||
interestingly, the difference between amount of calls to fixed_size_alloc seems very small | |||
oh, i looked at the wrong comparison file | 23:43 | ||
77k to 111k | |||
MasterDuke | interesting, callgrind reports 772m | 23:45 | |
and cachegrind reports 773m | |||
perf stat numbers are much more variable, as low as 799m and as high as 827 | 23:46 | ||
but i was only using MVM_SPESH_BLOCKING=1, i hadn't disabled hash randomization | |||
timo | 18k calls from callsite_drop_positional, 365.3k calls to _int_malloc (which both malloc and calloc call) in one, 396.8k in the other | 23:47 | |
̵ʼʼʼʼʼʼʼʼʼʼʼ | 23:51 | ||
MasterDuke | is that a raku interpreter in befunge? | 23:53 | |
timo | maybe we should invent hq9+r with the "r" command that starts a rakudo | 23:54 | |
23:59
evalable6 left,
linkable6 left
|
|||
MasterDuke is to bed | 23:59 |