github.com/moarvm/moarvm | IRC logs at colabti.org/irclogger/irclogger_logs/moarvm Set by AlexDaniel on 12 June 2018. |
|||
01:33
evalable6 left,
linkable6 left
01:36
linkable6 joined,
evalable6 joined
02:16
squashable6 left
02:18
squashable6 joined
04:01
MasterDuke left
|
|||
nwc10 | good *, #moarvm | 06:00 | |
06:48
MasterDuke joined
07:41
colomon__ joined,
colomon_ left
08:19
sena_kun joined
|
|||
jnthn | morning o/ | 08:54 | |
tellable6 | 2020-09-01T14:04:13Z #raku <[Coke]> jnthn: can you add your employer to the RSC nomination doc? | ||
nwc10 | \o | 09:02 | |
jnthn | Uff, doing a rebase of new-disp onto master isn't quite easy... | 09:34 | |
nwc10 | my fault? :-( | 09:39 | |
jnthn | Yeah, though it's maybe not quite so bad as I feared | 09:41 | |
Coffee is starting to work... :) | |||
timotimo excited to hear about new-disp getting back onto jnthn's desk | |||
nwc10 | I *had* thought that ASAN was excited by new-disp, but I can't recreate it currently | 09:42 | |
09:45
Altai-man joined
09:47
sena_kun left
|
|||
jnthn | src/disp/syscall.h:36:5: error: unknown type name ‘UT_hash_handle’ UT_hash_handle hash_handle; | 09:51 | |
Yeah, I introduced a new hash... | |||
nwc10 | oh righto. mmm. | 09:52 | |
timotimo | shouldn't be harder to use the new hash kind | 09:53 | |
jnthn | So now I guess I need to figure out how to adapt this. | ||
timotimo | but is it enough to change some names? | ||
like, type names and function names? | |||
jnthn | Dunno. | 09:54 | |
timotimo | nwc10 would know :) | ||
jnthn | nwc10: It's a hash keyed on MVMString* (though can be on char* too) to a C structure; which is the appropriate kind of hash? | 09:55 | |
nwc10 | MVM_fixkey_hash with `name` as the key? | ||
at a guess | |||
because that's the one where the body doesn't move once you've created one | |||
jnthn | No, in fact the bodies are `static` data :) | ||
So it's not even allocated | |||
nwc10 | hangon, I'm a bit struggling with too many things here. $ork exists | 09:56 | |
the cheating suggestion is | |||
timotimo | $ork and the alien mindbenders | ||
nwc10 | look at many of the things like the dll registry and the extops registry and the nativecall stuff | ||
and see which one is closest to the semantics needed | 09:57 | ||
and cheat from there | |||
and it looks like one of the others, but I forget which | |||
[grepping] | |||
MVMContainerRegistry | 10:00 | ||
commit 040e9d6358cd65dfafb6e5403148c5f72c359e63 | |||
linkable6 | (2020-05-31) github.com/MoarVM/MoarVM/commit/040e9d6358 Implement MVMStrHash, which is UT Hash rearranged to be "right way out". | ||
nwc10 | IIRC that was the one with 3 static structures defined in the C code, and then a hash lookup needed for them. | 10:01 | |
jnthn tries to grok fixkey vs str | 10:07 | ||
nwc10 | fixkey indirects | ||
it keeps the allocated "thing" in a block of memory that doesn't move | |||
jnthn | ah, ok | ||
nwc10 | regular str inlines the allocated "thing" in the top level (ie only) array of hash entries | 10:08 | |
so it might move (on insert or delete of other keys) | 10:09 | ||
jnthn | I'm not allocating anything here really; it's all static memory | ||
nwc10 | yes, so a small C structure, which has a pointer to the static memory | ||
jnthn | ah, I need to introduce another struct. OK | ||
nwc10 | might be the right thing (in a "str" hash) | ||
jnthn | Yeah, that's what container configurer does. Was trying to see if I could avoid that :) | ||
nwc10 | but for the "str" hash you can't perma-root the MVMString * keys | ||
(because they might move) | |||
whereas a fixkey hash, because of the indirection | 10:10 | ||
the key can be in the permanent root set | |||
jnthn | ah, my key is in the permanent root set | 10:11 | |
nwc10 | then that means either: fixkey hash | 10:12 | |
or "take it out of the permanent roots set, str hash, and adding code to root.c" | |||
the former seems easier. | |||
jnthn | OK, back to fixkey hash then :) | ||
nwc10 | when I was making those sort of choices, I was guessing also that it was slightly more performant. trade off being "indirection" vs "an iterator loop for every GC run" | 10:13 | |
MasterDuke | i'm not paying 100% attention here, but would this not-moving thing be useful to reinstate that spesh optimization that had to be tossed? the sp_gethashentryvalue op | 10:16 | |
nwc10 | likely not, because that's for the MVMHash | ||
which is a 2 pointer structure | |||
and you'd have to replace it with a 1 pointer indirecting to a 2 pointer struct | 10:17 | ||
so 50% larger, 1 more pointer chase, for *every* NQP-visible hash | |||
If the NQP implementation exposed two different hashes to the NQP language level | 10:18 | ||
one where "smaller, but things move" and "slightly larger, but they don't" | |||
jnthn | Hm, but given I've static memory to point at, and fixkey hash seems to want to allocate for me anyway, so I'll need an indirection struct whatever I do... | 10:19 | |
nwc10 | then the relevant code could choose to use the "slightly larger" hash for whatever-it-is that matters | ||
jnthn: yes, sometimes you can't win :-( | |||
jnthn | ...then it may be more efficient to be using the strhash, and just having it marked | ||
Since iiuc the fixkey hash will make it 2 levels of indirection | |||
nwc10 | it would | ||
jnthn | OK, then strhash | 10:20 | |
nwc10 | but | ||
as I sort of am guessing it here | |||
if your "static" structure is `struct MVMDispSysCall` | |||
then I think you end up needing | |||
struct MVMDispSysCall { | 10:21 | ||
struct MVMStrHashHandle hash_handle; | |||
... all the other stuff | |||
}; | |||
where "all the other stuff" is all the current members, *except* `name` and `hash_handle`; | 10:22 | ||
because the new hash_handle struct contains the key, which is the name. | |||
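A minimal C sketch of the layout difference nwc10 describes above; it is not MoarVM's implementation. `InlineTable` stores entries directly in one growable array (as the "str" hash does), so an entry's address may change when the array is reallocated. `IndirectTable` stores pointers to separately allocated entries (as the fixkey hash does), so an entry's address stays fixed at the cost of one extra pointer chase. All names here (`Entry`, `InlineTable`, `IndirectTable` and their functions) are hypothetical, and lookup is a linear scan rather than real hashing, to keep the focus on where the entries live.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical payload, standing in for something like a syscall descriptor. */
typedef struct {
    const char *name;
    int value;
} Entry;

/* "str"-style layout: entries live inline in a single array. */
typedef struct {
    Entry *slots;
    size_t used, cap;
} InlineTable;

/* "fixkey"-style layout: the array only holds pointers to entries. */
typedef struct {
    Entry **slots;
    size_t used, cap;
} IndirectTable;

/* Returns a pointer into the array; any later add may invalidate it. */
Entry *inline_add(InlineTable *t, const char *name, int value) {
    assert(name != NULL);
    if (t->used == t->cap) {
        size_t cap = t->cap ? t->cap * 2 : 4;
        Entry *p = realloc(t->slots, cap * sizeof *p);
        if (!p) return NULL;
        t->slots = p;   /* every existing entry may have moved */
        t->cap = cap;
    }
    Entry *e = &t->slots[t->used++];
    e->name = name;
    e->value = value;
    return e;
}

Entry *inline_find(const InlineTable *t, const char *name) {
    for (size_t i = 0; i < t->used; i++)
        if (strcmp(t->slots[i].name, name) == 0) return &t->slots[i];
    return NULL;
}

/* Returns a pointer that stays valid: only the pointer array is regrown. */
Entry *indirect_add(IndirectTable *t, const char *name, int value) {
    assert(name != NULL);
    if (t->used == t->cap) {
        size_t cap = t->cap ? t->cap * 2 : 4;
        Entry **p = realloc(t->slots, cap * sizeof *p);
        if (!p) return NULL;
        t->slots = p;   /* the pointers moved, the entries did not */
        t->cap = cap;
    }
    Entry *e = malloc(sizeof *e);
    if (!e) return NULL;
    e->name = name;
    e->value = value;
    t->slots[t->used++] = e;
    return e;
}

Entry *indirect_find(const IndirectTable *t, const char *name) {
    for (size_t i = 0; i < t->used; i++)
        if (strcmp(t->slots[i]->name, name) == 0) return t->slots[i];
    return NULL;
}
```

A pointer returned by `indirect_add` can be registered once (for example in a permanent root set) and remains valid, whereas a pointer returned by `inline_add` should be treated as stale after any later insert.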
MasterDuke | too bad, definite slowdown in accessing dyn vars compared to 2020.07 | 10:39 | |
jnthn | I've got that bit building again and am through the rebase, but the end result doesn't build 'cus it iterated a hash somewhere that has now gone away | 10:40 | |
ah, I just need to find the new way to iterate an MVMHash in code :) | 10:45 | ||
uh, from C code | |||
MasterDuke | committable6: 8fd029a3025307f8ccac11adeed77d170408b5a8~1,8fd029a3025307f8ccac11adeed77d170408b5a8 my $a; $a := $*OUT for ^5_000_000; say now - INIT now | 10:48 | |
committable6 | MasterDuke, ¦8fd029a3025307f8ccac11adeed77d170408b5a8~1: «6.3943207␤» ¦8fd029a: «7.86781435␤» | 10:49 | |
Geth | MoarVM/new-disp: 156 commits pushed by (Jonathan Worthington)++, (Timo Paulssen)++, (Daniel Green)++ review: github.com/MoarVM/MoarVM/compare/f...5fbb377c08 |
10:51 | |
jnthn | Well, it builds NQP and passes the NQP tests on the new-disp branch, so it can't be *that* broken | 10:52 | |
timotimo | whew, already 156 commits | 10:53 | |
jnthn | I didn't go back and rework all the commits in the branch to actually build, alas | ||
Just have the fixup commits at the end | 10:54 | ||
timotimo | MasterDuke: looks like finding/accessing dynamic variables got a bit faster? | ||
does that replicate on a local machine that's not doing anything else? | |||
MasterDuke | no the ~1 is before the nqp bump that brought in the moarvm with the new hash | 10:55 | |
timotimo | ah dang | ||
MasterDuke | i haven't tried to replicate locally. but a couple of different runs with committable6 show relatively consistent results | ||
jnthn | o MoarVM syscall with name 'dispatcher-delegate' at gen/moar/BOOTSTRAP/v6c.nqp:4204 (/home/jnthn/dev/rakudo/blib/Perl6/BOOTSTRAP/v6c.moarvm:) | 10:56 | |
oh, right, I probably need to make the str keys in that strhash :) | |||
*mark | |||
MasterDuke | linkable6: 8fd029a | ||
Geth: 8fd029a | |||
timotimo | shareable6: 8fd029a | 10:57 | |
shareable6 | timotimo, whateverable.6lang.org/8fd029a | ||
timotimo | this one? | ||
MasterDuke | i was just trying to get a link to the commit in the rakudo repo | ||
timotimo | oh | ||
linkable6: rakudo/8fd029a | 10:58 | ||
linkable6: help | |||
linkable6 | timotimo, Like this: R#1946 D#1234 MOAR#768 NQP#509 SPEC#242 RT#126800 S09:320 524f98cdc # See wiki for more examples: github.com/Raku/whateverable/wiki/Linkable | ||
timotimo | ok it's supposed to be able to do commits, but perhaps that one was too short? | ||
MasterDuke | 8fd029a3025307f8ccac11adeed77d170408b5a8 | ||
linkable6 | (2020-08-31) github.com/rakudo/rakudo/commit/8fd029a302 Bump NQP to get nwc10++ improved hash implementation | ||
Geth | MoarVM/new-disp: b2656d7011 | (Jonathan Worthington)++ | src/gc/roots.c Mark dispatcher syscall hash |
10:59 | |
MasterDuke | hm. maybe committable6, etc need to give longer hashes | ||
timotimo | actually would hope that committable and friends wouldn't always trigger linkable | 11:00 | |
which of course it could just filter by nickname | |||
jnthn | Yup, now Rakudo's rebased new-disp branch builds | ||
Pushed rebases of all the new-disp | 11:01 | ||
timotimo | great | ||
jnthn | Which means...lunch time :) bbl | 11:02 | |
timotimo | you would really expect the dynvar cache to give you performance unrelated to hash access? or is it hash-based too? | 11:07 | |
there's a dynvar cache performance analysis thing in moarvm plus a script to actually analyze the data it spits out | 11:08 | ||
OK, MVM_get_lexical_by_name is the top result and it does have MVM_index_hash_fetch | 11:13 | ||
but it could be that for this code it's actually not going via the hash route | 11:14 | ||
there's a "if there's not enough lexical names, just linear-scan them" in there, too | |||
and MVM_string_equal takes a lot of time there | |||
oh, it's currently looking for $*PROMISE while inside &DYNAMIC | 11:24 | ||
nine | Yes, IIRC that's where the sp_gethashentryvalue optimization helped | 11:25 | |
timotimo | who wants to measure if giving root temp push checking for going past the allocated number of temp roots an MVM_LIKELY or MVM_UNLIKELY makes any performance difference at all? | 11:33 | |
quite possible that gcc / clang already get it the right way around | |||
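A rough C sketch of the kind of change timotimo is asking about, assuming a GCC- or Clang-style `__builtin_expect`. The `SKETCH_LIKELY` / `SKETCH_UNLIKELY` macros, the `TempRoots` struct and both functions are hypothetical stand-ins, not MoarVM's actual temp root code. The hint only marks the "ran out of room" branch as the rare path; whether it changes the generated code or the timings would still need to be measured.

```c
#include <assert.h>
#include <stdlib.h>

/* Branch hints; they fall back to plain conditions on other compilers. */
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_LIKELY(c)   __builtin_expect(!!(c), 1)
#define SKETCH_UNLIKELY(c) __builtin_expect(!!(c), 0)
#else
#define SKETCH_LIKELY(c)   (c)
#define SKETCH_UNLIKELY(c) (c)
#endif

/* Hypothetical stack of temporarily rooted references. */
typedef struct {
    void ***roots;   /* addresses of the rooted pointers */
    size_t num;      /* entries in use */
    size_t alloc;    /* entries allocated */
} TempRoots;

/* Push a root; returns 1 on success, 0 if growing the stack failed. */
int temp_root_push(TempRoots *tr, void **ref) {
    assert(ref != NULL);
    /* Growing is expected to be rare, so hint it as the cold path. */
    if (SKETCH_UNLIKELY(tr->num == tr->alloc)) {
        size_t alloc = tr->alloc ? tr->alloc * 2 : 16;
        void ***grown = realloc(tr->roots, alloc * sizeof *grown);
        if (!grown) return 0;
        tr->roots = grown;
        tr->alloc = alloc;
    }
    tr->roots[tr->num++] = ref;
    return 1;
}

/* Pop the most recently pushed root. */
void temp_root_pop(TempRoots *tr) {
    assert(tr->num > 0);
    tr->num--;
}
```

Comparing the compiler's assembly output with and without the hint is a cheap first check before benchmarking, since the compiler may already lay out the growth branch as the cold path.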
(gdb) print name[0].body.storage.strands[0] | 11:38 | ||
$40 = {blob_string = 0x34cd7f0, start = 13, end = 14, repetitions = 0} | |||
(gdb) print name[0].body.storage.strands[1] | |||
$41 = {blob_string = 0x34cd7f0, start = 14, end = 15, repetitions = 0} | |||
(gdb) print name[0].body.storage.strands[2] | |||
$42 = {blob_string = 0x34cd7f0, start = 15, end = 18, repetitions = 0} | |||
pfft. | |||
this is $*OUT btw | |||
i thought we had an optimization where concatenating a string to another when the parts fit it will just extend the last strand | 11:39 | ||
11:46
sena_kun joined
11:48
Altai-man left
11:50
domidumont joined
11:51
domidumont left
|
|||
nwc10 | jnthn: to avoid an assert() fail you need: | 11:56 | |
-#define MVM_REPR_CORE_COUNT 48 | |||
+#define MVM_REPR_CORE_COUNT 47 | |||
NQP happy, Rakudo has failures, those that I looked at were "Use of Nil in string contex" | 12:00 | ||
er, context. | |||
jnthn | Hm, wonder how that didn't end up conflicting somewhere along the way | 12:15 | |
Hm, a bunch of new SIGSEGV in spectest | 12:16 | ||
nwc10 | I didn't reach the spectest | ||
jnthn | Oh? | 12:17 | |
ah, debug build maybe | |||
Geth | MoarVM/new-disp: e3e771400a | (Jonathan Worthington)++ | src/6model/reprs.h Correct REPR count; nwc10++ |
||
12:21
krunen left
12:22
krunen joined
|
|||
jnthn | grmbl, none of these feel inclined to SEGV when not being run as part of make spectest | 12:23 | |
nwc10 | I know exactly this feeling | 12:25 | |
timings? | |||
or worse | |||
I have this problem quite often | |||
jnthn | Dunno, the ones I'm looking at should, in theory, not be timing sensitive. All running with spesh blocking turned on | 12:32 | |
timotimo | perhaps rr's "chaos mode" could help tease out the problematic execution path | 12:47 | |
otherwise, rr can record multi-process-containing things, too | 12:50 | ||
12:52
brrt joined
13:06
domidumont joined
|
|||
nwc10 | I tried rr but it doesn't like all the AMD CPUs I seem to have everywhere. | 13:07 | |
brrt | \o | 13:10 | |
what's the problem that requires rr | 13:11 | ||
13:11
krunen left
13:13
domidumont left
|
|||
nwc10 | o/ | 13:13 | |
13:16 < jnthn> Hm, a bunch of new SIGSEGV in spectest | |||
13:23 < jnthn> grmbl, none of these feel inclined to SEGV when not being run as part of make spectest | |||
13:25 < nwc10> I know exactly this feeling | |||
13:15
squashable6 left,
domidumont joined
13:16
squashable6 joined
|
|||
brrt | ah, I see | 13:17 | |
yeah, that's hateful | |||
timotimo | rr uses a performance counter from the cpu that is apparently not reliable/exact enough on some AMD CPUs | 13:19 | |
[Coke] | jnthn: ignore that employer request. (sent the initial ask directly to a few folks, followup based on convo with lizmat to disregard was just tagging several people) | 13:21 | |
nwc10 | timotimo: that part I didn't know. I didn't spot any "why" as to why AMD wasn't supported. I had just assumed "we're all using Intel; patches welcome" | ||
jnthn | [Coke]: I already added the info anyway | ||
[Coke] | jnthn++ | ||
I think everyone knew yours already. | 13:22 | ||
jnthn | :D | ||
Goodness, getting back into the dispatch stuff is some effort. :) | |||
Probably because I left it at the point I had to solve a hard problem. :) | |||
timotimo | i got that from a ticket tracking the issue | 13:24 | |
github.com/mozilla/rr/issues/2034 | |||
nwc10 | aha, I didn't go looking into tickets | 13:26 | |
13:26
krunen joined
|
|||
jnthn | greppable6: &callsame | 13:46 | |
greppable6 | jnthn, 2 lines, 1 module: gist.github.com/3f61b735be32fb4926...e714f3564e | ||
14:10
domidumont left
14:11
domidumont joined
14:13
domidumont left,
domidumont joined
|
|||
MasterDuke | hm. where was i with that zen slice vs whatever slice investigation? | 14:13 | |
i think i initially got sidetracked by the oddness with char ranges, but that was a red herring | 14:14 | ||
timotimo | that was a startling performance difference, right? | 14:17 | |
MasterDuke | m: my @a = (^2_000).pick(*); my @b; @b = @a[] for ^1_000; say @b[(^@a).pick]; say @b[0]; say now - INIT now | ||
camelia | 1836 1606 1.88939135 |
||
MasterDuke | m: my @a = (^2_000).pick(*); my @b; @b = @a[*] for ^1_000; say @b[(^@a).pick]; say @b[0]; say now - INIT now | ||
camelia | 421 1632 4.4834577 |
||
MasterDuke | the whatever slice ends up in a cycle involving N calls to reify-at-least, that then each create a cycle involving two or three other calls and then back in reify-at-least | 14:21 | |
timotimo | did i ever compare them with callgrind? | ||
MasterDuke | i don't recall seeing any numbers | ||
nwc10 | I/O, I/O, it's off to async purgatory we go. | ||
I'm still not convinced that async is the future | |||
timotimo | i hope it isn't a case of the profiler going wild | 14:22 | |
nwc10 | and the failure mode seems to be purgatory. Not hell. | ||
MasterDuke | but i'm not profiling and you can see the difference above | ||
nwc10 | you know where you are with hell. | ||
timotimo | right, i meant more with that cycle you're referring to | ||
MasterDuke | and no, i added a print in reify-at-least and it goes crazy | ||
timotimo | OK | 14:23 | |
MasterDuke | the numbers match up with what the profiler showed | ||
with the array being just 100 elements, callgrind says 3.5B instructions for whatever, 1.3B instructions for zen | 14:27 | ||
doubling it to 200 elements give 6.1B for whatever, 1.9B for zen | 14:32 | ||
14:34
brrt left
|
|||
timotimo | it spends a whole lotta time "resolve_using_guards"ing | 14:40 | |
so my guess is perhaps in the one case it succeeds in inlining something very hot, in the other it doesn't | |||
MasterDuke | there are a ton of deopts | ||
pretty much one for every call to reify-at-least that's not a start of a cycle | 14:41 | ||
i think it's doing a type specialization for reify-at-least after the first couple hundred are all for one type, but then almost all of the rest of the hundreds of thousands of calls are with a different type | 14:44 | ||
timotimo: do you know the answer to this question? colabti.org/irclogger/irclogger_lo...-08-23#l28 | 14:46 | ||
jnthn | MasterDuke: It's correct; it never backs out at the moment. | 14:47 | |
MasterDuke | k | ||
jnthn: btw, did you happen to see that recent article (i saw it on HN) about jits and specializations and such? | 14:48 | ||
jnthn | Hm, which one? :) | ||
MasterDuke | it seemed like the sort of thing you'd be interested in | ||
trying to think of some search terms to find it... | 14:50 | ||
timotimo | i wonder if it would be something for the spesh log | 14:52 | |
putting deopts in there | |||
MasterDuke | jnthn: webkit.org/blog/10308/speculation-...criptcore/ | 14:54 | |
timotimo: i manually added some prints, let me see if i can find that output | 14:55 | ||
jnthn | Hadn't seen that one; thanks | ||
MasterDuke | 246477 Deopt ones requested by interpreter because of sp_guardtype, got 'Rakudo::Iterator::Gather' but want 'Iterator' | 14:56 | |
492954 Deopt ones requested by interpreter because of sp_guardtype, got 'List::Todo' but want 'Iterator' | |||
timotimo: is ^^^ the sort of info you were thinking of? | 14:58 | ||
timotimo | right | 15:01 | |
but i mean specifically the MVMSpeshLog that records what happened in code to build specializations from | 15:02 | ||
MasterDuke | how easy would it be to "undo" a speshialization? count deopts and if they hit some multiple of successful calls remove the specialization and generate a new one? | ||
what would that get us in this case? | 15:03 | ||
it = the MVMSpeshLog | 15:04 | ||
*that | |||
timotimo | well, it's a relatively cheap way to record stuff-that-happened | ||
since the executing thread can just fire more and more info in there and the spesh worker will later figure out how it fits together | 15:05 | ||
MasterDuke | it seems to me that the underlying problem is the cycle that's happening. the cost of it is only exacerbated by the deopts | ||
timotimo | when it's in the interpreter, we could find out which exact guard in the bytecode does the deopt | 15:07 | |
and whether it's near the beginning or end | |||
like, if it deopts near the end, the impact shouldn't be as terrible | 15:09 | ||
if it deopts near the beginning, we'd be running at no-spesh speeds | |||
MasterDuke | ah, right | ||
timotimo | oh, can you measure how fast it runs when spesh is turned off? | 15:10 | |
MasterDuke | for an array of 2k elems, 5.9s for whatever, 1.3s for zen | 15:11 | |
timotimo | and your local timings with spesh were what again? | ||
MasterDuke | approx 2.8s for whatever, 0.2s for zen | 15:13 | |
jnthn | Wow, even with the deopt horror it still comes out ahead... | ||
MasterDuke | committable6: releases my @a = (^1_000).pick(*); my @b; @b = @a[*] for ^1_000; say @b[0]; say now - INIT now | 15:15 | |
committable6 | MasterDuke, gist.github.com/9143496e37dedcce0f...4e1df64aff | 15:17 | |
MasterDuke | eh, not seeing anything terribly useful in there | 15:18 | |
15:45
Altai-man joined
15:48
sena_kun left
|
|||
jnthn | This took quite a bit of coffee and snacks, but I think I've finally found a way to make a bunch of different outstanding dispatch needs boil down to one key addition to the model... Rough notes at gist.github.com/jnthn/1c1d717a2351...5d3135c32a | 16:02 | |
I *think* this will all be enough for the specializer to see what is going on too | 16:03 | ||
nwc10 | 'naks! My niece knows about 'naks! | 16:09 | |
I have read that and I can't comment usefully on it. | 16:11 | ||
jnthn | I wrote that and I'm not sure I can :P | 16:18 | |
16:19
zakharyas joined
|
|||
jnthn | But I'm happy that it at least seems to be one mechanism to deal with many things. | 16:19 | |
nwc10 | I was going to comment non-usefully that I'm sure that "cats" is a good thing, but then I realised that it's CATS and that seems, um, LTA: en.wikipedia.org/wiki/All_your_bas...long_to_us | 16:21 | |
16:43
brrt joined
17:19
domidumont left,
domidumont joined
17:44
brrt left
18:31
sena_kun joined
18:32
Altai-man left
18:36
brrt joined
18:42
domidumont left
19:44
AlexDaniel left
19:45
AlexDaniel joined,
AlexDaniel left,
AlexDaniel joined,
Altai-man joined
19:47
leont joined
19:48
sena_kun left
|
|||
MasterDuke | well, after a delay, i'm back at this. i think the deopt business is a larger scope problem that's not required to be fixed to improve the example above's behavior. i.e., fixing the reify-at-least cycle will also get rid of the deopts | 20:10 | |
21:16
zakharyas left
21:39
brrt left
22:00
[Coke] joined,
[Coke] left,
[Coke] joined
|
|||
timotimo | MasterDuke: do you think you can individually back out the optimization that relied on hash entries not moving? | 22:53 | |
that way we can split apart "merge of new hash" and "removal of that one optimization" | 22:54 | ||
oh, wait, that's for the dynamic variable thing | |||
not the star vs zen slice one | |||
23:17
MasterDuke left
23:46
sena_kun joined
23:48
Altai-man left
|