| IRC logs at
Set by AlexDaniel on 12 June 2018.
01:33 evalable6 left, linkable6 left 01:36 linkable6 joined, evalable6 joined 02:16 squashable6 left 02:18 squashable6 joined 04:01 MasterDuke left
nwc10 good *, #moarvm 06:00
06:48 MasterDuke joined 07:41 colomon__ joined, colomon_ left 08:19 sena_kun joined
jnthn morning o/ 08:54
tellable6 2020-09-01T14:04:13Z #raku <[Coke]> jnthn: can you add your employer to the RSC nomination doc?
nwc10 \o 09:02
jnthn Uff, doing a rebase of new-disp onto master isn't quite easy... 09:34
nwc10 my fault? :-( 09:39
jnthn Yeah, though it's maybe not quite so bad as I feared 09:41
Coffee is starting to work... :)
timotimo excited to hear about new-disp getting back onto jnthn's desk
nwc10 I *had* thought that ASAN was excited by new-disp, but I can't recreate it currently 09:42
09:45 Altai-man joined 09:47 sena_kun left
jnthn src/disp/syscall.h:36:5: error: unknown type name ‘UT_hash_handle’ UT_hash_handle hash_handle; 09:51
Yeah, I introduced a new hash...
nwc10 oh righto. mmm. 09:52
timotimo shouldn't be harder to use the new hash kind 09:53
jnthn So now I guess I need to figure out how to adapt this.
timotimo but is it enough to change some names?
like, type names and function names?
jnthn Dunno. 09:54
timotimo nwc10 would know :)
jnthn nwc10: It's a hash keyed on MVMString* (though can be on char* too) to a C structure; which is the appropriate kind of hash? 09:55
nwc10 MVM_fixkey_hash with `name` as the key?
at a guess
because that's the one where the body doesn't move once you've created one
jnthn No, in fact the bodies are `static` data :)
So it's not even allocated
nwc10 hangon, I'm a bit struggling with too many things here. $ork exists 09:56
the cheating suggestion is
timotimo $ork and the alien mindbenders
nwc10 look at many of the things like the dll registry and the extops registry and the nativecall stuff
and see which one is closest to the semantics needed 09:57
and cheat from there
and it looks like one of the others, but I forget which
MVMContainerRegistry 10:00
commit 040e9d6358cd65dfafb6e5403148c5f72c359e63
linkable6 (2020-05-31) Implement MVMStrHash, which is UT Hash rearranged to be "right way out".
nwc10 IIRC that was the one with 3 static structures defined in the C code, and then a hash lookup needed for them. 10:01
jnthn tries to grok fixkey vs str 10:07
nwc10 fixkey indirects
it keeps the allocated "thing" in a block of memory that doesn't move
jnthn ah, ok
nwc10 regular str inlines the allocated "thing" in the top level (ie only) array of hash entries 10:08
so it might move (on insert or delete of other keys) 10:09
jnthn I'm not allocating anything here really; it's all static memory
nwc10 yes, so a small C structure, which has a pointer to the static memory
jnthn ah, I need to introduce another struct. OK
nwc10 might be the right thing (in a "str" hash)
jnthn Yeah, that's what container configurer does. Was trying to see if I could avoid that :)
nwc10 but for the "str" hash you can't perma-root the MVMString * keys
(because they might move)
whereas a fixkey hash, because of the indirection 10:10
the key can be in the permanent root set
jnthn ah, my key is in the permanent root set 10:11
nwc10 then that means either: fixkey hash 10:12
or "take it out of the permanent roots set, str hash, and adding code to root.c"
the former seems easier.
jnthn OK, back to fixkey hash then :)
nwc10 when I was making those sort of choices, I was guessing also that it was slightly more performant. trade off being "indirection" vs "an iterator loop for every GC run" 10:13
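The distinction nwc10 draws above (inline entries that may move versus entries reached through one extra pointer) can be illustrated with a toy sketch. Nothing here is MoarVM's API: `Entry`, `InlineTable`, `IndirectTable`, `inline_add` and `indirect_add` are hypothetical names, and hashing is left out so that only the storage layout is visible.

```c
#include <stdlib.h>
#include <string.h>

/* Toy payload standing in for whatever a hash entry carries (hypothetical). */
typedef struct { const char *key; int value; } Entry;

/* "str"-style layout: entries live inline in one growable array. */
typedef struct { Entry *slots; size_t used, cap; } InlineTable;

/* "fixkey"-style layout: the array holds pointers; each entry is allocated once. */
typedef struct { Entry **slots; size_t used, cap; } IndirectTable;

/* Append inline. Growing copies every entry into a new block, so addresses
 * of existing entries change. Returns the index, or (size_t)-1 on failure. */
size_t inline_add(InlineTable *t, const char *key, int value) {
    if (t->used == t->cap) {
        size_t new_cap = t->cap ? t->cap * 2 : 2;
        Entry *fresh = malloc(new_cap * sizeof(Entry));
        if (!fresh) return (size_t)-1;
        if (t->used) memcpy(fresh, t->slots, t->used * sizeof(Entry));
        free(t->slots);
        t->slots = fresh;
        t->cap = new_cap;
    }
    t->slots[t->used].key = key;
    t->slots[t->used].value = value;
    return t->used++;
}

/* Append through a pointer. Growing moves only the pointer array; the entry
 * itself stays where it was allocated. Returns the entry, or NULL on failure. */
Entry *indirect_add(IndirectTable *t, const char *key, int value) {
    if (t->used == t->cap) {
        size_t new_cap = t->cap ? t->cap * 2 : 2;
        Entry **fresh = malloc(new_cap * sizeof(Entry *));
        if (!fresh) return NULL;
        if (t->used) memcpy(fresh, t->slots, t->used * sizeof(Entry *));
        free(t->slots);
        t->slots = fresh;
        t->cap = new_cap;
    }
    Entry *e = malloc(sizeof(Entry));
    if (!e) return NULL;
    e->key = key;
    e->value = value;
    t->slots[t->used++] = e;
    return e;
}
```

In this toy, a pointer taken into the inline table is only valid until the next growth, which is the reason given above for not perma-rooting keys stored in a "str" hash. The indirect layout pays one extra pointer chase per access in exchange for stable addresses.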
MasterDuke i'm not paying 100% attention here, but would this not-moving thing be useful to reinstate that spesh optimization that had to be tossed? the sp_gethashentryvalue op 10:16
nwc10 likely not, because that's for the MVMHash
which is a 2 pointer structure
and you'd have to replace it with a 1 pointer indirecting to a 2 pointer struct 10:17
so 50% larger, 1 more pointer chase, for *every* NQP-visible hash
If the NQP implementation exposed two different hashes to the NQP language level 10:18
one being "smaller, but things move" and the other "slightly larger, but they don't"
jnthn Hm, but given I've static memory to point at, and fixkey hash seems to want to allocate for me anyway, so I'll need an indirection struct whatever I do... 10:19
nwc10 then the relevant code could choose to use the "slightly larger" hash for whatever-it-is that matters
jnthn: yes, sometimes you can't win :-(
jnthn ...then it may be more efficient to be using the strhash, and just having it marked
Since iiuc the fixkey hash will make it 2 levels of indirection
nwc10 it would
jnthn OK, then strhash 10:20
nwc10 but
as I sort of am guessing it here
if your "static" structure is `struct MVMDispSysCall`
then I think you end up needing
struct MVMDispSysCall { 10:21
struct MVMStrHashHandle hash_handle;
... all the other stuff
where "all the other stuff" is all the current members, *except* `name` and `hash_handle`; 10:22
because the new hash_handle struct contains the key, which is the name.
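Spelled out as compilable C, the layout nwc10 describes might look like the sketch below. The definitions of `MVMString` and `struct MVMStrHashHandle` are stand-ins so the example builds on its own, and it is an assumption that the handle holds just the key. The members `min_args` and `max_args` and the helper `syscall_name` are purely hypothetical placeholders for "all the other stuff".

```c
#include <stddef.h>

/* Stand-ins so the sketch compiles without MoarVM headers (assumptions). */
typedef struct MVMString MVMString;           /* opaque here */
struct MVMStrHashHandle { MVMString *key; };  /* assumed: the handle carries the key */

struct MVMDispSysCall {
    /* The handle comes first and replaces the old `name` member:
     * the key stored in the handle *is* the name. */
    struct MVMStrHashHandle hash_handle;
    /* Hypothetical placeholders for the remaining, unchanged members. */
    int min_args;
    int max_args;
};

/* Hypothetical accessor: code that used to read `name` reads the handle's key. */
MVMString *syscall_name(const struct MVMDispSysCall *sc) {
    return sc->hash_handle.key;
}
```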
MasterDuke too bad, definite slowdown in accessing dyn vars compared to 2020.07 10:39
jnthn I've got that bit building again and am through the rebase, but the end result doesn't build 'cus it iterated a hash somewhere that has now gone away 10:40
ah, I just need to find the new way to iterate an MVMHash in code :) 10:45
uh, from C code
MasterDuke committable6: 8fd029a3025307f8ccac11adeed77d170408b5a8~1,8fd029a3025307f8ccac11adeed77d170408b5a8 my $a; $a := $*OUT for ^5_000_000; say now - INIT now 10:48
committable6 MasterDuke, ¦8fd029a3025307f8ccac11adeed77d170408b5a8~1: «6.3943207␤» ¦8fd029a: «7.86781435␤» 10:49
Geth MoarVM/new-disp: 156 commits pushed by (Jonathan Worthington)++, (Timo Paulssen)++, (Daniel Green)++
jnthn Well, it builds NQP and passes the NQP tests on the new-disp branch, so it can't be *that* broken 10:52
timotimo whew, already 156 commits 10:53
jnthn I didn't go back and rework all the commits in the branch to actually build, alas
Just have the fixup commits at the end 10:54
timotimo MasterDuke: looks like finding/accessing dynamic variables got a bit faster?
does that replicate on a local machine that's not doing anything else?
MasterDuke no the ~1 is before the nqp bump that brought in the moarvm with the new hash 10:55
timotimo ah dang
MasterDuke i haven't tried to replicate locally. but a couple of different runs with committable6 show relatively consistent results
jnthn No MoarVM syscall with name 'dispatcher-delegate' at gen/moar/BOOTSTRAP/v6c.nqp:4204 (/home/jnthn/dev/rakudo/blib/Perl6/BOOTSTRAP/v6c.moarvm:) 10:56
oh, right, I probably need to make the str keys in that strhash :)
MasterDuke linkable6: 8fd029a
Geth: 8fd029a
timotimo shareable6: 8fd029a 10:57
shareable6 timotimo,
timotimo this one?
MasterDuke i was just trying to get a link to the commit in the rakudo repo
timotimo oh
linkable6: rakudo/8fd029a 10:58
linkable6: help
linkable6 timotimo, Like this: R#1946 D#1234 MOAR#768 NQP#509 SPEC#242 RT#126800 S09:320 524f98cdc # See wiki for more examples:
timotimo ok it's supposed to be able to do commits, but perhaps that one was too short?
MasterDuke 8fd029a3025307f8ccac11adeed77d170408b5a8
linkable6 (2020-08-31) Bump NQP to get nwc10++ improved hash implementation
Geth MoarVM/new-disp: b2656d7011 | (Jonathan Worthington)++ | src/gc/roots.c
Mark dispatcher syscall hash
MasterDuke hm. maybe committable6, etc need to give longer hashes
timotimo actually would hope that committable and friends wouldn't always trigger linkable 11:00
which of course it could just filter by nickname
jnthn Yup, now Rakudo's rebased new-disp branch builds
Pushed rebases of all the new-disp 11:01
timotimo great
jnthn Which means...lunch time :) bbl 11:02
timotimo you would really expect the dynvar cache to give you performance unrelated to hash access? or is it hash-based too? 11:07
there's a dynvar cache performance analysis thing in moarvm plus a script to actually analyze the data it spits out 11:08
OK, MVM_get_lexical_by_name is the top result and it does have MVM_index_hash_fetch 11:13
but it could be that for this code it's actually not going via the hash route 11:14
there's a "if there's not enough lexical names, just linear-scan them" in there, too
and MVM_string_equal takes a lot of time there
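The "linear-scan when there are few names, otherwise go via an index hash" shape timotimo describes can be sketched generically. This is not the code of MVM_get_lexical_by_name: the threshold `LINEAR_SCAN_LIMIT`, the `NameTable` type and the functions below are hypothetical, and plain C strings with `strcmp` stand in for MVMString and MVM_string_equal.

```c
#include <stdlib.h>
#include <string.h>

#define LINEAR_SCAN_LIMIT 8  /* hypothetical threshold */

typedef struct {
    const char **names;  /* the lexical names, in slot order */
    size_t count;
    int *index;          /* open-addressing table of slot numbers; NULL if unused */
    size_t index_size;   /* power of two when index is set */
} NameTable;

static size_t hash_name(const char *s) {
    size_t h = 5381;  /* djb2 string hash */
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Build the index only when there are enough names to justify it.
 * Returns 0 on success, -1 if the index could not be allocated
 * (lookups then fall back to the linear scan). */
int name_table_init(NameTable *t, const char **names, size_t count) {
    t->names = names;
    t->count = count;
    t->index = NULL;
    t->index_size = 0;
    if (count <= LINEAR_SCAN_LIMIT) return 0;
    size_t size = 16;
    while (size < count * 2) size *= 2;  /* keep the table at most half full */
    t->index = malloc(size * sizeof(int));
    if (!t->index) return -1;
    for (size_t i = 0; i < size; i++) t->index[i] = -1;
    for (size_t i = 0; i < count; i++) {
        size_t slot = hash_name(names[i]) & (size - 1);
        while (t->index[slot] != -1) slot = (slot + 1) & (size - 1);
        t->index[slot] = (int)i;
    }
    t->index_size = size;
    return 0;
}

/* Returns the slot number of `name`, or -1 if it is not present. */
int name_table_find(const NameTable *t, const char *name) {
    if (!t->index) {
        /* Few names: every probe is a full string comparison. */
        for (size_t i = 0; i < t->count; i++)
            if (strcmp(t->names[i], name) == 0) return (int)i;
        return -1;
    }
    size_t mask = t->index_size - 1;
    for (size_t slot = hash_name(name) & mask; t->index[slot] != -1;
         slot = (slot + 1) & mask)
        if (strcmp(t->names[t->index[slot]], name) == 0) return t->index[slot];
    return -1;
}
```

On the linear path the cost is dominated by string comparisons, which would be consistent with string equality showing up prominently in a profile even when no hash is consulted.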
oh, it's currently looking for $*PROMISE while inside &DYNAMIC 11:24
nine Yes, IIRC that's where the sp_gethashentryvalue optimization helped 11:25
timotimo who wants to measure if giving root temp push checking for going past the allocated number of temp roots an MVM_LIKELY or MVM_UNLIKELY makes any performance difference at all? 11:33
quite possible that gcc / clang already get it the right way around
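For readers unfamiliar with such hints, the experiment timotimo proposes amounts to marking the "out of room" check as the cold path. The sketch below is a generic illustration, not MoarVM's temp root code: `SKETCH_UNLIKELY`, `TempRoots` and `temp_root_push` are hypothetical names, and the macro is built on the GCC/Clang builtin `__builtin_expect`, with a plain fallback for other compilers.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical hint macro; on GCC/Clang it tells the optimizer the
 * condition is expected to be false. Elsewhere it is a no-op. */
#if defined(__GNUC__) || defined(__clang__)
#  define SKETCH_UNLIKELY(cond) __builtin_expect(!!(cond), 0)
#else
#  define SKETCH_UNLIKELY(cond) (cond)
#endif

/* Hypothetical stack of temporary GC roots (addresses of object pointers). */
typedef struct {
    void ***roots;
    size_t num;
    size_t alloc;
} TempRoots;

/* Push one root. Growth is rare, so that branch is marked unlikely.
 * Returns 0 on success, -1 if the stack could not grow. */
int temp_root_push(TempRoots *t, void **slot) {
    if (SKETCH_UNLIKELY(t->num == t->alloc)) {
        size_t new_alloc = t->alloc ? t->alloc * 2 : 16;
        void ***grown = realloc(t->roots, new_alloc * sizeof(void **));
        if (!grown) return -1;
        t->roots = grown;
        t->alloc = new_alloc;
    }
    t->roots[t->num++] = slot;
    return 0;
}
```

Whether the hint changes the generated code at all depends on the compiler, which is the open question in the message above; it would have to be measured.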
(gdb) print name[0][0] 11:38
$40 = {blob_string = 0x34cd7f0, start = 13, end = 14, repetitions = 0}
(gdb) print name[0][1]
$41 = {blob_string = 0x34cd7f0, start = 14, end = 15, repetitions = 0}
(gdb) print name[0][2]
$42 = {blob_string = 0x34cd7f0, start = 15, end = 18, repetitions = 0}
this is $*OUT btw
i thought we had an optimization where concatenating a string to another when the parts fit it will just extend the last strand 11:39
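The three strands in the gdb output share one `blob_string` and cover the contiguous ranges 13..14, 14..15 and 15..18, so in principle they could be a single strand 13..18. A minimal sketch of that kind of merging follows. The `Strand` struct mirrors only the field names visible in the gdb output; the field types and the function `coalesce_strands` are assumptions, and this is not MoarVM's concatenation code.

```c
#include <stddef.h>
#include <stdint.h>

/* Field names as printed by gdb above; the types are assumptions. */
typedef struct {
    const void *blob_string;
    int32_t start;
    int32_t end;
    uint32_t repetitions;
} Strand;

/* Merge neighbouring strands in place when they reference the same blob,
 * the next one starts where the previous one ends, and neither repeats.
 * Returns the new number of strands. */
size_t coalesce_strands(Strand *s, size_t n) {
    if (n == 0) return 0;
    size_t out = 0;  /* index of the last strand kept so far */
    for (size_t i = 1; i < n; i++) {
        Strand *last = &s[out];
        if (s[i].blob_string == last->blob_string
            && s[i].start == last->end
            && s[i].repetitions == 0
            && last->repetitions == 0) {
            last->end = s[i].end;  /* extend the previous strand */
        } else {
            s[++out] = s[i];       /* keep as a separate strand */
        }
    }
    return out + 1;
}
```

Fewer strands mean fewer steps when comparing or iterating the string, which matters if string equality is hot in the lookup path.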
11:46 sena_kun joined 11:48 Altai-man left 11:50 domidumont joined 11:51 domidumont left
nwc10 jnthn: to avoid an assert() fail you need: 11:56
NQP happy, Rakudo has failures, those that I looked at were "Use of Nil in string contex" 12:00
er, context.
jnthn Hm, wonder how that didn't end up conflicting somewhere along the way 12:15
Hm, a bunch of new SIGSEGV in spectest 12:16
nwc10 I didn't reach the spectest
jnthn Oh? 12:17
ah, debug build maybe
Geth MoarVM/new-disp: e3e771400a | (Jonathan Worthington)++ | src/6model/reprs.h
Correct REPR count; nwc10++
12:21 krunen left 12:22 krunen joined
jnthn grmbl, none of these feel inclined to SEGV when not being run as part of make spectest 12:23
nwc10 I know exactly this feeling 12:25
or worse
I have this problem quite often
jnthn Dunno, the ones I'm looking at should, in theory, not be timing sensitive. All running with spesh blocking turned on 12:32
timotimo perhaps rr's "chaos mode" could help tease out the problematic execution path 12:47
otherwise, rr can record multi-process-containing things, too 12:50
12:52 brrt joined 13:06 domidumont joined
nwc10 I tried rr but it doesn't like all the AMD CPUs I seem to have everywhere. 13:07
brrt \o 13:10
what's the problem that requires rr 13:11
13:11 krunen left 13:13 domidumont left
nwc10 o/ 13:13
13:16 < jnthn> Hm, a bunch of new SIGSEGV in spectest
13:23 < jnthn> grmbl, none of these feel inclined to SEGV when not being run as part of make spectest
13:25 < nwc10> I know exactly this feeling
13:15 squashable6 left, domidumont joined 13:16 squashable6 joined
brrt ah, I see 13:17
yeah, that's hateful
timotimo rr uses a performance counter from the cpu that is apparently not reliable/exact enough on some AMD CPUs 13:19
[Coke] jnthn: ignore that employer request. (sent the initial ask directly to a few folks, followup based on convo with lizmat to disregard was just tagging several people) 13:21
nwc10 timotimo: that part I didn't know. I didn't spot any "why" as to why AMD wasn't supported. I had just assumed "we're all using Intel; patches welcome"
jnthn [Coke]: I already added the info anyway
[Coke] jnthn++
I think everyone knew yours already. 13:22
jnthn :D
Goodness, getting back into the dispatch stuff is some effort. :)
Probably because I left it at the point I had to solve a hard problem. :)
timotimo i got that from a ticket tracking the issue 13:24
nwc10 aha, I didn't go looking into tickets 13:26
13:26 krunen joined
jnthn greppable6: &callsame 13:46
greppable6 jnthn, 2 lines, 1 module:
14:10 domidumont left 14:11 domidumont joined 14:13 domidumont left, domidumont joined
MasterDuke hm. where was i with that zen slice vs whatever slice investigation? 14:13
i think i initially got sidetracked by the oddness with char ranges, but that was a red herring 14:14
timotimo that was a startling performance difference, right? 14:17
MasterDuke m: my @a = (^2_000).pick(*); my @b; @b = @a[] for ^1_000; say @b[(^@a).pick]; say @b[0]; say now - INIT now
camelia 1836
MasterDuke m: my @a = (^2_000).pick(*); my @b; @b = @a[*] for ^1_000; say @b[(^@a).pick]; say @b[0]; say now - INIT now
camelia 421
MasterDuke the whatever slice ends up in a cycle involving N calls to reify-at-least, that then each create a cycle involving two or three other calls and then back in reify-at-least 14:21
timotimo did i ever compare them with callgrind?
MasterDuke i don't recall seeing any numbers
nwc10 I/O, I/O, it's off to async purgatory we go.
I'm still not convinced that async is the future
timotimo i hope it isn't a case of the profiler going wild 14:22
nwc10 and the failure mode seems to be purgatory. Not hell.
MasterDuke but i'm not profiling and you can see the difference above
nwc10 you know where you are with hell.
timotimo right, i meant more with that cycle you're referring to
MasterDuke and no, i added a print in reify-at-least and it goes crazy
timotimo OK 14:23
MasterDuke the numbers match up with what the profiler showed
with the array being just 100 elements, callgrind says 3.5B instructions for whatever, 1.3B instructions for zen 14:27
doubling it to 200 elements give 6.1B for whatever, 1.9B for zen 14:32
14:34 brrt left
timotimo it spends a whole lotta time "resolve_using_guards"ing 14:40
so my guess is perhaps in the one case it succeeds in inlining something very hot, in the other it doesn't
MasterDuke there are a ton of deopts
pretty much one for every call to reify-at-least that's not a start of a cycle 14:41
i think it's doing a type specialization for reify-at-least after the first couple hundred are all for one type, but then almost all of the rest of the hundreds of thousands of calls are with a different type 14:44
timotimo: do you know the answer to this question? 14:46
jnthn MasterDuke: It's correct; it never backs out at the moment. 14:47
MasterDuke k
jnthn: btw, did you happen to see that recent article (i saw it on HN) about jits and specializations and such? 14:48
jnthn Hm, which one? :)
MasterDuke it seemed like the sort of thing you'd be interested in
trying to think of some search terms to find it... 14:50
timotimo i wonder if it would be something for the spesh log 14:52
putting deopts in there
MasterDuke jnthn: 14:54
timotimo: i manually added some prints, let me see if i can find that output 14:55
jnthn Hadn't seen that one; thanks
MasterDuke 246477 Deopt ones requested by interpreter because of sp_guardtype, got 'Rakudo::Iterator::Gather' but want 'Iterator' 14:56
492954 Deopt ones requested by interpreter because of sp_guardtype, got 'List::Todo' but want 'Iterator'
timotimo: is ^^^ the sort of info you were thinking of? 14:58
timotimo right 15:01
but i mean specifically the MVMSpeshLog that records what happened in code to build specializations from 15:02
MasterDuke how easy would it be to "undo" a speshialization? count deopts and if they hit some multiple of successful calls remove the specialization and generate a new one?
what would that get us in this case? 15:03
it = the MVMSpeshLog 15:04
timotimo well, it's a relatively cheap way to record stuff-that-happened
since the executing thread can just fire more and more info in there and the spesh worker will later figure out how it fits together 15:05
MasterDuke it seems to me that the underlying problem is the cycle that's happening. the cost of it is only exacerbated by the deopts
timotimo when it's in the interpreter, we could find out which exact guard in the bytecode does the deopt 15:07
and whether it's near the beginning or end
like, if it deopts near the end, the impact shouldn't be as terrible 15:09
if it deopts near the beginning, we'd be running at no-spesh speeds
MasterDuke ah, right
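MasterDuke's earlier idea (count deopts, and once they reach some multiple of the successful calls, drop the specialization) can be written down as a small bookkeeping policy. As jnthn notes, spesh never backs out at the moment, so this is purely a hypothetical sketch: `SpecStats`, `record_outcome` and both constants are invented for illustration and correspond to nothing in MoarVM.

```c
#include <stdint.h>

#define MIN_SAMPLES    1000  /* hypothetical: don't judge on too little data */
#define DEOPT_MULTIPLE 2     /* hypothetical: discard at deopts >= 2 * hits */

/* Hypothetical per-specialization counters. */
typedef struct {
    uint64_t hits;    /* calls that ran the specialized code to completion */
    uint64_t deopts;  /* calls that hit a failing guard and deoptimized */
    int active;       /* 1 while the specialization is still in use */
} SpecStats;

/* Record one call. Returns 1 if this call caused the specialization
 * to be discarded, 0 otherwise. */
int record_outcome(SpecStats *s, int deopted) {
    if (!s->active) return 0;
    if (deopted) s->deopts++;
    else s->hits++;
    if (s->hits + s->deopts >= MIN_SAMPLES
        && s->deopts >= DEOPT_MULTIPLE * s->hits) {
        s->active = 0;  /* a new specialization could be produced from fresh stats */
        return 1;
    }
    return 0;
}
```

With the pattern reported above (a couple of hundred calls with one type, then many calls with another), such a policy would give up on the first specialization after roughly a thousand calls. Whether re-specializing would help in this case is exactly what the discussion leaves open.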
timotimo oh, can you measure how fast it runs when spesh is turned off? 15:10
MasterDuke for an array of 2k elems, 5.9s for whatever, 1.3s for zen 15:11
timotimo and your local timings with spesh were what again?
MasterDuke approx 2.8s for whatever, 0.2s for zen 15:13
jnthn Wow, even with the deopt horror it still comes out ahead...
MasterDuke committable6: releases my @a = (^1_000).pick(*); my @b; @b = @a[*] for ^1_000; say @b[0]; say now - INIT now 15:15
committable6 MasterDuke, 15:17
MasterDuke eh, not seeing anything terribly useful in there 15:18
15:45 Altai-man joined 15:48 sena_kun left
jnthn This took quite a bit of coffee and snacks, but I think I've finally found a way to make a bunch of different outstanding dispatch needs boil down to one key addition to the model... Rough notes at 16:02
I *think* this will all be enough for the specializer to see what is going on too 16:03
nwc10 'naks! My niece knows about 'naks! 16:09
I have read that and I can't comment usefully on it. 16:11
jnthn I wrote that and I'm not sure I can :P 16:18
16:19 zakharyas joined
jnthn But I'm happy that it at least seems to be one mechanism to deal with many things. 16:19
nwc10 I was going to comment non-usefully that I'm sure that "cats" is a good thing, but then I realised that it's CATS and that seems, um, LTA: 16:21
16:43 brrt joined 17:19 domidumont left, domidumont joined 17:44 brrt left 18:31 sena_kun joined 18:32 Altai-man left 18:36 brrt joined 18:42 domidumont left 19:44 AlexDaniel left 19:45 AlexDaniel joined, AlexDaniel left, AlexDaniel joined, Altai-man joined 19:47 leont joined 19:48 sena_kun left
MasterDuke well, after a delay, i'm back at this. i think the deopt business is a larger scope problem that's not required to be fixed to improve the example above's behavior. i.e., fixing the reify-at-least cycle will also get rid of the deopts 20:10
21:16 zakharyas left 21:39 brrt left 22:00 [Coke] joined, [Coke] left, [Coke] joined
timotimo MasterDuke: do you think you can individually back out the optimization that relied on hash entries not moving? 22:53
that way we can split apart "merge of new hash" and "removal of that one optimization" 22:54
oh, wait, that's for the dynamic variable thing
not the star vs zen slice one
23:17 MasterDuke left 23:46 sena_kun joined 23:48 Altai-man left