github.com/moarvm/moarvm | IRC logs at colabti.org/irclogger/irclogger_logs/moarvm
Set by AlexDaniel on 12 June 2018.
timotimo froggs!!! 00:38
Geth MoarVM: 7507090328 | (Jonathan Worthington)++ | 4 files
Make spesh thread more GC-responsive

By making us able to mark the spesh graph more completely, and making the one we're currently working on available for GC marking, we can introduce GC sync points at a number of locations in the optimization process. This can notably reduce the GC latency when optimizing larger graphs.
Didn't measure repeatedly, but spectest time decreased by ~8s and CORE.setting compilation by ~2s; the latter feels a bit on the high side, so I'd take these numbers with a pinch of salt.
10:51
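Roughly, the commit above works like this. The spesh thread publishes the graph it is working on so the GC can mark it. It can then check at several points during optimization whether a collection is pending, so that when a large graph is being optimized, the GC does not have to wait until the whole optimization finishes.

Below is a toy model of that pattern. Every type and function name in it (`Graph`, `SpeshThread`, `gc_sync_point`, the toy passes) is hypothetical; none of it is MoarVM's real spesh or GC API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model; not MoarVM's real structures. */
typedef struct Graph {
    int num_nodes;               /* stand-in for the spesh graph */
} Graph;

typedef struct SpeshThread {
    atomic_int gc_requested;     /* set by a thread that wants to start a GC */
    Graph *current_graph;        /* published so the GC can mark it */
    int times_marked;            /* how often our graph got marked */
} SpeshThread;

/* Stand-in for marking everything the in-progress graph references. */
static void mark_graph(SpeshThread *t) {
    if (t->current_graph != NULL)
        t->times_marked++;
}

/* GC sync point: if a collection is pending, take part in it now
 * (here: just mark our graph) instead of after the whole optimization. */
static void gc_sync_point(SpeshThread *t) {
    if (atomic_exchange(&t->gc_requested, 0))
        mark_graph(t);
}

typedef void (*Pass)(Graph *g);

/* Two toy "optimization passes" that shrink the graph. */
static void toy_pass_1(Graph *g) { g->num_nodes -= 1; }
static void toy_pass_2(Graph *g) { g->num_nodes -= 2; }

/* Run each pass, with a GC sync point after each one. */
static void optimize(SpeshThread *t, Graph *g, const Pass *passes, size_t n) {
    t->current_graph = g;        /* make the in-progress graph markable */
    for (size_t i = 0; i < n; i++) {
        passes[i](g);
        gc_sync_point(t);
    }
    t->current_graph = NULL;     /* nothing in progress any more */
}
```

If a GC request arrives while the optimization is running, it is serviced at the next sync point rather than after the last pass.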
jnthn timotimo: Would be interesting to see if your GC latency profiling looks any different after ^^ :) 10:53
masak .oO( take these numbers with a punch of salt ) 11:13
dogbert2 only 24 degrees centigrade outside :) 11:17
lizmat only 34.6 here outside :-( 12:32
timotimo jnthn: i find it surprising that we wouldn't have to also mark the facts' known values and types 12:33
jnthn We do mark them?
timotimo but i guess if something's in there, it wouldn't have died in the nursery?!
oh!
hah, i see that now
i should have expanded the code before saying that
Geth MoarVM: 2f36e2666a | (Jonathan Worthington)++ | src/gc/allocation.c
Add branch hint macros to nursery allocation

Seems to shave a little off various programs; the effect was visible with callgrind.
12:39
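Branch hint macros like these are commonly built on GCC/Clang's `__builtin_expect`. The log later mentions `MVM_LIKELY`/`MVM_UNLIKELY`; assuming they are defined the same way is a guess.

Here is a minimal sketch that applies such hints to a bump-pointer nursery allocator. The names `LIKELY`, `UNLIKELY`, `Nursery` and `nursery_alloc` are hypothetical, and this is not MoarVM's `src/gc/allocation.c`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro names; __builtin_expect is a real GCC/Clang builtin
 * that tells the compiler which way a condition usually goes. */
#if defined(__GNUC__) || defined(__clang__)
#  define LIKELY(x)   __builtin_expect(!!(x), 1)
#  define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#  define LIKELY(x)   (x)
#  define UNLIKELY(x) (x)
#endif

typedef struct Nursery {
    char *alloc_pos;      /* next free byte */
    char *alloc_limit;    /* one past the last usable byte */
    int   slow_paths;     /* how often the slow path was taken */
} Nursery;

/* Rarely taken: a real VM would run a GC here and retry.
 * This sketch just records the event and fails. */
static void *nursery_slow_path(Nursery *n, size_t size) {
    (void)size;
    n->slow_paths++;
    return NULL;
}

/* Bump-pointer allocation; the hint marks "there is room" as the common case. */
static void *nursery_alloc(Nursery *n, size_t size) {
    if (LIKELY((size_t)(n->alloc_limit - n->alloc_pos) >= size)) {
        void *result = n->alloc_pos;
        n->alloc_pos += size;
        return result;
    }
    return nursery_slow_path(n, size);
}
```

The hint does not change behaviour, only code layout. That is consistent with the commit measuring the effect with callgrind rather than by observing different results.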
diakopter .tell brrt well hopefully the static analysis wouldn't even need to expand the macros..? 12:42
yoleaux diakopter: I'll pass your message to brrt.
Geth MoarVM: 67a9afef60 | (Jonathan Worthington)++ | 3 files
Have sp_fastcreate do a direct nursery allocation

Since we should never use this when we're in gen2 allocation mode. Saves a branch and a function call per fastcreate.
12:58
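Here is a sketch of the reasoning in that commit. A general allocator has to check on every call whether the thread is in gen2 allocation mode. A fast-create path that is only ever used while that mode is off can skip the check and go straight to the nursery.

All names here (`Tc`, `allocate_in_gen2`, `gc_allocate`, `fastcreate`) are hypothetical. This is not MoarVM's actual `sp_fastcreate` code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-thread allocation state. */
typedef struct Tc {
    int   allocate_in_gen2;   /* when set, new objects go straight to gen2 */
    char *nursery_pos;
    char *nursery_limit;
    int   gen2_allocs;        /* counts gen2 allocations in this toy model */
} Tc;

static char gen2_space[256];  /* toy stand-in for the old generation */
static size_t gen2_used;

static inline void *alloc_nursery(Tc *tc, size_t size) {
    if ((size_t)(tc->nursery_limit - tc->nursery_pos) < size)
        return NULL;          /* a real VM would collect and retry */
    void *p = tc->nursery_pos;
    tc->nursery_pos += size;
    return memset(p, 0, size);
}

static void *alloc_gen2(Tc *tc, size_t size) {
    if (sizeof gen2_space - gen2_used < size)
        return NULL;
    void *p = gen2_space + gen2_used;
    gen2_used += size;
    tc->gen2_allocs++;
    return memset(p, 0, size);
}

/* General entry point: pays for the mode check on every allocation. */
static void *gc_allocate(Tc *tc, size_t size) {
    if (tc->allocate_in_gen2)
        return alloc_gen2(tc, size);
    return alloc_nursery(tc, size);
}

/* Fast-create path: callers guarantee gen2 mode is off, so it skips the
 * check and allocates from the nursery directly (the assert is debug-only). */
static void *fastcreate(Tc *tc, size_t size) {
    assert(!tc->allocate_in_gen2);
    return alloc_nursery(tc, size);
}
```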
timotimo jnthn: i tried to come up with a spesh plugin for p6assign to a Proxy, but i think i was still missing some important detail; it looks like what we actually call to invoke the proxy has to have very slightly different arguments, which is why the invokespec has two tiny subs that create a Scalar and call &!FETCH with it 13:05
got a hot tip for me?
lizmat
.oO( don't go outside now )
timotimo :) 13:06
jnthn Yeah, it's doing an extra wrapping of the thing I think 13:07
timotimo so my first instinct was to say "build a little closure inside the spesh plugin's run and return that"
but i don't think a closure like that is optimizable by spesh then?
jnthn That can work, especially as we can inline closures
timotimo i thought we can only do that if it's the outer we inline it into? 13:08
jnthn No
timotimo or was that the magic trick for "getlexvia"?
jnthn Can do it more generally
Right, the via trick :)
That was one of those things where when I thought of it, I wondered why I didn't think of it 2 years beforehand... :)
timotimo i'll give it a try again, perhaps
annoyingly, we mostly emit p6store in typical situations where we have proxies 13:09
timotimo like when assigning to the result of a method call 13:09
jnthn Ho, is that not yet using a spesh plugin? 13:11
Can't always remember what I did and didn't get around to :) 13:12
timotimo only p6assign i think 13:13
let me check again
timotimo well, there's an "assign" spesh plugin, but not a "store" one 13:14
jnthn Aha 13:19
Time to write it! :)
timotimo i wonder if i should have a look 13:20
buses and such
timotimo oooh, stage parse is below 60s again 13:29
jnthn :) 13:37
timotimo ok, after having a "breakfast" i can try to figure out why my current attempt gives me No such method 'CALL-ME' for invocant of type 'Bool' 14:49
timotimo MoarVM oops: Too many levels of inlining popped 14:51
when i try to MVM_SPESH_NODELAY the code
ah, the data doesn't actually land in lexicals, only registers 14:55
that makes the debugserver less helpful
OK, that's interesting. instead of getting the FETCH that i've assigned to the object in the constructor, i apparently resolved Proxy's actual FETCH method instead 14:57
oh, hah 14:58
i've been consistently messing up FETCH and STORE
i'm trying to write a spesh plugin for STORE, not for FETCH!
though FETCH will also want one
oh, no, i think i'm actually wrongfully deconting somewhere, thus causing FETCH to be run when i actually wanted to work with the proxy object itself 14:59
timotimo jnthn, do i need something like speshguardsf to ensure i'm getting the same code object? i first tried speshguardobj, but that gets very unhappy if you "Proxy.new" many, many times 15:11
timotimo ooh, getstaticcode could be the right one? 15:16
hm, but i think i'd have to have that as a spesh-recorded instruction, too 15:17
otherwise i can't guard against the result
got a preliminary implementation of that guard 15:26
well, need to implement the guard, i only added the guard instruction 15:28
timotimo OK, it runs, that's good. 15:47
now for timing 15:50
aaw, it's barely faster 15:52
m: say <4.34 4.47 4.38 4.36> / 4 15:54
camelia 1
timotimo m)
m: say <4.34 4.47 4.38 4.36>.sum / 4
camelia 4.3875
timotimo m: say <4.73 4.71 4.64 4.64>.sum / 4 15:55
camelia 4.68
timotimo m: say 4.3875 / 4.68
camelia 0.9375
timotimo it didn't end up inlining the actual fetch sub that was put into the proxy, just the sub that creates the Scalar and passes that on 15:56
timotimo maybe it could only be better if we did a second round of logging after the spesh plugin has been resolved and inlined 15:59
timotimo jnthn: should i upload PRs for moar, nqp, and rakudo for this spesh plugin? 16:03
or is 7% not worth the hassle?
not even 7% 16:04
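Why the first camelia result was 1: a list in numeric context evaluates to its number of elements, so `<4.34 4.47 4.38 4.36> / 4` computes 4 / 4. That is why `.sum` was needed.

The same arithmetic in C follows. It assumes the first set of timings is the run with the new spesh plugin, which the "barely faster" remark suggests.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n timings. */
static double mean(const double *xs, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += xs[i];
    return sum / (double)n;
}

/* Fraction of run time saved going from old_mean to new_mean. */
static double time_saved(double new_mean, double old_mean) {
    return 1.0 - new_mean / old_mean;
}

/* Floating-point comparison with a small absolute tolerance. */
static int approx_eq(double a, double b) {
    double d = a - b;
    return d < 1e-9 && d > -1e-9;
}
```

With the timings from the log, `mean` gives 4.3875 and 4.68, their ratio is 0.9375, and `time_saved` gives 0.0625. A 6.25% saving is indeed "not even 7%".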
jnthn Send PRs; I can review them and see what complexity we get for the speedup 16:06
timotimo sure thing. maybe you also see what could make spesh better at figuring out the inner inline
Geth MoarVM/speshplugin_guardstaticcode: 8dabbcc01b | (Timo Paulssen)++ | 8 files
add speshguardgetstaticcode, for closures and such

lets a spesh plugin figure out if a given object, such as the $!do attribute of a Code, is "the same" across invocations - ignoring what exactly it closes over.
16:09
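Here is why an object-identity guard (`speshguardobj`) struggles with many `Proxy.new` calls, and what guarding on static code changes. Each new proxy presumably carries a fresh closure object, so a guard on the exact object fails for every new proxy. All of those closures, however, share one piece of static code and differ only in what they close over.

Below is a toy model of the two kinds of guard. `StaticCode`, `Closure`, `guard_obj` and `guard_static_code` are hypothetical names, not MoarVM's data structures.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct StaticCode {
    const char *name;               /* the compiled body, shared by all clones */
} StaticCode;

typedef struct Closure {
    const StaticCode *static_code;  /* which code this closure runs */
    void *outer;                    /* captured environment, differs per clone */
} Closure;

/* Identity guard: passes only for the exact same closure object. */
static bool guard_obj(const Closure *seen, const Closure *now) {
    return seen == now;
}

/* Static-code guard: passes for any closure cloned from the same code,
 * ignoring what it closes over. */
static bool guard_static_code(const StaticCode *seen, const Closure *now) {
    return now->static_code == seen;
}
```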
Geth MoarVM: timo++ created pull request #932:
add speshguardgetstaticcode, for closures and such
16:10
timotimo github.com/rakudo/rakudo/pull/2189 - also has the link to the other pull requests in the description 16:14
brrt \o 16:32
yoleaux 12:42Z <diakopter> brrt: well hopefully the static analysis wouldn't even need to expand the macros..?
timotimo o/ brrt 16:34
sp_fastalloc is simple enough to have the most likely case inlined into the expr template, IMO 16:35
there's a check for size > 0 in the function version, but on the jit level we already know the size exactly 16:36
brrt: any idea for something like MVM_LIKELY or MVM_UNLIKELY at the exprjit level?
jnthn m: say 3.68 / 7.34 16:41
camelia 0.501362
timotimo oh wow
that's a nice ratio
jnthn Yeah, it's for gist.github.com/jnthn/196263d7e888...b2e42b2008 16:42
My changes today have made that run in half the time 16:43
Some unpushed
brrt not yet timotimo 16:44
i'm not sure how MVM_LIKELY is compiled
timotimo no clue. does X86_64 encode branch predictor hints into the assembly implicitly or explicitly? 16:44
i.e. is gcc flipping the then and else branches around so the more likely one is the one that needs no goto? 16:45
jnthn I at first thought the latter
Oh, it may really just be that
Whatever it does seems to have an effect on instruction count though
timotimo hm, "instruction fetch" is what valgrind counts, isn't it? 16:46
jnthn Thing so 16:47
*Think
timotimo i imagine it perhaps has to do with fetches being not byte-per-byte, but whole cache-lines at once?
and if the unlikely then branch starts in the same cache line, it gets fetched, and the likely else branch might start in the middle of the right cache line, too
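Here is a toy calculation that makes the cache-line hypothesis concrete, assuming 64-byte cache lines (typical for current x86-64 parts). A 40-byte hot path that starts on a line boundary fits in one line. If a 40-byte unlikely branch is laid out in front of it, the hot path straddles two lines.

This only illustrates the arithmetic. It does not establish what valgrind counts or what gcc actually emits, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINE 64u  /* assumption: typical x86-64 cache line size */

/* Index of the cache line containing addr. */
static uintptr_t cache_line_of(uintptr_t addr) {
    return addr / CACHE_LINE;
}

static bool same_cache_line(uintptr_t a, uintptr_t b) {
    return cache_line_of(a) == cache_line_of(b);
}

/* Number of cache lines touched by the byte range [start, start + len). */
static unsigned lines_touched(uintptr_t start, uintptr_t len) {
    if (len == 0)
        return 0;
    return (unsigned)(cache_line_of(start + len - 1) - cache_line_of(start) + 1);
}
```

For example, `lines_touched(0x1000, 40)` is 1 (hot path first). `lines_touched(0x1028, 40)` is 2 (hot path placed after a 40-byte cold branch).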
brrt hmm, that's cute 16:49
we'd need either a new node, or a node annotation (flags?) 16:58
timotimo probably having a node would be easiest 16:59
but feel free to push that off to the medium future 17:00
Kaiepi thoughts on implementing asm jit support for x32? 17:37
brrt sure 17:48
we hate x32
brrt = we, in this case :-P 17:49
I'm going to qualify that a bit better... 17:50
I think when we say x32, we mean the 'run 64 bit code with 32 bit addresses' - and presumably integers as well?
it's a windows thing iirc 17:51
I actually think that is a pretty sane idea, given how much better x86-64 is than x86
but, on the other hand....
brrt here's my general hypothesis on perl6, adoption, and performance 17:52
there's two places where performance is going to matter for perl6 adoption
1): on developer laptops
brrt 2): on production servers 17:53
brrt all other platforms, including ARMv7, AArch64, x32, MIPS, PowerPC, whatever 17:53
(production servers are going to have Xeon-style processors, typically virtualized) 17:54
all other platforms don't really matter. Not in this phase of adoption
if we ever get as big as perl5, then there will be a horde of users clamoring for their obscure platforms. And we'll have a moarvm porters group who will make sure that happens
timotimo there's also x32 for linux 17:55
it's also not very difficult for a perl6 program to balloon up to 4 gigs %) 17:56
brrt that, too
unfortunately 17:57
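Under x32, code runs in 64-bit mode but pointers (and `long`) are 32 bits wide. The address space is therefore capped at 2^32 bytes = 4 GiB, which is exactly the limit a large Raku program can run into.

Below is a small sketch that reports which data model a C file was compiled for. It relies on the GCC/Clang predefined macros `__x86_64__`, `__ILP32__` and `__i386__`; x32 defines both of the first two. Other compilers (e.g. MSVC) may not define these macros, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Which x86 data model was this translation unit compiled for? */
static const char *abi_name(void) {
#if defined(__x86_64__) && defined(__ILP32__)
    return "x32";        /* 64-bit instructions, 32-bit pointers */
#elif defined(__x86_64__)
    return "x86-64";     /* 64-bit instructions, 64-bit pointers */
#elif defined(__i386__)
    return "i386";
#else
    return "other";
#endif
}

/* Bytes addressable through a pointer of this width
 * (saturates at UINT64_MAX for 64-bit pointers). */
static uint64_t address_space_bytes(void) {
    unsigned bits = (unsigned)(sizeof(void *) * 8);
    return bits >= 64 ? UINT64_MAX : (UINT64_C(1) << bits);
}
```

On an x32 or i386 build, `address_space_bytes()` returns 4294967296, i.e. 4 GiB.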
timotimo brrt: do you think porting the fast path of sp_fastcreate to exprjit will lose us the benefits gained from adding the branch hint macros to the allocate_nursery function? 18:00
Kaiepi so that's a "keep it on the back burner until later?"
brrt as far as i'm concerned, yes 18:01
and maybe at that time x32 might no longer be a thing
timotimo: I'm not sure; the only way to know is to test
brrt an open problem is - how do I, with the current register allocator, or something else which is both fast and correct, prevent an unlikely branch from forcing a spill in a likely branch 18:05
lucasb just to note that I'm still on x86 32bit... sure my next machine will be 64bit and I'll never bother with the non-existent JIT for 32b, but until then... :) 18:08
I wonder, is it even possible to have a variant of the current JIT for 32bit? Would brrt be willying to guide someone doing that?
lucasb *willingly 18:09
brrt it is totally possible 18:10
and if somebody were up for it, I'd help, sure
it's just that I'm not going to spend time on it myself
lucasb that's ok 18:11
timotimo the lego jit should not be terribly hard, right? 18:13
since it spills and loads at every conceivable point
number of registers isn't such a big problem
having to implement the calling conventions is a little bit of work i guess? 18:14
brrt there's a bunch of different ones 18:15
and you basically can't share any code
timotimo of course there are m(
brrt and I make no guarantees that any of the 'top' constructs make a lot of sense for the lower constructs 18:16
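Here is why calling-convention code is hard to share between targets. Integer/pointer arguments go to different places under three common conventions:

- System V AMD64 (Linux/macOS x86-64) uses rdi, rsi, rdx, rcx, r8, r9.
- Microsoft x64 uses rcx, rdx, r8, r9.
- 32-bit cdecl passes everything on the stack.

The sketch below is a simplified lookup table, not JIT code. It ignores floating-point arguments, struct passing, Win64 shadow space and stack alignment, and its names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct CallConv {
    const char *name;
    const char *int_arg_regs[6];  /* integer/pointer args passed in registers */
    size_t num_int_arg_regs;
} CallConv;

static const CallConv conventions[] = {
    { "sysv-amd64", { "rdi", "rsi", "rdx", "rcx", "r8", "r9" }, 6 },
    { "win64",      { "rcx", "rdx", "r8", "r9" },               4 },
    { "i386-cdecl", { NULL },                                    0 }, /* all on the stack */
};

/* Where does integer argument i (0-based) go under convention cc? */
static const char *int_arg_location(const CallConv *cc, size_t i) {
    return i < cc->num_int_arg_regs ? cc->int_arg_regs[i] : "stack";
}
```

For instance, the first argument lands in rdi on Linux x86-64, in rcx on Win64, and on the stack for 32-bit cdecl. A 32-bit backend would need its own argument-passing code.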
Kaiepi i'm up for it 18:19
i will need some help though
brrt obviously 18:21
how much do you know about assembly language
Kaiepi not very much, i cargo-culted it going off the other examples in src/jit/x64/emit.dasc 18:22
are there any resources you could point me to so i could learn more?
brrt hmm. there's the dynasm docs here: corsix.github.io/dynasm-doc/tutorial.html 18:28
I would suggest you start playing with that to get a feel for assembly
Kaiepi aight 18:29
thanks
lucasb Kaiepi++ I totally encourage that! 18:35
brrt if nothing else you'll learn some things :-) 18:36
lucasb I wouldn't know how to do that. By the time I acquire the skills, the platform will be long gone, and maybe possibly even Earth itself
Kaiepi it's always good to learn some new things
lucasb but I can try help Kaiepi :)
brrt assembly really isn't all that hard though
there's very few things you have to take into account 18:37
Geth MoarVM: xelak6++ created pull request #934:
Get the number of bytes to be processed from the current buffer and not from the header.
20:25
jnthn D'oh, I put beer in the freezer for a bit so it'd be nice and cool, and now I've got iced beer... 21:17
timotimo better than a freezer full of "spicy" beer shards 21:46
more spiky than spicy, really
in german it works better because "scharf" means both "sharp" and "spicy"
jnthn :) 22:08
It hadn't frozen through fully, and now I'm near the end of the glass and it's getting warm 22:09
Geth MoarVM: 448e75bd3d | (Alexius Korzinek)++ | src/strings/utf8_c8.c
Get the number of bytes to be processed from the current buffer and not from the header.

This fixes issue #2158.
22:22
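Here is the bug class fixed in that commit, sketched with a hypothetical chain of byte buffers (`Buffer`, `count_bytes`). These are not MoarVM's decode-stream types, and the real utf8_c8 change may differ in detail.

When a decoder walks several buffers, the number of bytes to process must come from the buffer currently being processed. Taking it from the head buffer gives the wrong count as soon as the buffers differ in length.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Buffer {
    const unsigned char *bytes;
    size_t length;            /* number of bytes in *this* buffer */
    struct Buffer *next;
} Buffer;

/* Buggy variant: uses the head buffer's length for every buffer. */
static size_t count_bytes_buggy(const Buffer *head) {
    size_t total = 0;
    for (const Buffer *cur = head; cur != NULL; cur = cur->next)
        total += head->length;    /* wrong: should be cur->length */
    return total;
}

/* Fixed variant: each buffer contributes its own length. */
static size_t count_bytes(const Buffer *head) {
    size_t total = 0;
    for (const Buffer *cur = head; cur != NULL; cur = cur->next)
        total += cur->length;
    return total;
}
```

With a 4-byte buffer followed by a 1-byte buffer, the fixed version reports 5 bytes and the buggy one reports 8. A decoder that actually read `head->length` bytes from the shorter buffer would read past its end.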
MoarVM: 3e679da29a | (Jonathan Worthington)++ (committed using GitHub Web editor) | src/strings/utf8_c8.c
Merge pull request #934 from xelak6/master

Get the number of bytes to be processed from the current buffer and not from the header.
travis-ci MoarVM build errored. Jonathan Worthington 'Merge pull request #934 from xelak6/master 22:43
travis-ci.org/MoarVM/MoarVM/builds/413340536 github.com/MoarVM/MoarVM/compare/6...679da29adb
timotimo hm. perhaps i should have backed up a thing or two from /tmp before rebooting 23:35
i found timo.github.io/_site/weeklychanges...thing.html 23:36