Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes.
Set by lizmat on 24 May 2021.
nine jnthnwrthngtn: we really should merge github.com/MoarVM/MoarVM/pull/1573 before the release 06:27
lizmat I'll wait to do a bump until that merge happened 09:04
jnthnwrthngtn moarning o/ 09:34
Bit breezy on the walk to the office, but nothing like I imagine it was yesterday :)
Good: a sudden gust didn't blow me into the river
Bad: a sudden gust didn't blow me into a pub either
jnthnwrthngtn nine: Oh! Somehow I thought that was already in; I'm not sure why 09:35
Nicholas \o
Geth MoarVM: b92ca73b48 | (Stefan Seifert)++ | src/spesh/optimize.c
Fix uninitialized registers after deopt from dispatch guards

A dispatch gets translated to a sequence of operations culminating in the runbytecode instruction. The pre-deopt index of the original instruction will be found on the runbytecode itself or any of the guards stacked up before it. When looking for the pre-deopt index, we didn't take into account, that the instruction holding a suitable deopt pre ins annotation may also itself have ... (10 more lines)
MoarVM: 8c7b734d87 | (Jonathan Worthington)++ (committed using GitHub Web editor) | src/spesh/optimize.c
Merge pull request #1573 from MoarVM/fix-pea-segfaults

Fix uninitialized registers after deopt from dispatch guards
lizmat I'll take that as my cue ? 09:38
jnthnwrthngtn ^^
jnthnwrthngtn Yup
lizmat oki
jnthnwrthngtn japhb: Nice to see more new-disp speedup results. Even the worst of the rather variable mandelbrot measurements is better too... 09:46
lizmat MoarVM bumpred 09:56
.oO( bumpred? )
Nicholas burped?
jnthnwrthngtn So, further to my experiment with moving ->work allocations into the callstack, it seems that we can also safely allocate env there for stack-allocated frames, with the condition that upon heap promotion, we also copy the env area into something allocated by the FSA too 10:01
nine jnthnwrthngtn: regarding the finalizer discussion. What makes a point (like returning) safe for invoking something? Or what's the problem with invoking after arbitrary ops? 10:03
jnthnwrthngtn Reasoning: the only way we could end up with something looking at the ->env at a distance (such as from another thread) is if there was a heap reference.
nine: At a minimum, deopt relies on every such point being a deopt all point
nine: And that same mechanism is also used by the frame walker and now also dispatch resumption 10:04
You even fixed a bug not long back where we had an invocation by, I think, loadbytecode, and it wasn't marked as a deopt all point
nine Is there a cost to making something (like e.g. goto) a deoptallpoint? 10:05
Hah! Indeed I did :)
jnthnwrthngtn Yes, deopt points imply deopt usages in the spesh graph. Those in turn inhibit things like DCE, set elimination, etc.
To the degree I've been trying to work out how a replay scheme would look (that is, we deopt not to the precise instruction, but to the last pure instruction, so deopt points come earlier and we get fewer deopt usages) 10:06
Almost every time you look at a spesh log and think "grr, why didn't it delete this instruction", the answer is a deopt usage. 10:07
nine Feels like I should have been able to come up with all these answers myself :/ I certainly knew each of these bits already at some point
jnthnwrthngtn Not sure they're obvious. 10:08
nine Well I did fix a missing deoptallpoint and I did wonder quite a few times why we couldn't simplify the bytecode some more and ran into deopt when investigating 10:09
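The link between deopt usages and instructions that refuse to be deleted can be illustrated with a toy dead-code pass. This is a minimal sketch in Python, not MoarVM's spesh code: the `Ins` record, its `deopt_uses` field, and `eliminate_dead` are hypothetical names, and the model assumes a single straight-line block with no branches.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Ins:
    op: str
    dst: Optional[str] = None                  # register written; None = side effect only
    srcs: Tuple[str, ...] = ()                 # registers read
    deopt_uses: FrozenSet[str] = frozenset()   # registers the interpreter would need
                                               # if we deoptimized at this instruction


def eliminate_dead(instructions, honor_deopt_usages=True):
    """Backward liveness pass that drops writes nobody reads."""
    live = set()
    kept = []
    for ins in reversed(instructions):
        # Keep side-effecting instructions and writes to live registers.
        if ins.dst is None or ins.dst in live:
            live.discard(ins.dst)
            live.update(ins.srcs)
            kept.append(ins)
        # A deopt point keeps its deopt usages alive for everything before it.
        if honor_deopt_usages:
            live.update(ins.deopt_uses)
    kept.reverse()
    return kept


# r0 is never read by the optimized code, only by the unoptimized code
# that would run if the guard failed and we fell back to the interpreter.
PROGRAM = [
    Ins("const", "r0"),
    Ins("getarg", "r1"),
    Ins("guard", None, ("r1",), frozenset({"r0"})),
    Ins("add", "r2", ("r1", "r1")),
    Ins("return", None, ("r2",)),
]

with_deopt = [i.op for i in eliminate_dead(PROGRAM)]
without_deopt = [i.op for i in eliminate_dead(PROGRAM, honor_deopt_usages=False)]
print(with_deopt)
print(without_deopt)
```

In this toy, `const` survives only because the guard records `r0` as a deopt usage; ignoring deopt usages lets the pass delete it, which would be unsafe if the guard ever failed.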
lizmat hmm... the logs server appears to use significantly more memory after this bump
(in my dev situation, the live server still runs on 2021.09) 10:10
jnthnwrthngtn grr, of course doing the env thing is going to make OSR a little more fun too... 10:14
nine But, but, but, more fun is good, isn't it? :D 10:16
jnthnwrthngtn Maybe not when it's me, Friday, and pointer arithmetic :D 10:17
On the upside, I think I won't end up having to do a delicate GC dance around continuations, which was about 60% of the time spent on getting ->work allocated on the callstack 10:18
Altai-man lizmat++ # bump 10:49
tellable6 2021-10-20T18:21:31Z #raku-dev <tbrowder> Altai-man you are very welcome
nine Nice to be able to close a segfault issue ticket with just a comment for a change :) (re #4520) 10:59
jnthnwrthngtn grmbl, wonder how I've managed to make a segv... 11:26
lunch, bbiab 11:41
jnthnwrthngtn back 12:49
Guess I shouldn't feel too bad, my mistake involved one of the 3 hardest things in computer science... :P 13:19
Altai-man You don't mean an off-by-one right? :P 13:21
nine jnthnwrthngtn: the comment in github.com/MoarVM/MoarVM/blob/mast...sp.c#L1155 only talks about runbytecode, but that flag is also set for runcfunc. Which one is wrong? 13:23
jnthnwrthngtn No, cache invalidation 13:27
nine const_i64 r11(0), liti64(1099511627775) # [014] unboxed literal to value 1099511627775 13:36
sp_runnativecall r5(3), r9(0), liti64(140737352346240), r10(0), r11(0)
That's just beautiful :)
jnthnwrthngtn :D 13:39
jnthnwrthngtn nine: runcfunc in optimize.c does have code to free them 13:40
nine so the comment should include runcfunc? 13:41
jnthnwrthngtn nine: So I'd say the comment is wrong
I think it originally was only runbytecode and then it got tweaked
nine "Make sure we delay release of temporaries since optimization can add further ones." covers it well enough I'd say. It's clear from the case statements which ops this applies to 13:42
jnthnwrthngtn up 13:44
OK, apart from fixing up OSR, moving env to the callstack for non-heap frames seems to work
Another 1.5s off the full Rakudo build, around 1s of it from stage parse 13:45
nine LOL "MoarVM panic: Unknown disaptch op when resolving callsite" 13:47
jnthnwrthngtn wat
nine Why can't I find the source of this message? Because I didn't copy and paste it into my ack command. I typed it fresh and didn't do the typo
jnthnwrthngtn Oh, I only just spotted it! 13:48
nine Turns out, there are quite a few places one needs to add new dispatchy ops to
Now why does it try to mark that int register like an object pointer in the GC? 13:50
Easy: because sometimes just copying code without understanding it is not really enough. Asked for a temp register with the wrong kind in the UnboxInt translation 13:53
Geth MoarVM/new-disp-nativecall: 11 commits pushed by (Stefan Seifert)++
review: github.com/MoarVM/MoarVM/compare/e...1a37f36665
nine This push contains the first working version of sp_runnativecall 14:19
jnthnwrthngtn Wow, including JITting? 14:37
Ah, I guess it's easily possible without that 14:38
ah, I see :) 14:39
Still very nice progress
nine Surprisingly this seems to cover all the native calls that occur during csv-ip5xs.pl 14:46
So in the good tradition of benchmark driven development, the next step will indeed be to get some JITing going 14:47
jnthnwrthngtn Does it show an improvement with this much done? 14:54
nine I do think so 15:02
nine 0m14.187s before 0m13.902s after (best of 10 runs each) 15:06
Variability is high with results in a range of up to +2s, but values seem to be better on average as well. 15:07
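To put the two best-of-10 timings next to the reported noise, here is a small arithmetic sketch. Only the numbers 14.187, 13.902 and the "up to +2s" range come from the log; the helper name `relative_saving` is made up for illustration.

```python
def relative_saving(before_s, after_s):
    """Fraction of the original runtime that was saved."""
    return (before_s - after_s) / before_s


before_s = 14.187   # best of 10 runs before sp_runnativecall
after_s = 13.902    # best of 10 runs after
noise_s = 2.0       # reported run-to-run range ("up to +2s")

saving = relative_saving(before_s, after_s)
noise = noise_s / before_s
print(f"saving: {saving:.1%}, noise range: {noise:.1%}")
```

The saving works out to roughly 2% of the runtime, while the reported variability is around 14%, which is why the comparison leans on best-of-10 and on averages rather than single runs.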
jnthnwrthngtn I guess the JITting would be the big win at this point, since it'll eliminate all of the dyncall/libffi setup overhead 15:08
nine In theory it could have even been worse, as sp_runnativecall causes JIT bails which sp_dispatch_o doesn't
jnthnwrthngtn Ah, yes, that also 15:09
Geth MoarVM/cheaper-frames: 92f3bac575 | (Jonathan Worthington)++ | 5 files
Allocate some frame environments on the callstack

Only do this for frames that live on the callstack rather than on the heap. This is the case when we have a lexical environment but it is never captured or the frame doesn't escape in other ways. When frames do escape, we have to also move the environment out of the callstack and onto the heap. We already do go to some effort to allocate on the heap ... (5 more lines)
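The scheme in this commit message (environment on the callstack, copied out when the frame escapes) can be modelled roughly as follows. This is a toy sketch in Python rather than MoarVM's C: `CallStack`, `Frame`, and `promote_to_heap` are hypothetical names, a `bytearray` stands in for callstack memory, and a fresh `bytearray` stands in for an FSA allocation.

```python
class CallStack:
    """Bump allocator over one contiguous region."""

    def __init__(self, size=64):
        self.memory = bytearray(size)
        self.top = 0

    def alloc(self, slots):
        if self.top + slots > len(self.memory):
            raise MemoryError("toy callstack exhausted")
        offset = self.top
        self.top += slots
        return offset

    def release_to(self, offset):
        # Popping a frame simply rewinds the bump pointer.
        self.top = offset


class Frame:
    def __init__(self, stack, env_slots):
        self.stack = stack
        self.env_slots = env_slots
        self.env_offset = stack.alloc(env_slots)  # env lives on the callstack
        self.heap_env = None                      # set once the frame escapes

    def promote_to_heap(self):
        # On heap promotion, copy the env out so later stack reuse cannot clobber it.
        if self.heap_env is None:
            start = self.env_offset
            self.heap_env = bytearray(self.stack.memory[start:start + self.env_slots])

    def set_lexical(self, index, value):
        assert 0 <= index < self.env_slots
        if self.heap_env is not None:
            self.heap_env[index] = value
        else:
            self.stack.memory[self.env_offset + index] = value

    def get_lexical(self, index):
        assert 0 <= index < self.env_slots
        if self.heap_env is not None:
            return self.heap_env[index]
        return self.stack.memory[self.env_offset + index]


stack = CallStack()
escaping = Frame(stack, env_slots=2)
escaping.set_lexical(0, 7)
escaping.promote_to_heap()             # e.g. a closure captured the frame
stack.release_to(escaping.env_offset)  # the frame returns; its stack space is free

reuser = Frame(stack, env_slots=2)     # a later call reuses the same stack memory
reuser.set_lexical(0, 99)
print(escaping.get_lexical(0), reuser.get_lexical(0))
```

The promoted frame keeps its own copy, so the later frame overwriting the same callstack slot does not affect it. Without the copy in `promote_to_heap`, the captured environment would alias reused stack memory.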
nine Now that I think of it, there's actually no reason to insist on native functions to box their results. 15:31
jnthnwrthngtn Not at all :) 16:04
Hm, I thought that splitting out the specialized vs. unspecialized forms of MVM_frame_dispatch would be a win, but it apparently is not one at all 16:07
Or I did something wrong.
Looking forward to the new nativecall JIT integration, so we can get rid of frame->args and frame->cur_args_callsite 16:09
nine Would be nice if I could make it a much more regular part of the JIT, too 16:11
jnthnwrthngtn m: say 1.222 / 1.278 16:19
camelia 0.956182
jnthnwrthngtn Seems the work/env move is another 4% off test-t
Seems it's about that off everything that isn't a micro-benchmark that ends up mostly inlined 16:21
nine Hm...to be able to avoid the unboxing, we'd have to start out with a natively typed dispatch instruction like dispatch_i instead of dispatch_o. But can't get it to emit that even when assigning the result of an --> int64 sub directly into an int64 variable 16:23
jnthnwrthngtn No, we don't do the code-gen for that properly yet 16:25
I think setting a .returns on the QAST::Op call node would do it 16:26
nine Sounds like a bit of a yak 16:31
jnthnwrthngtn Indeed 16:41
Probably not immediately worth it
nine I don't even find where the call node really gets created 16:52
jnthnwrthngtn nine: Maybe the easiest place is in the optimizer, where if we know what we're calling, we can look at the .returns of the callee, and set that on the QAST::Op call 17:00
I don't think it can be done at the creation time of that node as subs can be post-declared, for example, so we don't try and resolve the sub at that point
home time o/ 17:06
nine jnthnwrthngtn: apparently past you has already thought along the same lines and implemented just that: github.com/rakudo/rakudo/blob/mast....nqp#L3188 17:42
A little over 10 years ago actually: github.com/rakudo/rakudo/commit/c0...810595f4e6 17:46
lizmat wow 17:54
nine Of course that raises the question of why we don't see dispatch_i used then. The answer is: we set the returns on the QAST::Want instead of the QAST::Op(:op<call>) node 17:55
lizmat so that never worked ? 17:56
nine It did. Till commit 3cc9d765b2b350c9d15d0164ed53a9914b333afb in 2012 17:59
lizmat well, that's before I really got involved, so "never" is pretty accurate to me then 18:01
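Why an annotation on the wrong node goes unnoticed can be shown with a toy AST. This is a sketch under loud assumptions: `ToyCall`, `ToyWant`, and `pick_dispatch_op` are invented stand-ins, not the QAST classes or the real code generator, and the rule "look only at the call node's `returns`" is an assumption drawn from the discussion above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyCall:
    name: str
    returns: Optional[str] = None   # stand-in for .returns on the call node


@dataclass
class ToyWant:
    child: ToyCall
    returns: Optional[str] = None   # stand-in for .returns on the wrapping node


def pick_dispatch_op(node):
    """Choose the dispatch op from the call node only (assumed behaviour)."""
    call = node.child if isinstance(node, ToyWant) else node
    return "dispatch_i" if call.returns == "int64" else "dispatch_o"


# Annotation lands on the wrapper: the call node still looks untyped.
on_want = pick_dispatch_op(ToyWant(ToyCall("foo"), returns="int64"))
# Annotation lands on the call node itself: the native variant is chosen.
on_call = pick_dispatch_op(ToyWant(ToyCall("foo", returns="int64")))
print(on_want, on_call)
```

In this model the optimizer's work is not lost, it is just attached one level too high for the code that picks the instruction to see it.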
jnthnwrthngtn nine: Nice detective work :) 18:10
nine Still quite the yak. Fixing that leads to "Unsupported register return kind for dispatch op" with a $res_kind of $MVM_reg_int32 18:12
nine Which kinda makes sense. We need to extend smaller ints to the full width if we want to use native registers. So needs a coercion there 18:14
Though not even the old invocation code had such coercions. It simply used the primspec to decide on the instruction kind. 18:21
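Extending a smaller int to full register width can be sketched as below. This assumes the coercion in question is a two's complement sign extension of a signed 32-bit value into a 64-bit register. The helper names are hypothetical and nothing here is MoarVM code.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def sign_extend(raw, bits):
    """Read the low `bits` bits of `raw` as a two's complement signed integer."""
    raw &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (raw ^ sign_bit) - sign_bit


def widen_int32_to_reg64(raw32):
    """Bit pattern a 64-bit register should hold for a signed 32-bit result."""
    return sign_extend(raw32, 32) & MASK64


raw = 0xFFFFFFFF                      # an int32 return value of -1
unextended = raw & MASK64             # just dropped into a 64-bit register
extended = widen_int32_to_reg64(raw)  # coerced first

print(sign_extend(unextended, 64), sign_extend(extended, 64))
```

Without the coercion, the 64-bit register reads as 4294967295 instead of -1, which is the kind of mismatch a native-register dispatch of an `int32` result would need to avoid.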
jnthnwrthngtn m: say 1.79 / 2.07 20:28
camelia 0.864734
jnthnwrthngtn A recursive fib benchmark (we can't inline recursions, so it's a decent test of callframe setup/teardown) shows a nice improvement with work/env moved to the callstack 20:29
MasterDuke nice 21:24
timo \o/ 21:48
23:02 Mondenkind is now known as moon-child
japhb Rebuilt rakudo just after the bump (so only the bump commit new on the rakudo side, but 31 commits farther on the MoarVM side). mandelbrot-pixels was yet faster, but still variable: 23:46
16 zooms of 62460 pixels each in 5.202 seconds = 192102 pixels/second
16 zooms of 62460 pixels each in 5.505 seconds = 181546 pixels/second
16 zooms of 62460 pixels each in 6.526 seconds = 153125 pixels/second
16 zooms of 62460 pixels each in 5.891 seconds = 169635 pixels/second
16 zooms of 62460 pixels each in 6.493 seconds = 153912 pixels/second
Carefully watching it, it seems like the speed was uneven within a single run (not just overall slower or faster) -- some zooms were noticeably slower than others each run. 23:47
(I suppose there's some of that to be expected just from the math, but it was enough to make me wonder.) 23:48
That variability is *after* quiescing my machine.
(I thought that might have been contributing last time -- despite it not having affected the 2021.09 runs much -- so I shut down most of my apps.) 23:51
After running the test a bunch of times, I'm seeing that 15xxxx pixels/second is way more common than the faster variants, but they still do show up occasionally.
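The spread in the five mandelbrot-pixels runs quoted above can be summarized with a few lines of Python. The throughput figures are copied from the log; the summary measures chosen here (mean and fastest-versus-slowest ratio) are just one reasonable way to look at them.

```python
from statistics import mean

# pixels/second for the five runs reported above
rates = [192102, 181546, 153125, 169635, 153912]

average = mean(rates)
fastest, slowest = max(rates), min(rates)
spread = (fastest - slowest) / slowest   # how much faster the best run is

print(f"mean: {average} px/s")
print(f"fastest is {spread:.1%} faster than slowest")
```

The fastest run is roughly a quarter faster than the slowest, which is large enough that single runs say little and a comparison between versions needs several runs on each side.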
Yeah, confirmed faster with attacks benchmark as well: 23:54
Min: 73.0 ms (13.7 fps) - Ave: 190.9 ms (5.2 fps) - Max: 376.1 ms (2.7 fps)
50%: 186.9 ms - 75%: 235.3 ms - 90%: 259.0 ms - 95%: 269.8 ms - 99%: 359.9 ms
timo might be interesting to extract out of a spesh log which frames get inlined into which other frames, and to see if that differs noticeably between slow and fast runs 23:55
japhb Is there an easy way to do that already, or is it a SMOP? 23:58