Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes. Set by lizmat on 24 May 2021. |
|||
00:02
reportable6 left
00:05
reportable6 joined
00:37
evalable6 joined
01:37
evalable6 left,
linkable6 left
01:38
linkable6 joined
02:18
[Coke] left
02:21
[Coke] joined
02:27
tbrowder left
02:28
tbrowder joined
02:39
evalable6 joined
05:33
notable6 left,
linkable6 left,
statisfiable6 left,
committable6 left,
quotable6 left,
unicodable6 left,
squashable6 left,
nativecallable6 left,
evalable6 left,
coverable6 left,
shareable6 left,
sourceable6 left,
bloatable6 left,
benchable6 left,
bisectable6 left,
greppable6 left,
tellable6 left,
releasable6 left,
reportable6 left,
evalable6 joined,
linkable6 joined
05:35
nativecallable6 joined,
tellable6 joined,
benchable6 joined
05:36
statisfiable6 joined,
quotable6 joined
06:02
reportable6 joined
|
|||
nine | jnthnwrthngtn: we really should merge github.com/MoarVM/MoarVM/pull/1573 before the release | 06:27 | |
06:33
bisectable6 joined
06:34
releasable6 joined
06:35
bloatable6 joined,
coverable6 joined
07:31
patrickb joined
07:33
sourceable6 joined
07:34
greppable6 joined,
notable6 joined
08:33
squashable6 joined
08:35
shareable6 joined
|
|||
lizmat | I'll wait to do a bump until that merge happened | 09:04 | |
jnthnwrthngtn | moarning o/ | 09:34 | |
Bit breezy on the walk to the office, but nothing like I imagine it was yesterday :) | |||
Good: a sudden gust didn't blow me into the river | |||
Bad: a sudden gust didn't blow me into a pub either | |||
09:34
committable6 joined
|
|||
jnthnwrthngtn | nine: Oh! Somehow I thought that was already in; I'm not sure why | 09:35 | |
Nicholas | \o | ||
Geth | MoarVM: b92ca73b48 | (Stefan Seifert)++ | src/spesh/optimize.c Fix uninitialized registers after deopt from dispatch guards A dispatch gets translated to a sequence of operations culminating in the runbytecode instruction. The pre-deopt index of the original instruction will be found on the runbytecode itself or any of the guards stacked up before it. When looking for the pre-deopt index, we didn't take into account, that the instruction holding a suitable deopt pre ins annotation may also itself have ... (10 more lines) |
09:37 | |
MoarVM: 8c7b734d87 | (Jonathan Worthington)++ (committed using GitHub Web editor) | src/spesh/optimize.c Merge pull request #1573 from MoarVM/fix-pea-segfaults Fix uninitialized registers after deopt from dispatch guards |
|||
lizmat | I'll take that as my cue ? | 09:38 | |
jnthnwrthngtn ^^ | |||
jnthnwrthngtn | Yup | ||
lizmat | oki | ||
jnthnwrthngtn | japhb: Nice to see more new-disp speedup results. Even the worst of the rather variable mandelbrot measurements is better too... | 09:46 | |
lizmat | MoarVM bumpred | 09:56 | |
.oO( bumpred? ) |
|||
Nicholas | burped? | ||
jnthnwrthngtn | So, further to my experiment with moving ->work allocations into the callstack, it seems that we can also safely allocate env there for stack-allocated frames, with the condition that upon heap promotion, we also copy the env area into something allocated by the FSA too | 10:01 | |
nine | jnthnwrthngtn: regarding the finalizer discussion. What makes a point (like returning) safe for invoking something? Or what's the problem with invoking after arbitrary ops? | 10:03 | |
jnthnwrthngtn | Reasoning: the only way we could end up with something looking at the ->env at a distance (such as from another thread) is if there was a heap reference. | ||
nine: At a minimum, deopt relies on every such point being a deopt all point | |||
nine: And that same mechanism is also used by the frame walker and now also dispatch resumption | 10:04 | ||
You even fixed a bug not long back where we had an invocation by, I think, loadbytecode, and it wasn't marked as a deopt all point | |||
nine | Is there a cost to making something (like e.g. goto) a deoptallpoint? | 10:05 | |
Hah! Indeed I did :) | |||
jnthnwrthngtn | Yes, deopt points imply deopt usages in the spesh graph. Those in turn inhibit things like DCE, set elimination, etc. | ||
To the degree I've been trying to work out how a replay scheme would look (that is, we deopt not to the precise instruction, but to the last pure instruction, so deopt points come earlier and we get less deopt usages) | 10:06 | ||
Almost every time you look at a spesh log and think "grr, why didn't it delete this instruction", the answer is a deopt usage. | 10:07 | ||
nine | Feels like I should have been able to come up with all these answers myself :/ I certainly knew each of these bits already at some point | ||
jnthnwrthngtn | Not sure they're obvious. | 10:08 | |
nine | Well I did fix a missing deoptallpoint and I did wonder quite a few times why we couldn't simplify the bytecode some more and ran into deopt when investigating | 10:09 | |
lizmat | hmm... the logs server appears to use significantly more memory after this bump | ||
(in my dev situation, the live server still runs on 2021.09) | 10:10 | |
jnthnwrthngtn | grr, of course doing the env thing is going to make OSR a little more fun too... | 10:14 | |
nine | But, but, but, more fun is good, isn't it? :D | 10:16 | |
jnthnwrthngtn | Maybe not when it's me, Friday, and pointer arithmetic :D | 10:17 | |
On the upside, I think I won't end up having to do a delicate GC dance around continuations, which was about 60% of the time spent on getting ->work allocated on the callstack | 10:18 | ||
10:34
evalable6 left,
linkable6 left
10:37
evalable6 joined
10:44
Altai-man joined
|
|||
Altai-man | lizmat++ # bump | 10:49 | |
tellable6 | 2021-10-20T18:21:31Z #raku-dev <tbrowder> Altai-man you are very welcome | ||
10:49
Geth left
10:50
Geth joined
|
|||
nine | Nice to be able to close a segfault issue ticket with just a comment for a change :) (re #4520) | 10:59 | |
jnthnwrthngtn | grmbl, wonder how I've managed to make a segv... | 11:26 | |
lunch, bbiab | 11:41 | ||
12:02
reportable6 left
12:37
linkable6 joined
|
|||
jnthnwrthngtn back | 12:49 | ||
Guess I shouldn't feel too bad, my mistake involved one of the 3 hardest things in computer science... :P | 13:19 | ||
Altai-man | You don't mean an off-by-one right? :P | 13:21 | |
nine | jnthnwrthngtn: the comment in github.com/MoarVM/MoarVM/blob/mast...sp.c#L1155 only talks about runbytecode, but that flag is also set for runcfunc. Which one is wrong? | 13:23 | |
jnthnwrthngtn | No, cache invalidation | 13:27 | |
nine | const_i64 r11(0), liti64(1099511627775) # [014] unboxed literal to value 1099511627775 | 13:36 | |
sp_runnativecall r5(3), r9(0), liti64(140737352346240), r10(0), r11(0) | |||
That's just beautiful :) | |||
13:36
unicodable6 joined
13:37
linkable6 left
|
|||
jnthnwrthngtn | :D | 13:39 | |
13:40
linkable6 joined
|
|||
jnthnwrthngtn | nine: runcfunc in optimize.c does have code to free them | 13:40 | |
nine | so the comment should include runcfunc? | 13:41 | |
jnthnwrthngtn | nine: So I'd say the comment is wrong | ||
Yes | |||
I think it originally was only runbytecode and then it got tweaked | |||
nine | "Make sure we delay release of temporaries since optimization can add further ones." covers it well enough I'd say. It's clear from the case statements which ops this applies to | 13:42 | |
jnthnwrthngtn | up | 13:44 | |
*yup | |||
OK, apart from fixing up OSR, moving env to the callstack for non-heap frames seems to work | |||
Another 1.5s off the full Rakudo build, around 1s of it from stage parse | 13:45 | ||
nine | LOL "MoarVM panic: Unknown disaptch op when resolving callsite" | 13:47 | |
jnthnwrthngtn | wat | ||
nine | Why can't I find the source of this message? Because I didn't copy and paste it into my ack command. I typed it fresh and didn't do the typo | ||
jnthnwrthngtn | Oh, I only just spotted it! | 13:48 | |
nine | Turns out, there are quite a few places one needs to add new dispatchy ops to | ||
Now why does it try to mark that int register like an object pointer in the GC? | 13:50 | ||
Easy: because sometimes just copying code without understanding it is not really enough. Asked for a temp register with the wrong kind in the UnboxInt translation | 13:53 | ||
Geth | MoarVM/new-disp-nativecall: 11 commits pushed by (Stefan Seifert)++ review: github.com/MoarVM/MoarVM/compare/e...1a37f36665 |
14:18 | |
nine | This push contains the first working version of sp_runnativecall | 14:19 | |
jnthnwrthngtn | Wow, including JITting? | 14:37 | |
Ah, I guess it's easily possible without that | 14:38 | ||
ah, I see :) | 14:39 | ||
Still very nice progress | |||
nine | Surprisingly this seems to cover all the native calls that occur during csv-ip5xs.pl | 14:46 | |
So in the good tradition of benchmark driven development, the next step will indeed be to get some JITing going | 14:47 | ||
jnthnwrthngtn | Does it show an improvement with this much done? | 14:54 | |
nine | I do think so | 15:02 | |
15:05
reportable6 joined
|
|||
nine | 0m14.187s before 0m13.902s after (best of 10 runs each) | 15:06 | |
Variability is high with results in a range of up to +2s, but values seem to be better on average as well. | 15:07 | ||
jnthnwrthngtn | I guess the JITting would be the big win at this point, since it'll eliminate all of the dyncall/libffi setup overhead | 15:08 | |
nine | In theory it could have even been worse, as sp_runnativecall causes JIT bails which sp_dispatch_o doesn't | ||
jnthnwrthngtn | Ah, yes, that also | 15:09 | |
Geth | MoarVM/cheaper-frames: 92f3bac575 | (Jonathan Worthington)++ | 5 files Allocate some frame environments on the callstack Only do this for frames that live on the callstack rather than on the heap. This is the case when we have a lexical environment but it is never captured or the frame doesn't escape in other ways. When frames do escape, we have to also move the environment out of the callstack and onto the heap. We already do go to some effort to allocate on the heap ... (5 more lines) |
15:11 | |
15:17
patrickb left
15:25
nebuchadnezzar left
|
|||
nine | Now that I think of it, there's actually no reason to insist on native functions to box their results. | 15:31 | |
jnthnwrthngtn | Not at all :) | 16:04 | |
Hm, I thought that splitting out the specialized vs. unspecialized forms of MVM_frame_dispatch would be a win, but it apparently is not one at all | 16:07 | ||
Or I did something wrong. | |||
Looking forward to the new nativecall JIT integration, so we can get rid of frame->args and frame->cur_args_callsite | 16:09 | ||
nine | Would be nice if I could make it a much more regular part of the JIT, too | 16:11 | |
jnthnwrthngtn | m: say 1.222 / 1.278 | 16:19 | |
camelia | 0.956182 | ||
jnthnwrthngtn | Seems the work/env move is another 4% off test-t | ||
Seems it's about that off everything that isn't a micro-benchmark that ends up mostly inlined | 16:21 | |
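The "another 4%" figure follows from the ratio camelia printed above. A minimal sketch of that arithmetic in Python, assuming 1.278 is the test-t time before the work/env move and 1.222 the time after; the function name is hypothetical and not part of any benchmark tooling:

```python
def runtime_reduction(after: float, before: float) -> float:
    """Fraction of the original runtime that was saved (0.04 means 4% faster)."""
    return 1.0 - after / before


# Timings quoted in the channel; which one is "before" is an assumption.
before_s = 1.278
after_s = 1.222

ratio = after_s / before_s
reduction = runtime_reduction(after_s, before_s)

print(f"ratio     = {ratio:.6f}")
print(f"reduction = {reduction:.1%}")
```

The ratio matches the value camelia reported, and the reduction comes out a little above 4%.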
nine | Hm...to be able to avoid the unboxing, we'd have to start out with a natively typed dispatch instruction like dispatch_i instead of dispatch_o. But can't get it to emit that even when assigning the result of an --> int64 sub directly into an int64 variable | 16:23 | |
jnthnwrthngtn | No, we don't do the code-gen for that properly yet | 16:25 | |
I think setting a .returns on the QAST::Op call node would do it | 16:26 | ||
nine | Sounds like a bit of a yak | 16:31 | |
jnthnwrthngtn | Indeed | 16:41 | |
Probably not immediately worth it | |||
nine | I don't even find where the call node really gets created | 16:52 | |
jnthnwrthngtn | nine: Maybe the easiest place is in the optimizer, where if we know what we're calling, we can look at the .returns of the callee, and set that on the QAST::Op call | 17:00 | |
I don't think it can be done at the creation time of that node as subs can be post-declared, for example, so we don't try and resolve the sub at that point | |||
home time o/ | 17:06 | ||
17:08
Altai-man left
|
|||
nine | jnthnwrthngtn: apparently past you has already thought along the same lines and implemented just that: github.com/rakudo/rakudo/blob/mast....nqp#L3188 | 17:42 | |
A little over 10 years ago actually: github.com/rakudo/rakudo/commit/c0...810595f4e6 | 17:46 | ||
lizmat | wow | 17:54 | |
nine | Of course that raises the question of why we don't see dispatch_i used then. The answer is: we set the returns on the QAST::Want instead of the QAST::Op(:op<call>) node | 17:55 | |
lizmat | so that never worked ? | 17:56 | |
nine | It did. Till commit 3cc9d765b2b350c9d15d0164ed53a9914b333afb in 2012 | 17:59 | |
17:59
linkable6 left
|
|||
lizmat | well, that's before I really got involved, so "never" is pretty accurate to me then | 18:01 | |
18:02
reportable6 left
18:05
reportable6 joined
|
|||
jnthnwrthngtn | nine: Nice detective work :) | 18:10 | |
nine | Still quite the yak. Fixing that leads to "Unsupported register return kind for dispatch op" with a $res_kind of $MVM_reg_int32 | 18:12 | |
18:12
nebuchadnezzar joined
|
|||
nine | Which kinda makes sense. We need to extend smaller ints to the full width if we want to use native registers. So needs a coercion there | 18:14 | |
Though not even the old invocation code had such coercions. It simply used the primspec to decide on the instruction kind. | 18:21 | ||
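What "extend smaller ints to the full width" means can be sketched generically. The Python below is an illustration of sign extension only, not MoarVM or NQP code; the function name and the 32-to-64-bit case are assumptions chosen to match the `$MVM_reg_int32` kind mentioned above:

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement integer.

    This is the coercion a narrower native int needs before it can be
    treated as a full-width signed value.
    """
    mask = (1 << bits) - 1
    value &= mask                      # keep only the low `bits` bits
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


# A 32-bit pattern with the top bit set is negative once widened.
widened_minus_one = sign_extend(0xFFFFFFFF, 32)
widened_max = sign_extend(0x7FFFFFFF, 32)
widened_min = sign_extend(0x80000000, 32)
print(widened_minus_one, widened_max, widened_min)
```

Without such a step, a negative int32 copied bit-for-bit into a wider register would read back as a large positive number.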
20:01
evalable6 left
20:02
evalable6 joined
|
|||
jnthnwrthngtn | m: say 1.79 / 2.07 | 20:28 | |
camelia | 0.864734 | ||
jnthnwrthngtn | A recursive fib benchmark (we can't inline recursions, so it's a decent test of callframe setup/teardown) shows a nice improvement with work/env moved to the callstack | 20:29 | |
MasterDuke | nice | 21:24 | |
21:31
SmokeMachine left,
discord-raku-bot left,
leont left,
Nicholas left,
Nicholas joined,
discord-raku-bot joined,
SmokeMachine joined
21:32
leont joined
|
|||
timo | \o/ | 21:48 | |
21:58
discord-raku-bot left,
discord-raku-bot joined
22:02
linkable6 joined
22:04
kjp left
22:06
kjp joined
22:51
kjp left
22:52
kjp joined
23:02
Mondenkind is now known as moon-child
|
|||
japhb | Rebuilt rakudo just after the bump (so only the bump commit new on the rakudo side, but 31 commits farther on the MoarVM side). mandelbrot-pixels was yet faster, but still variable: | 23:46 | |
16 zooms of 62460 pixels each in 5.202 seconds = 192102 pixels/second | |||
16 zooms of 62460 pixels each in 5.505 seconds = 181546 pixels/second | |||
16 zooms of 62460 pixels each in 6.526 seconds = 153125 pixels/second | |||
16 zooms of 62460 pixels each in 5.891 seconds = 169635 pixels/second | |||
16 zooms of 62460 pixels each in 6.493 seconds = 153912 pixels/second | |||
Carefully watching it, it seems like the speed was uneven within a single run (not just overall slower or faster) -- some zooms were noticeably slower than others each run. | 23:47 | ||
(I suppose there's some of that to be expected just from the math, but it was enough to make me wonder.) | 23:48 | ||
That variability is *after* quiescing my machine. | |||
(I thought that might have been contributing last time -- despite it not having affected the 2021.09 runs much -- so I shut down most of my apps.) | 23:51 | ||
After running the test a bunch of times, I'm seeing that 15xxxx pixels/second is way more common than the faster variants, but they still do show up occasionally. | |||
Yeah, confirmed faster with attacks benchmark as well: | 23:54 | ||
Min: 73.0 ms (13.7 fps) - Ave: 190.9 ms (5.2 fps) - Max: 376.1 ms (2.7 fps) | |||
50%: 186.9 ms - 75%: 235.3 ms - 90%: 259.0 ms - 95%: 269.8 ms - 99%: 359.9 ms | |||
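The run-to-run variability in the five mandelbrot-pixels results quoted above can be put into numbers. A minimal sketch in Python using only the rates from the log; the choice of mean, median and relative spread as summary statistics is an assumption, not something the benchmark itself reports:

```python
from statistics import mean, median

# Pixels/second from the five runs quoted in the channel.
rates = [192102, 181546, 153125, 169635, 153912]

avg = mean(rates)
mid = median(rates)
# Gap between fastest and slowest run, relative to the mean.
spread = (max(rates) - min(rates)) / avg

print(f"mean   = {avg:.0f} pixels/second")
print(f"median = {mid} pixels/second")
print(f"spread = {spread:.1%} of the mean")
```

A spread above 20% of the mean on five runs is large enough that single-run comparisons between builds should be treated with caution.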
timo | might be interesting to extract out of a spesh log which frames get inlined into which other frames, and to see if that differs noticeably between slow and fast runs | 23:55 | |
japhb | Is there an easy way to do that already, or is it a SMOP? | 23:58 |