Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes.
Set by lizmat on 24 May 2021.
00:27 MasterDuke left 01:02 Util_ left 02:16 frost joined
timo i'll donate the MVM_inline_cache_get_kind implementation 03:19
Geth MoarVM/new-disp: 45fb8e3eb5 | (Timo Paulssen)++ | 2 files
allow kind of inline cache entry to be determined outside of inline_cache.c
timo gist.github.com/timo/36c7b304a2917...f910bd395d 03:42
[timo@kritter nqp]$ grep -c -B1 "dispatch program" /tmp/testrun.speshlog.txt 03:43
[timo@kritter nqp]$ grep -B1 "dispatch program" /tmp/testrun.speshlog.txt | grep -c "Common DP"
this is a random nqp test file i grabbed: t/nqp/102-multidim.t
so out of 449 times a dispatch program appears for a monomorphic dispatch it'll be the one that's just "guard for the first arg being of a given type, and being concrete, then load that argument, then set that argument as the result_o" 03:44
not really all that helpful, just 1) stealable code to grab the corresponding MVMDispProgram for a given dispatch_* instruction in a spesh graph, and 2) a really common and very short disp program that could be the first target to get a full disp program compiled to spesh ops and such 03:47
and of course, as in the patch where i did that the last time, there's other important stuff like splitting the version of incoming arguments and all that jazz 03:52
03:59 Util joined 05:37 frost left 06:04 reportable6 joined 06:38 frost joined 06:52 patrickb joined 06:53 squashable6 left 06:55 squashable6 joined 07:00 MasterDuke joined
Nicholas good *, #moarvm 07:41
07:45 frost left 09:10 Kaipi left 09:12 Kaiepi joined
jnthnwrthngtn moarning o/ 09:14
I see the night shift has delivered :)
Nicholas \o 09:15
09:22 linkable6 joined 09:23 linkable6 left 09:37 frost joined
MasterDuke this is odd. if i'm on my branch that correctly excludes DEPRECATED ops from being genned by update_ops.raku, and do in fact regen them, the nqp build breaks 09:38
with `moarop bindexcategory is not void`. it's not a problem with bindexcategory itself, but the interaction of `MAST::Ops.WHO<@offsets>` and `MAST::Ops.WHO<@values>` 09:41
the offset for that op is the same in both cases, but the value at the offset is different (the values list is shorter with deprecated ops excluded, not surprisingly) 09:42
jnthnwrthngtn Ah wow, `make test` builds with spesh and JIT in NQP, just not expr JIT, probably due to the sp_getarg_* impl needing updating 09:43
Nicholas "such wow | so JIT | faster" ? 09:45
MasterDuke does rakudo build (faster)?
jnthnwrthngtn Didn't try yet :) 09:46
Was just seeing if it's obvious (to me) how to update the expr JIT
It's not
MasterDuke: From 60.8s to 54.3s on my machine 09:52
Note it's doing no optimization around dispatch programs yet
MasterDuke nice
jnthnwrthngtn So no spesh linking, no inlining, etc. 09:53
Nice it makes it through CORE.setting though
Hm, some SEGVs in make test 09:54
Geth MoarVM/new-disp: 8471f1e008 | (Jonathan Worthington)++ | src/spesh/args.c
Fix overflow when processing long arg captures

This would not come up under the previous design, where anything that was called with a flattening callsite would refuse to spesh in the first place and other maximums protected this code. Now, however, we need to cope with the input capture being larger in size. Rather than refusing it with a size check, permit it when we have a slurpy positional.
jnthnwrthngtn Less SEGVs in make test
.oO( uh-oh, here comes the fewrer... )
lizmat :-) 10:21
jnthnwrthngtn Oh wow, make spectest on new-disp takes so long apparently 'cus a small number of tests take forever 10:30
lizmat and probably eat your memory as well :-)
jnthnwrthngtn Curiously not 10:31
One had eaten 1.5% of my RAM, the other 0.3%
dogbert17 jnthnwrthngtn: try t/spec/S05-modifier/ignorecase.t
lizmat ah, just infinilooping? or did they finish in the end ? 10:32
Geth MoarVM/new-disp: 562249e6a1 | (Jonathan Worthington)++ | src/jit/core_templates.expr
Disable some expr JIT templates for now

They need updating for the new calling conventions.
MoarVM/new-disp: 0d5e398d93 | (Jonathan Worthington)++ | src/core/frame.c
Switch spesh back on

While running it with the various spesh stressing flags may turn up something, we don't (at least with the expression JIT temporarily limited) seem to fail anything more than with spesh turned off. Note that this doesn't do the integration of dispatch programs with spesh (and thus there's no spesh linking or inlining at present), so only a small speedup is to be expected from this.
jnthnwrthngtn lizmat: Most of them did eventually finish, after some ran for 3 minutes I cancelled them
dogbert17: ooh, that's an interesting one indeed. 10:36
dogbert17 yeah, snails pace :)
10:37 evalable6 left
dogbert17 perf top shows a massive amount of time spent in MVM_disp_program_run 10:40
10:40 evalable6 joined
jnthnwrthngtn sigh, why does callgrind not give me any output 10:41
oh, because you can't run `valgrind --tool=callgrind ./rakudo-m ...` 'cus it does execve or whatever 10:42
Ah right, I never imposed an upper limit on the level of polymorphism at a given dispatch cache site. Of course going wildly polymorphic may expose a bug 10:51
So putting in a limit is easy. What's a bit surprising is that my first guess at the limit causes a huge slowdown in CORE.setting compilation 10:54
Nicholas 42 wasn't the correct answer? 10:55
jnthnwrthngtn It might yet prove to be :P
OK, I think t/spec/MISC/bug-coverage.t is a real infilooper 11:10
Geth MoarVM/new-disp: 692fad885d | (Jonathan Worthington)++ | 2 files
Enforce an upper limit on inline cache size

This is a rather higher number than I expected to end up with, and warrants further investigation. With anything much smaller than this, we see a significant slowdown in CORE.setting compilation time.
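The idea of capping an inline cache site can be sketched as a toy model in Python. This is not MoarVM's code: the class, the `resolve` stand-in, and the cap values are assumptions for illustration only, and the log does not state the actual limit that was chosen.

```python
# Toy model of a size-capped polymorphic inline cache (not MoarVM code).
MAX_ENTRIES = 4  # hypothetical cap; the real limit is not given in the log


class InlineCacheSite:
    def __init__(self, max_entries=MAX_ENTRIES):
        self.max_entries = max_entries
        self.entries = []      # list of (guard_type, handler) pairs
        self.slow_calls = 0    # how often the full dispatch had to run

    def dispatch(self, arg, resolve):
        # Fast path: linear scan over the guards recorded so far.
        for guard_type, handler in self.entries:
            if type(arg) is guard_type:
                return handler(arg)
        # Slow path: full dispatch; record the result only while under the cap.
        self.slow_calls += 1
        handler = resolve(type(arg))
        if len(self.entries) < self.max_entries:
            self.entries.append((type(arg), handler))
        return handler(arg)


def resolve(t):
    # Stand-in for the expensive dispatch logic.
    return lambda value: f"{t.__name__}:{value}"


site = InlineCacheSite(max_entries=2)
results = [site.dispatch(v, resolve) for v in (1, "a", 2.0, 3, 4.0)]
```

In this model an uncapped site scans ever more guards on each call, while a cap that is too small sends more calls down the slow path. That trade-off would be consistent with the slowdown described in the commit message, though the log does not confirm the cause.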
jnthnwrthngtn m: sub (uint8 $x) { Nil until $x; }(2) 11:14
camelia (timeout)
jnthnwrthngtn Uh. So how does this test not hang on master if it times out there too? 11:15
m: sub (uint8 $x) { say $x; Nil until $x; }(2)
camelia (timeout)2
jnthnwrthngtn Test file wasn't touched for over a year 11:16
MasterDuke weird. it does complete ok even if you just run it manually (i.e., `./rakudo-m -I lib/ t/spec/MISC/bug-coverage.t`) 11:20
jnthnwrthngtn What if you run just the thing I evaluated?
MasterDuke hang 11:21
jnthnwrthngtn :S
If you look at the bytecode it compiles into it's obviously wrong
I just don't get how the test can possibly pass on master
MasterDuke perf says it's spending all its time in MVM_interp_run 11:22
jnthnwrthngtn Yeah, it code-gens a loop where it reads an uninitialized register
Maybe 'cus it fails to emit an instruction to coerce the unsigned value into a full-width one for the test or something 11:23
MasterDuke you've already looked at the MVM_dump_bytecode output?
it hangs for me with a signed int also 11:24
jnthnwrthngtn Yes
OK, well, not a new-disp specific bug, anyway
MasterDuke hangs on all unsigned (sized or not), succeeds on signed unsized or size 64 11:26
11:37 patrickb left, patrickb joined 11:39 patrickb left, patrickb joined 11:41 patrickb left, patrickb joined 11:43 patrickb left, patrickb joined 11:45 patrickb left, patrickb joined 11:49 patrickb left, patrickb joined 11:51 patrickb left 11:52 patrickb joined 11:53 patrickb left 11:54 patrickb joined 11:55 patrickb left 11:56 patrickb joined 11:57 patrickb left 11:58 patrickb joined, patrickb left 12:02 reportable6 left 12:39 Altai-man left, Altai-man joined 12:47 linkable6 joined
dogbert17 jnthnwrthngtn: are we, at this point, expecting MVM_SPESH_NODELAY=1 and MVM_SPESH_BLOCKING=1 to SEGV? 12:54
jnthnwrthngtn It might. 12:57
I didn't try it
dogbert17 Thread 1 "moar" received signal SIGSEGV, Segmentation fault. 12:58
jnthnwrthngtn But I'd not be surprised if there's more things that need a fixup
dogbert17 0x00007ffff7917ba2 in handle_bind_failure (failure_record=<optimized out>, tc=0x555555559e40) at src/core/callstack.c:382
382 ice->run_dispatch(tc, ice_ptr, ice, id, callsite, args, &(failure_record->flag),
jnthnwrthngtn Hmm
What if you disable JIT?
dogbert17 MVM_SPESH_NODELAY=1 MVM_SPESH_BLOCKING=1 ./rakudo-m -Ilib t/02-rakudo/12-proto-arity-count.t # SEGV for me
it still fails 12:59
Geth MoarVM/new-disp: 2d2f0183c8 | (Jonathan Worthington)++ | src/disp/syscall.c
Add a couple of lexical capture related syscalls

These will be used to eliminate a couple of Rakudo extops.
dogbert17 with MVM_JIT_DISABLE=1 that is 13:00
timo yeah, exciting! 13:02
rid us of these extops 13:03
dogbert17 perhaps something for our intrepid bughunters timo and nine :)
jnthnwrthngtn At least 3 more really have to go away 13:04
Although getting rid of the invocation protocol is...quite a bit of effort 13:05
timo the thing that also has fallback and call-me, right?
oh but also coercers and i guess junctions would also want to go in there?
can you imagine using dispatchers to make junction autothreading faster 13:06
jnthnwrthngtn I think I already did make it an iota less bad when it's a multi failover - or at least, we avoid one round of flatten/slurp along the way 13:07
MasterDuke timo: is there any way to use your appimages to run scripts on the filesystem?
jnthnwrthngtn timo: I'm glancing over your dispatch program to ops work; I note the problem about it deopting to the wrong place, but I think that's a case of marking it :predeoptonepoint instead of :deoptonepoint 13:10
(Which I'm trying to do now, but it blows up, though I can guess why...)
timo :+1: glad to see it be of use, even if just to skip over some tedious switch/case writing or something 13:13
13:15 leedo_ joined
jnthnwrthngtn Hm, I can't actually see why putting the deopt point before it blows up, actually. We don't even deopt yet. 13:17
13:18 discord-raku-bot left, leedo left
jnthnwrthngtn Ah, turns out at least partly 'cus code-gen wasn't aware of pre/post deopt indexes so shoved it in the wrong place 13:34
Geth MoarVM/new-disp: 48b6f4e564 | (Jonathan Worthington)++ | 6 files
Mark dispatch_* as pre-deopt instructions

That is, we should deopt to before them, not after them, since if we fail to meet the guards then we need to attempt the dispatch via the dispatch mechanism (maybe recording a new program) in order to put the dispatch into effect.
MasterDuke github.com/MoarVM/MoarVM/pull/1518 has been updated and works fine. now after a regen and nqp rebuild+rebootstrap, src/vm/moar/stage0/MASTOps.moarvm decreases in size from 1257384 to 1210008 13:44
m: say 1257384 - 1210008
camelia 47376
Geth MoarVM/new-disp: adfb5cd4b5 | (Timo Paulssen)++ (committed by Jonathan Worthington) | 3 files
Stash away spesh correlation ID earlier

So that it will always be the correct one.
MasterDuke i don't think github.com/MoarVM/MoarVM/pull/1518 will cause any new/more rebasing problems 13:58
timo m: say (1257384 - 1210008) / 1257384 14:10
camelia 0.037678
timo i'll take it
MasterDuke it'll be even greater after new-disp because it deprecates a ton more and only adds a couple 14:15
m: say 1257384 - 1177856; say (1257384 - 1177856) / 1257384 14:18
camelia 79528
timo cool 14:19
MasterDuke ^^^ i cherry-picked the commit into new-disp and re-ran it
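The size arithmetic from the exchange above can be reproduced with a short Python sketch. The three byte counts are the ones quoted in the log; the variable and function names are made up for illustration.

```python
# Sizes of src/vm/moar/stage0/MASTOps.moarvm in bytes, as quoted in the log.
before = 1257384
after_pr = 1210008        # after the regen with the PR applied
after_new_disp = 1177856  # with the commit cherry-picked into new-disp


def saving(old, new):
    """Return (bytes saved, fraction of the old size saved)."""
    return old - new, (old - new) / old


saved_pr, frac_pr = saving(before, after_pr)
saved_nd, frac_nd = saving(before, after_new_disp)
print(saved_pr, round(frac_pr, 6))
print(saved_nd, round(frac_nd, 6))
```

That works out to roughly 3.8% saved on master and roughly 6.3% with the commit on new-disp.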
nine jnthnwrthngtn: if that Nil until $x thing reads an uninitialized register, maybe it just so happens to find a value there that lets it continue? And what it finds there is influenced by what other code is run 14:22
MasterDuke any reason not to merge github.com/MoarVM/MoarVM/pull/1518 ? 14:29
timo nine: don't we zero-initialize a frame's work area before entering? 14:33
jnthnwrthngtn nine: Yes, I think that's what's going on
timo: Only pointer regs (.o and .s) 14:35
iirc, anyway
Oh, alternatively, maybe we do zero everything but we had a prior bit of code we generated poke a non-zero into that 14:36
It looks like a mis-compile somewhere in the QAST -> MAST though
timo well, we null-initialize .o and .s registers by copying vmnull in there 14:37
i'm sorry i still haven't put the recursion limit into spesh's inliner 14:45
14:48 frost left
Geth MoarVM: dyknon++ created pull request #1519:
Stream Decoder: Disallow incomplete code at EOF
timo i put the type of object in the program dump output. is it always MVMObject or is it sometimes an STable in the gc_constants? i assume it can be 15:55
Load collectable constant at index 2 (of type NQPRegex) into temporary 1 15:56
this is what it looks like
random peek into core setting compilation shows a good number of "guard argument type concrete, guard literal string argument, set value from collectible constant, return result from that" 16:02
jnthnwrthngtn I suspect that's a ton of sinks being turned into very little :) 16:03
timo in these cases it's mostly (or always?) an NQPRoutine that gets returned 16:04
maybe i should also output the name and cuuid or perhaps filename/line number of the routine there 16:05
this is calls from !reduce for example
could that be the surprisingly megamorphic one perhaps?
gist.github.com/timo/f283631677a91...1236b4dd35 - there we go 16:42
clearly this is the self."$name"() thing 16:43
if !reduce could get inlined into the regex it belongs to it'd make that dispatch go away immediately, cool.
do we call !reduce from inside the regex code?
if so that'd do it i expect 16:44
nine That segfault in t/02-rakudo/12-proto-arity-count.t is because we're dereferencing a NULL. The NULL comes from the disp_inline_cache (most entries are NULL). 16:46
Ok, so there is a dispatch inline cache per static frame. The desired cache slot is found via mapping the current position in the bytecode. This position will obviously be different in speshed code. 17:06
So we're gonna need sp_ versions of all ops that access this cache. These versions will take either the original bytecode position (found in an annotation) or a slot number as a literal.
MasterDuke how many ops is that? 17:07
nine Indeed, ops like sp_getlexstatic_o and sp_dispatch_o already exist which take the bytecode position as argument. assertparamcheck seems missing
Oh, no, they actually take the slot number directly. There's an MVM_disp_inline_cache_get_spesh version for that 17:09
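The problem nine describes can be illustrated with a toy model in Python. Nothing here is MoarVM code: the offsets, the lookup table, and the `sp_assertparamcheck` name are assumptions made up for the sketch. It only shows why a cache keyed by original bytecode position misses once the code has been rewritten.

```python
# Toy model (not MoarVM code): a per-frame cache whose slots are found by
# the position of the instruction in the *original* bytecode.
original = [(0, "param_rp_o"), (4, "dispatch_o"),
            (12, "assertparamcheck"), (16, "return_o")]

# Slot lookup table built from the original offsets of cache-using ops.
slot_for_offset = {4: 0, 12: 1}
cache = ["entry for dispatch_o", "entry for assertparamcheck"]


def lookup_by_offset(offset):
    """Map a bytecode offset to its cache entry, or None if there is none."""
    slot = slot_for_offset.get(offset)
    return None if slot is None else cache[slot]


# Specialization rewrites the code, so the same ops now sit at other offsets.
speshed = [(0, "sp_getarg_o"), (6, "sp_guardconc"),
           (14, "dispatch_o"), (22, "assertparamcheck")]

# Looking up by the speshed position finds nothing (the NULL in the log).
stale = lookup_by_offset(22)

# Carrying the slot number as a literal operand avoids the mapping entirely.
# "sp_assertparamcheck" is a hypothetical name for such a spesh op.
speshed_with_slot = [(22, "sp_assertparamcheck", 1)]
_, _, slot = speshed_with_slot[0]
fixed = cache[slot]
```

Here `stale` is `None` while `fixed` is the intended entry, which mirrors the choice between passing the original position and passing the slot number directly.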
jnthnwrthngtn nine: That analysis makes sense, fwiw 17:11
It seems that MVM_spesh_manipulate_split_version is pretty busted
Or at least, it looks most likely to be the guilty party in why we're getting odd things going on with versions when translating dispatch programs into ops 17:12
Geth MoarVM/compile-basic-disp-program: aef5408a40 | (Jonathan Worthington)++ | 5 files
Start to translate dispatch programs in spesh

This gets us able to translate some dispatch programs with basic guards and producing a result which is a value. Thanks to timo++ for a first draft, which I borrowed some of the code here from.
jnthnwrthngtn I'm pretty sure I'm too tired to figure it out now, but I suspect it's much too naive 17:14
I think it should be walking successors up to a PHI 17:16
Instead it's trying to get away with visiting dominance children
timo ooooooh yeeaaahhhh 17:21
Geth MoarVM/new-disp: 381d78d8a6 | (Stefan Seifert)++ | 6 files
Fix segfaults caused by accessing wrong disp inline cache slot in speshed code

There is a dispatch inline cache per static frame. The desired cache slot is found via mapping the current position in the bytecode. This position will be different in speshed code. So we need a spesh version of assertparamcheck that already knows the correct cache slot to query.
nine This gets us further, but it still ends in a segfault. This time in build_cfg 17:30
timo i see a boatload of unbox dispatches are being compiled, that's cool 17:49
nine 9896086 looks a bit high for an ins literal 17:51
18:04 reportable6 joined
nine Huh...we're processing the MVMOpInfo of param_op_s, but according to the bytecode dump, there's no such instruction in the static frame... 18:06
timo 1017 aborted disp compile due to load constant obj or string 18:12
i'm assuming the programs that this would unlock have some other ops that would currently abort compilation tho 18:13
nine I think we're stumbling over a dispatch_o instruction with 353 arguments 18:15
timo gist.github.com/timo/94003db670409...f87024a70f 18:19
oh? that sounds like .. a lot :D
Geth MoarVM/new-disp: 02361112b8 | (Stefan Seifert)++ | src/spesh/graph.c
Fix segfault caused by int overflow when building cfg in spesh

dispatch_* instructions have a variable number of arguments that can get quite large. In the case of a call with 353 arguments, we overflowed arg_size causing us to jump to a weird location in the bytecode when processing. Use a more appropriate int size.
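The kind of wrap-around described in this commit message can be sketched in Python. The counter width (8 bits) and the operand size (2 bytes) are assumptions chosen for illustration; the log gives only the argument count of 353 and does not say which integer type was used before or after the fix.

```python
# Toy illustration of the wrap-around (assumptions: an 8-bit counter and
# 2 bytes per operand; the log states neither).
NUM_ARGS = 353          # from the log
BYTES_PER_ARG = 2       # hypothetical operand width


def accumulate(num_args, bytes_per_arg, bits):
    """Sum operand sizes in an unsigned counter that is `bits` wide."""
    mask = (1 << bits) - 1
    total = 0
    for _ in range(num_args):
        total = (total + bytes_per_arg) & mask  # emulate C unsigned wrap-around
    return total


true_size = NUM_ARGS * BYTES_PER_ARG
narrow = accumulate(NUM_ARGS, BYTES_PER_ARG, 8)
wide = accumulate(NUM_ARGS, BYTES_PER_ARG, 32)
print(true_size, narrow, wide)
```

Under these assumptions the narrow counter ends up far short of the true size, so a reader that skips ahead by that amount would land in the middle of the operands, while the wider counter holds the correct value.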
timo well spotted 18:20
nine This gets us through t/02-rakudo/12-proto-arity-count.t with MVM_SPESH_NODELAY=1 and MVM_SPESH_BLOCKING=1 :)
timo i'm kind of a tiny bit annoyed we don't have anything in place that automatically generates a "give me the string name for a given enum value" function per enum
nine dogbert17: done :D
19:04 linkable6 left, evalable6 left, evalable6 joined 19:07 linkable6 joined
timo put in "load object or string to temporary" instruction, including creating new temporaries if necessary 19:22
why do i get so many segfaults :) :) 19:31
326 assert(deopt_idx < 0 || seen_deopt_idx); 19:32
this asplodes
(wasn't segfault, was abort)
i cherry-picked the last two commits from new-disp, i was thinking that should fix all possible things! 19:33
lizmat so what is the state of the GC in MoarVM? It's still stop the world right? And one thread doing the GC, no ? 19:41
timo every thread can participate, as long as they aren't blocked 19:42
but yes, everything stops before gc starts and everything resumes once all gc is done 19:43
lizmat and we don't have a way of saying: forget about GC for this run of a program, right? 19:45
timo not really, no. there's a flag that can be set internally to moarvm that just allocates everything in the gen2, that leads to rather massive memory increase depending on your workload, though 19:51
lizmat would that make sense for oneliners ?
or is it a MoarVM compile time flag ? 19:52
timo it's run-time, but we don't have anything exposed to set it yet 19:55
what were you thinking of? a flag to rakudo's commandline?
lizmat yeah, would be nice to experiment with, see if it reduces runtime in some situations? 20:11
timo the trivial way to set that up crashes, unfortunately 20:17
[Coke] everyone++ for new-disp work! 20:25
lizmat indeed! everyone++ 20:26
nine lizmat: remember that even one liners run the full blown compiler to compile that line. And that is gonna eat quite some memory. 20:36
lizmat ah, yes... hmmm
timo how often do we run the gc during a short one-liner? 20:37
nine The GC is also pretty fast. And we can re-use the memory it reclaims, so we don't have to ask the operating system for more all the time
timo just have one huge ring buffer and assume when we've re-reached the beginning the data there is probably not in use any more
nine Getting definitive numbers would be interesting though :)
The gen2 default flag cannot be active all the time. There are parts of MoarVM which don't work when that's active 20:38
For just getting numbers one could increase the nursery size until no more GC happens 20:39
20:57 Kaiepi left 21:00 Kaiepi joined
timo has anybody tried the one liner in question and looked where the allocations come from? i would assume from just .lc on the big list of strings 21:03
21:12 Kaiepi left
japhb nine: Our GC is decent, but for real-time code it has a couple of difficult behaviors: variability in duration (especially from the mix of no GC versus nursery GC versus full GC), uncontrolled entry (which can be fudged a bit by requesting GC start at a known point), and non-trivial minimum length (which can eat up a bunch of time in the RT loop whenever it triggers). 21:13
timo with a gen2 of possibly unbounded size, not much we can do unless we get incremental gc to work, and then memory size may grow infinitely as a trade-off 21:20
japhb timo: There are a fair number of cases that I can give a bound on memory used per second; it might be nice to be able to tune something like an incremental GC for that. 21:32
timo: For example, in real-time gaming, I might be able to negotiate the rate and size of world updates with the server (this isn't a theoretical; several engines do this so that lower-bandwidth players can play at all). 21:33
timo i guess that's fair. just have to make sure the gc can keep up with the rate of new garbage appearing 21:36
i guess "realtime" isn't something you'd just slap onto something and expect to be done with it 21:37
i don't think we're doing any compensation for not-full nurseries when GC is triggered either by a different thread or by forcing GC directly 21:41
21:41 Kaiepi joined
timo that can lead to a thread that slowly fills its nursery to put stuff in the old generation too early 21:42
[Coke] how to force rakudo build of new-disp to use latest moarvm/nqp? just --gen-nqp=new-disp says "yes, you have that." 21:45
timo i recommend cd-ing into the folders and just git pull-ing 21:48
japhb timo: Yeah, that's part of why (for example) requesting a GC after processing each world update is suboptimal -- because it's going to cause other threads to promote more than they should, which means worse full GC. 21:52
[Coke] timo: ... yah, that doesn't work. have to also kill install/ ? Eff it, will just nuke it and start over. :| 21:53
timo oh, i see 21:59
well, i always cd in and "make install" anyway
[Coke] in the subdirs? oh 22:00
timo japhb: since we have per-thread nurseries, i could imagine an API to get nursery status would be like "i give you a native num array and you fill it with the percentages", but nurseries can be different sizes, we create further threads with smaller nurseries, so perhaps an array of ints with filled, size, filled, size, filled, size
japhb What causes a nursery to grow, and how much does it grow? And can you specify a minimum size for new thread nurseries? 22:02
I guess there's also the problem that the special threads are going to grow differently than the general threads 22:03
So you'd need to know thread type as well (affinity, general queue, spesh, etc.) 22:04
timo there's a starting thread size and a normal size. main thread starts with normal size, other threads start with smaller size and the first time they are in a gc run they use the full size
japhb Oh interesting.
timo 3:#define MVM_NURSERY_SIZE 4194304 22:05
8: * does not grow. If MVM_NURSERY_SIZE is smaller than this value (as is
10:#define MVM_NURSERY_THREAD_START 131072
japhb But is there any detection of GC thrash that would cause the "normal size" to grow at all?
Ah, I see the "does not grow" bit, hmmm.
timo oh 22:06
i was wrong
the size doubles if the thread was the one to initiate gc
japhb 128 kB or 4 MB, hmmm
Oh! Can that repeat indefinitely?
timo no, only up to 4 MB
japhb Ah, I get it. 4 MB for first thread, 128 kB for additional threads, which can double up to 5 times and then stops when it hits the 4 MB cap. 22:07
timo that's it 22:08
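The doubling sequence japhb summarizes can be checked with a small Python sketch. The two constants are the `#define` values pasted above; the function is a made-up helper that models only the size sequence, not the condition that triggers each doubling.

```python
MVM_NURSERY_SIZE = 4194304         # 4 MB, from the #define quoted above
MVM_NURSERY_THREAD_START = 131072  # 128 kB, from the #define quoted above


def nursery_sizes(start=MVM_NURSERY_THREAD_START, cap=MVM_NURSERY_SIZE):
    """List the sizes a nursery passes through if it doubles until the cap."""
    sizes = [start]
    while sizes[-1] < cap:
        sizes.append(min(sizes[-1] * 2, cap))
    return sizes


sizes = nursery_sizes()
doublings = len(sizes) - 1
print(sizes, doublings)
```

Starting at 128 kB, five doublings reach the 4 MB cap, which matches the count given above.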
japhb Why is MVM_NURSERY_SIZE a #define and not a tuneable? Just NYI? 22:09
timo i believe 4 MB was chosen to be the size of typical L3 caches
japhb Oh that's interesting. (And I think no longer accurate, since I was just reading that Apple's ARM chips have a crapton of LLC.) 22:10
timo right, they just put a whole boatload of fast RAM extremely close to their CPU
[Coke] do we have a list of current failures/issues? t/02-rakudo/18-pseudostash.t on mac failed 3/3 22:29
timo i'm not aware of such a list 22:30
22:41 evalable6 left, linkable6 left
MasterDuke there's a rakudo issue for flappers 23:26
heh, my ryzen 7 3700x has 2x16mb L3 cache 23:27
23:30 squashable6 left 23:31 squashable6 joined