Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes.
Set by lizmat on 24 May 2021.
00:02 reportable6 left 00:04 tellable6 joined, squashable6 joined, bisectable6 joined 01:03 statisfiable6 joined, releasable6 joined, reportable6 joined 01:04 sourceable6 joined 02:05 evalable6 joined 03:05 bloatable6 joined 04:51 MasterDuke left 05:20 frost joined 06:03 reportable6 left 07:03 evalable6 left, notable6 left, statisfiable6 left, unicodable6 left, releasable6 left, quotable6 left, greppable6 left, bloatable6 left, linkable6 left, sourceable6 left, committable6 left, tellable6 left, coverable6 left, nativecallable6 left, shareable6 left, benchable6 left, squashable6 left, bisectable6 left, linkable6 joined 07:04 tellable6 joined, shareable6 joined, reportable6 joined, Kaiepi left, unicodable6 joined, notable6 joined, coverable6 joined, evalable6 joined 07:06 committable6 joined, statisfiable6 joined 07:24 MasterDuke joined 08:04 benchable6 joined, bisectable6 joined 08:05 quotable6 joined 08:41 patrickb joined 08:51 patrickb left
Nicholas good *, #moarvm 08:53
new-disp can't build the Rakudo setting with MVM_SPESH_NODELAY=1
without that (but with everything else) it can
jnthnwrthngtn Hm, I thought it could pre-vacation... 09:02
Though I may mis-remember 09:03
09:03 greppable6 joined, nativecallable6 joined
Nicholas I *thought* that it could too. But at times I've managed to get my checkouts all muddled 09:03
09:04 squashable6 joined
MasterDuke how multi-threading safe are the metamethods supposed to be in general? github.com/rakudo/rakudo/pull/4501 is a fix for parsing a grammar in multiple threads, which does seem like something that could reasonably happen 09:20
but what about github.com/rakudo/rakudo/blob/mast...nqp#L9-L24 (just as a random example)? 09:21
or github.com/rakudo/rakudo/blob/mast...nCache.nqp 09:24
jnthnwrthngtn There's a general expection that they should be used from a single thread during construction, and effectively immutable after compose. Should a meta-object want to have mutable state post-compose then it should take care of threading concerns. 09:28
oops. general *expectation* 09:29
So add_attribute clearly is construction time (pre-compose), so it's user error to call it in parallel. 09:30
The concretization cache I've no idea about; it claims it's used only at compile time (which I guess would mean pre-compose too), but if it's causing bother when parsing grammars from multiple threads, that's clearly not quite the whole story.
MasterDuke ok, i think that makes sense to me 09:31
jnthnwrthngtn I don't quite understand what it's achieving, tbh.
MasterDuke ConcretizationCache.nqp wasn't the problem, that was just another (somewhat) random example
jnthnwrthngtn ah, k
*ok 09:32
MasterDuke the PR is a further fix for MVM_oops in github.com/Raku/roast/blob/master/...#L151-L168 09:34
follow on to github.com/rakudo/rakudo/pull/4496
dogbert11 as a foolow up to what Nicholas wrote, there's a bug which leads to compilations errors when MVM_SPESH_NODELAY is enabled 09:49
arghh, crap spelling :( 09:50
===SORRY!=== Error while compiling /home/dogbert/repos/rakudo/t/spec/S04-statements/with.t 09:52
is default on shaped Scalar not yet implemented. Sorry.
the above is the error and it is 100 percent reproducible, just set nursery to e.g. 12k and use MVM_SPESH_NODELAY
jnthnwrthngtn Hm, this is curious. --profile-compile on CORE.c.setting seems to record stuff, but then fails to spit out the SQL file at the end (and doesn't mention that it's writing profiling data either) 10:56
MasterDuke yeah, it's been doing that for a while 11:05
i successfully did a --profile-compile of CORE.c on 2020-12-12, but i don't know when after that it broke 11:06
CORE.e succeeds 11:08
47mb sql file
12:02 reportable6 left 12:03 TempIRCLogger left, TempIRCLogger joined
jnthnwrthngtn Seems it's writing the SQL if I remove --target and --output 12:21
Now I just have to hope it finishes without running out of memory...
MasterDuke how much do you have? 12:25
jnthnwrthngtn 64GB. It finished fine in the end 12:33
12:33 jgaz joined 12:35 jgaz left 12:36 jgaz joined
jnthnwrthngtn Then had to tell Comma it could use more memory in order to get it loaded into the profiler UI there :) 12:37
12:37 jgaz left 12:41 JimmyZ joined
JimmyZ github.com/MoarVM/MoarVM/blob/mast....h#L25-L34 # two MVM_VECTOR_ELEMS definded here 12:43
defined
MasterDuke interesting, i'll have to try removing those two flags, though i do only have 32gb 12:47
find anything interesting in the profile?
13:04 moon-child left 13:12 reportable6 joined
jnthnwrthngtn Was looking to see if dispatchers show up high in profiling 13:12
13:12 sourceable6 joined, bloatable6 joined
jnthnwrthngtn The answer to that seems to be "no" 13:12
It's a bit hard to spot them, but I know what files they are in, and looking for things in those files shows that almost no time is spent there.
13:12 moon-child joined
jnthnwrthngtn And it's a very similar story for the Raku ones; they're all in BOOTSTRAP, very little time is spent in there 13:12
13:24 jnthnwrthngtn left, rypervenche left, Util_ left, cognominal_ left, bloatable6 left, sourceable6 left, reportable6 left, JimmyZ left, TempIRCLogger left, squashable6 left, nativecallable6 left, greppable6 left, quotable6 left, bisectable6 left, benchable6 left, MasterDuke left, statisfiable6 left, committable6 left, evalable6 left, coverable6 left, notable6 left, unicodable6 left, shareable6 left, tellable6 left, linkable6 left, frost left, camelia left, moon-child left, japhb left, Nicholas left, JRaspass left, lizmat left, bartolin_ left, raydiak left, Altai-man left, dogbert11 left, kjp left, jdv left, vrurg left, leont left, nine left, Voldenet left, gfldex left, rba left, [Coke] left, AlexDaniel left, leedo left, timo left, discord-raku-bot left, nebuchadnezzar left, Geth left, harrow left, tbrowder left, psydroid left, ugexe left, samcv left 13:28 moon-child joined, bloatable6 joined, sourceable6 joined, reportable6 joined, JimmyZ joined, TempIRCLogger joined, squashable6 joined, nativecallable6 joined, greppable6 joined, quotable6 joined, bisectable6 joined, benchable6 joined, MasterDuke joined, statisfiable6 joined, committable6 joined, evalable6 joined, coverable6 joined, notable6 joined, unicodable6 joined, shareable6 joined, tellable6 joined, linkable6 joined, frost joined, japhb joined, camelia joined, Nicholas joined, JRaspass joined, bartolin_ joined, raydiak joined, Altai-man joined, dogbert11 joined, leont joined, lizmat joined, kjp joined, jnthnwrthngtn joined, timo joined, jdv joined, nine joined, discord-raku-bot joined, psydroid joined, AlexDaniel joined, nebuchadnezzar joined, vrurg joined, ugexe joined, gfldex joined, rba joined, Voldenet joined, [Coke] joined, leedo joined, Geth joined, tbrowder joined, harrow joined, rypervenche joined, Util_ joined, cognominal_ joined, samcv joined
jnthnwrthngtn On the upside, I spotted some LHF places to reduce boxing allocations 13:32
And that's 7 million or so GC allocations less when compiling CORE.setting 13:33
m: say 1819 - 1721
camelia 98
jnthnwrthngtn Nearly 100 less GC runs. Too bad most GC runs are fast so I don't see much wallcock speedup :)
nine Shall I introduce some O(n³) to the GC? I'm sure I can find something... 13:34
MasterDuke that's a misspelling i've never seen before... 13:36
jnthnwrthngtn I had to read what I wrote like 4 times to see it :P 13:37
MasterDuke i wonder if ram prices have come down any 13:41
huh, i thought i tried using list_s in sorted_keys and something broke. maybe it was some other sorting routine 13:44
13:45 JimmyZ left
MasterDuke and nope, the same 32gb i bought for $160 almost exactly two years ago is now $200 13:46
nine Sell! Sell! Sell! 13:47
jnthnwrthngtn Curiously, among the new-disp changes, I've made stage mbc be 0.88s rather than 1.4s in master :) 13:49
Alas, that's the only state that's better 13:50
*stage
44.4s parse on new-disp, 37.0s on master
9.37s mast on new-disp, 7.93s on master 13:51
13:52 Guest7810 joined
MasterDuke nine: have you looked at github.com/rakudo/rakudo/pull/4501 ? 13:53
Altai-man that's probably very obvious, but I recall e.g. inlining / spesh not working at its full power, is it already back?
nine MasterDuke: the patch looks good. But as the discussion shows it's hard to find the right layer for putting in concurrency safeguards. So I'm a bit reluctant to give judgement... 13:55
jnthnwrthngtn Altai-man: In theory, for NQP code - which is what dominates - it's back. 13:56
Altai-man: For Raku code not, but that isn't what's running at compile time of the setting 13:57
OK, this is weird, CORE.setting profiling the compile on master gets killed
as OOM
And really does use much more than on new-disp 13:58
That's a bit odd. I can't guess why.
MasterDuke yeah. i feel a little bit better because i'd started to do something similar to the latest commit, but then switched to just the clone->modify->replace when that fixed the MVM_oops'es and was a smaller change
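[Editor's sketch] The clone->modify->replace approach mentioned above can be illustrated roughly as follows, in Python rather than NQP. The class and method names are hypothetical, and the writer lock is an added assumption rather than something the PR is confirmed to do. The idea is only that readers always see a complete table, never one that is halfway through an update.

```python
import threading


class ConcretizationTable:
    """Hypothetical cache illustrating clone -> modify -> replace."""

    def __init__(self):
        self._table = {}                      # never mutated once published
        self._write_lock = threading.Lock()   # assumption: serialise writers

    def lookup(self, key):
        # Readers do a single reference read, so they need no lock.
        return self._table.get(key)

    def add(self, key, value):
        with self._write_lock:
            updated = dict(self._table)   # clone
            updated[key] = value          # modify the private copy
            self._table = updated         # replace in one assignment


def fill(table, start):
    for i in range(start, start + 200):
        table.add(i, i * 2)


table = ConcretizationTable()
threads = [threading.Thread(target=fill, args=(table, n * 200)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the lock, two writers could each clone the same old table and one update would be lost, but a reader would still never observe a partially modified table.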
jnthnwrthngtn So I can't easily get that kind of profile. I can get a callgrind of master for comparison purposes, which will nicely run while I have a short meeting :) 13:59
MasterDuke jnthnwrthngtn: i'm running profile on master now, but had to create a 64gb swapfile, which is almost completely filled (and it's been running for 7 min now) 14:00
i was just recently reading about how good zram is on linux nowadays, i may give that a try next instead of the swapfile 14:01
14:13 Guest7810 left 14:14 Guest6661 joined
jnthnwrthngtn Back 14:39
MasterDuke 2.7gb sql file and still writing
jnthnwrthngtn I think there's a significant inlining difference after all. On master I see 95 million calls to MVM_frame_invoke. On new-disp, 111 million to MVM_frame_dispatch. 14:40
nine That's about the same ratio as the times for stage parse 14:42
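[Editor's sketch] The ratios being compared here can be checked with a few lines of Python, using only the figures quoted in the discussion (stage times in seconds, call counts in millions). The variable names are illustrative.

```python
# Figures quoted in the discussion above.
parse_new_disp, parse_master = 44.4, 37.0   # stage parse, seconds
mast_new_disp, mast_master = 9.37, 7.93     # stage mast, seconds
calls_new_disp, calls_master = 111, 95      # MVM_frame_dispatch vs MVM_frame_invoke, millions

parse_ratio = parse_new_disp / parse_master
mast_ratio = mast_new_disp / mast_master
calls_ratio = calls_new_disp / calls_master

# Prints: parse: 1.20x  mast: 1.18x  calls: 1.17x
print(f"parse: {parse_ratio:.2f}x  mast: {mast_ratio:.2f}x  calls: {calls_ratio:.2f}x")
```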
14:46 frost left
jnthnwrthngtn Curiously, MVM_SPESH_INLINE_LOG=1 doesn't look unusual. 14:48
(It's doing the inlines I'd expect)
nine So, do we have other optimizations besides inlining that reduce the number of executed calls? 14:50
jnthnwrthngtn Don't think so 14:55
MasterDuke finally finished with a 3.1gb sql file 14:57
dogbert11 dumb question warning: should MVM_spesh_osr_poll_for_result be called even if MVM_SPESH_DISABLE=1 ? 15:01
nine dogbert11: MVM_spesh_osr_poll_for_result checks whether spesh (including osr) is enabled, so calling it is ok. After all the osrpoint ops will still be in the bytecode. 15:04
dogbert11 nine: thx, I became confused when I saw two spesh related functions among the top ten when doing a callgrind_annotate on a program where spesh was disabled 15:08
jnthnwrthngtn Comparing callgrind further: we make a lot less inlining attempts, but we also end up with a lot less calls to optimize_runbytecode, which is the successor to optimize_call, and so perhaps we're missing translating more dispatch programs than I think
MasterDuke doh. tried to read the sql in sqlite3 and got `Error: near line 180950: out of memory` 15:14
which is only 7 lines from the end of the file 15:15
jnthnwrthngtn ugh 15:24
I found one easy NYI that was causing quite a few missed dispatch program translations: when we were monomorphic in a given specialization but there was a polymorphic inline cache entry, we didn't translate the dispatch program 15:27
Geth MoarVM/new-disp: 0a3e14cc29 | (Jonathan Worthington)++ | src/spesh/disp.c
Handle missed monomorphism in specializations

We may have a polymorphic callsite according to the inline cache, but also have statistics indicating that, for the specialization we are producing, only one of the dispatch programs tends to be hit. In this case, translate that dispatch program.
15:31
jnthnwrthngtn Hm, a more general restructuring is warranted in this area, in fact. :) 15:41
nine dogbert11: I think you were right. That spesh issue seems to be new. It doesn't occur at the commit before my hllize dispatcher work. 15:52
Question now is: is it the hllize dispatcher that's wrong or does it just uncover some spesh oddity? 15:53
dogbert11 nine: what does your gut tell you? 15:58
16:04 releasable6 joined
nine Nine's debug rule #1: the bug is always in your own code 16:08
Even weirder: the original exception that happens is: Malformed UTF-8 near bytes 20 4c 8b at line 1 col 53 16:17
Geth MoarVM/new-disp: a4bbc1b5d6 | (Jonathan Worthington)++ | src/spesh/disp.c
Use inline cache entry to drive optimization

If we see that the inline cache entry indicates a callsite we cannot translate, there's no point doing further analysis. Similarly, if it is monomorphic then we don't need to consider the chosen dispatch program statistics; just translate it right off. (This also gets us better outcomes in some cases where we miss statistics for whatever reason.) We thus only look at the logged outcomes when we have a polymorphic inline cache entry and might be able to use the stats to turn it monomorphic.
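[Editor's sketch] The decision order described in this commit message might look roughly like the Python below. It is not the code in src/spesh/disp.c. The enum, the function name and the `dominance` threshold are all hypothetical, chosen only to show that the logged statistics are consulted last and only for polymorphic entries.

```python
from collections import Counter
from enum import Enum


class CacheKind(Enum):
    UNTRANSLATABLE = "untranslatable"
    MONOMORPHIC = "monomorphic"
    POLYMORPHIC = "polymorphic"


def choose_dispatch_program(kind, logged_hits, dominance=0.99):
    """Return the index of the dispatch program to translate, or None.

    kind        -- state of the inline cache entry at this callsite
    logged_hits -- indices of the programs hit while logging this specialization
    dominance   -- hypothetical threshold for "only one tends to be hit"
    """
    if kind is CacheKind.UNTRANSLATABLE:
        return None          # no point doing further analysis
    if kind is CacheKind.MONOMORPHIC:
        return 0             # translate right off, statistics not needed
    # Polymorphic entry: only now look at the logged outcomes.
    if not logged_hits:
        return None
    index, hits = Counter(logged_hits).most_common(1)[0]
    if hits / len(logged_hits) >= dominance:
        return index         # effectively monomorphic for this specialization
    return None
```

For example, `choose_dispatch_program(CacheKind.POLYMORPHIC, [1] * 100)` picks program 1, while an even split between two programs yields `None`.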
nine And the Malformed UTF-8 is the debug name of a class. Because the class_handle in the getattr is totally bogus 16:21
Since the speshed version of the broken frame doesn't have a getattr_o op anymore, I assume we deopted 16:25
16:25 dogbert11 left
nine Oh, the "object" getattr_o is trying to get the STable from is already an STable 16:27
Taking a bit of a leap here: maybe my hllization dispatcher needs to guard on concreteness as well?
Looks like it already does: nqp::dispatch('boot-syscall', 'dispatcher-guard-concreteness', $arg); 16:28
But, the lang_hllize dispatcher doesn't! 16:31
And adding the guard there as well seems to fix the test case
jnthnwrthngtn nine: If you use dispatcher-track-attr, it automatically adds guards on type and concreteness 16:34
To ensure that such can't be forgotten
nine jnthnwrthngtn: the missing guard is in lang_hllize, i.e. in MoarVM. The getattr thing is just a victim of a wrong hllize result. 16:36
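[Editor's sketch] As a general illustration of why a dispatcher guards on concreteness as well as type, here is a toy Python model. It does not reproduce lang_hllize or the actual failure discussed here. The `Value` class, the `resolve` rules and the cache are hypothetical, with `concrete=False` standing in for a type object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    type_name: str
    concrete: bool   # False models a type object, True an instance


def resolve(value):
    """Slow path: decide how to handle an argument (hypothetical rules)."""
    return "unbox-and-convert" if value.concrete else "pass-through"


class GuardedCache:
    def __init__(self, guard_concreteness):
        self.guard_concreteness = guard_concreteness
        self.entries = {}

    def _key(self, value):
        if self.guard_concreteness:
            return (value.type_name, value.concrete)
        return (value.type_name,)   # type guard only

    def dispatch(self, value):
        key = self._key(value)
        if key not in self.entries:
            self.entries[key] = resolve(value)   # record outcome under the guards
        return self.entries[key]


instance, type_object = Value("Int", True), Value("Int", False)

type_only = GuardedCache(guard_concreteness=False)
type_only.dispatch(instance)
stale = type_only.dispatch(type_object)   # reuses the plan recorded for the instance

both = GuardedCache(guard_concreteness=True)
both.dispatch(instance)
fresh = both.dispatch(type_object)        # misses the cache and resolves again

print(stale, fresh)   # unbox-and-convert pass-through
```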
jnthnwrthngtn ah 16:38
So with the above commits I certainly see an improvement 16:39
Although still not enough
nine The problem with taking such leaps (despite the success) is....I don't really know what to write in the commit message :D I actually don't know how the missing guard led to the STable getting where an object should be. 16:41
Geth MoarVM/new-disp: 4b5a965c95 | (Stefan Seifert)++ | src/disp/boot.c
Guard for concreteness of lang_hllize arg

Otherwise we may end up with an STable where we expect an MVMObject
16:44
nine In a pinch...write what you know
16:48 dogbert17 joined
nine What the?! I tested the settings build with MVM_SPESH_BLOCKING and MVM_SPESH_NODELAY and even with a smaller nursery before committing. But now the build fails with "Cannot call trait_mod:<is>; no signatures match" 16:48
And it's clearly the added guard. How can that be?
dogbert17 nine: FWIW, your fix removed one of the problems I had, the other one is still present 16:50
i.e. MVM_SPESH_NODELAY=1 ./rakudo-m -Ilib t/spec/S32-io/utf16.t tends to hang 16:51
Geth MoarVM/new-disp: ff8a2d8e7c | (Stefan Seifert)++ | src/disp/boot.c
Revert "Guard for concreteness of lang_hllize arg"

This reverts commit 4b5a965c95054e28ffb6c5fc1b7b1707c69b41e1.
16:55
dogbert17 oops 16:56
nine I think the change had a GC issue anyway
Nicholas "we'll fix it in post", er, on a rebase... 17:01
jnthnwrthngtn Hm, 5 million less frames created according to callgrind 17:14
Down from ~112 million to ~107 million. master still at 95 million. 17:15
Quite a lot more optimize_runbytecode calls, though 17:18
Nicholas dogbert17: ah yes, that was one that hangs for me like that too 17:19
17:43 dogbert17 left, dogbert17 joined 17:53 dogbert17 left 18:01 dogbert17 joined 18:02 reportable6 left 18:03 reportable6 joined
MasterDuke wow, i just did a profile-compile of core.c on new-disp and it's only 1gb (compared to master's 3.1gb) 18:57
timo wow, how is that 19:01
MasterDuke arg. but still `Error: near line 59273: out of memory`, only 5 lines from the end this time
timo ahahaha
sqlite says that?
MasterDuke yeah 19:02
if i do a .read of the file
timo you can possibly just toss out the garbage collector runs
they aren't quite as important as the rest
and you can also almost read them just from the sql
MasterDuke huh, i just did `grep -v 'INSERT INTO gcs' >new_prof` and sqlite3 didn't like .reading new_prof. lots of `Error: near line 2997: UNIQUE constraint failed: calls.id` 19:06
timo oh, hum
are they spread over multiple lines perhaps?
MasterDuke guess so
timo but that could be more a syntax error 19:07
MasterDuke btw, i wonder if it would be better to wrap each individual section in BEGIN/END, rather than the entire file
oh, and weird, they aren't spread over multiple lines 19:11
the only lines that don't have 'INSERT INTO' are the BEGIN/END and the 'CREATE TABLE' lines 19:12
timo oh did you make sure to restart the sqlite shell in between .read? 19:13
MasterDuke oh, but the deallocations table has a `FOREIGN KEY(gc_seq_num, gc_thread_id) REFERENCES gcs(sequence_num, thread_id)`
timo oh, ok, that's not helpful then 19:14
MasterDuke nothing references the deallocations table, so i can just exclude those too
timo right 19:15
MasterDuke no, same error...
doh, nm. that works 19:17
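[Editor's sketch] The `grep -v` filtering discussed above, extended to the deallocations table, could be written in Python along these lines. It assumes one statement per line, as observed in the discussion. The function name and the sample rows are made up for illustration and do not reflect the profiler's actual columns.

```python
def drop_tables(lines, tables=("gcs", "deallocations")):
    """Yield SQL dump lines, skipping INSERTs into the named tables.

    Assumes each statement sits on its own line.
    """
    prefixes = tuple(f"INSERT INTO {name}" for name in tables)
    for line in lines:
        if not line.lstrip().startswith(prefixes):
            yield line


# Made-up sample; the real dump has different columns.
sample = [
    "BEGIN;",
    "INSERT INTO gcs VALUES (1, 1);",
    "INSERT INTO calls VALUES (1, 10);",
    "INSERT INTO deallocations VALUES (1, 1, 5);",
    "END;",
]
kept = list(drop_tables(sample))
```

On a real file the same generator can be fed an open file handle and its output written to a new file, so the dump never has to be held in memory at once.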
Error: near line 59271: out of memory 19:20
well, it did populate a bunch of stuff 19:21
hm, not sure i trust what's there 19:24
i don't believe that github.com/rakudo/rakudo/blob/new-...t.pm6#L116 is the top by exclusive time 19:25
lizmat for core setting?
I would be surprised that it is even used in the core setting? 19:26
MasterDuke yeah, maybe those last three lines i'm missing are important 19:28
huh, sqlite doesn't appear to be filling my memory. it must be some internal thing? 19:34
timo probably using shared memory with a file on disk or perhaps in /tmp 19:37
and it forces it to be resident in memory maybe? 19:38
19:42 dogbert17 left, dogbert11 joined 19:46 Guest6661 left 19:48 TempIRCLogger left, TempIRCLogger joined
MasterDuke huh. i stuck a `END;BEGIN;` in the middle of the file and it finished reading it 19:52
but it still gives wacky results
timo :o 19:53
MasterDuke maybe we should have it write `END;BEGIN;` every 10k lines or so 19:58
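[Editor's sketch] The idea of writing `END;BEGIN;` every so many lines can be tried on an existing dump with a small Python filter such as the one below. It assumes one statement per line and a single `BEGIN;`/`END;` pair around the file. The function name and the tiny test dump are hypothetical. Whether this avoids the out-of-memory error in general is not established here, only that the rechunked script still loads.

```python
import sqlite3


def rechunk(lines, every=10_000):
    """Yield dump lines, closing and reopening the transaction periodically."""
    inside = False
    count = 0
    for line in lines:
        stripped = line.strip()
        yield line
        if stripped == "BEGIN;":
            inside, count = True, 0
        elif stripped == "END;":
            inside = False
        elif inside:
            count += 1
            if count % every == 0:
                yield "END;"     # commit what has been read so far
                yield "BEGIN;"   # and start a fresh transaction


# Tiny made-up dump standing in for the profiler output.
dump = ["BEGIN;", "CREATE TABLE t(n INT);"]
dump += [f"INSERT INTO t VALUES ({n});" for n in range(7)]
dump += ["END;"]

chunked = list(rechunk(dump, every=3))
db = sqlite3.connect(":memory:")
db.executescript("\n".join(chunked))
rows = db.execute("SELECT count(*) FROM t").fetchone()[0]
```

If the counter happens to hit the threshold on the last statement, the output ends with an empty `BEGIN;`/`END;` pair, which SQLite accepts.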
20:06 dogbert17 joined 20:09 dogbert11 left 20:10 dogbert17 left 20:13 dogbert17 joined
jnthnwrthngtn fwiw, I shoved the sql output from new-disp CORE.setting into Comma's profile viewer and the data looked sane, so I think the SQL itself is alright 20:49
MasterDuke huh 20:50
`select case when r.name = "" then "<anon>" else r.name end as name, r.file, r.line, sum(entries) as entries, sum(case when rec_depth = 0 then inclusive_time else 0 end) as inclusive_time, sum(exclusive_time) as exclusive_time from calls c, routines r where c.id = r.id group by c.id order by exclusive_time desc limit 30;` is what i was running
20:52 dogbert17 left
MasterDuke i'm not sure i've used comma's profile viewer, how do i do that? 20:54
timo i'm not sure if it lets you import an existing sql file or only lets you record stuff anew 20:57
oh lol
MasterDuke heh, maybe if i just copy it to '/tmp/comma-profiler.sql' 20:58
no 20:59
21:00 dogbert17 joined
timo probably needs exact timing 21:01
jnthnwrthngtn timo: No released version lets you do that, it's a new feature we're adding for the next release :) 21:08
Which I needed to do some testing of today anyway.
MasterDuke ah, nice
timo :+1: 21:09
jnthnwrthngtn Seems much of the opt shortcoming that remains is because we're ending up with missing type tuples and static frame info in the spesh stats for some calls 21:27
I'm really not sure why
timo anything related to OSR? which is what i saw, but that was in raku code, not nqp code inside rakudo 21:28
jnthnwrthngtn No, I see lots of non-OSR cases 21:29
Weird
afk for a bit
timo i've always wanted a little thermal printer, i should hook that up to spesh logs 21:30
(not the dumps, the actual logs that threads submit to the spesh thread)
22:39 dogbert11 joined 22:43 dogbert17 left
Geth MoarVM/new-disp: 725cad0bbc | (Jonathan Worthington)++ | 4 files
Reinstate building arg tuples from facts

This can give us some further opportunities for specialization linking and inlining.
23:13
MoarVM/new-disp: cd30dcbc6d | (Jonathan Worthington)++ | 3 files
Stub in callstack record for calls set up in C

For the case where we will pass them arguments. Today we steal the arguments buffer of the current frame, however that is going away, so we'll need another way to store the arguments being passed and keep them marked.
MoarVM/new-disp: 7b09353220 | (Jonathan Worthington)++ | src/core/interp.c
Use local for bytecode offset calculation

Rather than going through a level of indirection.
23:27
jnthnwrthngtn Still haven't got to the bottom of why it seems to miss so many things 23:28
Enough for today, though. 23:29