Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes. Set by lizmat on 24 May 2021. |
|||
00:02
reportable6 left
00:04
tellable6 joined,
squashable6 joined,
bisectable6 joined
01:03
statisfiable6 joined,
releasable6 joined,
reportable6 joined
01:04
sourceable6 joined
02:05
evalable6 joined
03:05
bloatable6 joined
04:51
MasterDuke left
05:20
frost joined
06:03
reportable6 left
07:03
evalable6 left,
notable6 left,
statisfiable6 left,
unicodable6 left,
releasable6 left,
quotable6 left,
greppable6 left,
bloatable6 left,
linkable6 left,
sourceable6 left,
committable6 left,
tellable6 left,
coverable6 left,
nativecallable6 left,
shareable6 left,
benchable6 left,
squashable6 left,
bisectable6 left,
linkable6 joined
07:04
tellable6 joined,
shareable6 joined,
reportable6 joined,
Kaiepi left,
unicodable6 joined,
notable6 joined,
coverable6 joined,
evalable6 joined
07:06
committable6 joined,
statisfiable6 joined
07:24
MasterDuke joined
08:04
benchable6 joined,
bisectable6 joined
08:05
quotable6 joined
08:41
patrickb joined
08:51
patrickb left
|
|||
Nicholas | good *, #moarvm | 08:53 | |
new-disp can't build the Rakudo setting with MVM_SPESH_NODELAY=1 | |||
without that (but with everything else) it can | |||
jnthnwrthngtn | Hm, I thought it could pre-vacation... | 09:02 | |
Though I may mis-remember | 09:03 | ||
09:03
greppable6 joined,
nativecallable6 joined
|
|||
Nicholas | I *thought* that it could too. But at times I've managed to get my checkouts all muddled | 09:03 | |
09:04
squashable6 joined
|
|||
MasterDuke | how multi-threading safe are the metamethods supposed to be in general? github.com/rakudo/rakudo/pull/4501 is a fix for parsing a grammar in multiple threads, which does seem like something that could reasonably happen | 09:20 | |
but what about github.com/rakudo/rakudo/blob/mast...nqp#L9-L24 (just as a random example)? | 09:21 | ||
or github.com/rakudo/rakudo/blob/mast...nCache.nqp | 09:24 | ||
jnthnwrthngtn | There's a general expection that they should be used from a single thread during construction, and effectively immutable after compose. Should a meta-object want to have mutable state post-compose then it should take care of threading concerns. | 09:28 | |
oops. general *expectation* | 09:29 | ||
So add_attribute clearly is construction time (pre-compose), so it's user error to call it in parallel. | 09:30 | ||
The concretization cache I've no idea about; it claims it's used only at compile time (which I guess would mean pre-compose too), but if it's causing bother when parsing grammars from multiple threads, that's clearly not quite the whole story. | |||
MasterDuke | ok, i think that makes sense to me | 09:31 | |
jnthnwrthngtn | I don't quite understand what it's achieving, tbh. | ||
MasterDuke | ConcretizationCache.nqp wasn't the problem, that was just another (somewhat) random example | ||
jnthnwrthngtn | ah, k | ||
*ok | 09:32 | ||
MasterDuke | the PR is a further fix for MVM_oops in github.com/Raku/roast/blob/master/...#L151-L168 | 09:34 | |
follow on to github.com/rakudo/rakudo/pull/4496 | |||
dogbert11 | as a foolow up to what Nicholas wrote, there's a bug which leads to compilations errors when MVM_SPESH_NODELAY is enabled | 09:49 | |
arghh, crap spelling :( | 09:50 | ||
===SORRY!=== Error while compiling /home/dogbert/repos/rakudo/t/spec/S04-statements/with.t | 09:52 | ||
is default on shaped Scalar not yet implemented. Sorry. | |||
the above is the error and it is 100 percent reproducible, just set nursery to e.g. 12k and use MVM_SPESH_NODELAY | |||
jnthnwrthngtn | Hm, this is curious. --profile-compile on CORE.c.setting seems to record stuff, but then fails to spit out the SQL file at the end (and doesn't mention that it's writing profiling data either) | 10:56 | |
MasterDuke | yeah, it's been doing that for a while | 11:05 | |
i successfully did a --profile-compile of CORE.c on 2020-12-12, but i don't know when after that it broke | 11:06 | ||
CORE.e succeeds | 11:08 | ||
47mb sql file | |||
12:02
reportable6 left
12:03
TempIRCLogger left,
TempIRCLogger joined
|
|||
jnthnwrthngtn | Seems it's writing the SQL if I remove --target and --output | 12:21 | |
Now I just have to hope it finishes without running out of memory... | |||
MasterDuke | how much do you have? | 12:25 | |
jnthnwrthngtn | 64GB. It finished fine in the end | 12:33 | |
12:33
jgaz joined
12:35
jgaz left
12:36
jgaz joined
|
|||
jnthnwrthngtn | Then had to tell Comma it could use more memory in order to get it loaded into the profiler UI there :) | 12:37 | |
12:37
jgaz left
12:41
JimmyZ joined
|
|||
JimmyZ | github.com/MoarVM/MoarVM/blob/mast....h#L25-L34 # two MVM_VECTOR_ELEMS definded here | 12:43 | |
defined | |||
MasterDuke | interesting, i'll have to try removing those two flags, though i do only have 32gb | 12:47 | |
find anything interesting in the profile? | |||
13:04
moon-child left
13:12
reportable6 joined
|
|||
jnthnwrthngtn | Was looking to see if dispatchers show up high in profiling | 13:12 | |
13:12
sourceable6 joined,
bloatable6 joined
|
|||
jnthnwrthngtn | The answer to that seems to be "no" | 13:12 | |
It's a bit hard to spot them, but I know what files they are in, and looking for things in those files shows that almost no time is spent there. | |||
13:12
moon-child joined
|
|||
jnthnwrthngtn | And it's a very similar story for the Raku ones; they're all in BOOTSTRAP, very little time is spent in there | 13:12 | |
13:24
jnthnwrthngtn left,
rypervenche left,
Util_ left,
cognominal_ left,
bloatable6 left,
sourceable6 left,
reportable6 left,
JimmyZ left,
TempIRCLogger left,
squashable6 left,
nativecallable6 left,
greppable6 left,
quotable6 left,
bisectable6 left,
benchable6 left,
MasterDuke left,
statisfiable6 left,
committable6 left,
evalable6 left,
coverable6 left,
notable6 left,
unicodable6 left,
shareable6 left,
tellable6 left,
linkable6 left,
frost left,
camelia left,
moon-child left,
japhb left,
Nicholas left,
JRaspass left,
lizmat left,
bartolin_ left,
raydiak left,
Altai-man left,
dogbert11 left,
kjp left,
jdv left,
vrurg left,
leont left,
nine left,
Voldenet left,
gfldex left,
rba left,
[Coke] left,
AlexDaniel left,
leedo left,
timo left,
discord-raku-bot left,
nebuchadnezzar left,
Geth left,
harrow left,
tbrowder left,
psydroid left,
ugexe left,
samcv left
13:28
moon-child joined,
bloatable6 joined,
sourceable6 joined,
reportable6 joined,
JimmyZ joined,
TempIRCLogger joined,
squashable6 joined,
nativecallable6 joined,
greppable6 joined,
quotable6 joined,
bisectable6 joined,
benchable6 joined,
MasterDuke joined,
statisfiable6 joined,
committable6 joined,
evalable6 joined,
coverable6 joined,
notable6 joined,
unicodable6 joined,
shareable6 joined,
tellable6 joined,
linkable6 joined,
frost joined,
japhb joined,
camelia joined,
Nicholas joined,
JRaspass joined,
bartolin_ joined,
raydiak joined,
Altai-man joined,
dogbert11 joined,
leont joined,
lizmat joined,
kjp joined,
jnthnwrthngtn joined,
timo joined,
jdv joined,
nine joined,
discord-raku-bot joined,
psydroid joined,
AlexDaniel joined,
nebuchadnezzar joined,
vrurg joined,
ugexe joined,
gfldex joined,
rba joined,
Voldenet joined,
[Coke] joined,
leedo joined,
Geth joined,
tbrowder joined,
harrow joined,
rypervenche joined,
Util_ joined,
cognominal_ joined,
samcv joined
|
|||
jnthnwrthngtn | On the upside, I spotted some LHF places to reduce boxing allocations | 13:32 | |
And that's 7 million or so GC allocations less when compiling CORE.setting | 13:33 | ||
m: say 1819 - 1721 | |||
camelia | 98 | ||
jnthnwrthngtn | Nearly 100 less GC runs. Too bad most GC runs are fast so I don't see much wallcock speedup :) | ||
nine | Shall I introduce some O(n³) to the GC? I'm sure I can find something... | 13:34 | |
MasterDuke | that's a misspelling i've never seen before... | 13:36 | |
jnthnwrthngtn | I had to read what I wrote like 4 times to see it :P | 13:37 | |
MasterDuke | i wonder if ram prices have come down any | 13:41 | |
huh, i thought i tried using list_s in sorted_keys and something broke. maybe it was some other sorting routine | 13:44 | ||
13:45
JimmyZ left
|
|||
MasterDuke | and nope, the same 32gb i bought for $160 almost exactly two years ago is now $200 | 13:46 | |
nine | Sell! Sell! Sell! | 13:47 | |
jnthnwrthngtn | Curiously, among the new-disp changes, I've made stage mbc be 0.88s rather than 1.4s in master :) | 13:49 | |
Alas, that's the only state that's better | 13:50 | ||
*stage | |||
44.4s parse on new-disp, 37.0s on master | |||
9.37s mast on new-disp, 7.93s on master | 13:51 | ||
13:52
Guest7810 joined
|
|||
MasterDuke | nine: have you looked at github.com/rakudo/rakudo/pull/4501 ? | 13:53 | |
Altai-man | that's probably very obvious, but I recall e.g. inlining / spesh not working at its full power, is it already back? | ||
nine | MasterDuke: the patch looks good. But as the discussion shows it's hard to find the right layer for putting in concurrency safeguards. So I'm a bit reluctant to give judgement... | 13:55 | |
jnthnwrthngtn | Altai-man: In theory, for NQP code - which is what dominates - it's back. | 13:56 | |
Altai-man: For Raku code not, but that isn't what's running at compile time of the setting | 13:57 | ||
OK, this is weird, CORE.setting profiling the compile on master gets killed | |||
as OOM | |||
And really does use much more than on new-disp | 13:58 | ||
That's a bit odd. I can't guess why. | |||
MasterDuke | yeah. i feel a little bit better because i'd started to do something similar to the latest commit, but then switched to just the clone->modify->replace when that fixed the MVM_oops'es and was a smaller change | ||
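The clone->modify->replace approach mentioned here can be sketched in a few lines. This is only an illustration of the general pattern, not the code from the PR: the class and method names are hypothetical, and the writer lock is an assumption added so that concurrent updates are not lost.

```python
import threading


class CopyOnWriteCache:
    """Hypothetical cache illustrating clone -> modify -> replace."""

    def __init__(self):
        self._table = {}  # never mutated once published
        self._write_lock = threading.Lock()

    def lookup(self, key):
        # Readers use whichever table is currently published; it is never
        # changed in place, so they cannot observe a half-finished update.
        return self._table.get(key)

    def add(self, key, value):
        with self._write_lock:         # serialize writers so no update is lost
            clone = dict(self._table)  # clone
            clone[key] = value         # modify the private copy
            self._table = clone        # replace: publish with one reference store


def fill(cache, start, count):
    for i in range(start, start + count):
        cache.add(i, i * i)


cache = CopyOnWriteCache()
threads = [threading.Thread(target=fill, args=(cache, n * 100, 100)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(cache._table))
```

The trade-off is that every write copies the whole table, which is usually acceptable when writes are rare and reads dominate.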
jnthnwrthngtn | So I can't easily get that kind of profile. I can get a callgrind of master for comparison purposes, which will nicely run while I have a short meeting :) | 13:59 | |
MasterDuke | jnthnwrthngtn: i'm running profile on master now, but had to create a 64gb swapfile, which is almost completely filled (and it's been running for 7 min now) | 14:00 | |
i was just recently reading about how good zram is on linux nowadays, i may give that a try next instead of the swapfile | 14:01 | ||
14:13
Guest7810 left
14:14
Guest6661 joined
|
|||
jnthnwrthngtn | Back | 14:39 | |
MasterDuke | 2.7gb sql file and still writing | ||
jnthnwrthngtn | I think there's a significant inlining difference after all. On master I see 95 million calls to MVM_frame_invoke. On new-disp, 111 million to MVM_frame_dispatch. | 14:40 | |
nine | That's about the same ratio as the times for stage parse | 14:42 | |
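The comparison can be checked with the figures quoted earlier in the log (95 million vs 111 million calls, 37.0s vs 44.4s for stage parse). A quick calculation, using only those numbers:

```python
# Figures quoted in the discussion (calls in millions, times in seconds).
master_calls, new_disp_calls = 95, 111
master_parse, new_disp_parse = 37.0, 44.4

call_ratio = new_disp_calls / master_calls
parse_ratio = new_disp_parse / master_parse

print(f"calls: {call_ratio:.2f}x, parse: {parse_ratio:.2f}x")
# prints "calls: 1.17x, parse: 1.20x"
```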
14:46
frost left
|
|||
jnthnwrthngtn | Curiously, MVM_SPESH_INLINE_LOG=1 doesn't look unusual. | 14:48 | |
(It's doing the inlines I'd expect) | |||
nine | So, do we have other optimizations besides inlining that reduce the number of executed calls? | 14:50 | |
jnthnwrthngtn | Don't think so | 14:55 | |
MasterDuke | finally finished with a 3.1gb sql file | 14:57 | |
dogbert11 | dumb question warning: should MVM_spesh_osr_poll_for_result be called even if MVM_SPESH_DISABLE=1 ? | 15:01 | |
nine | dogbert11: MVM_spesh_osr_poll_for_result checks whether spesh (including osr) is enabled, so calling it is ok. After all the osrpoint ops will still be in the bytecode. | 15:04 | |
dogbert11 | nine: thx, I became confused when I saw two spesh related functions among the top ten when doing a callgrind_annotate on a program where spesh was disabled | 15:08 | |
jnthnwrthngtn | Comparing callgrind further: we make a lot less inlining attempts, but we also end up with a lot less calls to optimize_runbytecode, which is the successor to optimize_call, and so perhaps we're missing translating more dispatch programs than I think | ||
MasterDuke | doh. tried to read the sql in sqlite3 and got `Error: near line 180950: out of memory` | 15:14 | |
which is only 7 lines from the end of the file | 15:15 | ||
jnthnwrthngtn | ugh | 15:24 | |
I found one easy NYI that was causing quite a few missed dispatch program translations: when we were monomorphic in a given specialization but there was a polymorphic inline cache entry, we didn't translate the dispatch program | 15:27 | ||
Geth | MoarVM/new-disp: 0a3e14cc29 | (Jonathan Worthington)++ | src/spesh/disp.c Handle missed monomorphism in specializations We may have a polymorphic callsite according to the inline cache, but also have statistics indicating that, for the specialization we are producing, only one of the dispatch programs tends to be hit. In this case, translate that dispatch program. |
15:31 | |
jnthnwrthngtn | Hm, a more general restructuring is warranted in this area, in fact. :) | 15:41 | |
nine | dogbert11: I think you were right. That spesh issue seems to be new. It doesn't occur at the commit before my hllize dispatcher work. | 15:52 | |
Question now is: is it the hllize dispatcher that's wrong or does it just uncover some spesh oddity? | 15:53 | ||
dogbert11 | nine: what does your gut tell you? | 15:58 | |
16:04
releasable6 joined
|
|||
nine | Nine's debug rule #1: the bug is always in your own code | 16:08 | |
Even weirder: the original exception that happens is: Malformed UTF-8 near bytes 20 4c 8b at line 1 col 53 | 16:17 | ||
Geth | MoarVM/new-disp: a4bbc1b5d6 | (Jonathan Worthington)++ | src/spesh/disp.c Use inline cache entry to drive optimization If we see that the inline cache entry indicates a callsite we cannot translate, there's no point doing further analysis. Similarly, if it is monomorphic then we don't need to consider the chosen dispatch program statistics; just translate it right off. (This also gets us better outcomes in some cases where we miss statistics for whatever reason.) We thus only look at the logged outcomes when we have a polymorphic inline cache entry and might be able to use the stats to turn it monomorphic. |
||
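The decision order described in this commit message could look roughly like the sketch below. It is written in Python purely for illustration and is not the logic in src/spesh/disp.c: the enum, the function name, and the `dominance` threshold are all hypothetical stand-ins for "only one of the dispatch programs tends to be hit".

```python
from collections import Counter
from enum import Enum, auto


class CacheKind(Enum):
    UNTRANSLATABLE = auto()
    MONOMORPHIC = auto()
    POLYMORPHIC = auto()


def choose_dispatch_program(kind, logged_hits, dominance=0.99):
    """Return the index of the dispatch program to translate, or None.

    logged_hits is a list of program indices seen for this specialization;
    dominance is an assumed cut-off, not a value taken from MoarVM.
    """
    if kind is CacheKind.UNTRANSLATABLE:
        return None  # no point doing further analysis
    if kind is CacheKind.MONOMORPHIC:
        return 0     # only one program exists; translate it right off
    # Polymorphic: use the statistics to see if one program dominates.
    counts = Counter(logged_hits)
    total = sum(counts.values())
    if total == 0:
        return None  # no statistics, nothing to base a choice on
    index, hits = counts.most_common(1)[0]
    return index if hits / total >= dominance else None


print(choose_dispatch_program(CacheKind.POLYMORPHIC, [1] * 100))
```

Checking the inline cache entry first means the statistics are only consulted in the one case where they can change the outcome.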
nine | And the Malformed UTF-8 is the debug name of a class. Because the class_handle in the getattr is totally bogus | 16:21 | |
Since the speshed version of the broken frame doesn't have a getattr_o op anymore, I assume we deopted | 16:25 | ||
16:25
dogbert11 left
|
|||
nine | Oh, the "object" getattr_o is trying to get the STable from is already an STable | 16:27 | |
Taking a bit of a leap here: maybe my hllization dispatcher needs to guard on concreteness as well? | |||
Looks like it already does: nqp::dispatch('boot-syscall', 'dispatcher-guard-concreteness', $arg); | 16:28 | ||
But, the lang_hllize dispatcher doesn't! | 16:31 | ||
And adding the guard there as well seems to fix the test case | |||
jnthnwrthngtn | nine: If you use dispatcher-track-attr, it automatically adds guards on type and concreteness | 16:34 | |
To ensure that such can't be forgotten | |||
nine | jnthnwrthngtn: the missing guard is in lang_hllize, i.e. in MoarVM. The getattr thing is just a victim of a wrong hllize result. | 16:36 | |
jnthnwrthngtn | ah | 16:38 | |
So with the above commits I certainly see an improvement | 16:39 | ||
Although still not enough | |||
nine | The problem with taking such leaps (despite the success) is....I don't really know what to write in the commit message :D I actually don't know how the missing guard led to the STable getting where an object should be. | 16:41 | |
Geth | MoarVM/new-disp: 4b5a965c95 | (Stefan Seifert)++ | src/disp/boot.c Guard for concreteness of lang_hllize arg Otherwise we may end up with an STable where we expect an MVMObject |
16:44 | |
nine | In a pinch...write what you know | ||
16:48
dogbert17 joined
|
|||
nine | What the?! I tested the settings build with MVM_SPESH_BLOCKING and MVM_SPESH_NODELAY and even with a smaller nursery before committing. But now the build fails with "Cannot call trait_mod:<is>; no signatures match" | 16:48 | |
And it's clearly the added guard. How can that be? | |||
dogbert17 | nine: FWIW, your fix removed one of the problems I had, the other one is still present | 16:50 | |
i.e. MVM_SPESH_NODELAY=1 ./rakudo-m -Ilib t/spec/S32-io/utf16.t tends to hang | 16:51 | ||
Geth | MoarVM/new-disp: ff8a2d8e7c | (Stefan Seifert)++ | src/disp/boot.c Revert "Guard for concreteness of lang_hllize arg" This reverts commit 4b5a965c95054e28ffb6c5fc1b7b1707c69b41e1. |
16:55 | |
dogbert17 | oops | 16:56 | |
nine | I think the change had a GC issue anyway | ||
Nicholas | "we'll fix it in post", er, on a rebase... | 17:01 | |
jnthnwrthngtn | Hm, 5 million less frames created according to callgrind | 17:14 | |
Down from ~112 million to ~107 million. master still at 95 million. | 17:15 | ||
Quite a lot more optimize_runbytecode calls, though | 17:18 | ||
Nicholas | dogbert17: ah yes, that was one that hangs for me like that too | 17:19 | |
17:43
dogbert17 left,
dogbert17 joined
17:53
dogbert17 left
18:01
dogbert17 joined
18:02
reportable6 left
18:03
reportable6 joined
|
|||
MasterDuke | wow, i just did a profile-compile of core.c on new-disp and it's only 1gb (compared to master's 3.1gb) | 18:57 | |
timo | wow, how is that | 19:01 | |
MasterDuke | arg. but still `Error: near line 59273: out of memory`, only 5 lines from the end this time | ||
timo | ahahaha | ||
sqlite says that? | |||
MasterDuke | yeah | 19:02 | |
if i do a .read of the file | |||
timo | you can possibly just toss out the garbage collector runs | ||
they aren't quite as important as the rest | |||
and you can also almost read them just from the sql | |||
MasterDuke | huh, i just did `grep -v 'INSERT INTO gcs' >new_prof` and sqlite3 didn't like .reading new_prof. lots of `Error: near line 2997: UNIQUE constraint failed: calls.id` | 19:06 | |
timo | oh, hum | ||
are they spread over multiple lines perhaps? | |||
MasterDuke | guess so | ||
timo | but that could be more a syntax error | 19:07 | |
MasterDuke | btw, i wonder if it would be better to wrap each individual section in BEGIN/END, rather than the entire file | ||
oh, and weird, they aren't spread over multiple lines | 19:11 | ||
the only lines that don't have 'INSERT INTO' are the BEGIN/END and the 'CREATE TABLE' lines | 19:12 | ||
timo | oh did you make sure to restart the sqlite shell in between .read? | 19:13 | |
MasterDuke | oh, but the deallocations table has a `FOREIGN KEY(gc_seq_num, gc_thread_id) REFERENCES gcs(sequence_num, thread_id)` | ||
timo | oh, ok, that's not helpful then | 19:14 | |
MasterDuke | nothing references the deallocations table, so i can just exclude those too | ||
timo | right | 19:15 | |
MasterDuke | no, same error... | ||
doh, nm. that works | 19:17 | ||
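The filtering step discussed above (dropping the `gcs` rows and the `deallocations` rows that reference them) can also be done with a small script instead of grep. This is a sketch that assumes, as noted in the discussion, that each INSERT sits on a single line; the sample rows are made up and do not reflect the real profile schema.

```python
import re

# Matches one-line INSERT statements for the two tables being dropped.
SKIP = re.compile(r"^INSERT INTO (gcs|deallocations)\b")


def drop_gc_rows(lines):
    """Keep every line except INSERTs into gcs and deallocations."""
    return [line for line in lines if not SKIP.match(line)]


# Made-up sample lines; a real profile has its own schema and values.
sample = [
    "BEGIN;",
    "INSERT INTO calls VALUES (1, 10);",
    "INSERT INTO gcs VALUES (1, 1);",
    "INSERT INTO deallocations VALUES (1, 1, 5);",
    "END;",
]
kept = drop_gc_rows(sample)
print(len(kept))
```

The `CREATE TABLE` lines are left alone, so both tables still exist (empty) and the foreign key declaration stays valid.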
Error: near line 59271: out of memory | 19:20 | ||
well, it did populate a bunch of stuff | 19:21 | ||
hm, not sure i trust what's there | 19:24 | ||
i don't believe that github.com/rakudo/rakudo/blob/new-...t.pm6#L116 is the top by exclusive time | 19:25 | ||
lizmat | for core setting? | ||
I would be surprised that it is even used in the core setting? | 19:26 | ||
MasterDuke | yeah, maybe those last three lines i'm missing are important | 19:28 | |
huh, sqlite doesn't appear to be filling my memory. it must be some internal thing? | 19:34 | ||
timo | probably using shared memory with a file on disk or perhaps in /tmp | 19:37 | |
and it forces it to be resident in memory maybe? | 19:38 | ||
19:42
dogbert17 left,
dogbert11 joined
19:46
Guest6661 left
19:48
TempIRCLogger left,
TempIRCLogger joined
|
|||
MasterDuke | huh. i stuck a `END;BEGIN;` in the middle of the file and it finished reading it | 19:52 | |
but it still gives wacky results | |||
timo | :o | 19:53 | |
MasterDuke | maybe we should have it write `END;BEGIN;` every 10k lines or so | 19:58 | |
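The idea of closing and reopening the transaction every so many statements can be sketched with Python's `sqlite3` module. This is an illustration of the batching idea only, not the profiler's writer: `load_in_batches` and the toy `calls` table are hypothetical, and the connection is assumed to be opened with `isolation_level=None` so that transactions are controlled by hand. In SQLite, `END` is an alias for `COMMIT`.

```python
import sqlite3


def load_in_batches(conn, statements, batch_size=10_000):
    """Execute statements, committing after every batch_size of them.

    Returns the number of transactions used.
    """
    cur = conn.cursor()
    cur.execute("BEGIN")
    pending = 0
    commits = 0
    for stmt in statements:
        cur.execute(stmt)
        pending += 1
        if pending == batch_size:
            cur.execute("COMMIT")  # same effect as END;
            cur.execute("BEGIN")
            commits += 1
            pending = 0
    cur.execute("COMMIT")          # close the final, possibly short, batch
    return commits + 1


# Toy data: the table layout here is made up for the example.
conn = sqlite3.connect(":memory:", isolation_level=None)
stmts = ["CREATE TABLE calls(id INTEGER PRIMARY KEY, entries INT)"]
stmts += [f"INSERT INTO calls VALUES ({i}, {i * 2})" for i in range(25)]
transactions = load_in_batches(conn, stmts, batch_size=10)
rows = conn.execute("SELECT COUNT(*) FROM calls").fetchone()[0]
print(transactions, rows)  # prints "3 25"
```

Smaller transactions bound how much uncommitted state SQLite has to hold at once, at the cost of losing all-or-nothing loading of the whole file.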
20:06
dogbert17 joined
20:09
dogbert11 left
20:10
dogbert17 left
20:13
dogbert17 joined
|
|||
jnthnwrthngtn | fwiw, I shoved the sql output from new-disp CORE.setting into Comma's profile viewer and the data looked sane, so I think the SQL itself is alright | 20:49 | |
MasterDuke | huh | 20:50 | |
`select case when r.name = "" then "<anon>" else r.name end as name, r.file, r.line, sum(entries) as entries, sum(case when rec_depth = 0 then inclusive_time else 0 end) as inclusive_time, sum(exclusive_time) as exclusive_time from calls c, routines r where c.id = r.id group by c.id order by exclusive_time desc limit 30;` is what i was running | |||
20:52
dogbert17 left
|
|||
MasterDuke | i'm not sure i've used comma's profile viewer, how do i do that? | 20:54 | |
timo | i'm not sure if it lets you import an existing sql file or only lets you record stuff anew | 20:57 | |
oh lol | |||
MasterDuke | heh, maybe if i just copy it to '/tmp/comma-profiler.sql' | 20:58 | |
no | 20:59 | ||
21:00
dogbert17 joined
|
|||
timo | probably needs exact timing | 21:01 | |
jnthnwrthngtn | timo: No released version lets you do that, it's a new feature we're adding for the next release :) | 21:08 | |
Which I needed to do some testing of today anyway. | |||
MasterDuke | ah, nice | ||
timo | :+1: | 21:09 | |
jnthnwrthngtn | Seems much of the opt shortcoming that remains is because we're ending up with missing type tuples and static frame info in the spesh stats for some calls | 21:27 | |
I'm really not sure why | |||
timo | anything related to OSR? which is what i saw, but that was in raku code, not nqp code inside rakudo | 21:28 | |
jnthnwrthngtn | No, I see lots of non-OSR cases | 21:29 | |
Weird | |||
afk for a bit | |||
timo | i've always wanted a little thermal printer, i should hook that up to spesh logs | 21:30 | |
(not the dumps, the actual logs that threads submit to the spesh thread) | |||
22:39
dogbert11 joined
22:43
dogbert17 left
|
|||
Geth | MoarVM/new-disp: 725cad0bbc | (Jonathan Worthington)++ | 4 files Reinstate building arg tuples from facts This can give us some further opportunities for specialization linking and inlining. |
23:13 | |
MoarVM/new-disp: cd30dcbc6d | (Jonathan Worthington)++ | 3 files Stub in callstack record for calls set up in C For the case where we will pass them arguments. Today we steal the arguments buffer of the current frame, however that is going away, so we'll need another way to store the arguments being passed and keep them marked. |
|||
MoarVM/new-disp: 7b09353220 | (Jonathan Worthington)++ | src/core/interp.c Use local for bytecode offset calculation Rather than going through a level of indirection. |
23:27 | ||
jnthnwrthngtn | Still haven't got to the bottom of why it seems to miss so many things | 23:28 | |
Enough for today, though. | 23:29 |