Geth MoarVM/dedicated_nursery_memory_area: 38de7b77fc | (Timo Paulssen)++ | 11 files
attempt to get all nursery areas into one memory area
11:55
MoarVM/dedicated_nursery_memory_area: e515c1802d | (Timo Paulssen)++ | 9 files
fix up nursery area feature
timo this is the implementation of what i mentioned the other day
11:59 sena_kun joined
ab5tract timo: any noticeable performance impact? 13:56
ab5tract wonders what a good stress test of this might look like
timo yes, check #raku-dev 13:57
difficult to say what a workload would look like that really tickles the benefits for this 13:58
or do you mean to find out if it does stuff wrong?
i'm thinking performance improvements might be more strongly pronounced in situations where the whole system is kind of low on memory? so that older objects would likely be out of cache or even in swap by the time a nursery collection happens 14:00
in general, this change would make accesses to older objects during a nursery collection rarer 14:01
if these older objects were already accessed recently, before the collection started, they wouldn't get a real benefit from this, besides maybe not having to get them into L1 cache where they are much more unlikely to be
lizmat timo: have you tried timing the spectest with it ? 14:02
timo i have not. any volunteers? :)
lizmat shouldn't it be as simple as "make spectest" for you?
timo ideally, the system wouldn't be busy otherwise 14:03
lizmat grab a cup of tea while it's doing that? 14:04
timo how fast are your spec test runs :o
lizmat I mean, a. see if your work triggers any issuesn, and b. if there is a sigificant difference
Files=1359, Tests=119744, 212 wallclock secs (17.50 usr 3.61 sys + 1218.51 cusr 73.82 csys = 1313.44 CPU)
was the last one I did today, for the substr-rw fix 14:05
timo huh ok that's not so bad
lizmat so less than 4 minutes (well, at least for me on a M1)
timo just two runs with my new stuff: 150.42 +- 40.39 seconds time elapsed ( +- 26.85% ) 14:20
185 wallclock secs (12.95 usr 3.96 sys + 1727.56 cusr 307.15 csys = 2051.62 CPU)
104 wallclock secs (12.91 usr 3.83 sys + 1769.88 cusr 313.71 csys = 2100.33 CPU) 14:21
lizmat end neither of these are without your patches I assume?
*and
timo correct. same thing both times 14:22
lizmat yeah, than the first one did a lot of precomping that the second one didn't need to do
*then
timo the first one was the very fast one
lizmat it was ? 14:23
timo yup
could be related to thermals though
lizmat yeah, probably then
timo my beast of a cooler doesn't seem to get the temperature below 85 degC
i should go from 20 test jobs to maybe 14? 10?
lizmat possibly... it's been a while since I tested on Intel hardware 14:24
timo this is ƄMD :P 14:25
lizmat the M1 throttles after a while as well, but apparently way later than Intel (or AMD)
timo the feature turned off: 112.386 +- 0.203 seconds time elapsed 14:26
for this one, both took the same amount of time
lizmat yeah, that's within noise
timo 2,112,932.19 msec task-clock with my stuff, 2,151,957.20 msec task-clock without my stuff 14:28
i believe "task-clock" counts the time for which the task was on CPU, so excludes things like waiting for external stuff, or sleep, etc 14:29
most spec tests might fall under the "worst case" for this optimization tbh 14:31
lizmat I know, and even then it appears to make a (small) difference 14:34
so that's good :-)
timo with my stuff: 120 wallclock secs (11.44 usr 3.20 sys + 1290.16 cusr 235.46 csys = 1540.26 CPU) and 120 wallclock secs (11.11 usr 3.43 sys + 1297.14 cusr 234.72 csys = 1546.40 CPU)
this is with TEST_JOBS=12 14:35
125.633 +- 0.170 seconds time elapsed ( +- 0.14% )
this also includes the startup time with the git clone and fudgeall i think
without my stuff: 121 wallclock secs (11.50 usr 3.31 sys + 1320.40 cusr 238.02 csys = 1573.23 CPU) and 121 wallclock secs (11.44 usr 3.39 sys + 1318.10 cusr 235.63 csys = 1568.56 CPU) 14:41
127.33503 +- 0.00425 seconds time elapsed
very small improvement, but doesn't seem to be very noisy
csys being 1573s and 1568s without my stuff and 1540s and 1546s looks pretty decent 14:42
but the slower of the faster ones is just 2s better than the faster of the slower ones 14:43
lizmat still... I wouldn't mind having my spectests run a bit faster, as I tend to run a lot of them
timo that amount tiny :( 14:45
Geth MoarVM/dedicated_nursery_memory_area: d455611795 | (Timo Paulssen)++ | 3 files
remove remaining debugspam
14:46
timo and i have no idea how it would impact performance on the M1 14:47
it has a famously big third level cache or something like that?
or just very fast main memory access speed
lizmat could well be... I'm just happy it runs spectests about 2 as fast as on my 2019 Intel MBP
*2x 14:48
ab5tract I think the Raku-only script from this work could be an interesting benchmark here: 5ab5traction5.bearblog.dev/an-init...raku-code/ 14:49
timo you could try compiling my branch and see if it does anything. you can compare with MVM_NO_NURSERY_RANGE=1 and without the env var entirely
ab5tract I've got to run for now but will post some timings later
> could well be... I'm just happy it runs spectests about 2 as fast as on my 2019 Intel MBP 14:50
wait, it runs the spectests 2x as fast??
timo that's lizmat's M1 vs MBP 14:51
ab5tract ahh
lizmat 2.4 GHz 8-core i9 vs M1
ab5tract had my heart jumping there for a second
14:53 lizmat_ joined
ab5tract do we have any tooling around bumping moar/nqp versions? I've been doing it manually when hacking on the under-layers and I've found it to be a bit annoying 14:56
to keep track of
lizmat_ we used to have tooling to track which commits would be part of a bump (Zoffixx++)
14:57 lizmat left
lizmat_ but since then work on MoarVM has slowed so much that I try to follow up each commit on MoarVM with a bump 14:57
for better bisectability
14:57 lizmat_ left 14:58 lizmat joined
ab5tract I'm thinking more about how it's a PITA to manually cd into MoarVM, capture the current commit hash, cd into nqp, bump, commit, save the hash, cd into rakudo and bump again 15:05
lizmat afk&
ab5tract it's no big deal to bump when the code is all ready to bump
but doing that while trying to get something in MoarVM changed that requires verification in Rakudo has not been the most pleasant experience so far 15:06
Here's what I have in my history for making it less painful: 15:10
`cd nqp/MoarVM/; g commit -a --amend; g describe | pbcopy; cd ..; pbpaste > ./tools/templates/MOAR_REVISION; g commit -a --amend; g describe | pbcopy; cd ..; pbpaste > ./tools/templates/NQP_REVISION`
17:02 sena_kun left
ab5tract timo: does `MVM_NO_NURSERY_RANGE=1` activate or deactivate your new code? 18:12
unfortunately I don't see an impact in either case 18:17
oops, (l)user error 18:23
I've managed to actually get the revision tags updated correctly, but I'm seeing this while compiling: gist.github.com/ab5tract/fa1c067fc...92e5d94781 18:27
it appears that macOS refers to the same feature as `MAP_FIXED` 18:36
hmm.. I guess it's not exactly the same. 18:39
anyway, it compiles with `MAP_FIXED`. But I don't see any significant perf change in either direction with `MVM_NO_NURSERY_RANGE` :/ 18:40
this article implies that this should be expected since MAP_FIXED_NOREPLACE is basically Linux/x86 only? tia.mat.br/posts/2022/05/23/using-...-jits.html 18:44
18:51 japhb left 20:54 MasterDuke joined
MasterDuke ab5tract: zoffix had a tool, i think called just 'z', that he used for interacting with moarvm/nqp/rakudo. i believe it had a `bump` subcommand that did all that 20:55
wonder how raku would fare in this comparison? justine.lol/lex/ 21:14
timo: very interesting. fwiw, i didn't notice an improvement compiling CORE.e (though my benchmarking was not extremely rigorous) 21:25
22:54 sena_kun joined