Geth | MoarVM/dedicated_nursery_memory_area: 38de7b77fc | (Timo Paulssen)++ | 11 files attempt to get all nursery areas into one memory area |
11:55 | |
MoarVM/dedicated_nursery_memory_area: e515c1802d | (Timo Paulssen)++ | 9 files fix up nursery area feature |
|||
timo | this is the implementation of what i mentioned the other day | ||
11:59 | sena_kun joined
ab5tract | timo: any noticeable performance impact? | 13:56 | |
ab5tract wonders what a good stress test of this might look like | |||
timo | yes, check #raku-dev | 13:57 | |
difficult to say what a workload would look like that really tickles the benefits for this | 13:58 | ||
or do you mean to find out if it does stuff wrong? | |||
i'm thinking performance improvements might be more strongly pronounced in situations where the whole system is kind of low on memory? so that older objects would likely be out of cache or even in swap by the time a nursery collection happens | 14:00 | ||
in general, this change would make accesses to older objects during a nursery collection rarer | 14:01 | ||
if these older objects were already accessed recently, before the collection started, they wouldn't get a real benefit from this, besides maybe not having to get them into L1 cache where they are much more unlikely to be | |||
lizmat | timo: have you tried timing the spectest with it ? | 14:02 | |
timo | i have not. any volunteers? :) | ||
lizmat | shouldn't it be as simple as "make spectest" for you? | ||
timo | ideally, the system wouldn't be busy otherwise | 14:03 | |
lizmat | grab a cup of tea while it's doing that? | 14:04 | |
timo | how fast are your spec test runs :o | ||
lizmat | I mean, a. see if your work triggers any issues, and b. if there is a significant difference | ||
Files=1359, Tests=119744, 212 wallclock secs (17.50 usr 3.61 sys + 1218.51 cusr 73.82 csys = 1313.44 CPU) | |||
was the last one I did today, for the substr-rw fix | 14:05 | ||
timo | huh ok that's not so bad | ||
lizmat | so less than 4 minutes (well, at least for me on a M1) | ||
timo | just two runs with my new stuff: 150.42 +- 40.39 seconds time elapsed ( +- 26.85% ) | 14:20 | |
185 wallclock secs (12.95 usr 3.96 sys + 1727.56 cusr 307.15 csys = 2051.62 CPU) | |||
104 wallclock secs (12.91 usr 3.83 sys + 1769.88 cusr 313.71 csys = 2100.33 CPU) | 14:21 | ||
lizmat | end neither of these are without your patches I assume? | ||
*and | |||
timo | correct. same thing both times | 14:22 | |
lizmat | yeah, than the first one did a lot of precomping that the second one didn't need to do | ||
*then | |||
timo | the first one was the very fast one | ||
lizmat | it was ? | 14:23 | |
timo | yup | ||
could be related to thermals though | |||
lizmat | yeah, probably then | ||
timo | my beast of a cooler doesn't seem to get the temperature below 85 degC | ||
i should go from 20 test jobs to maybe 14? 10? | |||
lizmat | possibly... it's been a while since I tested on Intel hardware | 14:24 | |
timo | this is AMD :P | 14:25 | |
lizmat | the M1 throttles after a while as well, but apparently way later than Intel (or AMD) | ||
timo | the feature turned off: 112.386 +- 0.203 seconds time elapsed | 14:26 | |
for this one, both took the same amount of time | |||
lizmat | yeah, that's within noise | ||
timo | 2,112,932.19 msec task-clock with my stuff, 2,151,957.20 msec task-clock without my stuff | 14:28 | |
i believe "task-clock" counts the time for which the task was on CPU, so excludes things like waiting for external stuff, or sleep, etc | 14:29 | ||
most spec tests might fall under the "worst case" for this optimization tbh | 14:31 | ||
lizmat | I know, and even then it appears to make a (small) difference | 14:34 | |
so that's good :-) | |||
timo | with my stuff: 120 wallclock secs (11.44 usr 3.20 sys + 1290.16 cusr 235.46 csys = 1540.26 CPU) and 120 wallclock secs (11.11 usr 3.43 sys + 1297.14 cusr 234.72 csys = 1546.40 CPU) | ||
this is with TEST_JOBS=12 | 14:35 | ||
125.633 +- 0.170 seconds time elapsed ( +- 0.14% ) | |||
this also includes the startup time with the git clone and fudgeall i think | |||
without my stuff: 121 wallclock secs (11.50 usr 3.31 sys + 1320.40 cusr 238.02 csys = 1573.23 CPU) and 121 wallclock secs (11.44 usr 3.39 sys + 1318.10 cusr 235.63 csys = 1568.56 CPU) | 14:41 | ||
127.33503 +- 0.00425 seconds time elapsed | |||
very small improvement, but doesn't seem to be very noisy | |||
total CPU being 1573s and 1568s without my stuff and 1540s and 1546s with it looks pretty decent | 14:42 | ||
but the slower of the faster ones is just 2s better than the faster of the slower ones | 14:43 | ||
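To put a number on the difference discussed above, here is a minimal sketch that averages the two total-CPU figures from each configuration (values copied from the `TEST_JOBS=12` runs in this log) and prints the relative change in percent. `rel_change` is a hypothetical helper name, not part of any Rakudo or MoarVM tooling.

```shell
# rel_change: hypothetical helper for illustration only.
# $1,$2: total CPU seconds of the two runs without the patch
# $3,$4: total CPU seconds of the two runs with the patch
rel_change() {
    awk -v a1="$1" -v a2="$2" -v b1="$3" -v b2="$4" 'BEGIN {
        without = (a1 + a2) / 2     # mean CPU seconds without the patch
        with_p  = (b1 + b2) / 2     # mean CPU seconds with the patch
        # relative saving, as a percentage of the unpatched mean
        printf "%.2f\n", (without - with_p) / without * 100
    }'
}

# Figures quoted in the log: 1573.23 and 1568.56 without, 1540.26 and 1546.40 with.
rel_change 1573.23 1568.56 1540.26 1546.40   # prints 1.75
```

With only two runs per configuration this is a rough indication at best, which matches the caution voiced in the channel.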
lizmat | still... I wouldn't mind having my spectests run a bit faster, as I tend to run a lot of them | ||
timo | that amount is tiny :( | 14:45 | |
Geth | MoarVM/dedicated_nursery_memory_area: d455611795 | (Timo Paulssen)++ | 3 files remove remaining debugspam |
14:46 | |
timo | and i have no idea how it would impact performance on the M1 | 14:47 | |
it has a famously big third level cache or something like that? | |||
or just very fast main memory access speed | |||
lizmat | could well be... I'm just happy it runs spectests about 2 as fast as on my 2019 Intel MBP | ||
*2x | 14:48 | ||
ab5tract | I think the Raku-only script from this work could be an interesting benchmark here: 5ab5traction5.bearblog.dev/an-init...raku-code/ | 14:49 | |
timo | you could try compiling my branch and see if it does anything. you can compare with MVM_NO_NURSERY_RANGE=1 and without the env var entirely | ||
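A minimal sketch of the comparison suggested here: run the same workload several times with and without the environment variable and compare the wall-clock times. `time_runs` is a hypothetical helper, and `bench.raku` stands in for whatever workload you pick. Judging by its name and the "feature turned off" measurement earlier in the log, `MVM_NO_NURSERY_RANGE=1` presumably disables the new code, but that is an assumption here.

```shell
# time_runs: hypothetical helper for illustration only.
# Runs a command N times and prints the wall-clock duration of each run in whole seconds.
time_runs() {
    runs="$1"; shift
    i=0
    while [ "$i" -lt "$runs" ]; do
        start=$(date +%s)
        "$@" >/dev/null 2>&1        # discard the workload's own output
        end=$(date +%s)
        echo $((end - start))
        i=$((i + 1))
    done
}

# Usage (assumes a raku built from the branch is on PATH; bench.raku is a placeholder):
#   time_runs 3 raku bench.raku
#   time_runs 3 env MVM_NO_NURSERY_RANGE=1 raku bench.raku
```

On Linux, `perf stat -r N <command>` repeats the command and reports the mean and spread directly, including a task-clock figure like the ones quoted above.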
ab5tract | I've got to run for now but will post some timings later | ||
> could well be... I'm just happy it runs spectests about 2 as fast as on my 2019 Intel MBP | 14:50 | ||
wait, it runs the spectests 2x as fast?? | |||
timo | that's lizmat's M1 vs MBP | 14:51 | |
ab5tract | ahh | ||
lizmat | 2.4 GHz 8-core i9 vs M1 | ||
ab5tract | had my heart jumping there for a second | ||
14:53 | lizmat_ joined
ab5tract | do we have any tooling around bumping moar/nqp versions? I've been doing it manually when hacking on the under-layers and I've found it to be a bit annoying | 14:56 | |
to keep track of | |||
lizmat_ | we used to have tooling to track which commits would be part of a bump (Zoffixx++) | ||
14:57 | lizmat left
lizmat_ | but since then work on MoarVM has slowed so much that I try to follow up each commit on MoarVM with a bump | 14:57 | |
for better bisectability | |||
14:57 | lizmat_ left
14:58 | lizmat joined
ab5tract | I'm thinking more about how it's a PITA to manually cd into MoarVM, capture the current commit hash, cd into nqp, bump, commit, save the hash, cd into rakudo and bump again | 15:05 | |
lizmat | afk& | ||
ab5tract | it's no big deal to bump when the code is all ready to bump | ||
but doing that while trying to get something in MoarVM changed that requires verification in Rakudo has not been the most pleasant experience so far | 15:06 | ||
Here's what I have in my history for making it less painful: | 15:10 | ||
`cd nqp/MoarVM/; g commit -a --amend; g describe | pbcopy; cd ..; pbpaste > ./tools/templates/MOAR_REVISION; g commit -a --amend; g describe | pbcopy; cd ..; pbpaste > ./tools/templates/NQP_REVISION` | |||
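The one-liner above could be sketched without the clipboard round trip roughly as follows. This assumes `g` is an alias for `git` and that MoarVM is checked out at `nqp/MoarVM` inside the rakudo checkout, as the `cd` steps suggest. `record_revision` is a hypothetical helper name.

```shell
# record_revision: hypothetical helper for illustration only.
# Writes `git describe` of a checkout into a revision file of the parent project.
record_revision() {
    repo_dir="$1"      # checkout whose revision should be recorded
    target_file="$2"   # revision file to overwrite
    # Capture first so a failing `git describe` does not truncate the file.
    rev=$(git -C "$repo_dir" describe) || return 1
    printf '%s\n' "$rev" > "$target_file"
}

# Usage, run from the rakudo checkout after committing in nqp/MoarVM:
#   record_revision nqp/MoarVM nqp/tools/templates/MOAR_REVISION
#   git -C nqp commit -a --amend --no-edit
#   record_revision nqp tools/templates/NQP_REVISION
```

Capturing the output in a variable instead of piping through `pbcopy`/`pbpaste` also makes the steps work outside macOS.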
17:02 | sena_kun left
ab5tract | timo: does `MVM_NO_NURSERY_RANGE=1` activate or deactivate your new code? | 18:12 | |
unfortunately I don't see an impact in either case | 18:17 | ||
oops, (l)user error | 18:23 | ||
I've managed to actually get the revision tags updated correctly, but I'm seeing this while compiling: gist.github.com/ab5tract/fa1c067fc...92e5d94781 | 18:27 | ||
it appears that macOS refers to the same feature as `MAP_FIXED` | 18:36 | ||
hmm.. I guess it's not exactly the same. | 18:39 | ||
anyway, it compiles with `MAP_FIXED`. But I don't see any significant perf change in either direction with `MVM_NO_NURSERY_RANGE` :/ | 18:40 | ||
this article implies that this should be expected since MAP_FIXED_NOREPLACE is basically Linux/x86 only? tia.mat.br/posts/2022/05/23/using-...-jits.html | 18:44 | ||
18:51 | japhb left
20:54 | MasterDuke joined
MasterDuke | ab5tract: zoffix had a tool, i think called just 'z', that he used for interacting with moarvm/nqp/rakudo. i believe it had a `bump` subcommand that did all that | 20:55 | |
wonder how raku would fare in this comparison? justine.lol/lex/ | 21:14 | ||
timo: very interesting. fwiw, i didn't notice an improvement compiling CORE.e (though my benchmarking was not extremely rigorous) | 21:25 | ||
22:54 | sena_kun joined