#moarvm on 2 November 2024 - Raku Programming Language Log

Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes. Set by lizmat on 24 May 2021.
Geth	MoarVM/dedicated_nursery_memory_area: 38de7b77fc | (Timo Paulssen)++ | 11 files attempt to get all nursery areas into one memory area	11:55
	MoarVM/dedicated_nursery_memory_area: e515c1802d | (Timo Paulssen)++ | 9 files fix up nursery area feature
timo	this is the implementation of what i mentioned the other day
11:59 sena_kun joined
ab5tract	timo: any noticeable performance impact?	13:56
	ab5tract wonders what a good stress test of this might look like
timo	yes, check #raku-dev	13:57
	difficult to say what a workload would look like that really tickles the benefits for this	13:58
	or do you mean to find out if it does stuff wrong?
	i'm thinking performance improvements might be more strongly pronounced in situations where the whole system is kind of low on memory? so that older objects would likely be out of cache or even in swap by the time a nursery collection happens	14:00
	in general, this change would make accesses to older objects during a nursery collection rarer	14:01
	if these older objects were already accessed recently, before the collection started, they wouldn't get a real benefit from this, besides maybe not having to get them into L1 cache where they are much more unlikely to be
lizmat	timo: have you tried timing the spectest with it ?	14:02
timo	i have not. any volunteers? :)
lizmat	shouldn't it be as simple as "make spectest" for you?
timo	ideally, the system wouldn't be busy otherwise	14:03
lizmat	grab a cup of tea while it's doing that?	14:04
timo	how fast are your spec test runs :o
lizmat	I mean, a. see if your work triggers any issues, and b. if there is a significant difference
	Files=1359, Tests=119744, 212 wallclock secs (17.50 usr 3.61 sys + 1218.51 cusr 73.82 csys = 1313.44 CPU)
	was the last one I did today, for the substr-rw fix	14:05
timo	huh ok that's not so bad
lizmat	so less than 4 minutes (well, at least for me on a M1)
timo	just two runs with my new stuff: 150.42 +- 40.39 seconds time elapsed ( +- 26.85% )	14:20
	185 wallclock secs (12.95 usr 3.96 sys + 1727.56 cusr 307.15 csys = 2051.62 CPU)
	104 wallclock secs (12.91 usr 3.83 sys + 1769.88 cusr 313.71 csys = 2100.33 CPU)	14:21
lizmat	end neither of these are without your patches I assume?
	*and
timo	correct. same thing both times	14:22
lizmat	yeah, than the first one did a lot of precomping that the second one didn't need to do
	*then
timo	the first one was the very fast one
lizmat	it was ?	14:23
timo	yup
	could be related to thermals though
lizmat	yeah, probably then
timo	my beast of a cooler doesn't seem to get the temperature below 85 degC
	i should go from 20 test jobs to maybe 14? 10?
lizmat	possibly... it's been a while since I tested on Intel hardware	14:24
timo	this is ÄMD :P	14:25
lizmat	the M1 throttles after a while as well, but apparently way later than Intel (or AMD)
timo	the feature turned off: 112.386 +- 0.203 seconds time elapsed	14:26
	for this one, both took the same amount of time
lizmat	yeah, that's within noise
timo	2,112,932.19 msec task-clock with my stuff, 2,151,957.20 msec task-clock without my stuff	14:28
	i believe "task-clock" counts the time for which the task was on CPU, so excludes things like waiting for external stuff, or sleep, etc	14:29
	most spec tests might fall under the "worst case" for this optimization tbh	14:31
lizmat	I know, and even then it appears to make a (small) difference	14:34
	so that's good :-)
timo	with my stuff: 120 wallclock secs (11.44 usr 3.20 sys + 1290.16 cusr 235.46 csys = 1540.26 CPU) and 120 wallclock secs (11.11 usr 3.43 sys + 1297.14 cusr 234.72 csys = 1546.40 CPU)
	this is with TEST_JOBS=12	14:35
	125.633 +- 0.170 seconds time elapsed ( +- 0.14% )
	this also includes the startup time with the git clone and fudgeall i think
	without my stuff: 121 wallclock secs (11.50 usr 3.31 sys + 1320.40 cusr 238.02 csys = 1573.23 CPU) and 121 wallclock secs (11.44 usr 3.39 sys + 1318.10 cusr 235.63 csys = 1568.56 CPU)	14:41
	127.33503 +- 0.00425 seconds time elapsed
	very small improvement, but doesn't seem to be very noisy
	total CPU being 1573s and 1568s without my stuff and 1540s and 1546s with it looks pretty decent	14:42
	but the slower of the faster ones is just 2s better than the faster of the slower ones	14:43
lizmat	still... I wouldn't mind having my spectests run a bit faster, as I tend to run a lot of them
timo	that amount is tiny :(	14:45
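For scale, the figures quoted in this exchange can be turned into relative differences. This is a minimal sketch that uses only numbers pasted in the log; the helper name `relative_saving` is made up for illustration and is not part of any tool mentioned here.

```python
from statistics import mean


def relative_saving(with_patch, without_patch):
    """Percent by which the mean of with_patch is below the mean of without_patch."""
    baseline = mean(without_patch)
    return 100.0 * (baseline - mean(with_patch)) / baseline


# Total CPU seconds reported by the harness for the TEST_JOBS=12 runs.
cpu = relative_saving([1540.26, 1546.40], [1573.23, 1568.56])
# perf stat "seconds time elapsed" means for the same runs.
elapsed = relative_saving([125.633], [127.33503])
# perf stat task-clock (msec) from the earlier pair of measurements.
task_clock = relative_saving([2112932.19], [2151957.20])

print(f"CPU: {cpu:.2f}%  elapsed: {elapsed:.2f}%  task-clock: {task_clock:.2f}%")
```

All three come out at under two percent, which matches the "very small improvement" reading above. With only two runs per configuration, this says nothing about statistical significance.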
Geth	MoarVM/dedicated_nursery_memory_area: d455611795 | (Timo Paulssen)++ | 3 files remove remaining debugspam	14:46
timo	and i have no idea how it would impact performance on the M1	14:47
	it has a famously big third level cache or something like that?
	or just very fast main memory access speed
lizmat	could well be... I'm just happy it runs spectests about 2 as fast as on my 2019 Intel MBP
	*2x	14:48
ab5tract	I think the Raku-only script from this work could be an interesting benchmark here: 5ab5traction5.bearblog.dev/an-init...raku-code/	14:49
timo	you could try compiling my branch and see if it does anything. you can compare with MVM_NO_NURSERY_RANGE=1 and without the env var entirely
ab5tract	I've got to run for now but will post some timings later
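The comparison timo suggests (same workload, once with `MVM_NO_NURSERY_RANGE=1` and once with the variable absent) could be scripted roughly as below. This is a sketch under assumptions: the function names are hypothetical, the default workload is a placeholder that only starts a Python interpreter, and the log itself does not state which of the two settings enables the new code. Pass a real command, such as one that runs a Raku script built from the branch, to get meaningful numbers.

```python
import os
import subprocess
import sys
import time


def time_runs(cmd, extra_env=None, runs=3):
    """Run cmd `runs` times and return the wall-clock seconds of each run."""
    env = dict(os.environ)
    env.pop("MVM_NO_NURSERY_RANGE", None)  # start from "variable absent"
    env.update(extra_env or {})
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, env=env, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


def compare(cmd, runs=3):
    """Time cmd with the variable absent and with it set to 1."""
    return {
        "unset": time_runs(cmd, runs=runs),
        "MVM_NO_NURSERY_RANGE=1": time_runs(
            cmd, {"MVM_NO_NURSERY_RANGE": "1"}, runs),
    }


if __name__ == "__main__":
    # Placeholder workload (an assumption): replace with the benchmark command.
    cmd = sys.argv[1:] or [sys.executable, "-c", "pass"]
    for label, timings in compare(cmd).items():
        print(label, " ".join(f"{t:.3f}s" for t in timings))
```

As noted earlier in the log, results are only comparable when the machine is otherwise idle and not thermally throttling.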
	> could well be... I'm just happy it runs spectests about 2 as fast as on my 2019 Intel MBP	14:50
	wait, it runs the spectests 2x as fast??
timo	that's lizmat's M1 vs MBP	14:51
ab5tract	ahh
lizmat	2.4 GHz 8-core i9 vs M1
ab5tract	had my heart jumping there for a second
14:53 lizmat_ joined
ab5tract	do we have any tooling around bumping moar/nqp versions? I've been doing it manually when hacking on the under-layers and I've found it to be a bit annoying	14:56
	to keep track of
lizmat_	we used to have tooling to track which commits would be part of a bump (Zoffixx++)
14:57 lizmat left
lizmat_	but since then work on MoarVM has slowed so much that I try to follow up each commit on MoarVM with a bump	14:57
	for better bisectability
14:57 lizmat_ left 14:58 lizmat joined
ab5tract	I'm thinking more about how it's a PITA to manually cd into MoarVM, capture the current commit hash, cd into nqp, bump, commit, save the hash, cd into rakudo and bump again	15:05
lizmat	afk&
ab5tract	it's no big deal to bump when the code is all ready to bump
	but doing that while trying to get something in MoarVM changed that requires verification in Rakudo has not been the most pleasant experience so far	15:06
	Here's what I have in my history for making it less painful:	15:10
	`cd nqp/MoarVM/; g commit -a --amend; g describe | pbcopy; cd ..; pbpaste > ./tools/templates/MOAR_REVISION; g commit -a --amend; g describe | pbcopy; cd ..; pbpaste > ./tools/templates/NQP_REVISION`
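The one-liner above could be wrapped in a small script that avoids the clipboard round trip. This is only a sketch under stated assumptions: `g` is taken to be a shell alias for `git`, the checkout layout (`nqp/MoarVM` inside the Rakudo checkout) is read off the `cd` steps, `--no-edit` is an addition so the amend does not open an editor, and the names `git` and `bump_revisions` are made up for illustration.

```python
import subprocess
from pathlib import Path


def git(repo, *args):
    """Run a git command inside repo and return its trimmed stdout."""
    done = subprocess.run(["git", *args], cwd=repo, check=True,
                          capture_output=True, text=True)
    return done.stdout.strip()


def bump_revisions(rakudo, run_git=git):
    """Amend the MoarVM and nqp commits and record their `git describe` output."""
    rakudo = Path(rakudo)
    nqp = rakudo / "nqp"
    moar = nqp / "MoarVM"

    # Step 1: amend in MoarVM, then write its description into nqp.
    run_git(moar, "commit", "-a", "--amend", "--no-edit")
    moar_rev = run_git(moar, "describe")
    (nqp / "tools/templates/MOAR_REVISION").write_text(moar_rev + "\n")

    # Step 2: amend in nqp (now including the new MOAR_REVISION),
    # then write its description into the Rakudo checkout.
    run_git(nqp, "commit", "-a", "--amend", "--no-edit")
    nqp_rev = run_git(nqp, "describe")
    (rakudo / "tools/templates/NQP_REVISION").write_text(nqp_rev + "\n")
    return moar_rev, nqp_rev
```

Calling `bump_revisions(".")` from the top of a Rakudo checkout would mirror the shell history entry. Like the original, it amends existing commits, so it is only suitable for local work in progress.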
17:02 sena_kun left
ab5tract	timo: does `MVM_NO_NURSERY_RANGE=1` activate or deactivate your new code?	18:12
	unfortunately I don't see an impact in either case	18:17
	oops, (l)user error	18:23
	I've managed to actually get the revision tags updated correctly, but I'm seeing this while compiling: gist.github.com/ab5tract/fa1c067fc...92e5d94781	18:27
	it appears that macOS refers to the same feature as `MAP_FIXED`	18:36
	hmm.. I guess it's not exactly the same.	18:39
	anyway, it compiles with `MAP_FIXED`. But I don't see any significant perf change in either direction with `MVM_NO_NURSERY_RANGE` :/	18:40
	this article implies that this should be expected since MAP_FIXED_NOREPLACE is basically Linux/x86 only? tia.mat.br/posts/2022/05/23/using-...-jits.html	18:44
18:51 japhb left 20:54 MasterDuke joined
MasterDuke	ab5tract: zoffix had a tool, i think called just 'z', that he used for interacting with moarvm/nqp/rakudo. i believe it had a `bump` subcommand that did all that	20:55
	wonder how raku would fare in this comparison? justine.lol/lex/	21:14
	timo: very interesting. fwiw, i didn't notice an improvement compiling CORE.e (though my benchmarking was not extremely rigorous)	21:25
22:54 sena_kun joined
