#moarvm on 6 July 2024 - Raku Programming Language Log

Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes. Set by lizmat on 24 May 2021.
00:38 MasterDuke joined
MasterDuke	timo1++!! the branch is passing all the CI jobs that main is!	00:39	Copy link Message link Add to gist Remove
	great find, i was definitely not looking at that code at all for the fix		Copy link Message link Add to gist Remove
timo1	:)	00:46	Copy link Message link Add to gist Remove
	thanks for your effort to resurrect this code		Copy link Message link Add to gist Remove
	i've found two spots that were creating fresh string objects with 0 graphemes in them		Copy link Message link Add to gist Remove
MasterDuke	likewise thanks for getting it started	00:47	Copy link Message link Add to gist Remove
	that doesn't sound great		Copy link Message link Add to gist Remove
timo1	it was just the decode functions for latin1 and ascii; one of them was called at least from the generated config.c where we create a hash from what Configure.pl has found out, and some of these keys got an empty string bound. surely the majority of empty strings from latin1 came from somewhere else though	00:48	Copy link Message link Add to gist Remove
	and the other is from cu_get_string, which every CU probably has the empty string in it, and that was giving us one fresh empty string per compunit		Copy link Message link Add to gist Remove
	which there's many compunit when compiling the core setting, thanks to "compile in context"	00:49	Copy link Message link Add to gist Remove
	near the end of core setting compilation, there's 17.9% MVM_STRING_STRAND with just a single strand. i want to double-check that these don't have repetition or so, and maybe there can be an in-situ strand storage type	00:50	Copy link Message link Add to gist Remove
MasterDuke	isn't there a VM constant for the empty string?	00:51	Copy link Message link Add to gist Remove
timo1	there is, we just weren't returning it in the right spot :)		Copy link Message link Add to gist Remove
	extra fun: don't try to return the empty string constant always when the bytes count is 0, because surprise surprise, the very first empty string gets created in that same function :D	00:52	Copy link Message link Add to gist Remove
MasterDuke	he	00:56	Copy link Message link Add to gist Remove
	*heh		Copy link Message link Add to gist Remove
	i was gratified by those stats you posted earlier, glad to see in-situ is being used a bunch	00:57	Copy link Message link Add to gist Remove
timo1	r: say 647437 * 0.2	01:02	Copy link Message link Add to gist Remove
camelia	Can't open perl script "/home/camelia/rakudo-j-inst/bin/eval-client.pl": No such file or directory		Copy link Message link Add to gist Remove
	129487.4		Copy link Message link Add to gist Remove
timo1	m: say 129487 / 256	01:03	Copy link Message link Add to gist Remove Run code
camelia	505.808594		Copy link Message link Add to gist Remove
timo1	there's like enough single-grapheme in_situ_8 strings to store every possible single byte 500 times :D		Copy link Message link Add to gist Remove
MasterDuke	don't you also have a branch with a cache of short string?	01:04	Copy link Message link Add to gist Remove
	github.com/MoarVM/MoarVM/compare/m...ring_cache		Copy link Message link Add to gist Remove
	i think i started to rebase it, but it was a little trickier than the in-situ-string one	01:05	Copy link Message link Add to gist Remove
	but that might be a good next project after in-situ-strings is merged		Copy link Message link Add to gist Remove
	it's not actually that big of a diff	01:06	Copy link Message link Add to gist Remove
timo1	but is it actually helpful, i wonder	01:07	Copy link Message link Add to gist Remove
	it's already a lot better to have the single character strings in situ instead of with its own allocation		Copy link Message link Add to gist Remove
MasterDuke	dunno if it's helpful, but have to rebase to measure		Copy link Message link Add to gist Remove
timo1	my investigation into what single character string would be most worth caching: $	01:25	Copy link Message link Add to gist Remove
	:! 6.8; :$ 36.4; :& 3.1; :+ 1.6; :, 16.2; :- 1.2; :? 4.4; :@ 1.6;		Copy link Message link Add to gist Remove
	:D 4.9; :a 1.8; :b 1.4; :i 2.3; :\| 1.5;		Copy link Message link Add to gist Remove
	left out anything with a too low percentage	01:26	Copy link Message link Add to gist Remove
	but i think it's very funny that caching $ would be worth so much		Copy link Message link Add to gist Remove
MasterDuke	ugh, i'm seeing emojis	01:27	Copy link Message link Add to gist Remove
timo1	hehe. ok on second thought maybe : isn't the best character to show "this is a character directly, rather than a hex code for the ascii value"	01:28	Copy link Message link Add to gist Remove
	earlier in a run i get this: :1 5.6; :2 5.8; :3 5.6; :4 5.4;	01:29	Copy link Message link Add to gist Remove
	:5 5.4; :6 5.4; :7 5.4; :8 5.4; :9 5.2;		Copy link Message link Add to gist Remove
MasterDuke	no 0?	01:30	Copy link Message link Add to gist Remove
timo1	funny enough, no	01:34	Copy link Message link Add to gist Remove
MasterDuke	that's...surprising		Copy link Message link Add to gist Remove
timo1	i think a good chunk of the medium-length in_situ_8 strings come from the lc/uc thing i might have a branch for	01:35	Copy link Message link Add to gist Remove
MasterDuke	?	01:36	Copy link Message link Add to gist Remove
	rakudo branch?	01:43	Copy link Message link Add to gist Remove
timo1	actually might be nqp?	01:48	Copy link Message link Add to gist Remove
	ok, QRegex's MATCH uses nqp::split against "=" to figure out the list of names a capture has (it can be more than 1) and that generates quite an amount of strings	02:15	Copy link Message link Add to gist Remove
MasterDuke	split should be making them in-situ if possible. but you think they're good candidates for the short-string-cache?	02:18	Copy link Message link Add to gist Remove
timo1	they are in situ in this case yes	02:23	Copy link Message link Add to gist Remove
MasterDuke	off to bed. should have to some time tomorrow to follow up though	02:37	Copy link Message link Add to gist Remove
05:46 [Coke] left 05:47 [Coke] joined 08:34 sena_kun joined
Geth	MoarVM/main: 2c866ae859 \| (Patrick Böker)++ \| src/core/exceptions.c Fix returning from LEAVE to surrounding scope When a LEAVE phaser is run during an unwind, there is no valid return_address set in the LEAVE phasers outer frame. (It's return_address is actually set to interp_cur_op which is still the one that started the unwind in some unrelated frame.) Thus when a `return` in a LEAVE happens, the handler of the outer frame is missed, because `search_frame_handlers_lex()` ... (19 more lines)	11:48	Copy link Message link Add to gist Remove
	MoarVM/main: 08848c7e45 \| (Patrick Böker)++ (committed using GitHub Web editor) \| src/core/exceptions.c Merge pull request #1785 from patrickbkr/leave-return-handler-miss-fix Fix returning from LEAVE to surrounding scope		Copy link Message link Add to gist Remove
	lizmat will do the appropriate bumping	11:53	Copy link Message link Add to gist Remove
MasterDuke	what's the overall feel about github.com/MoarVM/MoarVM/pull/1802 now that CI is passing? ready to merge? anything else to add? more optimization needed?	12:59	Copy link Message link Add to gist Remove
lizmat	if you could also update github.com/MoarVM/MoarVM/blob/main...e.markdown with these changes, that would be nice	13:07	Copy link Message link Add to gist Remove
	that will help me in updating MoarVM::Bytecode :-)	13:08	Copy link Message link Add to gist Remove
MasterDuke	don't think anything would change there, i don't see mention of a string's storage type	13:10	Copy link Message link Add to gist Remove
lizmat	perhaps that could be documented though?	13:11	Copy link Message link Add to gist Remove
MasterDuke	yeah, probably a good idea	13:12	Copy link Message link Add to gist Remove
lizmat	I'm not sure that would make sense, but any more guidance in grokking MoarVM internals would be appreciated by people now and in the future :)		Copy link Message link Add to gist Remove
	possibly even yourself :-)		Copy link Message link Add to gist Remove
MasterDuke	hm, why can't i profile compiling CORE.e? sometimes i can't do CORE.c because it used too much memory, but CORE.e should be fine	13:21	Copy link Message link Add to gist Remove
lizmat	perhaps CORE.e does something async?	13:22	Copy link Message link Add to gist Remove
MasterDuke	can't do CORE.d either	13:23	Copy link Message link Add to gist Remove
timo1	i wonder if the microbenchmark where the number of allocations went way down but the time didn't shows how good mimalloc is at many very short-lived allocations, or something like that	15:00	Copy link Message link Add to gist Remove
	the implication might be "the microbenchmark is too far removed from real use cases"?	15:21	Copy link Message link Add to gist Remove
	i thought a bit more about the short string cache; when building the stuff that generates statistics about strings, I've come to appreciate the lifetime of the worklist object, and that it's passed into every gc_mark function	20:56	Copy link Message link Add to gist Remove
	it's also per-thread, so accessing it is always safe, no need to worry about concurrently running gcs of different threads	20:57	Copy link Message link Add to gist Remove
	and doesn't stay around in between gc runs	20:58	Copy link Message link Add to gist Remove
	the existing branch, the "cache one grapheme strings" one, had to make sure that everything that could create strings with just one character would look in the cache first	21:00	Copy link Message link Add to gist Remove
	that's a lot more places that have to be changed to take advantage of this, whereas giving the gc collect phase something that deduplicates strings would hit every string created how-ever. with the drawback that the string makes it into the nursery and therefore causes the next gc run to be sooner	21:01	Copy link Message link Add to gist Remove
	a cache that also holds longer strings would be beneficial for things like common keys in hashes, though		Copy link Message link Add to gist Remove
	we build boatloads of these when parsing code, for all the named match captures	21:02	Copy link Message link Add to gist Remove
	and as i've seen whenever we have something like <foo=bar> we recreate the foo and maybe also the bar string and put them as keys to a newly created hash		Copy link Message link Add to gist Remove
	that might be a lot of potential for deduplication there	21:03	Copy link Message link Add to gist Remove
MasterDuke	oh interesting. reducing mem used during parse/build would be very welcome	21:17	Copy link Message link Add to gist Remove
21:18 vrurg joined
timo1	having the gc run do the caching and finding also means that the cost of that is only paid for strings that survive long enough for GC to happen, whereas having it at the spot where, for example, substr or split creates it would be paid for all strings, even the very short-lived ones	22:19	Copy link Message link Add to gist Remove
22:24 sena_kun left
timo1	it would be bonkers if we could store in-situ characters inside a strand's pointer that normally goes to another string	22:44	Copy link Message link Add to gist Remove
	no idea how often that would actually work	22:45	Copy link Message link Add to gist Remove

Please report any issues / comments / feature requests as an issue on App::Raku::Log.

Thank you!