#moarvm on 12 May 2025 - Raku Programming Language Log

Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes. Set by lizmat on 24 May 2021.
02:06 Voldenet left 02:08 Voldenet joined
lizmat	and yet another Rakudo Weekly News hits the Net: rakudoweekly.blog/2025/05/12/2025-...c-in-time/	13:19	Copy link Message link Add to gist Remove
timo	one case where our ropes implementation should really give good benefits is when getting the substrings out of a big heap of Match objects	14:51	Copy link Message link Add to gist Remove
	though we basically already store the orig/target string and offset and length in the Match object and only create the roped substring object when the Match has .Str called on it	14:52	Copy link Message link Add to gist Remove
	(because we also want the start and length separately, and we don't have an API to ask a rope for its parts, and it would also be kind of awkward to handle)		Copy link Message link Add to gist Remove
	but imagine feeding /usr/share/dict/words into a grammar that just matches one line into one match object, and imagine we were eagerly creating these substrings as full objects	14:54	Copy link Message link Add to gist Remove
	we should probably still look into immediately turning roped strings into buffer-backed strings if they are very small, after some measurements of how performance differs between the two		Copy link Message link Add to gist Remove
	strings that fit in-situ will probably always outperform rope-based substrings	14:55	Copy link Message link Add to gist Remove
	also consider the case of having a my Str $result = ""; and then a loop that does $result ~= $something repeatedly	14:56	Copy link Message link Add to gist Remove
	if we don't have ropes, we would be copying the buffer in the $result over and over again to make the next $result	14:57	Copy link Message link Add to gist Remove
	this is something common in python and CPython has an optimization where if a string's reference count is 1, then they will just mutate it in-place		Copy link Message link Add to gist Remove
	and early on, Pypy didn't have something specifically for this use case, and some existing python scripts would run in quadratic time in pypy but not in cpython	14:58	Copy link Message link Add to gist Remove
	we can potentially still do something smarter in our case. right now when appending a string to a roped string we create a new rope array by copying the previous one and adding the extra ropes. maybe at some point we want to collapse the strands if we're not already doing that?	14:59	Copy link Message link Add to gist Remove
	we already don't do nested stranded strings		Copy link Message link Add to gist Remove
	i kind of use ropes and strands interchangeably. i think a rope is supposed to refer to a string made of strands?	15:00	Copy link Message link Add to gist Remove
	something we don't have any logic or heuristics for is what happens when you have a huge string that you take a few substrings of, and never look at the original string ever again. we do keep the huge string around in memory because the substrings point at it, but we don't do any stats or anything to figure that out at run time	15:03	Copy link Message link Add to gist Remove
	similarly we don't measure what synthetic code points have been created, and whether any of them have become unused		Copy link Message link Add to gist Remove
	that's the exhaustion problem japhb alluded to earlier. a very long-running moar process can slowly accumulate synthetics over time. an "attack" on this particular feature may take gigabytes of text to be fed in? still have to make a proof-of-concept to figure out if it's possible to make more synthetics with less input	15:04	Copy link Message link Add to gist Remove
	oh wow. Konsole is not happy about my combining character printing test	15:10	Copy link Message link Add to gist Remove
	$4 = 0xe716a	15:32	Copy link Message link Add to gist Remove
	:utfsize(9712968)		Copy link Message link Add to gist Remove
	that's 0xe716a synthetics generated after 9_712_968 bytes of utf8, but i'm just using one A followed by 4 combining characters out of the range of 0x300..0x36f	15:33	Copy link Message link Add to gist Remove
	choosing the same initial character also results in a very wide NFG trie node I think	15:35	Copy link Message link Add to gist Remove
	whew, memory usage is also going up quite a bit	15:40	Copy link Message link Add to gist Remove
	29k entries in the "free at safepoint" linked list	15:44	Copy link Message link Add to gist Remove
	that's not just 16 bytes per entry for the linked list alone ...	15:45	Copy link Message link Add to gist Remove
16:34 mandrillone joined
mandrillone	timo: looks like adding complexity to complexity	16:37	Copy link Message link Add to gist Remove
	I’d get rid of sub strings altogether	16:38	Copy link Message link Add to gist Remove
	In any case, it’s an implementation detail	16:39	Copy link Message link Add to gist Remove
16:41 mandrillone left

Please report any issues / comments / feature requests as an issue on App::Raku::Log.

Thank you!