Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes. Set by lizmat on 24 May 2021. |
|||
02:06
Voldenet left
02:08
Voldenet joined
|
|||
lizmat | and yet another Rakudo Weekly News hits the Net: rakudoweekly.blog/2025/05/12/2025-...c-in-time/ | 13:19 | |
timo | one case where our ropes implementation should really give good benefits is when getting the substrings out of a big heap of Match objects | 14:51 | |
though we basically already store the orig/target string and offset and length in the Match object and only create the roped substring object when the Match has .Str called on it | 14:52 | ||
(because we also want the start and length separately, and we don't have an API to ask a rope for its parts, and it would also be kind of awkward to handle) | |||
but imagine feeding /usr/share/dict/words into a grammar that just matches one line into one match object, and imagine we were eagerly creating these substrings as full objects | 14:54 | ||
we should probably still look into immediately turning roped strings into buffer-backed strings if they are very small, after some measurements of how performance differs between the two | |||
strings that fit in-situ will probably always outperform rope-based substrings | 14:55 | ||
also consider the case of having a my Str $result = ""; and then a loop that does $result ~= $something repeatedly | 14:56 | ||
if we don't have ropes, we would be copying the buffer in the $result over and over again to make the next $result | 14:57 | ||
this is something common in python and CPython has an optimization where if a string's reference count is 1, then they will just mutate it in-place | |||
and early on, Pypy didn't have something specifically for this use case, and some existing python scripts would run in quadratic time in pypy but not in cpython | 14:58 | ||
we can potentially still do something smarter in our case. right now when appending a string to a roped string we create a new rope array by copying the previous one and adding the extra ropes. maybe at some point we want to collapse the strands if we're not already doing that? | 14:59 | ||
we already don't do nested stranded strings | |||
i kind of use ropes and strands interchangeably. i think a rope is supposed to refer to a string made of strands? | 15:00 | ||
something we don't have any logic or heuristics for is what happens when you have a huge string that you take a few substrings of, and never look at the original string ever again. we do keep the huge string around in memory because the substrings point at it, but we don't do any stats or anything to figure that out at run time | 15:03 | ||
similarly we don't measure what synthetic code points have been created, and whether any of them have become unused | |||
that's the exhaustion problem japhb alluded to earlier. a very long-running moar process can slowly accumulate synthetics over time. an "attack" on this particular feature may take gigabytes of text to be fed in? still have to make a proof-of-concept to figure out if it's possible to make more synthetics with less input | 15:04 | ||
oh wow. Konsole is *not* happy about my combining character printing test | 15:10 | ||
$4 = 0xe716a | 15:32 | ||
:utfsize(9712968) | |||
that's 0xe716a synthetics generated after 9_712_968 bytes of utf8, but i'm just using one A followed by 4 combining characters out of the range of 0x300..0x36f | 15:33 | ||
choosing the same initial character also results in a very wide NFG trie node I think | 15:35 | ||
whew, memory usage is also going up quite a bit | 15:40 | ||
29k entries in the "free at safepoint" linked list | 15:44 | ||
that's not just 16 bytes per entry for the linked list alone ... | 15:45 | ||
16:34
mandrillone joined
|
|||
mandrillone | timo: looks like adding complexity to complexity | 16:37 | |
I’d get rid of sub strings altogether | 16:38 | ||
In any case, it’s an implementation detail | 16:39 | ||
16:41
mandrillone left
|