Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes.
Set by lizmat on 24 May 2021.
Geth MoarVM: MasterDuke17++ created pull request #1940: Fix two GCC compiler warnings 00:33
lizmat And yet another Rakudo Weekly News hits the Net: rakudoweekly.blog/2025/05/19/2025-...s-is-mini/ 10:01
13:19 JimmyZhuo joined 13:33 JimmyZhuo left 16:09 kjp left 16:10 kjp joined
jnthn patrickb: (prevent dispatch programs from being recorded) You probably don't want to do that, because recording is the expensive step. But you can either 1) return something that receives the callee, does the work, and calls the callee, or 2) do it using dispatch resumptions, as used for example in a `proto` with a body, which needs to run the body and then continue the multi-dispatch at the `{*}` 20:58
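A minimal Raku sketch of option 2, the `proto` with a body that jnthn mentions: the body runs its own code, and `{*}` resumes the multi-dispatch (names here are illustrative only):

```raku
# A proto with a body: the body runs first, and the multi-dispatch
# continues at the {*} marker.
proto sub process($x) {
    note "about to dispatch on {$x.raku}";
    my $result = {*};          # resume the multi-dispatch here
    note "candidate returned {$result.raku}";
    $result
}
multi sub process(Int $n) { $n * 2  }
multi sub process(Str $s) { $s ~ $s }

say process(21);     # dispatches to the Int candidate: 42
say process("ab");   # dispatches to the Str candidate: abab
```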
.tell MasterDuke I somewhat suspect that, just as SIMD has been used for various kinds of fast parsing and validation, one could write something that makes good use of it to at least blast through UTF-8 content and verify that it is already in normalized form ("all ASCII range and no \r" can tell that, at least to a limited degree). A multi-pass algorithm has poor memory locality, so I'm not surprised that fast 21:08
tellable6 jnthn, I'll pass your message to MasterDuke
jnthn utf8 decoding followed by normalization wasn't faster. Trouble is, what if you chew through a meg and only then spot a problem? But I guess you could make a strand for the OK part and then continue with the full path. 21:09
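A rough Raku sketch of that limited check (illustrative only; MoarVM's decoder works at the C level, where SIMD could chew through many bytes per step): a buffer that is all ASCII and free of \r is already in normalized form, since each such byte is exactly one grapheme.

```raku
# Sketch: pure-ASCII input with no \r needs no normalization work.
# (\r is excluded because \r\n folds into a single grapheme under NFG.)
sub ascii-no-cr(Blob $bytes --> Bool) {
    for $bytes.list -> $b {
        return False if $b >= 0x80 || $b == 0x0D;
    }
    True
}

say ascii-no-cr("plain text\n".encode);   # True
say ascii-no-cr("café".encode);           # False: non-ASCII bytes
say ascii-no-cr("line\r\n".encode);       # False: \r present
```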
21:36 mandrillone joined
mandrillone What if Unicode normalization were done only when it is needed, that is, when comparing strings? 21:37
It often happens that one compares a single string with many others. In that case you can normalize this single string once, and then the algorithm for everything else is very simple (bar some unrealistic corner cases) 21:38
It is rare to sort long lists of strings 21:39
for other uses, such as counting graphemes, you don't really need full-fledged normalization 21:41
also it is something you rarely do IMHO
If you go with “Unicode normalization only when needed”, then you need to make string comparison non-commutative from a performance perspective. A convention can be chosen for which string is the pivot 21:43
or some heuristic can be implemented to detect that
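A sketch of that pivot idea in Raku (hypothetical `same-codes` helper; Raku's own Str is already NFG-normalized at creation, so `Uni` codepoint buffers stand in for unnormalized data here):

```raku
# Compare two Uni codepoint buffers element by element.
sub same-codes(Uni $a, Uni $b --> Bool) {
    return False unless $a.elems == $b.elems;
    for ^$a.elems -> $i {
        return False unless $a[$i] == $b[$i];
    }
    True
}

# Normalize the frequently-compared "pivot" string exactly once;
# in the scheme above, each $other would be normalized on the fly
# (or skipped entirely when its codepoints need no normalization).
my Uni $pivot = "café".NFC;
say same-codes($pivot, "cafe\x[0301]".NFC);   # True: same NFC form
say same-codes($pivot, "cafe".NFC);           # False
```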
21:43 mandrillone left
lizmat mandrillone: how would that handle .chars ? 21:53
tellable6 lizmat, I'll pass your message to mandrillone
21:55 rakkable left, rakkable joined
jnthn By making it O(n) instead of O(1), I guess. And I dunno how one is meant to apply normalization to only one of the strings involved in a comparison and expect a sensible result, nor to know which of the strings is the more frequently compared one locally. 22:09
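For reference, the O(1) behaviour as it stands: MoarVM normalizes to NFG eagerly, so a string carries its grapheme count, and combining sequences without a precomposed form become synthetics.

```raku
say "x\x[0301]".chars;   # 1: x + COMBINING ACUTE is one grapheme (a synthetic)
say "x\x[0301]".codes;   # 2: two codepoints, as no precomposed form exists
```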
23:46 mandrillone joined
mandrillone Indeed, O(n) instead of O(1). Possibly with caching of the result 23:48
tellable6 2025-05-19T21:53:13Z #moarvm <lizmat> mandrillone: how would that handle .chars ?
mandrillone If normalization is applied locally, you expect a small number of synthetics. The second string doesn't need normalization, since any grapheme that doesn't belong to the first string can just be handled on the fly 23:51
23:54 mandrillone left 23:55 mandrillone joined
mandrillone Thing is that string comparison is a very frequent operation, so you expect some form of caching. The base algorithm would just store the string as-is and then normalize on the fly on each simple comparison. By starting with this terrible algorithm, one can improve it a lot step by step 23:56
focusing on meaningful optimizations
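A toy Raku sketch of that baseline plus caching (a hypothetical `RawStr` class, not anything in Rakudo; `Uni` buffers stand in for the raw, unnormalized string):

```raku
# Hypothetical RawStr: stores codepoints as received, normalizes only
# when a comparison first needs it, and caches the normalized form.
class RawStr {
    has Uni $.raw is required;   # stored exactly as received
    has $!nfc;                   # cache for the normalized form

    method nfc() {
        $!nfc //= $!raw.NFC      # normalize on first use only
    }

    method same-as(RawStr $other --> Bool) {
        my $a := self.nfc;
        my $b := $other.nfc;
        return False unless $a.elems == $b.elems;
        for ^$a.elems -> $i {
            return False unless $a[$i] == $b[$i];
        }
        True
    }
}

my $a = RawStr.new(raw => "cafe\x[0301]".NFD);   # decomposed input
my $b = RawStr.new(raw => "café".NFC);
say $a.same-as($b);   # True: both cache the same NFC sequence
```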
23:56 mandrillone left