Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes. Set by lizmat on 24 May 2021.
Geth | MoarVM: MasterDuke17++ created pull request #1940: Fix two GCC compiler warnings | 00:33
lizmat | And yet another Rakudo Weekly News hits the Net: rakudoweekly.blog/2025/05/19/2025-...s-is-mini/ | 10:01
13:19 JimmyZhuo joined
13:33 JimmyZhuo left
16:09 kjp left
16:10 kjp joined
jnthn | patrickb: (prevent dispatch programs from being recorded) You probably don't want to do that, because recording is the expensive step, but you can either 1) return something that receives the callee, does the work, and calls the callee, or 2) do it using dispatch resumptions, as used for example in a `proto` with a body, which needs to run the body and then continue the multi-dispatch at the {*} | 20:58
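A minimal plain-C analogy of option 1 (a hedged sketch only: the real mechanism is an NQP dispatch program delegating to the callee, and every name here is illustrative, not MoarVM API):

    #include <stdio.h>

    /* Option 1 in miniature: rather than suppressing recording, hand
     * control to a wrapper that receives the original callee, does the
     * per-call work, then invokes the callee with the same arguments. */
    typedef int (*callee_fn)(int);

    static int work_then_call(callee_fn callee, int arg) {
        printf("doing the extra work before the call\n");
        return callee(arg);
    }

    static int target(int x) { return x * 2; }

    int main(void) {
        return work_then_call(target, 21) == 42 ? 0 : 1;
    }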
.tell MasterDuke I somewhat suspect that, just as SIMD instructions have been used for various kinds of fast parsing and validation, one could write something that makes good use of them to at least blast through UTF-8 content and verify that it is already in normalized form (at least "all in the ASCII range and no \r" can tell you that to a limited degree). A multi-pass algorithm has poor memory locality, so I'm not surprised that fast | 21:08
tellable6 | jnthn, I'll pass your message to MasterDuke
jnthn | UTF-8 decoding followed by normalization wasn't faster. Trouble is, what if you chew through a meg and only then spot a problem? But I guess you could make a strand for the OK part and then continue with the full path. | 21:09
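A sketch in plain C of the limited check jnthn mentions: a single pass verifying the buffer is all-ASCII with no \r, in which case the content is already in normalized form. The bitwise reduction auto-vectorizes well; the function name and shape are illustrative, not MoarVM code:

    #include <stddef.h>
    #include <stdint.h>

    /* Returns 1 when buf is pure ASCII and contains no \r; such content
     * needs no normalization work. One pass, branch-light loop body that
     * compilers can auto-vectorize (or that SIMD can blast through). */
    static int ascii_no_cr(const uint8_t *buf, size_t len) {
        uint8_t acc = 0;
        int saw_cr = 0;
        for (size_t i = 0; i < len; i++) {
            acc |= buf[i];               /* any high bit => non-ASCII   */
            saw_cr |= (buf[i] == '\r');  /* \r may join a \r\n grapheme */
        }
        return (acc & 0x80) == 0 && !saw_cr;
    }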
21:36 mandrillone joined
mandrillone | What if Unicode normalization were done only when it is needed, that is, when comparing strings? | 21:37
It often happens that one compares a single string with many others. In that case you can normalize that single string, and then the algorithm for everything else is very simple (bar some unrealistic corner cases) | 21:38
It is rare to sort long lists of strings | 21:39
for other uses, such as counting graphemes, you don't really need full-fledged normalization | 21:41
also it is something you rarely do IMHO
If you go with "Unicode normalization only when needed", then you need to make string comparison non-commutative from the performance perspective. A convention can be chosen for which side is the pivot | 21:43
or some heuristic implemented to detect that
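A rough C sketch of the "normalize only on comparison" idea, assuming utf8proc as the normalizer (MoarVM has its own); LazyStr and the function names are hypothetical, and error handling is elided:

    #include <string.h>
    #include <utf8proc.h>  /* assumed normalizer, standing in for MoarVM's own */

    /* Store the string exactly as received; compute and cache its NFC
     * form the first time a comparison actually needs it. */
    typedef struct {
        char *raw;  /* NUL-terminated UTF-8, as stored */
        char *nfc;  /* cached NFC form, NULL until first needed */
    } LazyStr;

    static const char *nfc_of(LazyStr *s) {
        if (!s->nfc)
            s->nfc = (char *)utf8proc_NFC((const utf8proc_uint8_t *)s->raw);
        return s->nfc;
    }

    static int lazy_str_eq(LazyStr *a, LazyStr *b) {
        /* Fast path: byte-identical inputs are equal under any
         * normalization form, so neither side gets normalized. */
        if (strcmp(a->raw, b->raw) == 0)
            return 1;
        return strcmp(nfc_of(a), nfc_of(b)) == 0;
    }

The cached nfc field plays the role of the pivot convention above: a string compared many times pays for normalization once.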
21:43 mandrillone left
lizmat | mandrillone: how would that handle .chars ? | 21:53
tellable6 | lizmat, I'll pass your message to mandrillone
21:55 rakkable left, rakkable joined
jnthn | By making it O(n) instead of O(1), I guess. And I dunno how one is meant to apply normalization to only one of the strings involved in a comparison and expect a sensible result, nor how to know which of the strings is the more-compared one locally. | 22:09
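For context on that O(n): without eager normalization, counting graphemes means walking every codepoint. A sketch using utf8proc's iteration and grapheme-break API (the function name is illustrative; MoarVM's NFG records the grapheme count up front, which is what keeps .chars O(1)):

    #include <utf8proc.h>

    /* O(n) grapheme count over raw UTF-8, roughly what .chars would
     * have to do on every call if strings were stored un-normalized.
     * Uses the stateless break check; the stateful variant handles the
     * full extended grapheme cluster rules (e.g. emoji ZWJ sequences). */
    static long grapheme_count(const utf8proc_uint8_t *buf,
                               utf8proc_ssize_t len) {
        utf8proc_int32_t prev = -1, cp;
        utf8proc_ssize_t i = 0;
        long count = 0;
        while (i < len) {
            utf8proc_ssize_t n = utf8proc_iterate(buf + i, len - i, &cp);
            if (n <= 0)
                break;  /* invalid sequence; error handling elided */
            if (prev < 0 || utf8proc_grapheme_break(prev, cp))
                count++;  /* a break before cp starts a new grapheme */
            prev = cp;
            i += n;
        }
        return count;
    }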
23:46 mandrillone joined
mandrillone | Indeed, O(n) instead of O(1). Possibly with caching of the result | 23:48
tellable6 | 2025-05-19T21:53:13Z #moarvm <lizmat> mandrillone: how would that handle .chars ?
mandrillone | If normalization is applied lazily, you expect a small number of synthetics. The second string doesn't need normalization, since any grapheme that doesn't belong to the first string can just be handled on the fly | 23:51
23:54 mandrillone left
23:55 mandrillone joined
mandrillone | Thing is, string comparison is such a frequent operation that you'd expect some form of caching. The base algorithm would just store the string as-is and then normalize on the fly on each comparison. By starting with this terrible algorithm, one can improve it a lot step by step | 23:56
focusing on meaningful optimizations
23:56 mandrillone left