3 Dec 2025
disbot6 <jubilatious1_98524> From the same 2019 samcv post: [Base Emoji]([ZWJ][Emoji])* 05:56
<shimmerfairy> That would've made more sense back then I think, but nowadays there are additional rules to the grapheme boundaries, and I suspect that's not enough anymore. The rules used to be a lot simpler, and at least at the MoarVM level that resulted in assumptions and design decisions that don't hold anymore. For samemark, I think we really need to take the time to consider what a "mark" is, in relation to all graphemes. 06:03
Something I can consider after getting Unicode 17 in rakudo users' hands.
<shimmerfairy> (And from prior experience with trying to write it in a C++ project, case-insensitive stuff probably needs a once-over too, not just mark-insensitive stuff.)
<jubilatious1_98524> Interesting: Unicode/Grapheme rules are advancing! I'm assuming you might also want to look at ignoremark at some future date (seems related). Here's an issue opened by @alabamenhu : github.com/Raku/problem-solving/issues/276 06:16
<shimmerfairy> Giving that a quick glance, I think as we consider features which "get inside" grapheme clusters, we might inevitably run into something Raku has been ignoring for decades, but which has always been a part of Unicode: grapheme cluster rules are meant to be tailorable, and it shouldn't be surprising if operations on graphemes are impossible to make universal without allowing for tailoring somewhere. Something to keep in the back of our minds for now, at least. 06:23
<shimmerfairy> Also, when you see anybody talk about Indic scripts in these old discussions, keep in mind that those concerns are largely (if not entirely) obsolete now that Unicode added the InCB rules to the default grapheme rules in Unicode 15.1. 06:24
<jubilatious1_98524> m: "🏄‍♀️".say 06:33
<Raku eval> 🏄‍♀️
<jubilatious1_98524> m: "🏄‍♀️".chars.say 06:34
<Raku eval> 1
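For readers following along: the eval above reports one grapheme, yet the string is built from several code points. The sketch below is Python rather than Raku, purely to illustrate the code-point view. It assumes the emoji is the fully-qualified sequence U+1F3C4 U+200D U+2640 U+FE0F, and `describe` is a hypothetical helper name, not something from the discussion.

```python
import unicodedata

# Assumption: the surfer emoji from the eval, spelled out as escapes
# (base emoji, ZWJ, female sign, variation selector-16).
woman_surfing = "\U0001F3C4\u200D\u2640\uFE0F"

def describe(text):
    """Return (hex code point, Unicode name) pairs, one per code point."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]

for code, name in describe(woman_surfing):
    print(code, name)

# Python's len() counts code points, not graphemes, so this is not 1.
print(len(woman_surfing))
```

Python's `len` counts code points here, which is why it disagrees with the grapheme count that `.chars` gives in the eval.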
<jubilatious1_98524> I haven't seen discussions on Indic scripts, but I will certainly keep your pointer in mind. 06:37
<jubilatious1_98524> I've been trying to find a discussion I had with @alabamenhu about grapheme indexing (emojis, flags). It concerned denoting a [Base] character as index=0 and going negative to the left (-3,-2,-1) when prepended, and going positive (1,2,3) to the right when appended. Simple number line. 06:43
<jubilatious1_98524> Anyway, I can't find it! Must be on Github somewhere as I've searched the mailing-list.
<jubilatious1_98524> Here's something I did find: Brian Wisti's emoji look-up tool: randomgeekery.org/post/2022/08/emo...with-raku/
<shimmerfairy> The last discussions page you linked discusses them, which is why I brought that point up. 06:53
<jubilatious1_98524> Indic scripts? Must have overlooked. Thanks! 06:56
<jubilatious1_98524> Here's a little about indexing graphemes according to the "number line" scheme above: github.com/Raku/problem-solving/is...-821946770 07:05
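A minimal sketch of the "number line" idea as described in this log (base at index 0, negative offsets for what precedes it, positive for what follows). It is Python rather than Raku and is not taken from the linked issue: `number_line_index` and `base_pos` are hypothetical names, and choosing which element counts as the base is left to the caller.

```python
def number_line_index(codepoints, base_pos):
    """Map each element to its signed offset from the base (base gets 0)."""
    if not 0 <= base_pos < len(codepoints):
        raise ValueError("base_pos must point at an element of codepoints")
    return {i - base_pos: cp for i, cp in enumerate(codepoints)}

# Assumed example: base emoji followed by ZWJ, female sign, VS-16.
cluster = [0x1F3C4, 0x200D, 0x2640, 0xFE0F]
appended_only = number_line_index(cluster, base_pos=0)

# Placeholder labels standing in for prepended and appended code points.
mixed = number_line_index(["pre2", "pre1", "BASE", "post1"], base_pos=2)

print(sorted(appended_only))  # offsets run 0..3
print(sorted(mixed))          # offsets run -2..1
```

The scheme only gives positions relative to a base. Deciding what the base of a cluster is runs into the tailoring concerns raised earlier.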
<shimmerfairy> My first instinct is that, if you want to operate on strings at the codepoint level (that is, you can't abstract your operations away to the grapheme level), then you should be using codepoint strings like NFKD or Uni instead. But that's not a reasonable suggestion so long as Raku's support for those kinds of strings is so lacking. (Something that's been on my mind for years.) 07:12
ShimmerFairy As of right now, I've got roast tests and MoarVM changes ready to go, I'm just curious if people want the MoarVM bits through a pull request again, or perhaps as a branch on the main repo (just to let people have a look before it becomes part of main). 12:53
Once the MoarVM changes land in main, I'll feel OK with putting the updated tests out, and then I can also bump MOAR_REVISION on nqp and push the necessary changes for Rakudo in turn. 12:54
btw, I just checked, and I don't think I have push access to MoarVM or rakudo at the moment. (I am in the Raku organization, so stuff there should be accessible.) 13:09
patrickb I'd say let's go for a PR. 13:23
Geth MoarVM: ShimmerFairy++ created pull request #1975: Update to Unicode 17 13:32
patrickb 🎉 13:34
ShimmerFairy Since I don't like the idea of people possibly waiting on me to share the rest of my changes while I'm sleeping, what I'm going to do is make a pull request for my Rakudo changes, and push my roast changes to a branch, so that things can be merged in at any time. The only things I can't prepare ahead of time are the NQP and Moar revision bumps, for obvious reasons. 16:00
I don't think I need to open up a pull request for the roast branch, but that's everything I think. There's stuff I came across that I'd like to improve and touch up, and at the very least I want to write up an upgrade guide in case anybody finds themself in my place in the future. But this is enough to get Unicode 17 in the hands of rakudo users, assuming no objections. 16:16
lizmat ShimmerFairy++ 17:08
hope to be merging this the coming days 17:09
[Coke] Yah, let's try to get this in by the end of the weekend, if we can. 17:36
then I can kick off a blin run asap
4 Dec 2025
Geth MoarVM/main: 8 commits pushed by Faye++, (Elizabeth Mattijsen)++ 16:40
lizmat looks like Stage parse dropped about .5 second for me 16:46
[Coke] nice. 17:03
timo that's a surprise 17:13
ShimmerFairy If I had to guess why the parse stage dropped (and it's a total guess), maybe calling the boundary-finding function once per grapheme, rather than once per string position, reduced some function call overhead. 17:18
timo oh, does the time taken to decode the input file actually land in the timing for stage parse? i guess that makes sense 17:19
ShimmerFairy I'd guess so, since the only thing beforehand is "stage start", which I'm guessing is just "the program's started". 17:27
timo when i run `perf record -F10000 -g rakudo -e '"gen/moar/CORE.c.setting".IO.slurp.chars.say'` i get 8.92% time spent in MVM_string_utf8_decode and children, and a total 0.131048554 seconds time elapsed 17:50
it would be quite a feat for stage parse to drop .5 seconds from just faster utf8 decoding if the performance on my machine is in any way representative of what liz's computer does? 17:52
lizmat hmmm... well... maybe I was too enthusiastic : next Stage parse was at 25.0 (before 24.2) so I guess there's a bit of noise there 17:54
timo yeah, that's always a concern