01:59 librasteve_ left
disbot6 <jubilatious1_98524> @shimmerfairy Is this 2019 Github issue relevant? It mostly concentrates on samemark but also commentary on [ZWJ] and emojis. AlexDaniel and samcv commenting: github.com/Raku/problem-solving/issues/61 05:40
<jubilatious1_98524> m: say "\c[Canada]".samemark("é");
<Raku eval> 🇨́
<jubilatious1_98524> Gee, samemark seems so powerful, could be really useful for a boatload of problems! 05:44
<jubilatious1_98524> Apologies... looks like that issue has been resolved... must do more research first. 05:46
<shimmerfairy> It's not the same exact issue (still works funky on my local copy with the grapheme fixes), but it is related because samemark has to mess around inside a grapheme cluster. Wouldn't be surprised if samemark is in desperate need of some more careful consideration, it's functionality that was clearly designed for diacritics on Latin scripts and not much else. 05:47
<jubilatious1_98524> Thanks. Glad you are on the hunt for solutions! 05:51
<jubilatious1_98524> @samcv wrote in 2019: "What I think is the most reasonable solution is to treat Grapheme_Cluster_Break=Extend and Grapheme_Cluster_Break=Prepend as “mark”’s." Do you agree?
<jubilatious1_98524> (...might be pulling you off-topic here. Feel free to circle back if it strikes your fancy). 05:54
<jubilatious1_98524> From the same 2019 samcv post: [Base Emoji]([ZWJ][Emoji])* 05:56
<shimmerfairy> That would've made more sense back then I think, but nowadays there are additional rules to the grapheme boundaries, and I suspect that's not enough anymore. The rules used to be a lot simpler, and at least at the MoarVM level that resulted in assumptions and design decisions that don't hold anymore. For samemark, I think we really need to take the time to consider what a "mark" is, in relation to all graphemes. 06:03
Something I can consider after getting Unicode 17 in rakudo users' hands.
<shimmerfairy> (And from prior experience with trying to write it in a C++ project, case-insensitive stuff probably needs a once-over too, not just mark-insensitive stuff.)
<jubilatious1_98524> Interesting: Unicode/Grapheme rules are advancing! I'm assuming you might also want to look at ignoremark at some future date (seems related). Here's an issue opened by @alabamenhu : github.com/Raku/problem-solving/issues/276 06:16
<shimmerfairy> Giving that a quick glance, I think as we consider features which "get inside" grapheme clusters, we might inevitably run into something Raku has been ignoring for decades, but which has always been a part of Unicode: grapheme cluster rules are meant to be tailorable, and it shouldn't be surprising if operations on graphemes are impossible to make universal without allowing for tailoring somewhere. Something to keep in the back of our minds for now, at least. 06:23
<shimmerfairy> Also, when you see anybody talk about Indic scripts in these old discussions, keep in mind that those concerns are largely (if not entirely) obsolete now that Unicode added the InCB rules to the default grapheme rules in Unicode 15.1. 06:24
<jubilatious1_98524> m: "🏄‍♀️".say 06:33
<Raku eval> 🏄‍♀️
<jubilatious1_98524> m: "🏄‍♀️".chars.say 06:34
<Raku eval> 1
<jubilatious1_98524> I haven't seen discussions on Indic scripts, but I will certainly keep your pointer in mind. 06:37
<jubilatious1_98524> I've been trying to find a discussion I had with @alabamenhu about grapheme indexing (emojis, flags). It concerned denoting a [Base] character as index=0 and going negative to the left (-3,-2,-1) when prepended, and going positive (1,2,3) to the right when appended. Simple number line. 06:43
<jubilatious1_98524> Anyway, I can't find it! Must be on Github somewhere as I've searched the mailing-list.
<jubilatious1_98524> Here's something I did find: Brian Wisti's emoji look-up tool: randomgeekery.org/post/2022/08/emo...with-raku/
<shimmerfairy> The last discussions page you linked discusses them, which is why I brought that point up. 06:53
<jubilatious1_98524> Indic scripts? Must have overlooked. Thanks! 06:56
<jubilatious1_98524> Here's a little about indexing graphemes according to the "number line" scheme above: github.com/Raku/problem-solving/is...-821946770 07:05
<shimmerfairy> My first instinct is that, if you want to operate on strings at the codepoint level (that is, you can't abstract your operations away to the grapheme level), then you should be using codepoint strings like NFKD or Uni instead. But that's not a reasonable suggestion so long as Raku's support for those kinds of strings is so lacking. (Something that's been on my mind for years.) 07:12
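[Editor's note] The grapheme-level versus codepoint-level distinction discussed above can be illustrated with a minimal sketch. It assumes the emoji from the earlier eval is the standard woman-surfing ZWJ sequence (U+1F3C4, U+200D, U+2640, U+FE0F); the variable names are illustrative only and do not come from the log.

```raku
# Assumed codepoints for the surfer emoji shown earlier in the log:
# U+1F3C4 SURFER, U+200D ZERO WIDTH JOINER,
# U+2640 FEMALE SIGN, U+FE0F VARIATION SELECTOR-16
my $s = "\x[1F3C4]\x[200D]\x[2640]\x[FE0F]";

# Str works at the grapheme level: .chars counts grapheme clusters.
say $s.chars;

# .codes counts the codepoints that make up the string.
say $s.codes;

# Str.NFKD returns a codepoint-level (Uni-based) object;
# list its codepoints in U+XXXX form.
my $nfkd = $s.NFKD;
say $nfkd.list.map(*.fmt('U+%04X')).join(' ');

# One Unicode character name per codepoint.
say $s.uninames.join(', ');
```

This only inspects a string at the codepoint level. It does not address the point made above that richer operations on Uni-style strings are still poorly supported.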
10:10 librasteve_ joined
ShimmerFairy As of right now, I've got roast tests and MoarVM changes ready to go, I'm just curious if people want the MoarVM bits through a pull request again, or perhaps as a branch on the main repo (just to let people have a look before it becomes part of main). 12:53
Once the MoarVM changes land in main, I'll feel OK with putting the updated tests out, and then I can also bump MOAR_REVISION on nqp and push the necessary changes for Rakudo in turn. 12:54
btw, I just checked, and I don't think I have push access to MoarVM or rakudo at the moment. (I am in the Raku organization, so stuff there should be accessible.) 13:09
patrickb I'd say let's go for a PR. 13:23
Geth MoarVM: ShimmerFairy++ created pull request #1975: Update to Unicode 17 13:32
patrickb 🎉 13:34
ShimmerFairy Since I don't like the idea of people possibly waiting on me to share the rest of my changes while I'm sleeping, what I'm going to do is make a pull request for my Rakudo changes, and push my roast changes to a branch, so that things can be merged in at any time. The only things I can't prepare ahead of time are the NQP and Moar revision bumps, for obvious reasons. 16:00
I don't think I need to open up a pull request for the roast branch, but that's everything I think. There's stuff I came across that I'd like to improve and touch up, and at the very least I want to write up an upgrade guide in case anybody finds themself in my place in the future. But this is enough to get Unicode 17 in the hands of rakudo users, assuming no objections. 16:16
lizmat ShimmerFairy++ 17:08
hope to be merging this in the coming days 17:09
[Coke] Yah, let's try to get this in by the end of the weekend, if we can. 17:36
then I can kick off a blin run asap
19:56 librasteve_ left
23:43 rakkable left, rakkable joined