#moarvm on 30 November 2025 - Raku Programming Language Log

Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes. Set by lizmat on 24 May 2021.
00:00 vrurg joined
01:35 librasteve_ left
ShimmerFairy	m: say "\x[1193F]A".chars # should be "1", since 2016	03:00
camelia	2
ShimmerFairy	It's suddenly become apparent to me that, if I manage the Unicode upgrade well enough, all of Moar's Unicode support ought to get a once-over.	03:01
	It's kinda fun to work through the grapheme code and find situations where it doesn't work right. For instance, "A\c[ZWJ]🧀" is misinterpreted as a single grapheme, because the implementation of rule GB11 is too broad.	06:58
lizmat	++ShimmerFairy yet again :-)	08:58
ShimmerFairy	I've decided that unfortunately the grapheme breaker function needs to be completely rewritten. It was written for a world where you only needed one codepoint behind and ahead of the possible break point, but nowadays we have a number of rules that depend on more context. The current function only just manages RI grapheme state, but bolting additional stateful checks on would be awkward.	09:04
	Perhaps the people who worked on the original function could work it in, but to me at least the design doesn't fit the current ruleset anymore.	09:05
lizmat	hmmm... I hope that's not going to be too detrimental to decoding efficiency	09:08
ShimmerFairy	I think it should be fine, since the state machine approach I'm trying to write up right now would let you skip rule checks that can't possibly be true. For a first pass the "one ahead/behind only" rules are mostly handled in a single state, but I think it could be broken down further to skip more checks on each run.	09:12
lizmat	that sounds good: more power to ya!	09:24
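To make the state machine idea above concrete, here is a minimal Python sketch. It is not MoarVM's code and not ShimmerFairy's design; it only illustrates how carrying a small state lets GB11 (ExtPict Extend* ZWJ × ExtPict) and the regional indicator pairing rules (GB12/GB13) be checked without re-scanning backwards. The names `classify`, `step`, and `graphemes` are hypothetical, and the property table is a tiny hand-written stand-in for the real Unicode data.

```python
from enum import Enum, auto
from typing import List, Tuple


class Prop(Enum):
    OTHER = auto()
    EXTEND = auto()
    ZWJ = auto()
    EXT_PICT = auto()
    RI = auto()


# Assumption: a tiny hand-picked table, NOT the real Unicode property data.
_EXTEND = {0x0301, 0xFE0F}                    # combining acute, VS16
_EXT_PICT = {0x1F9C0, 0x1F468, 0x1F469}       # cheese wedge, man, woman


def classify(ch: str) -> Prop:
    cp = ord(ch)
    if cp == 0x200D:
        return Prop.ZWJ
    if cp in _EXTEND:
        return Prop.EXTEND
    if cp in _EXT_PICT:
        return Prop.EXT_PICT
    if 0x1F1E6 <= cp <= 0x1F1FF:              # regional indicator symbols
        return Prop.RI
    return Prop.OTHER


class State(Enum):
    PLAIN = auto()      # no multi-codepoint context pending
    PICT = auto()       # seen: ExtPict Extend*
    PICT_ZWJ = auto()   # seen: ExtPict Extend* ZWJ
    RI_ODD = auto()     # seen an odd number of adjacent RIs


def step(state: State, prop: Prop) -> Tuple[bool, State]:
    """Return (break_before_this_codepoint, next_state)."""
    if prop is Prop.EXTEND:                   # GB9: never break before Extend
        return False, State.PICT if state is State.PICT else State.PLAIN
    if prop is Prop.ZWJ:                      # GB9: never break before ZWJ
        return False, State.PICT_ZWJ if state is State.PICT else State.PLAIN
    if prop is Prop.EXT_PICT:                 # GB11 needs the full prefix
        return state is not State.PICT_ZWJ, State.PICT
    if prop is Prop.RI:                       # GB12/GB13: pair up RIs
        if state is State.RI_ODD:
            return False, State.PLAIN
        return True, State.RI_ODD
    return True, State.PLAIN                  # GB999: otherwise break


def graphemes(text: str) -> List[str]:
    clusters: List[str] = []
    state = State.PLAIN
    for i, ch in enumerate(text):
        breaks, state = step(state, classify(ch))
        if i == 0 or breaks:
            clusters.append(ch)
        else:
            clusters[-1] += ch
    return clusters
```

With this sketch, `graphemes("A\u200D\U0001F9C0")` yields two clusters because the ZWJ was not preceded by an ExtPict, while man + ZWJ + woman stays a single cluster. An implementation that only looks at "ZWJ followed by ExtPict" would join both, which is the too-broad GB11 behaviour described above.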
ShimmerFairy	Out of curiosity, is there an established way of profiling NFG string handling? I figured keeping track of how long 'make stresstest' takes would be informative, but if there's a better method I'll use it instead.	10:28
10:59 librasteve_ joined
lizmat	timo might know	11:30
disbot6	<jubilatious1_98524> m: say "\x[1193F]".chars;	17:24
	<Raku eval> 1
timo	it seemed to me like we already had something that can do more than one ahead and behind with some state kept, especially for the regional indicators handling that wants multiple-of-two codes	17:30
disbot6	<jubilatious1_98524> m: say "\x[1193F]";	17:33
	<Raku eval> 𑤿
timo	the "does a string need re-checking after concat" check may be more interesting?	17:34
disbot6	<jubilatious1_98524> I don't know if \x[1193F] is a free-standing character or not.	17:36
timo	trying to get something from unicode.org and it's taking ... a minute?	17:43
	looks like 1193F is InCB=None and Grapheme_Extend is No, but Grapheme_Cluster_Break is Prepend	17:47
	so with it being a Prepend that means we should never break after it (except of course at end-of-text)	17:48
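The Prepend behaviour timo describes corresponds to rule GB9b ("Prepend ×") in UAX #29. As a rough Python illustration (again not MoarVM code), the sketch below uses a hypothetical two-entry property table that takes the Prepend value for U+1193F from the lookup quoted above. It shows why "\x[1193F]A" is expected to count as one grapheme. Control characters, which take precedence over GB9b, are deliberately left out.

```python
# Assumption: minimal stand-in tables, not the real Unicode property data.
PREPEND = {0x1193F}   # Grapheme_Cluster_Break=Prepend, per the lookup in the log
EXTEND = {0x0301}     # combining acute accent, used only for contrast


def breaks_between(prev: str, cur: str) -> bool:
    """Decide whether a grapheme boundary falls between two adjacent codepoints."""
    if ord(cur) in EXTEND:
        return False          # GB9: do not break before Extend
    if ord(prev) in PREPEND:
        return False          # GB9b: do not break after Prepend
    return True               # GB999: otherwise, break


def count_graphemes(text: str) -> int:
    if not text:
        return 0
    # One cluster to start with, plus one per boundary found.
    return 1 + sum(breaks_between(a, b) for a, b in zip(text, text[1:]))
```

Here `count_graphemes("\U0001193FA")` gives 1, and a lone U+1193F also gives 1, since the end of text always closes the cluster.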
disbot6	<jubilatious1_98524> Amazing!	17:51
	<jubilatious1_98524> m: say Unicode.version;	17:56
	<Raku eval> v15.0
	<jubilatious1_98524> m: say "A\c[ZWJ]🧀".chars	18:06
	<Raku eval> 1
timo	you think that's not right?	18:16
19:08 patrickb left, patrickb joined
19:25 vrurg_ joined, linkable6 left, notable6 left, linkable6 joined, sugarbeet left
19:26 sugarbeet joined, bloatable6 left, benchable6 left, tellable6 left
19:27 vrurg left
19:29 notable6 joined
