#moarvm on 8 April 2026 - Raku Programming Language Log

Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes. Set by lizmat on 24 May 2021.
ShimmerFairy	I'm not actually super familiar with how MoarVM stores its Unicode properties, but if I ever get around to modernizing that ucd perl script I certainly will be.	06:36
08:03 vrurg_ joined, vrurg left 11:40 librasteve_ left
[Coke]	ShimmerFairy: any thoughts on making the unicode_1_name property available? I may take a stab at that so I can try to close a module issue request I gave myself. :)	13:04
ShimmerFairy	So long as you don't expose it as if it were just another kind of character name (that is, .uniprop("Unicode_1_Name") but not "\c[OLD NAME]"), I think that'd be OK. I need to refresh my memory, but I'm confident that 1.0 names are not part of the character name namespace, so integrating it into that namespace in places like \c[...] needs to be thought through.	13:11
[Coke]	I just want it to be available in .uniprop, honestly	13:18
ShimmerFairy	Yeah, that's fine, and in general all Unicode properties ought to be available anyway (well, except for the provisional ones, but none exist in the "base" UCD properties, only in Unihan and similar areas)	13:27
14:22 woodi_ left, woodi joined 14:23 librasteve_ joined 14:31 kjp left 14:43 kjp joined
timo	I see that we are storing "EGYPTIAN HIEROGLYPH-13460" /* 13460 */ through "EGYPTIAN HIEROGLYPH-143FA" /* 143FA */ with their full name; we do have a mechanism to handle codepoints where the number is part of the name, so that could be added to that	15:37
lizmat	does that also apply to:	15:39
	m: say 0xEFFFD.chr.uniname
evalable6	<reserved-EFFFD>
lizmat	?
	or is there logic to generate that uniname?	15:40
timo	same for "KHITAN SMALL SCRIPT CHARACTER-18B00" /* 18B00 */ through "KHITAN SMALL SCRIPT CHARACTER-18CD5" /* 18CD5 */ but that's a much smaller block
	CONTROL, RESERVED, SURROGATE, PRIVATE-USE all use this	15:41
	also CJK UNIFIED IDEOGRAPH-, CJK COMPATIBILITY IDEOGRAPH-, and TANGUT IDEOGRAPH-	15:42
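The names timo lists share one shape: a fixed prefix followed by the code point in uppercase hexadecimal, so they can be generated on demand rather than stored. A minimal C sketch of that idea follows. The helper `derived_name` is hypothetical (it is not a MoarVM function), and it assumes at least four hex digits, as in the names quoted above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of MoarVM.
 * Writes "<prefix><code point in uppercase hex, at least 4 digits>" into buf.
 * Returns the length of the name, or -1 if buf is too small. */
static int derived_name(const char *prefix, unsigned long cp,
                        char *buf, size_t cap) {
    assert(prefix != NULL && buf != NULL);
    int n = snprintf(buf, cap, "%s%04lX", prefix, cp);
    /* snprintf reports the length it wanted; treat truncation as failure. */
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

Called with the prefix "EGYPTIAN HIEROGLYPH-" and code point 0x13460, this writes EGYPTIAN HIEROGLYPH-13460 into the buffer. The bracketed form seen in `<reserved-EFFFD>` would additionally need a closing `>`, which this sketch does not handle.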
[Coke]	I opened github.com/rakudo/rakudo/issues/6108	15:45
timo	gist.github.com/timo/5b4c22ffeb8a5...7da5b03ec1 words in unicode character names by frequency with examples of each	15:57
	so i think unicode names have just the letters A through Z, the digits 0 through 9, the dash - and a single space	16:08
	leaving me with theoretically 226 byte values to encode a longer word, or choosing a few of these values as a prefix to a second byte giving 225 + 255, or 224 + 2 * 255, or 225 - x + x * 255 for x < 225?	16:11
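The arithmetic in the message above can be written as one expression: if some number of byte values are free and x of them are set aside as prefixes for a second byte with 255 possible values, the remaining free values each encode one word and each prefix encodes 255 more. A small C sketch, taking the 226 figure from the message as given; the function name `word_capacity` is made up for illustration.

```c
#include <assert.h>

/* Hypothetical helper: number of distinct words that can be encoded when
 * `free_values` byte values are unused by literal name characters and `x`
 * of them are reserved as prefixes, each followed by one of 255 second bytes. */
static unsigned word_capacity(unsigned free_values, unsigned x) {
    assert(x <= free_values);
    return (free_values - x) + x * 255u;
}
```

With 226 free values, x = 1 gives 225 + 255 = 480 words and x = 2 gives 224 + 2 * 255 = 734, matching the first two figures in the message.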
	hm, but when we create the hash for looking up characters by name we put const char * in the entries for the actual names	16:16
	so then in the cases where the hash is needed we would end up with 1x the "compressed" storage and then extra space for the expanded versions of these characters?	16:17
	OTOH the "compression" scheme is round-trippable; we could compress a string before looking it up in the hash then we'd just store the compressed name in there and can be using const char* into static memory still	16:19
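To make the round-trip idea concrete, here is a minimal C sketch of a word-substitution scheme of the kind timo describes. Everything in it is an assumption for illustration: the four-word dictionary, the choice of byte values from 0x80 upward, and the names `DICT`, `name_compress` and `name_expand` do not come from MoarVM. It relies on the observation above that names use only A-Z, 0-9, dash and space, so bytes from 0x80 upward never occur literally.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical dictionary: word i is replaced by the single byte 0x80 + i. */
static const char *const DICT[] = { "LETTER", "SMALL", "CAPITAL", "LATIN" };
#define DICT_LEN (sizeof DICT / sizeof DICT[0])

/* Compress a space-separated name into out (capacity cap, NUL-terminated).
 * Returns the compressed length, or -1 if out is too small. */
static int name_compress(const char *name, unsigned char *out, size_t cap) {
    size_t o = 0;
    assert(name != NULL && out != NULL);
    if (cap == 0) return -1;
    while (*name) {
        size_t wlen = strcspn(name, " ");   /* length of the current word */
        size_t i;
        for (i = 0; i < DICT_LEN; i++)
            if (strlen(DICT[i]) == wlen && memcmp(DICT[i], name, wlen) == 0)
                break;
        if (i < DICT_LEN) {                 /* dictionary word: one byte */
            if (o + 1 >= cap) return -1;
            out[o++] = (unsigned char)(0x80 + i);
        } else {                            /* unknown word: copy literally */
            if (o + wlen >= cap) return -1;
            memcpy(out + o, name, wlen);
            o += wlen;
        }
        name += wlen;
        if (*name == ' ') {                 /* keep the separator as-is */
            if (o + 1 >= cap) return -1;
            out[o++] = ' ';
            name++;
        }
    }
    out[o] = '\0';
    return (int)o;
}

/* Expand a compressed name back to its full form.
 * Returns the expanded length, or -1 if out is too small. */
static int name_expand(const unsigned char *in, char *out, size_t cap) {
    size_t o = 0;
    assert(in != NULL && out != NULL);
    if (cap == 0) return -1;
    for (; *in; in++) {
        if (*in >= 0x80 && (size_t)(*in - 0x80) < DICT_LEN) {
            const char *w = DICT[*in - 0x80];
            size_t wlen = strlen(w);
            if (o + wlen >= cap) return -1;
            memcpy(out + o, w, wlen);
            o += wlen;
        } else {
            if (o + 1 >= cap) return -1;
            out[o++] = (char)*in;
        }
    }
    out[o] = '\0';
    return (int)o;
}
```

Because compression is deterministic, a lookup could compress the query string once and compare it against compressed names held in static memory, so the hash entries would never need the expanded form. Whether that trade-off pays off in MoarVM is the open question in the discussion.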
ShimmerFairy	Looking at the standard, EGYPTIAN HIEROGLYPH is one of the kinds of "derived" names listed, so the fact that MoarVM doesn't encode them in the same way as the others comes down to me not knowing I should think of it when I was doing the 17.0 upgrade.	16:21
timo	the hieroglyphs are kind of two sections, one where the code point number is in it, one where the mapping isn't 1:1 because there's like multiple variants of one in a row before the next number	16:22
ShimmerFairy	Oh that's weird, UnicodeData.txt didn't "compress" the Egyptian hieroglyphs into ranges like the CJK ideographs are, which explains why this new kind of derived name slipped through unnoticed.	16:23
timo	m: .uniname.say for 0x13000..0x13010
evalable6	EGYPTIAN HIEROGLYPH A001 EGYPTIAN HIEROGLYPH A0…
	timo, Full output: gist.github.com/b534312bdf6eb40c21...6cc773e732
timo	m: .uniname.words.tail.say for 0x13000..0x13010
evalable6	A001 A002 A003 A004 A005 A005A A006 A006A A006B A007 A008 A009 A010 A011 A012 A013 A014
timo	... hard to see but there's 006, 006A, 006B there	16:24
ShimmerFairy	Actually, it looks like a few of the derived name ranges are spelled out.
	Oh huh, looks like Unicode's wording is in need of updating. Rule NR2 suggests that only characters with Ideographic=True are possibly affected by its rule, but the Egyptian hieroglyphs mentioned in the relevant table don't have that property.	16:29
timo	sounds like you can post a bug report to the unicode consortium! :D	16:30
ShimmerFairy	I wonder, would it be worth it for ucd2c.pl to handle derived names that UnicodeData spells out? Currently it only does special stuff for the ranges that aren't already spelled out, and that just gives them all a dummy name that tells moarvm to generate the real name at runtime, I think.	16:35
	timo: btw, was it mimalloc that was causing fetch issues a little while ago? Updating moar I just got the message (on a second 'git pull') of "fatal: couldn't find remote ref refs/heads/master", and turns out the mimalloc main branch is, er, "main".	16:40
timo	right, we have something called "extents", MVM_NUM_UNICODE_EXTENTS counts them, and they do something in generate_codepoints_by_name
	uh yeah could have been mimalloc	16:41
	i wonder why it was referencing "master" in your case; is that something that requires "git submodule sync" to fix or just update?	16:42
ShimmerFairy	On the second pull it fell back to grabbing the commit directly. I just ran "git submodule sync" but I don't know how to check if it solved the problem, since my copy of moar is now up-to-date.	16:43
timo	git submodules still serve to stump, it seems	16:51
ShimmerFairy	I just checked, and I've had this copy of the repo since at least mid-2016, so I'm willing to bet that something was just outdated, and that perhaps that sync fixed things for the future.	16:53
18:25 vrurg_ left 18:37 vrurg joined 18:40 vrurg left 18:57 vrurg joined
