#moarvm on 21 February 2026 - Raku Programming Language Log

Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes. Set by lizmat on 24 May 2021.
06:04 kjp joined 08:35 librasteve_ joined
lizmat	do we actually have a tutorial / documentation on how encoding / decoding actually works in MoarVM	10:40
	in light of adding new encodings ?
	possibly from module land	10:43
ShimmerFairy	A quick check of the docs directory says no, and I imagine there isn't any. MoarVM has been languishing with a very paltry selection of alternate encodings for its entire existence.	11:26
	I will say I'm not a fan of adding more encodings to MoarVM at this precise moment, because it's a very hardcoded affair, and I don't think it would scale well at all. (Honestly, I wonder if we shouldn't switch to an external library that's already done the work of implementing the world's encodings.)	11:31
lizmat	but a library that does NFG ?	12:05
	fwiw... I guess I'll check the pure Raku road a bit then
ShimmerFairy	NFG is just NFC + grapheme clusters, nothing that would be difficult to ask an external Unicode library for help with. (But I don't mean to suggest that swapping to external Unicode support would be trivial.)	12:13
	Just to be clear, adding new encodings ought to be totally doable, I just get nervous about how many `case MVM_encoding_type_foo:`s there'll be if we gave MoarVM a more reasonable selection of encodings to support.	12:15
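One way to picture an alternative to a growing `switch` over `MVM_encoding_type_foo` cases is a lookup table with one entry per encoding. This is only a minimal sketch: `encoding_entry`, `decode_fn`, `find_encoding` and both decoders are hypothetical names made up for illustration, not MoarVM's actual encoding API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical decoder signature: reads `len` bytes, writes code points
 * to `out` (which must have room for `len` entries), returns the count. */
typedef size_t (*decode_fn)(const unsigned char *in, size_t len, uint32_t *out);

/* Latin-1: every byte is its own code point. */
static size_t decode_latin1(const unsigned char *in, size_t len, uint32_t *out) {
    for (size_t i = 0; i < len; i++)
        out[i] = in[i];
    return len;
}

/* ASCII: bytes above 0x7F become U+FFFD REPLACEMENT CHARACTER. */
static size_t decode_ascii(const unsigned char *in, size_t len, uint32_t *out) {
    for (size_t i = 0; i < len; i++)
        out[i] = in[i] < 0x80 ? in[i] : 0xFFFD;
    return len;
}

/* One table row per encoding; adding an encoding means adding a row,
 * not touching every switch statement. */
typedef struct {
    const char *name;
    decode_fn   decode;
} encoding_entry;

static const encoding_entry encodings[] = {
    { "iso-8859-1", decode_latin1 },
    { "ascii",      decode_ascii  },
};

/* Linear search by name; returns NULL for an unknown encoding. */
static const encoding_entry *find_encoding(const char *name) {
    assert(name != NULL);
    for (size_t i = 0; i < sizeof encodings / sizeof encodings[0]; i++)
        if (strcmp(encodings[i].name, name) == 0)
            return &encodings[i];
    return NULL;
}
```

A caller would look the entry up once and then call `entry->decode(...)`; whether this shape fits MoarVM's streaming decoders is not something the log settles.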
lizmat	understood...	12:18
	fwiw I don't think additional encodings would really require the type of speed that MoarVM supplies
ShimmerFairy	I don't know what the motivation for doing it ourselves was when MoarVM started, but I think it's worth reconsidering, since Unicode (+ the world of other encodings) is so big. ICU is the obvious option, and unlike when MoarVM started it's actually under the control of Unicode itself now.	12:25
lizmat	so, do you have an inkling of an idea how much effort it would be to use ICU in MoarVM ?	12:27
ShimmerFairy	It wouldn't solve every possible Unicode issue (IIRC anybody who wants the Unihan properties is still on their own), but it would mean less work on the MoarVM side overall. Now that I'm thinking of it, maybe I ought to experiment with swapping it in sometime, just to see.	12:28
lizmat	++ShimmerFairy	12:33
ShimmerFairy	I've only used it on occasion myself, and I'm not super familiar with MoarVM code in general, but I would first guess that it only changes how things are implemented; ideally you wouldn't see any NQP/Raku code break, though some things might behave differently.
	One example of that last point: currently in Rakudo querying properties requires some pretty precise spelling, when Unicode actually recommends a way to loosely match them. If the ICU interface to properties handles that automatically (which I think it does), then you suddenly get less fiddly /<:Whitespace>/ and the like.	12:34
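The loose matching mentioned here can be sketched in a few lines. This follows the general idea of Unicode's loose matching rule for property names and values (ignore case, whitespace, underscores, hyphens, and an initial "is"), simplified to ASCII input. `loose_key` and `loose_equal` are hypothetical helpers for illustration, not functions from ICU, MoarVM or Rakudo.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Writes the loose-matching key of `name` into `key` (capacity `cap`).
 * Returns 1 on success, 0 if the key does not fit. */
static int loose_key(const char *name, char *key, size_t cap) {
    size_t n = 0;
    assert(cap > 0);
    for (; *name; name++) {
        unsigned char c = (unsigned char)*name;
        if (isspace(c) || c == '_' || c == '-')
            continue;                      /* ignored characters */
        if (n + 1 >= cap)
            return 0;                      /* no room for char + NUL */
        key[n++] = (char)tolower(c);       /* ignore case */
    }
    key[n] = '\0';
    /* Drop an initial "is" prefix, moving the NUL along with the rest. */
    if (n >= 2 && key[0] == 'i' && key[1] == 's')
        memmove(key, key + 2, n - 1);
    return 1;
}

/* Two names match loosely when their keys are identical. */
static int loose_equal(const char *a, const char *b) {
    char ka[64], kb[64];
    return loose_key(a, ka, sizeof ka)
        && loose_key(b, kb, sizeof kb)
        && strcmp(ka, kb) == 0;
}
```

With a comparison like this, `White_Space`, `whitespace` and `Is White-Space` would all name the same property.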
lizmat	ah, and a whole set of lookup hashes in the rakudo core could go then :-)	12:40
ShimmerFairy	I don't know if Rakudo itself does anything with processing property names, but at the very least MoarVM would be freed of a lot of stuff it has to do (and sometimes hasn't been doing). Most if not all of the UCD parsing stuff could go, for instance.	12:42
nemokosch	I wonder how well it's going to integrate with Raku regexes	12:46
13:14 MasterDuke joined
MasterDuke	ShimmerFairy: did you see github.com/MoarVM/MoarVM/issues/1988 ?	13:15
ShimmerFairy	probably not, looking
	I didn't poke at any of the ucd2c.pl code that actually writes out C code, so I'm very unfamiliar with how the resulting C files actually work.	13:20
MasterDuke	i know nothing about it either, but the C looks incorrect enough that i would assume some unicode tests should fail	13:27
13:48 MasterDuke left
timo	gustedt.wordpress.com/2026/02/15/d...and-clang/ i wish we didn't have to target msvc :|	14:37
ShimmerFairy	That seems like a neat feature. At least it being in a future version of C means you're not stuck desperately waiting to be allowed to use the latest tech. Also note the future stdmchar.h header, which would give us some long-overdue functions for converting between the various UTFs, as well as the current locale's char and wchar_t.	14:47
timo	we'll probably only have to wait 20 years this time for MSVC to catch up	14:48
ShimmerFairy	I've got a long long history of banging my head against the wall waiting for gcc to implement some new C++ thing I want. (Luckily, they were remarkably quick with reflections for once, so that's nice.) Almost as bad as waiting for C and C++ to have good Unicode support (why I'm so excited for stdmchar.h).	14:50
	It seems that GCC and Clang had a "cleanup" attribute long before "defer" was a thing. I wonder if MSVC has a similar compiler extension that could be leaned on in the meantime.	14:54
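For readers who have not met it, the GCC/Clang `cleanup` variable attribute runs a function when the annotated variable goes out of scope, on every exit path. A minimal sketch, assuming a GCC- or Clang-compatible compiler; `free_buffer`, `use_buffer` and the `frees_seen` counter are illustrative names, and the counter exists only to make the effect observable.

```c
#include <assert.h>
#include <stdlib.h>

static int frees_seen = 0;   /* counts cleanup calls, for demonstration only */

/* The cleanup function receives a pointer to the variable being cleaned up. */
static void free_buffer(char **p) {
    assert(p != NULL);
    free(*p);                /* free(NULL) is a no-op, so a failed malloc is fine */
    frees_seen++;
}

/* Returns -1 on the early-exit path and 0 on the normal path.
 * In both cases `buf` is freed automatically when it leaves scope. */
static int use_buffer(int fail_early) {
    char *buf __attribute__((cleanup(free_buffer))) = malloc(16);
    if (buf == NULL || fail_early)
        return -1;           /* cleanup still runs here */
    buf[0] = 'x';
    return 0;                /* ...and here */
}
```

This is a compiler extension rather than standard C, which is exactly the portability concern raised about MSVC.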
timo	hey ShimmerFairy, you might have an idea for this: can we do something smarter for our temporary roots than the push and pop every time a relevant variable goes into or out of scope, with the help of the C compiler? the same mechanism that helps unroll stack frames based on where the IP currently is should theoretically be usable to turn the IP in a frame into a list of addresses relative to stack	15:08
	or frame pointer that need to be rooted at arbitrary moments
	but i don't have any ideas how we could get the C compiler to give that to us
	apart from using the full DWARF debug info and picking out the bits we want
	which seems like a lot of work both to program and to do at run time	15:09
	since there's so much in there we're not interested in
ShimmerFairy	I'm not familiar with the garbage collection stuff at all (had to look up what you meant by "temporary roots"), but IIUC the idea is that you want raw C pointers (always pointers?) to follow the GC-controlled objects they're pointing to?	15:21
timo	right, this is only about pointers to gc-managed objects that live on the C stack in all the internal C functions moarvm has	15:23
	we have the MVMROOT{,2,3,4,5,6,7,8} macros that make it kind of not so annoying to work with compared to explicitly pushing and popping	15:24
	but that's really not more than syntactic sugar
	i reckon that we do orders of magnitude more useless pushes and pops "just in case" we have to GC than pushes and pops that actually end up mattering
	so even though scanning the stack for roots would surely be more work than just taking them out of our little temporary roots stack, we'd only do it when we actually need to do GC, and that should make a huge difference	15:25
	or rather, that should give us a decent budget for doing work at that point in time
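The push/pop discipline described for temporary roots can be sketched roughly as follows. This is a toy model under stated assumptions, not MoarVM code: `Obj`, `temp_root_push`, `temp_root_pop` and `fake_moving_gc` are hypothetical names, and the "collector" simply copies each rooted object to a new location and rewrites the C local that pointed to it.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TEMP_ROOTS 16

typedef struct Obj { int value; } Obj;    /* stand-in for a GC-managed object */

/* The temporary roots stack holds the addresses of C locals that
 * themselves hold pointers to managed objects. */
static Obj  **temp_roots[MAX_TEMP_ROOTS];
static size_t num_temp_roots = 0;

static void temp_root_push(Obj **slot) {
    assert(num_temp_roots < MAX_TEMP_ROOTS);
    temp_roots[num_temp_roots++] = slot;
}

static void temp_root_pop(void) {
    assert(num_temp_roots > 0);
    num_temp_roots--;
}

/* Toy moving collection: relocate every rooted object and update
 * the C local so it follows the object to its new address. */
static Obj to_space[MAX_TEMP_ROOTS];
static void fake_moving_gc(void) {
    for (size_t i = 0; i < num_temp_roots; i++) {
        to_space[i]    = **temp_roots[i];
        *temp_roots[i] = &to_space[i];
    }
}

/* The push and pop are paid on every call, whether or not a
 * collection actually happens in between. */
static Obj *work_that_may_gc(Obj *obj) {
    temp_root_push(&obj);
    fake_moving_gc();        /* stands in for "anything that might allocate" */
    temp_root_pop();
    return obj;              /* now points at the moved copy */
}
```

The cost timo is pointing at is visible in `work_that_may_gc`: the bookkeeping runs unconditionally, while the collection it protects against is rare.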
ShimmerFairy	Since I'm far more experienced in C++, in that world I'd expect pointers to GC objects to be represented by a class which registers itself as a callback in the GC, and then that information rests with the GC for the lifetime of the "gc_pointer" object. No clue if that's possible in C, or if a C++ class that the C code gets pointers to would work.	15:28
	timo: If your idea is to walk up the callstack at runtime to search for pointers, I think my starting point would be looking at how C++ exceptions are implemented, since they have to walk up the stack to find a suitable `catch`.	15:29
timo	right, that's stack unwinding, the compiler generates tables of "instructions" that are keyed on the instruction pointer relative to the function's start address essentially
	the thing about the "register a callback with the gc" idea is that I think it would be equivalent to what we do with the temporary roots at the moment; add it to a list when it goes "into scope", remove it from the list when it "leaves the scope"	15:32
	we already had a fun run-in with unwinding tables when the windows C runtime switched the "always properly unwind stack frames" switch in their longjmp implementation on	15:35
	the problem being, moarvm's JIT wasn't generating & registering these tables for the generated code, so trying to MVM_exception_throw_adhoc past a jitted frame back into the interpreter would crash	15:36
ShimmerFairy	I think the key issue is that somebody has to know who to update when things get shuffled around. I feel like the only way to avoid costs would be to set up the lists of variables-to-update in compile time, one list for every possible callstack, and then at the start of each relevant function is a `MVM_GLOBAL_list_to_update = &this_list`.	15:37
	(and if a function can be part of multiple possible callstacks, that becomes way more annoying to do)	15:38
timo	I'm still hoping there's something that can really only have run-time impact in the actual event where it's necessary	15:40
	I'm actually writing a blog post vaguely related to this, but it's focused on APIs that are meant to be used when you generate code at run time	15:41
ShimmerFairy	Worth noting that the vaguely similar idea of C++ exceptions is one of two places in that language where you're forced to pay a runtime cost even when you don't want it. (The other being RTTI for polymorphic stuff.)	15:42
timo	oh, can you elaborate on the runtime cost for c++ exceptions?	15:44
ShimmerFairy	While I'm far, far from the kind of expert programmer who could meaningfully help with this, so far it seems to me like one of those programming tasks where you wish there was a better way, but there isn't. Copying an image to framebuffer is always gonna be an O(width * height) operation for something in the computer, for instance.	15:45
	timo: I may be misremembering slightly, I'm not familiar with the low-level details. There's certainly a space cost, but apparently in the Itanium ABI runtime cost only occurs when exceptions are triggered?	15:46
timo	whoa, pthread already includes an unwinder in itself for the pthread cancellation handlers feature	15:47
	I am certain that an IP-keyed table similar to exception handler tables can be used to get the addresses of gc-relevant pointers only at the critical moments, the main difficulty is to find a way where I wouldn't have to do it all by hand, for example by parsing all the generated .s files the compiler spits out :D	15:51
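The IP-keyed table timo has in mind could look something like the following at lookup time. This sketch deliberately skips the hard part he names (getting the compiler to produce the table) and only shows the query side; `stack_map_entry`, `stack_map_lookup` and the example rows are hypothetical, and the offsets are invented numbers rather than anything a real compiler emitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SLOTS 4

/* Hypothetical table row: while the instruction offset (relative to the
 * function start) is in [start, end), the listed frame-relative offsets
 * hold pointers to GC-managed objects. */
typedef struct {
    uint32_t start, end;
    size_t   num_slots;
    int32_t  slot_offsets[MAX_SLOTS];
} stack_map_entry;

/* Binary search over rows sorted by `start` with non-overlapping ranges.
 * Returns NULL when no pointers are live at `ip_offset`. */
static const stack_map_entry *stack_map_lookup(const stack_map_entry *rows,
                                               size_t n, uint32_t ip_offset) {
    size_t lo = 0, hi = n;
    assert(rows != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ip_offset < rows[mid].start)
            hi = mid;
        else if (ip_offset >= rows[mid].end)
            lo = mid + 1;
        else
            return &rows[mid];
    }
    return NULL;
}

/* Invented rows for one imaginary function. */
static const stack_map_entry example_map[] = {
    { 0x10, 0x40, 1, { -8 } },
    { 0x40, 0x90, 2, { -8, -16 } },
};
```

The lookup only runs when a collection is actually triggered, which is the property being asked for: no work on the fast path, a table search per frame at GC time.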
ShimmerFairy	I wonder if you could manage something by putting a list of pointers on the stack, and then a marker value that the GC can detect on the stack for functions to update. Maybe something like `{ DEFINE_UPDATE_MARKER(2); void * update_list[2] = {&foo, &bar}; ... }`
timo	non-precise GCs, as opposed to the precise GC we have in moarvm, literally go through the stack and look for "stuff that looks like pointers to GC managed objects" with a variety of tricks to make it more reliable	15:52
ShimmerFairy	(the idea in that example being that the marker comes just before the list, and tells you how big the list is)
timo	I wonder if that's difficult to do with our use case, since the scopes variables are rooted in don't correspond 1:1 with function bodies	15:54
ShimmerFairy	The compiler has to map out where all the stack variables go for that function, even if its lifetime is a lot shorter than the whole function. (Though I'm more familiar with generated assembler on old computers at the moment, so maybe x86_64 assembler does it different.) I think as long as a space on the stack doesn't get reused for pointers and non-pointers, it would work.	15:58
	That is to say, as long as a function with blocks like `{ GC_Pointer * foo = init_thing(); ... } ... { int bar = 42; ... }` doesn't decide to have `foo` and `bar` share the same spot on the stack.	15:59
timo	on higher optimization levels, I would say it's very likely that spots get reused	16:00
	I would even say that it's not uncommon for stack frames to change in size from moment to moment
	after all, that's what the "push" and "pop" instructions do
	and even though they are often not emitted literally in optimized code, there definitely is add and sub to the stack pointer register that I've seen	16:01
	I asked on the compiler explorer discord, they may have a good idea for where I can look	16:02
ShimmerFairy	Yeah, my thinking is colored by the fact that I've been staring at ancient 32-bit MIPS code for a while now, particularly code which very rarely uses a frame pointer (that is, the functions are compiled so that they can allocate all the stack they need at the start and not think about it again).
timo	:D	16:04
	I have a mild interest in internals of retro video games	16:06
	so I have at least a very mild familiarity with stuff like that I guess?
	frame pointers are expensive to keep up to date, so it wouldn't be a surprise to see that code written for devices that have clock rates in the "multiple megahertz" would not bother to keep them	16:08
	without frame pointers you can still fully properly unwind your stacks as long as you have the dwarf debug info (or equivalent), it's just more work
	I've seen linux distros discuss moving back to -fno-omit-frame-pointer by default because it makes profiling and debugging and core dumps and all that much nicer to work with, and I agree. the performance impact seems not too terrible either	16:09
ShimmerFairy	(To be clear, just in case the terms mean different things in AMD64, in MIPS land there are no explicit stack-handling instructions, so you just decrement the sp register at function start, and re-increment it at function end. the fp (frame pointer) register only gets used when you need to allocate more stack space mid-function for some reason.)	16:12
timo	that makes sense
ShimmerFairy	btw, looking at what could maybe be done with ICU, I was immediately reminded that it loves working on strings in UTF-16 form, which is annoying. Unless people have been bristling at MoarVM's memory usage wrt strings, that would mean converting to UTF-16 almost any time you want to use ICU's whole-string functions.	16:20
	Luckily, property lookup will accept individual codepoints just fine, so that at least wouldn't be an immediate headache.
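To give a sense of what "converting to UTF-16" involves per string, here is a minimal sketch of the conversion from code points to UTF-16 code units. It assumes the input is a plain array of 32-bit code points, which is a simplification and not a description of MoarVM's string storage; `utf16_encode` and `utf16_from_code_points` are hypothetical helpers, and no ICU functions are called.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encodes one code point as UTF-16. Returns the number of 16-bit units
 * written (1 or 2), or 0 for a surrogate or out-of-range value. */
static size_t utf16_encode(uint32_t cp, uint16_t out[2]) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                               /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate  */
    return 2;
}

/* Converts `n` code points into `out` (capacity `cap` units). Returns the
 * number of units written, or 0 if any code point is invalid or the
 * output does not fit. */
static size_t utf16_from_code_points(const uint32_t *cps, size_t n,
                                     uint16_t *out, size_t cap) {
    size_t used = 0;
    assert(out != NULL);
    for (size_t i = 0; i < n; i++) {
        uint16_t tmp[2];
        size_t len = utf16_encode(cps[i], tmp);
        if (len == 0 || len > cap - used)
            return 0;
        for (size_t k = 0; k < len; k++)
            out[used++] = tmp[k];
    }
    return used;
}
```

The work is linear in the string length, plus an allocation for the output, which is the overhead that would recur on each call into a UTF-16-based whole-string function.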
timo	I had the impression that ICU is bad and we're far ahead of the competition because we rolled our own	16:21
	that's probably an opinion formed a decade ago and not thought about since then	16:22
ShimmerFairy	I've not been the biggest fan of ICU in the past, but really just for "I could totally write a better library (but never will)" reasons. When I need to do Unicode stuff in C++ it gets the job done well enough. I think the only real bad thing is that it's one of those insufferable projects that pegs its SONAME to the library's version, so ABI breakage happens on every major library version update.	16:24
	I'm curious as to why MoarVM avoided ICU or some other external library in the past, but in any case I think it's worth exploring. When I went to update MoarVM for the latest Unicode, I couldn't help but begin to think about MoarVM's Unicode support long term; what happens the next time nobody is around to update this one project-specific Unicode impl?	16:26
timo	we should extract moarvm's unicode stuff into a library that other projects can use, maybe we'll find volunteers that way?	16:29
	not 100% serious, but also not 100% joking	16:30
ShimmerFairy	That wouldn't be the worst idea, actually. I've always wanted to write a decent C++ library that let you work with strings like you can in Raku (after all these years I finally wrote something of a grammar library recently), but I'd probably want to give it C bindings at some point anyway.	16:31
	One point in favor of ICU is that, since 2016 (i.e. a few years after MoarVM started), it was transferred over to the Unicode Consortium's control. So while I don't think it's explicitly advertised as, like, The Official Library™, it also would be fair to say it's not merely "just another Unicode library". I would think that fact makes it a more reliable place to hang your hat than any other existing library.	16:34
timo	IIRC parrot was using ICU and the decision was explicitly between ICU or roll-your-own, though i'm not sure if any alternatives were out there at the time and were considered	16:35
	librakunicode	16:39
ShimmerFairy	There are plenty of libraries out there, but I think ICU's probably the most comprehensive.
	timo: Now that you've put the idea in my head, I think factoring things out would be a worthwhile idea to consider. At any rate, I think the current state of things isn't good longterm. If you think about it, asking virtual machine devs to periodically remember to be Unicode devs is a bit fragile.	16:42
timo	right, getting people who are unicode devs into our boat by offering something that is useful for not just virtual machine devs who happen to need something for unicode could be a good choice	16:43
	I feel like something like this has been at least mentioned a few times in the past by at least two different people
ShimmerFairy	And like I said, I've been itching for a good C++ unicode library for a long time now (ICU is Java adapted to C adapted to C++, so it's not exactly built for the language). In fact, when I was thinking of replacing `ucd2c.pl` with a Raku script, I was wondering about if I'd just be duplicating work for my hypothetical future library, which would also need to parse the UCD into something.	16:45
timo	I wonder how much mileage we could get out of something like ucd2sqlite	16:48
	put something more programming-language-agnostic in the middle
	so if someone wants to build a python or whatever version the "parse ucd" parts don't need to be touched at all	16:49
ShimmerFairy	tbf there is the XML version of UCD provided by Unicode. Though in my opinion the true hard part is really just designing a good API to Unicode properties, the parsing feels relatively easy in comparison.	16:50
	see: www.unicode.org/reports/tr42/	16:51
timo	that could be. i haven't seriously touched the ucd2c.pl parts of the thing in a long long time	16:52
ShimmerFairy	My main beef with ucd2c.pl is just that it's Perl, which I'm not familiar with and means we don't get to use Grammars to parse the files. (OTOH, Raku's forceful string normalization might affect how well it can parse things here.)	16:54
	.oO(I wonder if C++26 reflections would be helpful for this sort of codegen task...)	16:56
timo	do the ucd source files we use actually have unicode strings in them? I seem to recall only the latin all-uppercase names and hexadecimal codes and property short codes and full names and such	17:06
ShimmerFairy	I don't think so? I remember having to think about normalization when working on one of the roast tests, but that's because you have to assemble the test codepoint sequences into strings to actually test them.	17:11
	Most UCD files are kept deliberately ASCII-only, with I believe only the various Test files using × and ÷ as the extent of their non-ASCII usage. I think as long as a Raku script sticks to `Uni.new(@codes).encode("utf-8")`, there shouldn't be any trouble.	17:12
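The point of going from a list of code points straight to UTF-8 bytes is that nothing in between gets a chance to normalize the sequence. The same step can be sketched in C as below; `utf8_encode` and `utf8_encode_all` are hypothetical helper names for illustration, and the sketch mirrors only the idea of the Raku expression, not its implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encodes one code point as UTF-8. Returns the number of bytes written
 * (1 to 4), or 0 for a surrogate or out-of-range value. */
static size_t utf8_encode(uint32_t cp, unsigned char out[4]) {
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Encodes `n` code points, in order and without any normalization, into
 * `out` (capacity `cap` bytes). Returns bytes written, or 0 if a code
 * point is invalid or the output does not fit. */
static size_t utf8_encode_all(const uint32_t *cps, size_t n,
                              unsigned char *out, size_t cap) {
    size_t used = 0;
    assert(out != NULL);
    for (size_t i = 0; i < n; i++) {
        unsigned char tmp[4];
        size_t len = utf8_encode(cps[i], tmp);
        if (len == 0 || len > cap - used)
            return 0;
        memcpy(out + used, tmp, len);
        used += len;
    }
    return used;
}
```

For example, the `×` (U+00D7) used in the Unicode test files comes out as the two bytes `C3 97`.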
21:46 kjp left 21:47 kjp joined 23:25 librasteve_ left
