#moarvm on 14 October 2025 - Raku Programming Language Log

Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes. Set by lizmat on 24 May 2021.
08:29 librasteve_ joined
timo	gist.github.com/timo/a517f68af5eb8...14c229c56f random science experiment for the NFA lovers in chat: how often is each character in the core.c.setting visited by the NFA engine	13:28	Copy link Message link Add to gist Remove
jdv	more than it needs to be iirc	13:42	Copy link Message link Add to gist Remove
timo	hm?	13:55	Copy link Message link Add to gist Remove
jdv	that's just something i remember someone saying. probably Larry.	14:21	Copy link Message link Add to gist Remove
	a very long time ago		Copy link Message link Add to gist Remove
timo	yeah, it's true	14:23	Copy link Message link Add to gist Remove
	i'm not sure if we ever got concrete numbers for that though		Copy link Message link Add to gist Remove
	common knowledge suggests that if you want to go through a string fast, you ought to use SIMD where your "multiple data" is "a bunch of characters at a time", but the NFA takes one character and then goes through all the states that are active at the moment	14:26	Copy link Message link Add to gist Remove
	the information about the states is relatively spread out, so we can't simply SIMD with multiple states at the same time for one character of the string	14:27	Copy link Message link Add to gist Remove
lizmat	so maybe the NFA machinery needs an overhaul / re-implementation?	14:49	Copy link Message link Add to gist Remove
timo	it's a bit of an open question i guess	14:53	Copy link Message link Add to gist Remove
Voldenet	nfa could use heuristics (a'la astar) and the most likely path - that'd allow simd, but would require backtracking to the rarer unvisited states	14:56	Copy link Message link Add to gist Remove
timo	i still haven't gotten good visualization output for a given NFA, so i don't even know if the nfa optimizer is working as good as it could be. i think it doesn't do very much	14:57	Copy link Message link Add to gist Remove
	it's also important to remember that the NFA returns multiple viable matches, ordered by how long the literal in them is (if it uses the longlit feature)	15:00	Copy link Message link Add to gist Remove
	not sure if it would be viable to have one implementation that finds the best result quickly and returns it, so that we only have to do the exhaustive thing when the grammar engine comes back and says "give me more, please"	15:02	Copy link Message link Add to gist Remove
Voldenet	Well, in practice it'd favor one style of raku over the other if it was too naive…	17:29	Copy link Message link Add to gist Remove
	however, if matching would know "this CU favors these states based on previous evaluations" then it might be faster if the style of file is consistent enough	17:30	Copy link Message link Add to gist Remove
	in fact, it could be even possible to explicitly set most expected rules: e.g. `#pragma nfa-likely scope-declarator:sym<my> routine-declarator:sym<sub>`	17:36	Copy link Message link Add to gist Remove
	so these end up being evaluated first		Copy link Message link Add to gist Remove
	that doesn't fully sound like a good idea, but it's an idea	17:37	Copy link Message link Add to gist Remove
	if every reachable nfa state was represented as a revisitable graph node (not in any specific order, meaning that the same text could be scanned many times) it'd be slower in the worst case	17:43	Copy link Message link Add to gist Remove
	so it'd all ride on the assumption that heuristics were clever enough		Copy link Message link Add to gist Remove
	and most bytes were simply scanned once and immediately matched correctly, which may be even impossible	17:44	Copy link Message link Add to gist Remove
	other possibilities would be grouping states by probabilities and scanning bytes once per group - it'd attempt matching most likely outcomes first	17:47	Copy link Message link Add to gist Remove
	but that makes simd matching as hard as it is now	17:48	Copy link Message link Add to gist Remove
timo	i'm not sure i get what you mean		Copy link Message link Add to gist Remove
	i think i'm seeing an NFA that has unreachable states. that's rather wasteful	17:49	Copy link Message link Add to gist Remove
Voldenet	I mean that "golden path" could extend to multiple rules, reducing number of outcomes	17:51	Copy link Message link Add to gist Remove
timo	not going to make much of a difference in terms of performance, i don't think		Copy link Message link Add to gist Remove
Voldenet	but I see why having simple "golden path" dfa is nonsense, hardly any code can be this simple	17:54	Copy link Message link Add to gist Remove
timo	the most egregious waste of performance is probably still the re-running of NFAs when an equivalent NFA has already run as part of another protoregex further up the stack	18:02	Copy link Message link Add to gist Remove
lizmat	I guess we would need "coverage" like functionality to find out what is actually happening?	18:03	Copy link Message link Add to gist Remove
timo	kinda working on it	18:14	Copy link Message link Add to gist Remove
	# after optimizing an NFA with 21886 states, was only able to reach 14176 states!	19:00	Copy link Message link Add to gist Remove
	gist.github.com/timo/d30c0cab57e58...86ca5598cd	19:06	Copy link Message link Add to gist Remove
	oh that makes the browser unhappy huh	19:12	Copy link Message link Add to gist Remove
	gist.github.com/timo/d30c0cab57e58...states.txt - unreachable states found in NFAs while compiling Perl6/Grammar.nqp	19:13	Copy link Message link Add to gist Remove
	gist.githubusercontent.com/timo/d3...patch.diff - the patch to get the same output locally to play around		Copy link Message link Add to gist Remove
	i think my nfa dump code missed that sometimes a fate edge actually does have a "to"	20:49	Copy link Message link Add to gist Remove

Please report any issues / comments / feature requests as an issue on App::Raku::Log.

Thank you!