11 Oct 2025 | |||
[Coke] kills the blin run | 16:37 | ||
[Coke] shuts down the VM so he's not paying for it. :| | 16:38 | ||
Voldenet | otoh on intel CPUs rapidhash has half the throughput than xxh3 for some reason | 16:39 | |
Geth | MoarVM/revert-1959-update_rapidhash_to_v3: 9d79b310f2 | (Elizabeth Mattijsen)++ (committed using GitHub Web editor) | 2 files Revert "Bump rapidhash to v3" This reverts commit 204a73098bc5d3e34db467c85cb2d6b3ac7991de. |
16:42 | |
MoarVM: lizmat++ created pull request #1960: Revert "Bump rapidhash to v3" |
|||
MoarVM/main: 70225c6b5e | (Elizabeth Mattijsen)++ (committed using GitHub Web editor) | 2 files Revert "Bump rapidhash to v3" (#1960) This reverts commit 204a73098bc5d3e34db467c85cb2d6b3ac7991de. |
|||
lizmat | [Coke]: sorry, was a little over eager doing something :-( | 16:43 | |
Geth | MoarVM/revert-1960-revert-1959-update_rapidhash_to_v3: a61094c65c | (Elizabeth Mattijsen)++ (committed using GitHub Web editor) | 2 files Revert "Revert "Bump rapidhash to v3" (#1960)" This reverts commit 70225c6b5efd10aa2c930982f2cfebf4e7073ba7. |
||
MoarVM: lizmat++ created pull request #1961: Bump rapidhash to v3 |
16:44 | ||
lizmat | [Coke]: sorry for the noise :-( | ||
[Coke] | no worries! | 16:48 | |
Geth | MoarVM: MasterDuke17++ created pull request #1962: Bump libtommath to 1.3.0 |
17:56 | |
japhb | [Coke]: Apologies for starting something; I was just studying and pontificating .... | 21:26 | |
[Coke] | heee. no worries, everything is fine. | 21:56 | |
13 Oct 2025 | |||
librasteve_ | rakudoweekly.blog/2025/10/13/2025-41-tchotchke/ | 18:15 | |
MasterDuke | what synthetics can a latin1 string have? | 18:37 | |
tellable6 | 2025-10-11T16:41:46Z #raku-dev <[Coke]> masterduke - if you're serious about getting that into 2025.10, please make sure we get version bumps up to rakudo. Ping me when it's done so I can restart blin. Hopefully no issues on other platforms. | ||
2025-10-11T16:49:08Z #raku-dev <[Coke]> masterduke oops, nevermind! | |||
[Coke] | my previous comment about synthetics - I didn't realize he was using a grammar, so that might not be relevant. | 19:09 | |
unless this is unrelated, in which case: \r\n, maybe? | |||
(is that actually synthetic?) | 19:10 | ||
MasterDuke | [Coke]: unrelated. i'm just confused by github.com/MoarVM/MoarVM/blob/main...tin1.c#L36 | ||
but yeah, i believe we normalize it into a synthetic | 19:11 | ||
14 Oct 2025 | |||
timo | gist.github.com/timo/a517f68af5eb8...14c229c56f random science experiment for the NFA lovers in chat: how often is each character in the core.c.setting visited by the NFA engine | 13:28 | |
jdv | more than it needs to be iirc | 13:42 | |
timo | hm? | 13:55 | |
jdv | that's just something i remember someone saying. probably Larry. | 14:21 | |
a very long time ago | |||
timo | yeah, it's true | 14:23 | |
i'm not sure if we ever got concrete numbers for that though | |||
common knowledge suggests that if you want to go through a string fast, you ought to use SIMD where your "multiple data" is "a bunch of characters at a time", but the NFA takes one character and then goes through all the states that are active at the moment | 14:26 | ||
the information about the states is relatively spread out, so we can't simply SIMD with multiple states at the same time for one character of the string | 14:27 | ||
lizmat | so maybe the NFA machinery needs an overhaul / re-implementation? | 14:49 | |
timo | it's a bit of an open question i guess | 14:53 | |
Voldenet | nfa could use heuristics (a'la astar) and the most likely path - that'd allow simd, but would require backtracking to the rarer unvisited states | 14:56 | |
timo | i still haven't gotten good visualization output for a given NFA, so i don't even know if the nfa optimizer is working as good as it could be. i think it doesn't do very much | 14:57 | |
it's also important to remember that the NFA returns multiple viable matches, ordered by how long the literal in them is (if it uses the longlit feature) | 15:00 | ||
not sure if it would be viable to have one implementation that finds the best result quickly and returns it, so that we only have to do the exhaustive thing when the grammar engine comes back and says "give me more, please" | 15:02 | ||
Voldenet | Well, in practice it'd favor one style of raku over the other if it was too naive… | 17:29 | |
however, if matching would know "this CU favors these states based on previous evaluations" then it might be faster if the style of file is consistent enough | 17:30 | ||
in fact, it could be even possible to explicitly set most expected rules: e.g. `#pragma nfa-likely scope-declarator:sym<my> routine-declarator:sym<sub>` | 17:36 | ||
so these end up being evaluated first | |||
that doesn't fully sound like a good idea, but it's an idea | 17:37 | ||
if every reachable nfa state was represented as a revisitable graph node (not in any specific order, meaning that the same text could be scanned many times) it'd be slower in the worst case | 17:43 | ||
so it'd all ride on the assumption that heuristics were clever enough | |||
and most bytes were simply scanned once and immediately matched correctly, which may be even impossible | 17:44 | ||
other possibilities would be grouping states by probabilities and scanning bytes once per group - it'd attempt matching most likely outcomes first | 17:47 | ||
but that makes simd matching as hard as it is now | 17:48 | ||
timo | i'm not sure i get what you mean | ||
i think i'm seeing an NFA that has unreachable states. that's rather wasteful | 17:49 | ||
Voldenet | I mean that "golden path" could extend to multiple rules, reducing number of outcomes | 17:51 | |
timo | not going to make much of a difference in terms of performance, i don't think | ||
Voldenet | but I see why having simple "golden path" dfa is nonsense, hardly any code can be this simple | 17:54 | |
timo | the most egregious waste of performance is probably still the re-running of NFAs when an equivalent NFA has already run as part of another protoregex further up the stack | 18:02 | |
lizmat | I guess we would need "coverage" like functionality to find out what is actually happening? | 18:03 | |
timo | kinda working on it | 18:14 | |
# after optimizing an NFA with 21886 states, was only able to reach 14176 states! | 19:00 | ||
gist.github.com/timo/d30c0cab57e58...86ca5598cd | 19:06 | ||
oh that makes the browser unhappy huh | 19:12 | ||
gist.github.com/timo/d30c0cab57e58...states.txt - unreachable states found in NFAs while compiling Perl6/Grammar.nqp | 19:13 | ||
gist.githubusercontent.com/timo/d3...patch.diff - the patch to get the same output locally to play around | |||
i think my nfa dump code missed that sometimes a fate edge actually does have a "to" | 20:49 |