MasterDuke if anyone is curious, here's a commit that slows down MVM_string_find_cclass github.com/MasterDuke17/MoarVM/com...ee00db79f3 01:41
but if it inspires any suggestions to instead make it faster...i'm all ears 01:42
`MVMCodepoint cp = 0 <= g ? g : MVM_nfg_get_synthetic_info(tc, g)->codes[0];` is there any way to get rid of the branch/conditional, given we know we're looking for newlines? 01:49
MasterDuke even slower github.com/MasterDuke17/MoarVM/com...b4017b7a26 04:29
timo uhh `MVMCodepoint cp = 0 <= g ? g : MVM_nfg_get_synthetic_info(tc, g)->codes[0];` is this ternary the right way around? 08:58
... it's 0 <= g and not g <= 0, i just haven't woken up yet 09:30
i think due to the grapheme cluster algorithm, a few of the things we check for after getting codes[0] from the synthetic grapheme can never happen: you can't make a grapheme cluster out of vertical tab and something, or form feed and something (that's 0x0b and 0x0c), and i think a grapheme cluster that has \r as its first entry maybe cannot be anything other than our crlf synthetic 09:35
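A minimal sketch of the lookup being discussed, in plain C and not MoarVM's actual code. Only the shape of the ternary comes from the quoted line. The `Synthetic` struct, `synth_table`, `get_synthetic`, the mapping of -1 to a crlf synthetic, and the exact set of newline codepoints are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t Grapheme;   /* >= 0: plain codepoint, < 0: synthetic */
typedef int32_t Codepoint;

/* Hypothetical stand-in for what MVM_nfg_get_synthetic_info returns. */
typedef struct {
    const Codepoint *codes;
    size_t           num_codes;
} Synthetic;

static const Codepoint crlf_codes[] = { 0x0D, 0x0A };

/* Toy table: synthetic -1 lives at index 0 and is assumed to be crlf. */
static const Synthetic synth_table[] = {
    { crlf_codes, 2 },
};

static const Synthetic *get_synthetic(Grapheme g) {
    assert(g < 0);
    size_t idx = (size_t)(-(g + 1));
    assert(idx < sizeof synth_table / sizeof synth_table[0]);
    return &synth_table[idx];
}

/* Same shape as the quoted ternary: non-negative graphemes are their own
 * codepoint, negative ones go through the synthetic lookup. */
static Codepoint first_codepoint(Grapheme g) {
    return 0 <= g ? g : get_synthetic(g)->codes[0];
}

/* Newline check on the first codepoint (assumed set of newline codepoints). */
static int is_newline_grapheme(Grapheme g) {
    switch (first_codepoint(g)) {
        case 0x0A: case 0x0B: case 0x0C: case 0x0D:
        case 0x85: case 0x2028: case 0x2029:
            return 1;
        default:
            return 0;
    }
}
```

If the reasoning above holds, the 0x0b and 0x0c cases can only ever be reached through the non-negative branch, and 0x0d through the synthetic branch only for crlf.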
lizmat perhaps of interest for MoarVM: www.theregister.com/2025/02/13/has...akthrough/ 19:01
jdv where's jnthn? i thought he likes that stuff. 19:10
just kidding. wasn't it nic clark that did the recent hash stuff? 19:13
?ghr?
ugexe github.com/MWARDUNI/ElasticHashing...Hashing.py 19:15
timo there seems to be a big caveat that the bounds they have found are not possible to achieve when you can also delete elements from a hash, if i read this correctly, which i'm quite possibly not 20:28
lizmat ok, so a MoarVM / NQP "map" like structure would be needed to take advantage of this 20:34
on which a nqp::deletekey would fail 20:35
timo i'm not following that python implementation 20:36
for one, the example uses a delta of 0.1 and the paper says 1/delta is supposed to be a power of two, but 10 isn't a power of two? maybe i've not read the paper far enough 20:38
lizmat 10 is a power of 2 in binary ? 20:42
.tell MasterDuke17 looks like the find_cclass update is causing HTTP::Tiny to fail 20:43
tellable6 lizmat, I'll pass your message to MasterDuke
lizmat .tell MasterDuke17 symptoms are having newlines at the end of a string where they were not expected according to the test 20:44
tellable6 lizmat, I'll pass your message to MasterDuke
ugexe probably a \r\n issue 20:46
timo i guess the code may just be incomplete? 20:49
lizmat: you mean an unmerged branch, yeah? 20:50
lizmat no, it's in the bumped Rakudo in main, and will break in 2025.02 as it stands now 20:51
timo lizmat: yes, but not here: when you write 0.1 like that in python code and divide 1 / 0.1, you get 10 in decimal, which is not a power of two
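A small sketch of the arithmetic in question, written in C as an illustration only. The helper name `inverse_is_power_of_two` is hypothetical and is not taken from the paper or from the Python implementation linked above. It rounds 1/delta to the nearest integer and tests whether that integer is a power of two.

```c
#include <assert.h>

/* Returns 1 if 1/delta, rounded to the nearest integer, is a power of two.
 * Assumes 0 < delta <= 1. */
static int inverse_is_power_of_two(double delta) {
    assert(delta > 0.0 && delta <= 1.0);
    unsigned long n = (unsigned long)(1.0 / delta + 0.5);
    /* A power of two has exactly one bit set. */
    return n != 0 && (n & (n - 1)) == 0;
}
```

With delta = 0.1 the rounded inverse is 10, which fails the check, while a value such as 0.125 (inverse 8) passes.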
do you perhaps mean utf8 decoding changes? i don't see any changes to find_cclass merged to moarvm/main 20:52
lizmat github.com/rakudo/rakudo/commit/3b...94aeab140b 20:53
I think a0b70918771669555e635ec ? 20:54
linkable6 (2025-01-26) github.com/MoarVM/MoarVM/commit/a0b7091877 Speedup MVM_string_find_cclass by adding cases...
timo oh, i guess i just didn't search back long enough 20:57
well, scroll rather than search
looks like CI was pretty red for that. was that perhaps one of the CI runs where quick bumps broke half the runs? 20:58
oh ... oh no ... 21:01
... disregard my last message 21:02
lizmat what message? :-)
timo yes, exactly 21:06
the reproducing example is just "zef install HTTP::Tiny" and have it run its test cases? 21:07
lizmat yup
timo aha, the shared code for GRAPHEME_8 and GRAPHEME_ASCII doesn't look up synthetic info 21:14
so they don't understand that -1 is crlf
m: .raku.say for "Hello there\r\nHow are you".lines 21:16
camelia "Hello there"
"How are you"
timo m: .raku.say for "Hello there\r\nHow are you".encode("latin-1").decode("latin-1").lines 21:17
camelia "Hello there\r\nHow are you"
timo it also doesn't do it for IN_SITU_8 21:20
got a patch 21:27
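A toy illustration of the failure mode described here, in plain C and not the actual contents of src/strings/ops.c. It assumes 8-bit storage holds signed values and that -1 stands for the crlf synthetic. All names (`CRLF_SYNTHETIC`, `find_newline_naive`, `find_newline_resolved`, `sample`) are hypothetical, and other synthetics are deliberately not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CRLF_SYNTHETIC (-1)             /* assumption: -1 is the crlf synthetic */
#define NOT_FOUND      ((ptrdiff_t)-1)

static int is_newline_codepoint(int32_t cp) {
    return cp == 0x0A || cp == 0x0B || cp == 0x0C || cp == 0x0D
        || cp == 0x85 || cp == 0x2028 || cp == 0x2029;
}

/* Stand-in for the synthetic lookup: only crlf is known in this toy model.
 * Any other negative value is passed through and never matches a newline. */
static int32_t resolve_first_codepoint(int8_t g) {
    if (g >= 0)
        return g;
    return g == CRLF_SYNTHETIC ? 0x0D : (int32_t)g;
}

/* Naive scan: treats every stored value as a codepoint, so -1 never matches. */
static ptrdiff_t find_newline_naive(const int8_t *buf, size_t len) {
    for (size_t i = 0; i < len; i++)
        if (is_newline_codepoint(buf[i]))
            return (ptrdiff_t)i;
    return NOT_FOUND;
}

/* Scan that resolves negative values through the lookup first. */
static ptrdiff_t find_newline_resolved(const int8_t *buf, size_t len) {
    for (size_t i = 0; i < len; i++)
        if (is_newline_codepoint(resolve_first_codepoint(buf[i])))
            return (ptrdiff_t)i;
    return NOT_FOUND;
}

/* "Hi", crlf synthetic, "you" stored as 8-bit graphemes. */
static const int8_t sample[] = { 'H', 'i', CRLF_SYNTHETIC, 'y', 'o', 'u' };
```

On `sample`, the naive scan reports no newline at all, which would match the undivided `.lines` output above, while the resolving scan reports index 2.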
Geth MoarVM/give_find_cclass_synthetics_back: 23cc069152 | (Timo Paulssen)++ | src/strings/ops.c
Re-instate lookup of synthetic codepoint info for find_cclass

It got left out accidentally. Since GRAPHEME_8 and IN_SITU_8 can store negative numbers, and especially newlines are a case of this because of the crlf synth, we need to go through the synthetic info lookup.
21:42
MoarVM: timo++ created pull request #1913:
Re-instate lookup of synthetic codepoint info for find_cclass
21:44
timo 02-rakudo/15-gh_1202.t crashed again huh. 21:54
a commenter on The Register (in the article about elastic hashing) points out that this is also only for hashing where the size of the table is known ahead of time 22:00
lizmat timo: am going to merge that PR, looks green enough to me :-) 22:20
Geth MoarVM/main: 21d96b374e | timo++ (committed using GitHub Web editor) | src/strings/ops.c
Re-instate lookup of synthetic codepoint info for find_cclass (#1913)

It got left out accidentally. Since GRAPHEME_8 and IN_SITU_8 can store negative numbers, and especially newlines are a case of this because of the crlf synth, we need to go through the synthetic info lookup.
22:21
lizmat HTTP::Tiny clean! 22:33
ugexe seems odd only 1 module broke 22:39
timo we should revive the C level coverage reports 22:47