Geth MoarVM: MasterDuke17++ created pull request #1871:
Add fast path when difference for 8-bit strings
03:13
09:40 sena_kun joined
Geth MoarVM/main: 492e511f0d | MasterDuke17++ (committed using GitHub Web editor) | src/strings/ops.c
Add fast path when difference for 8-bit strings

If we're comparing 8-bit strings and there's a difference, we don't need to go through the generic grapheme-iterator path, since we know there won't be combining synthetics.
10:57
timo hold up
lizmat hold up? 10:58
timo i was about to comment on this
lizmat ah, ok ;-(
I took nine's approval, and the code appeared simple enough
timo 8 bit grapheme storage, i.e. not "only in ascii range", includes synthetics
we can't just say the lower of the synthetic codepoint is the lower one for real 10:59
because those are allocated first-come-first-served
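[editor's note] A toy model of the concern timo describes above, in C since the patched file is src/strings/ops.c. Nothing here is MoarVM's real API: `ToyTable`, `toy_lookup_or_add`, `toy_raw_cmp` and `toy_cmp_after_order` are hypothetical names, and the only assumption taken from the discussion is that synthetics get their IDs in first-come-first-served order (modelled here as successive negative numbers). The sketch shows that comparing the raw IDs gives an answer that depends on which synthetic was registered first.

```c
#include <stdint.h>
#include <string.h>

/* Toy model, NOT MoarVM's real data structures: each distinct synthetic
 * gets the next negative ID, in order of first registration. */
typedef struct {
    const char *names[8];
    int count;
} ToyTable;

/* Returns the existing ID for name, or allocates the next negative one. */
static int32_t toy_lookup_or_add(ToyTable *t, const char *name) {
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->names[i], name) == 0)
            return -(i + 1);
    t->names[t->count] = name;
    t->count++;
    return -t->count;
}

/* Naive comparison by raw grapheme value: returns -1, 0 or 1. */
static int toy_raw_cmp(int32_t a, int32_t b) {
    return (a > b) - (a < b);
}

/* Registers two synthetics in the given order, then compares
 * "A+marks" against "B+marks" by raw ID only. */
static int toy_cmp_after_order(const char *first, const char *second) {
    ToyTable t = { {0}, 0 };
    toy_lookup_or_add(&t, first);
    toy_lookup_or_add(&t, second);
    int32_t a = toy_lookup_or_add(&t, "A+marks");
    int32_t b = toy_lookup_or_add(&t, "B+marks");
    return toy_raw_cmp(a, b);
}
```

Calling `toy_cmp_after_order("A+marks", "B+marks")` and `toy_cmp_after_order("B+marks", "A+marks")` yields opposite signs in this model, even though the two graphemes being compared are the same in both runs.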
Geth MoarVM/main: 639e401db3 | (Elizabeth Mattijsen)++ | src/strings/ops.c
Revert "Add fast path when difference for 8-bit strings"

This reverts commit 492e511f0df59fadc44c2fb690b3e877a5834f40.
timo well, a full revert is maybe a bit much since we didn't bump yet
lizmat better be safe than sorry, I'd say 11:00
timo we will have to check if either of the two graphemes is a synthetic, in which case we can't do the fast path. we have to see if it's still faster to do it this way when the additional check goes in
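[editor's note] A minimal sketch of the extra check timo proposes, not the code from the PR. It assumes 8-bit grapheme storage is a signed byte and that a negative value marks a synthetic; `ToyGrapheme8` and `toy_fast_cmp_8bit` are hypothetical names. The function decides the comparison itself only when neither differing grapheme is a synthetic, and otherwise tells the caller to fall back to the generic path.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 8-bit storage is signed, and negative values are synthetics. */
typedef int8_t ToyGrapheme8;

/* Compares the first len graphemes of a and b.
 * Returns 1 if the fast path could decide, with *result set to -1, 0 or 1
 * (0 means the compared prefix is equal and the caller still has to
 * compare lengths).
 * Returns 0 if the first difference involves a synthetic, in which case
 * *result is left untouched and the caller must use the generic path. */
static int toy_fast_cmp_8bit(const ToyGrapheme8 *a, const ToyGrapheme8 *b,
                             size_t len, int *result) {
    for (size_t i = 0; i < len; i++) {
        if (a[i] == b[i])
            continue;
        if (a[i] < 0 || b[i] < 0)
            return 0;   /* synthetic involved: raw order is meaningless */
        *result = a[i] < b[i] ? -1 : 1;
        return 1;
    }
    *result = 0;
    return 1;
}
```

Whether this remains faster than the generic path once the sign check is in the loop is the open question raised above; the sketch only shows the shape of the check, not its cost.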
lizmat I was just about to say :-)
timo i have to go AFK for a bit so i can't properly create a test case that shows this
lizmat I'll keep my handz in daz pokkets 11:01
timo but it'd probably look something like "create two buffers of utf8 bytes that are decoded in two different orders after program start which result in a character with lots of combiners on it so it's a synthetic, guaranteed. then compare strings that are less than 8 graphemes long, the same length, and end in one and the other synthetic, respectively"
if my worry is correct, those would give different results based on which synthetic was registered first by decoding the buf 11:02
we can't just create the buf from a string in the same program run because then the synthetic grapheme would be registered already at compile time and then depend on where it's seen when reading in the source code
lizmat .oO( oh what a tangled web we weave :-) 11:03
timo where does that come from btw?
lizmat nosweatshakespeare.com/quotes/famo...-we-weave/ 11:04
timo m: my $with_a = Buf8.new(0x41, 0xCD, 0x99, 0xE2, 0x83, 0xB0); my $with_b = Buf8.new(0x42, 0xCD, 0x99, 0xE2, 0x83, 0xB0); say $with_a cmp $with_b 11:09
camelia ===SORRY!=== Error while compiling <tmp>
Undeclared name:
Buf8 used at lines 1, 1. Did you mean 'buf8', 'Buf'?
timo m: my $with_a = buf8.new(0x41, 0xCD, 0x99, 0xE2, 0x83, 0xB0); my $with_b = buf8.new(0x42, 0xCD, 0x99, 0xE2, 0x83, 0xB0); say $with_a cmp $with_b
camelia Less
timo forgot to decode
m: my $with_a = buf8.new(0x41, 0xCD, 0x99, 0xE2, 0x83, 0xB0); my $with_b = buf8.new(0x42, 0xCD, 0x99, 0xE2, 0x83, 0xB0); say $with_a.decode cmp $with_b.decode
camelia Less
timo m: my $with_a = buf8.new(0x41, 0xCD, 0x99, 0xE2, 0x83, 0xB0); my $with_b = buf8.new(0x42, 0xCD, 0x99, 0xE2, 0x83, 0xB0); $with_b.decode; $with_a.decode; say $with_a.decode cmp $with_b.decode
camelia Less
timo m: my $with_a = buf8.new(0x41, 0xCD, 0x99, 0xE2, 0x83, 0xB0); my $with_b = buf8.new(0x42, 0xCD, 0x99, 0xE2, 0x83, 0xB0); $with_a.decode; $with_b.decode; say $with_a.decode cmp $with_b.decode 11:10
camelia Less
timo ok, with the changes from the PR these all still have to give "Less"
22:06 kjp left, kjp_ joined 22:43 kjp_ left, kjp joined 22:58 sena_kun left