Geth | MoarVM: MasterDuke17++ created pull request #1871: Add fast path when difference for 8-bit strings |
03:13 | |
09:40 | sena_kun joined |
Geth | MoarVM/main: 492e511f0d | MasterDuke17++ (committed using GitHub Web editor) | src/strings/ops.c Add fast path when difference for 8-bit strings If we're comparing 8-bit strings and there's a difference, we don't need to go through the generic grapheme-iterator path, since we know there won't be combining synthetics. |
10:57 | |
timo | hold up | ||
lizmat | hold up? | 10:58 | |
timo | i was about to comment on this | ||
lizmat | ah, ok ;-( | ||
I took nine's approval, and the code appeared simple enough | |||
timo | 8 bit grapheme storage, i.e. not "only in ascii range", includes synthetics | ||
we can't just say the lower of the synthetic codepoint is the lower one for real | 10:59 | ||
because those are allocated first-come-first-served | |||
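Editor's note: timo's point about first-come-first-served allocation can be illustrated with a toy model. The sketch below is not MoarVM code. `SynthTable`, `synth_register`, `cmp_raw` and `cmp_by_base` are hypothetical names, and it assumes synthetics are stored as negative IDs handed out in registration order. To stay short it keys each synthetic on its base codepoint only, ignoring the combiners.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model, NOT MoarVM's real data structures. */
#define MAX_SYNTHETICS 128

typedef struct {
    int32_t base[MAX_SYNTHETICS]; /* base codepoint of each registered synthetic */
    int     count;                /* number of synthetics registered so far */
} SynthTable;

/* Register a synthetic for a base codepoint (assumption: combiners ignored).
 * IDs are negative and allocated in registration order: -1, -2, -3, ... */
static int32_t synth_register(SynthTable *t, int32_t base_cp) {
    for (int i = 0; i < t->count; i++)
        if (t->base[i] == base_cp)
            return -(int32_t)(i + 1);      /* already registered: reuse its ID */
    assert(t->count < MAX_SYNTHETICS);
    t->base[t->count] = base_cp;
    t->count++;
    return -(int32_t)t->count;
}

/* Naive ordering: compare the stored values directly. */
static int cmp_raw(int32_t a, int32_t b) {
    return (a > b) - (a < b);
}

/* Resolve a stored value to the codepoint it should sort by. */
static int32_t synth_base(const SynthTable *t, int32_t g) {
    return g < 0 ? t->base[-g - 1] : g;
}

/* Stable ordering: compare what the graphemes stand for, not their IDs. */
static int cmp_by_base(const SynthTable *t, int32_t a, int32_t b) {
    int32_t ca = synth_base(t, a), cb = synth_base(t, b);
    return (ca > cb) - (ca < cb);
}
```

In this model, registering 'A' before 'B' gives them the IDs -1 and -2, and registering them in the opposite order swaps the IDs. `cmp_raw` therefore flips its answer with registration order, while `cmp_by_base` does not.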
Geth | MoarVM/main: 639e401db3 | (Elizabeth Mattijsen)++ | src/strings/ops.c Revert "Add fast path when difference for 8-bit strings" This reverts commit 492e511f0df59fadc44c2fb690b3e877a5834f40. |
||
timo | well, a full revert is maybe a bit much since we didn't bump yet | ||
lizmat | better be safe than sorry, I'd say | 11:00 | |
timo | we will have to check if either of the two graphemes is a synthetic, in which case we can't do the fast path. we have to see if it's still faster to do it this way when the additional check goes in | ||
lizmat | I was just about to say :-) | ||
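Editor's note: the extra check timo describes might look roughly like the sketch below. It is an illustration under stated assumptions, not the code from the PR. `Grapheme8`, `compare8_fast` and `NEEDS_SLOW_PATH` are hypothetical names, both strings are assumed to have the same length, and negative values are assumed to mark synthetics.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int8_t Grapheme8;   /* hypothetical stand-in for 8-bit grapheme storage */

/* Sentinel meaning "fall back to the generic comparison path". */
#define NEEDS_SLOW_PATH 2

/* Compare two equal-length 8-bit grapheme buffers.
 * Returns -1, 0 or 1 when the fast path can decide, or NEEDS_SLOW_PATH
 * when the first differing pair involves a synthetic (negative value),
 * because the numeric order of synthetic IDs carries no meaning. */
static int compare8_fast(const Grapheme8 *a, const Grapheme8 *b, size_t len) {
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < len; i++) {
        if (a[i] == b[i])
            continue;                 /* same grapheme, synthetic or not */
        if (a[i] < 0 || b[i] < 0)
            return NEEDS_SLOW_PATH;   /* at least one side is a synthetic */
        return a[i] < b[i] ? -1 : 1;  /* both are plain codepoints */
    }
    return 0;                         /* no difference found */
}
```

The check only runs at the first differing position, so its cost is two sign tests per comparison. Whether the fast path is still a win with that check is the open question timo raises.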
timo | i have to go AFK for a bit so i can't properly create a test case that shows this | ||
lizmat | I'll keep my handz in daz pokkets | 11:01 | |
timo | but it'd probably look something like "create two buffers of utf8 bytes that are decoded in two different orders after program start which result in a character with lots of combiners on it so it's a synthetic, guaranteed. then compare strings that are less than 8 graphemes long, the same length, and end in one and the other synthetic, respectively" | ||
if my worry is correct, those would give different results based on which synthetic was registered first by decoding the buf | 11:02 | ||
we can't just create the buf from a string in the same program run because then the synthetic grapheme would be registered already at compile time and then depend on where it's seen when reading in the source code | |||
lizmat | .oO( oh what a tangled web we weave :-) |
11:03 | |
timo | where does that come from btw? | ||
lizmat | nosweatshakespeare.com/quotes/famo...-we-weave/ | 11:04 | |
timo | m: my $with_a = Buf8.new(0x41, 0xCD, 0x99, 0xE2, 0x83, 0xB0); my $with_b = Buf8.new(0x42, 0xCD, 0x99, 0xE2, 0x83, 0xB0); say $with_a cmp $with_b | 11:09 | |
camelia | ===SORRY!=== Error while compiling <tmp> Undeclared name: Buf8 used at lines 1, 1. Did you mean 'buf8', 'Buf'? |
||
timo | m: my $with_a = buf8.new(0x41, 0xCD, 0x99, 0xE2, 0x83, 0xB0); my $with_b = buf8.new(0x42, 0xCD, 0x99, 0xE2, 0x83, 0xB0); say $with_a cmp $with_b | ||
camelia | Less | ||
timo | forgot to decode | ||
m: my $with_a = buf8.new(0x41, 0xCD, 0x99, 0xE2, 0x83, 0xB0); my $with_b = buf8.new(0x42, 0xCD, 0x99, 0xE2, 0x83, 0xB0); say $with_a.decode cmp $with_b.decode | |||
camelia | Less | ||
timo | m: my $with_a = buf8.new(0x41, 0xCD, 0x99, 0xE2, 0x83, 0xB0); my $with_b = buf8.new(0x42, 0xCD, 0x99, 0xE2, 0x83, 0xB0); $with_b.decode; $with_a.decode; say $with_a.decode cmp $with_b.decode | ||
camelia | Less | ||
timo | m: my $with_a = buf8.new(0x41, 0xCD, 0x99, 0xE2, 0x83, 0xB0); my $with_b = buf8.new(0x42, 0xCD, 0x99, 0xE2, 0x83, 0xB0); $with_a.decode; $with_b.decode; say $with_a.decode cmp $with_b.decode | 11:10 | |
camelia | Less | ||
timo | ok, with the changes from the PR these all still have to give "Less" | ||
22:06 | kjp left, kjp_ joined |
22:43 | kjp_ left, kjp joined |
22:58 | sena_kun left |