jnthn | I ended up reviewing it anyway | 00:02 | |
Left a couple of small comments but don't see anything serious. | |||
samcv | ok cool :) | 00:04 | |
thanks! | |||
i find my spelling decreases the deeper i get into unicode land | 00:06 | ||
jnthn, does it load the hash table of unicode properties every time moar starts, even before they are used? | 00:13 | |
jnthn | It builds the hashes the first time we need them | 00:14 | |
samcv | kk | ||
jnthn | But I think even the most basic Rakudo invocation ends up using them somewhere | ||
samcv | 1st time property called that we don't have a case for? | ||
jnthn | iirc there's just a static variable that is NULL | 00:15 | |
I don't actually like this much because we can't clean them up | |||
samcv | i know we short circuit some cases | ||
jnthn | Not to mention concurrency control | ||
I'd rather they were hung off MVMInstance | |||
samcv | what would be the best way? | ||
instead of its ops thingy? | |||
oh yeah i saw that comment there | 00:16 | ||
jnthn | To be clear - the data ucd2c spits out being static is fine | ||
But we iterate it at some point to make hashes | |||
And then we never free them on a --full-cleanup, and I don't think we acquire any kind of lock around their setup | |||
samcv | we only iterate over the keys and key values hashes right? | ||
ah | |||
jnthn | Those are the only two I can think of | ||
samcv | i mean would not having a hash be better? | 00:17 | |
i mean they are only used to look up propcodes | |||
jnthn | That can happen relatively often though | 00:18 | |
I think in some of the code-gen the regex engine does it relies on that being cheap | |||
otoh we also constant fold them in spesh, iirc | |||
samcv | well rakudo at least caches property values | ||
jnthn | In uniprop? | ||
samcv | in most places | ||
there and some other spots, not sure about nqp | |||
samcv wonders if we maybe should in nqp | 00:19 | ||
jnthn | Yeah, I'm thinking about the code we generate for /<:Foo>/ and similar. | ||
samcv | nqp has state right? | ||
jnthn | state? | ||
samcv | state $var; | ||
jnthn | No | ||
samcv | vs my | ||
how to retain state? or is it not possible | |||
jnthn | Also, state doesn't enforce any kind of locking scheme. | ||
samcv | ah | ||
jnthn | Though that only matters in some cases | ||
samcv | what would be better to use? | 00:20 | |
jnthn | Well, the classic state emulation trick is just a lexical in the outer scope | ||
samcv | (in either nqp or rakudo) | ||
jnthn | The hashes themselves are fine by me, though | ||
samcv | is it faster to access it from nqp or rakudo or query moarvm each time for the propcode? | ||
i guess i could bench it | |||
jnthn | Yeah | ||
I mean, we'd only need a lock acquisition if the thing isn't already set up | 00:21 | ||
Provided we're careful | |||
It's not that much work to fix the lookup hash init/teardown up to be robust, so that's the path of least resistance at least. | |||
(Also note this has never actually caused a demonstrable problem for somebody.) | 00:22 | ||
samcv | very true | ||
jnthn | (The worst it does is a bit of clutter in valgrind leak check output.) | ||
I do wonder a tad why even a simple perl6 -e '' needs the prop hashes built | 00:23 | |
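A minimal C sketch of the fix jnthn outlines above: the lookup table hangs off an instance struct instead of a file-level static, a lock is only taken when the table is not yet set up, and teardown frees it. `Instance`, `PropLookup`, `build_lookup` and the other names here are hypothetical stand-ins, not MoarVM's actual types or functions.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the hashes built from the static ucd2c data. */
typedef struct {
    int   *codes;
    size_t count;
} PropLookup;

/* Hypothetical stand-in for MVMInstance: owning the table here means
 * a full cleanup can find it and free it. */
typedef struct {
    pthread_mutex_t       prop_lock;
    _Atomic(PropLookup *) prop_lookup;
} Instance;

static void instance_init(Instance *in) {
    pthread_mutex_init(&in->prop_lock, NULL);
    atomic_init(&in->prop_lookup, NULL);
}

/* Builds a dummy table; the real code would iterate the generated data. */
static PropLookup *build_lookup(void) {
    PropLookup *t = malloc(sizeof *t);
    assert(t != NULL);
    t->count = 4;
    t->codes = calloc(t->count, sizeof *t->codes);
    assert(t->codes != NULL);
    for (size_t i = 0; i < t->count; i++)
        t->codes[i] = (int)i + 1;
    return t;
}

static PropLookup *get_lookup(Instance *in) {
    /* Fast path: already published, no lock needed. */
    PropLookup *t = atomic_load(&in->prop_lookup);
    if (t)
        return t;
    /* Slow path: take the lock and re-check before building. */
    pthread_mutex_lock(&in->prop_lock);
    t = atomic_load(&in->prop_lookup);
    if (!t) {
        t = build_lookup();
        atomic_store(&in->prop_lookup, t);
    }
    pthread_mutex_unlock(&in->prop_lock);
    return t;
}

static void instance_destroy(Instance *in) {
    PropLookup *t = atomic_exchange(&in->prop_lookup, NULL);
    if (t) {
        free(t->codes);
        free(t);
    }
    pthread_mutex_destroy(&in->prop_lock);
}
```

The atomic pointer is what makes the unlocked first check safe; a plain pointer read outside the lock would be a data race.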
samcv | well Rakudo::Internals.PROPCODE is 8.8x slower | ||
jnthn | If you don't have the hash? | ||
samcv | testing it 10000000 times | ||
no i mean nqp vs rakudo op | |||
both run from rakudo | |||
jnthn | Ah | 00:24 | |
samcv | that was for misses. guessing it will be similar for non-miss | ||
i mean misses i would think would have a bigger improvement but | |||
jnthn | *nod* | ||
OK, sleep time for me... | 00:25 | ||
samcv | night ! | ||
jnthn | 'night o/ | ||
notviki | jnthn: seems Actions uses nqp::unipropcode in mainline to cache it for parsing fractional numbers: github.com/rakudo/rakudo/blob/nom/....nqp#L7230 | ||
samcv | notviki, that can be optimized for sure. unless unboxing it is super slow | 00:27 | |
or whatever | |||
er wait | |||
no that just caches propcodes not properties | |||
also my int $nu := +nqp::getuniprop_str($code, $nuprop); | 00:28 | ||
why do we call uniprop_str | 00:29 | ||
m: say '1.uniprop('Numeric_Value_Numerator') | |||
camelia | rakudo-moar 9fc616: OUTPUT«===SORRY!=== Error while compiling <tmp>Two terms in a rowat <tmp>:1------> say '1.uniprop('⏏Numeric_Value_Numerator') expecting any of: infix infix stopper postfix statement end …» | ||
samcv | m: say '1'.uniprop('Numeric_Value_Numerator') | ||
camelia | rakudo-moar 9fc616: OUTPUT«1» | ||
samcv | m: say '1'.uniprop-int('Numeric_Value_Numerator') | ||
camelia | rakudo-moar 9fc616: OUTPUT«3» | ||
samcv | what | ||
oh it's an enum... well | |||
guess what | |||
i added integer properties!! :) | 00:30 | ||
so that can be optimized | |||
notviki, actually it's pretty ironic | 00:34 | ||
we are looking up the numerical value of a codepoint. and to look up its numerical value we have to get a string back | 00:35 | ||
which we then have to find its numerical value | |||
to get the final one | |||
hmm on second thought we will have to store a lot more bits if we do it as integer. should have an enum of ints instead of an enum of strings | 00:37 | ||
ack | |||
at least will be easier since I already have an int type | 00:41 | ||
TimToady | for the record, I'm fine with changing cmp to some kind of standard collation semantics if the overhead is only 1% | 02:20 | |
samcv | yeah | 02:21 | |
i think leg we should keep the same | |||
but cmp mostly just 'does what i mean' in my opinion | |||
TimToady | well, arguably, leg is more closely related to strings than cmp is | ||
samcv | exactly | ||
less equal greater sounds more like numerical codepoint less equal greater etc | 02:22 | ||
cmp just seems like compare | |||
TimToady | well, it can be argued the other way 'round too | ||
samcv | i suppose so | ||
TimToady | cmp is more generic, leg is Str-specific | ||
in fact, we've gone around before on the exact semantics of cmp, and realized there's no one solution that preserves every expected consistency | 02:23 | ||
samcv | yeah | ||
TimToady | so I'd be more interested in making leg purely unicodical, while cmp just needs some kind of consistent ordering as a guarantee, so sort never fails | 02:24 | |
just speaking off the cuff... | |||
samcv | sort never fails for this new stuff either | ||
unless the two graphemes are equal it won't return equal | 02:25 | ||
TimToady | if the Str vs Str subset of cmp happens to match leg, that's a plus | ||
samcv | cmp doesn't call leg though. but they may have both the same nqp op i don't remember. | ||
TimToady | well, leg is coerceive, so they could only share some of the impl | 02:26 | |
*cive | |||
samcv | when do you think cmp should compare two strings by human sorting order versus by codepoint? also i have to add | 02:27 | |
comparing two strings, when they have synthetic codepoints is not going to work with the old cmp/leg | |||
it won't collate properly | |||
TimToady | I've sort of been paying half an ear to all this while doing family stuff, but by and large I do like what I'm seeing scroll by | ||
samcv | cool | 02:28 | |
TimToady | but natural sorting is not going to be consistent, any way you implement cmp | ||
samcv | what do you mean by not consistent | ||
consistent to what? | |||
TimToady | unicode collation is a good default for when you know two things are strings, but there are many conflicting ideas about a decent generic comparison | 02:29 | |
samcv | oh yeah i only mean when they _are_ both strings | ||
if they aren't we shouldn't sort that way even if we *could* | |||
TimToady | but there's a certain degree of slop; you'd like things that are vaguely numeric to sort primarily as Real, while things that are vaguely stringy to sort more like leg | 02:30 | |
but how much slop you allow there is negotiable | |||
samcv | but when you want to compare two strings, usually you either want to know if they're equal or not, or which is greater or less than the other, and i think the user should be able to ignore that codepoints exist most of the time | ||
TimToady | so the approaches that say first sort on typename, then within the type, are a bit too rigid for what people expect | 02:31 | |
samcv | if there's some like new-unicode standard with totally different codepoints i'd like cmp to still sort the same | ||
than being totally random (at least for strings) | |||
TimToady | to the extent that we can lump all stringy things together and treat them consistently, AS IF they had been transformed to NFG or NFD, I'm fine with having kind of a three part cmp | 02:32 | |
numeric things, stringy things, and things that fall into neither category | |||
samcv | well i just mean plain Str objects | 02:33 | |
the unicode collation refers to NFC not NFD luckily or it would suck to do | |||
TimToady | which is another way of saying, first things that sort under <=>, then things that sort under leg, then other stuff | ||
well, most of NFG is just NFC, till we hit synthetics, so that's good | 02:34 | ||
samcv | if it's an NFD to NFD comparison i think that is a case where the programmer IS caring about actual codepoints | 02:35 | |
TimToady never really got much into the collation end of things except to make sure we designed so we could do what they want somehow eventually :) | |||
samcv | heh | ||
TimToady appreciates you're digging into it, for that reason | 02:36 | ||
samcv | glad to be here :) | ||
TimToady | my fallback position was that we might need to have a different operator for official collation, but if we can sneak it into leg with relatively little pain, I'm all for keeping things simpler | 02:37 | |
samcv | TimToady, passes all the spectests on 6.c-errata | 02:38 | |
and the only master ones it fails, are tests which uh | |||
TimToady | and we do have ways of saying: $a leg $b :foo($bar) to sneak other parameters into the comparison at need | ||
samcv | test stringified versions of unsorted hashes and things… | ||
which made me a little sad reading the tests, but oh well | |||
TimToady | though I suspect that typical users of non-typical legs will simply define their own operator | 02:39 | |
samcv | you saw the moarvm pull description? i go into pretty good detail about the mvmop | ||
though there's one thing i forgot about | |||
TimToady | haven't actually read it, just seen the discussion go by, but if jnthn++ is happy with it, so am I :) | ||
samcv | tiebreakers. so we have a bitshift where 1 is test primary level (alphabetic/symbol sort), 2 is diacritic type things and 3 is case and 4 is basically implementation defined | 02:40 | |
but if the person doesn't want us to tiebreak in case unicode defines collation for those characters, there should be a way | |||
could just make it 8 though. that's not too hard i guess | 02:41 | ||
since it _is_ tertiary. so. i guess i'm fine. | |||
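A rough C sketch of the level selection samcv describes above: a bitmask picks which collation levels take part, and each enabled level is compared across the whole string before the next is consulted. The flag values (1, 2, 4, 8), the `Elem` struct and the weights are assumptions made for illustration, not the encoding the actual MoarVM op uses and not real Unicode collation data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical level flags; the real op's encoding may differ. */
enum {
    LEVEL_PRIMARY   = 1,  /* base letter / symbol order */
    LEVEL_SECONDARY = 2,  /* diacritics */
    LEVEL_TERTIARY  = 4,  /* case */
    LEVEL_TIEBREAK  = 8   /* implementation-defined; raw codepoint order here */
};

/* One collation element with made-up weights. */
typedef struct {
    int cp;
    int primary, secondary, tertiary;
} Elem;

static int weight(const Elem *e, int level) {
    switch (level) {
        case LEVEL_PRIMARY:   return e->primary;
        case LEVEL_SECONDARY: return e->secondary;
        case LEVEL_TERTIARY:  return e->tertiary;
        default:              return e->cp;
    }
}

/* Compare two element arrays at a single level. */
static int compare_level(const Elem *a, size_t na,
                         const Elem *b, size_t nb, int level) {
    size_t n = na < nb ? na : nb;
    for (size_t i = 0; i < n; i++) {
        int x = weight(&a[i], level), y = weight(&b[i], level);
        if (x != y)
            return x < y ? -1 : 1;
    }
    return (na > nb) - (na < nb);
}

/* Walk the enabled levels in order; the first difference decides. */
static int collate(const Elem *a, size_t na,
                   const Elem *b, size_t nb, int mask) {
    static const int levels[] = {
        LEVEL_PRIMARY, LEVEL_SECONDARY, LEVEL_TERTIARY, LEVEL_TIEBREAK
    };
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < sizeof levels / sizeof levels[0]; i++) {
        if (!(mask & levels[i]))
            continue;
        int r = compare_level(a, na, b, nb, levels[i]);
        if (r != 0)
            return r;
    }
    return 0;
}
```

Leaving `LEVEL_TIEBREAK` out of the mask is one way a caller could opt out of the final tiebreak, so that strings equal at every selected level compare as equal.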
TimToady | my &infix:<myleg> = &infix:<leg>.assuming(:foo($bar)) should handle most of this | ||
samcv | i think i maybe thought of this yesterday but i've been awake a while | ||
TimToady | so we should just pick a reasonable default | 02:42 | |
samcv | github.com/MoarVM/MoarVM/pull/474 you might find this enlightening | 02:46 | |
i pretty well summarize most all the things | |||
and where it could go in the future. so i am hoping that this op will be future portable. and eventually we can implement language specific sorting (someday…) | |||
but yeah please read that, and i think you will understand collation better | |||
TimToady | hmm, I wonder if it should be called 'unileg' instead, since it's string specific | ||
and calling it 'cmp' kinda fights our generic ideas about cmp | |||
I'll try and read it after dinner if I get a chance, but we're visiting family currently, for some reason... :) | |||
samcv | ok :) | ||
02:48
ilbot3 joined
|
|||
samcv | TimToady, it should be noted that on the jvm, their string compare does primary and tertiary levels at least | 02:54 | |
so cmp already behaved differently on the two backends | 02:55 | ||
their string compare function is literally Compare, and Compare is used for many different types of objects | |||
well it seems to actually on jvm perform differently if it is a string versus like a character. which is pretty odd | 03:02 | ||
java actually says they sort 'lexically' and are super vague about it. probably for the reason that they aren't guaranteeing it won't change by leaving it intentionally vague | 03:08 | ||
05:09
pyrimidi_ joined
07:33
pyrimidine joined
09:04
domidumont joined
09:06
FROGGS joined
09:11
domidumont joined
|
|||
samcv | .tell jnthn have a question about synthetic codepoints. How are they generated, deterministically? we will need to have synthetic graphemes for things like Emoji and such, can be like 5 or 6 codepoints for one grapheme | 10:20 | |
yoleaux2 | samcv: I'll pass your message to jnthn. | ||
samcv | hopefully we can do something like this? | 10:21 | |
timotimo | they are not created deterministically | 11:01 | |
samcv, synthetics are assigned in the order of first use | 11:02 | ||
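To illustrate timotimo's point, here is a small C sketch of a table that hands out synthetic IDs in order of first use, so the number a given codepoint sequence receives depends on what the process happened to see earlier. The names, the fixed-size arrays and the choice of negative IDs starting at -1 are assumptions for the sketch, not MoarVM's actual data structure.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_SYNTHETICS 64
#define MAX_CODES      8

/* One stored codepoint sequence that forms a single grapheme. */
typedef struct {
    int32_t codes[MAX_CODES];
    int     num_codes;
} SynthEntry;

typedef struct {
    SynthEntry entries[MAX_SYNTHETICS];
    int        used;
} SynthTable;

/* Return the synthetic ID for a sequence, allocating the next free
 * ID the first time the sequence is seen. */
static int32_t synth_for(SynthTable *t, const int32_t *codes, int n) {
    assert(n > 1 && n <= MAX_CODES);
    for (int i = 0; i < t->used; i++) {
        if (t->entries[i].num_codes == n &&
            memcmp(t->entries[i].codes, codes,
                   (size_t)n * sizeof *codes) == 0)
            return -(i + 1);
    }
    assert(t->used < MAX_SYNTHETICS);
    memcpy(t->entries[t->used].codes, codes, (size_t)n * sizeof *codes);
    t->entries[t->used].num_codes = n;
    t->used++;
    return -t->used;
}
```

Because the ID is just a position in the table, two runs that meet the same sequences in a different order assign them different numbers, which is why synthetic values are not stable across processes.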
jnthn | samcv: So, going to look at your patch | 12:14 | |
yoleaux2 | 10:20Z <samcv> jnthn: have a question about synthetic codepoints. How are they generated, deterministically? we will need to have synthetic graphemes for things like Emoji and such, can be like 5 or 6 codepoints for one grapheme | ||
samcv | ok cool | 12:15 | |
jnthn | samcv: About GraphemeIter, actually if collation is defined on NFC then I'd just use CodepointIter | ||
Which will give you codepoints | |||
So you don't have to care about synthetics at all | |||
samcv | so this patch is 'perfect' (i guess) i have more changes staged but they trigger the <:space> bug. but | ||
ah k | |||
jnthn | And yeah, you'd have an iter for each string | ||
samcv | cool, will look into that | ||
jnthn | Basically, GraphemeIter and CodepointIter let you work through the units in a string | 12:16 | |
At grapheme or (NFC) codepoint level | |||
samcv | ah k | ||
jnthn | And they take care of the fact that the string may be represented in memory in a few different ways | ||
samcv | oh and i have bidi matching brackets implemented in a branch of nqp which is super neat | ||
jnthn | Including traversing strands | ||
samcv | for matching delimiters | ||
jnthn | ooh :) | ||
samcv | yeah :) | 12:17 | |
jnthn, pretty sweet github.com/perl6/nqp/commit/7b68aa...d5e60ba853 | |||
12:18
FROGGS joined
|
|||
samcv | no longer will we have to search through a pretty long string of all delimiters + search and find the other bracket (inside the text we're parsing) | 12:18 | |
jnthn | \o/ | 12:19 | |
samcv | and we can also improve the error message like if you do Q<? > where the ? is a combining char | ||
because we can see the delimiter by property, and see it has no match(when it should if it's a normal thing) | 12:20 | ||
then say oh hey | |||
looks like you have a combining character directly after | |||
instead of trying to match it as a grapheme for how we allow arbitrary delimiters | 12:21 | ||
jnthn | Just to make sure: if I merge PR #474 we won't be busting spectests? | ||
samcv | yep! | ||
heh | |||
not gonna make that mistake again haha | |||
jnthn | :) | 12:22 | |
samcv | 6.c errata and master are All Correct | ||
jnthn | Unless you'd prefer otherwise, I'm inclined to merge this now | ||
samcv | nope go ahead and merge :) | ||
\o/ | |||
jnthn | And using codepoint iter or adding the Unicode DB script can come later | ||
dalek | MoarVM: 874b6bd | jnthn++ | / (5 files): Revert "Fix RT #122471 and #122470 return <control-0000> for \0 and other controls" |
12:23 | |
MoarVM: e8a2c43 | samcv++ | / (4 files): Add code for processing the Unicode collation weights to ucd2c.pl remove unused MVMROOT |
|||
synopsebot6 | Link: rt.perl.org/rt3//Public/Bug/Displa...?id=122471 | ||
Link: rt.perl.org/rt3//Public/Bug/Displa...?id=122470 | |||
samcv | yep | ||
samcv++ | |||
jnthn | There we go, merged | ||
samcv | am i allowed to ++ myself? | ||
lol. | |||
jnthn | samcv++ | ||
:) | |||
samcv | also. why did dalek say that | 12:24 | |
weird | |||
oh well. | |||
12:24
dalek joined
|
|||
jnthn | Not sure about why it reported the first commit | 12:25 | |
It has a tendency to flood and get itself booted for some reason when reporting a bunch of commits | |||
samcv | heh | ||
MoarVM: 0a170ab | (Jimmy Zhuo)++ | src/spesh/deopt.c: remove another unused MVMROOT |
|||
MoarVM: 293bda7 | jnthn++ | src/ (2 files): Merge pull request #459 from MoarVM/needless_mvmroot Remove needless MVMROOT usages |
|||
jnthn | There, I think I'm caught up on MoarVM PR review for the moment :) | 12:26 | |
samcv checks when her paperback unicode books come | 12:28 | ||
jnthn | ooh, you ordered the paperback version? | 12:29 | |
jnthn just read PDFs :) | |||
samcv | yea | ||
like 8 bucks each volume, there's two volumes total | |||
i think they're printed on demand or something so might be a bit. i want something i can leaf through | |||
i mean. | |||
would be really nice. and for 19 bucks you can have all 850ish pages | 12:30 | ||
prolly price with shipping or something idk | |||
jnthn | I did most of my reading of them when I was doing the NFG implementation. | ||
samcv | i didn't know they had a paperback book because wiki said the last one was unicode 5 | ||
jnthn | Did at least have an iPad to read them on, which was a bit more comfortable than laptop :) | 12:31 | |
Paperback woulda been nicer again...didn't even think of it. That said, I was away from home the month I did the NFG impl, and not sure I'd have felt like lugging the books with me even if I had 'em. | 12:34 | ||
samcv | i didn't know it existed until like | 12:36 | |
somehow discovered it on some odd unicode.org blog | |||
12:38
dalek joined
12:39
travis-ci joined
|
|||
travis-ci | MoarVM build failed. Jonathan Worthington 'Merge pull request #474 from samcv/unicode_collation | 12:39 | |
travis-ci.org/MoarVM/MoarVM/builds/187680690 github.com/MoarVM/MoarVM/compare/3...3c723be3e1 | |||
12:39
travis-ci left
|
|||
samcv | what | 12:40 | |
i just built it... | |||
oh only one of them failed? weird ===SORRY!=== | 12:41 | ||
No suitable MoarVM (moar executable) found using the --prefix | |||
(You can get a MoarVM built automatically with --gen-moar.) | |||
jnthn | Oh | 12:42 | |
12:42
FROGGS joined
|
|||
jnthn | Yeah, there's a bit of a race here | 12:42 | |
We get the MoarVM Travis build to also grab an NQP and build it | |||
Unfortunately, if it's a bit slow starting off the builds, and you bump MOAR_REVISION in NQP, then it ends up grabbing an NQP that wants a Moar newer than the commit it's testing :S | 12:43 | ||
This does tend to rectify itself when it builds the next commit | |||
FROGGS | samcv++ | 12:44 | |
13:34
timo joined
13:52
nebuchadnezzar joined
14:24
FROGGS joined
14:50
nebuchadnezzar joined
15:45
zakharyas joined
16:05
Ven joined
16:32
Ven joined
|
|||
dalek | MoarVM/even-moar-jit: 11ead5e | brrt++ | src/jit/ (3 files): Fix some more bugs with linear_scan We should not account uses for non-register references. I think there is still a bug with register releasing. Because the MVM_JIT_ARCH macro trickery relies on the symbol name of the macro, and because we also need to distinguish between possible values, it's necessary to #define MVM_JIT_ARCH_X64 etc. and later #undef them. I'm looking into a more regular way to solve this. |
16:39 | |
16:48
pyrimidine joined
16:52
Ven joined
17:32
Ven joined
17:52
Ven joined
|
|||
dogbert2 | jnthn: are you lurking or resting :) | 17:55 | |
just wondering if the following gist contains any useful information that might come in handy next year? gist.github.com/dogbert17/4096deaa...2f7b6834c7 | 17:59 | ||
18:00
zakharyas joined
18:12
Ven joined
18:32
Ven joined
|
|||
jnthn | dogbert2: Well, not anything that's new to me at least. There's an interesting general issue there with lazy evaluation. | 18:37 | |
18:50
pyrimidine joined
18:52
Ven joined
|
|||
dogbert2 | jnthn: is there anything I can do in order to get more information or do you think that you have enough already? | 18:54 | |
jnthn | dogbert2: I understand that one (at least, in terms of the problem) well already. :) | 18:55 | |
walk & | 18:57 | ||
dogbert2 | jnthn: thx, will look for something else then. have a nice walk. | 18:59 | |
19:03
Ven joined
19:23
Ven joined
19:49
domidumont joined
20:02
Ven joined
|
|||
dogbert2 | a new gist hopefully moar :) interesting: gist.github.com/dogbert17/801b712c...37398514b5 | 20:16 | |
20:21
Ven joined
20:25
zakharyas joined
20:29
pyrimidine joined
|
|||
jnthn | That's a good one... | 20:32 | |
dogbert2 | seems 100% reproducible as well | 20:35 | |
jnthn | Goodie :) | 20:37 | |
dogbert2 | sometimes it also spews out a lot of stuff before the panic, like | 20:38 | |
jnthn | Will have a look into that next week once I get back to things. | ||
dogbert2 | 6opaque: no such attribute '$!named' in type QAST::Var+{QAST::SpecialArg} when trying to bind a value | ||
in any named at gen/moar/stage2/QASTNode.nqp line 30 | |||
should I write a MoarVM issue for it or is the RT enough? | 20:39 | ||
jnthn | Feel free to file a MoarVM issue | 20:42 | |
And reference the RT | |||
dogbert2 | will do | ||
jnthn | Thanks :) | ||
20:50
Ven joined
21:09
Ven joined
21:30
Ven joined
|
|||
samcv | jnthn, uhm was having an issue with trying to fix grapheme count | 21:45 | |
for ZWJ | |||
oh wait nvm i'm half awake | 21:46 | ||
let me wake up i cannot even put this in words properly atm ;) | |||
jnthn | Oh, I mighta just answered it on #perl6-dev? :) | ||
ZWJ rings a bell 'cus the exact treatment of it in the Unicode text segmentation algo changed between Unicode 8 and 9. | 21:47 | |
I *think* it was ZWJ anyway | 21:48 | ||
samcv | yeah it is ZWJ but i need to program in more cases | 21:51 | |
but i wasn't able to affect change, it just didn't seem to do anything | 21:52 | ||
even when i had it panic in the same function and deleted moar. maybe that function isn't called much | |||
jnthn | I didn't actually fully do the new TR29 :/ | ||
It...seemed to contradict itself. | |||
(It claimed you only needed to look at the immediately surrounding 2 chars. But also had rules about regional indicators and odd/even counts of them.) | 21:53 | ||
samcv | yeah it is complicated ;) | 21:54 | |
jnthn | Well, complicated is OK, contradicting itself in the space of a few paragraphs less so ;) | ||
And if we *do* need to track odd/even we'll need a refactor | 21:55 | ||
samcv | i think we want to do a quick-check and see if it's emoji and then be able to do more checks | ||
yep | |||
oh but this shouldn't be too hard to fix. we need to not break when ZWJ + Emoji character | 21:56 | ||
(let's say there are only two characters) i was trying to get it to count as one grapheme | |||
let me open the file | |||
jnthn | OK, *that* bit will be easy enough | ||
samcv | yeah | ||
was going to do that to start with | |||
jnthn | Go ahead :) | 21:57 | |
samcv | will at least make us more accurate | ||
jnthn | Yeah | ||
jnthn is glad somebody else is hacking on this stuff :) | |||
lizmat too | |||
jnthn: BTW,. are you aware that --profile doesn't produce any Allocation info anymore ? | |||
jnthn | It's not that I don't like working on it (I find Unicode stuff interesting), it's just that there's so many things that need looking at... | 21:58 | |
lizmat: No :( | |||
samcv | jnthn, i was looking at `should_break` and somehow | ||
even when i had moarvm panic and clean install it didn't even panic | |||
lizmat | jnthn: should I make that a MoarVM issue? | ||
samcv | maybe that's not called in many cases? which one do i need to look | ||
jnthn | samcv: There's an NFG quickcheck property | 21:59 | |
samcv | jnthn, agreed | ||
jnthn | samcv: That hides it. | ||
samcv | ah | ||
poo! | |||
jnthn | "hides" | ||
So yeah, you'd need to fiddle with that too :) | |||
samcv | so what is this NFG quickcheck | ||
where do i have to look | |||
jnthn | lizmat: I don't know that code has changed at all in MoarVM. (more) | ||
samcv | should_break is only called if conditions for our moar NFG quickcheck are a certain way? | ||
jnthn | lizmat: I did however notice that it was rendering a bit odd with the tabs last time I used it in Firefox | 22:00 | |
lizmat: So I wonder if one of the AngularJS version updates mighta knocked it out | |||
lizmat | ah, possibly only on Safari ? | ||
hmmm./... | |||
jnthn | Think it was Firefox I saw that oddness in | ||
samcv: Yes. The trail starts in normalize.h, where we try to avoid doing anything expensive. | 22:01 | ||
samcv: NFG quickcheck is a synthetic Unicode property (as in, we add it ourselves). It's handled in ucd2c.pl | |||
samcv | yeah i understand that part. oh ok | 22:02 | |
jnthn | lizmat: Anyway, if that is the place it busted (most likely IMO) then the code in question is in the NQP repo, not Moar | ||
samcv | about it being our own synthetic property | ||
jnthn | samcv: iirc I did it based on the NFC quick check and then some extra bits | ||
lizmat | jnthn: ok, will try to find that | ||
samcv | kk | ||
lizmat | it's very annoying to me :-) | ||
samcv curses ucd2c.pl | |||
jnthn | I think (hope!) I wrote a few comments explaining it even ;) | ||
Well, that's one of the bits of ucd2c.pl that I wrote, so at least I should be able to explain it :P | |||
samcv | # If it's the NFC_QC property, then use this as the default value for | 22:03 | |
# NFG_QC also. | |||
so i need to set NFG_QC to uh. 0? to get it to handle it with more conditions? | |||
jnthn | umm, lemme see | ||
sub tweak_nfg_qc should be informative | 22:04 | ||
But yes, set it to 0 is the trick | |||
samcv | there are only certain characters currently used in ZWJ sequences | ||
so will probably have it set to 0 for those ones | |||
so we can do some checking and make sure it's not a combining sequence | |||
and prolly will add the ones in Emoji 5.0 too i don't want to have to do this again | 22:05 | ||
and they aren't going to remove any, anyway | |||
since unlike a lot of unicode stuff, the usage of them is predating the actual codification | |||
err. the _final_ release of the spec that is | |||
what happens if I set NFG_QC to 0 for everything? theoretically we should _still_ have it work the same or will it break things | 22:06 | ||
even though it will be slower | |||
may be a good thing to check. what do you think about that | 22:07 | ||
22:09
Ven joined
|
|||
jnthn | Sorry, got dragged away to taste freshly made vinegret :D | 22:12 | |
If you set it to zero for everything then we'd end up in should_break always (and it'd be slow :)) | 22:14 | ||
But the idea is to set it to zero if it can ever answer "no" if it's one of the arguments to should_break | 22:15 | ||
samcv | ah ok | ||
jnthn | ZWJ already has it as 0 github.com/MoarVM/MoarVM/blob/mast...c.pl#L1720 | ||
Since I believe multiple emoji can now form a single grapheme, we should likely set those to 0 also | 22:18 | ||
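The gating jnthn describes can be sketched in C roughly as follows: a cheap per-codepoint quick check answers "always breaks" for most pairs, and the full rules in `should_break` only run when either side has the property set to 0. That is also why a change (or a panic) inside `should_break` is never seen for codepoints that pass the quick check. `nfg_qc`, `breaks_between`, the call counter and the codepoint ranges are toy assumptions, not the real property data or MoarVM's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Counts how often the slow path runs, to make the gating visible. */
static int should_break_calls = 0;

/* Toy quick-check property: false (0) means "needs the full rules". */
static bool nfg_qc(int32_t cp) {
    if (cp == 0x200D)
        return false;                     /* ZERO WIDTH JOINER */
    if (cp >= 0x0300 && cp <= 0x036F)
        return false;                     /* combining diacritical marks */
    return true;
}

/* Toy version of the full rules: is there a grapheme break between a and b? */
static bool should_break(int32_t a, int32_t b) {
    should_break_calls++;
    if (b == 0x200D)
        return false;                     /* no break before ZWJ */
    if (b >= 0x0300 && b <= 0x036F)
        return false;                     /* no break before a combining mark */
    if (a == 0x200D && b >= 0x1F300 && b <= 0x1FAFF)
        return false;                     /* toy range: ZWJ followed by emoji */
    return true;
}

/* Fast path first; only fall through to should_break when needed. */
static bool breaks_between(int32_t a, int32_t b) {
    if (nfg_qc(a) && nfg_qc(b))
        return true;
    return should_break(a, b);
}
```

Setting the property to 0 for every codepoint would route every pair through `should_break`, which stays correct but gives up the fast path.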
22:28
Ven joined
22:49
Ven joined
|
|||
samcv | yeah | 22:50 | |
mst | unicode: the gift that keeps on giving | ||
samcv | m: "a\x[200C]b".chars.say | 22:51 | |
camelia | rakudo-moar b2332c: OUTPUT«2» | ||
samcv | that is fine. 200C is often used to specify ligatures were used | ||
m: Uni.new(0x200D, 0x1F1E6).say | 22:57 | ||
camelia | rakudo-moar b2332c: OUTPUT«Uni:0x<200d 1f1e6>» | ||
samcv | m: Uni.new(0x200D, 0x1F1E6).Str.say | ||
camelia | rakudo-moar b2332c: OUTPUT«🇦» | ||
23:08
Ven joined
|
|||
samcv | jnthn, do we have a call to check which unicode version we have? | 23:18 | |
if we don't, we should | 23:19 | ||
and i'll automate setting that with ucd2c.pl | |||
23:28
Ven joined
|
|||
samcv | woo 51/744 failing tests => 45 failed only | 23:37 | |
jnthn, you missed the Grapheme_Cluster_Break property for setting NFG_QC!!! | 23:40 | ||
well I am here now! | |||
will all be fixed :) | |||
jnthn, did you see my comments on java's string compareTo function? | 23:41 | ||
23:48
Ven joined
23:56
Ven joined
|
|||
samcv | they totally duck collation and sorting issues using any actual real algorithm, just say they sort it "lexically" and are super vague about how that works | 23:58 |