jnthn I ended up reviewing it anyway 00:02
Left a couple of small comments but don't see anything serious.
samcv ok cool :) 00:04
thanks!
i find my spelling decreases the deeper i get into unicode land 00:06
jnthn, does it load the hash table of unicode properties every time moar starts, even before they're used? 00:13
jnthn It builds the hashes the first time we need them 00:14
samcv kk
jnthn But I think even the most basic Rakudo invocation ends up using them somewhere
samcv 1st time property called that we don't have a case for?
jnthn iirc there's just a static variable that is NULL 00:15
I don't actually like this much because we can't clean them up
samcv i know we short circuit some cases
jnthn Not to mention concurrency control
I'd rather they were hung off MVMInstance
samcv what would be the best way?
instead of its ops thingy?
oh yeah i saw that comment there 00:16
jnthn To be clear - the data ucd2c spits out being static is fine
But we iterate it at some point to make hashes
And then we never free them on a --full-cleanup, and I don't think we acquire any kind of lock around their setup
samcv we only iterate over the keys and key values hashes right?
ah
jnthn Those are the only two I can think of
samcv i mean would not having a hash be better? 00:17
i mean they are only used to look up propcodes
jnthn That can happen relatively often though 00:18
I think in some of the code-gen the regex engine does it relies on that being cheap
otoh we also constant fold them in spesh, iirc
samcv well rakudo at least caches property values
jnthn In uniprop?
samcv in most places
there and some other spots, not sure about nqp
samcv wonders if we maybe should in nqp 00:19
jnthn Yeah, I'm thinking about the code we generate for /<:Foo>/ and similar.
samcv nqp has state right?
jnthn state?
samcv state $var;
jnthn No
samcv vs my
how to retain state? or is it not possible
jnthn Also, state doesn't enforce any kind of locking scheme.
samcv ah
jnthn Though that only matters in some cases
samcv what would be better to use? 00:20
jnthn Well, the classic state emulation trick is just a lexical in the outer scope
samcv (in either nqp or rakudo)
jnthn The hashes themselves are fine by me, though
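A minimal sketch of the "lexical in the outer scope" trick jnthn mentions, written in Python rather than NQP. The names `make_propcode_lookup` and `slow_resolve` are hypothetical, the returned values are placeholders rather than real property codes, and, as noted above for state variables, nothing here does any locking.

```python
def make_propcode_lookup(resolve):
    """Return a lookup function that remembers results in an enclosing-scope dict."""
    cache = {}  # lives in the outer scope, so it survives between calls

    def lookup(name):
        if name not in cache:
            cache[name] = resolve(name)  # only reached on the first request
        return cache[name]

    return lookup


# Hypothetical resolver standing in for an expensive query to the VM.
calls = []

def slow_resolve(name):
    calls.append(name)
    return len(name)  # placeholder value, not a real property code


propcode = make_propcode_lookup(slow_resolve)
first = propcode("Numeric_Value_Numerator")
second = propcode("Numeric_Value_Numerator")  # served from the cache
```

The second call never reaches `slow_resolve`, which is the whole point of keeping the variable outside the routine that uses it.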
samcv is it faster to access it from nqp or rakudo or query moarvm each time for the propcode?
i guess i could bench it
jnthn Yeah
I mean, we'd only need a lock acquisition if the thing isn't already set up 00:21
Provided we're careful
It's not that much work to fix the lookup hash init/teardown up to be robust, so that's the path of least resistance at least.
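A rough sketch of the init/teardown shape being discussed, in Python rather than MoarVM's C. It assumes a per-VM `Instance` object (standing in for MVMInstance) that owns the hash, takes a lock only when the hash is not yet built, and can drop it again on cleanup. The class, method names, and the "0 means unknown" return value are all hypothetical. In C the unlocked fast-path read would also need care with memory ordering, which this sketch does not model.

```python
import threading


class Instance:
    """Stand-in for a per-VM instance object that owns the lookup hash."""

    def __init__(self, static_table):
        self._static_table = static_table   # generated static data, never freed
        self._prop_codes = None              # built lazily on first lookup
        self._setup_lock = threading.Lock()
        self.builds = 0                      # how many times the hash was built

    def prop_code(self, name):
        table = self._prop_codes
        if table is None:                    # fast path skips the lock once built
            with self._setup_lock:
                table = self._prop_codes
                if table is None:            # re-check: another thread may have won
                    table = dict(self._static_table)
                    self.builds += 1
                    self._prop_codes = table
        return table.get(name, 0)            # 0 for unknown names (assumption)

    def full_cleanup(self):
        """Drop the lazily built hash; the static source data is left alone."""
        with self._setup_lock:
            self._prop_codes = None


vm = Instance([("Script", 1), ("Numeric_Type", 2)])
threads = [threading.Thread(target=vm.prop_code, args=("Script",)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

However the eight threads interleave, the hash is built once, and `full_cleanup` leaves nothing behind for a leak checker to report.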
(Also note this has never actually caused a demonstrable problem for somebody.) 00:22
samcv very true
jnthn (The worst it does is a bit of clutter in valgrind leak check output.)
I do wonder a tad why even a simple perl6 -e '' needs the prop hashes built 00:23
samcv well Rakudo::Internals.PROPCODE is 8.8x slower
jnthn If you don't have the hash?
samcv testing it 10000000 times
no i mean nqp vs rakudo op
both run from rakudo
jnthn Ah 00:24
samcv that was for misses. guessing it will be similar for non-miss
i mean misses i would think would have a bigger improvement but
jnthn *nod*
OK, sleep time for me... 00:25
samcv night !
jnthn 'night o/
notviki jnthn: seems Actions uses nqp::unipropcode in mainline to cache it for parsing fractional numbers: github.com/rakudo/rakudo/blob/nom/....nqp#L7230
samcv notviki, that can be optimized for sure. unless unboxing it is super slow 00:27
or whatever
er wait
no that just caches propcodes not properties
also my int $nu := +nqp::getuniprop_str($code, $nuprop); 00:28
why do we call uniprop_str 00:29
m: say '1.uniprop('Numeric_Value_Numerator')
camelia rakudo-moar 9fc616: OUTPUT«===SORRY!=== Error while compiling <tmp>␤Two terms in a row␤at <tmp>:1␤------> say '1.uniprop('⏏Numeric_Value_Numerator')␤ expecting any of:␤ infix␤ infix stopper␤ postfix␤ statement end␤ …»
samcv m: say '1'.uniprop('Numeric_Value_Numerator')
camelia rakudo-moar 9fc616: OUTPUT«1␤»
samcv m: say '1'.uniprop-int('Numeric_Value_Numerator')
camelia rakudo-moar 9fc616: OUTPUT«3␤»
samcv what
oh it's an enum... well
guess what
i added integer properties!! :) 00:30
so that can be optimized
notviki, actually it's pretty ironic 00:34
we are looking up the numerical value of a codepoint. and to look up its numerical value we have to get a string back 00:35
which we then have to find its numerical value
to get the final one
hmm on second thought we will have to store a lot more bits if we do it as integer. should have an enum of ints instead of an enum of strings 00:37
ack
at least will be easier since I already have an int type 00:41
TimToady for the record, I'm fine with changing cmp to some kind of standard collation semantics if the overhead is only 1% 02:20
samcv yeah 02:21
i think leg we should keep the same
but cmp mostly just 'does what i mean' in my opinion
TimToady well, arguably, leg is more closely related to strings than cmp is
samcv exactly
less equal greater sounds more like numerical codepoint less equal greater etc 02:22
cmp just seems like compare
TimToady well, it can be argued the other way 'round too
samcv i suppose so
TimToady cmp is more generic, leg is Str-specific
in fact, we've gone around before on the exact semantics of cmp, and realized there's no one solution that preserves every expected consistency 02:23
samcv yeah
TimToady so I'd be more interested in making leg purely unicodical, whilc cmp just needs some kind of consistent ordering as a guarantee, so sort never fails 02:24
just speaking off the cuff...
samcv sort never fails for this new stuff either
unless the two graphemes are equal it won't return equal 02:25
TimToady if the Str vs Str subset of cmp happens to match leg, that's a plus
samcv cmp doesn't call leg though. but they may have both the same nqp op i don't remember.
TimToady well, leg is coerceive, so they could only share some of the impl 02:26
*cive
samcv when do you think cmp should compare two strings by human sorting order versus by codepoint? also i have to add 02:27
comparing two strings, when they have synthetic codepoints is not going to work with the old cmp/leg
it won't collate properly
TimToady I've sort of been paying half an ear to all this while doing family stuff, but by and large I do like what I'm seeing scroll by
samcv cool 02:28
TimToady but natural sorting is not going to be consistent, any way you implement cmp
samcv what do you mean by not consistent
consistent to what?
TimToady unicode collation is a good default for when you know two things are strings, but there are many conflicting ideas about a decent generic comparison 02:29
samcv oh yeah i only mean when they _are_ both strings
if they aren't we shouldn't sort that way even if we *could*
TimToady but there's a certain degree of slop; you'd like things that are vaguely numeric to sort primarily as Real, while things that are vaguely stringy to sort more like leg 02:30
but how much slop you allow there is negotiable
samcv but when you want to compare two strings, usually you either want to know if they're equal or not, or which is greater or less than the other, and i think the user should be able to ignore that codepoints exist most of the time
TimToady so the approaches that say first sort on typename, then within the type, are a bit too rigid for what people expect 02:31
samcv if there's some like new-unicode standard with totally different codepoints i'd like cmp to still sort the same
than being totally random (at least for strings)
TimToady to the extent that we can lump all stringy things together and treat them consistently, AS IF they had been transformed to NFG or NFD, I'm fine with having kind of a three part cmp 02:32
numeric things, stringy things, and things that fall into neither category
samcv well i just mean plain Str objects 02:33
the unicode collation refers to NFC not NFD luckily or it would suck to do
TimToady which is another way of saying, first things that sort under <=>, then things that sort under leg, then other stuff
well, most of NFG is just NFC, till we hit synthetics, so that's good 02:34
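A small Python sketch of the three-part cmp TimToady describes: numeric things first, then stringy things, then everything else. The bucket rules and the fallback ordering for the third group are assumptions made for illustration, and plain codepoint order stands in for whatever collation the string bucket would really use.

```python
from functools import cmp_to_key
from numbers import Real


def bucket(value):
    """0 for numeric things, 1 for stringy things, 2 for everything else."""
    if isinstance(value, Real) and not isinstance(value, bool):
        return 0
    if isinstance(value, str):
        return 1
    return 2


def generic_cmp(a, b):
    """Return -1, 0 or 1 for any pair, so sorting mixed values cannot fail."""
    ba, bb = bucket(a), bucket(b)
    if ba != bb:
        return -1 if ba < bb else 1
    if ba == 2:
        # Fallback ordering for "other stuff": type name, then repr.
        a, b = (type(a).__name__, repr(a)), (type(b).__name__, repr(b))
    return (a > b) - (a < b)


mixed = ["b", 10, None, "a", 2.5, (1, 2)]
ordered = sorted(mixed, key=cmp_to_key(generic_cmp))
```

Integers and floats share one bucket here, which is one way to get the "slop" for vaguely numeric values without sorting on type name first.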
samcv if it's an NFD to NFD comparison i think that is a case where the programmer IS caring about actual codepoints 02:35
TimToady never really got much into the collation end of things except to make sure we designed so we could do what they want somehow eventually :)
samcv heh
TimToady appreciates you're digging into it, for that reason 02:36
samcv glad to be here :)
TimToady my fallback position was that we might need to have a different operator for official collation, but if we can sneak it into leg with relatively little pain, I'm all for keeping things simpler 02:37
samcv TimToady, passes all the spectests on 6.c-errata 02:38
and the only master ones it fails, are tests which uh
TimToady and we do have ways of saying: $a leg $b :foo($bar) to sneak other parameters into the comparison at need
samcv test stringified versions of unsorted hashes and things…
which made me a little sad reading the tests, but oh well
TimToady though I suspect that typical users of non-typical legs will simply define their own operator 02:39
samcv you saw the moarvm pull description? i go into pretty good detail about the mvmop
though there's one thing i forgot about
TimToady haven't actually read it, just seen the discussion go by, but if jnthn++ is happy with it, so am I :)
samcv tiebreakers. so we have a bitshift where 1 is test primary level (alphabetic/symbol sort), 2 is diacritic type things and 3 is case and 4 is basically implementation defined 02:40
but if the person doesn't want us to tiebreak in case unicode defines collation for those characters, there should be a way
could just make it 8 though. that's not too hard i guess 02:41
since it _is_ tertiary. so. i guess i'm fine.
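To make the level flags concrete, here is a toy Python sketch of a comparison that honours only the enabled levels. The flag values 1, 2, 4, 8 are an assumption based on the "could just make it 8" remark, the weight table is invented for six letters, and none of this reflects the actual MoarVM op.

```python
PRIMARY, SECONDARY, TERTIARY, TIEBREAK = 1, 2, 4, 8  # assumed flag values

# Toy (primary, secondary, tertiary) weights; real ones come from Unicode data.
WEIGHTS = {
    "a": (1, 0, 0), "A": (1, 0, 1),
    "\u00e1": (1, 1, 0), "\u00c1": (1, 1, 1),   # a-acute, A-acute
    "b": (2, 0, 0), "B": (2, 0, 1),
}


def sign(x, y):
    return (x > y) - (x < y)


def collate(a, b, levels=PRIMARY | SECONDARY | TERTIARY | TIEBREAK):
    """Compare two strings level by level, honouring only the enabled levels."""
    wa = [WEIGHTS[ch] for ch in a]
    wb = [WEIGHTS[ch] for ch in b]
    for index, flag in enumerate((PRIMARY, SECONDARY, TERTIARY)):
        if levels & flag:
            result = sign([w[index] for w in wa], [w[index] for w in wb])
            if result:
                return result
    if levels & TIEBREAK:
        return sign(a, b)   # last resort: plain codepoint order
    return 0
```

With only `PRIMARY` enabled, "a" and "A" compare as equal. With all levels on, the case difference decides, and a primary difference later in the string still outranks a case difference earlier in it. Leaving out `TIEBREAK` lets a caller get "equal" for strings whose weights match at every enabled level.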
TimToady my &infix:<myleg> = &infix:<leg>.assuming(:foo($bar)) should handle most of this
samcv i think i maybe thought of this yesterday but i've been awake a while
TimToady so we should just pick a reasonable default 02:42
samcv github.com/MoarVM/MoarVM/pull/474 you might find this enlightening 02:46
i pretty well summarize most all the things
and where it could go in the future. so i am hoping that this op will be future portable. and eventually we can implement language specific sorting (someday…)
but yeah please read that, and i think you will understand collation better
TimToady hmm, I wonder if it should be called 'unileg' instead, since it's string specific
and calling it 'cmp' kinda fights our generic ideas about cmp
I'll try and read it after dinner if I get a chance, but we're visiting family currently, for some reason... :)
samcv ok :)
02:48 ilbot3 joined
samcv TimToady, it should be noted that on the jvm, their string compare does primary and tertiary levels at least 02:54
so cmp already behaved differently on the two backends 02:55
their string compare function is literally Compare, and Compare is used for many different types of objects
well it seems to actually on jvm perform differently if it is a string versus like a character. which is pretty odd 03:02
java actually says they sort 'lexically' and are super vague about it. probably for the reason that they aren't guaranteeing it won't change by leaving it intentionally vague 03:08
05:09 pyrimidi_ joined 07:33 pyrimidine joined 09:04 domidumont joined 09:06 FROGGS joined 09:11 domidumont joined
samcv .tell jnthn have a question about synthetic codepoints. How are they generated, deterministically? we will need to have synthetic graphemes for things like Emoji and such, can be like 5 or 6 codepoints for one grapheme 10:20
yoleaux2 samcv: I'll pass your message to jnthn.
samcv hopefully we can do something like this? 10:21
timotimo they are not created deterministically 11:01
samcv, synthetics are assigned in the order of first use 11:02
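A hypothetical Python sketch of what "assigned in the order of first use" implies. The `SyntheticTable` class is invented for illustration, and using negative ids to keep synthetics apart from real codepoints is an assumption of this sketch.

```python
class SyntheticTable:
    """Hands out ids for multi-codepoint graphemes in order of first use."""

    def __init__(self):
        self._by_codes = {}   # tuple of codepoints -> synthetic id
        self._by_id = []      # position -> tuple of codepoints

    def lookup_or_add(self, codes):
        codes = tuple(codes)
        if codes not in self._by_codes:
            self._by_id.append(codes)
            # Negative ids keep synthetics apart from real codepoints (>= 0).
            self._by_codes[codes] = -len(self._by_id)
        return self._by_codes[codes]

    def codes_for(self, synthetic):
        return self._by_id[-synthetic - 1]


flag = (0x1F1E6, 0x1F1FA)             # two regional indicators
family = (0x1F468, 0x200D, 0x1F469)   # a ZWJ sequence

run_one, run_two = SyntheticTable(), SyntheticTable()
run_one.lookup_or_add(flag)
run_one.lookup_or_add(family)
run_two.lookup_or_add(family)         # same graphemes, opposite order
run_two.lookup_or_add(flag)
```

The same grapheme ends up with a different id in each run, so the ids are only meaningful inside one process and cannot be treated as stable values.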
jnthn samcv: So, going to look at your patch 12:14
yoleaux2 10:20Z <samcv> jnthn: have a question about synthetic codepoints. How are they generated, deterministically? we will need to have synthetic graphemes for things like Emoji and such, can be like 5 or 6 codepoints for one grapheme
samcv ok cool 12:15
jnthn samcv: About GraphemeIter, actually if collation is defined on NFC then I'd just use CodepointIter
Which will give you codepoints
So you don't have to care about synthetics at all
samcv so this patch is 'perfect' (i guess) i have more changes staged but they trigger the <:space> bug. but
ah k
jnthn And yeah, you'd have an iter for each string
samcv cool, will look into that
jnthn Basically, GraphemeIter and CodepointIter let you work through the units in a string 12:16
At grapheme or (NFC) codepoint level
samcv ah k
jnthn And they take care of the fact that the string may be represented in memory in a few different ways
samcv oh and i have bidi matching brackets implemented in a branch of nqp which is super neat
jnthn Including traversing strands
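The difference between the two iteration levels can be sketched in Python with the standard `unicodedata` module. The function names below only echo the iterators jnthn describes and are not MoarVM's API. The grapheme version is a deliberate simplification that attaches combining marks to the preceding base character, whereas real segmentation follows the Unicode text segmentation rules.

```python
import unicodedata


def codepoint_iter(text):
    """Yield the codepoints of the NFC form of text."""
    for ch in unicodedata.normalize("NFC", text):
        yield ord(ch)


def grapheme_iter(text):
    """Yield a base character together with its following combining marks.

    Simplification: only characters with a non-zero canonical combining
    class are attached to the preceding character.
    """
    cluster = ""
    for ch in unicodedata.normalize("NFC", text):
        if cluster and unicodedata.combining(ch) == 0:
            yield cluster
            cluster = ""
        cluster += ch
    if cluster:
        yield cluster


sample = "e\u0301q\u0323"   # e + combining acute, q + combining dot below
codes = list(codepoint_iter(sample))
units = list(grapheme_iter(sample))
```

The sample has two user-visible units but three NFC codepoints: "e" plus acute composes to a single codepoint, while "q" plus dot below has no precomposed form. A collation pass that walks codepoints never has to know how the second unit is stored.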
samcv for matching delimiters
jnthn ooh :)
samcv yeah :) 12:17
jnthn, pretty sweet github.com/perl6/nqp/commit/7b68aa...d5e60ba853
12:18 FROGGS joined
samcv no longer will we have to search through a pretty long string of all delimiters + search and find the other bracket (inside the text we're parsing) 12:18
jnthn \o/ 12:19
samcv and we can also improve the error message like if you do Q<? > where the ? is a combining char
because we can see the delimiter by property, and see it has no match(when it should if it's a normal thing) 12:20
then say oh hey
looks like you have a combining character directly after
instead of trying to match it as a grapheme for how we allow arbitrary delimiters 12:21
jnthn Just to make sure: if I merge PR #474 we won't be busting spectests?
samcv yep!
heh
not gonna make that mistake again haha
jnthn :) 12:22
samcv 6.c errata and master are All Correct
jnthn Unless you'd prefer otherwise, I'm inclined to merge this now
samcv nope go ahead and merge :)
\o/
jnthn And using codepoint iter or adding the Unicode DB script can come later
dalek arVM: 874b6bd | jnthn++ | / (5 files):
Revert "Fix RT #122471 and #122470 return <control-0000> for \0 and other controls"
12:23
arVM: e8a2c43 | samcv++ | / (4 files):
Add code for processing the Unicode collation weights to ucd2c.pl

remove unused MVMROOT
synopsebot6 Link: rt.perl.org/rt3//Public/Bug/Displa...?id=122471
Link: rt.perl.org/rt3//Public/Bug/Displa...?id=122470
samcv yep
samcv++
jnthn There we go, merged
samcv am i allowed to ++ myself?
lol.
jnthn samcv++
:)
samcv also. why did dalek say that 12:24
weird
oh well.
12:24 dalek joined
jnthn Not sure about why it reported the first commit 12:25
It has a tendency to flood and get itself booted for some reason when reporting a bunch of commits
samcv heh
arVM: 0a170ab | (Jimmy Zhuo)++ | src/spesh/deopt.c:
remove another unused MVMROOT
arVM: 293bda7 | jnthn++ | src/ (2 files):
Merge pull request #459 from MoarVM/needless_mvmroot

Remove needless MVMROOT usages
jnthn There, I think I'm caught up on MoarVM PR review for the moment :) 12:26
samcv checks when her paperback unicode books come 12:28
jnthn ooh, you ordered the paperback version? 12:29
jnthn just read PDFs :)
samcv yea
like 8 bucks each volume, there's two volumes total
i think they're printed on demand or something so might be a bit. i want something i can leaf through
i mean.
would be really nice. and for 19 bucks you can have all 850ish pages 12:30
prolly price with shipping or something idk
jnthn I did most of my reading of them when I was doing the NFG implementation.
samcv i didn't know they had a paperback book because wiki said the last one was unicode 5
jnthn Did at least have an iPad to read them on, which was a bit more comfortable than laptop :) 12:31
Paperback woulda been nicer again...didn't even think of it. That said, I was away from home the month I did the NFG impl, and not sure I'd have felt like lugging the books with me even if I had 'em. 12:34
samcv i didn't know it existed until like 12:36
somehow discovered it on some odd unicode.org blog
12:38 dalek joined 12:39 travis-ci joined
travis-ci MoarVM build failed. Jonathan Worthington 'Merge pull request #474 from samcv/unicode_collation 12:39
travis-ci.org/MoarVM/MoarVM/builds/187680690 github.com/MoarVM/MoarVM/compare/3...3c723be3e1
12:39 travis-ci left
samcv what 12:40
i just built it...
oh only one of them failed? weird ===SORRY!=== 12:41
No suitable MoarVM (moar executable) found using the --prefix
(You can get a MoarVM built automatically with --gen-moar.)
jnthn Oh 12:42
12:42 FROGGS joined
jnthn Yeah, there's a bit of a race here 12:42
We get the MoarVM Travis build to also grab an NQP and build it
Unfortunately, if it's a bit slow starting off the builds, and you bump MOAR_REVISION in NQP, then it ends up grabbing an NQP that wants a Moar newer than the commit it's testing :S 12:43
This does tend to rectify itself when it builds the next commit
FROGGS samcv++ 12:44
13:34 timo joined 13:52 nebuchadnezzar joined 14:24 FROGGS joined 14:50 nebuchadnezzar joined 15:45 zakharyas joined 16:05 Ven joined 16:32 Ven joined
dalek arVM/even-moar-jit: 11ead5e | brrt++ | src/jit/ (3 files):
Fix some more bugs with linear_scan

We should not account uses for non-register references. I think there is still a bug with register releasing.
Because the MVM_JIT_ARCH macro trickery relies on the symbol naem of the macro, and because we also need to distinguish between possible values, it's necessary to #define MVM_JIT_ARCH_X64 etc. and later #undef them.
I'm looking into a more regular way to solve this.
16:39
16:48 pyrimidine joined 16:52 Ven joined 17:32 Ven joined 17:52 Ven joined
dogbert2 jnthn: are you lurking or resting :) 17:55
just wondering if the following gist contains any useful information that might come in handy next year? gist.github.com/dogbert17/4096deaa...2f7b6834c7 17:59
18:00 zakharyas joined 18:12 Ven joined 18:32 Ven joined
jnthn dogbert2: Well, not anything that's new to me at least. There's an interesting general issue there with lazy evaluation. 18:37
18:50 pyrimidine joined 18:52 Ven joined
dogbert2 jnthn: is there anything I can do in order to get more information or do you think that you have enough already? 18:54
jnthn dogbert2: I understand that one (at least, in terms of the problem) well already. :) 18:55
walk & 18:57
dogbert2 jnthn: thx, will look for something else then. have a nice walk. 18:59
19:03 Ven joined 19:23 Ven joined 19:49 domidumont joined 20:02 Ven joined
dogbert2 a new gist hopefully moar :) interesting: gist.github.com/dogbert17/801b712c...37398514b5 20:16
20:21 Ven joined 20:25 zakharyas joined 20:29 pyrimidine joined
jnthn That's a good one... 20:32
dogbert2 seems 100% reproducible as well 20:35
jnthn Goodie :) 20:37
dogbert2 sometimes it also spews out a lot of stuff before the panic, like 20:38
jnthn Will have a look into that next week once I get back to things.
dogbert2 6opaque: no such attribute '$!named' in type QAST::Var+{QAST::SpecialArg} when trying to bind a value
in any named at gen/moar/stage2/QASTNode.nqp line 30
should I write a MoarVM issue for it or is the RT enough? 20:39
jnthn Feel free to file a MoarVM issue 20:42
And reference the RT
dogbert2 will do
jnthn Thanks :)
20:50 Ven joined 21:09 Ven joined 21:30 Ven joined
samcv jnthn, uhm was having an issue with trying to fix grapheme count 21:45
for ZWJ
oh wait nvm i'm half awake 21:46
let me wake up i cannot even put this in words properly atm ;)
jnthn Oh, I mighta just answered it on #perl6-dev? :)
ZWJ rings a bell 'cus the exact treatment of it in the Unicode text segmentation algo changed between Unicode 8 and 9. 21:47
I *think* it was ZWJ anyway 21:48
samcv yeah it is ZWJ but i need to program in more cases 21:51
but i wasn't able to affect change, it just didn't seem to do anything 21:52
even when i had it panic in the same function and deleted moar. maybe that function isn't called much
jnthn I didn't actually fully do the new TR29 :/
It...seemed to contradict itself.
(It claimed you only needed to look at the immediately surrounding 2 chars. But also had rules about regional indicators and odd/even counts of them.) 21:53
samcv yeah it is complicated ;) 21:54
jnthn Well, complicated is OK, contradicting itself in the space of a few paragraphs less so ;)
And if we *do* need to track odd/even we'll need a refactor 21:55
samcv i think we want to do a quick-check and see if it's emoji and then be able to do more checks
yep
oh but this shouldn't be too hard to fix. we need to not break when ZWJ + Emoji character 21:56
(let's say there are only two characters) i was trying to get it to count as one grapheme
let me open the file
jnthn OK, *that* bit will be easy enough
samcv yeah
was going to do that to start with
jnthn Go ahead :) 21:57
samcv will at least make us more accurate
jnthn Yeah
jnthn is glad somebody else is hacking on this stuff :)
lizmat too
jnthn: BTW, are you aware that --profile doesn't produce any Allocation info anymore ?
jnthn It's not that I don't like working on it (I find Unicode stuff interesting), it's just that there's so many things that need looking at... 21:58
lizmat: No :(
samcv jnthn, i was looking at `should_break` and somehow
even when i had moarvm panic and clean install it didn't even panic
lizmat jnthn: should I make that a MoarVM issue?
samcv maybe that's not called in many cases? which one do i need to look
jnthn samcv: There's an NFG quickcheck property 21:59
samcv jnthn, agreed
jnthn samcv: That hides it.
samcv ah
poo!
jnthn "hides"
So yeah, you'd need to fiddle with that too :)
samcv so what is this NFG quickcheck
where do i have to look
jnthn lizmat: I don't know that code has changed at all in MoarVM. (more)
samcv should_break is only called if conditions for our moar NFG quickcheck are a certain way?
jnthn lizmat: I did however notice that it was rendering a bit odd with the tabs last time I used it in Firefox 22:00
lizmat: So I wonder if one of the AngularJS version updates mighta knocked it out
lizmat ah, possibly only on Safari ?
hmmm./...
jnthn Think it was Firefox I saw that oddness in
samcv: Yes. The trail starts in normalize.h, where we try to avoid doing anything expensive. 22:01
samcv: NFG quickcheck is a synthetic Unicode property (as in, we add it ourselves). It's handled in ucd2c.pl
samcv yeah i understand that part. oh ok 22:02
jnthn lizmat: Anyway, if that is the place it busted (most likely IMO) then the code in question is in the NQP repo, not Moar
samcv about it being our own synthetic property
jnthn samcv: iirc I did it based on the NFC quick check and then some extra bits
lizmat jnthn: ok, will try to find that
samcv kk
lizmat it's very annoying to me :-)
samcv curses ucd2c.pl
jnthn I think (hope!) I wrote a few comments explaining it even ;)
Well, that's one of the bits of ucd2c.pl that I wrote, so at least I should be able to explain it :P
samcv # If it's the NFC_QC property, then use this as the default value for 22:03
# NFG_QC also.
so i need to set NFG_QC to uh. 0? to get it to handle it with more conditions?
jnthn umm, lemme see
sub tweak_nfg_qc should be informative 22:04
But yes, set it to 0 is the trick
samcv there are only certain characters currently used in ZWJ sequences
so will probably have it set to 0 for those ones
so we can do some checking and make sure it's not a combining sequence
and prolly will add the ones in Emoji 5.0 too i don't want to have to do this again 22:05
and they aren't going to remove any, anyway
since unlike a lot of unicode stuff, the usage of them is predating the actual codification
err. the _final_ release of the spec that is
what happens if I set NFG_QC to 0 for everything? theoretically we should _still_ have it work the same or will it break things 22:06
even though it will be slower
may be a good thing to check. what do you think about that 22:07
22:09 Ven joined
jnthn Sorry, got dragged away to taste freshly made vinegret :D 22:12
If you set it to zero for everything then we'd end up in should_break always (and it'd be slow :)) 22:14
But the idea is to set it to zero if it can ever answer "no" if it's one of the arguments to should_break 22:15
samcv ah ok
jnthn ZWJ already has it as 0 github.com/MoarVM/MoarVM/blob/mast...c.pl#L1720
Since I believe multiple emoji can now form a single grapheme, we should likely set those to 0 also 22:18
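A toy Python sketch of how a quickcheck property can gate the expensive break test, following the rule jnthn states: a codepoint gets 0 if it could ever make `should_break` answer "no". The set contents, the emoji-range shortcut, and the rules inside this `should_break` are invented for illustration and are not MoarVM's.

```python
ZWJ = 0x200D

# Codepoints whose assumed NFG_QC value is 0, meaning "needs a closer look".
NFG_QC_NO = {ZWJ, 0x0301, 0x1F468, 0x1F469}


def should_break(left, right):
    """Toy rule set: the expensive check that the quickcheck tries to avoid."""
    if right == ZWJ or right == 0x0301:    # these attach to what precedes them
        return False
    if left == ZWJ and right >= 0x1F000:   # ZWJ followed by an emoji-range codepoint
        return False
    return True


def count_graphemes(codes):
    """Return (grapheme count, number of times the slow path ran)."""
    count, slow_calls = 0, 0
    previous = None
    for code in codes:
        if previous is None:
            count += 1
        elif previous not in NFG_QC_NO and code not in NFG_QC_NO:
            count += 1                      # fast path: both pass the quickcheck
        else:
            slow_calls += 1
            if should_break(previous, code):
                count += 1
        previous = code
    return count, slow_calls
```

Plain text such as "ab" never reaches `should_break`, which would explain why a panic placed there did not fire. A man + ZWJ + woman sequence takes the slow path twice and comes out as one grapheme. Setting the property to 0 for everything would give the same counts with the slow path taken at every boundary.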
22:28 Ven joined 22:49 Ven joined
samcv yeah 22:50
mst unicode: the gift that keeps on giving
samcv m: "a\x[200C]b".chars.say 22:51
camelia rakudo-moar b2332c: OUTPUT«2␤»
samcv that is fine. 200C is often used to specify ligatures were used
m: Uni.new(0x200D, 0x1F1E6).say 22:57
camelia rakudo-moar b2332c: OUTPUT«Uni:0x<200d 1f1e6>␤»
samcv m: Uni.new(0x200D, 0x1F1E6).Str.say
camelia rakudo-moar b2332c: OUTPUT«‍🇦␤»
23:08 Ven joined
samcv jnthn, do we have a call to check which unicode version we have? 23:18
if we don't, we should 23:19
and i'll automate setting that with ucd2c.pl
23:28 Ven joined
samcv woo 51/744 failing tests => 45 failed only 23:37
jnthn, you missed the Grapheme_Cluster_Break property for setting NFG_QC!!! 23:40
well I am here now!
will all be fixed :)
jnthn, did you see my comments on java's string compareTo function? 23:41
23:48 Ven joined 23:56 Ven joined
samcv they totally duck collation and sorting issues using any actual real algorithm, just say they sort it "Lexically" and are super vague about how that works 23:58