samcv anybody here? MVMGrapheme32 this holds just a codepoint right, since we store codepoints in 32 bit integers 01:06
timotimo no, more than a codepoint 01:40
samcv: because we create synthetic codepoints that represent graphemes that unicode didn't allocate a codepoint for 01:41
samcv ah
so that grapheme holds the whole grapheme? 01:42
timotimo yup 01:44
we hang the extra information off of some fancy datastructure
samcv ok so in string compare we need to check if it has property Mark or Mn Me Mc 01:45
and if so, to compare the next character after
err codepoint
only if the characters are the same after the diacritic do we need to compare based on diacritic/mark
if you saw what i was saying in #perl6 01:46
timotimo we don't implement codepoint-level strings, though
samcv we need some way to be able to compare two graphemes though
timotimo oh, you mean for cases when we have a + umlaut vs ä? 01:47
but strings in moar are always in NFG
that's NFC + some extra magic
so if two graphemes are equal, they are also the same codepoint
samcv well ä > b
or can be. not sure about that specific case
timotimo oh, *compare*
i thought you meant only equality checks 01:48
samcv m: say 'ä' cmd 'b'
camelia rakudo-moar ea67ce: OUTPUT«===SORRY!=== Error while compiling <tmp>␤Two terms in a row␤at <tmp>:1␤------> say 'ä'⏏ cmd 'b'␤ expecting any of:␤ infix␤ infix stopper␤ postfix␤ statement end␤ statement modifier␤…»
samcv yeah no
m: say 'ä' cmp 'b'
camelia rakudo-moar ea67ce: OUTPUT«More␤»
samcv j: say 'ä' cmp 'b'
camelia rakudo-jvm 8ca367: OUTPUT«More␤»
samcv in nqp we call MVM_string_compare
and it checks by grapheme
for the cmp command
but it only checks if the graphemes are < each other 01:50
and that doesn't take anything into account
except the internal representation of them
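The behaviour described above can be illustrated outside MoarVM. The following is a minimal Python sketch, not MoarVM code: `codepoint_cmp` mimics a comparison on raw code point values, and `base_first_cmp` is a hypothetical alternative that compares base letters first and only then falls back to the raw values. Both function names are invented for illustration.

```python
import unicodedata

def codepoint_cmp(a, b):
    """Compare by raw code point values; returns -1, 0 or 1."""
    ka, kb = [ord(c) for c in a], [ord(c) for c in b]
    return (ka > kb) - (ka < kb)

def strip_marks(s):
    """Decompose to NFD and drop combining marks (categories Mn, Mc, Me)."""
    return "".join(
        c for c in unicodedata.normalize("NFD", s)
        if not unicodedata.category(c).startswith("M")
    )

def base_first_cmp(a, b):
    """Compare base letters first; use raw code points only as a tie-break."""
    ka, kb = strip_marks(a), strip_marks(b)
    if ka != kb:
        return (ka > kb) - (ka < kb)
    return codepoint_cmp(a, b)

print(codepoint_cmp("\u00e4", "b"))   # 1: U+00E4 (228) sorts after 'b' (98)
print(base_first_cmp("\u00e4", "b"))  # -1: base letter 'a' sorts before 'b'
```

This is far simpler than the Unicode Collation Algorithm; it only shows why comparing internal values alone gives `More` for 'ä' cmp 'b'.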
timotimo ah 01:51
yeah, that's wrong
samcv yeah :(
timotimo we ought to compare the graphemes to 0 and get the nfg info for the individual characters
you should be able to find more about that in ops like "baseord" or "basecharord" or whatever it's called
samcv compare to 0?
timotimo yeah, synthetic graphemes are negative
samcv ah 01:52
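The "compare to 0" idea can be sketched as follows. This is a toy Python model, not the real MoarVM data structure: it assumes only what was said above, that a non-negative value is an ordinary codepoint and a negative value refers to a synthetic grapheme stored elsewhere. The table layout and the function names are hypothetical.

```python
# Toy model: each synthetic is stored as a tuple of the code points it stands for.
synthetics = []

def add_synthetic(codes):
    """Register a multi-codepoint grapheme and return its negative id."""
    synthetics.append(tuple(codes))
    return -len(synthetics)          # -1, -2, ...

def grapheme_codes(g):
    """Return the code points behind a grapheme value."""
    if g >= 0:
        return (g,)                  # ordinary codepoint, stored directly
    return synthetics[-g - 1]        # synthetic: look it up in the table

g = add_synthetic([0x71, 0x308])     # arbitrary base + combining mark, for illustration
print(g, grapheme_codes(g))          # -1 (113, 776)
print(grapheme_codes(0xE4))          # (228,)
```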
m: say 'ä'.NFC
camelia rakudo-moar ea67ce: OUTPUT«NFC:0x<00e4>␤»
samcv m: say 'ä'.NFG
camelia rakudo-moar ea67ce: OUTPUT«No such method 'NFG' for invocant of type 'Str'␤ in block <unit> at <tmp> line 1␤␤»
samcv m: say 'ä'.NFD
camelia rakudo-moar ea67ce: OUTPUT«NFD:0x<0061 0308>␤»
samcv m: say 'ä'.NFCK
camelia rakudo-moar ea67ce: OUTPUT«No such method 'NFCK' for invocant of type 'Str'␤ in block <unit> at <tmp> line 1␤␤»
samcv m: say 'ä'.NFKC
camelia rakudo-moar ea67ce: OUTPUT«NFKC:0x<00e4>␤» 01:53
samcv m: say 'ä'.NFKD
camelia rakudo-moar ea67ce: OUTPUT«NFKD:0x<0061 0308>␤»
samcv i suspected as much, we need to compare NFD
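The four normalization results above can be reproduced with Python's standard `unicodedata` module, shown here only as a cross-check of the camelia output. The helper name `form` is made up for this sketch.

```python
import unicodedata

def form(name, s):
    """Return the code points of s in the given normalization form, as hex strings."""
    return [format(ord(c), "04x") for c in unicodedata.normalize(name, s)]

s = "\u00e4"  # LATIN SMALL LETTER A WITH DIAERESIS
for name in ("NFC", "NFD", "NFKC", "NFKD"):
    print(name, form(name, s))
# NFC and NFKC keep the single code point 00e4;
# NFD and NFKD split it into 0061 0308.
```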
so timotimo NFG stores it as uh? more like NFD or NFC 01:54
m: say 'ä'.ords 01:55
camelia rakudo-moar ea67ce: OUTPUT«(228)␤»
timotimo NFG is NFC + a bit more 01:56
samcv ok
so we would need to decompose them then
timotimo we'd have to decompose all everything ever
not just synthetics
but you already know that i guess
i'm packing and getting to bed, because ... super tired
samcv we can always check if the character is Mn Me Mc
and only then decompose 01:57
timotimo oh, fair enough
samcv oh wait
timotimo anyway, functions with _nfg_ in their name probably help figuring this out while i'm gone
samcv m: 'ä'.uniprop
camelia ( no output )
samcv m: 'ä'.uniprop.say
camelia rakudo-moar ea67ce: OUTPUT«Ll␤»
timotimo and the grapheme iterator ought to help you not mess up moving along the a and b strings
samcv there must be another property that tests decomposition 01:58
and if the character decomposes, we should decompose it and check the property of the codepoints
timotimo right. maybe you can find it in our implementation of NFD or NFKC?
samcv yeah that seems the right way™
if there's no decomposition then we should be fine
until we implement unicode collation algorithm ;P
which is very complex 01:59
timotimo the composed grapheme takes on the properties of the base character
that may be ... interesting
i must afk now, else i'll fall over
samcv well that's true timotimo that's not very relevant 02:00
it is interesting though
well relevant but we can ignore that, since we will check decomposition 02:01
m: "ȧ".uniprop('NFD_QC').say
camelia rakudo-moar ea67ce: OUTPUT«␤»
samcv m: "ȧ".uniprop('NFD_Quick_Check').say
camelia rakudo-moar ea67ce: OUTPUT«␤»
samcv m: "ȧ".uniprop-bool('NFD_Quick_Check').say 02:02
camelia rakudo-moar ea67ce: OUTPUT«False␤»
samcv i don't think that works atm
at least for uniprop
m: 0x03D3.uniprop('NFD_QC').say 02:03
camelia rakudo-moar ea67ce: OUTPUT«␤»
samcv yea
02:05 mojca joined
samcv seems we don't have a property to check if a character changes on decomposition 02:18
looking at the normalization code
we have NFC and NFD quickcheck properties but they only tell us what form it's CURRENTLY in
not whether or not it changes on decomp 02:19
i could be wrong though, just looking at this quickly but i don't see anything there 02:21
but we could also just use the unicode collation algorithm which assigns values to each character
and for now just implement it not completely which should be good for most cases 02:22
i.e. just look up the priority of said NFC codepoint and then apply that priority
TimToady, thoughts on the Unicode Collation algorithm? should cmp on two strings use it to compare? 02:23
m: 'ȧ'.ord.base(16) 02:34
camelia ( no output )
samcv m: 'ȧ'.ord.base(16).say 02:35
camelia rakudo-moar ea67ce: OUTPUT«227␤»
samcv m: 'ȧ'.uniname.say
camelia rakudo-moar ea67ce: OUTPUT«LATIN SMALL LETTER A WITH DOT ABOVE␤»
samcv m: say (0x0EC0.chr ~ 0x0EDF.chr).chars 02:37
camelia rakudo-moar ea67ce: OUTPUT«2␤»
02:48 ilbot3 joined
04:04 hoelzro joined
07:57 domidumont joined
08:02 domidumont joined
08:34 pyrimidine joined
09:10 domidumont joined
10:21 dogbert2 joined
10:28 pyrimidi_ joined
10:36 TimToady joined, avar joined
samcv any moarvm hackers around? 11:19
11:51 domidumont joined
11:54 pyrimidine joined
jnthn samcv: I sorta am... 12:20
Just left a comment on the PR 12:21
samcv oh hey 12:24
what name to make it then?
MVM_string_uni_cmp? 12:25
err compare
jnthn Just commented on that 12:30
I was thinking about op as in the MoarVM op
The C function I don't mind quite so much :) 12:31
fwiw, adding an op is adding to src/core/oplist (there's instructions at the top), running tools/update-ops.p6 or so, and then adding an entry to interp.c
Also, not changing the existing thing will probably make your debugging easier. :-) 12:32
samcv ah kk
jnthn If you're getting really weird errors, I *think* precedence comparison uses cmp_s
(In the parser)
samcv ah
XD 12:33
jnthn Meaning if it's busted enough you might get some *really* interesting parses :D
samcv yeah i got some weird things
uni_cmp i kind of like better i think. i mean. i think it should be used anytime you want to do a unicode compare. not just a generic string compare
as for the op though 12:34
we would want a possibility of having a language
and to choose which levels to take into account
there's 3
jnthn Guess an int register for the level is easy :) 12:35
Language..hmm :)
samcv i'm thinking
country + language code
like en_US etc
rakudo can implement something nicer but for the op we should use that 12:36
jnthn Yeah, the op interface is pretty low level
samcv uhm
jnthn Note that we try to avoid _ in op names by convention (they're smash-case) except for type indicators (e.g. all the bigint things end in _I) and a prefix for ops that are spesh-emitted
samcv also level 1 doesn't take case into account
level 3 does case
jnthn What would the default level be? 12:38
If we're changing cmp I guess 3 would be needed since it's case-sensitive today and I don't think we can go busting that. :)
samcv uhm 12:39
level 2 is diacritic
i think default is all 3 levels 12:40
which is closest to str_cmp today
though more like 0 levels
but there are differences between any change in the characters
also about the precedence thingy. jvm doesn't do cmp_s with the same precedence as moarvm 12:41
*atm* at least
but i think adding a new op is fine.
By default, the algorithm makes use of three fully-customizable levels. 12:43
customizableeee
heh
but i think we just want language and country code, plus some way to note if we only want some levels and not others
and 4th level is basically “Whatever you want” 12:44
which isn't a real level
also jnthn how is the naming of the unicode properties
MVM_COLLATION_PRIMARY
i think makes it clear it's not a real normal unicode property. also since unicode properties rarely change 12:45
and clear it's not available on anything but MVM
dogbert2 is running a spectest with ~45k nursery ... 12:46
jnthn Hmm
It's OK, but will it end up accidentally exposed via uniprop somehow?
That said, I probably already ended up with us having an NFG_QC that is exposed that way... 12:47
samcv uhm
no it doesn't
well i added a new type for generation in ucdc.pl or whatever its name 12:48
for generating integer properties
since somehow it didn't exist before
jnthn OK
samcv which kind of sucked but
jnthn :)
samcv i somehow did it
in one night
jnthn Think you've figured that script out better than I've managed to :)
samcv but i am getting better at that script
slowly…
jnthn Yeah, I grokked it enough to put various things in (case mapping, primary composite table, etc.) 12:49
samcv i will need to eventually add a lot more comments
nice
but .uniprop('MVM_COLLATION_PRIMARY') will work
jnthn Hmm
samcv but shouldn't unimatch for anything
since it's not an enum 12:50
jnthn So long as we don't spectest it it's not really in the language :)
samcv yeah
i mean
it's all uppercase and no unicode properties are
and MVM
in front
jnthn Right :)
samcv plus it's nice to be able to check it for diagnostic purposes
jnthn Yes
We can do it that way for now; if we need a standard cross-backend abstraction for it later on, then we can cross that bridge then. 12:51
samcv yeah
so uh how do i add an op?
jnthn I wrote that a few moments ago? :)
samcv oh
jnthn 12:31 < jnthn> fwiw, adding an op is adding to src/core/oplist (there's instructions at the top), running tools/update-ops.p6 or so, and then adding an entry to interp.c
samcv when 12:52
oh
jnthn Well, 20 mins is maybe more than a few moments :)
samcv so add to that file. run that, then 3rd add entry to interp.c
yeah
jnthn Yup
samcv ok renaming it and putting the old one back 12:54
MVM_unicode_string_compare
that can be changed fairly easily though right
since nobody uses them but internal
jnthn The MVM_COLLATION_PRIMARY thing, you mean?
samcv oh no
jnthn Or function names?
samcv MVM_unicode_string_compare 12:55
jnthn Yeah
As long as you don't mark it MVM_PUBLIC then we can change it :)
samcv kk
jnthn We only really mark things up that way for embedding, extops, etc.
And there's no reason I can see to do that for what you're adding.
samcv yeah
jnthn Oh, there's a 4th step when you add an op, though its in the NQP repo: add a mapping to QASTOperationsMAST.nqp 12:56
(It's just a one-line addition; look for the cmp_s one)
(And then copy/tweak)
That makes the association between nqp::something and a MoarVM opcode
samcv how do i mark something optional
jnthn Optional where? 12:57
samcv in the ops
in moar
jnthn Ah
samcv or is anything optional
jnthn There's no such thing :)
samcv kk
jnthn But you can always pass null or null_s as an operand
samcv maybe the levels can be 12:58
jnthn (for "no object here" and "no string here")
samcv 3 = do 3; 2 = do 2, 1 = do 1; 5= do 2+3
7 etc
:)
or uh
oh yeah
well
jnthn Maybe a bit field?
samcv we can just do 1 2 4
idk
like unix permissions 12:59
jnthn Yeah
samcv 1 2 4 i guess. idk
and we can always add a 4th level
jnthn I somehow thought from them being levels then doing 3 would *subsume* doing 1 and 2
samcv oh
not for a core op tho
what if you don't care about the other parts
and only want to compare that level 13:00
jnthn (I don't *know* and this is Unicode and I've no idea, I was guessing. :))
samcv you usually do. i mean
you must do 1, but don't have to do 2 or 3
so it says
but
yeah 1 is mostly assumed
but if we are making an op that shouldn't change we should allow for the future of being able to do that 13:01
jnthn Anyway, if you should be able to pick and choose then I'd prefer a bit field (1/2/4 and you or them)
samcv cause uh
3rd level is mostly case for Latin alphabet
other languages it will be different
and probably more useful
so do i set it as an integer?
jnthn Yes
samcv cool :) i like that 13:02
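The 1/2/4 bit field agreed on above could look like this. It is a minimal Python sketch of the encoding only; the names `PRIMARY`, `SECONDARY` and `TERTIARY` are assumptions borrowed from collation terminology and are not taken from the MoarVM source.

```python
from enum import IntFlag

class Level(IntFlag):
    PRIMARY = 1     # level 1
    SECONDARY = 2   # level 2 (diacritics)
    TERTIARY = 4    # level 3 (case, for Latin script)

def enabled_levels(mask):
    """List the names of the levels switched on in an integer mask."""
    return [lvl.name for lvl in (Level.PRIMARY, Level.SECONDARY, Level.TERTIARY)
            if mask & lvl]

all_levels = int(Level.PRIMARY | Level.SECONDARY | Level.TERTIARY)
print(all_levels)           # 7
print(enabled_levels(5))    # ['PRIMARY', 'TERTIARY']
```

Because each level has its own bit, a caller can pick any combination, and a further level could later take the value 8 without changing the existing ones.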
and there is a made up 4th level that is custom on top of the other ones being basically their plus custom, though the 1st level isn't supposed to be really
or it's not technically the unicode collation algorithm or something
so int16 jnthn ? 13:03
jnthn Hmm 13:04
If we do that then we can only ever use a literal there
Whereas I think at the Perl 6 level we'll want to be able to compute that value 13:05
samcv modified: lib/MAST/Ops.nqp
modified: src/core/oplabels.h
modified: src/core/oplist
modified: src/core/ops.c
modified: src/core/ops.h
modified: src/strings/ops.c
modified: src/strings/unicode_ops.c
hmm
that seemed to update everything
jnthn So r(int64) is probably better
samcv well i updated the ones in strings folder
jnthn Yes, the script updates various things :)
samcv nice
jnthn It'd be tedious otherwise :)
samcv ok will make it 64
back in my day we looked up the unicode character properties out of a dictionary! 13:06
or something idk
didn't have computers
uni_cmp_s w(int64) r(str) r(str) w(int64) :pure
jnthn I'd prefer unicmp_s 13:07
food delivery, bbs
samcv i like that too 13:08
better 13:09
it's a unicmp not a cmp on a uni
dogbert2 and the spectest is done, only t/spec/S32-io/socket-recv-vs-read.t caused problems (but we knew that could happen) 13:12
samcv ok so we want to do iso_3166 integers 13:14
not strings, because names change etc
numbers are stable
well. countries names could change, or be renamed etc whatever but they have numbers
don't think moarvm is the place for that, though i guess we could do it or resolve names and things 13:15
jnthn, ?
dogbert2 guess jnthn is munching his food (bet it's Indian) 13:17
samcv w(int64) r(str) r(str) w(int64) w(int64) :pure # so 1ststring 2ndstring level integer_countrycode
tho we want language too hmm
jnthn I'd go with the most stable thing, for the op level of things 13:18
samcv i mean en_US is easy enough
yeah
so then we want two fields
jnthn Sounds like
samcv so language then country last
that sounds good
jnthn Aye
Lunch time here; back in a bit
samcv :)
jnthn dogbert2: Not Indian for lunch...though will make a cheese jalfrezi for dinner :D 13:19
dogbert2: If you're GC hunting, somebody filed a GC panic in the MoarVM issue tracker overnight you may be able to reproduce/golf :-)
&
dogbert2 looks
hmm, Bailador 13:20
m: class SQLString { }; my $stringy = Str.^find_method("Stringy"); my $handler = $stringy.wrap(method () { SQLString.new(:str(callsame)) }); say "foo".Stringy # niner, Issue #412 13:21
camelia rakudo-moar 19df35: OUTPUT«(signal SEGV)»
samcv added this to nqp QAST::MASTOperations.add_core_moarop_mapping('unicmp_s', 'unicmp_s'); 13:23
MoarVM op 'unicmp_s' is unknown as a core or extension op 13:24
gotta add that other place you said i think
interp.c 13:25
Unhandled exception: Bytecode validation error at offset 20, instruction 4: 13:30
oh no. nqp
does not like the opcodes changed i guess
helppppp
lizmat samcv: fraid can't help you with that :-( 13:34
13:35 pyrimidi_ joined
geekosaur make sure you added it at the end instead of somewhere in the middle? (which would break every opcode after it) 13:37
samcv ah 13:38
ok
in moarvm or nqp
geekosaur moarvm
samcv which file. oplist or the MAST 13:39
lizmat samcv: i think oplist 13:40
samcv i get Unhandled exception: Bytecode validation error at offset 20, instruction 4: 13:41
hmm
gonna reset
oh fuck 13:46
ok. i messed up my nqp i think 13:47
hahah
let me see
will use the rakudobrew
dogbert2 stupid question, what's the difference between 'make spectest' and 'make stresstest'? Looks as if the same tests are run
geekosaur last time I checked, there was a separate test directory for the stress tests, which spectest skips and stresstest runs 13:48
lizmat dogbert2 stress test runs the test marked "stress" in spectest.data 13:52
as well
dogbert2 aha, so more tests are being run than usual then
lizmat yup
about 100K more I think
dogbert2 cool, maybe I'll find something interesting 13:53
lizmat: how's your optimization work going, are you done with the arrays? 13:55
jnthn samcv: You need to add new ops near the end, but before the specializer ops (those starting sp_); I think the text at the top of oplist describes this. 13:58
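Why the position in oplist matters can be shown with a toy model. The Python sketch below assumes only that an op's number is derived from its position in the list; the op list itself is a made-up four-entry example (`sp_example` is a placeholder, not a real op), and `insert_before_spesh` is a hypothetical helper.

```python
def number_ops(names):
    """Assign each op the number of its position in the list."""
    return {name: i for i, name in enumerate(names)}

def insert_before_spesh(names, new_op):
    """Insert new_op just before the first op whose name starts with 'sp_'."""
    idx = next((i for i, n in enumerate(names) if n.startswith("sp_")), len(names))
    return names[:idx] + [new_op] + names[idx:]

old = ["no_op", "eq_s", "cmp_s", "sp_example"]   # illustrative list only
in_middle = old[:1] + ["unicmp_s"] + old[1:]
near_end = insert_before_spesh(old, "unicmp_s")

print(number_ops(old)["cmp_s"])        # 2
print(number_ops(in_middle)["cmp_s"])  # 3: already compiled bytecode now means something else
print(number_ops(near_end)["cmp_s"])   # 2: existing ops keep their numbers
```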
samcv yeah i got that working
jnthn OK, cool 13:59
samcv interp.c doesn't matter right
jnthn The order?
samcv yeah
lizmat dogbert2: for now I'm done with arrays
jnthn Um...technically no but we try to keep them in order anyway
I think C compilers that don't do computed goto are still smart enough to see they can emit a jump table even when we disorder things though. 14:00
samcv what do i add to not pass any value?
for the extra thingys we don't use
or should nqp not complain 14:01
and i prolly need to not just copy paste? 14:02
jnthn You'll need to pass the default values 14:03
Or write some fancier code-gen routine in NQP
That emits them when they're missing
But I guess we won't use the op in terribly many places, so can just pass values explicitly for now? 14:04
dogbert2 lizmat: what's your current target?
samcv github.com/samcv/MoarVM/blob/unico...rp.c#L1464 14:05
do not know how to do…
help :P
jnthn Hm, lemme find what you put in oplist 14:06
But I think that += 6 is wrong
samcv prolly all wrong haha
jnthn w(int64) r(str) r(str) r(int64) r(int64)
So here, the op operates on 5 registers 14:07
Registers are encoded as 2 bytes each
So needs to be cur_op += 10;
Otherwise we'll end up reading junk bytecode afterwards :)
samcv :) 14:08
wooo!
seems to work
jnthn Nice :)
Once you need the other args you can get them with the GET_REG macro
samcv uh in nqp i should make it so it adds them right 14:09
and what to use for undef
jnthn Well, the latter two are integers
So just 0, but we pondered level 3 being the default
To see an example of an op that does a fancier nqp:: -> moar mapping and inserts a default value, grep for add_core_op('substr' 14:10
Or just write nqp::unicmp_s($str_a, $str_b, 4, 0) or whatever at the point of use :) 14:11
samcv level 3 = 7; so
should be 7,0
well 14:12
7,0,0
aka all 3 levels plus no language and no country specified
jnthn hm, but the oplist entry is
w(int64) r(str) r(str) r(int64) r(int64)
That's taking two strings and two integers, not three?
samcv ah
need to add one
should be 3 integers
jnthn Yeah, and replace the 10 with a 12 :)
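The 10 and 12 follow directly from the operand count. A small Python sketch of the arithmetic, assuming (as stated above) that every operand is a register encoded in 2 bytes; the function name is made up:

```python
REGISTER_BYTES = 2  # each register operand takes 2 bytes in the bytecode

def operand_bytes(signature):
    """Number of bytes to skip after an op whose operands are all registers."""
    return len(signature.split()) * REGISTER_BYTES

print(operand_bytes("w(int64) r(str) r(str) r(int64) r(int64)"))           # 10
print(operand_bytes("w(int64) r(str) r(str) r(int64) r(int64) r(int64)"))  # 12
```

If the skip is too small or too large, the interpreter starts decoding the next instruction at the wrong offset, which is the "junk bytecode" mentioned earlier.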
samcv replaces with 10000 14:18
where do i add those extra numbers? 14:20
jnthn Writing nqp::unicmp_s($str_a, $str_b, 7, 0, 0) already works, by this point? And you want it so you can just write nqp::unicmp_s($str_a, $str_b) ? 14:22
samcv yeah 14:23
jnthn Then in QASTOperationsMAST.nqp we'll need a smarter mapping function 14:24
samcv ah kk
will that make it any slower?
jnthn The one for substr does something very similar
No, the insertion is at compile time
That is, we insert them when compiling the nqp:: op into MoarVM bytecode
So by runtime it's exactly the same
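The idea of filling in defaults at compile time can be sketched like this. It is a Python illustration of the principle only, not the NQP mapping code; `lower_unicmp` is a hypothetical name, and the defaults 7, 0, 0 (all three levels, no language, no country) are the ones discussed above.

```python
DEFAULTS = (7, 0, 0)  # levels bit mask, language, country

def lower_unicmp(args):
    """Pad a 2..5 argument call to the full 5 operands the op expects."""
    args = list(args)
    if not 2 <= len(args) <= 5:
        raise ValueError("unicmp_s takes between 2 and 5 arguments")
    # Append only the defaults for the operands that were left out.
    return args + list(DEFAULTS[len(args) - 2:])

print(lower_unicmp(["$str_a", "$str_b"]))      # ['$str_a', '$str_b', 7, 0, 0]
print(lower_unicmp(["$str_a", "$str_b", 1]))   # ['$str_a', '$str_b', 1, 0, 0]
```

Since the padding happens while the call is translated, the emitted instruction always carries all operands and nothing extra is done when it runs.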
samcv kk 14:25
jnthn away for a bit 14:26
samcv woo 14:29
working :)
15:15 domidumont joined
samcv jnthn, spectests pass! 15:44
let me change some of my commit messages to be better
jnthn samcv: Nice! :) 16:06
nwc10 good UGT, #moarvm
jnthn o/ nwc10 16:07
dogbert2 jnthn: is it ok if we close github.com/MoarVM/MoarVM/issues/387 given the fact that it has been fixed? 16:20
jnthn Do we know that the underlying MoarVM segv was fixed? 16:35
That's what the ticket is addressing. We shouldn't SEGV in this case.
dogbert2 You mean that the original case should have worked and that binding should not be necessary? 16:41
m: multi sub cross() { }
camelia ( no output )
jnthn dogbert2: Well, an error also could have been OK in the original case 16:42
But not a SEGV
dogbert2 fair enough, in that case I suppose it is not fixed, just worked around :) 16:43
jnthn Unless the program involved is using NativeCall (in which case, all bets are off), Moar giving SEGV is always wrong. Even if what we were aked to do is bonkers. :)
right.
*asked
17:12 pyrimidine joined
18:08 pyrimidine joined
[Coke] Didn't I just see github.com/MoarVM/MoarVM/issues/475 in chat somewhere? hurm. 19:30
geekosaur [29 17:54:08] <jmerelo> Have you ever tried to define a constant like so: my constant this-is-constant = (1+sqrt(5))/2? 19:35
in #perl6-dev
[Coke] so this ticket can probably be rejected as DIHWIDT. 19:44
geekosaur I think they filed it post discussion? but the discussion said to file a ticket requesting a recursion limit, not about this /per se/ 19:46
so reject and ask them to file the correct one
notviki [Coke]: discussion: irclog.perlgeek.de/perl6-dev/2016-...i_13819854 19:58
samcv jnthn, everything is ready for merge :) 20:27
ok so if anybody wants details uhm. the new change to moar and using that op for cmp in rakudo, is 20:40
only 1% slower
also anybody have thoughts about "C".uniprop<property> 20:43
how do we handle that for other things
oh i thought i was in perl6-dev 20:45
lizmat samcv: you need a uniprop() candidate that returns an object that does an AT-KEY
samcv what would happen for uniprop<thing otherthing>
return a hash? the original spec called for uniprop to return a hash, which i think could be useful if you want to get like 6 properties 20:46
at key is only one thing right
lizmat yeah, but building the hash could be expensive, no ? 20:47
m: sub a() { class { method AT-KEY($a) { $a.uc } }.new }; say a<foo bar> 20:48
camelia rakudo-moar 19df35: OUTPUT«(FOO BAR)␤»
samcv m: sub a() { class { method AT-KEY($a) { $a.uc } }.new }; say a<foo bar>.perl 20:49
camelia rakudo-moar 19df35: OUTPUT«("FOO", "BAR")␤»
lizmat m: sub a() { class :: does Associative { method AT-KEY($a) { $a.uc } }.new }; say a<foo bar>
camelia rakudo-moar 19df35: OUTPUT«(FOO BAR)␤»
lizmat m: sub a() { class :: does Associative { method AT-KEY($a) { $a.uc } }.new }; say a<foo bar>:p
camelia rakudo-moar 19df35: OUTPUT«()␤»
lizmat hmmm
m: sub a() { class :: does Associative { method AT-KEY($a) { $a.uc } }.new }; say (a<foo bar>:p)
camelia rakudo-moar 19df35: OUTPUT«()␤»
lizmat m: sub a() { class :: does Associative { method AT-KEY($a) { $a.uc } }.new }; say a<foo bar> :p 20:50
camelia rakudo-moar 19df35: OUTPUT«()␤»
lizmat hmmm
21:38 pyrimidine joined
jnthn samcv: Bit too tired now to review the PR, but will do it when I wake up tomorrow :) 23:55
samcv jnthn, can you read the updated PR description?
tell me if i need to add anything