01:40 pyrimidi_ joined 02:48 ilbot3 joined 02:52 FROGGS_ joined 03:23 TimToady joined 04:31 geekosaur joined 05:10 pyrimidine joined 08:00 domidumont joined 08:05 domidumont joined
dalek MoarVM: 8b31d97 | samcv++ | / (3 files):
Fix RT #122471 and #122470 return <control-0000> for \0 and other controls

RT: rt.perl.org/Ticket/Display.html?id=122471
  rt.perl.org/Ticket/Display.html?id=122470
We now pass several tests we were not passing before in uniname.t
11:47
synopsebot6 Link: rt.perl.org/rt3//Public/Bug/Displa...?id=122471
Link: rt.perl.org/rt3//Public/Bug/Displa...?id=122470
MoarVM: 3dc5647 | jnthn++ | / (3 files):
Merge pull request #469 from samcv/uniname_no1

Fix RT #122471 and #122470 return <control-0000> for \0 and other controls
synopsebot6 Link: rt.perl.org/rt3//Public/Bug/Displa...?id=122471
Link: rt.perl.org/rt3//Public/Bug/Displa...?id=122470
samcv thanks jnthn 11:48
jnthn Thank you! :-) I'll try and find a moment to look at #468 soon also. 11:49
samcv oh one thing though
so there is this bug that has been in it for like
at least 1 year. where space is not space
m: say ' ' ~~ /<:space>/
camelia rakudo-moar 340bc9: OUTPUT«「 」␤»
samcv this ONLY works because it matches the space property of Line_Break 11:50
and it aliases SP to space and Space
if i change the alias in the unicode property _VALUE_ file, it totally breaks that
so i've gotten it fixed so White_Space == space. but
then it still is broken doing ' ' ~~ /<:space>/ 11:51
and i cannot figure out why :(
but i believe that other PR passes all spectests, and still has that bug.
which also means the spectests pass :P
i also encountered the opposite case, where ' ' ~~ /<:White_Space>/ would be 'wrong' (aka it would seem to work for 0x20) but then ' ' ~~ /<:space>/ would be broken. though uniprop-bool would return the correct value for both 11:52
jnthn I think I uncovered something along these lines a while back 11:53
samcv ALSO if I hack the script to change the White_Space property to the 'space' property (so that's its primary name), then change MVM_CC_WHITE_SPACE(something like that) to MVM_CC_SPACE and recompile
it all works flawlessly
and all tests pass
jnthn Lemme try and find it...
samcv though it is a workaround
jnthn Yeah, iirc we're being a bit too lenient on what properties we accept without a name qualifier 11:54
samcv that's the only _consistent_ way i was able to fix the bug.
but it is quite bad that we think 'space' is somehow a property VALUE alias to SP
m: ' '.uniprop('Line_Break').say 11:55
camelia rakudo-moar 340bc9: OUTPUT«SP␤»
samcv which is the reason <:space> works. 'space' is one of the whitespace canonical names. while the SP property, it is aliased to Space with capital
jnthn aha, this: github.com/perl6/specs/issues/118 11:56
I think the badness may boil down to the thing <:space> compiles into
Which iirc is a lookup asking "what property name has a value of this name" or some such, which is ambiguous. 11:57
samcv though what is weird
jnthn I ran into this when I realized that *some* regenerations of the Unicode DB failed spectests
samcv ' '.uniprop-bool('space') works
but <:space> doesn't work. actually
<:space> IS OPPOSITE
jnthn And then the next re-run on the same input data worked 11:58
samcv and will match non whitespace and not match whitespace
jnthn Because hash randomization screwed up the order
samcv yeah
jnthn I think I discovered this while trying to get it to be resistant to hash randomization and always write out things in the same order
samcv i notice that too
jnthn That patch may be in a branch somewhere
samcv what patch?
to fix?
jnthn No
To get consistent order
samcv ah 11:59
jnthn Which...uh...provided consistent breakage.
samcv just add sort in front of all the 'keys'?
:)
yes
jnthn I think it needed a couple of places
samcv well i put sort everywhere keys was
and then it was always broken
and it was great :)
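A minimal sketch of why sorting the keys makes regeneration reproducible without removing the underlying collision. It assumes, hypothetically, a generator that keeps the first entry it sees for each alias; the alias/target pairs are toy data, not read from the Unicode files.

```python
from typing import Dict, Iterable, List, Tuple

# Toy (alias, target) pairs: "space" is claimed by two different properties.
# Illustrative data only, not read from the Unicode alias files.
ALIASES: List[Tuple[str, str]] = [
    ("space", "White_Space"),
    ("space", "Line_Break=SP"),
    ("ll", "General_Category=Ll"),
]


def emit_table(pairs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Emit one entry per alias; the first pair seen for an alias wins."""
    table: Dict[str, str] = {}
    for name, target in pairs:
        table.setdefault(name, target)
    return table


def emit_sorted(pairs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Same generator, but iterate in sorted order so every run agrees."""
    return emit_table(sorted(pairs))


# Two iteration orders (as hash randomization might produce) disagree...
print(emit_table(ALIASES)["space"])            # White_Space
print(emit_table(reversed(ALIASES))["space"])  # Line_Break=SP
# ...while sorting always picks the same entry, right or wrong.
print(emit_sorted(ALIASES)["space"])           # Line_Break=SP
```

In this toy, sorting reproducibly picks the Line_Break value for `space`: consistent output, but still the wrong answer for `<:space>`, because the two meanings are still competing for one key.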
jnthn github.com/MoarVM/MoarVM/commit/22...f17661a58c
samcv but the space thing was the most pervasive problem… 12:00
but ONLY in regex
which i think is a regex bug
related to that thing you linked
<:Ll> is not a unicode property, it's a value.
and other things etc
jnthn *nod*
samcv general categories are distinct though
jnthn There's a useful answer on that issue I linked also, from nova patch
samcv but who wants to match the SP property of Line_Break without specifying it
jnthn "No regex engine allows for arbitrary property values for all properties without the associated names, due to the obvious conflicts." 12:01
samcv oh Also, supporting Script instead of Script_Extension would be a mistake since the latter is generally what people expect and should be encouraged over Script. I p
+1 for that
jnthn This isn't true. Ours apparently does. :P :P
But I agree it really shouldn't. :)
samcv matches extended script?
do we even parse that unicode file? 12:02
jnthn No, I meant the "allows for arbitrary property values" thing I quoted.
samcv ah
so i guess it looks up the SP property first
SP => Space... though....
jnthn Yeah, and the reason it gets that first is, iirc, because we get lucky with the ordering. :S 12:03
samcv if i edit the unicode file and change it from SP;Space to SP; fakeeeee
then it still breaks
ah
yes
so i don't know what it's looking up for that
also how to fix it so that it uh. isn't broken
i have been working hard trying to get things to work 12:04
and i sort of figured out where it breaks
but it's like "everything is ok here" #####MAGIC HERE#####
then on the other side it just is totally either screwed up totally or like ok-ishhhhhh
let me find the line
jnthn I wonder if we can fix it by changing how <:Foo> (that is, just a value) is compiled 12:05
So that it just considered general category, and script extension, and nothing else. 12:06
*considers
samcv uh emit_unicode_property_value_keypairs 12:07
what about boolean properties
like space?
jnthn github.com/perl6/nqp/blob/master/s....nqp#L1360 # this is where the compilation happens, fwiw
samcv property NAMES don't interfere with script or uhm 12:08
YES
i looked at that
ahh
yep
was hard to understand :P
jnthn op('unipropcode', $pcode, $pname), 12:09
op('unipvalcode', $pvcode, $pcode, $pname),
samcv but i think we should match binary properties
jnthn That I think is where it's a bit dubious
samcv script names
uh
and general category
and probably script extensions too
jnthn Are binary properties reliably unambiguous with general category and script extenion?
*extension 12:10
samcv yes
unless you count Sc and sc
that is the only exception
but you shouldn't changecase for names that are only 2 letters anyway
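jnthn's question (are binary property names unambiguous with general category and script extension names?) can be checked mechanically by normalizing every candidate name and looking for clashes across sets. The sets below are small hand-picked samples, not the full Unicode lists, and the "keep two-letter names case-sensitive" switch models samcv's proposed rule as an assumption.

```python
import re
from collections import defaultdict
from typing import Dict, Set

# Small hand-picked samples; the real lists come from the Unicode data files.
NAME_SETS: Dict[str, Set[str]] = {
    "general_category": {"Ll", "Lu", "Sc", "Zs"},
    "script_extension": {"Greek", "Latn", "Cyrillic"},
    "binary_property": {"White_Space", "space", "Alphabetic"},
    # "sc" is the short name of the Script property itself.
    "property_name": {"sc"},
}


def loose(name: str) -> str:
    """Loose matching: ignore case, whitespace, '_' and '-'."""
    return re.sub(r"[\s_-]", "", name).lower()


def key(name: str, fold_two_letter: bool) -> str:
    """Normalize a name; optionally keep two-letter names case-sensitive."""
    if not fold_two_letter and len(name) <= 2:
        return name
    return loose(name)


def collisions(fold_two_letter: bool) -> Dict[str, Set[str]]:
    """Return normalized names that appear in more than one set."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for label, names in NAME_SETS.items():
        for name in names:
            seen[key(name, fold_two_letter)].add(label)
    return {k: v for k, v in seen.items() if len(v) > 1}


print(collisions(fold_two_letter=True))   # only 'sc' collides (Sc vs sc)
print(collisions(fold_two_letter=False))  # no collisions
```

With these samples, folding case everywhere produces exactly the Sc/sc clash samcv mentions; leaving two-letter names case-sensitive removes it.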
jnthn wonders if we'll end up with more exceptions in the future :)
samcv uh 12:11
that's not really an exception
you're only supposed to allow lowercase and no underscore for names that have an underscore
pretty sure
will have to see which TR said that
so like WSpace, space, White_Space are official names. so you can do also 12:12
whitespace, white_space
or WhiteSpace
jnthn Ah, I see. 12:13
samcv actually i think the better rule is the stuff in the 2nd column of the unicode property aliases file
anything in the 2nd column, the long name, you can do that with 12:14
for sure
but the 1st column you can't
let me try and find it
12:15 pyrimidine joined
samcv Loose matching should be applied to all property names and property values, with the exception of String Property values. With loose matching of property names and values, the case distinctions, whitespace, and '_' are ignored. For Numeric Property values, numeric equivalencies are applied: thus "01.00" is equivalent to "1" 12:15
jnthn Last I looked at the script, I think we cheated in case a bit also. 12:16
samcv yeah it lowercases them 12:17
takes out _ etc
jnthn Yup
Just covers the common ways you might write it 12:18
But not truly case insensitive
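The difference between precomputing "the common ways you might write it" and applying the quoted loose-matching rule at lookup time can be sketched as follows. The variant list is a guess at what such a generator might emit, not the actual script's logic.

```python
import re
from typing import Set


def loose(name: str) -> str:
    """Loose matching as quoted above: ignore case, whitespace and '_'."""
    return re.sub(r"[\s_]", "", name).lower()


def common_variants(name: str) -> Set[str]:
    """Hypothetical: a fixed set of spellings precomputed by a generator."""
    return {
        name,
        name.lower(),
        name.replace("_", ""),
        name.lower().replace("_", ""),
    }


canonical = "White_Space"
table = common_variants(canonical)

for query in ["white_space", "WhiteSpace", "WHITE SPACE", "White space"]:
    # Second column: found among the precomputed variants?
    # Third column: equal to the canonical name under loose matching?
    print(query, query in table, loose(query) == loose(canonical))
```

The precomputed table covers the first two spellings but misses the last two; normalizing the query instead makes every spelling allowed by the rule work.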
samcv but the property value and property name aliases shouldn't be in the same structure maybe? idk
jnthn That's probably the least of our troubles at the moment, however. :)
samcv seemed weird
jnthn Yeah
samcv yeah :)
jnthn I guess...
samcv it creates a hash with the data there right? 12:19
so there would be collisions or?
jnthn At the point we compile the regex we can actually case-analyze <:Foo> for if it's a general category name, a script extension name, or a boolean property name
samcv is it just a normal kind of hash, with keys and values?
jnthn I think it's actually not a hash but instead does some kind of binary search 12:20
But I may be misremembering
samcv also i'm not sure how it looks up the property names 12:21
for a given name
jnthn udc2c.pl and the Unicode database stuff is one of the handful of bits of MoarVM that I didn't either write in the first place or significantly rewrite somewhere along the lines. :-)
samcv and what the numbers in unicode_property_value_keypairs mean
heh
jnthn And its workings have mystified me a few times too :P
Also I didn't touch it for a few months. I think that it boils down to something like: each char in the database has a bitfield, which stores bit-packed representations of property values 12:22
So looking up a property name I believe resolves to an index into a table that specifies the relevant bits to extract 12:23
samcv dammit rt.perl.org isn't posting my emails 12:24
jnthn And then the integer those bits make up provides a way to do a lookup in a property values table
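jnthn's description of the lookup (per-character bit-packed row, property name resolving to a bit-field descriptor, extracted integer indexing a value table) can be sketched roughly like this. Everything here is hypothetical: the field layout, property indexes and value tables are toy data, not MoarVM's actual unicode_db.c structures.

```python
from typing import Dict, List, NamedTuple


class BitField(NamedTuple):
    word: int   # which 32-bit word of the row holds the value
    shift: int  # bit offset within that word
    bits: int   # width of the field


# Property name -> index into FIELDS and VALUE_NAMES (hypothetical numbering).
PROPERTY_INDEX: Dict[str, int] = {"General_Category": 0, "Line_Break": 1}
FIELDS: List[BitField] = [BitField(0, 0, 5), BitField(0, 5, 6)]
# Per-property table: packed integer -> value name (toy values).
VALUE_NAMES: List[List[str]] = [
    ["Cn", "Lu", "Ll", "Zs"],  # General_Category
    ["XX", "AL", "SP"],        # Line_Break
]


def pack(gc: int, lb: int) -> List[int]:
    """Build one character's row: GC in bits 0-4, Line_Break in bits 5-10."""
    return [gc | (lb << 5)]


def get_property(row: List[int], prop: str) -> str:
    """Resolve the name to a field, extract its bits, map them to a value name."""
    idx = PROPERTY_INDEX[prop]
    field = FIELDS[idx]
    raw = (row[field.word] >> field.shift) & ((1 << field.bits) - 1)
    return VALUE_NAMES[idx][raw]


space_row = pack(gc=3, lb=2)  # U+0020 in this toy encoding: Zs and SP
print(get_property(space_row, "General_Category"))  # Zs
print(get_property(space_row, "Line_Break"))        # SP
```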
samcv yeah that's what i sort of thought
though. does the order of the pairs in it matter
is what i want to know
(in the C file)
i mean i only got everything working fine when the numbers were the same 12:25
like when i manually changed the in the script (when it saw 'White_Space' it changed it to 'space')
then all the 'space' and "White_Space" whatever pairs were the same numbers for both the property and the property value datastructures in unicode_db.c 12:26
and i've noticed all the ones that i had to workaround in rakudo didn't match either. so
but i still don't know if the order matters at all, or maybe it shouldn't matter if all of them don't collide
because i've seen the same key, have a different value in the same structure. so higher up would be {"space", 21} then lower down {"space", 120} 12:27
jnthn Aha, reading MVM_unicode_get_property_str in unicode_db.c is somewhat informative 12:28
samcv jnthn, github.com/MoarVM/MoarVM/blob/mast...ops.c#L139
jnthn (And int)
samcv yeah it is
but i want to know how it gets the values aside from that.
jnthn Oh yes, that looks familiar 12:29
samcv tell me what it does!
well
jnthn heh
It makes a hash
samcv what happens if there are multiple same keys
jnthn By going through a table
samcv with different values in the table
jnthn Latest entry wins
samcv ok so last one
jnthn Yup
But note it goes through the table in reverse order too
samcv so keep regenerating until all the roast tests passes?
:P
jnthn (For no particular reason that I can tell)
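A sketch of the behaviour described here, assuming a flat table of (name, code) pairs like the {"space", 21} / {"space", 120} entries mentioned earlier. Walking the table backwards while later assignments overwrite earlier ones means the entry nearest the top of the table ends up in the hash. The "ll" entry is hypothetical filler.

```python
from typing import Dict, List, Tuple

# Flat table in generated order; "space" appears twice with different codes.
# 21 and 120 echo the example above; the "ll" entry is hypothetical.
KEYPAIRS: List[Tuple[str, int]] = [
    ("space", 21),
    ("ll", 7),
    ("space", 120),
]


def build_lookup(pairs: List[Tuple[str, int]]) -> Dict[str, int]:
    """Walk the table in reverse; a later assignment overwrites an earlier one."""
    lookup: Dict[str, int] = {}
    for name, code in reversed(pairs):
        lookup[name] = code  # "latest entry wins"
    return lookup


print(build_lookup(KEYPAIRS)["space"])  # 21: the entry nearest the top survives
```

Which duplicate survives therefore depends only on the order the generator happened to write the table in, which fits regeneration alone being enough to flip test results.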
samcv ah k 12:30
jnthn Well yes, that's why regenerating passes things :P
But it's still because we're too liberal with regards to processing <:Foo> style things, so far as I understand.
samcv so yeah. if we fix that. i maybe have fixed the problem
we can see i guess
let me recompile that 'fixed' i think version 12:31
jnthn Yeah, my feeling is if we can fix that form to only consider general category values, script values, and boolean property names we're good.
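A minimal sketch of the proposed rule, assuming the three name sets are available when the regex is compiled. The sets are illustrative samples, and the real change would live in NQP's regex code-gen, not in Python. In this toy, "space" and "White_Space" are separate entries even though they are aliases of one property in Unicode.

```python
import re
from typing import List, Set, Tuple

# Illustrative samples only.
GENERAL_CATEGORY: Set[str] = {"L", "Ll", "Lu", "Zs"}
SCRIPT: Set[str] = {"Greek", "Latin", "Cyrillic"}
BOOLEAN_PROPERTY: Set[str] = {"White_Space", "space", "Alphabetic"}


def loose(name: str) -> str:
    """Loose matching: ignore case, whitespace, '_' and '-'."""
    return re.sub(r"[\s_-]", "", name).lower()


def classify(name: str) -> Tuple[str, str]:
    """Resolve a bare <:name> to (property, value), or raise if not allowed."""
    hits: List[Tuple[str, str]] = []
    if name in GENERAL_CATEGORY:  # short GC names compared exactly
        hits.append(("General_Category", name))
    hits += [("Script", s) for s in SCRIPT if loose(s) == loose(name)]
    hits += [(b, "True") for b in BOOLEAN_PROPERTY if loose(b) == loose(name)]
    if not hits:
        raise ValueError(
            f"<:{name}> is not a general category, script or boolean property")
    if len(hits) > 1:
        raise ValueError(f"<:{name}> is ambiguous: {hits}")
    return hits[0]


print(classify("space"))  # ('space', 'True')
print(classify("greek"))  # ('Script', 'Greek')
```

Under this rule a bare `SP` (a Line_Break value) raises instead of silently matching, so anything that is only a Block or Line_Break value would need its property spelled out.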
Unless spectests rely on the previous more liberal interpretation /o\
samcv then they should feel bad! 12:32
12:32 Ven joined
samcv i don't *think* they do 12:32
but we will only know once we change it I think
jnthn Indeed.
samcv oh yeah k
jnthn Anyway, I'm +1 to changing that. I wonder if it's best to try and do it in the regex compilation
samcv yeah it is fixed
even ' ' ~~ /<:space>/ works 12:33
jnthn With what fix? :)
samcv magic
let me push it to my fork 12:34
github.com/samcv/MoarVM/commit/960...539ffa53a1 12:35
oh wait maybe not that one idk
there are two commits
it at least works in the most recent one
github.com/samcv/MoarVM/commits/working
the one just called 'a' it at least works atm. let me run the two tests which caused problems 12:36
through all the fiddling there were two test files that would stop passing if things went wrong
i should change that from "terrible workaround" to "amazing workaround because it works"
though it's not like. super great. but working is working 12:37
jnthn :-)
True
samcv oh yes they pass 12:38
let me try the one before commit called 'a'
just committed quickly cause it worked… haha
i *think* the one i said was fully working maybe wasn't working so i made another commit? or both work 12:39
idk
either both work or the newest one works
after working on it all day and most of the time the breakage being caused by nothing but what seemed like chance it gets harder to tell. but i know the change i made with the MVM_UNICODE_PROPERTY_SPACE is the only thing that didn't re-break by running it again 12:40
and changing other things
which is a good place to start if you can finally get it reproducible
jnthn Indeed 12:41
samcv ok no it's only the most recent one 'a' that passes
where it looks like i changed it back… 12:42
well. it works. 12:43
let me see what else i changed in that
well i ended up with a different number of keypairs 12:44
that is 6 things smaller
oh here jnthn github.com/samcv/MoarVM/commit/2f5...cae8fL1005
i remember now
and then once i did that, some of the things errored so i added github.com/samcv/MoarVM/commit/2f5...cae8fL1138 12:45
changed this die into an if condition
jnthn o.O 12:51
Ugh. What a headache. :S
samcv yes
jnthn Oh heck, thinking about the <:Foo> code-gen again... 12:55
op('unipropcode', $pcode, $pname),
It seems to be feeding a property value in there 12:57
And relying on us having polluted the property names table with property values
That is, $pname is potentially something like Ll
samcv yeah
it is really. really not good 12:58
values, properties? who cares! mash em together
jnthn Indeed :(
I wonder if we *only* rely on it in that one place
samcv which line
well.
yeah which line :D
jnthn The one in NQP code-gen that I referenced
jnthn finds it again :) 12:59
samcv oh the one you linked? in nqp?
ah
yeah i remember that one
i thought you meant in moar
JimmyZ jnthn: my PR 459 needs your review :) 13:00
jnthn github.com/perl6/nqp/blob/master/s....nqp#L1404 13:01
Note how it uses $pname in both lookups
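To make the dependence concrete: the generated code asks for a property code using $pname, then for a value code using the same $pname. A rough model with hypothetical toy tables (the function names only echo the op names quoted above) shows that `<:Ll>` resolves only because "Ll" was also put into the property-name table.

```python
from typing import Dict, Optional, Tuple

# Hypothetical toy tables. PROPERTY_CODES is "polluted": it maps the
# value name "Ll" to the General_Category property code.
PROPERTY_CODES: Dict[str, int] = {"General_Category": 1, "Line_Break": 2, "Ll": 1}
VALUE_CODES: Dict[int, Dict[str, int]] = {1: {"Lu": 1, "Ll": 2}, 2: {"SP": 3}}


def unipropcode(pname: str) -> Optional[int]:
    """Model of lookup 1: treat the string as a property name."""
    return PROPERTY_CODES.get(pname)


def unipvalcode(pcode: int, pname: str) -> Optional[int]:
    """Model of lookup 2: treat the same string as a value of that property."""
    return VALUE_CODES.get(pcode, {}).get(pname)


def compile_bare(pname: str) -> Optional[Tuple[int, Optional[int]]]:
    pcode = unipropcode(pname)
    if pcode is None:
        return None
    return pcode, unipvalcode(pcode, pname)


print(compile_bare("Ll"))  # (1, 2): works only because "Ll" is in PROPERTY_CODES
print(compile_bare("Lu"))  # None: "Lu" was never added to PROPERTY_CODES
```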
samcv also jnthn what does this merge_ins do
and what does it do with uh. these things in that list 13:02
like i can see what it calls but not really what it does with it
jnthn op means "emit a MoarVM op"
samcv oh no
oh ok
jnthn 'unipropcode' is the instruction name
samcv ok that makes more sense now
jnthn $pcode and $pname are registers 13:03
merge_ins just means "stick this array of instructions into this other array of instructions"
Like "append" in Perl 6
Earlier in the method we do things like 13:04
my $pcode := $!regalloc.fresh_i();
Which allocates a register
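A toy model of the three helpers jnthn describes: `op` builds one instruction, `merge_ins` appends a list of instructions onto another, and `fresh_i` hands out an integer register. The Python names mirror the NQP ones only for readability; this is not the NQP API, and treating $pname as a string register (`fresh_s`) is an assumption.

```python
from typing import Dict, List, Tuple

Instruction = Tuple[str, ...]


class RegAlloc:
    """Hands out fresh registers per kind: $I0, $I1, ... and $S0, $S1, ..."""

    def __init__(self) -> None:
        self.counters: Dict[str, int] = {"I": 0, "S": 0}

    def _fresh(self, kind: str) -> str:
        reg = f"${kind}{self.counters[kind]}"
        self.counters[kind] += 1
        return reg

    def fresh_i(self) -> str:
        return self._fresh("I")

    def fresh_s(self) -> str:
        return self._fresh("S")


def op(name: str, *operands: str) -> Instruction:
    """Build one instruction: the op name followed by its operand registers."""
    return (name, *operands)


def merge_ins(target: List[Instruction], extra: List[Instruction]) -> None:
    """Stick the instructions in extra onto the end of target."""
    target.extend(extra)


regalloc = RegAlloc()
pcode = regalloc.fresh_i()
pvcode = regalloc.fresh_i()
pname = regalloc.fresh_s()

il: List[Instruction] = []
merge_ins(il, [
    op("unipropcode", pcode, pname),
    op("unipvalcode", pvcode, pcode, pname),
])
for ins in il:
    print(" ".join(ins))  # unipropcode $I0 $S0, then unipvalcode $I1 $I0 $S0
```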
If you want to see the output, then write a /<:Ll>/ or so, and run nqp --target=mbc --output=x.moarvm 13:05
And then moar --dump x.moarvm
You can do it with perl6 instead of nqp too, but the nqp code has less clutter around the stuff you'll want to see, and we re-use the same code-gen path for both.
JimmyZ: Will try and get to that soonish :) 13:07
Time to make lunch; bbl :)
samcv ok i just checked it _does_ fail something but it's only failing <:Greek>
all the other scripts work fine
idk there's something about space and greek
well greek is a block AND a uh
script 13:08
m: my $a = "\c[GREEK LETTER SMALL CAPITAL GAMMA]"; $a.uniprop('Script').say; $a.uniprop('Block').say 13:09
camelia rakudo-moar 4724bd: OUTPUT«Greek␤Phonetic Extensions␤»
samcv yeah it fails that regex test, but it gets those properties fine with uniprop 13:10
so the problem isn't in nqp
butttt <:Script<Greek>> works fine 13:12
so i think nqp and moar problem here
thanks for that tip jnthn about using nqp instead of perl6. i was dumping the perl 6 and there were so many things 13:16
i gotta go to bed now. talk to you soon jnthn :) 13:36
nine /win 14 13:42
JimmyZ jnthn: thanks ;) 13:52
14:21 pyrimidine joined 15:00 brrt joined 15:03 stmuk joined
dalek MoarVM/even-moar-jit: a4632df | brrt++ | / (3 files):
Make linear_scan allocator public

Also fix a number of minor things, and add some support for register specs. I'm not yet sure how to deal with conflicting register requirements.
15:09
MoarVM/even-moar-jit: 842d5a7 | brrt++ | src/jit/ (2 files):
Fix some of the more obvious bugs in linear_scan
brrt nasty bugses 15:15
16:12 japhb_ joined 18:14 dogbert17 joined 18:35 domidumont joined 19:27 FROGGS joined 19:55 pyrimidine joined 23:02 nebuchadnezzar joined 23:55 vendethiel joined