01:40
pyrimidi_ joined
02:48
ilbot3 joined
02:52
FROGGS_ joined
03:23
TimToady joined
04:31
geekosaur joined
05:10
pyrimidine joined
08:00
domidumont joined
08:05
domidumont joined
|
|||
dalek | MoarVM: 8b31d97 | samcv++ | / (3 files): Fix RT #122471 and #122470 return <control-0000> for \0 and other controls RT: rt.perl.org/Ticket/Display.html?id=122471 rt.perl.org/Ticket/Display.html?id=122470 We now pass several tests we were not passing before in uniname.t |
11:47 | |
synopsebot6 | Link: rt.perl.org/rt3//Public/Bug/Displa...?id=122471 | ||
Link: rt.perl.org/rt3//Public/Bug/Displa...?id=122470 | |||
MoarVM: 3dc5647 | jnthn++ | / (3 files): Merge pull request #469 from samcv/uniname_no1 Fix RT #122471 and #122470 return <control-0000> for \0 and other controls |
|||
synopsebot6 | Link: rt.perl.org/rt3//Public/Bug/Displa...?id=122471 | ||
Link: rt.perl.org/rt3//Public/Bug/Displa...?id=122470 | |||
samcv | thanks jnthn | 11:48 | |
jnthn | Thank you! :-) I'll try and find a moment to look at #468 soon also. | 11:49 | |
samcv | oh one thing though | ||
so there is this bug that has been in it for like | |||
at least 1 year. where space is not space | |||
m: say ' ' ~~ /<:space>/ | |||
camelia | rakudo-moar 340bc9: OUTPUT«「 」» | ||
samcv | this ONLY works because it matches the space property of Line_Break | 11:50 | |
and it aliases SP to space and Space | |||
if i change the alias in the unicode property _VALUE_ file, it totally breaks that | |||
so i've gotten it fixed so White_Space == space. but | |||
then it still is broken doing ' ' ~~ /<:space>/ | 11:51 | ||
and i cannot figure out why :( | |||
but i believe that other PR passes all spectests, and still has that bug. | |||
which also means the spectests pass :P | |||
i also encountered the opposite case, where ' ' ~~ /<:White_Space>/ would be 'wrong' (aka it would seem to work for 0x20) but then ' ' ~~ /<:space>/ would be broken. though uniprop-bool would return the correct value for both | 11:52 | ||
jnthn | I think I uncovered something along these lines a while back | 11:53 | |
samcv | ALSO if I hack the script to change the White_Space property to the 'space' property (so that's its primary name), then change MVM_CC_WHITE_SPACE(something like that) to MVM_CC_SPACE and recompile | ||
it all works flawlessly | |||
and all tests pass | |||
jnthn | Lemme try and find it... | ||
samcv | though it is a workaround | ||
jnthn | Yeah, iirc we're being a bit too lenient on what properties we accept without a name qualifier | 11:54 | |
samcv | that's the only _consistent_ way i was able to fix the bug. | ||
but it is quite bad that we think 'space' is somehow a property VALUE alias to SP | |||
m: ' '.uniprop('Line_Break').say | 11:55 | ||
camelia | rakudo-moar 340bc9: OUTPUT«SP» | ||
samcv | which is the reason <:space> works. 'space' is one of the whitespace canonical names. while the SP property, it is aliased to Space with capital | ||
jnthn | aha, this: github.com/perl6/specs/issues/118 | 11:56 | |
I think the badness may boil down to the thing <:space> compiles into | |||
Which iirc is a lookup asking "what property name has a value of this name" or some such, which is ambiguous. | 11:57 | ||
samcv | though what is weird | ||
jnthn | I ran into this when I realized that *some* regenerations of the Unicode DB failed spectests | ||
samcv | ' '.uniprop-bool('space') works | ||
but <:space> doesn't work. actually | |||
<:space> IS OPPOSITE | |||
jnthn | And then the next re-run on the same input data worked | 11:58 | |
samcv | and will match non whitespace and not match whitespace | ||
jnthn | Because hash randomization screwed up the order | ||
samcv | yeah | ||
jnthn | I think I discovered this while trying to get it to be resistant to hash randomization and always write out things in the same order | ||
samcv | i notice that too | ||
jnthn | That patch may be in a branch somewhere | ||
samcv | what patch? | ||
to fix? | |||
jnthn | No | ||
To get consistent order | |||
samcv | ah | 11:59 | |
jnthn | Which...uh...provided consistent breakage. | ||
samcv | just add sort in front of all the 'keys'? | ||
:) | |||
yes | |||
jnthn | I think it needed a couple of places | ||
samcv | well i put sort everywhere keys was | ||
and then it was always broken | |||
and it was great :) | |||
jnthn | github.com/MoarVM/MoarVM/commit/22...f17661a58c | ||
samcv | but the space thing was the most pervasive problem… | 12:00 | |
but ONLY in regex | |||
which i think is a regex bug | |||
related to that thing you linked | |||
<:Ll> is not a unicode property, it's a value. | |||
and other things etc | |||
jnthn | *nod* | ||
samcv | general categories are distinct though | ||
jnthn | There's a useful answer on that issue I linked also, from nova patch | ||
samcv | but who wants to match the SP property of Line_Break without specifying it | ||
jnthn | "No regex engine allows for arbitrary property values for all properties without the associated names, due to the obvious conflicts." | 12:01 | |
samcv | oh Also, supporting Script instead of Script_Extension would be a mistake since the latter is generally what people expect and should be encouraged over Script. I p | ||
+1 for that | |||
jnthn | This isn't true. Ours apparently does. :P :P | ||
But I agree it really shouldn't. :) | |||
samcv | matches extended script? | ||
do we even parse that unicode file? | 12:02 | ||
jnthn | No, I meant the "allows for arbitrary property values" thing I quoted. | ||
samcv | ah | ||
so i guess it looks up the SP property first | |||
SP => Space... though.... | |||
jnthn | Yeah, and the reason it gets that first is, iirc, because we get lucky with the ordering. :S | 12:03 | |
samcv | if i edit the unicode file and change it from SP;Space to SP; fakeeeee | ||
then it still breaks | |||
ah | |||
yes | |||
so i don't know what it's looking up for that | |||
also how to fix it so that it uh. isn't broken | |||
i have been working hard trying to get things to work | 12:04 | ||
and i sort of figured out where it breaks | |||
but it's like "everything is ok here" #####MAGIC HERE##### | |||
then on the other side it just is totally either screwed up totally or like ok-ishhhhhh | |||
let me find the line | |||
jnthn | I wonder if we can fix it by changing how <:Foo> (that is, just a value) is compiled | 12:05 | |
So that it just considered general category, and script extension, and nothing else. | 12:06 | ||
*considers | |||
samcv | uh emit_unicode_property_value_keypairs | 12:07 | |
what about boolean properties | |||
like space? | |||
jnthn | github.com/perl6/nqp/blob/master/s....nqp#L1360 # this is where the compilation happens, fwiw | ||
samcv | property NAMES don't interfere with script or uhm | 12:08 | |
YES | |||
i looked at that | |||
ahh | |||
yep | |||
was hard to understand :P | |||
jnthn | op('unipropcode', $pcode, $pname), | 12:09 | |
op('unipvalcode', $pvcode, $pcode, $pname), | |||
samcv | but i think we should match binary properties | ||
jnthn | That I think is where it's a bit dubious | ||
samcv | script names | ||
uh | |||
and general category | |||
and probably script extensions too | |||
jnthn | Are binary properties reliably unambiguous with general category and script extenion? | ||
*extension | 12:10 | ||
samcv | yes | ||
unless you count Sc and sc | |||
that is the only exception | |||
but you shouldn't changecase for names that are only 2 letters anyway | |||
jnthn wonders if we'll end up with more exceptions in the future :) | |||
samcv | uh | 12:11 | |
that's not really an exception | |||
you're only supposed to allow lowercase and no underscore for names that have an underscore | |||
pretty sure | |||
will have to see which TR said that | |||
so like WSpace, space, White_Space are official names. so you can do also | 12:12 | ||
whitespace, white_space | |||
or WhiteSpace | |||
jnthn | Ah, I see. | 12:13 | |
samcv | actually i think the better rule is the stuff in the 2nd column of the unicode property aliases file | ||
anything in the 2nd column, the long name, you can do that with | 12:14 | ||
for sure | |||
but the 1st column you can't | |||
let me try and find it | |||
12:15
pyrimidine joined
|
|||
samcv | Loose matching should be applied to all property names and property values, with | 12:15 | |
# the exception of String Property values. | |||
With loose matching of property names and | |||
# values, the case distinctions, whitespace, and '_' are ignored. For Numeric Property | |||
# values, numeric equivalencies are applied: thus "01.00" is equivalent to "1" | |||
jnthn | Last I looked at the script, I think we cheated in case a bit also. | 12:16 | |
samcv | yeah it lowercases them | 12:17 | |
takes out _ etc | |||
jnthn | Yup | ||
Just covers the common ways you might write it | 12:18 | ||
But not truly case insensitive | |||
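A minimal Python sketch of the loose matching rule quoted above, for illustration only. It is not the udc2c.pl logic; the helper names `loose_key`, `loose_equal` and `numeric_equal` are hypothetical, and only the normalizations named in the quote (case, whitespace, '_', numeric equivalence) are applied.

```python
from decimal import Decimal


def loose_key(name: str) -> str:
    """Normalize a property name or value: drop whitespace and '_', ignore case."""
    kept = (ch for ch in name if not ch.isspace() and ch != "_")
    return "".join(kept).lower()


def loose_equal(a: str, b: str) -> bool:
    """Compare two property names or values under loose matching."""
    return loose_key(a) == loose_key(b)


def numeric_equal(a: str, b: str) -> bool:
    """Compare numeric property values by value, so "01.00" matches "1"."""
    return Decimal(a) == Decimal(b)


if __name__ == "__main__":
    print(loose_equal("White_Space", "whitespace"))
    print(loose_equal("WSpace", "White_Space"))
    print(numeric_equal("01.00", "1"))
```

Note that loose matching only equates spellings of the same alias: WSpace and White_Space still normalize to different keys, so both aliases have to be listed.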
samcv | but the property value and property name aliases shouldn't be in the same structure maybe? idk | ||
jnthn | That's probably the least of our troubles at the moment, however. :) | ||
samcv | seemed weird | ||
jnthn | Yeah | ||
samcv | yeah :) | ||
jnthn | I guess... | ||
samcv | it creates a hash with the data there right? | 12:19 | |
so there would be collisions or? | |||
jnthn | At the point we compile the regex we can actually case-analyze <:Foo> for if it's a general category name, a script extension name, or a boolean property name | ||
samcv | is it just a normal kind of hash, with keys and values? | ||
jnthn | I think it's actually not a hash but instead does some kind of binary search | 12:20 | |
But I may be misremembering | |||
samcv | also i'm not sure how it looks up the property names | 12:21 | |
for a given name | |||
jnthn | udc2c.pl and the Unicode database stuff is one of the handful of bits of MoarVM that I didn't either write in the first place or significantly rewrite somewhere along the lines. :-) | ||
samcv | and what the numbers in unicode_property_value_keypairs mean | ||
heh | |||
jnthn | And its workings have mystified me a few times too :P | ||
Also I didn't touch it for a few months. I think that it boils down to something like: each char in the database has a bitfield, which stores bit-packed representations of property values | 12:22 | ||
So looking up a property name I believe resolves to an index into a table that specifies the relevant bits to extract | 12:23 | ||
samcv | dammit rt.perl.org isn't posting my emails | 12:24 | |
jnthn | And then the integer those bits make up provides a way to do a lookup in a property values table | ||
samcv | yeah that's what i sort of thought | ||
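The lookup jnthn describes (a per-character bitfield, a table saying which bits belong to each property, and a values table indexed by the extracted integer) could look roughly like the Python sketch below. The field layout, the `FIELDS` table and the tiny `GC_VALUES` list are invented for illustration and do not reflect the real layout in unicode_db.c.

```python
from typing import NamedTuple


class FieldSpec(NamedTuple):
    shift: int  # position of the field's lowest bit
    width: int  # number of bits the field occupies


# Hypothetical layout: 5 bits of General_Category index, then 1 bit of White_Space.
FIELDS = {
    "General_Category": FieldSpec(shift=0, width=5),
    "White_Space": FieldSpec(shift=5, width=1),
}

# Hypothetical property values table for General_Category.
GC_VALUES = ["Cn", "Lu", "Ll", "Zs"]


def pack(gc_index: int, white_space: bool) -> int:
    """Pack the two property values into one integer bitfield."""
    gc = FIELDS["General_Category"]
    ws = FIELDS["White_Space"]
    return (gc_index << gc.shift) | (int(white_space) << ws.shift)


def extract(bits: int, prop: str) -> int:
    """Resolve the property name to a field spec, then mask out its bits."""
    spec = FIELDS[prop]
    return (bits >> spec.shift) & ((1 << spec.width) - 1)


if __name__ == "__main__":
    space_bits = pack(GC_VALUES.index("Zs"), True)
    print(GC_VALUES[extract(space_bits, "General_Category")])
    print(bool(extract(space_bits, "White_Space")))
```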
though. does the order of the pairs in it matter | |||
is what i want to know | |||
(in the C file) | |||
i mean i only got everything working fine when the numbers were the same | 12:25 | ||
like when i manually changed it in the script (when it saw 'White_Space' it changed it to 'space') | |||
then all the 'space' and "White_Space" whatever pairs were the same numbers for both the property and the property value datastructures in unicode_db.c | 12:26 | ||
and i've noticed all the ones that i had to workaround in rakudo didn't match either. so | |||
but i still don't know if the order matters at all, or maybe it shouldn't matter if all of them don't collide | |||
because i've seen the same key, have a different value in the same structure. so higher up would be {"space", 21} then lower down {"space", 120} | 12:27 | ||
jnthn | Aha, reading MVM_unicode_get_property_str in unicode_db.c is somewhat informative | 12:28 | |
samcv | jnthn, github.com/MoarVM/MoarVM/blob/mast...ops.c#L139 | ||
jnthn | (And int) | ||
samcv | yeah it is | ||
but i want to know how it gets the values aside from that. | |||
jnthn | Oh yes, that looks familiar | 12:29 | |
samcv | tell me what it does! | ||
well | |||
jnthn | heh | ||
It makes a hash | |||
samcv | what happens if there are multiple same keys | ||
jnthn | By going through a table | ||
samcv | with different values in the table | ||
jnthn | Latest entry wins | ||
samcv | ok so last one | ||
jnthn | Yup | ||
But note it goes through the table in reverse order too | |||
samcv | so keep regenerating until all the roast tests passes? | ||
:P | |||
jnthn | (For no particular reason that I can tell) | ||
samcv | ah k | 12:30 | |
jnthn | Well yes, that's why regenerating passes things :P | ||
But it's still because we're too liberal with regards to processing <:Foo> style things, so far as I understand. | |||
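To make the "latest entry wins, but the table is walked in reverse" point concrete, here is a small Python sketch. It is not the MoarVM code; `build_lookup` and the `PAIRS` table are hypothetical, with the 21 and 120 codes borrowed from samcv's {"space", 21} / {"space", 120} example.

```python
def build_lookup(pairs, reverse=True):
    """Build a name -> code dict from a table that may contain duplicate names.

    Each assignment overwrites any earlier one, so the entry visited last wins.
    Walking the table in reverse means the entry nearest the top is visited last.
    """
    table = {}
    ordered = reversed(pairs) if reverse else pairs
    for name, code in ordered:
        table[name] = code
    return table


# Hypothetical generated table with a colliding key.
PAIRS = [("space", 21), ("sp", 120), ("space", 120)]

if __name__ == "__main__":
    print(build_lookup(PAIRS)["space"])                 # reverse walk
    print(build_lookup(PAIRS, reverse=False)["space"])  # forward walk
```

Under these assumptions the code that "space" resolves to depends only on where the duplicates land in the generated table, which is consistent with regenerating the database under hash randomization flipping the result.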
samcv | so yeah. if we fix that. i maybe have fixed the problem | ||
we can see i guess | |||
let me recompile that 'fixed' i think version | 12:31 | ||
jnthn | Yeah, my feeling is if we can fix that form to only consider general category values, script values, and boolean property names we're good. | ||
Unless spectests rely on the previous more liberal interpretation /o\ | |||
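A hedged Python sketch of the stricter resolution jnthn proposes for a bare <:Foo>: consult only general category values, script values and boolean property names, and reject everything else. The three sets are tiny hypothetical subsets and `classify_bare_name` is an invented helper, not anything in NQP or MoarVM.

```python
# Hypothetical subsets; the real tables would come from the Unicode data files.
GENERAL_CATEGORIES = {"Ll", "Lu", "Zs"}
SCRIPTS = {"Greek", "Latin"}
BOOLEAN_PROPERTIES = {"White_Space", "space", "Alphabetic"}


def classify_bare_name(name: str):
    """Resolve a bare name to a (property, value) pair.

    Only three namespaces are consulted, in a fixed order, so a value of some
    other property (such as Line_Break=SP) can never match by accident.
    """
    if name in GENERAL_CATEGORIES:
        return ("General_Category", name)
    if name in SCRIPTS:
        return ("Script", name)
    if name in BOOLEAN_PROPERTIES:
        return (name, True)
    raise LookupError(f"{name!r} needs an explicit property name")


if __name__ == "__main__":
    print(classify_bare_name("Ll"))
    print(classify_bare_name("Greek"))
    print(classify_bare_name("space"))
```

With this shape, something like SP would have to be written with its property name spelled out, and the result no longer depends on table ordering.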
samcv | then they should feel bad! | 12:32 | |
12:32
Ven joined
|
|||
samcv | i don't *think* they do | 12:32 | |
but we will only know once we change it I think | |||
jnthn | Indeed. | ||
samcv | oh yeah k | ||
jnthn | Anyway, I'm +1 to changing that. I wonder if it's best to try and do it in the regex compilation | ||
samcv | yeah it is fixed | ||
even ' ' ~~ /<:space>/ works | 12:33 | ||
jnthn | With what fix? :) | ||
samcv | magic | ||
let me push it to my fork | 12:34 | ||
github.com/samcv/MoarVM/commit/960...539ffa53a1 | 12:35 | ||
oh wait maybe not that one idk | |||
there are two commits | |||
it at least works in the most recent one | |||
github.com/samcv/MoarVM/commits/working | |||
the one just called 'a' it at least works atm. let me run the two tests which caused problems | 12:36 | ||
through all the fiddling there were two test files that would stop passing if things went wrong | |||
i should change that from "terrible workaround" to "amazing workaround because it works" | |||
though it's not like. super great. but working is working | 12:37 | ||
jnthn | :-) | ||
True | |||
samcv | oh yes they pass | 12:38 | |
let me try the one before commit called 'a' | |||
just committed quickly cause it worked… haha | |||
i *think* the one i said was fully working maybe wasn't working so i made another commit? or both work | 12:39 | ||
idk | |||
either both work or the newest one works | |||
after working on it all day and most of the time the breakage being caused by nothing but seemingly chance it gets harder to tell. but i know the change i made with the MVM_UNICODE_PROPERTY_SPACE is the only thing that didn't re-break by running it again | 12:40 | ||
and changing other things | |||
which is a good place to start if you can finally get it reproducible | |||
jnthn | Indeed | 12:41 | |
samcv | ok no it's only the most recent one 'a' that passes | ||
where it looks like i changed it back… | 12:42 | ||
well. it works. | 12:43 | ||
let me see what else i changed in that | |||
well i ended up with a different number of keypairs | 12:44 | ||
that is 6 things smaller | |||
oh here jnthn github.com/samcv/MoarVM/commit/2f5...cae8fL1005 | |||
i remember now | |||
and then once i did that' some of the things errored so i added github.com/samcv/MoarVM/commit/2f5...cae8fL1138 | 12:45 | ||
changed this die into an if condition | |||
jnthn | o.O | 12:51 | |
Ugh. What a headache. :S | |||
samcv | yes | ||
jnthn | Oh heck, thinking about the <:Foo> code-gen again... | 12:55 | |
op('unipropcode', $pcode, $pname), | |||
It seems to be feeding a property value in there | 12:57 | ||
And relying on us having polluted the property names table with property values | |||
That is, $pname is potentially something like Ll | |||
samcv | yeah | ||
it is really. really not good | 12:58 | ||
values, properties? who cares! mash em together | |||
jnthn | Indeed :( | ||
I wonder if we *only* rely on it in that one place | |||
samcv | which line | ||
well. | |||
yeah which line :D | |||
jnthn | The one in NQP code-gen that I referenced | ||
jnthn finds it again :) | 12:59 | ||
samcv | oh the one you linked? in nqp? | ||
ah | |||
yeah i remember that one | |||
i thought you meant in moar | |||
JimmyZ | jnthn: my PR 459 needs your review :) | 13:00 | |
jnthn | github.com/perl6/nqp/blob/master/s....nqp#L1404 | 13:01 | |
Note how it uses $pname in both lookups | |||
samcv | also jnthn what does this merge_ins do | ||
and what does it do with uh. these things in that list | 13:02 | ||
like i can see what it calls but not really what it does with it | |||
jnthn | op means "emit a MoarVM op" | ||
samcv | oh no | ||
oh ok | |||
jnthn | 'unipropcode' is the instruction name | ||
samcv | ok that makes more sense now | ||
jnthn | $pcode, $pname are registers | 13:03 | |
merge_ins just means "stick this array of instructions into this other array of instructions" | |||
Like "append" in Perl 6 | |||
Earlier in the method we do things like | 13:04 | ||
my $pcode := $!regalloc.fresh_i(); | |||
Which allocates a register | |||
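As a rough mental model of what jnthn explains here, the Python sketch below mimics register allocation, `op`, and `merge_ins`. It is only an analogy and not NQP's implementation: the `RegAlloc` class, the tuple representation of an instruction and the register naming are all assumptions. The instruction names and operand order are taken from the two lines quoted earlier in the log.

```python
class RegAlloc:
    """Toy register allocator handing out fresh integer register names."""

    def __init__(self):
        self._next = 0

    def fresh_i(self) -> str:
        reg = f"i{self._next}"
        self._next += 1
        return reg


def op(name, *operands):
    """Represent "emit this instruction" as a plain tuple."""
    return (name, *operands)


def merge_ins(target, source):
    """Stick one list of instructions onto the end of another, like append."""
    target.extend(source)


if __name__ == "__main__":
    regalloc = RegAlloc()
    pcode = regalloc.fresh_i()
    pvcode = regalloc.fresh_i()
    pname = "pname"  # stands in for the register holding the name string

    ins = []
    merge_ins(ins, [
        op("unipropcode", pcode, pname),
        op("unipvalcode", pvcode, pcode, pname),
    ])
    for instruction in ins:
        print(instruction)
```

The sketch also makes it easy to see the issue raised above: the same `pname` operand is passed to both lookups.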
If you want to see the output, then write a /<:Ll>/ or so, and run nqp --target=mbc --output=x.moarvm | 13:05 | ||
And then moar --dump x.moarvm | |||
You can do it with perl6 instead of nqp too, but the nqp code has less clutter around the stuff you'll want to see, and we re-use the same code-gen path for both. | |||
JimmyZ: Will try and get to that soonish :) | 13:07 | ||
Time to make lunch; bbl :) | |||
samcv | ok i just checked it _does_ fail something but it's only failing <:Greek> | ||
all the other scripts work fine | |||
idk there's something about space and greek | |||
well greek is a block AND a uh | |||
script | 13:08 | ||
m: my $a = "\c[GREEK LETTER SMALL CAPITAL GAMMA]"; $a.uniprop('Script').say; $a.uniprop('Block').say | 13:09 | ||
camelia | rakudo-moar 4724bd: OUTPUT«Greek␤Phonetic Extensions» | ||
samcv | yeah it fails that regex test, but it gets those properties fine with uniprop | 13:10 | |
so the problem isn't in nqp | |||
butttt <:Script<Greek>> works fine | 13:12 | ||
so i think nqp and moar problem here | |||
thanks for that tip jnthn about using nqp instead of perl6. i was dumping the perl 6 and there were so many things | 13:16 | ||
i gotta go to bed now. talk to you soon jnthn :) | 13:36 | ||
nine | /win 14 | 13:42 | |
JimmyZ | jnthn: thanks ;) | 13:52 | |
14:21
pyrimidine joined
15:00
brrt joined
15:03
stmuk joined
|
|||
dalek | MoarVM/even-moar-jit: a4632df | brrt++ | / (3 files): Make linear_scan allocator public Also fix a number of minor things, and add some support for register specs. I'm not yet sure how to deal with conflicting register requirements. |
15:09 | |
MoarVM/even-moar-jit: 842d5a7 | brrt++ | src/jit/ (2 files): Fix some of the more obvious bugs in linear_scan |
|||
brrt | nasty bugses | 15:15 | |
16:12
japhb_ joined
18:14
dogbert17 joined
18:35
domidumont joined
19:27
FROGGS joined
19:55
pyrimidine joined
23:02
nebuchadnezzar joined
23:55
vendethiel joined
|