01:48
ilbot3 joined
04:09
nebuchadnezzar joined
04:27
zakharyas joined
05:49
domidumont joined
05:52
domidumont joined
07:13
brrt joined
jnthn | D'oh. So, I wake up, shower pondering what on earth might be wrong with ucd2c.pl, grab breakfast, come to computer to read the news before the day's work...and discover it's a national holiday here today. :P | 08:51 | |
Anyway, figure I'll work the day and take some other day off next week when family visit. Otherwise I'll just spend the day sitting around wondering why a darn Unicode database import script is busted :P | 08:53 | ||
nwc10 | but given the good news that I belatedly read here news.perlfoundation.org/2016/09/jon...ent-g.html | 08:54 | |
wasn't today a non-work-work day anyway? | |||
anyway, your "move the holiday" sounds like an excellent plan | 08:55 | ||
jnthn somehow suspects that in terms of technical difficulty, the stuff he does for his TPF grant is harder work than this other work :) | |||
More fun too, though :) | |||
dalek | MoarVM/unicode9: ed685ed | jnthn++ | tools/ucd2c.pl: Tweak ucd2c.pl to produce deterministic output. Before, the output produced depended on hash ordering, which meant two runs on the exact same Unicode database could produce wildly different output. This doesn't help when trying to debug. |
09:18 | |
MoarVM/unicode9: a94e443 | jnthn++ | tools/ucd2c.pl: Tweak ucd2c.pl to produce deterministic output. Before, the output produced depended on hash ordering, which meant two runs on the exact same Unicode database could produce wildly different output. This doesn't help when trying to debug. |
09:22 | ||
MoarVM/unicode9: fb29f18 | jnthn++ | src/strings/unicode_ (2 files): Update to the Unicode 9 character database. |
lizmat | jnthn++ but /me wonders why that isn't a perl 6 script :-) | ||
jnthn | lizmat: Because it wasn't feasible to make it one at the time it was written | ||
nwc10 | but 'patches soon welcome' (once the current problem is fixed) ? | ||
lizmat | jnthn: feels like LHF for someone else :-) | 09:23 | |
jnthn | heh | ||
If 1600+ lines of Unicode database parsing, bit-field computation, generating C code, and so forth is appealing to anyone, then yes :P | 09:24 | ||
lizmat | well, at least now it's deterministic :-) | 09:26 | |
so debugging it should be a piece of cake :-) | |||
jnthn | Right, so the port should spit out the same thing :P | ||
Which will help, yes. | |||
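A minimal sketch of the idea behind the "deterministic output" commit, written in Python rather than the Perl 5 of ucd2c.pl: if a generator emits rows in whatever order its hash happens to yield, two runs over the same data can differ, and sorting the keys before emitting removes that dependence. The function name `emit_table`, the alias names, and the numeric codes are illustrative assumptions, not taken from the actual script.

```python
def emit_table(aliases):
    """Emit one C-style table row per alias, always in sorted key order."""
    lines = []
    for name in sorted(aliases):  # sorting makes the output independent of hash order
        lines.append(f'    {{"{name}", {aliases[name]}}},')
    return "\n".join(lines)


# Two mappings with identical content but different insertion order,
# standing in for two generator runs that walk a hash in different orders.
# (Hypothetical names and codes, for illustration only.)
run_a = {"space": 3, "WSpace": 3, "Lu": 1}
run_b = {"Lu": 1, "WSpace": 3, "space": 3}

if __name__ == "__main__":
    print(emit_table(run_a))
    print(emit_table(run_a) == emit_table(run_b))
```

With sorted emission, a diff between two generated files reflects a change in the input data or the script, not a reshuffle.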
stmuk_ | that *BSD crash appears to happen on more bleeding edge linux distros (tumbleweed & arc according to reports) although I still fail to reproduce on linux :( | 09:27 | |
jnthn | Oh no, please say this isn't going to be that libc lock elision hardware bug? :/ | 09:28 | |
Well, maybe software bug | |||
I forget the details | |||
stmuk_ | use of zsh seems common to reports (although many people use it anyway) and I played around with ulimits and zsh but nothing | 09:29 | |
or perhaps fs (ufs/btrfs?) although I tried btrfs and it seemed to work fine | 09:32 | ||
jnthn | bugs.debian.org/cgi-bin/bugreport....bug=800574 is the issue I was thinking of, fwiw | 09:35 | |
stmuk_ | eww | 09:36 | |
jnthn | Yes, if it's that then very :( indeed | 09:41 | |
But I've no idea if this is the libc used on BSDs? | 09:42 | ||
stmuk_ | the BSDs use their own libc not glibc (it's also related to the android one) | 09:44 | |
jnthn | Hmm | 09:56 | |
So, emit_unicode_property_keypairs loads both property and property value aliases | |||
The problem we have is that some property aliases and property value aliases overlap | 09:57 | ||
lizmat | so, is that a unicode problem? or a perl scripting issue ? | 09:59 | |
jnthn | Well, we seem to emit two tables | 10:00 | |
emit_unicode_property_keypairs does the one in question, and emit_unicode_property_value_keypairs does another | |||
10:00
zakharyas joined
jnthn | It's unclear why we're paying attention to property *value* aliases in the former | 10:00 | |
Though not doing so fails us an incredible number of tests | 10:03 | ||
m: say ' ' ~~ /<:space>/ | 10:26 | ||
camelia | rakudo-moar c01fc3: OUTPUT«「 」␤» | ||
jnthn | m: say uniprop(' ', 'Space') | ||
camelia | rakudo-moar c01fc3: OUTPUT«SP␤» | ||
jnthn | m: say uniprop(' ', 'LineBreak') | ||
camelia | rakudo-moar c01fc3: OUTPUT«SP␤» | ||
jnthn | m: say uniprop(' ', 'space') | 10:28 | |
camelia | rakudo-moar c01fc3: OUTPUT«SP␤» | ||
jnthn | m: say uniprop(' ', 'WSpace') | 10:29 | |
camelia | rakudo-moar c01fc3: OUTPUT«1␤» | ||
jnthn | Gah, it gets worse | 10:40 | |
There are other collisions too | |||
Heh, and at the top of PropertyValueAliases.txt it explains all of this | 10:43 | ||
m: say uniprop("x", "AL") | 10:48 | ||
camelia | rakudo-moar c01fc3: OUTPUT«L␤» | ||
jnthn | m: say uniprop("x", "Arabic_Letter") | 10:49 | |
camelia | rakudo-moar c01fc3: OUTPUT«L␤» | ||
jnthn | m: say uniprop("x", "Line_Break") | ||
camelia | rakudo-moar c01fc3: OUTPUT«BK␤» | ||
jnthn | m: say uniprop("x", "Bidi_Class") | 10:50 | |
camelia | rakudo-moar c01fc3: OUTPUT«L␤» | ||
jnthn | m: say uniprop("&", "Bidi_Class") | 10:51 | |
camelia | rakudo-moar c01fc3: OUTPUT«ON␤» | ||
jnthn | m: say uniprop("&", "Line_Break") | ||
camelia | rakudo-moar c01fc3: OUTPUT«BK␤» | ||
jnthn | m: say uniprop($_, "Line_Break") for ^64 | 10:53 | |
camelia | rakudo-moar c01fc3: OUTPUT«CM␤CM␤CM␤CM␤CM␤CM␤CM␤CM␤CM␤BK␤LF␤BK␤BK␤CR␤CM␤CM␤CM␤CM␤CM␤CM␤CM␤CM␤CM␤CM␤CM␤CM␤CM␤CM␤CM␤CM␤CM␤CM␤SP␤BK␤BK␤BK␤BK␤BK␤BK␤BK␤BK␤BK␤BK␤BK␤BK␤BK␤BK␤BK␤BK␤BK␤BK␤B…» | ||
timotimo | bok bok bok bok bok goes the hen | 10:57 | |
QU | |||
Ambiguous Quotation | |||
10:58 | |||
Quotation marks | |||
act like they are both opening and closing | |||
mhhh, both opening and closing | |||
that's fun :) | |||
jnthn | m: for ^0xFFFF { if uniprop($_, 'Gc') eq 'Pe' { .say; last } } | 10:59 | |
camelia | ( no output ) | ||
jnthn | m: for ^0xFFFF { if uniprop($_) eq 'Pe' { .say; last } } | ||
camelia | rakudo-moar c01fc3: OUTPUT«41␤» | ||
jnthn | m: for ^0xFFFF { if uniprop($_, 'gc') eq 'Pe' { .say; last } } | ||
camelia | rakudo-moar c01fc3: OUTPUT«41␤» | ||
jnthn | m: for ^0xFFFF { if uniprop($_, 'jg') eq 'Pe' { .say; last } } | ||
camelia | ( no output ) | ||
jnthn | m: for ^0xFFFF { if uniprop($_, 'Joining_Group') eq 'Pe' { .say; last } } | 11:00 | |
camelia | ( no output ) | ||
jnthn | m: say uniprop('x', 'Joining_Group') | 11:01 | |
camelia | rakudo-moar c01fc3: OUTPUT«No_Joining_Group␤» | ||
jnthn | m: say ^0xFFFF .map({ uniprop($_, 'Joining_Group') }.uniq | ||
camelia | rakudo-moar c01fc3: OUTPUT«===SORRY!=== Error while compiling <tmp>␤Unable to parse expression in argument list; couldn't find final ')' ␤at <tmp>:1␤------> ap({ uniprop($_, 'Joining_Group') }.uniq⏏<EOL>␤» | ||
jnthn | m: say ^0xFFFF .map({ uniprop($_, 'Joining_Group') }).uniq | ||
camelia | rakudo-moar c01fc3: OUTPUT«No such method 'uniq' for invocant of type 'Seq'␤ in block <unit> at <tmp> line 1␤␤» | ||
jnthn | m: say ^0xFFFF .map({ uniprop($_, 'Joining_Group') }).unique | ||
camelia | rakudo-moar c01fc3: OUTPUT«(No_Joining_Group YEH ALEF WAW BEH TEH MARBUTA HAH DAL REH SEEN SAD TAH AIN GAF FARSI YEH FEH QAF KAF LAM MEEM NOON HEH SWASH KAF NYA KNOTTED HEH HEH GOAL TEH MARBUTA GOAL YEH WITH TAIL YEH BARREE ALAPH BETH GAMAL DALATH RISH HE SYRIAC WAW ZAIN HETH TETH Y…» | ||
jnthn | m: for ^0xFFFF { if uniprop($_, 'Joining_Group') eq 'PE' { .say; last } } | 11:02 | |
camelia | rakudo-moar c01fc3: OUTPUT«1830␤» | ||
nwc10 | do any of these bugs summon Cthulhu by accident? | ||
jnthn | m: say chr(41) ~~ /<:Pe>/ | ||
camelia | rakudo-moar c01fc3: OUTPUT«「)」␤» | ||
jnthn | m: say chr(41) ~~ /<:pe>/ | 11:03 | |
camelia | rakudo-moar c01fc3: OUTPUT«「)」␤» | ||
jnthn | m: say chr(41) ~~ /<:PE>/ | ||
camelia | rakudo-moar c01fc3: OUTPUT«Nil␤» | ||
jnthn | m: say chr(1830) ~~ /<:PE>/ | ||
camelia | rakudo-moar c01fc3: OUTPUT«Nil␤» | ||
jnthn | m: say chr(1830) ~~ /<:Pe>/ | ||
camelia | rakudo-moar c01fc3: OUTPUT«Nil␤» | ||
jnthn | The problem seems to boil down to "what should the <:Foo> form mean", because it appears to be ambiguous at the moment | 11:06 | |
At present, we take the property, and map it back to a Unicode property name | 11:07 | ||
Ah, here's the bytecode: | 11:08 | ||
00074 const_s loc_31_str, 'Foo' | |||
00075 unipropcode loc_30_int, loc_31_str | |||
00076 unipvalcode loc_29_int, loc_30_int, loc_31_str | |||
00077 hasuniprop loc_28_int, loc_6_str, loc_7_int, loc_30_int, loc_29_int | |||
So, we then use the property we mapped it to, to grab a Unicode property value code | 11:09 | ||
We then ask if the string we're matching (6_str) and the given location (7_int) has that property/value | 11:10 | ||
In the long form like :PropName<Value> there's no ambiguity | 11:13 | ||
timotimo | did they add more aliases so that we now have ambiguities? | ||
and thus our test suite has become bogus? | |||
jnthn | It doesn't seem so | 11:14 | |
timotimo | strange that it'd explode now, then :\ | 11:15 | |
hey, we can constant-fold unipropcode and probably also unipvalcode! | |||
jnthn | Indeed | ||
Well, in spesh at least | |||
timotimo | that'll help all those scripts that spend 50% of their cpu time inside unicode_db! | 11:16 | |
yes, otherwise we'd bind precompiled scripts to the unicode version they were compiled with | |||
jnthn | Thing is, if I reverse the order of the loop in generate_property_codes_by_names_aliases then I get more things busted | 11:17 | |
Meaning we were relying on, so far as I can tell, a completely arbitrary mechanism for resolving the conflicts | |||
timotimo | ugh, that's a really bad "decision" :) | 11:18 | |
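The order dependence described above can be sketched as follows: when property aliases and property value aliases are merged into one lookup table, a name that appears in both lists resolves to whichever entry was written last. This is a hedged Python illustration, not the logic of ucd2c.pl; the helper names `build_lookup` and `find_collisions`, the case-folding of names, and the small sample lists are assumptions chosen to mirror the `space` / `Space` results seen earlier in the log.

```python
def build_lookup(sources):
    """Merge (name, target) pairs; later entries silently overwrite earlier ones."""
    table = {}
    for pairs in sources:
        for name, target in pairs:
            table[name.lower()] = target  # assumption: names are matched case-insensitively
    return table


def find_collisions(property_aliases, value_aliases):
    """Return the names that occur both as a property alias and as a value alias."""
    prop_names = {name.lower() for name, _ in property_aliases}
    value_names = {name.lower() for name, _ in value_aliases}
    return sorted(prop_names & value_names)


# Illustrative sample data (a tiny, hand-picked subset).
property_aliases = [("WSpace", "White_Space"), ("space", "White_Space")]
value_aliases = [("SP", "Line_Break=SP"), ("Space", "Line_Break=SP")]

forward = build_lookup([property_aliases, value_aliases])
reverse = build_lookup([value_aliases, property_aliases])

if __name__ == "__main__":
    print(forward["space"], reverse["space"])
    print(find_collisions(property_aliases, value_aliases))
```

Reporting the collisions up front, instead of letting the last writer win, is one way to make the resolution an explicit decision.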
jnthn | Wow, I just put the Unicode 8 DB back in with my consistent output fix and it also fails that space test | 11:24 | |
timotimo | o_O | 11:25 | |
timotimo reconsiders bragging about moarvm's unicode support | |||
jnthn | Well, it boils down to a lang design issue too, I think | 11:26 | |
The what...even if I back out my changes to sort things and feed it the Unicode 8 DB it still fails the one test | |||
timotimo | not having the decision codified yet what a uniprop in a regex really means, yes? | 11:27 | |
jnthn | Something like | 11:28 | |
jnthn tries running ucd2c from master on the Unicode 8 DB | |||
huh, that passes | 11:29 | ||
Aha, but applying my "produce deterministic output" patch busts it | 11:34 | ||
lunch & | 11:44 | ||
nwc10 | jnthn++ | ||
jnthn | Back | 12:18 | |
OK, so | |||
On master, with the Unicode 8 database, I did 3 runs | |||
of ucd2c.pl, make -j install, run the spectest | |||
first 2 times passed, third failed | 12:19 | ||
So we had the problem all along | |||
It's just that hash ordering sometimes hides it | 12:20 | ||
For all the cases we have spectests for | |||
Do it 3 times with the Unicode 9 DB, and I got pass, fail, pass | 12:23 | ||
nwc10 | but will it be fixed before ilmari returns from lunch? :-) | 12:24 | |
ilmari | nwc10: I'd have to go for lunch first... | 12:25 | |
jnthn | :P | ||
Now trying this with 24742b1024 | |||
pass, fail, fail | 12:27 | ||
So it can pass with the Unicode 9 DB + NFG tweaks | |||
nwc10 | ilmari: I think that you're giving him an unfair advantage | 12:28 | |
jnthn | .oO( How many more re-gens until it passes again... :) ) |
12:29 | |
bah, I shoulda stashed the passing one :P | |||
jnthn would kinda like to get a clarification from TimToady on what we'd like <:Foo> to do | 12:30 | ||
S05 doesn't really make things much clearer | |||
Hurrah, I have a passing one. | 12:32 | ||
So in the meantime, we can have our Unicode 9 DB bump, more by luck than judgement | |||
The situation doesn't get any worse | |||
But urgh. | 12:33 | ||
The other bad news is that NFG will need a more extensive bunch of changes to handle...yes, *emoji*. | 12:34 | ||
nwc10 | does Unicode actually have suitable joiners yet to express jumping the shark? | 12:35 | |
jnthn | Since the grapheme boundary algorithm seems to have become something that needs more than just looking at 2 chars and being able to decide if there is a grapheme boundary between them | ||
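One rule that cannot be decided from two adjacent codepoints alone is the pairing of regional indicator symbols in the Unicode 9 text segmentation rules (UAX #29): whether a boundary falls between two indicators depends on how many indicators precede them. The Python sketch below handles only that single rule and treats every other pair of codepoints as a boundary, so it is an illustration of the look-behind requirement and not MoarVM's NFG code. The function names are hypothetical.

```python
def is_regional_indicator(cp):
    """True for U+1F1E6..U+1F1FF, the regional indicator symbols."""
    return 0x1F1E6 <= cp <= 0x1F1FF


def ri_break_positions(codepoints):
    """Return each index i where a boundary falls before codepoints[i].

    Only the regional-indicator pairing rule is modelled; all other
    adjacent pairs are (unrealistically) treated as boundaries.
    """
    breaks = []
    run = 0  # consecutive regional indicators seen immediately before index i
    for i, cp in enumerate(codepoints):
        if i > 0:
            completes_pair = is_regional_indicator(cp) and run % 2 == 1
            if not completes_pair:
                breaks.append(i)
        run = run + 1 if is_regional_indicator(cp) else 0
    return breaks


if __name__ == "__main__":
    # Four indicators form two flags, so the only boundary is in the middle.
    print(ri_break_positions([0x1F1FA, 0x1F1E6, 0x1F1E8, 0x1F1FF]))
```

The `run` counter is the state that a purely pairwise check lacks: the same two indicators either join or split depending on the parity of the run before them.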
Wouldn't surprise me :P | 12:36 | ||
m: say "\c[SHARK]" | |||
camelia | rakudo-moar c01fc3: OUTPUT«===SORRY!=== Error while compiling <tmp>␤Unrecognized character name SHARK␤at <tmp>:1␤------> say "\c[SHARK⏏]"␤» | ||
jnthn | jnthn@lviv:~/dev/rakudo$ ./perl6-m -e 'say "\c[SHARK]"' | ||
🦈 | |||
Unicode 9 does, however, add a shark :P | |||
Seems there's no chars with JUMP in their name | 12:39 | ||
dalek | MoarVM: a19dc90 | jnthn++ | / (2 files): Explicitly handle ZWJ in grapheme break. This is in preparation for updating to the Unicode 9 UCD, which takes the Grapheme_Extend property away from Zero Width Joiner. |
12:41 | |
MoarVM: 9b39aa5 | jnthn++ | src/strings/unicode_ (2 files): | |||
MoarVM: Update to the Unicode 9 database. | |||
MoarVM: | |||
MoarVM: Note that while this gives us support for the various new chars in | |||
MoarVM: Unicode 9, we'll need updates to NFG to fully handle the emoji rules | |||
MoarVM: in grapheme break handling. | |||
MoarVM/deterministic-ucd2c: 224a261 | jnthn++ | tools/ucd2c.pl: Tweak ucd2c.pl to produce deterministic output. Before, the output produced depended on hash ordering, which meant two runs on the exact same Unicode database could produce wildly different output. This doesn't help when trying to debug. |
12:58 | ||
jnthn | So, I'll leave that cleanup in a branch for now, until the lang design issue gets a decision/resolution. | ||
ilmari | jnthn: but there doesn't seem to be a water skier emoji | 12:59 | |
the closest is surfer, I guess | |||
SPEEDBOAT+ZWJ+SURFER+ZWJ+SHARK | 13:00 | ||
jnthn | Nice :) | 13:01 | |
ilmari is disappointed there's no WHOLE PIZZA, only SLICE OF PIZZA | 13:03 | ||
13:03
domidumont joined
ilmari | and on that note, lunchtime | 13:03 | |
nwc10 | \o/ | ||
ilmari | m: say "\c[SLICE OF PIZZA]" xx 8 | 13:04 | |
camelia | rakudo-moar c01fc3: OUTPUT«(🍕 🍕 🍕 🍕 🍕 🍕 🍕 🍕)␤» | ||
ilmari | although the slices I intend to eat are rectangular | ||
nwc10 | ZWJ is the anti-pizza-wheel? | 13:05 | |
ilmari | pizza glue | ||
nwc10 | jnthn: `git describe` in MoarVM gives 2016.09-3-g9b39aa5 | 13:07 | |
your NQP bump wants 4 | |||
nwc10 is surprised that Travis hasn't been on a drive-by yet | 13:08 | ||
jnthn | It did in the other channel | ||
(And fixed) | 13:17 | ||
I'll leave the NFG changes for another day. | |||
brrt | jnthn++ | 13:19 | |
jnthn | rt.perl.org/Ticket/Display.html?id=125978 is "interesting" | 14:59 | |
gist.github.com/jnthn/7d2cefda9d88...a62e6c7b4f is from valgrind | |||
I'm not quite sure how this state can ever arise | 15:00 | ||
Of course, it doesn't seem to show up anything like so easily with a debug build | 15:34 | ||
Ah, got a related but different valgrind error out of it now | 15:42 | ||
These are sorta suggestive that a thread is running while another is GCing, which would be bizarre. | 15:56 | ||
But it's hard to see how: | |||
if (ctx->arg_flags) { | |||
/* Free the generated flags. */ | |||
MVM_free(ctx->arg_flags); | |||
ctx->arg_flags = NULL; | |||
Could lead to a double free otherwise :S | |||
The traces clearly say the free in question ran | 15:57 | ||
And the NULLing must happen right after it | |||
16:04
Ven_ joined
16:10
Ven_ joined
jnthn suspects he won't fully track this down today... | 16:15 | ||
Time for some rest now, anyway. | |||
17:16
FROGGS joined
17:34
domidumont joined
18:01
Ven_ joined
18:21
Ven_ joined
18:41
Ven_ joined
19:01
Ven_ joined
19:08
wrl joined
wrl | hey, i'm evaluating languages for embedding inside of a larger C app (ala blender). how's moar's embedding API these days? | 19:09 | |
timotimo | there isn't a stable api for embedding | 19:12 | |
wrl | gotcha. is that planned? | ||
timotimo | not sure | 19:13 | |
it's quite possible to just run a moar instance and communicate with the script running inside it via normal IPC things like sockets or pipes | 19:14 | ||
wrl | unfortunately the larger app relies on being able to share wrapped C data structure around | ||
timotimo | oh, i meant run the instance in your own program | 19:15 | |
then you can use NativeCall and friends to get at shared data structures | |||
wrl | ah right i see | ||
that seems like it could get very obtuse, unfortunately | 19:16 | ||
timotimo | wouldn't know until i tried it, tbh | 19:19 | |
19:21
Ven_ joined
19:32
patrickz joined
19:41
Ven_ joined
20:00
Ven_ joined
20:21
Ven_ joined
20:40
Ven_ joined
20:56
Ven_ joined