01:51
colomon joined
06:25
brrt joined
brrt | \o | 06:42 | |
06:48
FROGGS joined
07:01
zakharyas joined,
Ven joined
07:06
lizmat joined
07:21
lizmat joined
dalek | MoarVM: a4f97a5 | jnthn++ | src/strings/nfg. (2 files): Start stashing NFG synthetics into a table. We don't put them into a trie yet or do lookups there, meaning that we currently always create a new synthetic all the time. This will do for initial round-trip tests. |
07:44 | |
MoarVM: 9aaf34a | jnthn++ | src/strings/nfg. (2 files): Add function for looking up NFG synthetic info. |
07:54 | ||
MoarVM: 3123cd5 | jnthn++ | src/ (2 files): Make code point iterator iterate synthetics. |
|||
MoarVM: 78f8c85 | jnthn++ | src/strings/normalize.c: Fix bug in grapheme composition algorithm. Forgot to slide codepoints in the buffer beyond those compressed into synthetics backwards. |
07:55 | ||
MoarVM: 6c8bd19 | jnthn++ | src/strings/normalize.c: Implement NFG string -> NFD/NFKC/NFKD codes. Probably not a highly likely path, but thankfully rather easy to do with the pieces we already have. |
08:19 | ||
nwc10 | is there a good way to dump the MoarVM bytecode to text, to be able to diff it? | 09:10 | |
FROGGS | --dump? | 09:12 | |
nwc10 | FROGGS: thanks. That looks like a good start. | 09:14 | |
FROGGS | yes, if some information is missing we should add it | ||
jnthn | It includes most things. | ||
nwc10 | something, somewhere, somehow, seems to be behaving differently if the serialised size of string table offsets changes | 09:15 | |
*other* than the thingy lazy that is cruel and unforgiving. | |||
and valgrind can't find it. | |||
nor can ASAN | |||
anyway, I'm back to work fun | 09:16 | ||
bad news BeOS users, you'll have to stick to apache 2.2, as 2.4 removed the relevant MPM | |||
OS/2 and Netware still OK: httpd.apache.org/docs/current/mpm.html | 09:17 | ||
jnthn: ASAN remains silent on master/master/nom | 09:31 | ||
jnthn | \o/ | ||
Checking that is the case is one of the reasons I bumped :) | 09:32 | ||
09:53
Ven joined
10:00
brrt joined
10:07
donaldh joined
nwc10 | jnthn: you haven't broken Power64. Keep trying... | 10:40 | |
jnthn | Working on it :) | 10:41 | |
10:48
rurban joined
11:13
dalek joined
11:18
colomon joined
dalek | MoarVM: fae0069 | jnthn++ | src/strings/nfg.c: Implement re-use of synthetic graphemes. This is needed to get string equality correct. We use a partially lock-free trie to achieve this. Having read the base of the trie, any thread is free to traverse it without having to acquire a lock (that is, reads are always lock free). Only additions need the lock, and it exists to serialize the additions. We always copy nodes that are modified, and never modify things in place; we schedule the memory to be freed at the next global safe point (at which point we know that no thread could possibly be reading any version of the trie). |
11:30 | |
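The scheme in the fae0069 commit message can be illustrated with a small copy-on-write trie. This is only a sketch in Python, not MoarVM's C code: the names `SyntheticTable`, `TrieNode` and `lookup_or_add` are hypothetical, the negative ids are an arbitrary choice for illustration, and Python's garbage collector stands in for "free at the next global safe point".

```python
import threading
from typing import Dict, Optional, Sequence


class TrieNode:
    """A node that is never modified once it is reachable from the root."""
    __slots__ = ("children", "synthetic")

    def __init__(self, children: Optional[Dict[int, "TrieNode"]] = None,
                 synthetic: Optional[int] = None):
        self.children = children if children is not None else {}
        self.synthetic = synthetic


class SyntheticTable:
    """Hypothetical table mapping code point sequences to synthetic ids."""

    def __init__(self) -> None:
        self._root = TrieNode()
        self._lock = threading.Lock()   # serializes additions only
        self._next_id = -1              # illustrative id scheme (assumption)

    def lookup(self, codes: Sequence[int]) -> Optional[int]:
        # Read the root once, then walk without taking any lock.
        node = self._root
        for cp in codes:
            node = node.children.get(cp)
            if node is None:
                return None
        return node.synthetic

    def lookup_or_add(self, codes: Sequence[int]) -> int:
        found = self.lookup(codes)
        if found is not None:
            return found
        with self._lock:
            # Re-check: another thread may have added it while we waited.
            found = self.lookup(codes)
            if found is not None:
                return found
            synthetic = self._next_id
            self._next_id -= 1
            # Build a new path and publish it by swapping the root.
            self._root = self._with_added(self._root, codes, 0, synthetic)
            return synthetic

    @staticmethod
    def _with_added(node: TrieNode, codes: Sequence[int], i: int,
                    synthetic: int) -> TrieNode:
        # Copy every node on the path; untouched subtrees are shared.
        if i == len(codes):
            return TrieNode(node.children, synthetic)
        cp = codes[i]
        child = node.children.get(cp)
        if child is None:
            child = TrieNode()
        new_children = dict(node.children)
        new_children[cp] = SyntheticTable._with_added(child, codes, i + 1, synthetic)
        return TrieNode(new_children, node.synthetic)
```

A reader that grabbed the old root before an addition keeps walking a consistent, unchanged version of the trie, which is why only writers need the lock.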
jnthn | Passes 500 test cases. :) | 11:32 | |
Next up: lunch. Then some of the fancier stuff that needs fixing for NFG to work out. | 11:37 | ||
(case change, char classes, etc.) | |||
12:00
brrt joined
jnthn back | 12:27 | ||
nwc10 | no-one broke anything while you were away. | 12:29 | |
brrt | \o jnthn | 12:39 | |
jnthn | o/ brrt | 12:42 | |
brrt is very excited about the possibility of a long stretch of moarvm hacking | 12:46 | ||
FROGGS | I'd be excited if I'd know hot to tackle my (de)serialization bug... | 12:47 | |
how* | 12:48 | ||
nwc10 | what is *your* deserialization bug? | ||
FROGGS | I changed rakudo to use the nqp::(de)serialize ops instead of using json... github.com/rakudo/rakudo/commit/52...7a41421488 | 12:49 | |
I can bootstrap panda just fine, but as soon as I install another module, my serialized blob gets invalid | |||
nwc10 | OK, I have no idea about that sort of thing | 12:50 | |
FROGGS | nwc10: but but... it is based on C (for some definition of based on) | 12:51 | |
that is perhaps just one problem: gist.github.com/FROGGS/c6d637b32e4665ec3882 | |||
nwc10 | you said "gets invalid" | ||
I don't know about that | |||
12:57
rurban joined
dalek | MoarVM: 350212e | jnthn++ | src/strings/ (3 files): Basic implementation of case change with NFG. This should work out for most cases - but I fear there's somewhere in Unicode where something has a precomposed uppercase but there is no such precomposed lowercase, or vice versa. Those cases will wrongly produce a synthetic in one direction now. |
13:15 | |
jnthn | If anyone knows any such cases off the top of their head, I'd be interested to know 'em. | 13:17 | |
oh... | 13:19 | ||
SpecialCasing.txt in the Unicode database has 'em. | |||
FROGGS | m: say "\c[LATIN CAPITAL LETTER SHARP S]" | 13:20 | |
camelia | rakudo-moar 5cfddf: OUTPUT«ẞ␤» | ||
FROGGS | what does that produce as lowercase? | ||
though, that has nothing to do with composition :/ | |||
jnthn | m: say "\c[LATIN CAPITAL LETTER SHARP S]".lc | ||
camelia | rakudo-moar 5cfddf: OUTPUT«ß␤» | ||
jnthn | m: say uniname ord "\c[LATIN CAPITAL LETTER SHARP S]".lc | ||
camelia | rakudo-moar 5cfddf: OUTPUT«LATIN SMALL LETTER SHARP S␤» | ||
FROGGS | m: say "\c[LATIN CAPITAL LETTER SHARP S]".lc.uc | 13:21 | |
camelia | rakudo-moar 5cfddf: OUTPUT«ß␤» | ||
jnthn | m: say uniname ord "\c[LATIN CAPITAL LETTER SHARP S]".lc.uc | ||
camelia | rakudo-moar 5cfddf: OUTPUT«LATIN SMALL LETTER SHARP S␤» | ||
jnthn | That one should become SS by Unicode spec | ||
[Coke] | jnthn: there is a fudged esset test which hopefully you can now resolve. :) | ||
jnthn | [Coke]: esset? | ||
FROGGS | jnthn: I hope the unicode spec changes before we fix that :o) | ||
eszet | |||
[Coke] | ß | 13:22 | |
jnthn | [Coke]: Well, I didn't implement the special casing rules yet | ||
FROGGS | m: say "SS" ~~ /:i ß/ | ||
camelia | rakudo-moar 5cfddf: OUTPUT«Nil␤» | ||
[Coke] | 121377 | ||
jnthn | Just realized that when I do, they interact with NFG. | ||
FROGGS | I hate the Eszet fwiw | ||
so, dont care for that, it is insane anyway | 13:23 | ||
[Coke] | jnthn: right, just letting you know there's unfudgeable stuff when you get there. :) | ||
jnthn | [Coke]: Aye, thanks. There's an RT also iirc :) | 13:24 | |
Here's the entry in SpecialCasing.txt for those curious: | 13:25 | ||
# The German es-zed is special--the normal mapping is to SS. | |||
# Note: the titlecase should never occur in practice. It is equal to titlecase(uppercase(<es-zed>)) | |||
00DF; 00DF; 0053 0073; 0053 0053; # LATIN SMALL LETTER SHARP S | |||
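The entry quoted above has the field order code; lower; title; upper. As a rough sketch (the helper name `parse_special_casing` is made up here, and this handles only unconditional entries like this one), it can be decoded and compared with Python's own full case mapping:

```python
LINE = "00DF; 00DF; 0053 0073; 0053 0053; # LATIN SMALL LETTER SHARP S"


def parse_special_casing(line):
    """Decode one unconditional SpecialCasing.txt entry (hypothetical helper)."""
    data = line.split("#", 1)[0]                      # drop the trailing comment
    fields = [f.strip() for f in data.split(";")]
    code, lower, title, upper = fields[:4]

    def to_str(hexes):
        # "0053 0073" -> "Ss"
        return "".join(chr(int(h, 16)) for h in hexes.split())

    return to_str(code), {
        "lower": to_str(lower),
        "title": to_str(title),
        "upper": to_str(upper),
    }


char, mapping = parse_special_casing(LINE)
print(repr(char), mapping)

# Python's str methods apply the full (one-to-many) case mappings,
# so the uppercase of sharp s is two characters long.
print(char.upper() == mapping["upper"])
print("\N{LATIN CAPITAL LETTER SHARP S}".lower() == char)
```

The one-to-many uppercase is the part that interacts with NFG: a single grapheme can become two under case change.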
nwc10 | greek final sigma... | 13:26 | |
jnthn | Joy :) | ||
And also in the same file | |||
nwc10 | anyway, took the Unicode consortium a long time to realise that they didn't have an answer for what the capturing groups should contain in (their idea for) "ß" =~ s/(s)(s)/i | 13:27 | |
so don't assume that everything is even implementable. | |||
they are human. | |||
jnthn | What does Perl 5 do there, ooc? | ||
nwc10 | (and I have some inside knowledge that I'm not quoting anywhere logged about how some of them really are human) | 13:28 | |
FROGGS | $ perl -E 'use utf8; say "SS" =~ /([ß])/i; say length $1' | ||
SS | |||
2 | |||
nwc10 | um, well, perl 5 always treated //i as just "lowercase" | ||
Unicode had wanted the "case insensitive" matching to be on case folded, not just lowercase | |||
ß case folds to "ss" | |||
(this is all from memory, so might be contradicted by [citation needed]) | 13:29 | ||
so Perl 5 wasn't doing what they wanted things to move to (long term) | |||
it just turned out that because no-one anywhere had tried to implement that "long term" | |||
that when you added capturing groups to the mix | |||
(ie not just "does it match, Y/N?" | |||
but also "what bits matched") | |||
that the "bits matched" became a bit tricky to actually answer. | 13:30 | ||
at all. | |||
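The difference nwc10 describes between "just lowercase" and case-folded comparison, and why captures get awkward, can be sketched in a few lines of Python. The function names are illustrative only, and this is not how Perl 5 or any regex engine implements it:

```python
def eq_lowercase(a, b):
    """Case-insensitive comparison by lowercasing both sides."""
    return a.lower() == b.lower()


def eq_casefold(a, b):
    """Case-insensitive comparison on the full case folding."""
    return a.casefold() == b.casefold()


def fold_with_offsets(text):
    """Fold text and record which original index each folded char came from."""
    folded, origin = [], []
    for i, ch in enumerate(text):
        for f in ch.casefold():
            folded.append(f)
            origin.append(i)
    return "".join(folded), origin


sharp_s = "\N{LATIN SMALL LETTER SHARP S}"
print(eq_lowercase("SS", sharp_s))   # lowercase keeps sharp s as one char
print(eq_casefold("SS", sharp_s))    # folding expands it to "ss"
print(fold_with_offsets(sharp_s))
```

Both folded characters map back to the same original index, so a pattern like `(s)(s)` matched against the folded text has no obvious pair of substrings to hand back from the original.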
jnthn realizes we have string bitwise operations that will need careful handling with synthetics too :) | 13:32 | ||
nwc10 | um, do we have a definition for what the "stringwise not" of a code point is? | 13:33 | |
eg, how many bits long is the logical not of "A" | 13:35 | ||
FROGGS | m: say ~^'A' | 13:36 | |
camelia | rakudo-moar 5cfddf: OUTPUT«prefix:<~^> NYI␤ in block <unit> at /tmp/svYs6QVxTG:1␤␤» | ||
FROGGS | I guess we don't have to care about that either at this point... does anybody know what to expect? | ||
jnthn has no idea :) | 13:37 | ||
FROGGS | see | ||
so if there are no expectations and probably no use case... | |||
dalek | MoarVM: 633a11c | jnthn++ | src/strings/ops.c: Add a missing MVMROOT. |
14:09 | |
MoarVM: e42476f | jnthn++ | src/strings/ops.c: Fix confusing whitespace. |
|||
MoarVM: b95028f | jnthn++ | src/strings/ops.c: cclass check on synthetic uses base codepoint |
14:11 | ||
MoarVM: c8aaed9 | jnthn++ | src/strings/ops.c: Don't use cp as a variable name for a MVMGrapheme. Yes, this sin is replicated in various bits of the codebase and wants a clear up. |
|||
jnthn typos MVMCodepoint as MVMCodepint, and wonders if it's nearly pub time yet. :) | 14:13 | ||
dalek | MoarVM: c2b818d | jnthn++ | src/strings/ops.c: Unicode property checks on synthetics use base. |
14:16 | |
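The idea behind the b95028f and c2b818d commits (answer character class and property questions about a multi-codepoint grapheme by asking about its base) might look roughly like this in Python. Treating the first code point as the base is an assumption of this sketch, and `base_codepoint` and `is_uppercase_letter` are hypothetical names:

```python
import unicodedata


def base_codepoint(grapheme):
    """Return the base of a grapheme, assumed here to be its first code point."""
    return grapheme[0]


def is_uppercase_letter(grapheme):
    """Property check delegated to the base code point."""
    return unicodedata.category(base_codepoint(grapheme)) == "Lu"


# D WITH DOT BELOW followed by COMBINING DOT ABOVE: two code points, one grapheme.
grapheme = "\u1E0C\u0307"
print(unicodedata.name(base_codepoint(grapheme)))
print(is_uppercase_letter(grapheme))
```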
nwc10 | use more 'beer fridge'? | 14:20 | |
dalek | MoarVM: 8bb5da8 | jnthn++ | src/strings/ops.c: Fix typos. |
14:21 | |
nwc10 | is pub time a reward, or a "step away from the keyboard"? :-) | 14:33 | |
jnthn | Also a "have dinner" :) | 14:34 | |
It's not really time yet though. | |||
FROGGS | it is always a good time to eat something | 14:35 | |
jnthn | Though that's probably enough NFG for today. Will take a keyboard break for a bit, and then look at some RTs. :) | ||
I certainly feel like the core of NFG is there now; the rest is making places that aren't aware of it be so, and also asking TimToady hard questions about what we want in various cases :) | 14:37 | ||
Oh, and plumbing it into I/O. :) | |||
Well, I. It's already in O. :) | 14:38 | ||
nwc10 | everything, in O(1) time and space? :-) | ||
jnthn | Probably :P | ||
anyway, bbi10 | 14:39 | ||
Darn, back up to 675 RTs... | 14:59 | ||
[Coke] is reminded to start converting misc todo's into RTs again. | 15:00 | ||
nwc10 | IIRC Perl 5 is steady state around 1300 to 1400, so Perl 6 is halfway respectable these days :-) | 15:03 | |
[Coke] | m: "\c[LATIN CAPITAL LETTER A WITH DOT ABOVE, COMBINING DOT BELOW]".codes.say | 15:05 | |
camelia | rakudo-moar 958ffb: OUTPUT«2␤» | ||
jnthn | [Coke]: We only NFG things explicitly constructed from a Uni so far. | 15:06 | |
Well, NF-whatever really | 15:07 | ||
[Coke] | jnthn: just autounfudging. that test is currently skipped. | ||
jnthn | Ah :) | ||
Well, it's the right answer for .codes at least :) | |||
Uh, I think so, anyway :) | |||
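For the string [Coke] tried, the code point count depends on the normalization form, while the grapheme count stays at one. A quick Python check, where `approx_graphemes` is a deliberately simplified stand-in (it counts code points with combining class 0 and is not the full Unicode segmentation algorithm):

```python
import unicodedata

s = "\N{LATIN CAPITAL LETTER A WITH DOT ABOVE}\N{COMBINING DOT BELOW}"

nfd = unicodedata.normalize("NFD", s)
nfc = unicodedata.normalize("NFC", s)


def approx_graphemes(text):
    """Count starters (combining class 0); a rough grapheme estimate only."""
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


print(len(s), len(nfd), len(nfc))      # code points as written, in NFD, in NFC
print([f"U+{ord(ch):04X}" for ch in nfd])
print(approx_graphemes(s))
```

In NFD the dot below sorts before the dot above (canonical ordering), so the decomposed form is A, dot below, dot above.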
15:17
btyler joined
TimToady | obviously they should've just added an 'ss' ligature instead of overloading ASCII | 15:49 | |
15:51
colomon joined
nwc10 | TimToady: I'm not convinced about that, but offhand I can't remember what the rules are for matching "ﬁ" and similar ligatures | 17:05 | |
in the "phone book" ordering, "ß" sorts as "ss" | |||
"Æ" as "AE" (etc) | |||
17:07
vendethiel joined
TimToady | that's not matching, that's collation, which is also known to be insane :) | 17:08 | |
nwc10 | yes, mmm "Æ" isn't expected to match "AE" | ||
so why is ß special snowflake? :-) | 17:09 | ||
me-- # did not prime the beer fridge | 17:12 | ||
17:46
FROGGS joined
nwc10 | jnthn: EPIC non-fail. Try harder. | 18:09 | |
TimToady wonders why the comment strings just repeat the hexcodes rather than showing the actual characters, which would be more interesting and informative, IHHO | 18:22 | ||
japhb | TimToady++ # Remembering to change even acronyms to match third-person emoting | 18:37 | |
18:57
brrt joined
brrt | \o | 18:57 | |
i have a cunning plan | |||
to implement spesh-level tracing | |||
key ingredients - we don't need to trace all instructions ever, we can just trace entry of basic blocks | 18:59 | ||
we can do this by inserting trace logging statements at bb entry | 19:00 | ||
then we need to make sure that all callees also have trace logging inserted | 19:02 | ||
preferably even when they've been speshed before | |||
the one tricky bit are invokish ops | 19:15 | ||
because their invocation is more or less hidden | 19:21 | ||
19:29
AndChat|228864 joined
JimmyZ_ | brrt: I think we still need to trace every instruction to trace loops | 19:32 | |
brrt | you don't | ||
that's the beauty of it | |||
because they are basic blocks | |||
you can't actually leave them :-) | 19:33 | ||
(within the block) | |||
JimmyZ_ | and consider some loop optimisation | ||
brrt | so if you enter the block, you *will* execute all of them | ||
JimmyZ_ | oh , you meant bb | ||
brrt | aye | ||
JimmyZ_ | not perl6 block... | 19:34 | |
brrt | no indeed, a perl6 block is quite a larger construt | ||
construct | |||
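brrt's point, that logging only basic block entry is enough to recover every executed instruction, can be shown with a toy interpreter. This is a sketch under heavy assumptions: `BasicBlock`, `run_traced` and `expand` are invented for illustration and have nothing to do with spesh's actual data structures.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, int]


@dataclass
class BasicBlock:
    # Straight-line ops as (label, action); no branches inside the block.
    ops: List[Tuple[str, Callable[[State], None]]]
    # Chooses the successor block after the last op, or None to stop.
    next_block: Callable[[State], Optional[str]]


def run_traced(blocks: Dict[str, BasicBlock], start: str, state: State) -> List[str]:
    trace: List[str] = []
    name: Optional[str] = start
    while name is not None:
        trace.append(name)            # the only trace hook: one per block entry
        block = blocks[name]
        for _label, op in block.ops:
            op(state)
        name = block.next_block(state)
    return trace


def expand(trace: List[str], blocks: Dict[str, BasicBlock]) -> List[str]:
    """Rebuild the full instruction sequence from the block-entry trace."""
    return [label for name in trace for label, _op in blocks[name].ops]


def set_i(state: State) -> None:
    state["i"] = 3


def dec_i(state: State) -> None:
    state["i"] -= 1


def nop(state: State) -> None:
    pass


blocks = {
    "entry": BasicBlock([("set i, 3", set_i)], lambda s: "loop"),
    "loop": BasicBlock([("dec i", dec_i), ("if i > 0 goto loop", nop)],
                       lambda s: "loop" if s["i"] > 0 else "exit"),
    "exit": BasicBlock([("return", nop)], lambda s: None),
}

trace = run_traced(blocks, "entry", {})
instructions = expand(trace, blocks)
print(trace)
print(instructions)
```

The loop shows up as repeated entries of the same block in the trace, which is why no per-instruction hook is needed to see it.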
JimmyZ_ | so luajit is tracing the loop ? | 19:35 | |
by some tags? | |||
brrt | i don't know how luajit does it | 19:36 | |
JimmyZ_ | anyway, sleep time, 03:36 am here. | ||
brrt | sleep well | ||
JimmyZ_ | I was always wondering how luajit does it, considering it has many advanced optimizations. | 19:38 | |
good night. | |||
19:53
lizmat joined
jnthn | brrt: Yeah, the block/invokish level tracing is what I'd had in mind. | 19:58 | |
brrt | much cheaper than adding a check on every opcode i'd think | 19:59 | |
jnthn | TimToady: (comment strings) 'cus my piece of crap terminal will likely copy-paste them wrong to my editor, slowing me down in debugging stuff, so largely they're optimized for me getting stuff done. :) | ||
brrt: Indeed. :) | |||
brrt | tracing is such an awesome optimisation | 20:00 | |
it's supercharged inlining | |||
jnthn | Aye, but like all optimizations, it's a trade-off. :) | ||
brrt | right. if your code is not actually tracy it's really costly to deopt all the time | 20:01 | |
jnthn | nwc10: (try harder) I implemented a partially lock-free trie, that shoulda been advanced enough to screw up, dammit :P | 20:02 | |
nwc10 | jnthn: would it make sense to generate 2 lines - one with comment strings in hex, and one with the real characters? | 20:15 | |
I think that having the hex (or U+....) notation around makes a lot of sense, as it's unambiguous and really hard for text editors to screw up | 20:16 | ||
jnthn | nwc10: To me right now? Not really, the tests have served their purpose in getting me to something that seems to behave sanely, so I've not a huge incentive to spend more time on them. ;) | ||
nwc10 | ah. and TimToady has a commit bit? :-) | 20:17 | |
TimToady | the parts that matter programmatically are already in hex | ||
I don't care about those; I'm just curious what the actual characters are :) | 20:18 | ||
jnthn | They're all just things with a non-zero Canonical_Combining_Class to me... :) | 20:19 | |
nwc10 | jnthn: I think I have a livelocked moar again | 20:20 | |
it's running t/spec/S17-procasync/basic.rakudo.moar | |||
it's here: | |||
#0 AO_load_acquire (addr=0x1000f420378) at 3rdparty/libatomic_ops/src/atomic_ops/sysdeps/gcc/powerpc.h:91 | |||
#1 0x00003fffa92f6818 in MVM_gc_enter_from_allocator (tc=0x1000f4206c0) at src/gc/orchestrate.c:378 | |||
paste.scsys.co.uk/473116 | 20:21 | ||
I know nothing about gdb and threads debugging | 20:22 | ||
but TFM suggests that `info threads` is a good start, and we have 6 threads | |||
jnthn: all 6 backtraces paste.scsys.co.uk/473122 | 20:25 | ||
jnthn | nwc10: ooh, thanks :) | 20:26 | |
nwc10 | I've ^Z ed the process, so it's not chewing CPU | 20:27 | |
and awaits further probing | |||
jnthn | nwc10: I guess this is hard to reliably reproduce? | ||
nwc10 | seems to happen at random less than 10% of the time | 20:28 | |
[Coke] | Wonder if this is related to the flapping S17 tests we see in the dailies. | 20:30 | |
jnthn | Mebbe, though those don't deadlock | 20:31 | |
nwc10 | backtrace is for MoarVM at 8bb5da80f2c072be49ffa6f75c2814a6b47dd381 | ||
nqp at 53d43e830ecbf34a1dbf9f6b1597f0d57fa540a3 | 20:32 | ||
rakudo at b62929991faecf1ae38fc5e0b6d2dd0a675d3187 | |||
on gcc110 as described here gcc.gnu.org/wiki/CompileFarm | 20:33 | ||
gcc110 2TB 4x16x3.55 GHz IBM POWER7 / 64 GB RAM / IBM Power 730 Express server / Fedora 18 ppc64 | |||
gcc112 is more bonkers still | |||
oh gosh, I'm asleep. gcc112 is also ppc64le | 20:34 | ||
oooh, it's funky enough that ccache won't build from source | 20:40 | ||
fedora-- # "perl" - you keep using that package name. I do not think that it means what you think that it means. | 20:42 | ||
21:41
brrt left,
brrt joined
22:58
vendethiel joined
23:22
vendethiel joined