01:51 colomon joined 06:25 brrt joined
brrt \o 06:42
06:48 FROGGS joined 07:01 zakharyas joined, Ven joined 07:06 lizmat joined 07:21 lizmat joined
dalek MoarVM: a4f97a5 | jnthn++ | src/strings/nfg. (2 files):
Start stashing NFG synthetics into a table.

We don't put them into a trie yet or do lookups there, meaning that we currently always create a new synthetic all the time. This will do for initial round-trip tests.
07:44
MoarVM: 9aaf34a | jnthn++ | src/strings/nfg. (2 files):
Add function for looking up NFG synthetic info.
07:54
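A minimal sketch of the table idea from the two commits above, not the MoarVM code itself. The class name, the method names, and the choice of negative integers as synthetic IDs are assumptions made for illustration. As in the first commit message, nothing is deduplicated yet, so adding the same sequence twice gives two synthetics.

```python
class SyntheticTable:
    """Hypothetical table mapping synthetic IDs to code point sequences."""

    def __init__(self):
        # Index i holds the code points for synthetic -(i + 1).
        self._entries = []

    def add(self, codes):
        # No trie and no lookup of existing entries: every call
        # creates a brand new synthetic.
        self._entries.append(tuple(codes))
        return -len(self._entries)

    def lookup(self, synthetic):
        # Return the code points behind a synthetic ID.
        if synthetic >= 0:
            raise ValueError("not a synthetic")
        return self._entries[-synthetic - 1]


table = SyntheticTable()
first = table.add([0x0078, 0x0323])   # 'x' + COMBINING DOT BELOW
second = table.add([0x0078, 0x0323])  # same sequence, different synthetic
print(first, second, table.lookup(first))
```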
MoarVM: 3123cd5 | jnthn++ | src/ (2 files):
Make code point iterator iterate synthetics.
MoarVM: 78f8c85 | jnthn++ | src/strings/normalize.c:
Fix bug in grapheme composition algorithm.

Forgot to slide codepoints in the buffer beyond those compressed into synthetics backwards.
07:55
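The buffer bug described in 78f8c85 can be illustrated with a small sketch. This is a guess at the shape of the problem, not the code in normalize.c. The function name and the list-based buffer are hypothetical.

```python
def compress_run(buf, start, end, synthetic):
    """Replace buf[start:end] with one synthetic, in place.

    Returns the new logical length of the buffer.
    """
    buf[start] = synthetic
    removed = end - start - 1
    # Slide everything after the compressed run backwards. Skipping this
    # step would leave stale code points sitting after the synthetic.
    for i in range(end, len(buf)):
        buf[i - removed] = buf[i]
    del buf[len(buf) - removed:]
    return len(buf)


buf = [0x61, 0x78, 0x0323, 0x62, 0x63]   # a, x, COMBINING DOT BELOW, b, c
new_len = compress_run(buf, 1, 3, -1)    # x + dot below become synthetic -1
print(new_len, buf)
```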
MoarVM: 6c8bd19 | jnthn++ | src/strings/normalize.c:
Implement NFG string -> NFD/NFKC/NFKD codes.

Probably not a highly likely path, but thankfully rather easy to do with the pieces we already have.
08:19
nwc10 is there a good way to dump the MoarVM bytecode to text, to be able to diff it? 09:10
FROGGS --dump? 09:12
nwc10 FROGGS: thanks. That looks like a good start. 09:14
FROGGS yes, if some information is missing we should add it
jnthn It includes most things.
nwc10 something, somewhere, somehow, seems to be behaving differently if the serialised size of string table offsets changes 09:15
*other* than the thingy lazy that is cruel and unforgiving.
and valgrind can't find it.
nor can ASAN
anyway, I'm back to work fun 09:16
bad news BeOS users, you'll have to stick to apache 2.2, as 2.4 removed the relevant MPM
OS/2 and Netware still OK: httpd.apache.org/docs/current/mpm.html 09:17
jnthn: ASAN remains silent on master/master/nom 09:31
jnthn \o/
Checking that is the case is one of the reasons I bumped :) 09:32
09:53 Ven joined 10:00 brrt joined 10:07 donaldh joined
nwc10 jnthn: you haven't broken Power64. Keep trying... 10:40
jnthn Working on it :) 10:41
10:48 rurban joined 11:13 dalek joined 11:18 colomon joined
dalek MoarVM: fae0069 | jnthn++ | src/strings/nfg.c:
Implement re-use of synthetic graphemes.

This is needed to get string equality correct. We use a partially lock-free trie to achieve this. Having read the base of the trie, any thread is free to traverse it without having to acquire a lock (that is, reads are always lock free). Only additions need the lock, and it exists to serialize the additions. We always copy nodes that are modified, and never modify things in place; we schedule the memory to be freed at the next global safe point (at which point we know that no thread could possibly be reading any version of the trie).
11:30
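A rough sketch of the copy-on-write scheme the commit message describes, assuming a trie keyed by code point. All names here are hypothetical and the real implementation is C in src/strings/nfg.c. Readers take one reference to the root and walk without locking. Writers take a lock, copy every node on the path they change, and publish a new root. Python's garbage collector stands in for "free at the next global safe point".

```python
import threading


class Node:
    __slots__ = ("children", "synthetic")

    def __init__(self, children=None, synthetic=None):
        self.children = children or {}  # never mutated once published
        self.synthetic = synthetic


class SyntheticTrie:
    def __init__(self):
        self._root = Node()
        self._lock = threading.Lock()
        self._next = -1  # assumption: synthetics are negative integers

    def find(self, codes):
        node = self._root  # read the root once, then traverse lock free
        for cp in codes:
            node = node.children.get(cp)
            if node is None:
                return None
        return node.synthetic

    def find_or_add(self, codes):
        found = self.find(codes)
        if found is not None:
            return found
        with self._lock:  # additions are serialized
            found = self.find(codes)  # another writer may have won the race
            if found is not None:
                return found
            synthetic = self._next
            self._next -= 1
            self._root = self._copy_with(self._root, tuple(codes), synthetic)
            return synthetic

    def _copy_with(self, node, codes, synthetic):
        # Build new nodes along the path; old nodes are left untouched.
        if not codes:
            return Node(node.children, synthetic)
        children = dict(node.children)
        child = node.children.get(codes[0], Node())
        children[codes[0]] = self._copy_with(child, codes[1:], synthetic)
        return Node(children, node.synthetic)


trie = SyntheticTrie()
a = trie.find_or_add([0x0078, 0x0323])
b = trie.find_or_add([0x0078, 0x0323])  # re-used, so equality works
print(a, b)
```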
jnthn Passes 500 test cases. :) 11:32
Next up: lunch. Then some of the fancier stuff that needs fixing for NFG to work out. 11:37
(case change, char classes, etc.)
12:00 brrt joined
jnthn back 12:27
nwc10 no-one broke anything while you were away. 12:29
brrt \o jnthn 12:39
jnthn o/ brrt 12:42
brrt is very excited about the possibility of a long stretch of moarvm hacking 12:46
FROGGS I'd be excited if I'd know hot to tackle my (de)serialization bug... 12:47
how* 12:48
nwc10 what is *your* deserialization bug?
FROGGS I changed rakudo to use the nqp::(de)serialize ops instead of using json... github.com/rakudo/rakudo/commit/52...7a41421488 12:49
I can bootstrap panda just fine, but as soon as I install another module, my serialized blob gets invalid
nwc10 OK, I have no idea about that sort of thing 12:50
FROGGS nwc10: but but... it is based on C (for some definition of based on) 12:51
that is perhaps just one problem: gist.github.com/FROGGS/c6d637b32e4665ec3882
nwc10 you said "gets invalid"
I don't know about that
12:57 rurban joined
dalek MoarVM: 350212e | jnthn++ | src/strings/ (3 files):
Basic implementation of case change with NFG.

This should work out for most cases - but I fear there's somewhere in Unicode where something has a precomposed uppercase but there is no such precomposed lowercase, or vice versa. Those cases will wrongly produce a synthetic in one direction now.
13:15
jnthn If anyone knows any such cases off the top of their head, I'd be interested to know 'em. 13:17
oh... 13:19
SpecialCasing.txt in the Unicode database has 'em.
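One hedged way to hunt for the cases jnthn asks about is to scan for code points whose full uppercase mapping, after NFC, still contains a combining mark. The function name is made up for this sketch, and the results depend on the Unicode version bundled with the Python build. It uses Python's str.upper, which applies the unconditional SpecialCasing mappings.

```python
import unicodedata


def uppercases_without_precomposed_form():
    """Code points whose NFC-normalized uppercase needs a combining mark."""
    hits = []
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # skip surrogates
        ch = chr(cp)
        upper = unicodedata.normalize("NFC", ch.upper())
        if len(upper) > 1 and any(unicodedata.combining(c) for c in upper[1:]):
            hits.append(ch)
    return hits


hits = uppercases_without_precomposed_form()
for ch in hits[:5]:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch, "?"))
```

LATIN SMALL LETTER J WITH CARON (U+01F0) is one such case: it is precomposed in lowercase, but its uppercase is J followed by COMBINING CARON.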
FROGGS m: say "\c[LATIN CAPITAL LETTER SHARP S]" 13:20
camelia rakudo-moar 5cfddf: OUTPUT«ẞ␤»
FROGGS what does that produce as lowercase?
though, that has nothing todo with composition :/
jnthn m: say "\c[LATIN CAPITAL LETTER SHARP S]".lc
camelia rakudo-moar 5cfddf: OUTPUT«ß␤»
jnthn m: say uniname ord "\c[LATIN CAPITAL LETTER SHARP S]".lc
camelia rakudo-moar 5cfddf: OUTPUT«LATIN SMALL LETTER SHARP S␤»
FROGGS m: say "\c[LATIN CAPITAL LETTER SHARP S]".lc.uc 13:21
camelia rakudo-moar 5cfddf: OUTPUT«ß␤»
jnthn m: say uniname ord "\c[LATIN CAPITAL LETTER SHARP S]".lc.uc
camelia rakudo-moar 5cfddf: OUTPUT«LATIN SMALL LETTER SHARP S␤»
jnthn That one should become SS by Unicode spec
[Coke] jnthn: there is a fudged esset test which hopefully you can now resolve. :)
jnthn [Coke]: esset?
FROGGS jnthn: I hope the unicode spec changes before we fix that :o)
eszet
[Coke] ß 13:22
jnthn [Coke]: Well, I didn't implement the special casing rules yet
FROGGS m: say "SS" ~~ /:i ß/
camelia rakudo-moar 5cfddf: OUTPUT«Nil␤»
[Coke] 121377
jnthn Just realized that when I do, they interact with NFG.
FROGGS I hate the Eszet fwiw
so, don't care for that, it is insane anyway 13:23
[Coke] jnthn: right, just letting you know there's unfudgeable stuff when you get there. :)
jnthn [Coke]: Aye, thanks. There's an RT also iirc :) 13:24
Here's the entry in SpecialCasing.txt for those curious: 13:25
# The German es-zed is special--the normal mapping is to SS.
# Note: the titlecase should never occur in practice. It is equal to titlecase(uppercase(<es-zed>))
00DF; 00DF; 0053 0073; 0053 0053; # LATIN SMALL LETTER SHARP S
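For comparison, here is what an implementation that applies this SpecialCasing.txt entry does. The sketch uses Python's built-in string methods, which follow the full case mappings. It shows the behaviour the entry specifies, not what Rakudo did at the time of this log.

```python
sharp_s = "\u00df"          # LATIN SMALL LETTER SHARP S
capital_sharp_s = "\u1e9e"  # LATIN CAPITAL LETTER SHARP S

lower = sharp_s.lower()   # field 2 of the entry: 00DF
title = sharp_s.title()   # field 3 of the entry: 0053 0073
upper = sharp_s.upper()   # field 4 of the entry: 0053 0053

# The .lc.uc round trip from the discussion above.
round_trip = capital_sharp_s.lower().upper()

print(lower, title, upper, round_trip)
```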
nwc10 greek final sigma... 13:26
jnthn Joy :)
And also in the same file
nwc10 anyway, took the Unicode consortium a long time to realise that they didn't have an answer for what the capturing groups should contain in (their idea for) "ß" =~ s/(s)(s)/i 13:27
so don't assume that everything is even implementable.
they are human.
jnthn What does Perl 5 do there, ooc?
nwc10 (and I have some inside knowledge that I'm not quoting anywhere logged about how some of them really are human) 13:28
FROGGS $ perl -E 'use utf8; say "SS" =~ /([ß])/i; say length $1'
SS
2
nwc10 um, well, perl 5 always treated //i as just "lowercase"
Unicode had wanted the "case insensitive" matching to be on case folded, not just lowercase
ß case folds to "ss"
(this is all from memory, so might be contradicted by [citation needed]) 13:29
so Perl 5 wasn't doing what they wanted things to move to (long term)
it just turned out that because no-one anywhere had tried to implement that "long term"
that when you added capturing groups to the mix
(ie not just "does it match, Y/N?"
but also "what bits matched")
that the "bits matched" became a bit tricky to actually answer. 13:30
at all.
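A small sketch of the distinction nwc10 draws between lowercasing and case folding, written in Python because its str.casefold does full case folding. The helper name fold_equal is hypothetical. The regex lines show an engine that does not expand ß to two characters when matching case-insensitively, which sidesteps the capture-group question rather than answering it.

```python
import re


def fold_equal(a, b):
    """Caseless comparison using full Unicode case folding."""
    return a.casefold() == b.casefold()


by_folding = fold_equal("\u00df", "SS")           # folds "ss" vs "ss"
by_lowercase = "\u00df".lower() == "SS".lower()   # "ß" vs "ss"

# Python's re module compares character by character, so one ß cannot
# match two letters and there is nothing to put in the two groups.
whole = re.fullmatch("\u00df", "SS", re.IGNORECASE)
groups = re.fullmatch("(s)(s)", "\u00df", re.IGNORECASE)

print(by_folding, by_lowercase, whole, groups)
```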
jnthn realizes we have string bitwise operations that will need careful handling with synthetics too :) 13:32
nwc10 um, do we have a definition for what the "stringwise not" of a code point is? 13:33
eg, how many bits long is the logical not of "A" 13:35
FROGGS m: say ~^'A' 13:36
camelia rakudo-moar 5cfddf: OUTPUT«prefix:<~^> NYI␤ in block <unit> at /tmp/svYs6QVxTG:1␤␤»
FROGGS I guess we don't have to care about that either at this point... does anybody know what to expect?
jnthn has no idea :) 13:37
FROGGS see
so if there are no expectations and probably no use case...
dalek MoarVM: 633a11c | jnthn++ | src/strings/ops.c:
Add a missing MVMROOT.
14:09
MoarVM: e42476f | jnthn++ | src/strings/ops.c:
Fix confusing whitespace.
MoarVM: b95028f | jnthn++ | src/strings/ops.c:
cclass check on synthetic uses base codepoint
14:11
MoarVM: c8aaed9 | jnthn++ | src/strings/ops.c:
Don't use cp as a variable name for a MVMGrapheme.

Yes, this sin is replicated in various bits of the codebase and wants a clear up.
jnthn typos MVMCodepoint as MVMCodepint, and wonders if it's nearly pub time yet. :) 14:13
dalek MoarVM: c2b818d | jnthn++ | src/strings/ops.c:
Unicode property checks on synthetics use base.
14:16
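The "use the base" rule from b95028f and c2b818d can be sketched as follows. A grapheme is modelled here as a short Python string whose first code point is taken as the base. That modelling and the function names are assumptions for illustration, not the MoarVM API.

```python
import unicodedata


def grapheme_category(grapheme):
    """General category of a grapheme, decided by its base code point only."""
    base = grapheme[0]  # assumption: the base is the first code point
    return unicodedata.category(base)


def grapheme_is_upper(grapheme):
    # Combining marks after the base do not affect the answer.
    return grapheme_category(grapheme) == "Lu"


synthetic_like = "X\u0323\u0307"  # X + COMBINING DOT BELOW + COMBINING DOT ABOVE
print(grapheme_category(synthetic_like), grapheme_is_upper(synthetic_like))
```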
nwc10 use more 'beer fridge'? 14:20
dalek MoarVM: 8bb5da8 | jnthn++ | src/strings/ops.c:
Fix typos.
14:21
nwc10 is pub time a reward, or a "step away from the keyboard"? :-) 14:33
jnthn Also a "have dinner" :) 14:34
It's not really time yet though.
FROGGS it is always a good time to eat something 14:35
jnthn Though that's probably enough NFG for today. Will take a keyboard break for a bit, and then look at some RTs. :)
I certainly feel like the core of NFG is there now; the rest is making places that aren't aware of it be so, and also asking TimToady hard questions about what we want in various cases :) 14:37
Oh, and plumbing it into I/O. :)
Well, I. It's already in O. :) 14:38
nwc10 everything, in O(1) time and space? :-)
jnthn Probably :P
anyway, bbi10 14:39
Darn, back up to 675 RTs... 14:59
[Coke] is reminded to start converting misc todo's into RTs again. 15:00
nwc10 IIRC Perl 5 is steady state around 1300 to 1400, so Perl 6 is halfway respectable these days :-) 15:03
[Coke] m: "\c[LATIN CAPITAL LETTER A WITH DOT ABOVE, COMBINING DOT BELOW]".codes.say 15:05
camelia rakudo-moar 958ffb: OUTPUT«2␤»
jnthn [Coke]: We only NFG things explicitly constructed from a Uni so far. 15:06
Well, NF-whatever really 15:07
[Coke] jnthn: just autounfudging. that test is currently skipped.
jnthn Ah :)
Well, it's the right answer for .codes at least :)
Uh, I think so, anyway :)
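A quick cross-check of that answer, assuming .codes is meant to count code points of the NFC form. The sketch uses Python's unicodedata module and says nothing about how Rakudo computed its result at the time.

```python
import unicodedata

text = "\N{LATIN CAPITAL LETTER A WITH DOT ABOVE}\N{COMBINING DOT BELOW}"
nfc = unicodedata.normalize("NFC", text)
nfd = unicodedata.normalize("NFD", text)

# NFD reorders the marks by combining class: dot below (220) sorts
# before dot above (230). NFC then composes A with the dot below.
print(len(nfd), [unicodedata.name(c) for c in nfd])
print(len(nfc), [unicodedata.name(c) for c in nfc])
```

Under NFC the sequence stays at two code points, but they are not the two that were typed: the base becomes LATIN CAPITAL LETTER A WITH DOT BELOW and the dot above stays as a combining mark.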
15:17 btyler joined
TimToady obviously they should've just added an 'ss' ligature instead of overloading ASCII 15:49
15:51 colomon joined
nwc10 TimToady: I'm not convinced about that, but offhand I can't remember what the rules are for matching "ﬀ" and similar ligatures 17:05
in the "phone book" ordering, "ß" sorts as "ss"
"Ä" as "AE" (etc)
17:07 vendethiel joined
TimToady that's not matching, that's collation, which is also known to be insane :) 17:08
nwc10 yes, mmm "Ä" isn't expected to match "AE"
so why is ß special snowflake? :-) 17:09
me-- # did not prime the beer fridge 17:12
17:46 FROGGS joined
nwc10 jnthn: EPIC non-fail. Try harder. 18:09
TimToady wonders why the comment strings just repeat the hexcodes rather than showing the actual characters, which would be more interesting and informative, IHHO 18:22
japhb TimToady++ # Remembering to change even acronyms to match third-person emoting 18:37
18:57 brrt joined
brrt \o 18:57
i have a cunning plan
to implement spesh-level tracing
key ingredients - we don't need to trace all instructions ever, we can just trace entry of basic blocks 18:59
we can do this by inserting trace logging statements at bb entry 19:00
then we need to make sure that all callees also have trace logging inserted 19:02
preferably even when they've been speshed before
the one tricky bit are invokish ops 19:15
because their invocation is more or less hidden 19:21
19:29 AndChat|228864 joined
JimmyZ_ brrt: I think we still need to trace every instruction to trace loops 19:32
brrt you don't
that's the beauty of it
because they are basic blocks
you can't actually leave them :-) 19:33
(within the block)
JimmyZ_ and consider some loop optimisation
brrt so if you enter the block, you *will* execute all of them
JimmyZ_ oh , you meant bb
brrt aye
JimmyZ_ not perl6 block... 19:34
brrt no indeed, a perl6 block is quite a larger construt
construct
JimmyZ_ so luajit is tracing the loop ? 19:35
by some tags?
brrt i don't know how luajit does it 19:36
JimmyZ_ anyway, sleep time, 03:36 am here.
brrt sleep well
JimmyZ_ I was always thinking how luajit is doing, consider it has many advanced optimizations . 19:38
good night.
19:53 lizmat joined
jnthn brrt: Yeah, the block/invokish level tracing is what I'd had in mind. 19:58
brrt much cheaper than adding a check on every opcode i'd think 19:59
jnthn TimToady: (comment strings) 'cus my piece of crap terminal will likely copy-paste them wrong to my editor, slowing me down in debugging stuff, so largely they're optimized for me getting stuff done. :)
brrt: Indeed. :)
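A toy illustration of brrt's point that logging basic block entries is enough to recover every executed instruction, loops included. The interpreter, the block table layout, and the function names are invented for this sketch and have nothing to do with the actual spesh data structures.

```python
def run(blocks, entry, state):
    """Execute a tiny control flow graph, logging only block entries."""
    trace = []
    name = entry
    while name is not None:
        trace.append(name)  # the only tracing cost: one log per block entry
        instructions, successor = blocks[name]
        for op in instructions:  # straight-line code, no way out mid-block
            op(state)
        name = successor(state)
    return trace


def expand(blocks, trace):
    """Rebuild the full instruction trace from the block-entry trace."""
    return [op.__name__ for name in trace for op in blocks[name][0]]


def init(state):
    state["i"] = 0


def inc(state):
    state["i"] += 1


blocks = {
    "entry": ([init], lambda s: "loop"),
    "loop": ([inc], lambda s: "loop" if s["i"] < 3 else "exit"),
    "exit": ([], lambda s: None),
}

trace = run(blocks, "entry", {})
print(trace)
print(expand(blocks, trace))
```

Each loop iteration re-enters the "loop" block, so the iterations show up in the trace without any per-instruction check.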
brrt tracing is such an awesome optimisation 20:00
it's supercharged inlining
jnthn Aye, but like all optimizations, it's a trade-off. :)
brrt right. if your code is not actually tracy it's really costly to deopt all the time 20:01
jnthn nwc10: (try harder) I implemented a partially lock-free trie, that shoulda been advanced enough to screw up, dammit :P 20:02
nwc10 jnthn: would it make sense to generate 2 lines - one with comment strings in hex, and one with the real characters? 20:15
I think that having the hex (or U+....) notation around makes a lot of sense, as it's unambiguous and really hard for text editors to screw up 20:16
jnthn nwc10: To me right now? Not really, the tests have served their purpose in getting me to something that seems to behave sanely, so I've not a huge incentive to spend more time on them. ;)
nwc10 ah. and TimToady has a commit bit? :-) 20:17
TimToady the parts that matter programmatically are already in hex
I don't care about those; I'm just curious what the actual character are :) 20:18
jnthn They're all just things with a non-zero Canonical_Combining_Class to me... :) 20:19
nwc10 jnthn: I think I have a livelocked moar again 20:20
it's running t/spec/S17-procasync/basic.rakudo.moar
it's here:
#0 AO_load_acquire (addr=0x1000f420378) at 3rdparty/libatomic_ops/src/atomic_ops/sysdeps/gcc/powerpc.h:91
#1 0x00003fffa92f6818 in MVM_gc_enter_from_allocator (tc=0x1000f4206c0) at src/gc/orchestrate.c:378
paste.scsys.co.uk/473116 20:21
I know nothing about gdb and threads debugging 20:22
but TFM suggests that `info threads` is a good start, and we have 6 threads
jnthn: all 6 backtraces paste.scsys.co.uk/473122 20:25
jnthn nwc10: ooh, thanks :) 20:26
nwc10 I've ^Z ed the process, so it's not chewing CPU 20:27
and awaits further probing
jnthn nwc10: I guess this is hard to reliably reproduce?
nwc10 seems to happen at random less than 10% of the time 20:28
[Coke] Wonder if this is related to the flapping S17 tests we see in the dailies. 20:30
jnthn Mebbe, though those don't deadlock 20:31
nwc10 backtrace is for MoarVM at 8bb5da80f2c072be49ffa6f75c2814a6b47dd381
nqp at 53d43e830ecbf34a1dbf9f6b1597f0d57fa540a3 20:32
rakudo at b62929991faecf1ae38fc5e0b6d2dd0a675d3187
on gcc110 as described here gcc.gnu.org/wiki/CompileFarm 20:33
gcc110 2TB 4x16x3.55 GHz IBM POWER7 / 64 GB RAM / IBM Power 730 Express server / Fedora 18 ppc64
gcc112 is more bonkers still
oh gosh, I'm asleep. gcc112 is also ppc64le 20:34
oooh, it's funky enough that ccache won't build from source 20:40
fedora-- # "perl" - you keep using that package name. I do not think that it means what you think that it means. 20:42
21:41 brrt left, brrt joined 22:58 vendethiel joined 23:22 vendethiel joined