00:25
tokuhirom joined
01:09
tokuhirom joined
01:20
TimToady joined
02:37
vendethiel joined
02:47
ilbot3 joined
04:08
TimToady joined
04:13
KDr2 joined
05:24
ingy joined
05:36
FROGGS joined
06:41
FROGGS_ joined
06:47
FROGGS__ joined
06:57
nwc10 joined
07:44
FROGGS joined
08:38
kjs_ joined
09:24
zakharyas joined
10:12
kjs_ joined
jnthn | .tell Hotkeys I didn't actually change how \r behaves yet, just did the groundwork. Also, unless you're building a MoarVM at HEAD, rather than the version Rakudo will pick by default, you'll not have any of my changes yet anyway. | 10:12 | |
oh, no bot | |||
Hotkeys: I didn't actually change how \r behaves yet, just did the groundwork. Also, unless you're building a MoarVM at HEAD, rather than the version Rakudo will pick by default, you'll not have any of my changes yet anyway. | |||
psch: \r\n will become a synthetic, *not* identical to \n, just to be clear. | 10:13 | ||
10:48
brrt joined
brrt | \o | 10:49 | |
guess who had to fix... yet another compilation bug this morning? | 10:50 | ||
jnthn | brrt? :) | 10:52 | |
brrt | jnthn++ :-P | ||
apparently, accessing the lower bytes of rsp-rdi (registers 4-7) dynamically in x64 means you *have* to add a REX byte | 10:53 | ||
jnthn | This REX byte seems to cause plenty of fun... | 10:54 | |
brrt | because otherwise the address is understood as the second byte of the lower 4 (rax-rbx) registers | ||
even if you don't actually address any of the regular extended registers | |||
'fun' | |||
why is this? GOD ONLY KNOWS | 10:55 | ||
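A minimal sketch of the encoding rule brrt describes, written in Python for brevity. The cause is backward compatibility: without a REX prefix, 8-bit register numbers 4-7 keep their legacy meaning (AH, CH, DH, BH), so reaching SPL, BPL, SIL and DIL needs a REX prefix even when none of R8-R15 is involved. The names `encode_mov_r8_r8` and `byte_reg_name` are made up for illustration and are not taken from the MoarVM JIT; only the register-to-register form of `MOV r/m8, r8` (opcode 0x88) is handled.

```python
# Names of the 8-bit registers selected by a 3/4-bit register number.
LEGACY_BYTE_REGS = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]
REX_BYTE_REGS = ["al", "cl", "dl", "bl", "spl", "bpl", "sil", "dil"] + [
    f"r{n}b" for n in range(8, 16)
]


def byte_reg_name(num, has_rex):
    """Return the 8-bit register that number `num` selects."""
    return REX_BYTE_REGS[num] if has_rex else LEGACY_BYTE_REGS[num]


def encode_mov_r8_r8(dst, src):
    """Encode `mov dst8, src8`, where dst/src are register numbers 0-15
    and the low byte of each register is meant (hypothetical helper)."""
    if not (0 <= dst <= 15 and 0 <= src <= 15):
        raise ValueError("register numbers must be in 0..15")
    # Numbers 4-7 would be read as ah/ch/dh/bh without REX, so force one.
    needs_rex = dst >= 4 or src >= 4
    # ModRM: mod=11 (register direct), reg=src, rm=dst.
    modrm = 0xC0 | ((src & 7) << 3) | (dst & 7)
    out = bytearray()
    if needs_rex:
        # REX = 0100WRXB: R extends ModRM.reg, B extends ModRM.rm.
        out.append(0x40 | ((src >> 3) << 2) | (dst >> 3))
    out += bytes([0x88, modrm])
    return bytes(out)
```

With this sketch, `encode_mov_r8_r8(6, 0)` yields `40 88 C6` (`mov sil, al`); dropping the leading `40` would turn the same bytes into `mov dh, al`.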
i'm also seeing that my approach to register-allocation-over-conditional-and-call-boundaries was oversimplistic | 10:58 | ||
as it happens, *online algorithms suck* | |||
whereby 'suck' has a technical definition meaning 'make things much more complicated and difficult to analyse' | |||
anyway... i was hoping i could get the compiler to run with just bugfixing the current setup. but the complexity of it feels like it's spiralling | 11:03 | ||
so i'm wondering if i should move the register allocator to an offline phase, as well as having the tiles in linear memory order | |||
one of the things that bothered me slightly is that if i linearize the tiles to an array, then i can't splice in spills and loads easily. | 11:05 | ||
which means one of three things: a): i linearize to something else than an array, like a linked list, which is kind of not-so-fun with regards to allocation (or i should use the spesh allocator, but I don't like that much); | 11:06 | ||
b): i add a second array which is walked next to the first array that holds spills and stores (i kind of like that solution, but it is complex) | 11:07 | ||
c): i do spills and stores inline | |||
but i'm not sure how c) interoperates with the idea of making the register allocation step an offline step | 11:08 | ||
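Option (b) can be sketched as a merge walk: the tile array is left untouched, and a second array, sorted by tile index, carries the spills and loads to emit before each tile. Everything here (the function name `emit_with_side_array`, the tuple layout, the op strings) is hypothetical, and Python is used only for illustration; the real JIT is written in C.

```python
def emit_with_side_array(tiles, inserts):
    """Walk `tiles` in order and emit each side-array entry before its tile.

    tiles:   list of tile descriptions in linear order
    inserts: list of (tile_index, op) pairs, assumed sorted by tile_index
    """
    out = []
    j = 0  # cursor into the side array
    for i, tile in enumerate(tiles):
        # Emit every spill/load scheduled in front of tile i.
        while j < len(inserts) and inserts[j][0] == i:
            out.append(inserts[j][1])
            j += 1
        out.append(tile)
    # Anything scheduled past the last tile goes at the end.
    out.extend(op for _, op in inserts[j:])
    return out
```

The tile array never needs splicing, at the cost of keeping the side array sorted and walking two cursors in every later pass.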
jnthn | Hmmm | ||
brrt | (having register alloc as an offline step is basically compiler best practice, or so i've heard) | ||
jnthn | For (a) what's wrong with using the spesh allocator? | ||
brrt | consistency. i use the spesh allocator for nothing, and then bam!, it reappears | 11:09 | |
although...... | |||
hmmm | |||
i could actually use the spesh allocator to hold the info nodes | |||
the info node array is quite redundant as all the tree 'pointers' and constants also get a info node | 11:10 | ||
timotimo | well, the spesh allocator is good for things that are unevenly sized and that you're going to throw away completely at the end | ||
brrt | aye | ||
timotimo | so it seems like a good fit | ||
brrt | it is a good fit | ||
brrt wonders if i want random access to the tiles | |||
ok, that is a good idea, i think | 11:14 | ||
timotimo | i've heard a thousand times that linked lists never outperform arrays even if you do inserts in the middle ... or something like that | ||
because ... CACHES! | |||
jnthn | yeah but the spesh allocator sticks the nodes in order anyway :P | ||
timotimo | L1 cache can give you multiple gigs per second throughput! doesn't matter that it's still small and has to grab data from RAM at a much, much slower rate all the time | 11:15 | |
jnthn | That's why I picked that kind of design | ||
Thus a bunch of MVMSpeshIns will be contiguous, unless you hit a point something was spliced in. | |||
So it's relatively cache friendly | |||
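The properties discussed here (unevenly sized allocations, handed out in order, all thrown away together) are those of a region or bump allocator. The following is a generic sketch of that idea, assuming fixed-size blocks and byte offsets as handles; the class `Arena` and its methods are hypothetical and do not reproduce MoarVM's spesh allocator.

```python
class Arena:
    """Toy bump allocator: hands out (block, offset) handles in order."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.blocks = [bytearray(block_size)]
        self.used = 0  # bytes consumed in the current (last) block

    def alloc(self, size):
        """Reserve `size` bytes; consecutive calls get adjacent offsets."""
        if size > self.block_size:
            raise ValueError("allocation larger than a block")
        if self.used + size > self.block_size:
            # Current block is full: start a fresh one.
            self.blocks.append(bytearray(self.block_size))
            self.used = 0
        handle = (len(self.blocks) - 1, self.used)
        self.used += size
        return handle

    def release_all(self):
        """Drop every allocation at once; there is no per-object free."""
        self.blocks = [bytearray(self.block_size)]
        self.used = 0
```

Nodes allocated one after another end up next to each other in memory, which is the cache-friendliness jnthn refers to; only nodes spliced in later land elsewhere.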
brrt | i think knuth's old quote applies really well here | 11:16 | |
timotimo: just do multiple gigs of computation on 8k of memory or so :-P | 11:17 | ||
brrt wonders if that is actually done anywhere | |||
timotimo | i'm still wondering if my idea for a Big Data product that handles "Big Data data sets as large as 100x your L1 cache effectively" will be seen as revolutionary and awesome | 11:18 | |
brrt | lol | 11:21 | |
'big' data like ... a gigabyte :-o | 11:22 | ||
fwiw, did anybody see the 'go is a poorly designed language' article anywhere? | |||
it's funny because all the things the author considers as 'poor design' are quite logical imho | |||
timotimo | oh, does it say "its types are spelled totally differently from C"? | 11:23 | |
brrt | no, actually, it doesn't | 11:24 | |
it says 'i want my negative array indexing to work like python and it doesn't :-(' | |||
while negative array indices are a huge cost in a potential fast path | |||
jnthn heard Go had attracted a bunch of Python folks, but isn't sure how accurate that is | 11:25 | ||
brrt | perhaps not when they are constant (you can constant-fold it away), but definitely variable indices | ||
yeah, well, if i were to go and blog about how python doesn't support sigils, i'd look ridiculous, no? | |||
timotimo | wouldn't that be fun? | 11:26 | |
brrt | yes. yes it would | ||
'splicing stuff out doesn't look easy' - that's because it *isn't* easy | 11:27 | ||
and cheap | |||
'declaring a variable in a new scope using shorthand notation shadows my outer scope variable' - i'm not even sure what anybody should expect | 11:28 | ||
i'm going to write an article about how go doesn't have a whateverstar and how that makes it a sucky language | 11:29 | ||
timotimo | A/B-test it against an article about python not having a whateverstar | 11:31 | |
brrt | that... | 11:32 | |
is an excellent idea | |||
dalek | MoarVM: 385e498 | jnthn++ | src/strings/normalize.c: Make NFG algorithm use Unicode Grapheme Clusters. As described in Annex #29. We do all of it except the CRLF case, as enabling that even breaks our ability to parse Perl 6 code (will need to figure out why). Aside from the CRLF case, though, we now pass all the Unicode grapheme boundary tests (that is, we get the .chars that are expected). | 12:13 | |
MoarVM: 82f93f7 | jnthn++ | src/strings/normalize.h: Toss #define we ended up not needing. | 12:14 | ||
MoarVM: 3519077 | jnthn++ | src/strings/normalize.h: Slightly simplify a conditional. | |||
jnthn | Time to see what the spectest fallout of the NFG algorithm change will be... :) | 12:15 | |
nwc10 | spectests weren't clean before | 12:19 | |
lizmat | yeah, were 4 files failing for me yesterday eve | ||
nwc10 | they are in a state of sin a bit too often for my liking | ||
jnthn | With MOAR_REVISION or with master? | 12:20 | |
Seems 7 test files are victims of the change | 12:23 | ||
Well, have tests that are victims | 12:24 | ||
lunch & | 12:27 | ||
nwc10 | jnthn: "nom", I believe is the culprit | 12:38 | |
more spectests fail, but eyeballing the summary, doesn't look like anything surprising | 13:01 | ||
jnthn back | 13:06 | ||
brrt | \o jnthn | 13:09 | |
jnthn | Turns out 1 failing file was 'cus I needed to patch Str.perl (now done) | 13:19 | |
Next 2 were tests that don't make sense under NFG semantics. | |||
And that we didn't spot last time around 'cus the NFG algo was insufficient. | |||
m: say uniprop("\x1B3D", 'General_Category') | 13:30 | ||
camelia | rakudo-moar da8881: OUTPUT«Mc␤» | ||
jnthn | m: say "\x1B3D".NFD | 13:31 | |
camelia | rakudo-moar da8881: OUTPUT«NFD:0x<1b3c 1b35>␤» | ||
jnthn | m: say "\x1B3D".chars | ||
camelia | rakudo-moar da8881: OUTPUT«1␤» | ||
jnthn | m: say "\x1B3D".ord | ||
camelia | rakudo-moar da8881: OUTPUT«6973␤» | ||
jnthn | wtf | ||
m: say 0x1B3D | 13:32 | ||
camelia | rakudo-moar da8881: OUTPUT«6973␤» | ||
jnthn | Locally | ||
> say "\x1B3D".ord | |||
6972 | |||
o.O | |||
m: say Uni.new(0x1B3D).Str.ord.base(16) | 13:34 | ||
camelia | rakudo-moar da8881: OUTPUT«1B3D␤» | ||
jnthn | > say Uni.new(0x1B3D).NFC | 13:35 | |
NFC:0x<1b3d> | |||
Not NFC that got busted | |||
> say Uni.new(0x1B3D).Str.ord.base(16) | |||
1B3C | |||
I don't even... | 13:36 | ||
m: say Uni.new(0x1B3D).NFD.NFC | 13:45 | ||
camelia | rakudo-moar da8881: OUTPUT«NFC:0x<1b3c 1b35>␤» | ||
jnthn | ah | 13:46 | |
brrt | what, how | ||
jnthn | m: say uniname(0x1B3D) | ||
camelia | rakudo-moar da8881: OUTPUT«BALINESE VOWEL SIGN LA LENGA TEDUNG␤» | ||
jnthn | m: say uniname(0x1B3C) | ||
camelia | rakudo-moar da8881: OUTPUT«BALINESE VOWEL SIGN LA LENGA␤» | ||
jnthn | m: say uniname(0x1B35) | 13:47 | |
camelia | rakudo-moar da8881: OUTPUT«BALINESE VOWEL SIGN TEDUNG␤» | ||
jnthn | m: say uniprop(0x1B3C, 'General_Category') | ||
camelia | rakudo-moar da8881: OUTPUT«Mn␤» | ||
jnthn | m: say uniprop(0x1B3D, 'General_Category') | ||
camelia | rakudo-moar da8881: OUTPUT«Mc␤» | ||
jnthn | m: say uniprop(0x1B35, 'General_Category') | ||
camelia | rakudo-moar da8881: OUTPUT«Mc␤» | ||
jnthn | Yowser. That's a bit of an interesting problem. | 13:48 | |
nwc10 | It's not at all obvious to me why (or what's wrong) | 13:49 | |
I have not read this yet: morepypy.blogspot.co.at/2015/10/pyp...us+Blog%29 | |||
jnthn | nwc10: Well, it came from a test regression | ||
brrt should probably subscribe to their blog since it is interesting | 13:50 | ||
jnthn | m: say uniprop(0x1B3D, 'NFC_QC') | ||
camelia | rakudo-moar da8881: OUTPUT«Y␤» | ||
nwc10 | I'm just looking at what's on planetpython.org/ | ||
brrt | although i recently lost all my subscriptions when i forgot to copy a opml file | ||
jnthn | So, that character passes the NFC quick-check | ||
Which implies "we're already in NFC" | 13:51 | ||
brrt | uhuh | ||
jnthn | But NFC should afaiu be stable | ||
Such that if you compute NFD and then again compute NFC, you get the same thing back | |||
That's not happenign here | |||
*happening | |||
nwc10 | we're into "#11907 Looking for a compiler bug is the strategy of LAST resort. LAST resort." ? | 13:52 | |
jnthn | Not yet | ||
I need to go look at our NFD -> NFC | 13:53 | ||
But NFC is *defined* in terms of NFD | |||
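The stability property jnthn expects can be checked against an independent implementation. This sketch uses Python's standard `unicodedata` module as the reference (not Rakudo or MoarVM); the helper name `nfc_survives_round_trip` is an illustrative assumption.

```python
import unicodedata


def nfc_survives_round_trip(s):
    """True if decomposing s to NFD and recomposing to NFC gives s back.

    This is expected to hold for any string that is already in NFC.
    """
    nfd = unicodedata.normalize("NFD", s)
    return unicodedata.normalize("NFC", nfd) == s


ch = "\u1b3d"  # BALINESE VOWEL SIGN LA LENGA TEDUNG
nfd_codepoints = [hex(ord(c)) for c in unicodedata.normalize("NFD", ch)]
print(nfd_codepoints, nfc_survives_round_trip(ch))
```

A conforming implementation decomposes U+1B3D to U+1B3C U+1B35 and composes that pair back to U+1B3D, so the helper returns true; the camelia output `NFC:0x<1b3c 1b35>` above is the case where recomposition did not happen.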
13:57
rarara_ joined
brrt wonders what the current state of the art is in rubyland, since ruby may be even more comparable to perl6 in terms of indirections | 13:58 | ||
jnthn | Well, I can't find a way we're inconsistent with the actual Unicode data files | 14:07 | |
So yeah, it's very odd. I have a case where a string passes the NFC QuickCheck, but actually computing NFC on that string doesn't give identity | 14:16 | ||
nwc10 | use more 'coffee'; ? | 14:17 | |
jnthn | And it's a one-char string so I don't think I could be getting the use of NFC_QC wrong. | ||
14:37
tokuhiro_ joined
jnthn | OK, seems we have something wrong in our canonical composition | 14:40 | |
Yeah, nailed it I think | 14:56 | ||
brrt | that was fast | 14:57 | |
jnthn | The number of characters added by patch to time spent ratio is pretty awful :P | 14:59 | |
brrt | aw, there goes your enterprise points | 15:00 | |
jnthn | :P | 15:01 | |
brrt has seen a lot of articles lately about how COBOL was making a comeback | 15:03 | ||
maybe we should have an Inline::COBOL | |||
or just port moar to COBOL for enterprise points | |||
jnthn | If your goal is -Osalary, COBOL may well be one of the best languages to learn :) | 15:05 | |
dalek | MoarVM: 5ff3001 | jnthn++ | src/strings/normalize.c: Fix a canonical composition bug. We didn't admit various starter/starter composition cases. This bug actually managed to survive despite us passing the complete Unicode normalization test suite, because we never hit this code path before thanks to the NFC_QC property. Now, thanks to NFG_QC, we can hit it in some more cases (this does perhaps point to a future optimization). Fixes a spectest regression. | 15:09 | |
brrt | i don't even... | 15:11 | |
jnthn | :) | 15:13 | |
brrt | understand how or why that makes a difference :-) | 15:14 | |
jnthn | brrt: Sometimes, two characters that are *not* combining chars are composed into a single char when canonicalizing. | 15:15 | |
brrt: And the bug I just fixed meant we didn't let that happen. | 15:16 | ||
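What makes this case unusual is that both code points are starters, meaning their canonical combining class is 0. A small illustration using Python's `unicodedata` (a reference implementation chosen for this sketch, not the MoarVM code; `is_starter` is a hypothetical helper name):

```python
import unicodedata


def is_starter(ch):
    """A starter is a code point with canonical combining class 0."""
    return unicodedata.combining(ch) == 0


# The common case: a starter followed by a non-starter combining mark.
common = "e\u0301"            # e + COMBINING ACUTE ACCENT
# The case from the bug: two starters that still compose.
balinese = "\u1b3c\u1b35"     # LA LENGA + TEDUNG

for pair in (common, balinese):
    composed = unicodedata.normalize("NFC", pair)
    print([is_starter(c) for c in pair], len(composed))
```

In the first pair only the base letter is a starter; in the Balinese pair both are, yet NFC still yields a single code point. A composition step that only tries to combine non-starters onto the preceding starter misses the second case.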
Now I'm down to one affected test file | 15:17 | ||
Bizarrely, S32-io/IO-Socket-INET.t | |||
m: say uniname(0xbeef) | 15:21 | ||
camelia | rakudo-moar 3cc195: OUTPUT«<Hangul Syllable>␤» | ||
jnthn | haha | ||
m: say uniname(0xbabe) | |||
camelia | rakudo-moar 3cc195: OUTPUT«<Hangul Syllable>␤» | ||
jnthn | Pro tip: when writing tests and wanting some random "Unicode character", don't just spell cute words :) | 15:22 | |
Otherwise you might (or in this case, will!) end up with something that will, under NFG, end up combining with the previous grapheme. | |||
brrt | no, i still don't get it | 15:37 | |
but i'll accept that for now | 15:38 | ||
jnthn | brrt: I only get it in so far as "I read the Unicode spec and know what the terms mean" | ||
I don't know anything about the Balinese language and the specific thing that's going on with these chars. | 15:39 | ||
brrt | the NFG business seems similar in a way to the x86 instruction encoding business | ||
you think you fixed it, but no, something funny happens | 15:40 | ||
only in specific magical cases | |||
jnthn | Well, this wasn't even an NFG bug, just an NFC one :) | ||
brrt | fair enough :-) | ||
jnthn | It's not *that* bad, tbh. It's just that humans are darn creative about their writing systems. | ||
Hangul is probably the biggest offender in terms of amount of code we have to write just for it. | 15:41 | ||
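Part of the extra Hangul code is that precomposed syllables are decomposed arithmetically rather than by table lookup (Unicode Standard, section 3.12). A sketch of that arithmetic, with the result compared against Python's `unicodedata`; the function name `decompose_hangul` is chosen for this example and is not MoarVM's.

```python
import unicodedata

# Constants from the Unicode Standard's Hangul syllable algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT   # 11172 precomposed syllables in total


def decompose_hangul(cp):
    """Return the jamo code points for a precomposed Hangul syllable.

    Code points outside the syllable block are returned unchanged.
    """
    s_index = cp - S_BASE
    if not 0 <= s_index < S_COUNT:
        return [cp]
    l = L_BASE + s_index // N_COUNT
    v = V_BASE + (s_index % N_COUNT) // T_COUNT
    t = T_BASE + s_index % T_COUNT
    # T_BASE itself means "no trailing consonant".
    return [l, v] if t == T_BASE else [l, v, t]


for cp in (0xBEEF, 0xBABE):
    reference = [ord(c) for c in unicodedata.normalize("NFD", chr(cp))]
    print(hex(cp), decompose_hangul(cp) == reference)
```

Both "cute" test values from the log fall inside the syllable block, which is why they interact with neighbouring jamo under the grapheme cluster rules.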
nwc10 | t/spec/S15-nfg/cgj.rakudo.moar | 15:50 | |
TODO passed: 5-8 | |||
jnthn | Yup :) | ||
Oh man. The \r\n => 1 grapheme thing may be a bit fraught | 16:12 | ||
nwc10 | why so? | ||
jnthn | Well, NQP can't even parse a simple test file with a \r\n in it any more | 16:14 | |
And then the error handling code it uses to try and report that fails too | 16:15 | ||
And the REPL hangs | 16:17 | ||
nwc10 | step away from the keyboard, and make a curry? | 16:18 | |
the "error reporting" thing sounds like a bug that needs fixing whatever else happens next. | |||
jnthn | aye | ||
Ah | 16:21 | ||
One possible problem is that concatenation of a \r and \n doesn't produce the grapheme | |||
dalek | MoarVM: f1a216d | jnthn++ | src/strings/nfg.c: Update concat code in prep for \r\n as grapheme. | 16:28 | |
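Why concatenation needs attention once \r\n is a single grapheme: the grapheme count of a concatenation is no longer the sum of the parts when the join point falls between a \r and a \n. The sketch below models only that one rule (CR followed by LF forms one unit) and ignores every other grapheme cluster rule; `chars` is a stand-in name echoing Perl 6's `.chars`, not an actual API.

```python
import re

# One unit is either a CR LF pair or any single code point.
_UNIT = re.compile(r"\r\n|.", re.DOTALL)


def chars(s):
    """Count units, treating "\\r\\n" as a single unit (CRLF rule only)."""
    return len(_UNIT.findall(s))


left, right = "a\r", "\nb"
print(chars(left), chars(right), chars(left + right))
```

Each half counts two units, but the joined string counts three, so a concat routine has to re-examine the boundary instead of gluing the two grapheme arrays together.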
jnthn | Aha | 16:35 | |
say(?("\r\n" ~~ /\v/)) | 16:36 | ||
[Coke] | oho? | ||
jnthn | That doesn't match | ||
So, that's almost certainly the source of the \r\n -> 1 grapheme bustage | |||
I suspect fixing this is going to need an NQP bootstrap updage | 16:37 | ||
*update | |||
But worse, it probably also brings up the issues with NFG and the NFA engine | |||
TimToady | m: say "\x037e".ord.base(16) | 16:45 | |
camelia | rakudo-moar 3cc195: OUTPUT«3B␤» | ||
jnthn | m: say uniname(0x037E) | 16:46 | |
camelia | rakudo-moar 3cc195: OUTPUT«GREEK QUESTION MARK␤» | ||
TimToady | is there an explanation for why we lose track of GREEK QUESTION MARK? | ||
jnthn | Almost certainly :) | ||
m: say uniprop(0x037E, 'Decomp_Spec') | |||
camelia | rakudo-moar 3cc195: OUTPUT«003B␤» | ||
jnthn | ^^ | ||
'cus Unicode says we should as part of normalization | |||
See singleton equivalence in unicode.org/reports/tr15/ for more info | 16:47 | ||
TimToady | k | ||
jnthn | (There's a handful of 'em) | ||
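Singleton equivalences can be observed in any conforming normalizer. This sketch uses Python's `unicodedata` as the reference; the second example, ANGSTROM SIGN, is added here as another well-known singleton and does not come from the log.

```python
import unicodedata

singletons = {
    "\u037e": ";",        # GREEK QUESTION MARK -> SEMICOLON
    "\u212b": "\u00c5",   # ANGSTROM SIGN -> A WITH RING ABOVE
}

for original, expected in singletons.items():
    # The decomposition mapping is a single code point, and that code
    # point is what normalization leaves behind, even in composed form.
    mapping = unicodedata.decomposition(original)
    normalized = unicodedata.normalize("NFC", original)
    print(unicodedata.name(original), mapping, normalized == expected)
```

Since the original code point is excluded from recomposition, it cannot come back after normalization, which is why `.ord` on the normalized string reports 0x3B.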
[Coke] | TimToady: I thought we already said that on #perl6. my bad. | 16:49 | |
jnthn: it makes us impervious to the "mess with your friend's code" meme that was going around recently. | |||
jnthn | [Coke]: Nah, there's still plenty of other ways to do that :) | 16:51 | |
TimToady | jnthn: btw | ||
m: say uniprop(0x1B3D) # don't need to type General_Category every time | |||
camelia | rakudo-moar 3cc195: OUTPUT«Mc␤» | ||
TimToady | that's the default | ||
jnthn | Darn, wish I'd known that earlier :P | ||
After today I probably don't need to look up gen cats again for another few months :P | |||
TimToady: I assume you want us to end up with \r\n as a synthetic so we totally follow the Unicode grapheme cluster rules? :) | 16:53 | ||
TimToady | unless there's some showstopper reason we can't | ||
jnthn | Not that I see so far, it's just a bit of hunting down the badass umptions in the code... :) | 16:54 | |
TimToady | at least it makes it easier to match \n in regex :) | ||
jnthn | Yeah | ||
TimToady | and certainly \v should match it too | ||
jnthn | For sure; just patched that locally | 16:55 | |
Though it's a hack :/ | |||
And will break LTM of \V until I more generally fix the NFG/NFA interaction | |||
The NFA design we inherited is rather into doing nqp::ord | |||
And so loses synthetics | |||
Not a huge engineering problem to fix, I don't think. Just another thing to do. | 16:56 | ||
TimToady | though even if we have an nqp::gord or so, we're still in trouble if we go with per-string or per-domain tables | 16:57 | |
jnthn | Indeed. I don't want to go that way. | ||
Better to actually pass one-char strings. | |||
At the "API" | |||
(We can still keep it all integers on the inside at actual matching time) | 16:58 | ||
TimToady | let's keep doing it right, and then think about optimization | ||
jnthn | Well, we're not doing it right yet...but yeah. | ||
TimToady | *righter | ||
jnthn | Otherwise, I think the NFG algo tweaks have turned out OK. | 16:59 | |
TimToady | any feel on input performance degradation? | ||
17:00
tokuhiro_ joined
TimToady | presumably shouldn't be much if most of the file is quick-reject ASCII | 17:00 | |
jnthn | Yeah, it's a little slowdown for ASCII 'cus we have to care about \r now | 17:01 | |
TimToady | well, except insofar as \r\n is ... yeah | ||
jnthn | And we're more careful over controls | ||
When we have to do the full analysis it's more costly | |||
But I computed us an NFG quickcheck property | |||
So we should only be doing the hard work when we really need to | 17:02 | ||
m: say uniprop('x', 'NFG_QC') | |||
camelia | rakudo-moar 3cc195: OUTPUT«0␤» | ||
jnthn | Ah, build is behind | ||
But yeah, we leak that property to userspace as if it was a normal Unicode property, when it's in fact one we've made up | |||
Dunno if that bothers you. | 17:03 | ||
TimToady | not much | ||
jnthn | k | ||
Ended up with all the control chars being NFG terminators, btw. | 17:04 | ||
TimToady | though if the UC adopts the "NFG" term we could eventually get a name collision, but so far they seem to prefer Normalization_Form_Grapheme and such | ||
jnthn | Which was something you suggested before. | ||
Well, they do call their quickcheck properties NFC_QC for example | 17:05 | ||
TimToady | well, hopefully then they won't adopt it and flip the sense :) | ||
jnthn | If they did, they'd make it inconsistent with how the other quickcheck properties work | ||
And they don't seem that crazy. :) | |||
Or at least, no more crazy than you have to be to try and bring some order to the world's writing systems... | 17:06 | ||
TimToady | in the backlog: "never hit this codepath" | 17:08 | |
do we have any plans for a code coverage tool? | |||
so we can tell if there are glaring blind spots in roast? | 17:09 | ||
jnthn | Well, a user-level thing "not yet" | ||
Do we have the tech to hack something up to tell us where our roast blind spots are? That's easier. | 17:10 | ||
The cross-thread write logging and the profiler use bytecode instrumentation, and it's easy enough to write extra ones of those | |||
TimToady | well, thinking setting-level mostly there | ||
jnthn | My Big Plan is to turn the instrumentation stuff into a kind of meta-interpreter framework so you can write stuff like profilers and coverage tools and debuggers in NQP or Perl 6 code | 17:11 | |
But that's not going to happen this side of 6.c | 17:12 | ||
Anyway, I can do a couple-of-hours hack solution to get an approximate answer to "what is roast not covering" | 17:15 | ||
And maybe it'll be inspiration for somebody to go and make a good one :) | |||
OK, I've got NQP patches that get all but one of the NQP tests passing | 17:17 | ||
(With \r\n as a grapheme) | 17:18 | ||
It'll need a rebootstrap, alas | 17:19 | ||
Hm, and it doesn't make it all the way through the Perl 6 build either. | 17:20 | ||
TimToady | we have 5 comments in nqp that mention things to change after a rebootstrap :) | 17:22 | |
which I'm sure were put there at least one reboot ago... | |||
jnthn | :) | 17:23 | |
Time for me to go cook us something tasty :) | |||
jnthn is happy to have this bit of the NFG work nearly done | 17:26 | ||
Guess I need to worry about the threads/IO things next week | |||
Well, various I/O things... | 17:27 | ||
Anyway, away for a bit | |||
18:06
kjs_ joined
18:17
tokuhiro_ joined
18:18
zakharyas joined
18:28
leont joined
18:40
FROGGS joined
18:42
vendethiel joined
19:24
kjs_ joined
19:54
tokuhiro_ joined
19:55
rarara_ joined
20:10
kjs_ joined
20:37
kjs_ joined
21:19
kjs_ joined
21:33
zakharyas joined
22:20
tokuhiro_ joined