01:20 tokuhiro_ joined 02:00 camelia joined 02:03 camelia joined 02:06 camelia joined 02:15 tokuhiro_ joined 02:21 camelia joined 02:48 ilbot3 joined 02:54 camelia joined 03:05 jnthn joined 03:07 camelia joined 03:18 vendethiel joined 04:35 dalek joined 06:12 kjs_ joined 07:25 harrow` joined 07:38 FROGGS joined 07:40 vendethiel joined
nwc10 good UGT, #moarvm 08:45
almost typoed that as UFT
I wonder what UFT would be
timotimo Universal Food Time? 08:46
nwc10 so, dammit, there goes the answer to the security question "qwerty or dvorak?" :-)
that sounds very fattening
09:58 lizmat joined 10:49 zakharyas joined
dalek MoarVM: 2b9c98a | jnthn++ | src/strings/ (3 files):
Fix encoding \r\n grapheme in non-utf8 encodings.
12:12
MoarVM: fdd5bd5 | jnthn++ | src/strings/ (5 files):
Fix non-streaming decoders for \r\n as 1 grapheme.
12:51
ilmari jnthn++ # I was about to have to add realloc to the encoders to handle multi-character replacements, now I don't have to :) 12:52
jnthn ;) 12:53
ilmari: I need to update the streaming decoders too; writing tests for that at the moment.
ilmari btw, what's the best way to pass them to the low-level function if we want to provide them as Bufs at the perl6 level?
jnthn Did we want to provide them as Bufs at the Perl 6 level, or Str? 12:54
If it's a Buf, will we re-validate that the Buf in question actually decodes OK?
ilmari Str avoids the risk of the replacement being invalid
yeah
so just taking it as an MVMString in _encode_substr and recursing to encode it?
jnthn OTOH, Str needs care too
ilmari throwing an exception if it itself isn't encodable 12:55
jnthn Because you really need to feed its graphemes through the normalization process too
Oh wait, I guess that's overkill 12:56
Yeah, that's kinda silly
So ignore :)
And yeah, you'd just recurse to encode the replacement string, or just do it once.
Also there's no streaming encode, so you can ignore my "streaming decode not updated" comment too :) 12:57
(Just do it once at the start, I mean)
ilmari yeah 12:58
jnthn Doing it once at the start would also mean you get to check if it will/won't work out up front 12:59
ilmari yeah
jnthn And so ease memory management a little
ilmari if (replacement) repl_bytes = MVM_string_ascii_encode_substr(tc, replacement, &repl_length, 0, -1, NULL) 13:00
jnthn That works
ilmari NULL being the new «MVMString *replacement» parameter
I see the code mixes size_t/MVMuint64 and char*/MVMuint8* a bit 13:01
e.g. the function returns char*, but the result var is MVMuint8*
jnthn Which encoding for, ooc? 13:02
Oh, even the ASCII one I'm looking at casts at the end
ilmari I'm doing ascii first
ah, it does cast at the end 13:03
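The "encode the replacement once at the start" idea discussed above can be sketched roughly as follows. This is a minimal illustration and not MoarVM code: `sketch_str` and `sketch_ascii_encode` are hypothetical stand-ins for `MVMString` and `MVM_string_ascii_encode_substr`, and the string is modelled as a plain array of code points. Because the replacement's encoded length is known before the main loop, the output buffer can be sized for the worst case up front.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an MVMString: a plain array of code points. */
typedef struct {
    const int32_t *cps;
    size_t         len;
} sketch_str;

/* Encode `s` as ASCII. If `repl` is non-NULL, every unencodable code point is
 * replaced by the ASCII encoding of `repl`; otherwise encoding fails.
 * Returns a malloc'd buffer (length stored in *out_len) or NULL on failure. */
static uint8_t *sketch_ascii_encode(const sketch_str *s, const sketch_str *repl,
                                    size_t *out_len) {
    uint8_t *repl_bytes = NULL;
    uint8_t *out;
    size_t   repl_len = 0, per, pos = 0, i;

    assert(s != NULL && out_len != NULL);

    if (repl != NULL) {
        /* Encode the replacement once, up front, with no replacement of its
         * own. If it is not itself encodable we find out here, before
         * touching the main string. */
        repl_bytes = sketch_ascii_encode(repl, NULL, &repl_len);
        if (repl_bytes == NULL)
            return NULL;
    }

    /* Worst case: every code point expands to the whole replacement, so a
     * single allocation suffices (overflow checks omitted in this sketch). */
    per = repl_len > 1 ? repl_len : 1;
    out = malloc(s->len * per + 1);
    if (out == NULL) {
        free(repl_bytes);
        return NULL;
    }

    for (i = 0; i < s->len; i++) {
        int32_t cp = s->cps[i];
        if (cp >= 0 && cp <= 127) {
            out[pos++] = (uint8_t)cp;
        } else if (repl_bytes != NULL) {
            memcpy(out + pos, repl_bytes, repl_len);
            pos += repl_len;
        } else {
            free(out);
            return NULL; /* unencodable and no replacement given */
        }
    }

    free(repl_bytes);
    *out_len = pos;
    return out;
}
```

In this sketch a NULL return covers both "the string is unencodable" and "the replacement is unencodable"; the real function would presumably throw an exception instead, as mentioned above.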
13:14 domidumont joined
ilmari bah, I keep typoing MVMString as MVString 13:46
dalek MoarVM: a362d21 | jnthn++ | src/strings/ (3 files):
Fix various streaming decoders for \r\n grapheme.
14:24
ilmari I've done as much as I can easily do wrt. encoding replacement chars: github.com/ilmari/MoarVM/commits/e...exceptions 14:30
the next bit requires changing the bytecode format to allow specifying the replacement string for unencodable characters
if someone can point me in the right direction I could have a go later 14:32
now I need to get back to work
jnthn ilmari++ 14:37
ilmari: The overall process is: update src/core/oplist, and run tools/update_ops.p6, then edit src/core/interp.c. 14:38
ilmari: But we have to deal with back-compat of bytecode, so I'd add a new op (encoderep or so) 14:39
ilmari: There are instructions on where to add ops at the top of src/core/oplist (simple answer: after all normal ops, before any special ops) 14:40
ilmari are fallthroughs in the interp.c switch allowed? 14:51
OP(encode): {
MVMString *replacement = NULL;
OP(encoderep):
replacement = GET_REG(cur_op, 6).s;
then I guess I need to wire it up in nqp and finally rakudo? 14:54
jnthn ilmari: Yeah but I worry a bit about whether compilers are smart enough to turn the switch into a jump table if we do tricks like that. 14:58
ilmari: Which is quite important for the interp.
And the op needs to come at the end
Well, the end of the normal ops
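One way to avoid the fallthrough question is to keep the two ops as separate, self-contained cases that both call a shared helper. The sketch below is a toy dispatcher, not the real `interp.c`: the enum, the helper `sketch_encoded_len`, and the operand layout are all hypothetical, and whether a given compiler turns either form into a jump table is not something this example can guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical opcodes standing in for encode and the proposed encoderep. */
enum sketch_op { SKETCH_OP_ENCODE, SKETCH_OP_ENCODEREP };

/* Shared body: number of bytes an ASCII encode of `in` would produce, with
 * each non-ASCII byte swapped for `replacement`. Returns -1 if a byte is
 * unencodable and no replacement was supplied. */
static long sketch_encoded_len(const char *in, const char *replacement) {
    const unsigned char *p;
    long total = 0;

    assert(in != NULL);
    for (p = (const unsigned char *)in; *p != '\0'; p++) {
        if (*p <= 127)
            total += 1;
        else if (replacement != NULL)
            total += (long)strlen(replacement);
        else
            return -1;
    }
    return total;
}

/* Each case is independent: no label in the middle of another case's block,
 * and no variable whose initialisation depends on which label was entered. */
static long sketch_dispatch(enum sketch_op op, const char *in,
                            const char *repl_operand) {
    switch (op) {
    case SKETCH_OP_ENCODE:
        return sketch_encoded_len(in, NULL);
    case SKETCH_OP_ENCODEREP:
        return sketch_encoded_len(in, repl_operand);
    }
    return -2; /* unknown op */
}
```

The old op keeps its existing operand count, which is the back-compat point made above; only the new op reads the extra replacement operand.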
m: say +"42\n" 15:11
camelia rakudo-moar 36a351: OUTPUT«42␤»
jnthn m: say +"42\r\n"
camelia rakudo-moar 36a351: OUTPUT«should eventually be unreachable␤ in block <unit> at /tmp/ZU7uQLeOwj:1␤␤»
[Coke] O_o 15:13
jnthn :) 15:14
jnthn makes things more eventual :)
\r\n being a grapheme is being really good at shaking out various things. 15:15
dalek MoarVM: 4b46f83 | jnthn++ | src/ (2 files):
Make radix ops not blow up over synthetics.
15:16
15:33 tokuhiro_ joined
jnthn m: say uniprop "\r", 'Decomp_Type' 15:40
camelia rakudo-moar 36a351: OUTPUT«0␤»
jnthn m: say uniprop "\r", 'Decomposition_Type'
camelia rakudo-moar 36a351: OUTPUT«None␤»
jnthn m: say uniprop "\r", 'WhatNoThisIsNoProp'
camelia rakudo-moar 36a351: OUTPUT«0␤»
ilmari $ ./perl6 -e 'say "skjærgårdsøl".encode("ascii", :replacement("?")).decode("ascii")' 15:44
skj?rg?rds?l
\o/
jnthn \o/ 15:47
What on earth...
My debugger claims that we're passing a 13 to a function
And it receives -1
(Yes, same type in both cases: MVMCodepoint) 15:48
TimToady it's a good thing we included NFG as one of the Big Three 15:52
jnthn Indeed 15:54
Though the S of NSA is...uh...in need of some love..
TimToady one hopes that won't be quite so upsetting as \r\n
nwc10 the Three Wise Ones to do before Christmas 15:55
jnthn m: say uniname(214) 15:56
camelia rakudo-moar 36a351: OUTPUT«LATIN CAPITAL LETTER O WITH DIAERESIS␤»
nwc10 j: say uniname(214) 15:57
camelia rakudo-jvm 273e89: OUTPUT«LATIN CAPITAL LETTER O WITH DIAERESIS␤»
TimToady you'll note those are not the same version
nwc10 I failed to note that. Thanks.
TimToady we decoupled them last night 15:58
and jvm build has been failing for a couple days
nwc10 it needs a champion/hero/superhero? 15:59
TimToady jvm has periodic champions 16:00
nwc10 jnthn: this might be a daft thought experiment, but as the grapheme for \r\n is likely to crop up quite a bit, and is the only one in ASCII, might it be useful to "hard code" it as ord -1?
it means that "is this string ASCII" becomes the range (-1, 127), and ISO-8859-1 (-1, 255) 16:01
but there might be downsides to this
jnthn It gets -1 anyway since it's the only synthetic we encounter during VM initialization
nwc10 ah OK
jnthn Though yeah, I didn't code anything hard against that assumption
TimToady well, on windows
nwc10 I wondered if it's bad to make that assumption.
Or a saving. 16:02
I guess it's too early to know
TimToady is it -1 on *nix?
nwc10 I don't know. And I don't know how to answer that
jnthn TimToady: Yeah. It's 'cus stdin/stdout/stderr are initialized at startup, and their separators are set up to include \r\n whatever the platform
nwc10: At a guess it'd be a very modest saving 16:03
TimToady in a sense, having -1..127|255 makes ascii and latin-1 into variable-width encodings
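As a rough illustration of the thought experiment above: if the \r\n synthetic were guaranteed to be -1, the "is this string ASCII / Latin-1" test becomes a single range check per grapheme, and the variable-width observation shows up as soon as the byte count is computed. Everything here is hypothetical (the names, and especially the fixed -1, which per the discussion is what happens today but is not something the code relies on).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch only: \r\n is always synthetic number -1. */
#define SKETCH_CRLF_SYNTHETIC (-1)

/* True if every grapheme lies in [-1, max]; max = 127 for ASCII,
 * 255 for ISO-8859-1. Any other negative value is some other synthetic. */
static bool sketch_all_in_range(const int32_t *g, size_t n, int32_t max) {
    size_t i;
    assert(g != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (g[i] < SKETCH_CRLF_SYNTHETIC || g[i] > max)
            return false;
    }
    return true;
}

/* Bytes needed to encode such a string: the \r\n grapheme takes two bytes,
 * everything else one, so the encoding is no longer fixed-width. */
static size_t sketch_encoded_size(const int32_t *g, size_t n) {
    size_t i, size = 0;
    assert(g != NULL || n == 0);
    for (i = 0; i < n; i++)
        size += (g[i] == SKETCH_CRLF_SYNTHETIC) ? 2 : 1;
    return size;
}
```

For the three graphemes `'a'`, `\r\n`, `'b'` the range check passes for ASCII, but the encoded size is four bytes rather than three.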
jnthn oh wtf 16:05
nwc10 I used to like text better than numbers. Maybe I should give up and decide that booleans are the only fun type.
jnthn Seems some buffer mis-management
nwc10 jnthn: ASAN doesn't barf.
jnthn Not an overrun 16:06
Don't think the code can overrun
Just re-consider something it already processed
[Coke] TimToady++ hacking on camelia. 16:08
ilmari $ ./perl6 -e 'say "\x[ffffff]".encode(:replacement).decode("utf8")' 16:09
ilmari does a little dance
16:13 lucasb joined
lucasb iiuc, Configure.pl clones the 3rdparty submodules everytime it is run in a "clean" tree. Is it worth to have the full clone? Would a git --depth option speed the clonning? 16:19
*cloning
but I understand these repos may be small, so I wouldn't matter much. maybe except for libuv 16:20
*so it wouldn't matter much 16:21
JimmyZ ilmari++ # another RT down. 16:27
jnthn has figured out what's going on 16:30
And again, was something that was bust before, but \r\n as a single grapheme showed it up 16:40
jnthn wonders how many of the regressions from making \n magical will be solved by his fix
timotimo jnthn: did you see the one where ords( ) on that string containing \r\n gives you a super high number (didn't check, but probably -1) 16:44
but not just any string with \r\n 16:45
jnthn That's hopefully covered by the local patch I have
timotimo neat
there was such a lot of backlog today that i couldn't keep up yet; is there a fix for socket's .get giving you more than you've wished for? 16:48
jnthn timotimo: I *think* so
timotimo it's a good start 16:49
jnthn At least, RabidGravy++ seemed to think I fixed all the things, and I think that was included
timotimo grabs fresh sources
do you have the file for ord() -> -1? if not, i can link it or test it locally after your latest patch lands
ords leaking synthetics seems fixed already, but it could just be that the reproduction isn't doing it in this case any more 16:54
it was not reliable in the first place 16:56
ah, "special handles" 17:00
jnthn Oh, that wasn't The Change 17:01
timotimo it's not interesting for the .get on sockets thing? 17:02
timotimo looks
jnthn No. It'd be worth checking if the sockets thing is still busted
These line endings things have been a lot more time-consuming than I expected... 17:03
Including making \n virtual on Windows
Well, everywhere, but it means it's \r\n on Windows
timotimo i'm trying out panda without the hotfix right now 17:04
seems still broken, let me double-check if i've got latest everything forever 17:05
dalek MoarVM: 62fbd37 | jnthn++ | src/strings/normalize.c:
Never re-normalize what we already considered.

In some cases, we get into a situation where already normalized chars are left sitting in the normalizer's buffer. We could then end up trying to renormalize them, and any synthetics would cause immense upset to things expecting codepoints.
jnthn ^ caused some nasty issues too
timotimo ok, i'll grab & build that, too 17:06
i wish there was a way to tell configure "no, it's all right, just accept the version i have. i just didn't re-configure everything"
17:08 domidumont joined
timotimo Use of Nil in string context in block at /home/timo/perl6/install/share/perl6/site/lib/Panda/Ecosystem.pm:106 17:12
this is the problem we've had
about .get giving us more than just one line 17:13
timotimo instruments with debug output
"HTTP/1.1 200 OK\r\nDate: Wed, 04 Nov 2015 17:15:21 GMT\r\nServer: Apache/2.4.10 (Debian)\r\n(and so on)…" 17:15
that's what .get gives us 17:16
[Coke] timotimo: I think we could make our makefiles smart enough to include the config step to avoid that issue. 17:18
timotimo well. that's what you get!
[Coke] but it's so much pain.
timotimo i hear ya 17:19
seriously, please don't talk so loud! i hear you from all the way over there! ;)
jnthn break & 17:21
17:34 tokuhiro_ joined 18:00 camelia joined 18:32 tokuhiro_ joined
nwc10 jnthn: much test fail. Tried one - ASAN barfage: paste.scsys.co.uk/500963 19:01
19:02 vendethiel joined 19:12 leont joined 19:38 synbot6 joined, tokuhiro_ joined 20:23 zakharyas joined 21:40 tokuhiro_ joined 21:44 Peter_R joined 21:58 Ven joined 22:03 ShimmerFairy joined 22:18 Ven joined
flussence I've cobbled some code together over the last 2 days that *looks* pretty reasonable but breaks in various ways. It's reached the point where it's throwing mvm-internal errors, so I give up :) Here it is if someone else wants a fun time: gist.github.com/flussence/a27ca3f5632476e80019 23:57