01:20
tokuhiro_ joined
02:00
camelia joined
02:03
camelia joined
02:06
camelia joined
02:15
tokuhiro_ joined
02:21
camelia joined
02:48
ilbot3 joined
02:54
camelia joined
03:05
jnthn joined
03:07
camelia joined
03:18
vendethiel joined
04:35
dalek joined
06:12
kjs_ joined
07:25
harrow` joined
07:38
FROGGS joined
07:40
vendethiel joined
nwc10 | good UGT, #moarvm | 08:45 | |
almost typoed that as UFT | |||
I wonder what UFT would be | |||
timotimo | Universal Food Time? | 08:46 | |
nwc10 | so, dammit, there goes the answer to the security question "qwerty or dvorak?" :-) | ||
that sounds very fattening | |||
09:58
lizmat joined
10:49
zakharyas joined
dalek | MoarVM: 2b9c98a | jnthn++ | src/strings/ (3 files): Fix encoding \r\n grapheme in non-utf8 encodings. | 12:12 | |
MoarVM: fdd5bd5 | jnthn++ | src/strings/ (5 files): Fix non-streaming decoders for \r\n as 1 grapheme. | 12:51 | |
ilmari | jnthn++ # I was about to have to add realloc to the encoders to handle multi-character replacements, now I don't have to :) | 12:52 | |
jnthn | ;) | 12:53 | |
ilmari: I need to update the streaming decoders too; writing tests for that at the moment. | |||
ilmari | btw, what's the best way to pass them to the low-level function if we want to provide them as Bufs at the perl6 level? | ||
jnthn | Did we want to provide them as Bufs at the Perl 6 level, or Str? | 12:54 | |
If it's a Buf, will we re-validate that the Buf in question actually decodes OK? | |||
ilmari | Str avoids the risk of the replacement being invalid | ||
yeah | |||
so just taking it as an MVMString in _encode_substr and recursing to encode it? | |||
jnthn | OTOH, Str needs care too | ||
ilmari | throwing an exception if it itself isn't encodable | 12:55 | |
jnthn | Because you really need to feed its graphemes through the normalization process too | ||
Oh wait, I guess that's overkill | 12:56 | ||
Yeah, that's kinda silly | |||
So ignore :) | |||
And yeah, you'd just recurse to encode the replacement string, or just do it once. | |||
Also there's no streaming encode, so you can ignore my "streaming decode not updated" comment too :) | 12:57 | ||
(Just do it once at the start, I mean) | |||
ilmari | yeah | 12:58 | |
jnthn | Doing it once at the start would also mean you get to check if it will/won't work out up front | 12:59 | |
ilmari | yeah | ||
jnthn | And so ease memory management a little | ||
ilmari | if (replacement) repl_bytes = MVM_string_ascii_encode_substr(tc, replacement, &repl_length, 0, -1, NULL) | 13:00 | |
jnthn | That works | ||
ilmari | NULL being the new «MVMString *replacement» parameter | ||
I see the code mixes size_t/MVMuint64 and char*/MVMuint8* a bit | 13:01 | ||
e.g. the function returns char*, but the result var is MVMuint8* | |||
jnthn | Which encoding for, ooc? | 13:02 | |
Oh, even the ASCII one I'm looking at casts at the end | |||
ilmari | I'm doing ascii first | ||
ah, it does cast at the end | 13:03 | ||
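The approach agreed above (encode the replacement string once, up front, by recursing into the same encoder with no replacement of its own) can be sketched in plain C. This is a hypothetical, self-contained illustration, not MoarVM's `MVM_string_ascii_encode_substr`: `codepoint_t` stands in for `MVMCodepoint`, the function name `ascii_encode_rep` is made up, and synthetic graphemes (such as \r\n) are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for MVMCodepoint; this sketch only handles plain
 * (non-synthetic) codepoints. */
typedef int32_t codepoint_t;

/* Encode n codepoints as ASCII into a freshly malloc'd, NUL-terminated
 * buffer and store its length in *out_len. Codepoints outside 0..127 are
 * replaced by the ASCII encoding of `repl` (repl_n codepoints). Returns NULL
 * if an unencodable codepoint is met and no replacement was given, if the
 * replacement itself is not ASCII-encodable, or if allocation fails. */
static char *ascii_encode_rep(const codepoint_t *cps, size_t n,
                              const codepoint_t *repl, size_t repl_n,
                              size_t *out_len) {
    char  *repl_bytes = NULL;
    size_t repl_len   = 0;
    assert(out_len != NULL);

    /* Encode the replacement once, up front, with no replacement of its
     * own: a bad replacement is rejected before any real work is done. */
    if (repl) {
        repl_bytes = ascii_encode_rep(repl, repl_n, NULL, 0, &repl_len);
        if (!repl_bytes)
            return NULL;
    }

    /* Worst case: every input codepoint is replaced. */
    size_t unit = repl_len > 1 ? repl_len : 1;
    char  *out  = malloc(n * unit + 1);
    size_t pos  = 0;
    if (!out) {
        free(repl_bytes);
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        if (cps[i] >= 0 && cps[i] <= 127) {
            out[pos++] = (char)cps[i];
        } else if (repl_bytes) {
            memcpy(out + pos, repl_bytes, repl_len);
            pos += repl_len;
        } else {
            free(out); /* a real encoder would throw "unencodable" here */
            return NULL;
        }
    }
    out[pos] = '\0';
    free(repl_bytes);
    *out_len = pos;
    return out;
}
```

Fed the codepoints of "skjærgårdsøl" with a replacement of "?", this produces "skj?rg?rds?l", matching the Perl 6 output ilmari shows later in the log. Since the replacement is encoded before the main loop, an unencodable replacement fails without allocating or filling the output buffer.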
13:14
domidumont joined
ilmari | bah, I keep typoing MVMString as MVString | 13:46 | |
dalek | MoarVM: a362d21 | jnthn++ | src/strings/ (3 files): Fix various streaming decoders for \r\n grapheme. | 14:24 | |
ilmari | I've done as much as I can easily do wrt. encoding replacement chars: github.com/ilmari/MoarVM/commits/e...exceptions | 14:30 | |
the next bit requires changing the bytecode format to allow specifying the replacement string for unencodable characters | |||
if someone can point me in the right direction I could have a go later | 14:32 | ||
now I need to get back to work | |||
jnthn | ilmari++ | 14:37 | |
ilmari: The overall process is: update src/core/oplist, and run tools/update_ops.p6, then edit src/core/interp.c. | 14:38 | ||
ilmari: But we have to deal with back-compat of bytecode, so I'd add a new op (encoderep or so) | 14:39 | ||
ilmari: There are instructions on where to add ops at the top of src/core/oplist (simple answer: after all normal ops, before any special ops) | 14:40 | ||
ilmari | are fallthroughs in the interp.c switch allowed? | 14:51 | |
OP(encode): { | |||
MVMString *replacement = NULL; | |||
OP(encoderep): | |||
replacement = GET_REG(cur_op, 6).s; | |||
then I guess I need to wire it up in nqp and finally rakudo? | 14:54 | ||
jnthn | ilmari: Yeah but I worry a bit about whether compilers are smart enough to turn the switch into a jump table if we do tricks like that. | 14:58 | |
ilmari: Which is quite important for the interp. | |||
And the op needs to come at the end | |||
Well, the end of the normal ops | |||
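One way to address jnthn's concern without a fallthrough is to keep each op in its own `case` and factor the shared work into a helper. The toy interpreter below is a hypothetical sketch, not MoarVM's `interp.c`: the opcode names, numbering and `do_encode` helper are made up. It also mimics the back-compat rule by appending the new op after the existing ones, so old opcode numbers are unchanged (MoarVM's real rule is "after all normal ops, before any special ops"). Whether a given compiler actually emits a jump table is up to the compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcode numbering for a toy interpreter. The new op is
 * appended after the existing ones so that OP_ENCODE and OP_HALT keep their
 * numbers and previously compiled bytecode decodes the same way. */
enum { OP_ENCODE = 0, OP_HALT = 1, OP_ENCODEREP = 2 };

/* Shared body for both ops. In this toy it just returns the replacement
 * "register value" it was given (0 meaning "no replacement"). */
static inline int do_encode(int replacement) {
    return replacement;
}

/* Each op gets its own case and nothing jumps into the middle of another
 * op's body; the shared logic lives in do_encode instead. */
static int run(const uint8_t *pc, const int *regs) {
    int result = 0;
    for (;;) {
        switch (*pc) {
        case OP_ENCODE:
            result = do_encode(0);
            pc += 1;
            break;
        case OP_ENCODEREP: /* operand: register index of the replacement */
            result = do_encode(regs[pc[1]]);
            pc += 2;
            break;
        case OP_HALT:
            return result;
        default:
            assert(0 && "unknown opcode");
            return -1;
        }
    }
}
```

The cost of this shape is one extra call per op, which a `static inline` helper usually lets the compiler remove.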
m: say +"42\n" | 15:11 | ||
camelia | rakudo-moar 36a351: OUTPUT«42» | ||
jnthn | m: say +"42\r\n" | ||
camelia | rakudo-moar 36a351: OUTPUT«should eventually be unreachable in block <unit> at /tmp/ZU7uQLeOwj:1» | ||
[Coke] | O_o | 15:13 | |
jnthn | :) | 15:14 | |
jnthn makes things more eventual :) | |||
\r\n being a grapheme is being really good at shaking out various things. | 15:15 | ||
dalek | MoarVM: 4b46f83 | jnthn++ | src/ (2 files): Make radix ops not blow up over synthetics. | 15:16 | |
15:33
tokuhiro_ joined
jnthn | m: say uniprop "\r", 'Decomp_Type' | 15:40 | |
camelia | rakudo-moar 36a351: OUTPUT«0» | ||
jnthn | m: say uniprop "\r", 'Decomposition_Type' | ||
camelia | rakudo-moar 36a351: OUTPUT«None» | ||
jnthn | m: say uniprop "\r", 'WhatNoThisIsNoProp' | ||
camelia | rakudo-moar 36a351: OUTPUT«0» | ||
ilmari | $ ./perl6 -e 'say "skjærgårdsøl".encode("ascii", :replacement("?")).decode("ascii")' | 15:44 | |
skj?rg?rds?l | |||
\o/ | |||
jnthn | \o/ | 15:47 | |
What on earth... | |||
My debugger claims that we're passing a 13 to a function | |||
And it receives -1 | |||
(Yes, same type in both cases: MVMCodepoint) | 15:48 | ||
TimToady | it's a good thing we included NFG as one of the Big Three | 15:52 | |
jnthn | Indeed | 15:54 | |
Though the S of NSA is...uh...in need of some love.. | |||
TimToady | one hopes that won't be quite so upsetting as \r\n | ||
nwc10 | the Three Wise Ones to do before Christmas | 15:55 | |
jnthn | m: say uniname(214) | 15:56 | |
camelia | rakudo-moar 36a351: OUTPUT«LATIN CAPITAL LETTER O WITH DIAERESIS» | ||
nwc10 | j: say uniname(214) | 15:57 | |
camelia | rakudo-jvm 273e89: OUTPUT«LATIN CAPITAL LETTER O WITH DIAERESIS» | ||
TimToady | you'll note those are not the same version | ||
nwc10 | I failed to note that. Thanks. | ||
TimToady | we decoupled them last night | 15:58 | |
and jvm build has been failing for a couple days | |||
nwc10 | it needs a champion/hero/superhero? | 15:59 | |
TimToady | jvm has periodic champions | 16:00 | |
nwc10 | jnthn: this might be a daft thought experiment, but as the grapheme for \r\n is likely to crop up quite a bit, and is the only one in ASCII, might it be useful to "hard code" it as ord -1? | ||
it means that "is this string ASCII" becomes the range (-1, 127), and ISO-8859-1 (-1, 255) | 16:01 | ||
but there might be downsides to this | |||
jnthn | It gets -1 anyway since it's the only synthetic we encounter during VM initialization | ||
nwc10 | ah OK | ||
jnthn | Though yeah, I didn't code anything hard against that assumption | ||
TimToady | well, on windows | ||
nwc10 | I wondered if it's bad to make that assumption. | ||
Or a saving. | 16:02 | ||
I guess it's too early to know | |||
TimToady | is it -1 on *nix? | ||
nwc10 | I don't know. And I don't know how to answer that | ||
jnthn | TimToady: Yeah. It's 'cus stdin/stdout/stderr are initialized at startup, and their separators are set up to include \r\n whatever the platform | ||
nwc10: At a guess it'd be a very modest saving | 16:03 | ||
TimToady | in a sense, having -1..127|255 makes ascii and latin-1 into variable-width encodings | ||
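nwc10's thought experiment can be sketched as follows. This is hypothetical: per jnthn, \r\n only happens to get -1 because it is the first synthetic created at VM startup, and nothing relies on that. The sketch hard-codes it anyway to show the range check and TimToady's point that ASCII and Latin-1 become variable-width per grapheme. `grapheme_t`, `CRLF_SYNTHETIC` and both function names are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical grapheme representation: values >= 0 are codepoints,
 * values < 0 index synthetic graphemes (as with NFG). */
typedef int32_t grapheme_t;

/* Assumption taken from the discussion, not something MoarVM guarantees:
 * \r\n is the first synthetic created at startup, so it is -1. */
#define CRLF_SYNTHETIC ((grapheme_t)-1)

/* True if every grapheme is a codepoint <= max_cp or the \r\n synthetic.
 * max_cp = 127 checks ASCII, 255 checks Latin-1. */
static int fits_encoding(const grapheme_t *g, size_t n, grapheme_t max_cp) {
    assert(g != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (g[i] == CRLF_SYNTHETIC)
            continue;
        if (g[i] < 0 || g[i] > max_cp)
            return 0;
    }
    return 1;
}

/* Bytes needed in ASCII/Latin-1: the \r\n grapheme takes two, everything
 * else one, so these encodings are variable-width per grapheme. */
static size_t single_byte_encoded_length(const grapheme_t *g, size_t n) {
    size_t len = 0;
    for (size_t i = 0; i < n; i++)
        len += (g[i] == CRLF_SYNTHETIC) ? 2 : 1;
    return len;
}
```

Any other synthetic (say, -2) still fails both checks, which is why hard-coding only -1 would be, as jnthn guesses, a modest saving at best.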
jnthn | oh wtf | 16:05 | |
nwc10 | I used to like text better than numbers. Maybe I should give up and decide that booleans are the only fun type. | ||
jnthn | Seems some buffer mis-management | ||
nwc10 | jnthn: ASAN doesn't barf. | ||
jnthn | Not an overrun | 16:06 | |
Don't think the code can overrun | |||
Just re-consider something it already processed | |||
[Coke] | TimToady++ hacking on camelia. | 16:08 | |
ilmari | $ ./perl6 -e 'say "\x[ffffff]".encode(:replacement).decode("utf8")' | 16:09 | |
� | |||
ilmari does a little dance | |||
16:13
lucasb joined
lucasb | iiuc, Configure.pl clones the 3rdparty submodules every time it is run in a "clean" tree. Is it worth having the full clone? Would a git --depth option speed up the cloning? | 16:19 | |
but I understand these repos may be small, so it wouldn't matter much, except maybe for libuv | 16:20 | |
JimmyZ | ilmari++ # another RT down. | 16:27 | |
jnthn has figured out what's going on | 16:30 | ||
And again, was something that was bust before, but \r\n as a single grapheme showed it up | 16:40 | ||
jnthn wonders how many of the regressions from making \n magical will be solved by his fix | |||
timotimo | jnthn: did you see the one where ords( ) on that string containing \r\n gives you a super high number (didn't check, but probably -1) | 16:44 | |
but not just any string with \r\n | 16:45 | ||
jnthn | That's hopefully covered by the local patch I have | ||
timotimo | neat | ||
there was such a lot of backlog today that i couldn't keep up yet; is there a fix for socket's .get giving you more than you've wished for? | 16:48 | ||
jnthn | timotimo: I *think* so | ||
timotimo | it's a good start | 16:49 | |
jnthn | At least, RabidGravy++ seemed to think I fixed all the things, and I think that was included | ||
timotimo grabs fresh sources | |||
do you have the file for ord() -> -1? if not, i can link it or test it locally after your latest patch lands | |||
ords leaking synthetics seems fixed already, but it could just be that the reproduction isn't doing it in this case any more | 16:54 | ||
it was not reliable in the first place | 16:56 | ||
ah, "special handles" | 17:00 | ||
jnthn | Oh, that wasn't The Change | 17:01 | |
timotimo | it's not interesting for the .get on sockets thing? | 17:02 | |
timotimo looks | |||
jnthn | No. It'd be worth checking if the sockets thing is still busted | ||
These line endings things have been a lot mroe time-consuming than I expected... | 17:03 | ||
Including making \n virtual on Windows | |||
Well, everywhere, but it means it's \r\n on Windows | |||
timotimo | i'm trying out panda without the hotfix right now | 17:04 | |
seems still broken, let me double-check if i've got latest everything forever | 17:05 | ||
dalek | MoarVM: 62fbd37 | jnthn++ | src/strings/normalize.c: Never re-normalize what we already considered. In some cases, we get into a situation where already normalized chars are left sitting in the normalizer's buffer. We could then end up trying to renormalize them, and any synthetics would cause immense upset to things expecting codepoints. | ||
jnthn | ^ caused some nasty issues too | ||
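The idea in commit 62fbd37 (never feed already-normalized output back through the normalizer) can be illustrated with a toy buffer that keeps a "processed" watermark. This is a hypothetical sketch, not the code in src/strings/normalize.c: its only rule turns \r\n into one synthetic, and `normalizer_t`, `norm_push`, `norm_process` and `CRLF_SYNTHETIC` are made-up names. The `assert(cp >= 0)` plays the part of "things expecting codepoints" that a re-normalized synthetic would upset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t grapheme_t;              /* < 0 means synthetic */
#define CRLF_SYNTHETIC ((grapheme_t)-1)  /* hypothetical id for \r\n */
#define NORM_BUF_MAX 64

/* Hypothetical toy normalizer whose only rule is "\r\n becomes one
 * synthetic grapheme". buf[0..processed) is already normalized output and
 * must never be fed through the rule again; buf[processed..len) is raw. */
typedef struct {
    grapheme_t buf[NORM_BUF_MAX];
    size_t     len;
    size_t     processed;
} normalizer_t;

static void norm_push(normalizer_t *n, grapheme_t cp) {
    assert(cp >= 0 && n->len < NORM_BUF_MAX); /* only raw codepoints come in */
    n->buf[n->len++] = cp;
}

/* Normalize only the unprocessed tail. A trailing \r is held back because
 * a \n may still arrive. */
static void norm_process(normalizer_t *n) {
    size_t out = n->processed, i = n->processed;
    while (i < n->len) {
        grapheme_t cp = n->buf[i];
        assert(cp >= 0); /* a synthetic here means we are re-normalizing */
        if (cp == '\r' && i + 1 == n->len)
            break;                       /* hold back: wait for more input */
        if (cp == '\r' && n->buf[i + 1] == '\n') {
            n->buf[out++] = CRLF_SYNTHETIC;
            i += 2;
        } else {
            n->buf[out++] = cp;
            i += 1;
        }
    }
    /* Slide any held-back input down to just after the normalized part. */
    size_t tail = n->len - i;
    for (size_t k = 0; k < tail; k++)
        n->buf[out + k] = n->buf[i + k];
    n->processed = out;
    n->len = out + tail;
}
```

If `norm_process` started from index 0 instead of `processed`, the second call after a completed \r\n would hit the synthetic and trip the assertion, which mirrors the failure the commit message describes.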
timotimo | ok, i'll grab & build that, too | 17:06 | |
i wish there was a way to tell configure "no, it's all right, just accept the version i have. i just didn't re-configure everything" | |||
17:08
domidumont joined
timotimo | Use of Nil in string context in block at /home/timo/perl6/install/share/perl6/site/lib/Panda/Ecosystem.pm:106 | 17:12 | |
this is the problem we've had | |||
about .get giving us more than just one line | 17:13 | ||
timotimo instruments with debug output | |||
"HTTP/1.1 200 OK\r\nDate: Wed, 04 Nov 2015 17:15:21 GMT\r\nServer: Apache/2.4.10 (Debian)\r\n(and so on)…" | 17:15 | ||
that's what .get gives us | 17:16 | ||
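For comparison with timotimo's debug output, here is a minimal sketch of what a line reader is expected to return when both "\r\n" and "\n" are line separators: only the status line, not the whole header block. It is a hypothetical C illustration of the expected behaviour, not Rakudo's or MoarVM's socket code; `next_line` is a made-up name. Like the streaming decoders discussed earlier, it must hold back a trailing "\r" until the next read shows whether a "\n" follows.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Find the first line in buf[0..n) when both "\r\n" and "\n" count as line
 * separators. On success returns 1, sets *line_len to the line's length
 * without its separator and *consumed to the bytes to drop (line plus
 * separator). Returns 0 when no complete line is buffered yet, including a
 * trailing "\r" whose "\n" may arrive in the next read. */
static int next_line(const char *buf, size_t n, size_t *line_len, size_t *consumed) {
    assert(line_len != NULL && consumed != NULL);
    for (size_t i = 0; i < n; i++) {
        if (buf[i] == '\n') {
            *line_len = i;
            *consumed = i + 1;
            return 1;
        }
        if (buf[i] == '\r') {
            if (i + 1 == n)
                return 0;
            if (buf[i + 1] == '\n') {
                *line_len = i;
                *consumed = i + 2;
                return 1;
            }
        }
    }
    return 0;
}
```

On "HTTP/1.1 200 OK\r\nDate: ...\r\n" the first call yields the 15-byte status line and consumes 17 bytes; the next call, on the remaining buffer, yields the Date header.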
[Coke] | timotimo: I think we could make our makefiles smart enough to include the config step to avoid that issue. | 17:18 | |
timotimo | well. that's what you get! | ||
[Coke] | but it's so much pain. | ||
timotimo | i hear ya | 17:19 | |
seriously, please don't talk so loud! i hear you from all the way over there! ;) | |||
jnthn | break & | 17:21 | |
17:34
tokuhiro_ joined
18:00
camelia joined
18:32
tokuhiro_ joined
nwc10 | jnthn: much test fail. Tried one - ASAN barfage: paste.scsys.co.uk/500963 | 19:01 | |
19:02
vendethiel joined
19:12
leont joined
19:38
synbot6 joined,
tokuhiro_ joined
20:23
zakharyas joined
21:40
tokuhiro_ joined
21:44
Peter_R joined
21:58
Ven joined
22:03
ShimmerFairy joined
22:18
Ven joined
flussence | I've cobbled some code together over the last 2 days that *looks* pretty reasonable but breaks in various ways. It's reached the point where it's throwing mvm-internal errors, so I give up :) Here it is if someone else wants a fun time: gist.github.com/flussence/a27ca3f5632476e80019 | 23:57 |