|
08:42
sivoais left
11:02
finanalyst joined
11:14
finanalyst left
|
|||
| lizmat | this feels LTA: | 12:00 | |
| m: 56792.chr | |||
| camelia | ( no output ) | ||
| lizmat | m: say 56792.chr | ||
| camelia | Error encoding UTF-8 string: could not encode Unicode Surrogate codepoint 56792 (0xDDD8) in block <unit> at <tmp> line 1 |
||
| lizmat | ShimmerFairy ^^ | 12:01 | |
| it's nqp::encoderepconf() throwing in Encoding::Encoder::Builtin.encode-chars | 12:02 | ||
| timo | really depends on whether we want Str in NFG to be able to hold a lone surrogate codepoint or not | 13:33 | |
| it makes sense that it explodes when trying to encode it to a buf in order to print it out | 13:34 | ||
| m: 56792.chr.encode("utf8-c8").say | |||
| camelia | Error encoding UTF-8 string: could not encode Unicode Surrogate codepoint 56792 (0xDDD8) in block <unit> at <tmp> line 1 |
||
| timo | i guess that's not how you get that out of there huh | ||
| gist.github.com/milseman/c22a0413d...07e3ee7c8b - surely interesting to cross-reference | 13:40 | ||
| [Coke] | m: 56792.chr.NFKD.say | 13:45 | |
| camelia | NFKD:0x<ddd8> | ||
|
16:32
sivoais joined
16:48
[Coke]_ joined
16:51
[Coke] left
18:38
[Coke]_ is now known as [Coke]
|
|||
| Geth | rakudo/main: a97c7a33c1 | (Elizabeth Mattijsen)++ | src/core.c/IO/Path.rakumod Rework IO::Path.slurp(:bin) Inspired by af30c7bed30b725a12 Instead of *always* asking for the filesize beforehand, it now asks for the filesize if the initial read filled the initial size buffer (1MB). If that exceeds INT_MAX (minus 1MB) a slow path is taken ... (9 more lines) |
19:30 | |
| lizmat | m: use nqp; my $b = nqp::setelems(Buf.new,0x080000000).decode | 19:32 | |
| camelia | MoarVM panic: Memory allocation failed; could not allocate 2147483648 bytes | ||
| lizmat | m: use nqp; my $b = nqp::setelems(Buf.new,0x07fffffff).decode | ||
| camelia | MoarVM panic: Memory allocation failed; could not allocate 2147483647 bytes | ||
| lizmat | m: use nqp; my $b = nqp::setelems(Buf.new,0x07ffffff).decode | ||
| camelia | ( no output ) | 19:33 | |
| lizmat | m: use nqp; say nqp::setelems(Buf.new,0x07ffffff).decode.chars | ||
| camelia | 134217727 | ||
| lizmat | m: use nqp; say nqp::setelems(Buf.new,0x07ffffff).decode.chars.base(16) | ||
| camelia | 7FFFFFF | ||
| lizmat | m: use nqp; say nqp::setelems(Buf.new,0x07fffffff).decode.chars.base(16) | 19:34 | |
| camelia | MoarVM panic: Memory allocation failed; could not allocate 2147483647 bytes | ||
| lizmat | weird, that works on my machine, but then again that has 64G | ||
| 0x07fffffff works for me, 0x080000000 fails | 19:35 | ||
| so I guess I will put in a check for > 0x07fffffff before trying to decode | |||
| timo | PIO? oh yeah that's uhhh Physical Input/Output of course :) | 19:51 | |
|
20:27
patrickb left
20:41
patrickb joined
20:47
finanalyst joined
|
|||
| Geth | rakudo/main: 06f16f6d58 | (Elizabeth Mattijsen)++ | 2 files Provide better error when trying to decode too large blobs Apparently at least on some OSes (and/or on MoarVM) it's impossible to decode buffers that have more than 0x07fffffff elements: this used to throw a hard untrappable memory exceeded error thrown from the guts of the VM. This adds logic to check the number of elements in the blob to be decoded, and produces a Failure if the number of elements is too large: Too many bytes to decode: 2418709326 is more than 2147483647 |
21:03 | |
| [Coke] | ;win 12 | 21:56 | |
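The element-count check described in the 06f16f6d58 commit message could be sketched roughly as below. This is a minimal illustration and not the code from that commit: `checked-decode`, `MAX-DECODE-ELEMS`, and the `:max` parameter are hypothetical names introduced here. Only the 0x07fffffff limit and the wording of the error message come from the log.

```raku
# Hypothetical sketch; not the actual Rakudo implementation.
# The limit is the one quoted in the commit message (0x07fffffff).
constant MAX-DECODE-ELEMS = 0x7fffffff;

# Return a Failure instead of letting the VM panic on an oversized blob.
# :$max is an assumption added so the check can be exercised with small inputs.
sub checked-decode(Blob:D $blob, Str:D $encoding = 'utf8', Int:D :$max = MAX-DECODE-ELEMS) {
    my $elems = $blob.elems;
    fail "Too many bytes to decode: $elems is more than $max"
      if $elems > $max;
    $blob.decode($encoding)
}

# Normal case: the blob is small enough, so it is decoded as usual.
say checked-decode("hello".encode);

# Oversized case, simulated with a tiny limit to avoid allocating 2GB.
my $result = checked-decode(Buf.new(1, 2, 3), :max(2));
say $result.exception.message unless $result.defined;
```

Returning a Failure keeps the condition trappable by the caller, which is the stated difference from the earlier hard memory error thrown from inside the VM.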
|
22:06
finanalyst left
|
|||
| ShimmerFairy | lizmat: To respond to earlier, surrogates are where things get real subtle. Surrogate codepoints are, in fact, codepoints, and you can manipulate them just like you would any other codepoint (though, obviously, their practical use is quite limited). Surrogates however are not "Unicode scalar values", which is the only kind of codepoint allowed in the UTF encodings. So trying to store a surrogate in any UTF encoding is an error, but | 23:36 | |
| holding onto one as a codepoint isn't. Since NFG strings operate at a higher level than raw UTF encoding, there's a logic to them being OK in Strs. | |||
| I tried looking, but Unicode doesn't seem to have guidance on how a language's string type should handle surrogate codepoints, and I'm not sure what I think the right answer is. For one thing, since Strs are conceptualized as grapheme sequences (i.e. codepoint sequences with bookkeeping), it makes sense to allow all valid codepoints in them. On the other hand, you'll never be able to do UTF-based I/O with such strings, so their | 23:40 | ||
| usability is limited to within the program, and I can't offhand think of any uses. | |||
| timo | put it in a string you *really* don't want your program to output so users or other processes can see them :) :) | 23:44 | |
| ShimmerFairy | Yeah, you could (ab)use them as like noncharacters that absolutely must not exit the program's internal memory. | 23:47 | |
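The distinction ShimmerFairy draws between codepoints and Unicode scalar values can be sketched as follows. This is an illustration under assumptions: `is-scalar-value` is a hypothetical helper and not part of Rakudo. The surrogate range U+D800..U+DFFF is from the Unicode standard, and the behaviour of `.NFKD` and of encoding follows what camelia reported earlier in the log.

```raku
# Hypothetical helper: a scalar value is any codepoint outside the surrogate range.
sub is-scalar-value(Int:D $cp) {
    0 <= $cp <= 0x10FFFF && !(0xD800 <= $cp <= 0xDFFF)
}

say is-scalar-value(0x41);     # LATIN CAPITAL LETTER A, a scalar value
say is-scalar-value(0xDDD8);   # 56792, the surrogate from the log

# Holding the surrogate as a codepoint inside a Str works ...
my $s = 0xDDD8.chr;
say $s.NFKD;

# ... but turning it into UTF-8 bytes does not, so handle the error explicitly.
{
    my $bytes = $s.encode('utf8');
    say "encoded {$bytes.elems} bytes";
    CATCH { default { say "encode failed: ", .message } }
}
```

A check like this could be run before any I/O to report a lone surrogate early instead of failing at encode time.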