08:42 sivoais left 11:02 finanalyst joined 11:14 finanalyst left
lizmat this feels LTA: 12:00
m: 56792.chr
camelia ( no output )
lizmat m: say 56792.chr
camelia Error encoding UTF-8 string: could not encode Unicode Surrogate codepoint 56792 (0xDDD8)
in block <unit> at <tmp> line 1
lizmat ShimmerFairy ^^ 12:01
it's nqp::encoderepconf() throwing in Encoding::Encoder::Builtin.encode-chars 12:02
timo really depends on whether we want Str in NFG to be able to hold a lone surrogate codepoint or not 13:33
it makes sense that it explodes when trying to encode it to a buf in order to print it out 13:34
m: 56792.chr.encode("utf8-c8").say
camelia Error encoding UTF-8 string: could not encode Unicode Surrogate codepoint 56792 (0xDDD8)
in block <unit> at <tmp> line 1
timo i guess that's not how you get that out of there huh
gist.github.com/milseman/c22a0413d...07e3ee7c8b - surely interesting to cross-reference 13:40
[Coke] m: 56792.chr.NFKD.say 13:45
camelia NFKD:0x<ddd8>
16:32 sivoais joined 16:48 [Coke]_ joined 16:51 [Coke] left 18:38 [Coke]_ is now known as [Coke]
Geth rakudo/main: a97c7a33c1 | (Elizabeth Mattijsen)++ | src/core.c/IO/Path.rakumod
Rework IO::Path.slurp(:bin)

Inspired by af30c7bed30b725a12
Instead of *always* asking for the filesize beforehand, it now asks for the filesize only if the initial read filled the initial size buffer (1MB). If that exceeds INT_MAX (minus 1MB) a slow path is taken
... (9 more lines)
19:30
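The read strategy described in the commit message above can be sketched roughly as follows. This is only an illustration under stated assumptions: `slurp-bin-sketch` and its `:$chunk` parameter are hypothetical names, the code is not taken from Rakudo's `IO::Path.slurp`, and the INT_MAX slow path mentioned in the message is left out because its details are elided in the log.

```raku
# Hypothetical helper, NOT Rakudo's actual implementation.
sub slurp-bin-sketch(IO() $path, Int :$chunk = 1_048_576) {
    my $fh = $path.open(:bin);
    LEAVE .close with $fh;

    # First read: at most $chunk bytes (1MB by default).
    my $buf = $fh.read($chunk);

    # A short first read means the whole file fit in the initial
    # buffer, so the file size never has to be queried.
    return $buf if $buf.elems < $chunk;

    # The buffer was filled: only now ask for the file size
    # and keep reading until that many bytes have arrived.
    my $size = $path.s;
    while $buf.elems < $size {
        my $more = $fh.read($size - $buf.elems);
        last unless $more.elems;        # stop at end of file
        $buf ~= $more;
    }
    $buf
}
```

For small files this does a single read and no size lookup, which appears to be the saving the commit is after.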
lizmat m: use nqp; my $b = nqp::setelems(Buf.new,0x080000000).decode 19:32
camelia MoarVM panic: Memory allocation failed; could not allocate 2147483648 bytes
lizmat m: use nqp; my $b = nqp::setelems(Buf.new,0x07fffffff).decode
camelia MoarVM panic: Memory allocation failed; could not allocate 2147483647 bytes
lizmat m: use nqp; my $b = nqp::setelems(Buf.new,0x07ffffff).decode
camelia ( no output ) 19:33
lizmat m: use nqp; say nqp::setelems(Buf.new,0x07ffffff).decode.chars
camelia 134217727
lizmat m: use nqp; say nqp::setelems(Buf.new,0x07ffffff).decode.chars.base(16)
camelia 7FFFFFF
lizmat m: use nqp; say nqp::setelems(Buf.new,0x07fffffff).decode.chars.base(16) 19:34
camelia MoarVM panic: Memory allocation failed; could not allocate 2147483647 bytes
lizmat weird, that works on my machine, but then again that has 64G
0x07fffffff works for me, 0x080000000 fails 19:35
so I guess I will put in a check for > 0x07fffffff before trying to decode
timo PIO? oh yeah that's uhhh Physical Input/Output of course :) 19:51
20:27 patrickb left 20:41 patrickb joined 20:47 finanalyst joined
Geth rakudo/main: 06f16f6d58 | (Elizabeth Mattijsen)++ | 2 files
Provide better error when trying to decode too large blobs

Apparently at least on some OSes (and/or on MoarVM) it's impossible to decode buffers that have more than 0x07fffffff elements: this used to throw a hard, untrappable memory exceeded error from the guts of the VM.
This adds logic to check the number of elements in the blob to be decoded, and produces a Failure if the number of elements is too large:
   Too many bytes to decode: 2418709326 is more than 2147483647
21:03
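A minimal sketch of the kind of guard the commit message above describes, assuming a standalone helper rather than the actual change inside Rakudo. The names `checked-decode` and `MAX-DECODE-ELEMS` are hypothetical; only the limit and the wording of the message come from the log.

```raku
# Hypothetical helper; name, signature and constant name are assumptions.
my constant MAX-DECODE-ELEMS = 0x7fffffff;   # 2147483647

sub checked-decode(Blob:D $blob, Str:D $encoding = 'utf8') {
    my $elems = $blob.elems;

    # Hand back a trappable Failure instead of letting the VM
    # panic while trying to allocate memory for the decode.
    fail "Too many bytes to decode: $elems is more than {MAX-DECODE-ELEMS}"
        if $elems > MAX-DECODE-ELEMS;

    $blob.decode($encoding)
}

say checked-decode("hello".encode);   # hello
```

Because the result is a Failure rather than a VM panic, a caller can test it with `.defined` or handle it with `try` / `CATCH`.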
[Coke] ;win 12 21:56
22:06 finanalyst left
ShimmerFairy lizmat: To respond to earlier, surrogates are where things get real subtle. Surrogate codepoints are, in fact, codepoints, and you can manipulate them just like you would any other codepoint (though, obviously, their practical use is quite limited). Surrogates however are not "Unicode scalar values", which is the only kind of codepoint allowed in the UTF encodings. So trying to store a surrogate in any UTF encoding is an error, but 23:36
holding onto one as a codepoint isn't. Since NFG strings operate at a higher level than raw UTF encoding, there's a logic to them being OK in Strs.
I tried looking, but Unicode doesn't seem to have guidance on how a language's string type should handle surrogate codepoints, and I'm not sure what I think the right answer is. For one thing, since Strs are conceptualized as grapheme sequences (i.e. codepoint sequences with bookkeeping), it makes sense to allow all valid codepoints in them. On the other hand, you'll never be able to do UTF-based I/O with such strings, so their 23:40
usability is limited to within the program, and I can't offhand think of any uses.
timo put it in a string you *really* don't want your program to output so users or other processes can see them :) :) 23:44
ShimmerFairy Yeah, you could (ab)use them as like noncharacters that absolutely must not exit the program's internal memory. 23:47
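The distinction ShimmerFairy draws between codepoints and Unicode scalar values comes down to a range check. A small sketch, assuming two hypothetical helper subs that are not part of Rakudo: surrogates occupy 0xD800..0xDFFF, and a scalar value is any codepoint up to 0x10FFFF outside that range.

```raku
# Hypothetical helpers for illustration; not part of Rakudo's API.
sub is-surrogate(Int:D $cp --> Bool:D) {
    0xD800 <= $cp <= 0xDFFF
}

sub is-scalar-value(Int:D $cp --> Bool:D) {
    0 <= $cp <= 0x10FFFF && !is-surrogate($cp)
}

say is-surrogate(56792);        # True  (56792 is 0xDDD8)
say is-scalar-value(56792);     # False (so no UTF encoding may carry it)
say is-scalar-value(0x1F600);   # True
```

Only scalar values can be written in UTF-8, UTF-16 or UTF-32, which matches the encoder error seen earlier for 56792.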