01:20 tokuhiro_ joined 02:12 colomon joined 02:18 BinGOs_ joined, btyler_ joined, leedo_ joined, [Coke]_ joined 02:19 jnthn_ joined
timotimo i think we threw the frame pool out because it no longer helped performance 03:19
but it wasn't replaced by malloc; it was replaced by the Fixed Size Allocator
03:22 tokuhiro_ joined 05:24 tokuhiro_ joined
diakopter yah, but from the fixed sized allocator, it spends 10% of time in malloc 06:20
07:26 tokuhiro_ joined 07:47 domidumont joined 07:52 domidumont joined 08:24 arnsholt_ joined 09:27 tokuhiro_ joined 09:30 domidumont joined 09:45 Peter_R joined 10:10 kjs_ joined 10:14 BinGOs joined 10:35 vendethiel joined 11:17 tokuhiro_ joined 11:25 domidumont joined 12:03 FROGGS joined 12:23 tokuhiro_ joined 12:24 TimToady joined 13:28 leont joined
timotimo oh 13:36
14:24 tokuhiro_ joined 14:53 vendethiel joined 16:11 kjs_ joined 16:18 colomon joined 16:26 tokuhiro_ joined 16:37 colomon joined 17:08 zakharyas joined
hoelzro o/ #moarvm 17:32
japhb o/
18:02 colomon joined
hoelzro o/ japhb 18:05
I've been digging around in MVM_string_utf16_encode_substr, and afaict, it doesn't specify if the output is UTF-16BE or UTF-16LE 18:06
it seems to just use native encoding
er, endianness
is that something that should be well defined for that function?
timotimo needs a BOM :P 18:13
hoelzro should we set us up the bom in that function, then? 18:14
that feels like it belongs at a higher level, maybe
timotimo dunno, what does the spec say about its power level?
18:15 colomon joined
hoelzro > 9000 18:15
timotimo WHAT NINE THOUSAND
well, the BMP is a bit more than nine thousand, isn't it?
hoelzro 16K, right? 18:16
er, no
64K
hoelzro can't math today
timotimo math! what is it good for? 18:17
dalek MoarVM: 47ab6f3 | hoelzro++ | src/strings/utf16.c:
Resize buffers as needed when taking a UTF-16 substring 18:18
MoarVM: 05ad276 | hoelzro++ | src/strings/utf16.c:
Initialize repl_length to 0

Otherwise we depend on uninitialized values for growing the buffer
hoelzro timotimo: do you think the endianness thing is RT worthy? 18:22
timotimo no clue 18:24
i'ven't seen an UTF-16 thing in a long time
isn't it quite common in asian parts of the world? 18:25
hoelzro I thought it was just MS stuff
and Java, but Java uses UCS-2
leont I think so does Oracle
hoelzro I'm not sure about asian countries, but I thought that Japan, for example, has stuck with Shift-JIS
oh, I didn't know that 18:26
timotimo what is Shift-JIS?
hoelzro it's an encoding that was (is?) popular in Japan
18:27 tokuhiro_ joined
timotimo let's see ... 18:27
leont Or at least it's producing CESU-8, which is an eldritch horror 18:28
(UTF-8, but with surrogate pairs…) 18:29
hoelzro wtf
timotimo so just like json? 18:30
leont Almost
AFAIK JSON is Modified UTF-8, which is the same except that a null character is encoded as 0xC0,0x80… 18:31
Which is a Java thing 18:32
Don't see it mentioned in the JSON RFC, I may be mistaken there 18:33
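For reference, the difference leont describes between UTF-8 and CESU-8 can be sketched in C. This is a minimal illustration, not code from MoarVM: the function names `put_bmp`, `encode_utf8`, and `encode_cesu8` are hypothetical, and the input is assumed to be a valid code point up to U+10FFFF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode a value below 0x10000 as 1-3 UTF-8 style bytes; returns bytes written. */
static size_t put_bmp(uint32_t v, uint8_t *out) {
    if (v < 0x80) {
        out[0] = (uint8_t)v;
        return 1;
    }
    if (v < 0x800) {
        out[0] = (uint8_t)(0xC0 | (v >> 6));
        out[1] = (uint8_t)(0x80 | (v & 0x3F));
        return 2;
    }
    out[0] = (uint8_t)(0xE0 | (v >> 12));
    out[1] = (uint8_t)(0x80 | ((v >> 6) & 0x3F));
    out[2] = (uint8_t)(0x80 | (v & 0x3F));
    return 3;
}

/* Standard UTF-8: a supplementary code point becomes one 4-byte sequence. */
size_t encode_utf8(uint32_t cp, uint8_t *out) {
    assert(out != NULL && cp <= 0x10FFFF);
    if (cp < 0x10000)
        return put_bmp(cp, out);
    out[0] = (uint8_t)(0xF0 | (cp >> 18));
    out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (uint8_t)(0x80 | (cp & 0x3F));
    return 4;
}

/* CESU-8: split into a UTF-16 surrogate pair first, then encode each
 * surrogate as its own 3-byte sequence (6 bytes in total). */
size_t encode_cesu8(uint32_t cp, uint8_t *out) {
    assert(out != NULL && cp <= 0x10FFFF);
    if (cp < 0x10000)
        return put_bmp(cp, out);
    uint32_t v = cp - 0x10000;
    size_t n = put_bmp(0xD800 | (v >> 10), out);        /* high surrogate */
    return n + put_bmp(0xDC00 | (v & 0x3FF), out + n);  /* low surrogate */
}
```

The two encodings agree for everything in the BMP and differ only above U+FFFF, which is why converting from UTF-16 to CESU-8 needs no surrogate recombination step.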
18:39 colomon joined
jnthn hoelzro: I think we should probably have UTF-16 write a BOM and mean native, and add UTF-16-LE and UTF-16-BE 18:41
Which can re-use the same code near enough 18:42
And just twiddle the endianness on the way out
Or in
leont "twiddle the endianness on the way out" 18:43
arnsholt leont: What on Earth is the rationale for something like CESU-8? If you're restricted to bytes, wouldn't UTF-8 be simpler?
leont ?
arnsholt: it's cheaper to convert UTF-16 to CESU-8 than to UTF-8, I guess
jnthn leont: As in, after grabbing codepoints, doing the surrogate pair split, and so forth
arnsholt True, I guess
hoelzro jnthn: should I make a ticket for that?
jnthn Heck, can even pass in a function pointer
hoelzro: Yeah, can do 18:44
leont endianness and surrogates have a clear order in my head
hoelzro rt.perl.org/Ticket/Display.html?id=126704 18:45
leont (possibly I'm misunderstanding what you just said and we're in agreement)
jnthn leont: You write the surrogates in a different order too?
I thought you just wrote the 16-bit values in a different order...
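A rough C sketch of the idea jnthn outlines: one encoder that does the surrogate split once and only varies the byte order of each 16-bit unit on the way out. It is not the MoarVM implementation; `utf16_order`, `put_unit`, and `utf16_encode` are hypothetical names, the BOM is made optional purely for illustration, and the caller is assumed to pass valid code points and a buffer of at least 4 * n + 2 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical selector for the output byte order. */
typedef enum { UTF16_LE, UTF16_BE } utf16_order;

/* Write one 16-bit code unit in the requested byte order. */
static size_t put_unit(uint16_t u, utf16_order order, uint8_t *out) {
    if (order == UTF16_BE) {
        out[0] = (uint8_t)(u >> 8);
        out[1] = (uint8_t)(u & 0xFF);
    } else {
        out[0] = (uint8_t)(u & 0xFF);
        out[1] = (uint8_t)(u >> 8);
    }
    return 2;
}

/* Encode n code points as UTF-16; returns the number of bytes written.
 * The surrogate pair is always emitted high-then-low; only the bytes
 * inside each 16-bit unit depend on the byte order. */
size_t utf16_encode(const uint32_t *cps, size_t n, utf16_order order,
                    int write_bom, uint8_t *out) {
    assert(cps != NULL && out != NULL);
    size_t pos = 0;
    if (write_bom)
        pos += put_unit(0xFEFF, order, out + pos);
    for (size_t i = 0; i < n; i++) {
        uint32_t cp = cps[i];
        if (cp < 0x10000) {
            pos += put_unit((uint16_t)cp, order, out + pos);
        } else {
            uint32_t v = cp - 0x10000;
            pos += put_unit((uint16_t)(0xD800 | (v >> 10)), order, out + pos);
            pos += put_unit((uint16_t)(0xDC00 | (v & 0x3FF)), order, out + pos);
        }
    }
    return pos;
}
```

Under this sketch, a plain "UTF-16" encoding would call the function with the platform's native order and `write_bom` set, while explicit UTF-16LE and UTF-16BE variants would pass a fixed order and no BOM.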
hoelzro rt.perl.org/Ticket/Display.html?id=126705
leont No, I don't think so
We're probably talking past each other, just ignore what I said :-) 18:46
hoelzro jnthn: re: a BOM, though; I would think that would be the responsibility of a higher layer? ex. what if a protocol *always* uses UTF-16BE; does it make sense to throw a BOM on?
leont Depends on the protocol 18:47
jnthn leont: en.wikipedia.org/wiki/UTF-16#U.2BD...o_U.2BDFFF seem to agree with what I mean... :)
hoelzro leont: right, so why force the BOM if the programmer doesn't need it?
leont jnthn: indeed that's the obvious thing 18:49
jnthn bah 18:51
"If the BOM is missing, RFC 2781 says that big-endian encoding should be assumed. (In practice, due to Windows using little-endian order by default, many applications similarly assume little-endian encoding by default.)"
Standards... :/
leont Little-Endian is a bit silly, but given that's how all architectures work nowadays (even ARM switched) it seems a fait accompli 18:52
ilmari however, because the first character of JSON must be < 127, you can tell by the pattern of nulls
RFC 7159 says «Implementations MUST NOT add a byte order mark to the beginning of a JSON text.» 18:54
leont UTF-16 has all the disadvantages of UCS-2 with all the disadvantages of UTF8, and adds one of its own: it isn't binary sortable (even UTF-16BE) due to surrogate pairs. It's a mess really. 18:56
ilmari 00 00 00 xx: UTF-32BE, xx 00 00 00: UTF-32LE, 00 xx: UTF-16BE, xx 00: UTF-16LE, xx: UTF-8 18:58
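The null-pattern table ilmari gives can be written out as a small C function. This is only a sketch of that table, assuming (as the discussion does) that the text has no BOM and that its first character is ASCII; the names `json_enc` and `guess_json_encoding` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for the guess. */
typedef enum {
    ENC_UTF8, ENC_UTF16BE, ENC_UTF16LE, ENC_UTF32BE, ENC_UTF32LE
} json_enc;

/* Guess the encoding of a BOM-less JSON text from where the zero bytes
 * fall in its first four bytes. The 4-byte patterns are checked before
 * the 2-byte ones because "xx 00 00 00" also starts with "xx 00". */
json_enc guess_json_encoding(const uint8_t *b, size_t len) {
    assert(b != NULL);
    if (len >= 4) {
        if (!b[0] && !b[1] && !b[2] && b[3])
            return ENC_UTF32BE;   /* 00 00 00 xx */
        if (b[0] && !b[1] && !b[2] && !b[3])
            return ENC_UTF32LE;   /* xx 00 00 00 */
    }
    if (len >= 2) {
        if (!b[0] && b[1])
            return ENC_UTF16BE;   /* 00 xx */
        if (b[0] && !b[1])
            return ENC_UTF16LE;   /* xx 00 */
    }
    return ENC_UTF8;              /* xx */
}
```

The guess is only as good as the ASCII-first-character assumption; for arbitrary non-JSON text a leading non-ASCII character would defeat it.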
hoelzro 9^/win3 19:07
oops
19:29 tokuhiro_ joined 19:36 kjs_ joined 19:44 domidumont joined 19:45 vendethiel- joined 19:57 kjs_ joined 20:07 kjs_ joined 20:22 tokuhiro_ joined 20:27 lizmat joined 21:34 vendethiel joined
diakopter here's a CORE.setting compilation profile output using XCode Instruments: imgur.com/5LJOuf7 21:43
in case anyone wants to find some low-hanging fruitzies 21:45
that's at a 40-microsecond sample rate 21:48
and sorted by Self (ms) if you're interested: i.imgur.com/uW1KBZ6.png 21:54
jnthn Nice 21:58
jnthn drops them in browser tabs for when he's not tired :)
Rest time for now... o/
diakopter o/ 21:59
22:02 Ven joined 22:24 tokuhiro_ joined 22:31 Ven_ joined
diakopter in core setting compilation, MVM_sc_find_object_idx hits its cache 127855 times, but misses the cache 667817 times. Each time it misses the cache, it does a linear search through possibly thousands of objects to find the match.. 23:28
667817 linear searches is not good
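One common way to avoid a linear scan on every cache miss is to keep a pointer-to-index hash table next to the object array. The following C sketch shows that general technique only; it is not how MoarVM stores serialization context objects, and `ptr_index` and its functions are hypothetical. It assumes object addresses stay stable while they are in the table, that the capacity is a power of two larger than the number of entries (there is no resizing), and that a null pointer is all-zero bytes so `calloc` yields empty slots.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Open-addressing table mapping object address -> index in an array. */
typedef struct {
    const void **keys;  /* NULL marks an empty slot */
    size_t *vals;
    size_t cap;         /* power of two, kept larger than the entry count */
} ptr_index;

/* Mix the address bits and reduce to a slot number. */
static size_t slot_for(const ptr_index *t, const void *p) {
    uintptr_t h = (uintptr_t)p;
    h ^= h >> 16;
    h *= 0x9E3779B1u;
    h ^= h >> 13;
    return (size_t)h & (t->cap - 1);
}

/* Returns 1 on success, 0 if allocation failed. */
int ptr_index_init(ptr_index *t, size_t cap_pow2) {
    assert(t != NULL && cap_pow2 != 0 && (cap_pow2 & (cap_pow2 - 1)) == 0);
    t->keys = calloc(cap_pow2, sizeof *t->keys);
    t->vals = calloc(cap_pow2, sizeof *t->vals);
    t->cap = cap_pow2;
    return t->keys != NULL && t->vals != NULL;
}

/* Insert or update; linear probing on collision. */
void ptr_index_put(ptr_index *t, const void *p, size_t idx) {
    assert(p != NULL);
    size_t i = slot_for(t, p);
    while (t->keys[i] != NULL && t->keys[i] != p)
        i = (i + 1) & (t->cap - 1);
    t->keys[i] = p;
    t->vals[i] = idx;
}

/* Returns 1 and sets *idx if p is present, 0 otherwise. */
int ptr_index_get(const ptr_index *t, const void *p, size_t *idx) {
    size_t i = slot_for(t, p);
    while (t->keys[i] != NULL) {
        if (t->keys[i] == p) {
            *idx = t->vals[i];
            return 1;
        }
        i = (i + 1) & (t->cap - 1);
    }
    return 0;
}

void ptr_index_free(ptr_index *t) {
    free(t->keys);
    free(t->vals);
}
```

With a table like this, a miss in the small cache would cost an expected constant number of probes instead of a walk over thousands of entries, at the price of keeping the table in sync whenever objects are added.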
timotimo that seems like a good catch 23:32
i'm definitely looking forward to when moar's jit builds a /tmp/perf-PID.map 23:50