01:20
tokuhiro_ joined
02:12
colomon joined
02:18
BinGOs_ joined,
btyler_ joined,
leedo_ joined,
[Coke]_ joined
02:19
jnthn_ joined
|
|||
timotimo | i think we threw the frame pool out because it no longer helped performance | 03:19 | |
but it wasn't replaced by malloc; it was replaced by the Fixed Size Allocator | |||
03:22
tokuhiro_ joined
05:24
tokuhiro_ joined
|
|||
diakopter | yah, but from the fixed sized allocator, it spends 10% of time in malloc | 06:20 | |
07:26
tokuhiro_ joined
07:47
domidumont joined
07:52
domidumont joined
08:24
arnsholt_ joined
09:27
tokuhiro_ joined
09:30
domidumont joined
09:45
Peter_R joined
10:10
kjs_ joined
10:14
BinGOs joined
10:35
vendethiel joined
11:17
tokuhiro_ joined
11:25
domidumont joined
12:03
FROGGS joined
12:23
tokuhiro_ joined
12:24
TimToady joined
13:28
leont joined
|
|||
timotimo | oh | 13:36 | |
14:24
tokuhiro_ joined
14:53
vendethiel joined
16:11
kjs_ joined
16:18
colomon joined
16:26
tokuhiro_ joined
16:37
colomon joined
17:08
zakharyas joined
|
|||
hoelzro | o/ #moarvm | 17:32 | |
japhb | o/ | ||
18:02
colomon joined
|
|||
hoelzro | o/ japhb | 18:05 | |
I've been digging around in MVM_string_utf16_encode_substr, and afaict, it doesn't specify if the output is UTF-16BE or UTF-16LE | 18:06 | ||
it seems to just use native encoding | |||
er, endianness | |||
is that something that should be well defined for that function? | |||
timotimo | needs a BOM :P | 18:13 | |
hoelzro | should we set us up the bom in that function, then? | 18:14 | |
that feels like it belongs at a higher level, maybe | |||
timotimo | dunno, what does the spec say about its power level? | ||
18:15
colomon joined
|
|||
hoelzro | > 9000 | 18:15 | |
timotimo | WHAT NINE THOUSAND | ||
well, the BMP is a bit more than nine thousand, isn't it? | |||
hoelzro | 16K, right? | 18:16 | |
er, no | |||
64K | |||
hoelzro can't math today | |||
timotimo | math! what is it good for? | 18:17 | |
dalek | arVM: 47ab6f3 | hoelzro++ | src/strings/utf16.c: Resize buffers as needed when taking a UTF-16 substring |
18:18 | |
arVM: 05ad276 | hoelzro++ | src/strings/utf16.c: Initialize repl_length to 0 Otherwise we depend on uninitialized values for growing the buffer |
|||
hoelzro | timotimo: do you think the endianness thing is RT worthy? | 18:22 | |
timotimo | no clue | 18:24 | |
i'ven't seen an UTF-16 thing in a long time | |||
isn't it quite common in asian parts of the world? | 18:25 | ||
hoelzro | I thought it was just MS stuff | ||
and Java, but Java uses UCS-2 | |||
leont | I think so does Oracle | ||
hoelzro | I'm not sure about asian countries, but I thought that Japan, for example, has stuck with Shift-JIS | ||
oh, I didn't know that | 18:26 | ||
timotimo | what is Shift-JIS? | ||
hoelzro | it's an encoding that was (is?) popular in Japan | ||
18:27
tokuhiro_ joined
|
|||
timotimo | let's see ... | 18:27 | |
leont | Or at least it's producing CESU-8, which is an eldrich horror | 18:28 | |
(UTF-8, but with surrogate pairs…) | 18:29 | ||
hoelzro | wtf | ||
timotimo | so just like json? | 18:30 | |
leont | Almost | ||
AFAIK JSON is Modified UTF-8, which is the same except that a null character is encoded as 0xC0,0x80… | 18:31 | ||
Which is a Java thing | 18:32 | ||
Don't see it mentioned in the JSON RFC, I may be mistaken there | 18:33 | ||
18:39
colomon joined
|
|||
jnthn | hoelzro: I think we should probably have UTF-16 write a BOM and mean native, and add UTF-16-LE and UTF-16-BE | 18:41 | |
Which can re-use the same code near enough | 18:42 | ||
And just twiddle the endianness on the way out | |||
Or in | |||
leont | "twiddle the endianness on the way out" | 18:43 | |
arnsholt | leont: What on Earth is the rationale for something like CESU-8? If you're restricted to bytes, wouldn't UTF-8 be simpler? | ||
leont | ? | ||
arnsholt: it's cheaper to convert UTF-16 to CESU-8 than to UTF-8, I guess | |||
jnthn | leont: As in, after grabbing codepoints, doing the surrogate pair split, and so forth | ||
arnsholt | True, I guess | ||
hoelzro | jnthn: should I make a ticket for that? | ||
jnthn | Heck, can even pass in a function pointer | ||
hoelzro: Yeah, can do | 18:44 | ||
leont | endianness and surrogates have a clear order in my head | ||
hoelzro | rt.perl.org/Ticket/Display.html?id=126704 | 18:45 | |
leont | (possibly I'm misunderstanding what you just said and we're in agreement) | ||
jnthn | leont: You write the surrogates in a different order too? | ||
I thought you just wrote the 16-bit values in a different order... | |||
hoelzro | rt.perl.org/Ticket/Display.html?id=126705 | ||
leont | No, I don't think so | ||
We're probably talking past each other, just ignore what I said :- | 18:46 | ||
) | |||
hoelzro | jnthn: re: a BOM, though; I would think that would be the responsibility of a higher layer? ex. what if a protocol *always* uses UTF-16BE; does it make sense to throw a BOM on? | ||
leont | Depends on the protocol | 18:47 | |
jnthn | leont: en.wikipedia.org/wiki/UTF-16#U.2BD...o_U.2BDFFF seem to agree with what I mean... :) | ||
hoelzro | leont: right, so why force the BOM if the programmer doesn't need it? | ||
leont | jnthn: indeed that's the obvious thing | 18:49 | |
jnthn | bah | 18:51 | |
"If the BOM is missing, RFC 2781 says that big-endian encoding should be assumed. (In practice, due to Windows using little-endian order by default, many applications similarly assume little-endian encoding by default.)" | |||
Standards... :/ | |||
leont | Little-Endian is a bit silly, but given that's how all architectures work nowadays (even ARM switched) it seems a fait accompli | 18:52 | |
ilmari | however, because the first character of JSON must be < 127, you can tell by the pattern of nulls | ||
RFC 7159 says «Implementations MUST NOT add a byte order mark to the beginning of a | 18:54 | ||
JSON text.» | |||
leont | UTF-16 has all the disadvantages of UCS-2 with all the disadvantages of UTF8, and adds one of its own: it isn't binary sortable (even UTF-16BE) due to surrogate pairs. It's a mess really. | 18:56 | |
ilmari | 00 00 00 xx: UTF32-BE, xx 00 00 00: UTF-32LE, 00 xx: UTF-16BE, xx 00: UTF-16LE, xx: UTF-8 | 18:58 | |
hoelzro | 9^/win3 | 19:07 | |
oops | |||
19:29
tokuhiro_ joined
19:36
kjs_ joined
19:44
domidumont joined
19:45
vendethiel- joined
19:57
kjs_ joined
20:07
kjs_ joined
20:22
tokuhiro_ joined
20:27
lizmat joined
21:34
vendethiel joined
|
|||
diakopter | here's a CORE.setting compilation profile output using XCode Instruments: imgur.com/5LJOuf7 | 21:43 | |
in case anyone wants to find some low-hanging fruitzies | 21:45 | ||
that's at a 40-microsecond sample rate | 21:48 | ||
and sorted by Self (ms) if you're interested: i.imgur.com/uW1KBZ6.png | 21:54 | ||
jnthn | Nice | 21:58 | |
jnthn drops them in browser tabs for when he's not tired :) | |||
Rest time for now... o/ | |||
diakopter | o/ | 21:59 | |
22:02
Ven joined
22:24
tokuhiro_ joined
22:31
Ven_ joined
|
|||
diakopter | in core setting compilation, MVM_sc_find_object_idx hits its cache 127855 times, but misses the cache 667817 times. Each time it misses the cache, it does a linear search through possibly thousands of objects to find the match.. | 23:28 | |
667817 linear searches is not good | |||
timotimo | that seems like a good catch | 23:32 | |
i'm definitely looking forward to when moar's jit builds a /tmp/perf-PID.map | 23:50 |