#moarvm on 21 November 2015 - Raku Programming Language Log

01:20 tokuhiro_ joined 02:12 colomon joined 02:18 BinGOs_ joined, btyler_ joined, leedo_ joined, [Coke]_ joined 02:19 jnthn_ joined
timotimo	i think we threw the frame pool out because it no longer helped performance	03:19
	but it wasn't replaced by malloc; it was replaced by the Fixed Size Allocator
03:22 tokuhiro_ joined 05:24 tokuhiro_ joined
diakopter	yah, but from the fixed sized allocator, it spends 10% of time in malloc	06:20
07:26 tokuhiro_ joined 07:47 domidumont joined 07:52 domidumont joined 08:24 arnsholt_ joined 09:27 tokuhiro_ joined 09:30 domidumont joined 09:45 Peter_R joined 10:10 kjs_ joined 10:14 BinGOs joined 10:35 vendethiel joined 11:17 tokuhiro_ joined 11:25 domidumont joined 12:03 FROGGS joined 12:23 tokuhiro_ joined 12:24 TimToady joined 13:28 leont joined
timotimo	oh	13:36
14:24 tokuhiro_ joined 14:53 vendethiel joined 16:11 kjs_ joined 16:18 colomon joined 16:26 tokuhiro_ joined 16:37 colomon joined 17:08 zakharyas joined
hoelzro	o/ #moarvm	17:32
japhb	o/
18:02 colomon joined
hoelzro	o/ japhb	18:05
	I've been digging around in MVM_string_utf16_encode_substr, and afaict, it doesn't specify if the output is UTF-16BE or UTF-16LE	18:06
	it seems to just use native encoding
	er, endianness
	is that something that should be well defined for that function?
timotimo	needs a BOM :P	18:13
hoelzro	should we set us up the bom in that function, then?	18:14
	that feels like it belongs at a higher level, maybe
timotimo	dunno, what does the spec say about its power level?
18:15 colomon joined
hoelzro	> 9000	18:15
timotimo	WHAT NINE THOUSAND
	well, the BMP is a bit more than nine thousand, isn't it?
hoelzro	16K, right?	18:16
	er, no
	64K
	hoelzro can't math today
timotimo	math! what is it good for?	18:17
dalek	MoarVM: 47ab6f3 | hoelzro++ | src/strings/utf16.c: Resize buffers as needed when taking a UTF-16 substring	18:18
	MoarVM: 05ad276 | hoelzro++ | src/strings/utf16.c: Initialize repl_length to 0. Otherwise we depend on uninitialized values for growing the buffer
hoelzro	timotimo: do you think the endianness thing is RT worthy?	18:22
timotimo	no clue	18:24
	i'ven't seen an UTF-16 thing in a long time
	isn't it quite common in asian parts of the world?	18:25
hoelzro	I thought it was just MS stuff
	and Java, but Java uses UCS-2
leont	I think so does Oracle
hoelzro	I'm not sure about asian countries, but I thought that Japan, for example, has stuck with Shift-JIS
	oh, I didn't know that	18:26
timotimo	what is Shift-JIS?
hoelzro	it's an encoding that was (is?) popular in Japan
18:27 tokuhiro_ joined
timotimo	let's see ...	18:27
leont	Or at least it's producing CESU-8, which is an eldrich horror	18:28
	(UTF-8, but with surrogate pairs…)	18:29
hoelzro	wtf
timotimo	so just like json?	18:30
leont	Almost
	AFAIK JSON is Modified UTF-8, which is the same except that a null character is encoded as 0xC0,0x80…	18:31
	Which is a Java thing	18:32
	Don't see it mentioned in the JSON RFC, I may be mistaken there	18:33
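
For readers following the CESU-8 remarks above, here is a minimal C sketch of the difference leont describes: standard UTF-8 writes a supplementary code point as one 4-byte sequence, while CESU-8 first splits it into a UTF-16 surrogate pair and then writes each surrogate as its own 3-byte sequence. The function names (put_bmp, utf8_encode, cesu8_encode) are hypothetical and are not taken from MoarVM or any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write a value below 0x10000 using the ordinary 1-3 byte UTF-8 bit layout.
 * Returns the number of bytes written. (Hypothetical helper.) */
static size_t put_bmp(uint32_t v, uint8_t *out) {
    if (v < 0x80) {
        out[0] = (uint8_t)v;
        return 1;
    }
    if (v < 0x800) {
        out[0] = (uint8_t)(0xC0 | (v >> 6));
        out[1] = (uint8_t)(0x80 | (v & 0x3F));
        return 2;
    }
    out[0] = (uint8_t)(0xE0 | (v >> 12));
    out[1] = (uint8_t)(0x80 | ((v >> 6) & 0x3F));
    out[2] = (uint8_t)(0x80 | (v & 0x3F));
    return 3;
}

/* Standard UTF-8: a supplementary code point becomes one 4-byte sequence. */
size_t utf8_encode(uint32_t cp, uint8_t *out) {
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x10000)
        return put_bmp(cp, out);
    out[0] = (uint8_t)(0xF0 | (cp >> 18));
    out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (uint8_t)(0x80 | (cp & 0x3F));
    return 4;
}

/* CESU-8: split into a surrogate pair first, then encode each surrogate
 * as if it were a BMP character, giving 6 bytes in total. */
size_t cesu8_encode(uint32_t cp, uint8_t *out) {
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x10000)
        return put_bmp(cp, out);
    uint32_t v = cp - 0x10000;
    size_t n = put_bmp(0xD800 | (v >> 10), out);       /* high surrogate */
    return n + put_bmp(0xDC00 | (v & 0x3FF), out + n); /* low surrogate */
}
```

For characters in the BMP the two functions produce identical bytes; they diverge only above U+FFFF, which is why converting from UTF-16 to CESU-8 needs no surrogate recombination.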
18:39 colomon joined
jnthn	hoelzro: I think we should probably have UTF-16 write a BOM and mean native, and add UTF-16-LE and UTF-16-BE	18:41
	Which can re-use the same code near enough	18:42
	And just twiddle the endianness on the way out
	Or in
leont	"twiddle the endianness on the way out"	18:43
arnsholt	leont: What on Earth is the rationale for something like CESU-8? If you're restricted to bytes, wouldn't UTF-8 be simpler?
leont	?
	arnsholt: it's cheaper to convert UTF-16 to CESU-8 than to UTF-8, I guess
jnthn	leont: As in, after grabbing codepoints, doing the surrogate pair split, and so forth
arnsholt	True, I guess
hoelzro	jnthn: should I make a ticket for that?
jnthn	Heck, can even pass in a function pointer
	hoelzro: Yeah, can do	18:44
leont	endianness and surrogates have a clear order in my head
hoelzro	rt.perl.org/Ticket/Display.html?id=126704	18:45
leont	(possibly I'm misunderstanding what you just said and we're in agreement)
jnthn	leont: You write the surrogates in a different order too?
	I thought you just wrote the 16-bit values in a different order...
hoelzro	rt.perl.org/Ticket/Display.html?id=126705
leont	No, I don't think so
	We're probably talking past each other, just ignore what I said :-)	18:46
hoelzro	jnthn: re: a BOM, though; I would think that would be the responsibility of a higher layer? ex. what if a protocol always uses UTF-16BE; does it make sense to throw a BOM on?
leont	Depends on the protocol	18:47
jnthn	leont: en.wikipedia.org/wiki/UTF-16#U.2BD...o_U.2BDFFF seem to agree with what I mean... :)
hoelzro	leont: right, so why force the BOM if the programmer doesn't need it?
leont	jnthn: indeed that's the obvious thing	18:49
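
The point jnthn and leont settle on above (the surrogate split is independent of byte order; only the bytes inside each 16-bit unit are swapped) can be sketched in C roughly as follows. This is not MoarVM's src/strings/utf16.c; the enum and function names are hypothetical, and an enum is used here where jnthn suggests a function pointer could also work.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte-order selector; not a MoarVM type. */
typedef enum { UTF16_LE, UTF16_BE } utf16_order;

/* Write one 16-bit code unit in the requested byte order. */
static size_t put_unit(uint16_t unit, utf16_order order, uint8_t *out) {
    if (order == UTF16_BE) {
        out[0] = (uint8_t)(unit >> 8);
        out[1] = (uint8_t)(unit & 0xFF);
    } else {
        out[0] = (uint8_t)(unit & 0xFF);
        out[1] = (uint8_t)(unit >> 8);
    }
    return 2;
}

/* Encode one code point. The high surrogate always comes first;
 * endianness only affects the two bytes within each unit. */
size_t utf16_encode_cp(uint32_t cp, utf16_order order, uint8_t *out) {
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x10000)
        return put_unit((uint16_t)cp, order, out);
    uint32_t v = cp - 0x10000;
    size_t n = put_unit((uint16_t)(0xD800 | (v >> 10)), order, out);
    return n + put_unit((uint16_t)(0xDC00 | (v & 0x3FF)), order, out + n);
}

/* A BOM is just U+FEFF written in the chosen order, so a higher layer
 * can decide whether to emit it. */
size_t utf16_write_bom(utf16_order order, uint8_t *out) {
    return put_unit(0xFEFF, order, out);
}
```

Keeping the BOM in a separate function matches hoelzro's concern: a caller speaking a protocol that is always UTF-16BE can simply never call it.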
jnthn	bah	18:51
	"If the BOM is missing, RFC 2781 says that big-endian encoding should be assumed. (In practice, due to Windows using little-endian order by default, many applications similarly assume little-endian encoding by default.)"
	Standards... :/
leont	Little-Endian is a bit silly, but given that's how all architectures work nowadays (even ARM switched) it seems a fait accompli	18:52
ilmari	however, because the first character of JSON must be < 127, you can tell by the pattern of nulls
	RFC 7159 says «Implementations MUST NOT add a byte order mark to the beginning of a	18:54
	JSON text.»
leont	UTF-16 has all the disadvantages of UCS-2 with all the disadvantages of UTF8, and adds one of its own: it isn't binary sortable (even UTF-16BE) due to surrogate pairs. It's a mess really.	18:56
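
leont's sorting remark can be checked with two concrete code points. U+FF5E is lower than U+10000 in code point order, but in UTF-16BE the second one starts with the surrogate byte 0xD8, which is smaller than 0xFF, so a byte-wise comparison puts them in the opposite order. UTF-8 keeps code point order. A small C sketch, with hypothetical names and hand-written byte arrays:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* U+FF5E (BMP, above the surrogate range) and U+10000 (first supplementary
 * code point), written out by hand in both encodings. */
const uint8_t U_FF5E_UTF16BE[]  = { 0xFF, 0x5E };
const uint8_t U_10000_UTF16BE[] = { 0xD8, 0x00, 0xDC, 0x00 };
const uint8_t U_FF5E_UTF8[]     = { 0xEF, 0xBD, 0x9E };
const uint8_t U_10000_UTF8[]    = { 0xF0, 0x90, 0x80, 0x80 };

/* Compare two byte strings the way a binary sort would: memcmp over the
 * common prefix, then the shorter string sorts first.
 * Returns a negative, zero, or positive value. */
int bytewise_cmp(const uint8_t *a, size_t alen, const uint8_t *b, size_t blen) {
    assert(a != NULL && b != NULL);
    size_t n = alen < blen ? alen : blen;
    int c = memcmp(a, b, n);
    if (c != 0)
        return c;
    return (alen > blen) - (alen < blen);
}
```

Comparing the UTF-16BE arrays gives a positive result (U+FF5E sorts after U+10000), while comparing the UTF-8 arrays gives a negative one, matching code point order.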
ilmari	00 00 00 xx: UTF32-BE, xx 00 00 00: UTF-32LE, 00 xx: UTF-16BE, xx 00: UTF-16LE, xx: UTF-8	18:58
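
ilmari's table translates fairly directly into code. The following C sketch guesses the encoding of a BOM-less JSON text from the pattern of zero bytes at its start, under the assumption ilmari states (the first character is ASCII, so it occupies exactly one non-zero byte). The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for the guess. */
typedef enum {
    ENC_UTF8, ENC_UTF16BE, ENC_UTF16LE, ENC_UTF32BE, ENC_UTF32LE
} json_enc;

/* Guess the encoding from the leading bytes, following the table:
 *   00 00 00 xx  UTF-32BE      xx 00 00 00  UTF-32LE
 *   00 xx        UTF-16BE      xx 00        UTF-16LE
 *   anything else is treated as UTF-8.
 * The 4-byte patterns are tested first because "xx 00 00 00" would
 * otherwise also match the UTF-16LE pattern. */
json_enc guess_json_encoding(const uint8_t *b, size_t len) {
    assert(b != NULL || len == 0);
    if (len >= 4) {
        if (b[0] == 0 && b[1] == 0 && b[2] == 0 && b[3] != 0)
            return ENC_UTF32BE;
        if (b[0] != 0 && b[1] == 0 && b[2] == 0 && b[3] == 0)
            return ENC_UTF32LE;
    }
    if (len >= 2) {
        if (b[0] == 0 && b[1] != 0)
            return ENC_UTF16BE;
        if (b[0] != 0 && b[1] == 0)
            return ENC_UTF16LE;
    }
    return ENC_UTF8;
}
```

This is a heuristic only: it reports UTF-8 for any input that does not match a pattern, including inputs that are not valid JSON at all.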
hoelzro	9^/win3	19:07
	oops
19:29 tokuhiro_ joined 19:36 kjs_ joined 19:44 domidumont joined 19:45 vendethiel- joined 19:57 kjs_ joined 20:07 kjs_ joined 20:22 tokuhiro_ joined 20:27 lizmat joined 21:34 vendethiel joined
diakopter	here's a CORE.setting compilation profile output using XCode Instruments: imgur.com/5LJOuf7	21:43
	in case anyone wants to find some low-hanging fruitzies	21:45
	that's at a 40-microsecond sample rate	21:48
	and sorted by Self (ms) if you're interested: i.imgur.com/uW1KBZ6.png	21:54
jnthn	Nice	21:58
	jnthn drops them in browser tabs for when he's not tired :)
	Rest time for now... o/
diakopter	o/	21:59
22:02 Ven joined 22:24 tokuhiro_ joined 22:31 Ven_ joined
diakopter	in core setting compilation, MVM_sc_find_object_idx hits its cache 127855 times, but misses the cache 667817 times. Each time it misses the cache, it does a linear search through possibly thousands of objects to find the match..	23:28
	667817 linear searches is not good
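
With diakopter's numbers the cache hit rate is 127855 / (127855 + 667817), or about 16%. One common way to avoid a linear scan on every miss is a side table mapping object address to index. The C sketch below shows that general idea only; it is not what MoarVM does or did, every name in it is hypothetical, and it assumes object addresses stay stable, which may not hold for objects that a moving garbage collector can relocate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical pointer -> index table using open addressing with linear
 * probing. Fixed capacity; the caller must keep it well under full. */
typedef struct {
    const void **keys; /* NULL marks an empty slot */
    size_t      *vals;
    size_t       cap;  /* must be a power of two */
} ptr_index;

static size_t slot_for(const void *p, size_t cap) {
    uintptr_t h = (uintptr_t)p >> 4; /* drop alignment bits */
    h *= 0x9E3779B1u;                /* multiplicative mixing */
    h ^= h >> 16;
    return (size_t)h & (cap - 1);
}

/* Returns 1 on success, 0 if allocation failed. */
int ptr_index_init(ptr_index *t, size_t cap) {
    assert(cap != 0 && (cap & (cap - 1)) == 0);
    t->keys = calloc(cap, sizeof *t->keys);
    t->vals = calloc(cap, sizeof *t->vals);
    t->cap = cap;
    return t->keys != NULL && t->vals != NULL;
}

void ptr_index_free(ptr_index *t) {
    free(t->keys);
    free(t->vals);
}

/* Insert or update the index recorded for obj. */
void ptr_index_put(ptr_index *t, const void *obj, size_t idx) {
    size_t i = slot_for(obj, t->cap);
    while (t->keys[i] != NULL && t->keys[i] != obj)
        i = (i + 1) & (t->cap - 1);
    t->keys[i] = obj;
    t->vals[i] = idx;
}

/* Returns 1 and stores the index in *idx when obj is present, else 0. */
int ptr_index_get(const ptr_index *t, const void *obj, size_t *idx) {
    size_t i = slot_for(obj, t->cap);
    while (t->keys[i] != NULL) {
        if (t->keys[i] == obj) {
            *idx = t->vals[i];
            return 1;
        }
        i = (i + 1) & (t->cap - 1);
    }
    return 0;
}
```

A lookup in such a table touches a handful of slots on average instead of scanning thousands of entries, at the cost of extra memory and of keeping the table in sync whenever objects are added.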
timotimo	that seems like a good catch	23:32
	i'm definitely looking forward to when moar's jit builds a /tmp/perf-PID.map	23:50
