#moarvm on 16 November 2017 - Raku Programming Language Log

MasterDuke	samcv: so compiling the core setting is now not slower with that PR?	00:42
01:00 lizmat joined 01:03 MasterDuke_ joined 01:28 leedo__ joined, avar joined, moritz joined 01:31 yoleaux joined
samcv	yep	01:45
	i fixed it. it was not returning the same string when it was already flat but i remedied that	01:46
MasterDuke_	ah, cool	01:48
02:01 lizmat joined
Geth	MoarVM: de6b0e4b13 | (Samantha McVey)++ | src/strings/ops.c collapse_strands with memcpy if all strands are same type 4x faster If all the strands to collapse are of the same type (ASCII, 8bit, or 32bit) then use memcpy to collapse the strands. If they are not all the same type then we use the traditional grapheme iterator based collapsing that we previously used to collapse strands. If it's 8bit and a repetition with only one grapheme, it will use memset to more quickly write the memory. This is 4-4.5x faster as long as all the strands are of the same type.	02:02
	MoarVM: e876f1484e | (Samantha McVey)++ (committed using GitHub Web editor) | src/strings/ops.c Merge pull request #753 from samcv/collapse_better collapse_strands with memcpy if all strands are same type 4x faster
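The commit above describes a fast path: when every strand shares one storage type, the strands can be concatenated with plain memcpy instead of walking graphemes one by one. Below is a minimal sketch of that idea under a simplified, hypothetical model of strings; `StorageType`, `Strand`, `elem_size` and `collapse_same_type` are illustrative names, not MoarVM's real `MVMString` structures. Returning `(size_t)-1` stands in for "fall back to the grapheme-iterator path".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified storage model; not MoarVM's actual layout. */
typedef enum { STORAGE_ASCII, STORAGE_8BIT, STORAGE_32BIT } StorageType;

typedef struct {
    StorageType type;
    const void *blob;   /* backing array of the strand's source string */
    size_t start, end;  /* half-open range of graphemes taken from blob */
    size_t repetitions; /* extra copies: "ab" x 3 would be 2 repetitions */
} Strand;

static size_t elem_size(StorageType t) {
    return t == STORAGE_32BIT ? sizeof(int32_t) : sizeof(int8_t);
}

/* Collapse strands into out when all share one storage type.
   Returns graphemes written, or (size_t)-1 when the types differ,
   meaning the caller should use the slower grapheme-iterator path. */
static size_t collapse_same_type(const Strand *s, size_t n, void *out) {
    assert(n > 0);
    for (size_t i = 1; i < n; i++)
        if (s[i].type != s[0].type)
            return (size_t)-1;

    size_t es = elem_size(s[0].type);
    char *dst = out;
    size_t written = 0;
    for (size_t i = 0; i < n; i++) {
        size_t len = s[i].end - s[i].start;
        const char *src = (const char *)s[i].blob + s[i].start * es;
        /* One memcpy per copy of the strand, including repetitions. */
        for (size_t r = 0; r <= s[i].repetitions; r++) {
            memcpy(dst + written * es, src, len * es);
            written += len;
        }
    }
    return written;
}
```

With two 8-bit strands, "hello" and "ab" repeated once, the output buffer holds "helloabab" (9 graphemes); mixing an 8-bit and a 32-bit strand returns `(size_t)-1`.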
MasterDuke	samcv: interesting. i just tested this one-liner: `my $a = "a" x 1_000_000; for ^1000 {$a ~~ /./;}; say now - INIT now`	02:34
	4.3s before your PR, 93% of the time spent in iterate_gi_into_string	02:35
	5.9s after the PR, 43% in collapse_strands, 33% in __memmove_sse2_unaligned_erms, 10.6% in [email@hidden.address] 5.4% in memcpy@plt	02:36
02:56 ilbot3 joined 03:17 colomon joined
samcv	MasterDuke: well it's 2x faster if it is more than one character repeated	04:18
	"ab" x 1_000_000
	well about 1.5x faster with the new code
	interesting it takes longer afterward though
	well that "a" is a 32 bit string	04:21
	so it doesn't end up doing memset on it
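samcv's point is that memset only fills bytes, so the single-grapheme shortcut applies to 8-bit storage but not to a 32-bit string like the "a" in the benchmark. A minimal sketch of the two repetition paths follows; `repeat_8bit` and `repeat_32bit` are hypothetical helpers, not functions from src/strings/ops.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write `count` copies of an 8-bit grapheme run. */
static void repeat_8bit(int8_t *dst, const int8_t *src, size_t len, size_t count) {
    assert(dst != NULL && src != NULL);
    if (len == 1) {
        /* One grapheme repeated: a single memset fills every byte. */
        memset(dst, src[0], count);
        return;
    }
    for (size_t i = 0; i < count; i++)
        memcpy(dst + i * len, src, len);
}

/* 32-bit storage: memset sets bytes, so a grapheme such as 'a'
   (0x00000061) cannot be written with it; copy runs instead. */
static void repeat_32bit(int32_t *dst, const int32_t *src, size_t len, size_t count) {
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < count; i++)
        memcpy(dst + i * len, src, len * sizeof *src);
}
```

This is why the "a" x 1_000_000 case, if stored as 32-bit, falls through to per-copy memcpy calls rather than one memset.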
06:28 domidumont joined 06:35 domidumont joined 06:40 brrt joined
japhb	samcv: Why is it a 32-bit string?	06:47
samcv	japhb: probably because it was a substring of the whole document	07:09
	is my best guess
brrt	good * #moarvm	07:18
	also, good * japhb, samcv
	jnthn: bisecting the jit issue now	07:19
07:40 lizmat joined 07:48 brrt joined 08:17 zakharyas joined 08:21 domidumont joined 09:22 zakharyas joined 09:39 brrt joined
brrt	hmm, damnit, it's multithreaded?	09:39
	oh, it is multiprocess	09:42
jnthn	brrt++	10:13
	Yes, 'fraid so, it shows up in something using a Channel
	You may or may not have luck producing a golf
brrt	hmmmm	10:14
	always when using a channel?
jnthn	Well, the place things go wrong is (try $channel.receive) // buf8	10:21
	The code in the try there is a thunk, and receive is a method call
	receive is inlined into the thunk, and the thunk is inlined into the code with the try and //	10:22
	And the try then fails to catch the exception
	It may be that you can set up something very similar with a single-threaded program
	Just my $channel = Channel.new; $channel.close;
	And then trying to receive will always throw
samcv	the peak memory usage during core compilation is 1.3G with or without my recent change. though total allocations is down from 13.95Gb to 13.74Gb	10:37
	i wish it gave me more detailed info on peak memory usage though
11:04 domidumont joined
timotimo	jnthn: we need some way to spurt/write bufs bigger than int8 or uint8 into files, otherwise our utf16 encoding is almost completely useless	11:32
jnthn	timotimo: It'll just need some tweaks to the stuff behind write_fhb to support things other than 1-byte VMArrays	11:40
	(So, nothing more than an NYI)	11:41
timotimo	will we accidentally impose an endianness if we just split the 16 into 8 naively?	11:43
	or is that why there's UTF16LE and UTF16BE encodings?
jnthn	By this point we're already past encodings
	But yeah, we'll impose native endian
	Hm
	Maybe our utf16 encoding should spit out a buf8 too, then we don't have this issue.	11:44
	Or it could always spit out the correct BE/LE BOM at the start for the current platform
timotimo	if the utf16 encoder spits out anything, it'd have to be the same value regardless of platform endianness, because depending on how it gets turned into 8 bit pieces by the write_fhb instruction it'll end up being the correct bom	12:33
	... or something?
ilmari	encoders should output bytes. full stop.	12:34
	the endianness is an integral part of the encoding	12:35
	lower layers should not have to know about this. I/O is streams of bytes
timotimo	hum. the utf16 encoder in moar already just gives you a char *, i wonder where it gets turned into 16 bit pieces	12:37
	oh, that just happens if you pass a 16-bit-per-entry VMArray to the decode call	12:38
	so we'd have to either turn the utf16 type into a buffer of 8bit ints or do something different there	12:42
	same with utf32, of course
brrt	hmm, i'll try it out at least	13:00
	fwiw, i can try to 'beat' some information out of a single run as well, but it's just not as happy as a bisect	13:01
jnthn	timotimo: We should do what ilmari is suggesting, and always have a buf8, I think	13:28
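The consensus here is that an encoder should emit bytes with the byte order fixed by the encoding itself, so the I/O layer never has to care about platform endianness. A minimal sketch of a byte-producing UTF-16 encoder follows; `Utf16Order`, `put_unit` and `utf16_encode_bytes` are illustrative names and not MoarVM's encoder API. The output buffer is assumed to have room for 4 bytes per code point plus 2 for an optional BOM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { UTF16_LE, UTF16_BE } Utf16Order;

/* Write one 16-bit code unit as two bytes in the requested order,
   independent of the host CPU's endianness. */
static size_t put_unit(uint8_t *out, size_t o, uint16_t u, Utf16Order order) {
    if (order == UTF16_BE) {
        out[o] = (uint8_t)(u >> 8);
        out[o + 1] = (uint8_t)(u & 0xFF);
    } else {
        out[o] = (uint8_t)(u & 0xFF);
        out[o + 1] = (uint8_t)(u >> 8);
    }
    return o + 2;
}

/* Encode code points to UTF-16 bytes; returns the number of bytes written. */
static size_t utf16_encode_bytes(const uint32_t *cps, size_t n, Utf16Order order,
                                 int with_bom, uint8_t *out) {
    size_t o = 0;
    if (with_bom)
        o = put_unit(out, o, 0xFEFF, order);  /* BOM in the chosen order */
    for (size_t i = 0; i < n; i++) {
        uint32_t cp = cps[i];
        assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));
        if (cp < 0x10000) {
            o = put_unit(out, o, (uint16_t)cp, order);
        } else {
            /* Supplementary plane: split into a surrogate pair. */
            cp -= 0x10000;
            o = put_unit(out, o, (uint16_t)(0xD800 | (cp >> 10)), order);
            o = put_unit(out, o, (uint16_t)(0xDC00 | (cp & 0x3FF)), order);
        }
    }
    return o;
}
```

Because the result is already a byte sequence, it could be written through a 1-byte buffer path unchanged, which is the property ilmari and jnthn are arguing for.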
13:39 markmont joined
nwc10	imlari is suggesting a buffet‽ Om nom nom	13:42
	oops, that won't highlight	13:46
	ilmari: ^^
14:08 zakharyas joined 14:24 zakharyas joined 15:13 AlexDaniel joined 15:26 zakharyas joined 15:56 zakharyas joined 16:08 zakharyas joined 16:14 releasable6 joined 16:27 brrt joined
brrt	yay, i golfed it	16:42
	jnthn++
	your advice worked
jnthn	yay :)	16:43
japhb	jnthn: I've been reading the current Cro docs and going through the examples. I'm really impressed. My stint in the world of web dev seems absolutely ancient in comparison.	16:44
brrt	gist.github.com/bdw/13cb662504b3f4...acc63c56c6	16:45
jnthn	I bet you can pull the first two lines out of the loop and still get it?	16:46
	(might make the generated code you need to debug smaller)
japhb	jnthn: Is there a FreeNode channel for Cro yet?	16:47
brrt	hmm, i can try
jnthn	japhb: Nice to hear. :)
	japhb: Not yet, though maybe it's time... :)
brrt	yep, you are correct
japhb	(I don't see it in the results from alis, but alis seems to miss some already.)
	jnthn: Please! :-)
brrt	aye!
	jnthn wonders if #cro is taken or not	16:48
brrt	heh, that's a delightfully fast bisect now	16:50
japhb	jnthn: Looks like it's free, I just joined and am the only person
brrt	and there is a guard control inserted into the tree… let's see if it is compiled differently in any way	16:55
16:57 zakharyas1 joined 17:04 zakharyas joined 18:04 domidumont joined 18:12 zakharyas joined 19:09 evalable6 joined 19:26 robertle joined
nine	I'm now reasonably sure that the remaining issue is about multi-level un-inlines but only in deopt-one cases, not for deopt-all	19:52
timotimo	.o( you are crorect )	20:06
jnthn	Oh goodness, deopt /o\
nine	It's not certain though, but the statistics point at this. I've seen lots of multi-level un-inlines that are harmless, but those were all deopt-all. The deopt-one cases appear in failing test files.	20:08
timotimo	fascinating
nine	It also fits the incredible rarity of the failures. rakudo builds fine, make test passes (with blocking and nodelay) and most spec test files pass.
	Intriguingly, I could golf one of the failures down to: MVM_SPESH_BLOCKING=1 MVM_SPESH_NODELAY=1 perl6 -e '1; { my $a; }; { my Int $a; }'	20:09
timotimo	if only we had a simple way/tool to run a frame once from the beginning until it deopts and the next time the unoptimized version until it hits the point it deopted into
nine	Resulting in "No int multidim positional reference type registered for current HLL"
timotimo	so we could compare register content and all that
	could that be from version skew in rakudo's .c parts and moarvm's parts?	20:10
nine	ll-exception backtrace shows the failure coming from a frame that's involved with the Multi-level un-inline	20:12
	And the error goes away as soon as I leave the pointless goto entering the nested inline in	20:13
	I.e. this case: github.com/MoarVM/MoarVM/blob/inli...ze.c#L2369	20:14
	Another interesting point: if I don't delete the goto op but just turn it into a no_op, the error disappears.	20:18
timotimo	so another case where we rely on a goto existing to know about the structure of things?	20:21
nine	In this case it looks like it doesn't have to be a goto which is consistent with me being unable to find a reliance on a goto in deopt.c.	20:23
	Looks more like it stumbles over the removal of an instruction, making me think more about some offset becoming incorrect.	20:24
	Sooooo....when deopting an inline, wouldn't it look for the instruction calling the inlined frame? And in a nested inline, wouldn't that instruction be that goto op that eliminate_pointless_gotos tries to remove?	20:27
	From what I see, uninline does not look for some annotation. It relies on the inlines table to get its information. But that table is not updated by eliminate_pointless_gotos	20:29
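nine's hypothesis is that deleting an instruction invalidates a position recorded elsewhere (the inlines table), while overwriting it with a no-op keeps every position intact. The toy model below illustrates that failure mode only; it assumes a flat array of ops and a side table of indices, whereas the real spesh graph uses linked basic blocks. `Code`, `InlineEntry`, `delete_op` and `noop_op` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy opcodes and a flat instruction list (not the spesh graph). */
enum { OP_NOOP, OP_GOTO, OP_CALL, OP_RETURN };

typedef struct {
    int ops[16];
    size_t count;
} Code;

/* Side table recording where an inline starts; nothing updates it
   when the instruction list is edited. */
typedef struct { size_t start_idx; } InlineEntry;

/* Removing an op shifts every later op down by one slot. */
static void delete_op(Code *c, size_t idx) {
    assert(idx < c->count);
    memmove(&c->ops[idx], &c->ops[idx + 1],
            (c->count - idx - 1) * sizeof c->ops[0]);
    c->count--;
}

/* Turning an op into a no-op leaves every other position unchanged. */
static void noop_op(Code *c, size_t idx) {
    assert(idx < c->count);
    c->ops[idx] = OP_NOOP;
}
```

If an entry records that an inline starts at index 1 (an `OP_CALL` after an `OP_GOTO`), `noop_op` on index 0 leaves that entry pointing at the call, while `delete_op` on index 0 leaves it pointing at the following `OP_RETURN`, a stale position of exactly the kind suspected here.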
timotimo	hm, but we only ever compute offsets at code-gen time, or at least we should	20:33
nine	Offset or this mysterious deopt_idx that I haven't really found out yet what it means	20:34
20:42 lizmat joined
jnthn	deopt_idx is just an index into the deopt table	20:55
	Which contains mappings to locations in the original, interpreted, bytecode	20:56
nine	And those mappings are created during codegen?	20:57
jnthn	The original locations are written in graph.c, iirc
	github.com/MoarVM/MoarVM/blob/mast...raph.c#L37	20:58
nine	That's this I guess: g->deopt_addrs[2 * g->num_deopt_addrs] = deopt_target;
jnthn	And yes, code-gen fills the rest in: github.com/MoarVM/MoarVM/blob/mast...raph.c#L37	20:59
nine	And deopt_target is the unoptimized code I guess.
jnthn	Right, it's a table of pairs
	Yes, github.com/MoarVM/MoarVM/blob/mast...aph.c#L360 for example
	Just passes pc - g->bytecode
	Which is a relative offset from the start of the unoptimized bytecode	21:00
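jnthn describes the deopt table as pairs indexed by deopt_idx: slot `2 * i` holds the offset into the unoptimized bytecode (written when the graph is built, as in the `g->deopt_addrs[2 * g->num_deopt_addrs] = deopt_target;` line), and code-gen fills in the other slot. A minimal sketch of that layout follows. The field names `deopt_addrs` and `num_deopt_addrs` come from the quoted line; `Graph`, `MAX_DEOPTS`, `add_deopt_point`, `codegen_set_optimized` and `lookup_deopt_target` are hypothetical, and the assumption that the optimized offset lives in slot `2 * i + 1` is inferred from "table of pairs".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DEOPTS 8

/* Toy deopt table of pairs:
     [2*i]     offset into the unoptimized bytecode (pc - bytecode)
     [2*i + 1] offset into the optimized bytecode (filled at code-gen) */
typedef struct {
    int32_t deopt_addrs[2 * MAX_DEOPTS];
    size_t num_deopt_addrs;
} Graph;

/* Graph building: record the unoptimized location; returns its deopt_idx. */
static size_t add_deopt_point(Graph *g, int32_t unoptimized_offset) {
    assert(g->num_deopt_addrs < MAX_DEOPTS);
    g->deopt_addrs[2 * g->num_deopt_addrs] = unoptimized_offset;
    g->deopt_addrs[2 * g->num_deopt_addrs + 1] = -1; /* unknown until code-gen */
    return g->num_deopt_addrs++;
}

/* Code-gen: record where this deopt point ended up in optimized code. */
static void codegen_set_optimized(Graph *g, size_t deopt_idx, int32_t optimized_offset) {
    assert(deopt_idx < g->num_deopt_addrs);
    g->deopt_addrs[2 * deopt_idx + 1] = optimized_offset;
}

/* Deopt: the index alone yields the place to resume in unoptimized code. */
static int32_t lookup_deopt_target(const Graph *g, size_t deopt_idx) {
    assert(deopt_idx < g->num_deopt_addrs);
    return g->deopt_addrs[2 * deopt_idx];
}
```

Since the unoptimized offset is taken relative to the original bytecode, editing the optimized instruction stream cannot change it, which is the point nine draws from this next.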
nine	So that value is certainly still correct regardless of what we do to the optimized bytecode.	21:01
	And the deopt_offset is only generated at code gen, i.e. after our optimizations. So they ought to be correct, too.
	jnthn tries to remember how this thing works	21:05
	Ah, right, github.com/MoarVM/MoarVM/blob/b9a0...ine.c#L163 is used to identify the location that we return to when doing a multi-level inline	21:06
nine	Oooooooh	21:08
	/* -1 all the deopt targets, so we'll easily catch those that don't get
	* mapped if we try to use them. Same for inlines. */
	But unlike inlines, there is no code for actually checking those deopt targets.
	When I add that I get MoarVM oops: Spesh: failed to fix up deopt_addr 1
	But I get that even if the program would actually work...hm...	21:09
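The quoted code-gen comment describes pre-filling every deopt target with -1 so that unmapped entries are easy to spot, and nine adds a check for them. A minimal sketch of such a sentinel pass follows, assuming the pair layout of `deopt_addrs` discussed above with the optimized offset in slot `2 * i + 1`; `mark_all_unmapped` and `first_unmapped_deopt` are hypothetical helpers, not MoarVM functions. As nine observes, this kind of check also fires for programs that run correctly, so an unmapped entry is not by itself proof of the bug.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Before code-gen: mark every optimized-side slot as "not yet mapped". */
static void mark_all_unmapped(int32_t *deopt_addrs, size_t num_deopt_addrs) {
    assert(deopt_addrs != NULL);
    for (size_t i = 0; i < num_deopt_addrs; i++)
        deopt_addrs[2 * i + 1] = -1;
}

/* After code-gen: return the index of the first entry still holding the
   -1 sentinel, or -1 if every deopt point received an optimized offset. */
static long first_unmapped_deopt(const int32_t *deopt_addrs, size_t num_deopt_addrs) {
    for (size_t i = 0; i < num_deopt_addrs; i++)
        if (deopt_addrs[2 * i + 1] == -1)
            return (long)i;
    return -1;
}
```

A caller could report the returned index, much like the "failed to fix up deopt_addr 1" message, instead of aborting, which makes it possible to compare which entries stay unmapped in working and failing runs.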
jnthn	Hm, and also it stores and uses the deopt index, so the location in the optimized bytecode isn't important for this.	21:10
	Is it sensitive to JIT, btw?
nine	no	21:11
jnthn	Hmm.	21:13
	jnthn doesn't have any more guesses, alas
	But hopefully those pointers helped a little	21:14
nine	jnthn: does this look odd to you? gist.github.com/niner/5626227d1397...c4bcb0b5e3	21:23
jnthn	Hmm, where's that "uninline expecting a goto" coming from?	21:24
nine	just an additional fprintf I added
	The working version does lots of deopts in frame 'MATCH' (cuid '139'), but none in postcircumfix:<{ }>	21:25
jnthn	6632 -> 144 is kinda interesting too	21:26
	Oh, though I guess if we're in an inline the top index is the top frame which would be an inlinee
nine	It's 6636 -> 144 in the working version	21:27
jnthn	If you're seeing totally different deopts and you're using MVM_SPESH_BLOCKING, though...
	Then something's odd
nine	Some output from the working version: gist.github.com/niner/35accfac21f2...f702a35810	21:28
	The difference between the versions should really just be the removal of the no_op	21:29
jnthn	Right
	It's odd it'd cause different deops
	*deopts
nine	I guess this riddle needs at least one more night of sleep. Thanks for the help so far :)	21:42
22:08 markmont joined 22:42 zakharyas joined 22:53 MasterDuke joined
MasterDuke	samcv: what benchmark were you testing with? the one i've tried, `my $a = "a" x 1_000_000; for ^1000 {$a ~~ /./;}; say now - INIT now`, is faster before your recent change, whether it's "a", "ab", or "abcd"	23:54
