#moarvm on 1 December 2020 - Raku Programming Language Log

github.com/moarvm/moarvm | IRC logs at colabti.org/irclogger/irclogger_logs/moarvm Set by AlexDaniel on 12 June 2018.
00:04 elcaro left 00:05 elcaro joined
timotimo	m: say 4605054973 * 100 / 4594950376	00:11
camelia	100.21990655335
timotimo	m: say 100 * (4605054973 R/ 4594950376)
camelia	99.78057597446
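For reference, the two evaluations above compute a ratio and its reciprocal. Raku's `R` metaoperator reverses the operands, so `4605054973 R/ 4594950376` is `4594950376 / 4605054973`. With the difference between the two numbers being 10104597, the results are not symmetric around 100 because the denominator changes:

$$
\frac{4605054973}{4594950376} \times 100 = 100\left(1 + \frac{10104597}{4594950376}\right) \approx 100.2199
$$

$$
\frac{4594950376}{4605054973} \times 100 = 100\left(1 - \frac{10104597}{4605054973}\right) \approx 99.7806
$$

So the same gap reads as a 0.220% increase in one direction and a 0.219% decrease in the other.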
01:30 leont left, leont joined 03:04 leont left 04:04 linkable6 left, evalable6 left 04:06 evalable6 joined 04:07 linkable6 joined
nwc10	jnthn: oops, yes, was supposed to be marked ready for review. It is now	06:35
	MasterDuke: nice. Particularly the LLi miss improvement
	that's actually bigger than my guess. If we can find more things like that, it will all add up.	06:36
08:07 Altai-man joined 08:12 sena_kun joined 08:14 Altai-man left
MasterDuke	yeah, not sure how noticeable it was in time, but lots of little optimizations add up	08:56
09:03 zakharyas joined 09:05 Altai-man joined 09:07 sena_kun left 09:09 domidumont joined 10:35 frost-lab joined
timotimo	if we could cheaply sort worklists by repr ...	10:54
jnthn	The number of reprs is fixed, so you could have per-repr worklists...but why? :)	10:58
10:58 Kaiepi left, Kaiepi joined
timotimo	keep the instruction cache hot by running the same repr's gc_mark over and over	10:59
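A rough C sketch of timotimo's idea, using hypothetical types (`Obj`, `PerReprWorklist`, `MarkFn`) that are not MoarVM's GC structures: because the set of reprs is fixed, the worklist can be split into one bucket per repr and drained bucket by bucket, so the same mark routine runs back to back.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: Obj, PerReprWorklist and MarkFn are illustrative,
   not MoarVM's GC structures. */
#define NUM_REPRS 4      /* the set of reprs is fixed, so buckets can be static */
#define BUCKET_CAP 64

typedef struct Obj {
    int repr_id;         /* which representation this object uses */
    int marked;
} Obj;

typedef void (*MarkFn)(Obj *);

typedef struct {
    Obj *items[NUM_REPRS][BUCKET_CAP];
    size_t count[NUM_REPRS];
} PerReprWorklist;

/* Example mark routine; a real one would also push the object's children. */
static void mark_generic(Obj *o) {
    o->marked = 1;
}

static void worklist_add(PerReprWorklist *wl, Obj *o) {
    assert(o->repr_id >= 0 && o->repr_id < NUM_REPRS);
    assert(wl->count[o->repr_id] < BUCKET_CAP);
    wl->items[o->repr_id][wl->count[o->repr_id]++] = o;
}

/* Drains one repr bucket at a time, so each repr's mark routine runs
   back to back. Returns the number of objects processed. */
static size_t worklist_drain(PerReprWorklist *wl, MarkFn mark[NUM_REPRS]) {
    size_t processed = 0;
    for (int r = 0; r < NUM_REPRS; r++) {
        for (size_t i = 0; i < wl->count[r]; i++) {
            mark[r](wl->items[r][i]);
            processed++;
        }
        wl->count[r] = 0;
    }
    return processed;
}
```

The sketch also makes jnthn's concern visible: objects are processed grouped by repr rather than in discovery order, so objects that reference each other are no longer copied next to each other. A real collector would additionally push newly discovered children while draining, which is left out here.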
jnthn	Wonder if that's a measurable improvement...	11:00
timotimo	in cachegrind, probably, in wallclock, probably not
nine	I fear the only way to know is to try
jnthn	I guess the downside is it would scatter object graphs more
timotimo	ah, since new objects are allocated in the nursery again	11:01
jnthn	Just that we move objects as we encounter them in the list, and so objects that reference each other are sometimes adjacent in the worklist and so end up copied into the other semispace or into gen2 one after the other, although it's less true of gen2 given the free list	11:02
	So you might lose some memory cache hits
timotimo	it's also less true of gen2 given that different reprs are likely to have slightly different sizes	11:03
jnthn	But I've no idea how these two effects would play off each other
	Or if either is even going to be significant
timotimo	yeah
nwc10	it sounds like quite a bit of work (or am I wrong on that part?) and complexity (or wrong?) for potentially marginal gain.	11:04
timotimo	yeah
	i wouldn't mind a factor of 2 speedup for the gc, but this is not how we get that	11:05
nwc10	usually that sort of speed up is "better algorithm" but that's never easy, even if it's possible.	11:06
timotimo	well, there is a whole research field for "better algorithm" in GC
	but many of those better algorithms are not trivial to adapt to a whole system	11:07
lizmat	there's even a book about it: www.bookdepository.com/The-Garbage...gKUB_D_BwE
	oops
timotimo	like when you have to add not only write barriers but also read barriers to all your gc-object-using C code when you change to a concurrent GC
	was concurrent the word for when the gc runs while mutators also run?
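On timotimo's terminology question: yes, a collector that runs while mutator threads keep running is usually called concurrent, whereas parallel refers to several GC threads working together while the mutators are paused. The C sketch below contrasts the two barrier kinds under a hypothetical object layout (`Obj`, `write_ref`, `read_barrier` are illustrative names, not MoarVM's): a generational write barrier that records old-to-young pointers, and a Brooks-style read barrier of the kind some concurrent moving collectors use, where every read follows a forwarding pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object model for illustration; not MoarVM's structures. */
typedef struct Obj Obj;
struct Obj {
    int in_nursery;   /* 1 if young, 0 if in the old generation */
    int remembered;   /* already recorded in the remembered set? */
    Obj *forward;     /* Brooks-style forwarding pointer; points to self if not moved */
    Obj *field;       /* a single reference field */
    int payload;
};

#define REMSET_CAP 16
static Obj *remset[REMSET_CAP];
static size_t remset_len = 0;

/* Generational write barrier: an old object that starts pointing at a
   young one is recorded once, so a nursery-only collection can treat it
   as an extra root. */
static void write_ref(Obj *holder, Obj *value) {
    if (!holder->in_nursery && value != NULL && value->in_nursery && !holder->remembered) {
        assert(remset_len < REMSET_CAP);
        holder->remembered = 1;
        remset[remset_len++] = holder;
    }
    holder->field = value;
}

/* Read barrier: every access goes through the forwarding pointer, so the
   mutator sees the current copy even while a concurrent collector is
   relocating the object. */
static Obj *read_barrier(Obj *o) {
    return o != NULL ? o->forward : NULL;
}
```

This is why nwc10 worries about complexity: the read barrier has to be applied at every object access in the C code, not just at stores.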
lizmat	www.bookdepository.com/The-Garbage...1420082791 # better link	11:08
timotimo	1kg of book for almost a hundred bucks
nwc10	jnthn pointed me at this a few weeks ago: sqlite.1065341.n5.nabble.com/50-fas...78082.html -- ... is 50% faster than the 3.7.17 release	11:09
	from 16 months ago. That is to say, it does 50% more work using the same
jnthn	I read that book (well, most of it) before working on the MoarVM GC :)
nwc10	number of CPU cycles.
	a lot of small wins can add up.	11:10
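To put nwc10's point in numbers: "50% more work using the same number of CPU cycles" is a throughput factor of 1.5, which means each unit of work takes about a third less time. Assuming independent wins that multiply, a run of 1% improvements gets there after roughly forty steps:

$$
\text{time per unit of work} = \frac{1}{1.5} \approx 0.667, \qquad 1.01^{41} \approx 1.504
$$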
timotimo	right
nwc10	as to read barriers, my fear would be "and then the code is even more complex, and fewer people understand it, and more time is lost to bugs than was gained from speedup"
timotimo	yes, absolutely
	getting more and more code ported from C to nqp or similar would be a way to get this smoothed out	11:11
	that's also not easy, either
lizmat	fwiw, I'm considering moving the shaped array code to Raku land	11:15
	to get more flexibility
	the code basically predates Christmas and has been untouched basically since then
	and we have now better HLL optimizing	11:16
	and it would make it easier to port shaped arrays to new VM's
	(think .NET :-)
MasterDuke	are you talking about code that's currently in moarvm or nqp>
	?
jnthn	Except that the CLR and JVM both natively provide shaped arrays too	11:17
lizmat	yes
jnthn	And they are more efficient there than resizable ones
	Heck, at the VM level that's probably true in MoarVM also; it doesn't have to do any resize check logic
lizmat	fact is, that shaped arrays are still at least 5x as slow as unshaped arrays	11:18
	in Raku land :-(
jnthn	Yes, but that appears to be related to type checking and method resolution issues.
lizmat	well, yes
	but that doesn't matter to people wanting to use it :-)
jnthn	Anyway, big -12	11:19
	uh, -1
	Unless we find we somehow can't fix the type/method issues	11:20
lizmat	well, it's my intent to document all of the related nqp ops first
	and grok how that part of Rakudo actually works
	my prototype atm is about 5x as fast as the current shaped array performance	11:21
	and might well get merged with the current backend implementation if we can fix the type checking / resolution issues	11:22
jnthn	Well, does the new thing use the multidim repr? That's the key part to the VM having a clue what to do with it	11:25
MasterDuke	running `my $a; for 1..5 -> $x { for 1..5 -> $y { $a = $x gcd $y } }; say now - INIT now; say $a` with MVM_SPESH_DISABLE=1, why in the world would it end up in this branch three times? github.com/MoarVM/MoarVM/blob/mast...ops.c#L453
jnthn	(And getting the compact memory layout)
	MasterDuke: Maybe the numbers produced by `now` are big enough?	11:27
	m: say now.WHAT
camelia	(Instant)
jnthn	m: say now.^mro
camelia	((Instant) (Cool) (Any) (Mu))
jnthn	I forget how Instant is represented though
MasterDuke	but i'm gcd'ing `$x` and `$y`, not `now`	11:28
jnthn	If now is involving rational arithmetic anywhere that uses gcd internally	11:29
lizmat	jnthn: it would use a single array, with a single index internally, so it would be compact
timotimo	can always bt and mvm_dump_backtrace
jnthn	Maybe breakpoint it and...what timo said
lizmat	jnthn: in any case, I'm exploring this in module space :-)	11:31
jnthn	ok	11:32
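lizmat's description of the prototype, one flat array addressed by a single computed index, corresponds to the usual row-major layout. A minimal C sketch under that assumption (`Shaped`, `flat_index` and `shaped_at` are hypothetical names). It only shows the memory layout; jnthn's point stands that the VM can only treat the data as shaped if the multidim repr is used.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: a fixed-shape N-dimensional array stored in one
   contiguous buffer in row-major order. Not Rakudo's or MoarVM's layout. */
typedef struct {
    size_t ndims;
    const size_t *dims;   /* extent of each dimension */
    long *data;           /* product(dims) elements, contiguous */
} Shaped;

/* Row-major offset: ((i0 * d1 + i1) * d2 + i2) ... */
static size_t flat_index(const Shaped *a, const size_t *idx) {
    size_t off = 0;
    for (size_t k = 0; k < a->ndims; k++) {
        assert(idx[k] < a->dims[k]);   /* shaped arrays have fixed bounds */
        off = off * a->dims[k] + idx[k];
    }
    return off;
}

/* Returns a pointer to the element at the given multi-dimensional index. */
static long *shaped_at(Shaped *a, const size_t *idx) {
    return &a->data[flat_index(a, idx)];
}
```

For a 3x4 array, index (1, 2) maps to offset 1 * 4 + 2 = 6, and no per-row allocation or resize check is involved.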
Geth	MoarVM/update-docs: ae5f7ad447 | (Elizabeth Mattijsen)++ | 8 files Update some docs to Raku era Unless they're specifically historically inclined.	11:54
	MoarVM: lizmat++ created pull request #1394: Update some docs to Raku era	11:55
MasterDuke	interesting. i ran Daniel Lemire's benchmark code of a bunch of different gcd implementations. there is a version that takes half the time as moarvm's implementation. but if i stick it in moarvm, my example gets 1s slower (1.7s -> 2.7s)	12:14
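For readers following the gcd thread: the two textbook approaches are Euclid's remainder loop and Stein's binary GCD, which trades division for shifts and subtraction and is where a count-trailing-zeros primitive such as `__builtin_ctz` comes in. Which variants Lemire's benchmark compares, and which one MasterDuke put into MoarVM, isn't stated here; this is a generic sketch on unsigned 64-bit values, with a portable loop (`ctz64_loop`, a hypothetical helper) standing in for the intrinsic.

```c
#include <assert.h>
#include <stdint.h>

/* Euclid's algorithm: one modulo operation per step. */
static uint64_t gcd_euclid(uint64_t a, uint64_t b) {
    while (b != 0) {
        uint64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Portable trailing-zero count; x must be non-zero. */
static unsigned ctz64_loop(uint64_t x) {
    assert(x != 0);
    unsigned n = 0;
    while ((x & 1) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Stein's binary GCD: only shifts, comparisons and subtraction. */
static uint64_t gcd_binary(uint64_t a, uint64_t b) {
    if (a == 0) return b;
    if (b == 0) return a;
    unsigned shift = ctz64_loop(a | b);   /* common factors of two */
    a >>= ctz64_loop(a);                  /* a is now odd */
    do {
        b >>= ctz64_loop(b);              /* b is now odd */
        if (a > b) {                      /* keep a <= b */
            uint64_t t = a;
            a = b;
            b = t;
        }
        b -= a;                           /* odd - odd is even (or zero) */
    } while (b != 0);
    return a << shift;
}
```

As MasterDuke's numbers suggest, a variant that wins in an isolated C benchmark need not win once it runs inside the VM.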
	m: my $a; for 1..5_000 -> int $x { for 1..5_000 -> int $y { $a = $x gcd $y } }; say now - INIT now; say $a
camelia	3.8432893
	5000
MasterDuke	hm. with spesh disabled current is still about 1s faster, but the absolute times have increased (10.4s -> 11.9s)	12:16
	oh wait, i might be reading his benchmark results backwards	12:19
12:22 zakharyas left
MasterDuke	would using __builtin_ctz() be a portability problem for moarvm?	12:50
	looks like it's not available in visual studio	12:58
	but it's only 1s faster when doing 100_000_000 gcds, this probably isn't worth it	13:04
lizmat	how is memory doing ?	13:05
	that could be another reason ?
	I mean gcd gets used a lot for Rats
MasterDuke	memory should be identical	13:06
13:07 sena_kun joined 13:09 Altai-man left
jnthn	MasterDuke: typical way with unportable things is a probe to see if it's available, and a fallback approach if not	13:27
MasterDuke	yeah, looks like there's a _BitScanReverse that can be used instead. but all told it doesn't seem worth the trouble right now	13:30
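jnthn's probe-plus-fallback pattern, sketched with preprocessor checks standing in for a build-time probe (a Configure-style probe would define a macro instead; `ctz64` is a hypothetical helper name). One detail: for counting trailing zeros the matching MSVC intrinsic is `_BitScanForward` / `_BitScanForward64`, which scans from the least significant bit; `_BitScanReverse` locates the most significant set bit.

```c
#include <assert.h>
#include <stdint.h>

#if !defined(__GNUC__) && !defined(__clang__) && defined(_MSC_VER) && defined(_M_X64)
#include <intrin.h>
#pragma intrinsic(_BitScanForward64)
#endif

/* Count trailing zero bits of a non-zero 64-bit value.
   Uses a compiler intrinsic where one is known to exist,
   and a portable loop otherwise. */
static unsigned ctz64(uint64_t x) {
    assert(x != 0);   /* the intrinsics give no meaningful result for 0 */
#if defined(__GNUC__) || defined(__clang__)
    return (unsigned)__builtin_ctzll(x);
#elif defined(_MSC_VER) && defined(_M_X64)
    unsigned long idx;
    _BitScanForward64(&idx, x);   /* index of the lowest set bit */
    return (unsigned)idx;
#else
    unsigned n = 0;
    while ((x & 1) == 0) {
        x >>= 1;
        n++;
    }
    return n;
#endif
}
```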
13:44 Geth left 13:45 Geth joined 14:03 lucasb joined 14:06 bartolin left, bartolin joined 14:09 zakharyas joined 14:25 leont joined 14:27 frost-lab left
lizmat	some unexpected timings: github.com/Raku/nqp/issues/685	15:33
jnthn	Can you try using $a?	15:38
	Or declaring it outside of the loop?
	(I suspect spesh will be dropping the atpos entirely)
lizmat	try using $a ?	15:39
jnthn	Yes, at the moment it's an unused variable, and atpos is a pure operation
lizmat	.670 vs .1458	15:40
	.670 vs 1.458
	so no change really, the slow one being a little faster ?	15:41
	not even that
jnthn	OK, was curious how much that would be part of it
	I suspect it's the extra allocations
lizmat	are there other side effects to nqp::shift?
jnthn	Well, the point of shift is to have an effect :)	15:42
lizmat	I found one case in nqp where an iterator is used to iterate over a list just for the number of elements in the list
	so not actually using the nqp::shift($iter) value
jnthn	Huh, in a place it could just use nqp::elems?
lizmat	yes
jnthn	oops
lizmat	vm/moar/QAST/QASTRegexCompilerMAST.nqp line 303	15:43
	if nqp::iterator / nqp::shift would be faster than manual indexing, a lot of Rakudo internals could also benefit from that, fwiw	15:47
	also: changing the nqp::list to a nqp::list_i, makes it worse	15:48
	aah... oops	15:49
jnthn	I wondered how much it could be GC overhead of allocating iterator objects, but it's not
lizmat	hmmm... looks like it does make things way worse	15:50
jnthn	So yeah, certainly room for improvement
lizmat	$ time nqp -e 'my $l := nqp::list_i(1,2,3,4,5,6,7,8,9,10); my int $j := 10000000; while $j-- { my $iter := nqp::iterator($l); nqp::while($iter, my int $a := nqp::shift($iter)) }' # 2.830
jnthn	Yes, because shift returns an object
	So it's doing a box/unbox every element
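jnthn's explanation for why `nqp::list_i` got slower: `nqp::shift` returns an object, so each native integer is boxed on the way out and unboxed by the caller; nine reports that `nqp::shift_i` is 60% faster in the same example. A hypothetical C model of the difference (`IntIter`, `BoxedInt` and the helpers are illustrative, not MoarVM's API), with `malloc` standing in for allocating a boxed Int:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model: an iterator is just a list pointer plus a position. */
typedef struct {
    const long *items;
    size_t elems;
    size_t pos;
} IntIter;

typedef struct { long value; } BoxedInt;   /* stand-in for a boxed Int object */

/* Like a generic shift: returns an object, so every native element
   has to be boxed (here: heap-allocated). */
static BoxedInt *iter_shift_boxed(IntIter *it) {
    assert(it->pos < it->elems);
    BoxedInt *b = malloc(sizeof *b);
    if (b == NULL) abort();
    b->value = it->items[it->pos++];
    return b;
}

/* Like a native-typed shift: hands the integer back directly. */
static long iter_shift_native(IntIter *it) {
    assert(it->pos < it->elems);
    return it->items[it->pos++];
}

static long sum_boxed(const long *xs, size_t n) {
    IntIter it = {xs, n, 0};
    long total = 0;
    while (it.pos < it.elems) {
        BoxedInt *b = iter_shift_boxed(&it);
        total += b->value;   /* unbox */
        free(b);             /* a GC'd VM would leave this to the collector */
    }
    return total;
}

static long sum_native(const long *xs, size_t n) {
    IntIter it = {xs, n, 0};
    long total = 0;
    while (it.pos < it.elems)
        total += iter_shift_native(&it);
    return total;
}
```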
lizmat	ah...
jnthn	Anyway, given it's not GC overhead, then it's the shift/boolification that wants a look	15:51
lizmat	looks like...	15:52
	would be nice if an nqp::iterator / nqp::shift combo would be faster
jnthn	Probably can be	15:53
lizmat	should I tackle that one case where nqp::shift() is not being used ?
jnthn	Yeah, go for it
lizmat	will do
	github.com/Raku/nqp/commit/829f1d42f9	15:57
nine	lizmat: using nqp::shift_i instead of nqp::shift in your example is 60 % faster	15:59
lizmat	ahhh.... so feels like effectively, nqp::iterator($list) is about the same as nqp::clone($list) ?	16:00
jnthn	No, it just creates an object with an index and a pointer to the list	16:01
lizmat	fwiw: I was working on documenting nqp ops, and found nqp::iterator listed under list ops, rather than hash ops	16:03
jnthn	It's both, I guess
lizmat	yes, and then I remembered why I wasn't using it for lists
	because it was slower
	I hadn't realized how much slower	16:04
jnthn	But still being used enough to make it worth speeding up?	16:05
lizmat	afaik, nqp::iterator is not used in the core on lists because it was slower
	however, that means that a lot of the NQP code is doing manual indexing, which is more error prone from a maintenance point of view	16:06
	basically anything that runs over an IterationBuffer or a $!reified in Rakudo
nine	FWIW I don't see anything that's obviously slow about nqp::iterator	16:07
lizmat	and there's quite a lot of that
	but shouldn't we look at nqp::shift() ?
nine	Oh, now I do
	nqp::atpos can be devirtualized by the JIT. nqp::shift on an nqp::iterator however always goes through REPR(target)->pos_funcs.at_pos	16:08
	So the shift on the iterator is devirtualized, but not the following at_pos
jnthn	That plus the integer addition and comparison are JIT straight into assembly code, whereas the boolification is maybe still a C function call	16:09
nine	So keeping track of the iteration position and using nqp::atpos in HLL is actually a perfect example of how using smaller, less powerful operations leads to better optimization opportunities for the VM
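nine's diagnosis, modelled in plain C: a generic access goes through a per-representation function table (the `REPR(target)->pos_funcs.at_pos` call mentioned above), while a devirtualized access, once the representation is known, is a plain indexed load that a JIT can emit inline. `PosFuncs`, `List` and the functions below are hypothetical stand-ins, not MoarVM's real structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of per-representation dispatch; names are illustrative. */
typedef struct List List;

typedef struct {
    long (*at_pos)(List *list, size_t idx);   /* looked up at runtime */
} PosFuncs;

struct List {
    const PosFuncs *pos_funcs;
    long *slots;
    size_t elems;
};

static long array_at_pos(List *list, size_t idx) {
    assert(idx < list->elems);
    return list->slots[idx];
}

static const PosFuncs ARRAY_POS_FUNCS = { array_at_pos };

/* Generic path: an indirect call per element, which cannot be inlined
   unless the caller can prove which table the object uses. */
static long sum_via_table(List *list) {
    long total = 0;
    for (size_t i = 0; i < list->elems; i++)
        total += list->pos_funcs->at_pos(list, i);
    return total;
}

/* "Devirtualized" path: once the representation is known (checked once,
   much like a guard), each access is a direct load from slots. */
static long sum_direct(List *list) {
    assert(list->pos_funcs == &ARRAY_POS_FUNCS);
    long total = 0;
    for (size_t i = 0; i < list->elems; i++)
        total += list->slots[i];
    return total;
}
```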
jnthn	That plus another example of how things change when you have a JIT rather than interpret everything	16:10
lizmat	so, would that be easily fixable?	16:11
nine	And spesh which takes more credit for devirtualization
lizmat	or shall we deprecate support for nqp::iterator(List)
nine	It's certainly possible to extend spesh and the JIT to do deep devirtualization and get rid of the boolification slow down. But then just not using nqp::iterator would get us to the same place much easier.	16:13
lizmat	yeah, feels like a lot of work to get to a point we already are in most cases	16:14
	otoh, those were the explicit cases of using nqp::iterator
	when we say "for @array { }" would that not codegen to a nqp::iterator thing?
jnthn	In NQP, yes	16:15
lizmat	that happens a lot more in NQP
jnthn	I wonder how we can move that to an iterator object in NQP code
	Should be quite possible	16:16
	And then rely on inlining to make it cheaper
	Should probably check that it really does come out just as well
lizmat	not quite following what should be quite possible
nine	To implement an iterator object in pure NQP	16:17
jnthn	Replacing the use of nqp::iterator in NQP for array iterations
	So we can drop VM-levels support for nqp::iterator(List)	16:18
	*level
lizmat	class ListIterator { has $!list; has int $index = -1; method shift() ... } ?	16:19
nine	That + has $limit = nqp::elems($!list);	16:21
lizmat	I seem to recall that there is no point in doing that, as the nqp::elems() gets optimized pretty quickly
nine	It's necessary for keeping the same semantics though.	16:23
	Now if we even want to keep those semantics is another question
lizmat	huh? why would that be needed for keeing the same semantiics ?	16:24
	*keeping
MasterDuke	fwiw, it looks like there are (at least) a couple of cases of `nqp::iterator(@...)` in the rakudo core	16:25
lizmat	MasterDuke: there are?
	hmmm.
nine	It's a difference when the array gets changed during the loop (push or pop)
lizmat	aah... ok, and nqp::shift() doesn't follow that currently?
	then yes
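The semantic point nine raises about `$limit`, sketched in C with hypothetical types (`Vec`, `ListIter`): an iterator that snapshots the element count when it is created behaves differently from one that re-reads it on every step once the loop pushes or pops. Per nine's remark, the snapshot is what would keep the current `nqp::iterator` behaviour; the sketch only illustrates the difference and does not mirror NQP's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical growable list and two iterator policies (illustrative only). */
#define CAP 32
typedef struct {
    long items[CAP];
    size_t elems;
} Vec;

static void vec_push(Vec *v, long x) {
    assert(v->elems < CAP);
    v->items[v->elems++] = x;
}

typedef struct {
    Vec *list;
    size_t index;
    size_t limit;   /* snapshot of elems at creation; unused by the live policy */
} ListIter;

static ListIter iter_new(Vec *v) {
    ListIter it = {v, 0, v->elems};
    return it;
}

/* Snapshot policy: stops at the length the list had when iteration began. */
static int iter_more_snapshot(const ListIter *it) { return it->index < it->limit; }

/* Live policy: re-reads the current length on every step. */
static int iter_more_live(const ListIter *it) { return it->index < it->list->elems; }

/* Counts loop iterations while pushing one element during each of the
   first `pushes` steps, to show how the two policies diverge. */
static size_t count_steps(Vec *v, int snapshot, size_t pushes) {
    ListIter it = iter_new(v);
    size_t steps = 0;
    while (snapshot ? iter_more_snapshot(&it) : iter_more_live(&it)) {
        it.index++;
        if (steps < pushes) vec_push(v, 0);
        steps++;
    }
    return steps;
}
```

Starting from three elements and pushing twice during the loop, the snapshot policy runs three steps while the live policy runs five; a pop would make the live policy stop early instead.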
MasterDuke	looks like 9 where it's explicitly an '@'-sigiled variable, and a couple more where it's probably a list even if not '@'	16:30
lizmat	yeah... looking at them now	16:32
17:00 rypervenche left 17:03 rypervenche joined 17:06 Altai-man joined 17:08 sena_kun left 17:35 patrickb joined 18:01 domidumont left 18:58 zakharyas left 20:49 zakharyas joined 21:07 sena_kun joined 21:09 Altai-man left 21:26 patrickb left
Geth	MoarVM: ae5f7ad447 | (Elizabeth Mattijsen)++ | 8 files Update some docs to Raku era Unless they're specifically historically inclined.	21:46
	MoarVM: a595d9ddc4 | (Jonathan Worthington)++ (committed using GitHub Web editor) | 8 files Merge pull request #1394 from MoarVM/update-docs Update some docs to Raku era
21:55 zakharyas left, sena_kun left 21:56 sena_kun joined 22:24 sena_kun left, sena_kun joined 22:35 sena_kun left
