Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes. Set by lizmat on 24 May 2021.
00:08
reportable6 left
00:23
squashable6 joined
01:09
reportable6 joined
01:37
CaCode left
02:37
tellable6 left,
greppable6 left,
squashable6 left,
bisectable6 left,
committable6 left,
bloatable6 left,
evalable6 left,
statisfiable6 left,
releasable6 left,
sourceable6 left,
quotable6 left,
benchable6 left,
linkable6 left,
shareable6 left,
notable6 left,
reportable6 left,
coverable6 left,
unicodable6 left,
nativecallable6 left,
nativecallable6 joined,
linkable6 joined
02:38
greppable6 joined,
bisectable6 joined
02:39
squashable6 joined,
reportable6 joined,
sourceable6 joined,
coverable6 joined
02:40
tellable6 joined
03:24
vrurg joined
03:27
vrurg_ left
03:37
shareable6 joined
03:38
evalable6 joined
03:39
bloatable6 joined,
notable6 joined,
benchable6 joined
03:40
statisfiable6 joined,
vrurg left,
vrurg joined
04:39
unicodable6 joined,
quotable6 joined
04:50
CaCode joined
05:40
committable6 joined
05:46
samebchase joined
06:07
reportable6 left
06:40
releasable6 joined,
vrurg_ joined
06:42
vrurg left
08:25
CaCode left
Geth | MoarVM/master: 4 commits pushed by (Nicholas Clark)++, niner++ | 08:56
09:10
reportable6 joined
09:35
CaCode joined
09:48
Colt left
09:49
Colt joined
nine | Looks like LibXML suffers from at least 2 bugs. One spesh-related (but not JIT) and one that appears even without spesh. And one test seems to get stuck currently because LibXML is trying to fetch something from the network but just hangs instead (or runs into a timeout, wasn't patient enough to check) | 09:53
10:49
evalable6 left,
linkable6 left
10:50
linkable6 joined
10:51
evalable6 joined
dogbert17 | nine: still down in the signed/unsigned rabbit hole? | 10:59
nine | That's on pause while I am debugging Blin issues | 11:01
dogbert17 | I stumbled upon what I believe is a NativeCall-related bug yesterday | 11:03
github.com/MoarVM/MoarVM/issues/1621 | 11:05
nine | Oh, the non-spesh bug exposed by LibXML is about Proxy. Once we encounter a Proxy in the args list, we stop processing arguments and delegate to raku-native-dispatch-deproxy instead. This causes us to not install guards for the remaining arguments | 11:10
dogbert17 | so you found one of the bugs already, impressive | 11:11
nine | Now I've even got a fix. Though it's kinda awful and I don't understand 100% why it doesn't work the other way. | 11:59
12:07
reportable6 left
MasterDuke | how do i turn on github.com/rakudo/rakudo/blob/mast...#L301-L305 when building rakudo? moarvm/nqp/rakudo were all built with `make MVM_TRACING=1 install`, but when i run with --tracing it's instead going to the default (even though the value would be triggering that case) | 12:20
also, when i just remove that #if, tracing still doesn't happen, though maybe that's because the MVM_interp_enable_tracing call is happening before the MVM_vm_create_instance call? | 12:24
well, moving MVM_interp_enable_tracing after the MVM_vm_create_instance doesn't change anything | 12:27
14:03
CaCode left
14:08
reportable6 joined
14:47
[Coke] left
14:51
[Coke] joined
15:54
Nicholas left
nine | So, I've rewritten Proxy support in NativeCall so that instead of the clever but inefficient trick that would require that ugly fix, it works like multi dispatch and uses the ProxyReaderFactory. Works like a charm, except that now in a few cases the dispatch just does not get resumed | 16:44
16:58
linkable6 left,
evalable6 left
dogbert17 | uh oh | 17:00
MasterDuke | every problem in computer science can be solved by adding a layer of indirection, time to break out the ProxyReaderFactoryFactory! | 17:09
nine | This isn't Java. The solution is obviously a ProxyReaderFactoryProxy | 17:39
lizmat | .oO( with maybe a touch of sudo ? ) | 17:50
17:58
Guest1254 joined
18:00
evalable6 joined
18:08
reportable6 left
18:09
reportable6 joined
18:59
linkable6 joined
19:36
Guest1254 left
19:47
Guest125 joined
19:48
Guest125 left
japhb | lizmat: sudo MakeMeAProxyReaderFactoryProxy ? | 20:27
sudo write-my-boilerplate | 20:28
21:13
kjp left
timo | `no nonsense;` | 21:20
MasterDuke | huh. using alloca where possible in the rakudo runner took 56,532,210 more instructions for 100 runs of `MVM_SPESH_BLOCKING=1 raku -e ''` | 21:31
timo | oh? that is odd | 21:34
MasterDuke | i do add conditionals for if the size is too big, but i still sort of expected it to be fewer instructions overall. going to check elapsed time with /usr/bin/time now | 21:36
timo | huh
MasterDuke | well, /usr/bin/time isn't very precise, but for 100 runs it thinks using alloca is 0.19s slower (in total) | 21:41
the patch is gist.github.com/MasterDuke17/497d9...9b29845492 if you're curious | 21:42
maybe some of those checks for the size can be assumed to always be ok and removed | 21:44
21:47
kjp joined
timo | ok the first thing i see is file path lengths | 21:51
we should be able to get a proper value for "maximum file path length the system allows"
so anything more than that would be an error later anyway
MasterDuke | "Linux has a maximum filename length of 255 characters for most filesystems (including EXT4), and a maximum path of 4096 characters" to quote the first hit on google | 21:54
timo | we'll want to have this value for every system we support
and of course never accidentally write past our allocated buffer when we are fed something longer
MasterDuke | looks like for bsd max path is 1024 | 21:55
and macos | 21:56
oh, and windows is 260
timo | oh christ, 260? | 21:57
that's nothing?!
21:57
harrow left
MasterDuke | well, of course it's more complicated than that docs.microsoft.com/en-us/windows/w...limitation | 21:57
but that looks like the initial default | 21:58
i'll just remove all those checks and see if it's faster. if not, no reason to continue the experiment | 22:00
timo | ah, good point | 22:02
to get an upper and lower bound
MasterDuke | huh. even with no checks, 6,889,088 more instructions for alloca and 0.04s more time according to /usr/bin/time | 22:23
22:26
harrow joined
timo | that is very odd. what does alloca actually compile to, then? | 22:39
just the call into malloc is short, but then you'll also run through malloc, which does a bunch of stuff, whereas alloca should essentially be barely a change at all?? | 22:40
Ir also goes by cache lines, right? could it have something to do, then, with little changes giving us whole-cacheline differences at once? | 22:42
even if the malloc implementation just stays in cache most of the time, it'd still be more code than many alloca usages combined?
but 7 million is out of how much? 10,000 million? | 22:43
MasterDuke | what do you mean, out of? | 22:52
95769046524 total instructions for 100 runs with alloca, 95762157436 total instructions on master | 22:53
timo | m: say 95769046524 * 100 / 95762157436 | 22:54
camelia | 100.007193956553
timo | i wonder. will this be difficult to measure reliably until we've made everything else a lot faster already? | 22:55
perhaps the big difference is when we have a multithreaded program, but valgrind wouldn't tell us in that case | 22:56
also, can you try jemalloc or some other malloc implementation? | 22:57
MasterDuke | well, i figure the empty program is the best possible case for this since this is just about trying to speed up startup | 22:58
i'll try with mimalloc. btw, did you see my comments about using it a little while ago? | 22:59
timo | i did not | 23:00
MasterDuke | colabti.org/irclogger/irclogger_lo...-10-16#l56 | 23:01
heh. with mimalloc LD_PRELOADed, alloca is .17 faster than master (though guess now i have to try master with mimalloc) | 23:03
timo | oh i see that i actually even replied to that
.17 seconds?
MasterDuke | they just released 2.0, i think we should seriously consider bundling/using it with moarvm | 23:04
well, sum of all the elapsed times with master is 1827, 1649 for alloca+mimalloc | 23:05
timo | empty program still?
MasterDuke | yeah
timo | that means a regular run of empty program is 1.8 seconds? | 23:06
MasterDuke | 0.18
timo | you only used 100 for valgrind i guess?
MasterDuke | for both
timo | m: say 1827 / 100 | 23:07
camelia | 18.27
timo | i'm confused :) :)
MasterDuke | divide by 100 again | 23:08
/usr/bin/time reports 0.18, i log 18 to a file (and do that 100 times) | 23:09
and then just sum all the lines in the file | 23:14
hm, wonder if using hyperfine would be more rigorous | 23:16
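hyperfine would indeed handle the warmup, repetition, and statistics (mean ± standard deviation, outlier detection) that the sum-lines-in-a-file approach does by hand. A hypothetical comparison of the two builds (the install paths are placeholders, not real ones from the log):

```shell
# Placeholder paths: builds without and with the alloca patch.
hyperfine --warmup 3 \
  'MVM_SPESH_BLOCKING=1 ./install-master/bin/raku -e ""' \
  'MVM_SPESH_BLOCKING=1 ./install-alloca/bin/raku -e ""'
```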
23:19
discord-raku-bot left
MasterDuke | master+mimalloc is 1561 for startup time | 23:19
23:19
discord-raku-bot joined
MasterDuke | so somehow it's really looking like alloca isn't helping in this case | 23:20
(but mimalloc is)
ugh, so much slower to run callgrind 100 times... | 23:25
timo | oh for sure
MasterDuke | an interesting yet involved experiment would be to rip out the fsa and just see how mimalloc does | 23:28
timo | mhh the code is already there, you'd just want to toss out the size checks including allocating the size bit before the data | 23:29
MasterDuke | oh right, FSA_SIZE_DEBUG is essentially that, isn't it? | 23:30
interesting...
maybe that'll be a christmas break project
alloca+mimalloc is 162919560 instructions more than master+mimalloc | 23:31
and master is 9017008487 instructions more than master+mimalloc | 23:33
all total for 100 runs