#moarvm on 26 April 2020 - Raku Programming Language Log

github.com/moarvm/moarvm \| IRC logs at colabti.org/irclogger/irclogger_logs/moarvm Set by AlexDaniel on 12 June 2018.
00:20 discoD joined 00:53 lucasb left 05:48 pamplemousse left 06:11 farcas1982regreg joined 06:53 farcas1982regreg left 06:54 farcas1982regreg joined 07:02 discoD left
nine	dogbert17, Altai-man_: I've thought about that commit for a bit when I was in bed, but I'm not sure. It's a hard hit on performance, but it also supposedly fixes the regression github.com/rakudo/rakudo/issues/3569	07:19	Copy link Message link Add to gist Remove
Geth	MoarVM/master: 4 commits pushed by (Dave Lewis)++, niner++ - Remove unused function ensure_buffer() - Release excess memory allocated in MVM_string_utf8_c8_decodestream - Make it const - Merge pull request #1271 from ddlws/utf8-c8	07:26	Copy link Message link Add to gist Remove
MasterDuke	timotimo: did you mean put the code to increase the initial number of FSA bin pages in rakudo/rakudo-m.c?	08:05	Copy link Message link Add to gist Remove
08:17 patrickb joined 08:42 Altai-man_ joined 09:20 sena_kun joined 09:22 Altai-man_ left 10:04 farcas1982regreg left, farcas1982regreg joined 10:06 Geth joined 10:12 farcas1982regreg left, farcas1982regreg joined
timotimo	MasterDuke:i think so, yeah	10:43	Copy link Message link Add to gist Remove
11:19 Altai-man_ joined 11:22 sena_kun left
MasterDuke	timotimo: currently github.com/rakudo/rakudo/blob/mast...build.c.in doesn't really do much other than execing moar with some arguments. think it should be a new argument to moar? maybe a mapping of bins and initial # of pages?	12:04	Copy link Message link Add to gist Remove
timotimo	oh	12:07	Copy link Message link Add to gist Remove
MasterDuke	btw, those numbers were on my branch. i updated the gist gist.github.com/MasterDuke17/bf4bd...d8e892e481 to include numbers on master also	12:25	Copy link Message link Add to gist Remove
	re github.com/MoarVM/MoarVM/blob/mast...c.c#L5-L10 what other improvements are possible?	12:27	Copy link Message link Add to gist Remove
timotimo	do hashes use the FSA already? i do believe i wrote code for that once, but not sure if it got merged	12:37	Copy link Message link Add to gist Remove
MasterDuke	think so	12:38	Copy link Message link Add to gist Remove
	github.com/MoarVM/MoarVM/blob/mast...Hash.c#L34	12:39	Copy link Message link Add to gist Remove
timotimo	we could probably have a look at how many instances of each set of keys we have at multiple points during compilation, for example		Copy link Message link Add to gist Remove
	someone, perhaps jonathan, mentioned replacing the most common hashes with classes perhaps?	12:40	Copy link Message link Add to gist Remove
MasterDuke	yeah, lizmat and jnthn were chatting about that recently	12:41	Copy link Message link Add to gist Remove
lizmat	yeah, but that was in the bootstrap mostly	12:42	Copy link Message link Add to gist Remove
MasterDuke	but i was more thinking, are there improvements to the FSA itself. if we end up using it for more things, making it better/faster would have a bigger impact		Copy link Message link Add to gist Remove
lizmat	MasterDuke: care to blog about the work you've been doing?		Copy link Message link Add to gist Remove
	lizmat sure would like to understand better what all of this FSA business is :-)	12:43	Copy link Message link Add to gist Remove
timotimo	the bootstrap, eh?	12:44	Copy link Message link Add to gist Remove
	i wonder if there's anything we can do with QAST::Want nodes; half the time they have three children, one QAST::WVal, one literal string, and one QAST::IVal, QAST::SVal, or QAST::NVal	12:46	Copy link Message link Add to gist Remove
	the other half they have one sub-tree, the "v" or "Vv" or whatever literal string, and another subtree		Copy link Message link Add to gist Remove
	that's a three-item array every time	12:47	Copy link Message link Add to gist Remove
	but i think we use $blerp[0] a lot of the time, and for @$blerp and such	12:48	Copy link Message link Add to gist Remove
	changing all of QAST code to work with that is probably not quickly done, and the payoff isn't even assured		Copy link Message link Add to gist Remove
	also, modifications to the array are often done as well	12:49	Copy link Message link Add to gist Remove
MasterDuke	lizmat: honestly, i'm rather unlikely to blog. i've never blogged anything online before and i don't really like writing. what's more likely is maybe doing a presentation sometime/somewhere. i actually like public speaking even less than writing, but feel like there's more of a reward		Copy link Message link Add to gist Remove
lizmat	ok, well, eh, I would take a paragraph about FSA here also :-)	12:50	Copy link Message link Add to gist Remove
timotimo	if the Md would prefer, i can give a bit of an explanation		Copy link Message link Add to gist Remove
MasterDuke	go for it, please	12:51	Copy link Message link Add to gist Remove
timotimo	sure		Copy link Message link Add to gist Remove
	ok, so we're mostly switching places that use malloc to places that use the FSA	12:52	Copy link Message link Add to gist Remove
MasterDuke	if github.com/MoarVM/MoarVM/pull/689 does get merged maybe i could be persuaded to write a paragraph for the weekly		Copy link Message link Add to gist Remove
timotimo	malloc has a couple of nice affordances, like every piece of allocated memory has pertinent information stored next to it, like the size of the allocated piece of memory	12:53	Copy link Message link Add to gist Remove
	that way, free can know to do the right thing when you give it a pointer that had come from malloc or realloc		Copy link Message link Add to gist Remove
lizmat	so far so good :-)		Copy link Message link Add to gist Remove
timotimo	but in most places we keep around a "size of the allocated memory piece" anyway, for example when we're adding stuff over time and realloc whenever space runs out		Copy link Message link Add to gist Remove
	there isn't a public API to get the size of a malloced piece of memory, at least to my knowledge	12:54	Copy link Message link Add to gist Remove
	at the very least not portable i guess		Copy link Message link Add to gist Remove
	storing the information next to the data makes it convenient to use, but it also increases the amount of memory taken up by allocated memory		Copy link Message link Add to gist Remove
	and malloc also rounds up, not exactly sure how much, but probably generously	12:55	Copy link Message link Add to gist Remove
	and malloc also has to handle allocations of all kinds of different sizes, and is expected to pack them efficiently in memory		Copy link Message link Add to gist Remove
lizmat	still with you	12:56	Copy link Message link Add to gist Remove
timotimo	the FSA has a different API that always requires you to pass the size of an allocated thing whenever you do an operation on the allocated data		Copy link Message link Add to gist Remove
	so fsa_alloc takes the size just like malloc does, but free also takes the size		Copy link Message link Add to gist Remove
	fsa_free in the last one there	12:57	Copy link Message link Add to gist Remove
	and fsa_realloc takes the old size and the new size		Copy link Message link Add to gist Remove
lizmat	hmmm... that feels potentially more fragile	12:58	Copy link Message link Add to gist Remove
timotimo	yes, that's right		Copy link Message link Add to gist Remove
	that's what makes changing from malloc to fsa a bit more difficult		Copy link Message link Add to gist Remove
lizmat	otoh, you would have all memory allocation data in one place		Copy link Message link Add to gist Remove
timotimo	that's right		Copy link Message link Add to gist Remove
	the FSA allocates space for one class of sizes for like a hundred items at a time	12:59	Copy link Message link Add to gist Remove
lizmat	but that also implies that fsa_alloc needs to be smarter with regards to memory fragmentation, no ?		Copy link Message link Add to gist Remove
timotimo	the trick is that it's split into these "bins" where all objects are about the same size		Copy link Message link Add to gist Remove
lizmat	not exactly the same size?		Copy link Message link Add to gist Remove
timotimo	so you don't get the "where can i fit this?" problem that gives you internal fragmentation		Copy link Message link Add to gist Remove
	not exactly, no, but only rounded up a little bit	13:00	Copy link Message link Add to gist Remove
lizmat	I could see that everything in a bin has the same size, even if less is actually needed		Copy link Message link Add to gist Remove
	turning a pointer into an index ?		Copy link Message link Add to gist Remove
timotimo	that would require everything that wants to use fsa-allocated memory to be a bit more complicated		Copy link Message link Add to gist Remove
	and going from an index to a usable pointer that regular code can work with may be expensive, too	13:01	Copy link Message link Add to gist Remove
	so we do keep regular pointers as the API		Copy link Message link Add to gist Remove
lizmat	ah, there's that of course :-)		Copy link Message link Add to gist Remove
timotimo	so you can imagine that especially for very small allocations the FSA has far less overhead than using malloc		Copy link Message link Add to gist Remove
	allocation into the FSA is done via a free list, i.e. whenever you free something, the free list now starts with that item and the item points at the previously first-in-line free spot and so on	13:02	Copy link Message link Add to gist Remove
	if the free list is exhausted, a new page is added	13:03	Copy link Message link Add to gist Remove
	on top of that, last year or so the FSA got per-thread free-lists in addition to the global free-list		Copy link Message link Add to gist Remove
	along with some heuristics for situations like producer-consumer, where one thread is the one that does all the allocations and another thread is the one doing all the frees		Copy link Message link Add to gist Remove
lizmat	I see, so we're at the bleeding edge of memory allocation logic		Copy link Message link Add to gist Remove
timotimo	not entirely, there's still some really impressive tricks	13:04	Copy link Message link Add to gist Remove
	but we're pretty far along :)		Copy link Message link Add to gist Remove
	anyway, the per-thread free-lists are cool because when you're allocating and freeing the same size of object over and over, you may stay in a situation where you never need to hold a lock or do any inter-thread synchronization of any kind	13:05	Copy link Message link Add to gist Remove
MasterDuke	my recent stats about number of add_page calls just running `raku -e ''` suggests to me that we need to increase the amount of memory we initially malloc to populate the FSA	13:06	Copy link Message link Add to gist Remove
timotimo	not sure if i may have missed anything		Copy link Message link Add to gist Remove
	MasterDuke: can you output the size of the pages we actually allocate for each bin, and how many items fit in each?		Copy link Message link Add to gist Remove
	i think the numbers are different for each bin	13:07	Copy link Message link Add to gist Remove
lizmat	timotimo: thanks for this exposé, pretty sure I'm not the only one who found this enlighting	13:08	Copy link Message link Add to gist Remove
timotimo	sure thing!		Copy link Message link Add to gist Remove
	oh! how could i forget this	13:09	Copy link Message link Add to gist Remove
	the FSA also has a neat little mechanism called "free at safepoint"		Copy link Message link Add to gist Remove
	with that you can free memory pieces that other threads may currently still be accessing, without causing trouble	13:10	Copy link Message link Add to gist Remove
	because FSA Safepoints™ only happen outside of functions, usually		Copy link Message link Add to gist Remove
MasterDuke	timotimo: github.com/MoarVM/MoarVM/blob/mast....h#L91-L92 suggests it's always 128 items per page		Copy link Message link Add to gist Remove
	timotimo: gist.github.com/MasterDuke17/bf4bd...d8e892e481 updated with page sizes	13:11	Copy link Message link Add to gist Remove
timotimo	could you output how often a request for 2 or 4 bytes comes in? that's where rounding up is, relatively, the biggest overhead	13:12	Copy link Message link Add to gist Remove
	though in absolute terms it may not matter at all		Copy link Message link Add to gist Remove
lizmat	wouldn't it make sense to up an request for <8 to 8 on 64-bit machines, with the proper alignment and such ?	13:15	Copy link Message link Add to gist Remove
MasterDuke	timotimo: gist updated	13:16	Copy link Message link Add to gist Remove
timotimo	that's what the FSA currently does i think		Copy link Message link Add to gist Remove
MasterDuke	and remember, this is just for `raku -e ''`		Copy link Message link Add to gist Remove
timotimo	OK, it literally doesn't happen ever in this example		Copy link Message link Add to gist Remove
MasterDuke	i don't see any requests for 2 or 4 bytes. wonder if that would change on my branch that uses the fsa for strings	13:17	Copy link Message link Add to gist Remove
timotimo	good question	13:18	Copy link Message link Add to gist Remove
	anything smaller than or equal to 8 bytes could in theory have its data live where the pointer is	13:19	Copy link Message link Add to gist Remove
	but that's possibly a boatload of special case code all over the place		Copy link Message link Add to gist Remove
13:21 sena_kun joined 13:22 Altai-man_ left
lizmat	yeah, that smells of Perl type of optimizations :-)	13:27	Copy link Message link Add to gist Remove
timotimo	aye, it's fairly common in other places, like string implementations in c++	13:28	Copy link Message link Add to gist Remove
	i've seen it mostly be called "in-situ storage"		Copy link Message link Add to gist Remove
lizmat	reminds me of turning in IV to an NV or a PV or a GV o :-)	13:29	Copy link Message link Add to gist Remove
nine	Btw. for topics like this, the Linux kernel offers plenty of advanced techniques. It contains a range of allocators (like the slab cache), synchronization privimites and concurrency safe data structures	13:56	Copy link Message link Add to gist Remove
MasterDuke	yeah, i saw some articles on phoronix recently about facebook wanting to upstream a new slab allocator that apparently is already giving them tons of memory reduction in production	14:01	Copy link Message link Add to gist Remove
14:32 vesper11 left 14:36 vesper11 joined
MasterDuke	timotimo: why doesn't this patch gist.github.com/MasterDuke17/4840f...a0d5395ee2 show a reduced number of calls to add_page?	14:37	Copy link Message link Add to gist Remove
	ah, needed to also adjust alloc_limit	14:41	Copy link Message link Add to gist Remove
14:58 pamplemousse joined
timotimo	MasterDuke: can you try just increasing the page_size and not upping the num_pages?	15:03	Copy link Message link Add to gist Remove
15:19 Altai-man_ joined 15:22 sena_kun left
MasterDuke	timotimo: by increasing MVM_FSA_PAGE_ITEMS, or adding some other factor to the calculation?	15:34	Copy link Message link Add to gist Remove
timotimo	the same number you're currently using to generate more than one page could be used to make a single, bigger page instead	15:36	Copy link Message link Add to gist Remove
16:08 vesper11 left 16:09 vesper11 joined
MasterDuke	timotimo: i changed it to `if (bin == 2 \|\| bin == 5) page_size = 24; else if (bin == 15) page_size = 8; else if (bin <= 8) page_size *= 4;` in setup_bin and add_page. updated numbers in the gist	16:51	Copy link Message link Add to gist Remove
timotimo	that looks pretty good, right?	16:52	Copy link Message link Add to gist Remove
	am i wrong to expect some bin numbers to appear twice in that listing?	16:53	Copy link Message link Add to gist Remove
	since they would get one huge page and then normal-sized pages?		Copy link Message link Add to gist Remove
	or are the pages now always bigger, rather than just the very first one?		Copy link Message link Add to gist Remove
MasterDuke	always bigger	16:54	Copy link Message link Add to gist Remove
17:20 patrickb left, sena_kun joined 17:22 Altai-man_ left
timotimo	oh, hm, ok	17:38	Copy link Message link Add to gist Remove
	would you be interested in collecting the data for a core setting compilation?	17:40	Copy link Message link Add to gist Remove
18:03 zakharyas joined
MasterDuke	timotimo: sure. which stats? on my branch or master or both?	18:19	Copy link Message link Add to gist Remove
timotimo	both would be fine, but your branch is more interesting first maybe?	18:39	Copy link Message link Add to gist Remove
MasterDuke	but unmodified otherwise, right? i.e., not my my recent changes to increase the size of pages	18:40	Copy link Message link Add to gist Remove
timotimo	after would be okay i think	18:41	Copy link Message link Add to gist Remove
MasterDuke	log bytes requested? or just when adding pages?		Copy link Message link Add to gist Remove
timotimo	i thought "just when adding pages" for now		Copy link Message link Add to gist Remove
	the other would likely take ages to make the whole output		Copy link Message link Add to gist Remove
MasterDuke	true	18:43	Copy link Message link Add to gist Remove
timotimo	could perhaps want to turn it into a binary format, or at least much shorter output	18:45	Copy link Message link Add to gist Remove
	%d is probably a tiny bit slower than %x		Copy link Message link Add to gist Remove
MasterDuke	eh, doesn't really take too much longer		Copy link Message link Add to gist Remove
timotimo	ok	18:46	Copy link Message link Add to gist Remove
MasterDuke	gist.github.com/MasterDuke17/0cde6...89c817ef53	18:49	Copy link Message link Add to gist Remove
	oh, let me add the diff	18:50	Copy link Message link Add to gist Remove
	updated	18:51	Copy link Message link Add to gist Remove
19:07 farcas1982regreg left 19:19 Altai-man_ joined 19:22 sena_kun left 19:59 lucasb joined 21:02 zakharyas left
MasterDuke	timotimo: see anything useful there?	21:07	Copy link Message link Add to gist Remove
timotimo	it doesn't really speak to me just yet	21:14	Copy link Message link Add to gist Remove
	i'm wondering if checking what lines tend to do what sizes gives us any interesting spots, like spots that only ever allocate a single size? but no idea what that could actually give us		Copy link Message link Add to gist Remove
MasterDuke	fwiw, that diff against master brings down the number of instructions callgrind reports for `raku -e ''` by ~600k. from ~702,800,000 to ~702,200,000		Copy link Message link Add to gist Remove
	well, if they don't already maybe they could do a setelems	21:15	Copy link Message link Add to gist Remove
	guess it depends it the size they usually produce causes the array to realloc, and if so, how much is wasted?	21:16	Copy link Message link Add to gist Remove
	afk for a bit		Copy link Message link Add to gist Remove
21:20 sena_kun joined 21:22 Altai-man_ left
timotimo	i'm thinking more about the other places where the FSA is used, but yeah, setelems would be one that has boatloads of different sizes, i'd expect	21:24	Copy link Message link Add to gist Remove
22:09 dogbert11 joined
MasterDuke	it definitely seems worth it to me to somehow (either bigger pages or more created at setup) increase the initial size of the FSA. 400 (or 1k on my branch) calls to add_pages just for `raku -e ''` seems like an unneeded cost to pay	22:16	Copy link Message link Add to gist Remove
22:38 Kaiepi left, Kaeipi joined
timotimo	after a specific size we'll end up mmapping instead of mallocing, too	22:51	Copy link Message link Add to gist Remove
	i'm not entirely sure what the exact costs come down to for those two operations		Copy link Message link Add to gist Remove
22:53 MasterDuke left 23:11 Kaeipi left 23:14 Kaiepi joined 23:19 Altai-man_ joined 23:22 sena_kun left 23:49 lucasb left

Please report any issues / comments / feature requests as an issue on App::Raku::Log.

Thank you!