github.com/moarvm/moarvm | IRC logs at colabti.org/irclogger/irclogger_logs/moarvm Set by AlexDaniel on 12 June 2018. |
|||
00:20
discoD joined
00:53
lucasb left
05:48
pamplemousse left
06:11
farcas1982regreg joined
06:53
farcas1982regreg left
06:54
farcas1982regreg joined
07:02
discoD left
|
|||
nine | dogbert17, Altai-man_: I've thought about that commit for a bit when I was in bed, but I'm not sure. It's a hard hit on performance, but it also supposedly fixes the regression github.com/rakudo/rakudo/issues/3569 | 07:19 | |
Geth | MoarVM/master: 4 commits pushed by (Dave Lewis)++, niner++ | 07:26 | |
MasterDuke | timotimo: did you mean put the code to increase the initial number of FSA bin pages in rakudo/rakudo-m.c? | 08:05 | |
08:17
patrickb joined
08:42
Altai-man_ joined
09:20
sena_kun joined
09:22
Altai-man_ left
10:04
farcas1982regreg left,
farcas1982regreg joined
10:06
Geth joined
10:12
farcas1982regreg left,
farcas1982regreg joined
|
|||
timotimo | MasterDuke:i think so, yeah | 10:43 | |
11:19
Altai-man_ joined
11:22
sena_kun left
|
|||
MasterDuke | timotimo: currently github.com/rakudo/rakudo/blob/mast...build.c.in doesn't really do much other than execing moar with some arguments. think it should be a new argument to moar? maybe a mapping of bins and initial # of pages? | 12:04 | |
timotimo | oh | 12:07 | |
MasterDuke | btw, those numbers were on my branch. i updated the gist gist.github.com/MasterDuke17/bf4bd...d8e892e481 to include numbers on master also | 12:25 | |
re github.com/MoarVM/MoarVM/blob/mast...c.c#L5-L10 what other improvements are possible? | 12:27 | ||
timotimo | do hashes use the FSA already? i do believe i wrote code for that once, but not sure if it got merged | 12:37 | |
MasterDuke | think so | 12:38 | |
github.com/MoarVM/MoarVM/blob/mast...Hash.c#L34 | 12:39 | ||
timotimo | we could probably have a look at how many instances of each set of keys we have at multiple points during compilation, for example | ||
someone, perhaps jonathan, mentioned replacing the most common hashes with classes perhaps? | 12:40 | ||
MasterDuke | yeah, lizmat and jnthn were chatting about that recently | 12:41 | |
lizmat | yeah, but that was in the bootstrap mostly | 12:42 | |
MasterDuke | but i was more thinking, are there improvements to the FSA itself. if we end up using it for more things, making it better/faster would have a bigger impact | ||
lizmat | MasterDuke: care to blog about the work you've been doing? | ||
lizmat sure would like to understand better what all of this FSA business is :-) | 12:43 | ||
timotimo | the bootstrap, eh? | 12:44 | |
i wonder if there's anything we can do with QAST::Want nodes; half the time they have three children, one QAST::WVal, one literal string, and one QAST::IVal, QAST::SVal, or QAST::NVal | 12:46 | ||
the other half they have one sub-tree, the "v" or "Vv" or whatever literal string, and another subtree | |||
that's a three-item array every time | 12:47 | ||
but i think we use $blerp[0] a lot of the time, and for @$blerp and such | 12:48 | ||
changing all of QAST code to work with that is probably not quickly done, and the payoff isn't even assured | |||
also, modifications to the array are often done as well | 12:49 | ||
MasterDuke | lizmat: honestly, i'm rather unlikely to blog. i've never blogged anything online before and i don't really like writing. what's more likely is maybe doing a presentation sometime/somewhere. i actually like public speaking even less than writing, but feel like there's more of a reward | ||
lizmat | ok, well, eh, I would take a paragraph about FSA here also :-) | 12:50 | |
timotimo | if the Md would prefer, i can give a bit of an explanation | ||
MasterDuke | go for it, please | 12:51 | |
timotimo | sure | ||
ok, so we're mostly switching places that use malloc to places that use the FSA | 12:52 | ||
MasterDuke | if github.com/MoarVM/MoarVM/pull/689 does get merged maybe i could be persuaded to write a paragraph for the weekly | ||
timotimo | malloc has a couple of nice affordances, like every piece of allocated memory has pertinent information stored next to it, like the size of the allocated piece of memory | 12:53 | |
that way, free can know to do the right thing when you give it a pointer that had come from malloc or realloc | |||
lizmat | so far so good :-) | ||
timotimo | but in most places we keep around a "size of the allocated memory piece" anyway, for example when we're adding stuff over time and realloc whenever space runs out | ||
there isn't a public API to get the size of a malloced piece of memory, at least to my knowledge | 12:54 | ||
at the very least not portable i guess | |||
storing the information next to the data makes it convenient to use, but it also increases the amount of memory taken up by allocated memory | |||
and malloc also rounds up, not exactly sure how much, but probably generously | 12:55 | ||
and malloc also has to handle allocations of all kinds of different sizes, and is expected to pack them efficiently in memory | |||
lizmat | still with you | 12:56 | |
timotimo | the FSA has a different API that always requires you to pass the size of an allocated thing whenever you do an operation on the allocated data | ||
so fsa_alloc takes the size just like malloc does, but free also takes the size | |||
fsa_free in the last one there | 12:57 | ||
and fsa_realloc takes the old size and the new size | |||
lizmat | hmmm... that *feels* potentially more fragile | 12:58 | |
timotimo | yes, that's right | ||
that's what makes changing from malloc to fsa a bit more difficult | |||
lizmat | otoh, you would have all memory allocation data in one place | ||
timotimo | that's right | ||
the FSA allocates space for one class of sizes for like a hundred items at a time | 12:59 | ||
lizmat | but that also implies that fsa_alloc needs to be smarter with regards to memory fragmentation, no ? | ||
timotimo | the trick is that it's split into these "bins" where all objects are about the same size | ||
lizmat | not exactly the same size? | ||
timotimo | so you don't get the "where can i fit this?" problem that gives you internal fragmentation | ||
not exactly, no, but only rounded up a little bit | 13:00 | ||
lizmat | I could see that everything in a bin has the same size, even if less is actually needed | ||
turning a pointer into an index ? | |||
timotimo | that would require everything that wants to use fsa-allocated memory to be a bit more complicated | ||
and going from an index to a usable pointer that regular code can work with may be expensive, too | 13:01 | ||
so we do keep regular pointers as the API | |||
lizmat | ah, there's that of course :-) | ||
timotimo | so you can imagine that especially for very small allocations the FSA has far less overhead than using malloc | ||
allocation into the FSA is done via a free list, i.e. whenever you free something, the free list now starts with that item and the item points at the previously first-in-line free spot and so on | 13:02 | ||
if the free list is exhausted, a new page is added | 13:03 | ||
on top of that, last year or so the FSA got per-thread free-lists in addition to the global free-list | |||
along with some heuristics for situations like producer-consumer, where one thread is the one that does all the allocations and another thread is the one doing all the frees | |||
lizmat | I see, so we're at the bleeding edge of memory allocation logic | ||
timotimo | not entirely, there's still some really impressive tricks | 13:04 | |
but we're pretty far along :) | |||
anyway, the per-thread free-lists are cool because when you're allocating and freeing the same size of object over and over, you may stay in a situation where you never need to hold a lock or do any inter-thread synchronization of any kind | 13:05 | ||
MasterDuke | my recent stats about number of add_page calls just running `raku -e ''` suggests to me that we need to increase the amount of memory we initially malloc to populate the FSA | 13:06 | |
timotimo | not sure if i may have missed anything | ||
MasterDuke: can you output the size of the pages we actually allocate for each bin, and how many items fit in each? | |||
i think the numbers are different for each bin | 13:07 | ||
lizmat | timotimo: thanks for this exposé, pretty sure I'm not the only one who found this enlighting | 13:08 | |
timotimo | sure thing! | ||
oh! how could i forget this | 13:09 | ||
the FSA also has a neat little mechanism called "free at safepoint" | |||
with that you can free memory pieces that other threads may currently still be accessing, without causing trouble | 13:10 | ||
because FSA Safepoints™ only happen outside of functions, usually | |||
MasterDuke | timotimo: github.com/MoarVM/MoarVM/blob/mast....h#L91-L92 suggests it's always 128 items per page | ||
timotimo: gist.github.com/MasterDuke17/bf4bd...d8e892e481 updated with page sizes | 13:11 | ||
timotimo | could you output how often a request for 2 or 4 bytes comes in? that's where rounding up is, relatively, the biggest overhead | 13:12 | |
though in absolute terms it may not matter at all | |||
lizmat | wouldn't it make sense to up an request for <8 to 8 on 64-bit machines, with the proper alignment and such ? | 13:15 | |
MasterDuke | timotimo: gist updated | 13:16 | |
timotimo | that's what the FSA currently does i think | ||
MasterDuke | and remember, this is just for `raku -e ''` | ||
timotimo | OK, it literally doesn't happen ever in this example | ||
MasterDuke | i don't see any requests for 2 or 4 bytes. wonder if that would change on my branch that uses the fsa for strings | 13:17 | |
timotimo | good question | 13:18 | |
anything smaller than or equal to 8 bytes could in theory have its data live where the pointer is | 13:19 | ||
but that's possibly a boatload of special case code all over the place | |||
13:21
sena_kun joined
13:22
Altai-man_ left
|
|||
lizmat | yeah, that smells of Perl type of optimizations :-) | 13:27 | |
timotimo | aye, it's fairly common in other places, like string implementations in c++ | 13:28 | |
i've seen it mostly be called "in-situ storage" | |||
lizmat | reminds me of turning in IV to an NV or a PV or a GV o :-) | 13:29 | |
nine | Btw. for topics like this, the Linux kernel offers plenty of advanced techniques. It contains a range of allocators (like the slab cache), synchronization privimites and concurrency safe data structures | 13:56 | |
MasterDuke | yeah, i saw some articles on phoronix recently about facebook wanting to upstream a new slab allocator that apparently is already giving them tons of memory reduction in production | 14:01 | |
14:32
vesper11 left
14:36
vesper11 joined
|
|||
MasterDuke | timotimo: why doesn't this patch gist.github.com/MasterDuke17/4840f...a0d5395ee2 show a reduced number of calls to add_page? | 14:37 | |
ah, needed to also adjust alloc_limit | 14:41 | ||
14:58
pamplemousse joined
|
|||
timotimo | MasterDuke: can you try just increasing the page_size and not upping the num_pages? | 15:03 | |
15:19
Altai-man_ joined
15:22
sena_kun left
|
|||
MasterDuke | timotimo: by increasing MVM_FSA_PAGE_ITEMS, or adding some other factor to the calculation? | 15:34 | |
timotimo | the same number you're currently using to generate more than one page could be used to make a single, bigger page instead | 15:36 | |
16:08
vesper11 left
16:09
vesper11 joined
|
|||
MasterDuke | timotimo: i changed it to `if (bin == 2 || bin == 5) page_size *= 24; else if (bin == 15) page_size *= 8; else if (bin <= 8) page_size *= 4;` in setup_bin and add_page. updated numbers in the gist | 16:51 | |
timotimo | that looks pretty good, right? | 16:52 | |
am i wrong to expect some bin numbers to appear twice in that listing? | 16:53 | ||
since they would get one huge page and then normal-sized pages? | |||
or are the pages now always bigger, rather than just the very first one? | |||
MasterDuke | always bigger | 16:54 | |
17:20
patrickb left,
sena_kun joined
17:22
Altai-man_ left
|
|||
timotimo | oh, hm, ok | 17:38 | |
would you be interested in collecting the data for a core setting compilation? | 17:40 | ||
18:03
zakharyas joined
|
|||
MasterDuke | timotimo: sure. which stats? on my branch or master or both? | 18:19 | |
timotimo | both would be fine, but your branch is more interesting first maybe? | 18:39 | |
MasterDuke | but unmodified otherwise, right? i.e., not my my recent changes to increase the size of pages | 18:40 | |
timotimo | after would be okay i think | 18:41 | |
MasterDuke | log bytes requested? or just when adding pages? | ||
timotimo | i thought "just when adding pages" for now | ||
the other would likely take ages to make the whole output | |||
MasterDuke | true | 18:43 | |
timotimo | could perhaps want to turn it into a binary format, or at least much shorter output | 18:45 | |
%d is probably a tiny bit slower than %x | |||
MasterDuke | eh, doesn't really take too much longer | ||
timotimo | ok | 18:46 | |
MasterDuke | gist.github.com/MasterDuke17/0cde6...89c817ef53 | 18:49 | |
oh, let me add the diff | 18:50 | ||
updated | 18:51 | ||
19:07
farcas1982regreg left
19:19
Altai-man_ joined
19:22
sena_kun left
19:59
lucasb joined
21:02
zakharyas left
|
|||
MasterDuke | timotimo: see anything useful there? | 21:07 | |
timotimo | it doesn't really speak to me just yet | 21:14 | |
i'm wondering if checking what lines tend to do what sizes gives us any interesting spots, like spots that only ever allocate a single size? but no idea what that could actually give us | |||
MasterDuke | fwiw, that diff against master brings down the number of instructions callgrind reports for `raku -e ''` by ~600k. from ~702,800,000 to ~702,200,000 | ||
well, if they don't already maybe they could do a setelems | 21:15 | ||
guess it depends it the size they usually produce causes the array to realloc, and if so, how much is wasted? | 21:16 | ||
afk for a bit | |||
21:20
sena_kun joined
21:22
Altai-man_ left
|
|||
timotimo | i'm thinking more about the other places where the FSA is used, but yeah, setelems would be one that has boatloads of different sizes, i'd expect | 21:24 | |
22:09
dogbert11 joined
|
|||
MasterDuke | it definitely seems worth it to me to somehow (either bigger pages or more created at setup) increase the initial size of the FSA. 400 (or 1k on my branch) calls to add_pages just for `raku -e ''` seems like an unneeded cost to pay | 22:16 | |
22:38
Kaiepi left,
Kaeipi joined
|
|||
timotimo | after a specific size we'll end up mmapping instead of mallocing, too | 22:51 | |
i'm not entirely sure what the exact costs come down to for those two operations | |||
22:53
MasterDuke left
23:11
Kaeipi left
23:14
Kaiepi joined
23:19
Altai-man_ joined
23:22
sena_kun left
23:49
lucasb left
|