Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes. Set by lizmat on 24 May 2021. |
00:38
MasterDuke joined
MasterDuke | timo1++!! the branch is passing all the CI jobs that main is! | 00:39 | |
great find, i was definitely not looking at that code at all for the fix | |||
timo1 | :) | 00:46 | |
thanks for your effort to resurrect this code | |||
i've found two spots that were creating fresh string objects with 0 graphemes in them | |||
MasterDuke | likewise thanks for getting it started | 00:47 | |
that doesn't sound great | |||
timo1 | it was just the decode functions for latin1 and ascii; one of them was called at least from the generated config.c where we create a hash from what Configure.pl has found out, and some of these keys got an empty string bound. surely the majority of empty strings from latin1 came from somewhere else though | 00:48 | |
and the other is from cu_get_string: probably every CU has the empty string in it, and that was giving us one fresh empty string per compunit | |||
and there are many compunits when compiling the core setting, thanks to "compile in context" | 00:49 | ||
near the end of core setting compilation, there's 17.9% MVM_STRING_STRAND with just a single strand. i want to double-check that these don't have repetition or so, and maybe there can be an in-situ strand storage type | 00:50 | ||
MasterDuke | isn't there a VM constant for the empty string? | 00:51 | |
timo1 | there is, we just weren't returning it in the right spot :) | ||
extra fun: don't try to return the empty string constant always when the bytes count is 0, because surprise surprise, the very first empty string gets created in that same function :D | 00:52 | ||
MasterDuke | he | 00:56 | |
*heh | |||
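A minimal C sketch of the guard timo1 describes. It assumes a hypothetical `vm_t` with an `empty_string` slot and a hypothetical `decode_latin1()`; MoarVM's real structs and function names differ. The decoder only hands back the shared empty string once that constant exists, because during bootstrap this same function is what creates it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified string and VM types (not MoarVM's layout). */
typedef struct {
    size_t num_graphs;
    char *bytes;
} str_t;

typedef struct {
    str_t *empty_string; /* NULL until the first empty string is created */
} vm_t;

static str_t *new_string(const char *bytes, size_t len) {
    assert(bytes != NULL);
    str_t *s = malloc(sizeof *s);
    char *copy = malloc(len + 1);
    if (s == NULL || copy == NULL)
        abort();
    memcpy(copy, bytes, len);
    copy[len] = '\0';
    s->num_graphs = len;
    s->bytes = copy;
    return s;
}

/* Hypothetical latin-1 decoder: each byte becomes one grapheme. */
str_t *decode_latin1(vm_t *vm, const char *bytes, size_t len) {
    /* Reuse the shared empty string instead of allocating a fresh one,
     * but only once it exists: at bootstrap, this call creates it. */
    if (len == 0 && vm->empty_string != NULL)
        return vm->empty_string;
    return new_string(bytes, len);
}
```

Without the `!= NULL` check, the bootstrap call would return NULL where the very first empty string was expected.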
i was gratified by those stats you posted earlier, glad to see in-situ is being used a bunch | 00:57 | ||
timo1 | r: say 647437 * 0.2 | 01:02 | |
camelia | Can't open perl script "/home/camelia/rakudo-j-inst/bin/eval-client.pl": No such file or directory | ||
129487.4 | |||
timo1 | m: say 129487 / 256 | 01:03 | |
camelia | 505.808594 | ||
timo1 | there's like enough single-grapheme in_situ_8 strings to store every possible single byte 500 times :D | ||
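To picture what "in situ" means here, a C sketch under assumptions: a hypothetical `str_t` whose body is a union of a heap pointer and a pointer-sized inline buffer, with made-up storage type names `STR_IN_SITU_8` and `STR_BLOB_8`. The actual branch's field names and capacity may differ. A one-grapheme string like "$" then needs no separate allocation at all.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum storage_type { STR_IN_SITU_8, STR_BLOB_8 };

/* Bytes that fit in the space the heap pointer would otherwise occupy. */
#define IN_SITU_CAP sizeof(char *)

typedef struct {
    enum storage_type type;
    uint32_t num_graphs;
    union {
        char *blob_8;                 /* separate heap allocation */
        char in_situ_8[IN_SITU_CAP];  /* graphemes stored inline */
    } body;
} str_t;

str_t make_string(const char *bytes, uint32_t len) {
    str_t s;
    s.num_graphs = len;
    if (len <= IN_SITU_CAP) {
        s.type = STR_IN_SITU_8;
        memcpy(s.body.in_situ_8, bytes, len);  /* no extra allocation */
    } else {
        s.type = STR_BLOB_8;
        s.body.blob_8 = malloc(len);
        if (s.body.blob_8 == NULL)
            abort();
        memcpy(s.body.blob_8, bytes, len);
    }
    return s;
}

char grapheme_at(const str_t *s, uint32_t i) {
    assert(i < s->num_graphs);
    return s->type == STR_IN_SITU_8 ? s->body.in_situ_8[i]
                                    : s->body.blob_8[i];
}
```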
MasterDuke | don't you also have a branch with a cache of short string? | 01:04 | |
github.com/MoarVM/MoarVM/compare/m...ring_cache | |||
i think i started to rebase it, but it was a little trickier than the in-situ-string one | 01:05 | ||
but that might be a good next project after in-situ-strings is merged | |||
it's not actually that big of a diff | 01:06 | ||
timo1 | but is it actually helpful, i wonder | 01:07 | |
it's already a lot better to have the single character strings in situ instead of each with its own allocation | |||
MasterDuke | dunno if it's helpful, but have to rebase to measure | ||
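timo1 later describes the "cache one grapheme strings" branch as one where every place that can create a one-character string has to look in the cache first. A hedged C sketch of that creation-site lookup, using a hypothetical table with one slot per byte value; `single_char_cache` and `get_single_char()` are illustrative names, not the branch's code.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical one-grapheme string object. */
typedef struct {
    unsigned char ch;
} char_str_t;

/* One lazily filled slot per possible byte value. */
typedef struct {
    char_str_t *entries[UCHAR_MAX + 1];
} single_char_cache;

/* Each creation site (e.g. substr, split, chr) would call this instead of
 * allocating directly; any site that skips it still makes duplicates. */
char_str_t *get_single_char(single_char_cache *cache, unsigned char ch) {
    if (cache->entries[ch] == NULL) {
        char_str_t *s = malloc(sizeof *s);
        if (s == NULL)
            abort();
        s->ch = ch;
        cache->entries[ch] = s;
    }
    return cache->entries[ch];
}
```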
timo1 | my investigation into what single character string would be most worth caching: $ | 01:25 | |
:! 6.8; :$ 36.4; :& 3.1; :+ 1.6; :, 16.2; :- 1.2; :? 4.4; :@ 1.6; | |||
:D 4.9; :a 1.8; :b 1.4; :i 2.3; :| 1.5; | |||
left out anything with a too low percentage | 01:26 | ||
but i think it's very funny that caching $ would be worth so much | |||
MasterDuke | ugh, i'm seeing emojis | 01:27 | |
timo1 | hehe. ok on second thought maybe : isn't the best character to show "this is a character directly, rather than a hex code for the ascii value" | 01:28 | |
earlier in a run i get this: :1 5.6; :2 5.8; :3 5.6; :4 5.4; | 01:29 | ||
:5 5.4; :6 5.4; :7 5.4; :8 5.4; :9 5.2; | |||
MasterDuke | no 0? | 01:30 | |
timo1 | funny enough, no | 01:34 | |
MasterDuke | that's...surprising | ||
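A small C sketch of how percentages like the ones above could be tallied, assuming the single-character strings are fed in one byte at a time. `char_histogram` and its functions are hypothetical; timo1's actual instrumentation is not shown in the log.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Tally of single-character strings by byte value. */
typedef struct {
    size_t counts[UCHAR_MAX + 1];
    size_t total;
} char_histogram;

void histogram_add(char_histogram *h, unsigned char c) {
    h->counts[c]++;
    h->total++;
}

double histogram_percent(const char_histogram *h, unsigned char c) {
    assert(h->total > 0);
    return 100.0 * (double)h->counts[c] / (double)h->total;
}

/* Print entries at or above min_percent in the ":c pct;" style used in
 * the log, leaving out rare ones; returns how many were printed.
 * Non-printable bytes are printed raw, which is fine for a sketch. */
int histogram_print(const char_histogram *h, double min_percent) {
    int printed = 0;
    for (int c = 0; c <= UCHAR_MAX; c++) {
        if (h->counts[c] == 0)
            continue;
        double pct = histogram_percent(h, (unsigned char)c);
        if (pct >= min_percent) {
            printf(":%c %.1f; ", c, pct);
            printed++;
        }
    }
    printf("\n");
    return printed;
}
```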
timo1 | i think a good chunk of the medium-length in_situ_8 strings come from the lc/uc thing i might have a branch for | 01:35 | |
MasterDuke | ? | 01:36 | |
rakudo branch? | 01:43 | ||
timo1 | actually might be nqp? | 01:48 | |
ok, QRegex's MATCH uses nqp::split against "=" to figure out the list of names a capture has (it can be more than 1) and that generates quite an amount of strings | 02:15 | ||
MasterDuke | split should be making them in-situ if possible. but you think they're good candidates for the short-string-cache? | 02:18 | |
timo1 | they are in situ in this case yes | 02:23 | |
MasterDuke | off to bed. should have some time tomorrow to follow up though | 02:37 | |
05:46
[Coke] left
05:47
[Coke] joined
08:34
sena_kun joined
Geth | MoarVM/main: 2c866ae859 | (Patrick Böker)++ | src/core/exceptions.c Fix returning from LEAVE to surrounding scope When a LEAVE phaser is run during an unwind, there is no valid return_address set in the LEAVE phaser's outer frame. (Its return_address is actually set to interp_cur_op which is still the one that started the unwind in some unrelated frame.) Thus when a `return` in a LEAVE happens, the handler of the outer frame is missed, because `search_frame_handlers_lex()` ... (19 more lines) |
11:48 | |
MoarVM/main: 08848c7e45 | (Patrick Böker)++ (committed using GitHub Web editor) | src/core/exceptions.c Merge pull request #1785 from patrickbkr/leave-return-handler-miss-fix Fix returning from LEAVE to surrounding scope |
lizmat will do the appropriate bumping | 11:53 | ||
MasterDuke | what's the overall feel about github.com/MoarVM/MoarVM/pull/1802 now that CI is passing? ready to merge? anything else to add? more optimization needed? | 12:59 | |
lizmat | if you could also update github.com/MoarVM/MoarVM/blob/main...e.markdown with these changes, that would be nice | 13:07 | |
that will help me in updating MoarVM::Bytecode :-) | 13:08 | ||
MasterDuke | don't think anything would change there, i don't see mention of a string's storage type | 13:10 | |
lizmat | perhaps that *could* be documented though? | 13:11 | |
MasterDuke | yeah, probably a good idea | 13:12 | |
lizmat | I'm not sure that would make sense, but any more guidance in grokking MoarVM internals would be appreciated by people now and in the future :) | ||
possibly even yourself :-) | |||
MasterDuke | hm, why can't i profile compiling CORE.e? sometimes i can't do CORE.c because it used too much memory, but CORE.e should be fine | 13:21 | |
lizmat | perhaps CORE.e does something async? | 13:22 | |
MasterDuke | can't do CORE.d either | 13:23 | |
timo1 | i wonder if the microbenchmark where the number of allocations went way down but the time didn't shows how good mimalloc is at many very short-lived allocations, or something like that | 15:00 | |
the implication might be "the microbenchmark is too far removed from real use cases"? | 15:21 | ||
i thought a bit more about the short string cache; when building the stuff that generates statistics about strings, I've come to appreciate the lifetime of the worklist object, and that it's passed into every gc_mark function | 20:56 | ||
it's also per-thread, so accessing it is always safe, no need to worry about concurrently running gcs of different threads | 20:57 | ||
and doesn't stay around in between gc runs | 20:58 | ||
the existing branch, the "cache one grapheme strings" one, had to make sure that everything that could create strings with just one character would look in the cache first | 21:00 | ||
that's a lot more places that have to be changed to take advantage of this, whereas giving the gc collect phase something that deduplicates strings would hit every string, however it was created. the drawback is that the string still makes it into the nursery and therefore causes the next gc run to happen sooner | 21:01 | ||
a cache that also holds longer strings would be beneficial for things like common keys in hashes, though | |||
we build boatloads of these when parsing code, for all the named match captures | 21:02 | ||
and as i've seen whenever we have something like <foo=bar> we recreate the foo and maybe also the bar string and put them as keys to a newly created hash | |||
that might be a lot of potential for deduplication there | 21:03 | ||
MasterDuke | oh interesting. reducing mem used during parse/build would be very welcome | 21:17 | |
21:18
vrurg joined
timo1 | having the gc run do the caching and finding also means that the cost of that is only paid for strings that survive long enough for GC to happen, whereas having it at the spot where, for example, substr or split creates it would be paid for all strings, even the very short-lived ones | 22:19 | |
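A hedged C sketch of the GC-side idea: a table that, like the worklist, belongs to one thread and lives only for one collection, mapping string contents to the first surviving copy seen. The flat `str_t`, the fixed-size linear-probing table and the FNV-1a hash are all assumptions chosen for brevity; none of this is MoarVM's GC API. Only strings the collector actually visits pay for the lookup, which is the cost argument timo1 makes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flat string: just a byte buffer and its length. */
typedef struct {
    const char *chars;
    size_t len;
} str_t;

#define DEDUP_SLOTS 1024  /* power of two; a real table would grow */

/* Hypothetical per-thread table, created at the start of a collection
 * and discarded at its end, like the worklist. */
typedef struct {
    str_t *slots[DEDUP_SLOTS];
} dedup_table;

static uint64_t hash_bytes(const char *p, size_t len) {
    uint64_t h = 14695981039346656037ULL;  /* FNV-1a offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)p[i];
        h *= 1099511628211ULL;             /* FNV-1a prime */
    }
    return h;
}

/* Called for each surviving string while collecting: returns the canonical
 * copy, so the caller can redirect its reference and let duplicates die. */
str_t *dedup_string(dedup_table *t, str_t *s) {
    size_t i = (size_t)(hash_bytes(s->chars, s->len) & (DEDUP_SLOTS - 1));
    for (size_t probes = 0; probes < DEDUP_SLOTS; probes++) {
        str_t *cur = t->slots[i];
        if (cur == NULL) {
            t->slots[i] = s;               /* first time seen: canonical */
            return s;
        }
        if (cur->len == s->len && memcmp(cur->chars, s->chars, s->len) == 0)
            return cur;                    /* duplicate: reuse existing */
        i = (i + 1) & (DEDUP_SLOTS - 1);   /* linear probing */
    }
    return s;                              /* table full: skip dedup */
}
```

The same lookup would catch repeated hash keys such as the capture names built for `<foo=bar>`, since it compares contents regardless of length.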
22:24
sena_kun left
timo1 | it would be bonkers if we could store in-situ characters inside a strand's pointer that normally goes to another string | 22:44 | |
no idea how often that would actually work | 22:45 |
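timo1 floats this as a speculative idea. One way it could look, sketched in C under stated assumptions: string objects are at least 2-byte aligned, so bit 0 of a real pointer is always 0, and a set bit 0 marks a word that instead carries a length and up to `sizeof(uintptr_t) - 1` bytes inline. `strand_ref` and the helper names are hypothetical, and the sketch assumes 8-bit chars.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opaque string type; assumed allocated with at least 2-byte alignment. */
typedef struct str_obj str_obj;

/* Either a real pointer (bit 0 clear) or an inline payload (bit 0 set):
 * byte 0 = (length << 1) | 1, bytes 1.. = the characters. */
typedef uintptr_t strand_ref;

#define STRAND_INLINE_CAP (sizeof(uintptr_t) - 1)

strand_ref strand_from_pointer(const str_obj *s) {
    uintptr_t v = (uintptr_t)s;
    assert((v & 1u) == 0);  /* relies on alignment leaving bit 0 free */
    return v;
}

int strand_is_inline(strand_ref r) {
    return (int)(r & 1u);
}

strand_ref strand_from_chars(const char *chars, size_t len) {
    assert(len <= STRAND_INLINE_CAP);
    uintptr_t v = ((uintptr_t)len << 1) | 1u;
    for (size_t i = 0; i < len; i++)
        v |= (uintptr_t)(unsigned char)chars[i] << (8 * (i + 1));
    return v;
}

size_t strand_inline_len(strand_ref r) {
    assert(strand_is_inline(r));
    return (size_t)((r & 0xFFu) >> 1);
}

char strand_inline_char(strand_ref r, size_t i) {
    assert(strand_is_inline(r) && i < strand_inline_len(r));
    return (char)((r >> (8 * (i + 1))) & 0xFFu);
}
```

Whether this pays off depends on how often strands point at strings that short, which the log leaves open.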