Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes.
Set by lizmat on 24 May 2021.
00:38 MasterDuke joined
MasterDuke timo1++!! the branch is passing all the CI jobs that main is! 00:39
great find, i was definitely not looking at that code at all for the fix
timo1 :) 00:46
thanks for your effort to resurrect this code
i've found two spots that were creating fresh string objects with 0 graphemes in them
MasterDuke likewise thanks for getting it started 00:47
that doesn't sound great
timo1 it was just the decode functions for latin1 and ascii; one of them was called at least from the generated config.c where we create a hash from what Configure.pl has found out, and some of these keys got an empty string bound. surely the majority of empty strings from latin1 came from somewhere else though 00:48
and the other is from cu_get_string, which every CU probably has the empty string in it, and that was giving us one fresh empty string per compunit
which there's many compunit when compiling the core setting, thanks to "compile in context" 00:49
near the end of core setting compilation, there's 17.9% MVM_STRING_STRAND with just a single strand. i want to double-check that these don't have repetition or so, and maybe there can be an in-situ strand storage type 00:50
MasterDuke isn't there a VM constant for the empty string? 00:51
timo1 there is, we just weren't returning it in the right spot :)
extra fun: don't try to return the empty string constant always when the bytes count is 0, because surprise surprise, the very first empty string gets created in that same function :D 00:52
MasterDuke he 00:56
*heh
i was gratified by those stats you posted earlier, glad to see in-situ is being used a bunch 00:57
timo1 r: say 647437 * 0.2 01:02
camelia Can't open perl script "/home/camelia/rakudo-j-inst/bin/eval-client.pl": No such file or directory
129487.4
timo1 m: say 129487 / 256 01:03
camelia 505.808594
timo1 there's like enough single-grapheme in_situ_8 strings to store every possible single byte 500 times :D
MasterDuke don't you also have a branch with a cache of short string? 01:04
github.com/MoarVM/MoarVM/compare/m...ring_cache
i think i started to rebase it, but it was a little trickier than the in-situ-string one 01:05
but that might be a good next project after in-situ-strings is merged
it's not actually that big of a diff 01:06
timo1 but is it actually helpful, i wonder 01:07
it's already a lot better to have the single character strings in situ instead of with its own allocation
MasterDuke dunno if it's helpful, but have to rebase to measure
timo1 my investigation into what single character string would be most worth caching: $ 01:25
:! 6.8; :$ 36.4; :& 3.1; :+ 1.6; :, 16.2; :- 1.2; :? 4.4; :@ 1.6;
:D 4.9; :a 1.8; :b 1.4; :i 2.3; :| 1.5;
left out anything with a too low percentage 01:26
but i think it's very funny that caching $ would be worth so much
MasterDuke ugh, i'm seeing emojis 01:27
timo1 hehe. ok on second thought maybe : isn't the best character to show "this is a character directly, rather than a hex code for the ascii value" 01:28
earlier in a run i get this: :1 5.6; :2 5.8; :3 5.6; :4 5.4; 01:29
:5 5.4; :6 5.4; :7 5.4; :8 5.4; :9 5.2;
MasterDuke no 0? 01:30
timo1 funny enough, no 01:34
MasterDuke that's...surprising
timo1 i think a good chunk of the medium-length in_situ_8 strings come from the lc/uc thing i might have a branch for 01:35
MasterDuke ? 01:36
rakudo branch? 01:43
timo1 actually might be nqp? 01:48
ok, QRegex's MATCH uses nqp::split against "=" to figure out the list of names a capture has (it can be more than 1) and that generates quite an amount of strings 02:15
MasterDuke split should be making them in-situ if possible. but you think they're good candidates for the short-string-cache? 02:18
timo1 they are in situ in this case yes 02:23
MasterDuke off to bed. should have to some time tomorrow to follow up though 02:37
05:46 [Coke] left 05:47 [Coke] joined 08:34 sena_kun joined
Geth MoarVM/main: 2c866ae859 | (Patrick Bƶker)++ | src/core/exceptions.c
Fix returning from LEAVE to surrounding scope

When a LEAVE phaser is run during an unwind, there is no valid return_address set in the LEAVE phasers outer frame. (It's return_address is actually set to interp_cur_op which is still the one that started the unwind in some unrelated frame.) Thus when a `return` in a LEAVE happens, the handler of the outer frame is missed, because `search_frame_handlers_lex()` ... (19 more lines)
11:48
MoarVM/main: 08848c7e45 | (Patrick Bƶker)++ (committed using GitHub Web editor) | src/core/exceptions.c
Merge pull request #1785 from patrickbkr/leave-return-handler-miss-fix

Fix returning from LEAVE to surrounding scope
lizmat will do the appropriate bumping 11:53
MasterDuke what's the overall feel about github.com/MoarVM/MoarVM/pull/1802 now that CI is passing? ready to merge? anything else to add? more optimization needed? 12:59
lizmat if you could also update github.com/MoarVM/MoarVM/blob/main...e.markdown with these changes, that would be nice 13:07
that will help me in updating MoarVM::Bytecode :-) 13:08
MasterDuke don't think anything would change there, i don't see mention of a string's storage type 13:10
lizmat perhaps that *could* be documented though? 13:11
MasterDuke yeah, probably a good idea 13:12
lizmat I'm not sure that would make sense, but any more guidance in grokking MoarVM internals would be appreciated by people now and in the future :)
possibly even yourself :-)
MasterDuke hm, why can't i profile compiling CORE.e? sometimes i can't do CORE.c because it used too much memory, but CORE.e should be fine 13:21
lizmat perhaps CORE.e does something async? 13:22
MasterDuke can't do CORE.d either 13:23
timo1 i wonder if the microbenchmark where the number of allocations went way down but the time didn't shows how good mimalloc is at many very short-lived allocations, or something like that 15:00
the implication might be "the microbenchmark is too far removed from real use cases"? 15:21
i thought a bit more about the short string cache; when building the stuff that generates statistics about strings, I've come to appreciate the lifetime of the worklist object, and that it's passed into every gc_mark function 20:56
it's also per-thread, so accessing it is always safe, no need to worry about concurrently running gcs of different threads 20:57
and doesn't stay around in between gc runs 20:58
the existing branch, the "cache one grapheme strings" one, had to make sure that everything that could create strings with just one character would look in the cache first 21:00
that's a lot more places that have to be changed to take advantage of this, whereas giving the gc collect phase something that deduplicates strings would hit every string created how-ever. with the drawback that the string makes it into the nursery and therefore causes the next gc run to be sooner 21:01
a cache that also holds longer strings would be beneficial for things like common keys in hashes, though
we build boatloads of these when parsing code, for all the named match captures 21:02
and as i've seen whenever we have something like <foo=bar> we recreate the foo and maybe also the bar string and put them as keys to a newly created hash
that might be a lot of potential for deduplication there 21:03
MasterDuke oh interesting. reducing mem used during parse/build would be very welcome 21:17
21:18 vrurg joined
timo1 having the gc run do the caching and finding also means that the cost of that is only paid for strings that survive long enough for GC to happen, whereas having it at the spot where, for example, substr or split creates it would be paid for all strings, even the very short-lived ones 22:19
22:24 sena_kun left
timo1 it would be bonkers if we could store in-situ characters inside a strand's pointer that normally goes to another string 22:44
no idea how often that would actually work 22:45