26 Apr 2024
nine __builtin_clzll is available on gcc and clang and compiles to a single assembly instruction which counts the number of leading 0s (for the number 0 we don't get an answer, so | it with 1 to have always at least 1 bit set). Thus to get the number of the most significant bit set we subtract that from 64. 08:10
Divide that by 3 to get the rough estimate of decimal digits for this number of binary digits and add 1 to avoid the missing digit when the division rounded down.
For a comparison of the results with the perfect answer run rakudo -e 'say "{(2 ** $_).Str.chars} {$_ div 3 + 1}" for ^64' 08:11
Callgrind seems to indicate that this makes the code actually faster. But I'm not sure I interpret callgrind's output correctly. Haven't used it in years, so that may just be wrong. 08:12
However: this whole investigation made me wonder about something completely different: 08:13
There's a clear bias here. The smaller the number, the more often it will appear in real world scenarios. Even your benchmark uses just the numbers up to 10 million. Indeed the same is true in general: the shorter the string the higher the lightlyhood of it appearing. 08:15
MVMString stores strings in a variety of representations: MVMGrapheme32, MVMGraphemeASCII, MVMGrapheme8 and MVMStringStrand. They all have in common that the actual string data lives in a malloced memory area. 08:17
This can mean that for a stringified 3 digit number, we alloc just 3 bytes of memory and then store an 8 byte pointer in MVMString pointing at this tiny memory area.
Depending on the allocator this 3 byte area may actually waste 4 bytes of memory or even 8. 08:18
Since MVMString already has all the machinery to distinguish between those different storage types, why not introduce one more for short strings that stores the characters directly in the MVMStringBody instead of those pointers? 08:19
I.e. union { MVMGrapheme32 *blob_32; MVMGraphemeASCII *blob_ascii; MVMGrapheme8 *blob_8; MVMStringStrand *strands; void *any; MVMGrapheme8[8] chars; } 08:20
I think this fares even better as it avoids the costly division: int msb = 64 - __builtin_clzll(i | 1); char *buffer = MVM_malloc(msb >> 1 - msb >> 3 - msb >> 4 + 1); 08:35
Actually since it's just 64 input values, we could also just use a lookup-table. 09:14
lizmat nine: this feels like an approach that Perl also uses internally for small strings? 09:23
I mean, the idea feels very familiar
nine Oops.... got my operator precedence wrong. Should be: char *buffer = MVM_malloc((msb >> 1) - (msb >> 3) - (msb >> 4) + 1); 09:33
Lookup-table seems to be the clear winner. It's most correct and also seems to be the fastest of the suggested options. Caveat being that correct handling of negative numbers is still missing (those will have lots of leading 1s instead of 0s) 09:47
m: (^64).map({(2 ** ($_ + 1) - 1).Str.chars}).join(", ").say
camelia 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11, 12, 12, 12, 13, 13, 13, 13, 14, 14, 14, 15, 15, 15, 16, 16, 16, 16, 17, 17, 17, 18, 18, 18, 19, 19, 19, 19, 20
japhb nine: I think a few years back someone (timo?) created a branch trying to do what you're talking about, including very short strings directly into the MVMStringBody. Sadly I don't know the branch name at this point .... 20:16
lizmat my $M := MoarVM::Bytecode.new("c"); 20:23
my %b is Bag = $M.map(*.name);
.say for %b.sort(-*.value).head(10);
set => 121516 20:24
dispatch_o => 108447
decont => 97621
wval => 79660
const_s => 61392
getcode => 60319
getlex => 32927
assertparamcheck => 25808
wval_wide => 25737
checkarity => 24944
bytecode introspection FTW
in the next release of MoarVM::Bytecode.... tomorrow 20:25
nine japhb: yeah I can't shake the feeling that this has been tried before and timo would be just the one to do that. 20:33
japhb: the branch is called in-situ-strings :) 20:34
Looks like exactly what I was thinking of including being able to store 2 grapheme32 in-situ (well I didn't have that word for it). Only thing it misses is using this storage format for "small" stringified numbers. 20:39
lizmat How big is the chance that a current MoarVM would be able to run a precompiled file of an older MoarVM? Would that even happen nowadays? 20:55
asking because it feels like the DEPRECATED opcodes could be removed
or at least, recycled
[Coke] IIRC, they have been recycled in the past. 20:56
I think when you bump the bytecode version number. 20:57
(is a point when it can happen)
MasterDuke ++nine for grant 20:58
[Coke] but IANAMD
lizmat doctor? 20:59
MasterDuke i was also trying to find the in-situ-string branch and couldn't remember the name and found this instead github.com/MoarVM/MoarVM/tree/shor...ring_cache
might make sense also 21:00
[Coke] moar develoer
lizmat hehe
afk& 21:01
MasterDuke timo1: any memory of the state of those branches? how much work would they need before ready to merge to main? 21:03