#moarvm on 26 July 2020 - Raku Programming Language Log

github.com/moarvm/moarvm \| IRC logs at colabti.org/irclogger/irclogger_logs/moarvm Set by AlexDaniel on 12 June 2018.
00:15 sena_kun joined 00:16 Altai-man_ left 01:23 lucasb left 02:13 Altai-man_ joined 02:16 sena_kun left 04:09 leont joined 04:14 sena_kun joined 04:15 sivoais_ left, sivoais joined 04:16 Altai-man_ left 06:13 Altai-man_ joined 06:16 sena_kun left 08:14 sena_kun joined 08:16 Altai-man_ left 09:13 MasterDuke joined 10:13 Altai-man_ joined 10:16 sena_kun left 12:14 sena_kun joined 12:17 Altai-man_ left 12:47 Ven`` joined 12:59 sena_kun1 joined 14:05 sena_kun1 left 14:14 Altai-man_ joined 14:16 sena_kun left 15:45 Voldenet left 15:57 Voldenet joined, Voldenet left, Voldenet joined 16:15 sena_kun joined 16:16 Altai-man_ left 16:26 zakharyas joined 17:18 Ven`` left 17:36 Ven`` joined 18:14 Altai-man_ joined 18:16 sena_kun left 20:15 sena_kun joined 20:17 Altai-man_ left 20:51 zakharyas left 21:03 Kaeipi joined, Kaiepi left
timotimo	i wonder if the internal structure of NFA should be changed a bit	21:12	Copy link Message link Add to gist Remove
	currently we have an array of arrays for the state infos		Copy link Message link Add to gist Remove
	i.e. the "kind of edge", the target state, and the parameters, like what grapheme or some such	21:13	Copy link Message link Add to gist Remove
MasterDuke	array of classes instead?		Copy link Message link Add to gist Remove
timotimo	hm?		Copy link Message link Add to gist Remove
MasterDuke	didn't jnthn say recently that accessing class attributes was even faster than arrays?	21:14	Copy link Message link Add to gist Remove
timotimo	i'm talking about C code right now	21:15	Copy link Message link Add to gist Remove
21:16 leont left
timotimo	my thought is to put all the MVMNFAStateInfo entries in one long array and instead of having an array with "how many edges does each state have" we'd have "at what index do this state's edges start?" in one array, and one last entry for where the total array ends	21:18	Copy link Message link Add to gist Remove
	whenever we're "at" a state, we go through all the outgoing edges, checking what their "act" is to see how the check should work, then the "extra data" for whatever kind of check it is, and only when the check is successful do we use the "to" value	21:19	Copy link Message link Add to gist Remove
	i think that the percentage of successful checks can be relatively small in general	21:20	Copy link Message link Add to gist Remove
	and hopefully there's also cases where none of the checks succeed		Copy link Message link Add to gist Remove
	so my thinking is, if the "to" values live in a separate array, there would be fewer accesses to that array, and going through the rest of the data could be faster		Copy link Message link Add to gist Remove
	in other words, the stuff we always need to look at should be the only stuff that gets fetched always	21:21	Copy link Message link Add to gist Remove
	and an array of arrays could very well become a little fragmented in memory	21:22	Copy link Message link Add to gist Remove
	plus, for states that have only one or two outgoing edges anyway, the allocation overhead of malloc (or do we use the fixed size allocator already here?) could add up?		Copy link Message link Add to gist Remove
	OK, we're already using the fixed size allocator	21:24	Copy link Message link Add to gist Remove
MasterDuke	interesting	21:25	Copy link Message link Add to gist Remove
timotimo	MVMNFAStateInfo is 24 bytes big already	21:27	Copy link Message link Add to gist Remove
	moving the "to" out of it will drop that down to 16, which i imagine is a lovely tiny win	21:28	Copy link Message link Add to gist Remove
MasterDuke	how many of them are there?	21:31	Copy link Message link Add to gist Remove
timotimo	also, to is 64bit, but since it's an index into the states array, having 32bit could be enough		Copy link Message link Add to gist Remove
MasterDuke	are there ever more than 65k states?	21:32	Copy link Message link Add to gist Remove
timotimo	gist.github.com/timo/308debec42f59...796c5e545e - here's a little excerpt from the beginning of compiling the core setting	21:34	Copy link Message link Add to gist Remove
	not all nfas are deserialized from the very beginning, of course		Copy link Message link Add to gist Remove
MasterDuke	so maybe 16bit would be sufficient?	21:35	Copy link Message link Add to gist Remove
timotimo	seems like we already reach the 20k every now and then		Copy link Message link Add to gist Remove
	to be totally safe, unsigned 32bit integer should definitely be fine	21:36	Copy link Message link Add to gist Remove
21:39 Ven`` left
timotimo	mhhh ... huh? NFA's deserialize function isn't being called?	21:39	Copy link Message link Add to gist Remove
	updated the gist with what nfa_from_statelist creates	21:43	Copy link Message link Add to gist Remove
MasterDuke	hm. yeah, 16 might be too small	21:53	Copy link Message link Add to gist Remove
22:04 Kaeipi left
timotimo	at some point we could generate some mvm bytecode for extremely small nfas	22:04	Copy link Message link Add to gist Remove
	i'm not sure if there would be any significant winnings, though	22:05	Copy link Message link Add to gist Remove
	aaaaaanyway, if we have a big one like the 37k edges over 21.8k states, we'd have up to 21.8k calls to fixed_size_free to free it up, as well as the same amount of calls to fixed_size_alloc t create these arrays in the first place	22:08	Copy link Message link Add to gist Remove
	if it all goes into one big flat array, that'd just be one call, that i guess often hits malloc directly because it's too big for even the biggest fsa bins	22:09	Copy link Message link Add to gist Remove
MasterDuke	tradeoffs	22:10	Copy link Message link Add to gist Remove
timotimo	not sure there's a downside here :)		Copy link Message link Add to gist Remove
	one big malloc is better than a hundred small fsa's		Copy link Message link Add to gist Remove
MasterDuke	yeah, probably	22:11	Copy link Message link Add to gist Remove
22:15 Altai-man_ joined 22:16 sena_kun left
timotimo	some of the arguments are just an MVMGrapheme32, other times it's a MVMString *, an MVMint64, or two MVMGrapheme32	22:17	Copy link Message link Add to gist Remove
	i wonder how often it's the 32bit one, because that'd be a waste of 32bits every time		Copy link Message link Add to gist Remove
	gist.github.com/timo/308debec42f59...796c5e545e	22:22	Copy link Message link Add to gist Remove
	at the expense of one number per state, a "buffer" of mixed 64bit and 32bit values could be indexed, and then traversed one-by-one, since we're going through all edges anyway; there's a binary search optimization in there, though, which may make this a bit mre difficult	22:23	Copy link Message link Add to gist Remove
22:33 Kaiepi joined
timotimo	oh, hmm, doing the binary search thing adds edges in the middle of the nfa, though	22:49	Copy link Message link Add to gist Remove
	so i'd have to memmove a bunch of stuff ...		Copy link Message link Add to gist Remove
Geth	MoarVM/nfa_flat_statelist: 1f182b48cf \| (Timo Paulssen)++ \| Configure.pl new Configure.pl flag: --dtrace When this is turned on, we can put DTrace points in different places in the code. These DTrace trace points show up as just a single nop in the resulting binary, but an additional section in the ELF lists the available trace points along with how to get at whatever is in the arguments.	22:53	Copy link Message link Add to gist Remove
	MoarVM/nfa_flat_statelist: 60edb07c81 \| (Timo Paulssen)++ \| 2 files WIP on getting nfa to use a flat state array instead of having one array per state that has arrays for each state's outgoing edges in it, we store the index into the array where the first edge lives, and use the next number in that array to calculate the amount. ... (7 more lines)		Copy link Message link Add to gist Remove
timotimo	oh, that other commit belongs to master actually	22:54	Copy link Message link Add to gist Remove
Geth	MoarVM: 1f182b48cf \| (Timo Paulssen)++ \| Configure.pl new Configure.pl flag: --dtrace When this is turned on, we can put DTrace points in different places in the code. These DTrace trace points show up as just a single nop in the resulting binary, but an additional section in the ELF lists the available trace points along with how to get at whatever is in the arguments.	23:30	Copy link Message link Add to gist Remove

Please report any issues / comments / feature requests as an issue on App::Raku::Log.

Thank you!