github.com/moarvm/moarvm | IRC logs at colabti.org/irclogger/irclogger_logs/moarvm | Set by AlexDaniel on 12 June 2018.
01:23
SmokeMachine left
01:26
SmokeMachine joined
01:37
SmokeMachine left
01:38
lucasb left
01:52
SmokeMachine joined
03:09
quotable6 left,
squashable6 left,
unicodable6 left,
undersightable6 left,
notable6 left,
benchable6 left,
reportable6 left,
statisfiable6 left,
greppable6 left,
evalable6 left,
committable6 left,
bisectable6 left,
shareable6 left,
releasable6 left,
nativecallable6 left,
bloatable6 left,
coverable6 left,
nativecallable6 joined
03:10
releasable6 joined,
notable6 joined,
bisectable6 joined
03:11
committable6 joined,
undersightable6 joined,
squashable6 joined,
quotable6 joined
03:12
statisfiable6 joined,
benchable6 joined,
coverable6 joined
03:13
shareable6 joined,
bloatable6 joined,
unicodable6 joined,
greppable6 joined
03:14
evalable6 joined,
reportable6 joined
05:03
robertle left
06:08
nebuchadnezzar left,
nebuchadnezzar joined
06:15
domidumont joined
06:49
brrt joined
brrt | \o | 07:02 | |
nwc10 | o/ | ||
07:11
patrickb joined
08:34
robertle joined
jnthn | So yesterday I implemented some "is this point in a branch relative to this one" thing as a quick heuristic check, but was thinking, hmm, this is ad-hoc, but there must be a proper way to do this... | 09:10 | |
Realized in the pub last night: I wanted postdominance. | |||
Though not sure if it's worth calculating if all we really need it for is a heuristic. | 09:11 | ||
lizmat | .oO( post dominance for a post modern language ) | 09:12
nwc10 | pubs++ | ||
09:15
zakharyas joined
09:27
dogbert11 joined
09:29
dogbert17 left
brrt | what is postdominance? | 09:58 | |
timotimo | probably very similar to the dominance we use to get the SSA form right, but instead of from the beginning towards the end it'd go from the end towards the beginning? | 09:59 | |
jnthn | brrt: Dominance but with the arrows reversed | ||
Or intuitively, "what do we always exit via" | 10:00 | ||
(Where dominance is "what do we always enter via") | |||
Woodi | would be nice to have in precompiled modules but if runtime needs just heuristics then can it be costly? | 10:02
timotimo | we only calculate those things for things that are measured to be hot during run time | 10:06 | |
calculating it for everything while precompiling modules is wasteful | |||
especially since earlier optimizations can already throw out branches completely | |||
jnthn | Well, also probably what the VM optimizer considers a basic block is similar to what Perl 6 does anyway, especially given it has to be re-calc'd after inlines | 10:07 | |
brrt | oh, I see | 10:14 | |
.... that would've been a nice term to have, I had that precise problem half a year ago | 10:15 | ||
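For context, postdominators can be computed with the same iterative dataflow used for ordinary dominators, just walking the edges in reverse; a minimal standalone sketch with a hand-rolled bitset CFG (hypothetical layout, not MoarVM's spesh graph types):

    /* Iterative postdominator computation over a tiny CFG.
     * postdom(b) = {b} + intersection of postdom(s) over all successors s,
     * i.e. dominance with the arrows reversed ("what do we always exit via"). */
    #include <stdint.h>
    #include <stdio.h>

    #define NBLOCKS 5

    /* succ[b] is a bitmask of the successors of block b; block 4 is the exit */
    static const uint32_t succ[NBLOCKS] = {
        1u << 1 | 1u << 2,  /* 0 -> 1, 2  (a branch) */
        1u << 3,            /* 1 -> 3                */
        1u << 3,            /* 2 -> 3                */
        1u << 4,            /* 3 -> 4                */
        0,                  /* 4: exit               */
    };

    int main(void) {
        uint32_t postdom[NBLOCKS];
        const uint32_t all = (1u << NBLOCKS) - 1;
        for (int b = 0; b < NBLOCKS; b++)
            postdom[b] = succ[b] == 0 ? 1u << b : all;

        int changed = 1;
        while (changed) {
            changed = 0;
            for (int b = NBLOCKS - 1; b >= 0; b--) {
                if (succ[b] == 0)
                    continue;
                uint32_t meet = all;
                for (int s = 0; s < NBLOCKS; s++)
                    if (succ[b] & (1u << s))
                        meet &= postdom[s];
                uint32_t next = meet | 1u << b;
                if (next != postdom[b]) { postdom[b] = next; changed = 1; }
            }
        }
        for (int b = 0; b < NBLOCKS; b++)
            printf("postdom(%d) = %#x\n", b, postdom[b]);  /* block 0: 0x19, i.e. {0,3,4} */
        return 0;
    }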
10:55
brrt left
11:17
zakharyas left
11:30
domidumont left
11:31
domidumont joined
timotimo | the MVM_op_infos array is about 35 kib big | 11:45 | |
i do believe we have a few fields that don't need to be 8 bit wide | |||
timotimo tries bitfield support in C for this | 11:46 | ||
28 kib, cool. | |||
hmm. we have an MVMuint16 num_operands, but the maximum operand count we allow is 8; the higher values are just for the benefit of PHI nodes | 11:47
hm, 28.6 kib actually | 11:48 | ||
having the operands in-line with the structs themselves means we pay 8 bytes for every op, which is 917 bytes | 11:50
there's only 4 ops with 8 operands, 6 with 7, 14 with 6 | 11:53 | ||
silly me, it's not 917 bytes, it's 917 * 8 bytes | 11:54 | ||
m: say 2360 / (917 * 8) | |||
evalable6 | 0.321701 | ||
timotimo | about 66% space savings available with a more packed operands storage | 11:55 | |
m: say 917 * 8 | |||
evalable6 | 7336 | ||
timotimo | save 5 more kilobytes off these 28 | ||
that's my useless timewaste for the day, thank you very much | 11:56 | ||
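A minimal sketch of the kind of bitfield packing being tried above, with made-up field names rather than MoarVM's actual MVMOpInfo layout:

    /* Illustrative only: hypothetical fields, not the real MVMOpInfo.
     * Fields that never exceed a few bits don't each need a full byte
     * (or MVMuint16), so bitfields shrink the per-op metadata. */
    typedef struct {
        const char    *name;
        unsigned short opcode;
        unsigned       num_operands : 4;  /* real operand counts top out at 8 */
        unsigned       pure         : 1;
        unsigned       may_cause_gc : 1;
        unsigned       deopt_point  : 1;
        unsigned char  operands[8];       /* still in-line: 8 bytes per op */
    } packed_op_info;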
11:58
brrt joined
timotimo | but if we see what spesh and the bytecode validator tend to need vs not need, we might get the tiniest bit of performance out of less cache contention | 12:00 | |
brrt | Well, yes, spesh etc use the op_infos table quite heavily | 12:05 | |
so a saving would be nice :-) | |||
timotimo | oh, i have to add 917 * 2 back for the savings, because we'll obviously need an index into the ops list | ||
(though actually that can re-use parts that repeat, which would be cool) | |||
collect all operands bits, calculate overlaps, tile the shortest array with it, make update_ops.p6 take ten minutes to complete | 12:07 | ||
i'm kind of sort of interested now | |||
FWIW, tools/lib/MAST/Ops.pm has a setup much like this already :) | 12:08 | ||
completely without overlap finding, though | |||
brrt | well, if you first make update_ops.p6 take ten minutes, we'll all have a challenge to reduce that to a couple seconds | 12:11 | |
timotimo | :D | 12:12 | |
a simple greedy algorithm should be just fine and very fast | |||
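What is being sketched here, packing many short operand sequences into one shared array so that sequences may overlap and each op only keeps a start index, is essentially a small shortest-common-superstring problem. A rough greedy sketch in C (made-up names and sizes, not the actual update_ops.p6 logic):

    /* Greedy packing of operand byte sequences into one shared array:
     * reuse an existing occurrence if possible, otherwise overlap the
     * tail of the packed array with a prefix of the new sequence. */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    static uint8_t packed[4096];
    static size_t  packed_len = 0;

    /* Return the offset at which seq lives in packed[], adding it if needed. */
    static size_t pack_operands(const uint8_t *seq, size_t len) {
        /* 1. already contained somewhere? */
        for (size_t i = 0; i + len <= packed_len; i++)
            if (memcmp(packed + i, seq, len) == 0)
                return i;
        /* 2. overlap a suffix of packed[] with a prefix of seq */
        size_t overlap = 0;
        for (size_t k = len - 1; k > 0; k--)
            if (k <= packed_len && memcmp(packed + packed_len - k, seq, k) == 0) {
                overlap = k;
                break;
            }
        size_t off = packed_len - overlap;
        memcpy(packed + packed_len, seq + overlap, len - overlap);
        packed_len += len - overlap;
        return off;
    }

    int main(void) {
        const uint8_t a[] = { 71, 39, 63 };
        const uint8_t b[] = { 39, 63, 23 };  /* shares a 2-byte overlap with a */
        size_t off_a = pack_operands(a, sizeof a);
        size_t off_b = pack_operands(b, sizeof b);
        /* prints: a at 0, b at 1, packed_len 4 (instead of 3 + 3 = 6) */
        printf("a at %zu, b at %zu, packed_len %zu\n", off_a, off_b, packed_len);
        return 0;
    }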
brrt | so, you know what else I found | ||
I expect that tiling will Just Work on a linear IR | |||
timotimo | oh? | 12:13 | |
that sounds good | |||
no changes needed for that part of the code | |||
brrt | well, some changes | ||
timotimo | which i imagine is a difficult part of the code | ||
like the rest of the code :) | |||
brrt | I expect it will be simpler | ||
right now, we do a postorder (bottom-up) iteration, and then a top-down iteration | 12:14 | ||
but all we really need bottom-up for, is to ensure that nodes are processed before the nodes that reference them; | 12:15 | ||
that is easily ensured by the order of the linear IR | |||
timotimo | nice | ||
sounds like a whole phase can be tossed | |||
brrt | and the inverse is true for top-down | ||
not sure about the phase, but we can translate a tree traversal to a linear iteration, without a change in semantics | 12:16 | ||
timotimo | ah, sweet | ||
brrt | another thing that the tiler currently does, is create the basic blocks | ||
that would be handled in an earlier phase | |||
(and assigning labels) | 12:17 | ||
because it translates between the graph form, and the linear form | |||
so if we have a linear form throughout, we no longer need that translation (it'll happen in the perl compiler)
and finally. Because of the linear IR, I can ensure that any operand loads are emitted before the template that references them. This will eliminate entirely the missing-ancestor problem | 12:18 | ||
and the optimizer can move them and eliminate them if it proves that this is unnecessary | 12:19
I guess the point is that adding an explicit relative order of operations allows you to work on that relative order | 12:20
which isn't possible in the graph IR as that leaves them unspecified | |||
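A small illustration of that ordering argument: when the IR is already linear with every definition preceding its uses, one forward pass sees each operand before the instruction that references it, so no separate postorder walk is required. Hypothetical node layout, not the actual expr-JIT types:

    /* Hypothetical linear-IR node: operands refer to earlier nodes by index,
     * so a single forward walk processes definitions before uses. */
    #include <stdio.h>

    typedef struct {
        const char *op;
        int         operand[2];  /* indices of earlier nodes, or -1 */
    } lin_ins;

    int main(void) {
        lin_ins ir[] = {
            { "load",  { -1, -1 } },  /* 0 */
            { "const", { -1, -1 } },  /* 1 */
            { "add",   {  0,  1 } },  /* 2: uses 0 and 1, both already seen */
            { "store", {  2, -1 } },  /* 3 */
        };
        int n = sizeof ir / sizeof ir[0];
        for (int i = 0; i < n; i++)
            /* the linear order guarantees operand[j] < i, so anything a tile
             * needs has already been handled by the time we reach node i */
            printf("tile %s (node %d)\n", ir[i].op, i);
        return 0;
    }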
timotimo | i've got a preliminary result | 12:25 | |
brrt | I'm interested? | 12:26 | |
timotimo | it's built a list of 722 elements | ||
compare that to 917 * 8 elements for "no overlap attempts" | 12:27 | ||
er | |||
that's "not tightly packed" | |||
and 2360 for "tightly packed" | |||
that's not all yet | |||
because when i find no match, currently i just append the whole thing to the end without checking for a tail overlap | |||
m: say 722 / (917 * 8) | 12:30 | ||
evalable6 | 0.098419 | ||
timotimo | m: say 722 / (2360) | ||
evalable6 | 0.305932 | ||
timotimo | i had meant to work on something else today, but that's not a terrible win | ||
m: say 722 - (917 * 8) | 12:31 | ||
evalable6 | -6614 | ||
timotimo | so a bit more than 6kb savings for the operands array | ||
12:33
brrt left
timotimo | if i don't "have to" put the opcodes in at the same order as they are in the input, i can potentially make better decisions | 12:41 | |
Cowardly refusing to permutate more than 20 elements, tried 821 | 12:44 | ||
%) | |||
641 is my best score so far based on "just" reordering the ops randomly | 12:48 | ||
reproducible builds this is not | 12:49 | ||
except i srand($attempt-number) | |||
632, cool | |||
12:50
sena_kun joined
timotimo | and it parallelizes trivially, of course | 12:53 | |
oh, hm, but random state is per thread context, and if any of the blocks had an await in it or were otherwise descheduled, they would resume with different random values | |||
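One way to keep such a search reproducible regardless of which thread runs an attempt, or whether it gets descheduled partway through, is to derive the PRNG state from the attempt number alone; a sketch using POSIX rand_r, not the actual script:

    /* Deterministic per-attempt randomness: the PRNG state lives in a local
     * variable seeded from the attempt number, so the result never depends
     * on thread-local random state or on scheduling. */
    #include <stdio.h>
    #include <stdlib.h>

    static unsigned score_attempt(unsigned attempt) {
        unsigned seed  = attempt;  /* the srand($attempt-number) idea */
        unsigned score = 0;
        for (int i = 0; i < 10; i++)
            score += rand_r(&seed) % 100;  /* stand-in for a random reordering */
        return score;
    }

    int main(void) {
        for (unsigned a = 0; a < 4; a++)
            printf("attempt %u -> score %u\n", a, score_attempt(a));
        return 0;
    }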
12:55
zakharyas joined
timotimo | oh, throwing out all the 1-parameter ops at the beginning could be interesting; there's 117 of them, and 12 0-parameter ops, too | 12:56 | |
they can just be put in at the end. either they overlap somewhere or they'll go in at the end anyway, but they shouldn't cause a potential chain to break by being appended at the end | 12:57
13:00
Guest2775 left
timotimo | i can't find the name of this problem; it should totally be a common comp-sci problem to solve | 13:02 | |
13:18
Guest55475 joined
timotimo | oh! | 13:21 | |
write registers | |||
two opcodes that have a write register can only have overlapping beginnings | 13:22 | ||
only all-read-operand opcodes can be partially inside or overlapping beginning and end of other ops | |||
13:22
lucasb joined
timotimo | ([71 71 71 71 167 71 39 63 39 39 71 55 55 39 71 63 71 71 71 71 39 71 71 63 167 23 63 63 39 39 39 71 63 63 63 71 39 39 63 39 23 23 71 71 71 71 71 71 63 71 63 39 79 39 55 55 55 71 71 79 63 39 71 71 63 63 71 39 63 63 39 71 71 71 39 39 71 39 23 79 71 23 23 55 167 55 167 63 55 71 63 71 71 71 63 39 39 71 167 143 ...] 396 31) | 13:27 | |
([2 1 1 0 0 2 1 1 1 1 1 1 1 1 1 2 0 0 0 1 1 0 1 0 2 3 2 1 0 2 0 1 4 1] 34 5) | |||
these are the lists, each followed by its score and random seed: the first for all operands with the r/w and lex/reg/literal bits masked out, the second for all operands with only their r/w and lex/reg/literal bits | 13:28
if we keep those separate, which may be a PITA, we can get a much tighter packing | |||
13:36
robertle left
13:38
robertle joined
timotimo | on top of the fact that the values themselves are now smaller, so they could be less than 8 bits per entry, but that's also annoying to deal with depending on the API we have to offer | 13:39 | |
anyway, the r/w, lit/reg/lex bits are only 3 out of 8, so i can't just store one in 4 bits per entry and the other in 4 bits per entry | 14:02 | ||
however | |||
if i do a 4/4 split, even though the values aren't entirely meaningful then, i can get the whole thing stored in 240 bytes | 14:03 | ||
whereas if i use 8 bits per entry with one list for the upper 5 and one for the lower 3 bits i'd spend 430 bytes | |||
either is better than spending the 624 | |||
though of course there's diminishing returns at some point, and i might be going into "actually useless" territory by now | 14:04 | ||
nine | Well if you manage to make it fit in 64 bytes or so, you'd be down to a single cache line :) | 14:49
timotimo | rather unlikely :D | ||
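A sketch of the 4/4 split being weighed above, with two 4-bit entries stored per byte and helpers to get them back out (made-up names, not an actual MoarVM API):

    /* Two 4-bit entries per byte: entry 2i in the low nibble, 2i+1 in the
     * high nibble. Illustrates the space/API trade-off only. */
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    static uint8_t nibble_get(const uint8_t *arr, size_t idx) {
        uint8_t byte = arr[idx >> 1];
        return idx & 1 ? byte >> 4 : byte & 0x0F;
    }

    static void nibble_set(uint8_t *arr, size_t idx, uint8_t val) {
        if (idx & 1)
            arr[idx >> 1] = (arr[idx >> 1] & 0x0F) | (uint8_t)(val << 4);
        else
            arr[idx >> 1] = (arr[idx >> 1] & 0xF0) | (val & 0x0F);
    }

    int main(void) {
        uint8_t packed[8] = { 0 };
        nibble_set(packed, 0, 0x3);
        nibble_set(packed, 1, 0xA);
        printf("%x %x\n", nibble_get(packed, 0), nibble_get(packed, 1));  /* 3 a */
        return 0;
    }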
MFW a sequence that doesn't try to put only parts of the new operand data at the end when no match was found yet is better than sequences that try to find overlaps for the end | 15:08 | ||
gaaah tantalizingly close to getting the "start" indices for one of the arrays to fit into 8 bits | 15:20 | ||
271 entries | |||
15:20
domidumont left
15:26
AlexDani` joined
15:29
AlexDani` is now known as AlexDaniel,
AlexDaniel left,
AlexDaniel joined
AlexDaniel | ⚠ github.com/perl6/problem-solving/p...-488716403 | 15:30 | |
15:42
robertle left
15:59
patrickb left
16:28
brrt joined
16:36
patrickb joined
16:39
zakharyas left
16:41
squashable6 left
16:42
squashable6 joined
16:47
zakharyas joined
16:55
brrt left
16:58
domidumont joined
17:12
robertle joined
17:34
zakharyas left
18:41
domidumont left
18:44
brrt joined
19:21
zakharyas joined
timotimo | only 35 of the first 820 have any marks; that's 2 bytes in every op | 19:43 | |
m: say (820 - 35) * 2 | |||
evalable6 | 1570 | ||
timotimo | and i think the marks are only ever needed by the validator anyway | 19:44 | |
brrt | you can extract them in a separate array? | 19:48 | |
so that the rest is more compact | |||
timotimo | sure | ||
would also be possible to have an array of spans that'd be quick to scan through | |||
the only two usages of the mark struct member are in validation.c and ext.c | 19:52 | ||
so that shouldn't be too bad | 19:55 | ||
moving it away would already be a good thing i'm sure | |||
i mean ... barely | 19:56 | ||
but, you know | |||
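Since only validation.c and ext.c consult the mark, one option (hypothetical layout, purely to illustrate the idea) is to drop it from the hot struct and keep a small side table keyed by opcode:

    /* Hypothetical side table for the rarely-used mark data: the hot op-info
     * struct loses those bytes per op, and the handful of marked ops live in
     * a small sorted array that only the validator searches. */
    #include <stddef.h>
    #include <stdint.h>

    typedef struct {
        uint16_t opcode;
        char     mark[2];
    } op_mark_entry;

    /* sorted by opcode; only the few ops that actually carry a mark appear
     * here (the entries below are made up) */
    static const op_mark_entry op_marks[] = {
        { 12, { '.', 's' } },
        { 48, { '.', 'r' } },
    };

    /* binary search; cheap enough for the validator's occasional lookups */
    static const char *find_mark(uint16_t opcode) {
        size_t lo = 0, hi = sizeof op_marks / sizeof op_marks[0];
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            if (op_marks[mid].opcode == opcode)
                return op_marks[mid].mark;
            if (op_marks[mid].opcode < opcode)
                lo = mid + 1;
            else
                hi = mid;
        }
        return NULL;  /* this op has no mark */
    }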
so, how widespread is bitfield support in C compilers that we target? | |||
MasterDuke | can we set up *BSDs in travis, appveyor, or circleci? that's the main platform missing right? otherwise i'd say create a PR and see what ci reports | 20:21 | |
brrt | if a C compiler doesn't support bitfields, it's not a C compiler at all imho | 20:22 | |
timotimo | in other words, it would be safe to use bitfields in moarvm code? | 20:23 | |
20:40
zakharyas left
20:44
brrt left
21:33
sena_kun left
21:40
sena_kun joined
21:49
sena_kun left
22:02
patrickb left
mornfall | bitfields are fine, what isn't are assumptions about their layout | 22:18 | |
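A tiny example of that point: the named fields behave the same everywhere, but how they map onto bytes is implementation-defined, so nothing should serialize or alias the raw storage:

    /* Bitfields themselves are standard C; their ordering and padding within
     * the storage unit are implementation-defined. Rely on the named fields,
     * never on the byte layout. */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    struct flags {
        unsigned a : 3;
        unsigned b : 5;
    };

    int main(void) {
        struct flags f = { .a = 5, .b = 17 };
        printf("a=%u b=%u\n", f.a, f.b);  /* portable: always 5 and 17 */

        uint8_t raw;
        memcpy(&raw, &f, 1);
        printf("first byte = %#x\n", raw);  /* NOT portable: compiler-specific */
        return 0;
    }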
22:22
Kaiepi joined
22:49
sena_kun joined
23:06
lizmat_ joined
23:08
lizmat left
23:34
Kaypie joined,
Kaiepi left
23:37
sena_kun left,
sena_kun joined