#moarvm on 27 January 2017 - Raku Programming Language Log

timotimo	now that unicode sequence names are case insensitive we can encode them with the same table we've used for the unicode names themselves	00:08	Copy link Message link Add to gist Remove
	encode/compress	00:09	Copy link Message link Add to gist Remove
	though the shift-1 table from the names is probably almost 100% worthless for the sequences		Copy link Message link Add to gist Remove
samcv	heh	00:24	Copy link Message link Add to gist Remove
	not totally		Copy link Message link Add to gist Remove
	not for the non-emoji ones		Copy link Message link Add to gist Remove
timotimo	how many of those exist?	00:25	Copy link Message link Add to gist Remove
samcv	www.unicode.org/Public/UCD/latest/u...uences.txt		Copy link Message link Add to gist Remove
	like 200? 300?		Copy link Message link Add to gist Remove
	lots of them have LETTER in it		Copy link Message link Add to gist Remove
	here is the gist gist.github.com/samcv/a5a8bc715d39...8e8483597c with the shift's	00:26	Copy link Message link Add to gist Remove
timotimo	oh, and all these syllables	00:28	Copy link Message link Add to gist Remove
	OK, cool		Copy link Message link Add to gist Remove
	and also WITH		Copy link Message link Add to gist Remove
samcv	yea	00:30	Copy link Message link Add to gist Remove
	oh btw	00:31	Copy link Message link Add to gist Remove
	here are the provisional ones www.unicode.org/Public/UCD/latest/u...esProv.txt		Copy link Message link Add to gist Remove
	compare		Copy link Message link Add to gist Remove
geekosaur	that kind of thing was why I poked at compressing by words; there's a lot of reuse		Copy link Message link Add to gist Remove
samcv	KEYCAP NUMBER SIGN, to the emoji listing		Copy link Message link Add to gist Remove
	keycap: #		Copy link Message link Add to gist Remove
	is what it is in the emoji		Copy link Message link Add to gist Remove
	m: say "\c[keycap: *]"		Copy link Message link Add to gist Remove Run code
camelia	rakudo-moar d06d7c: OUTPUT«*️⃣␤»		Copy link Message link Add to gist Remove
samcv	m: FE0F.uniname.say	00:32	Copy link Message link Add to gist Remove Run code
camelia	rakudo-moar d06d7c: OUTPUT«===SORRY!=== Error while compiling <tmp>␤Undeclared name:␤ FE0F used at line 1␤␤»		Copy link Message link Add to gist Remove
samcv	m: 0xFE0F.uniname.say		Copy link Message link Add to gist Remove Run code
camelia	rakudo-moar d06d7c: OUTPUT«VARIATION SELECTOR-16␤»		Copy link Message link Add to gist Remove
timotimo	that's a really short list	00:34	Copy link Message link Add to gist Remove
samcv	the provisional is		Copy link Message link Add to gist Remove
timotimo	yup		Copy link Message link Add to gist Remove
	m: say uniname 0xFE0F		Copy link Message link Add to gist Remove Run code
camelia	rakudo-moar d06d7c: OUTPUT«VARIATION SELECTOR-16␤»		Copy link Message link Add to gist Remove
samcv	but those are www.unicode.org/Public/emoji/4.0/em...uences.txt here		Copy link Message link Add to gist Remove
timotimo	ah		Copy link Message link Add to gist Remove
	that's more like it		Copy link Message link Add to gist Remove
samcv	keycap: 2, vs KEYCAP DIGIT TWO	00:35	Copy link Message link Add to gist Remove
	seems fine for us to support both		Copy link Message link Add to gist Remove
timotimo	yeah		Copy link Message link Add to gist Remove
	i'm going to bed early today		Copy link Message link Add to gist Remove
	seeya!		Copy link Message link Add to gist Remove
samcv	as long as the numbers are allowed otherwise can escape them		Copy link Message link Add to gist Remove
	bye!		Copy link Message link Add to gist Remove
timotimo	o/		Copy link Message link Add to gist Remove
	wait a minute ... we can't represent * or # in our base40 yet :o	00:36	Copy link Message link Add to gist Remove
	another shift level, then?		Copy link Message link Add to gist Remove
samcv	hah		Copy link Message link Add to gist Remove
	or just don't use shift level for it and store as normal string		Copy link Message link Add to gist Remove
	there's also non-ascii characters	00:37	Copy link Message link Add to gist Remove
timotimo	if we can signal up front that we want a normal string there		Copy link Message link Add to gist Remove
	oh, right, you mentioned that		Copy link Message link Add to gist Remove
samcv	i mean they're gonna be stored in a separate structure		Copy link Message link Add to gist Remove
timotimo	1F1E8 1F1EE ; Emoji_Flag_Sequence ; Côte d’Ivoire # 6.0 [1] (🇨🇮)		Copy link Message link Add to gist Remove
samcv	the sequences anyway		Copy link Message link Add to gist Remove
timotimo	and also Åland Islands		Copy link Message link Add to gist Remove
samcv	utf-8 encoded between "" works fine in C yes?	00:38	Copy link Message link Add to gist Remove
	it just literally takes the data put there		Copy link Message link Add to gist Remove
	into the char *		Copy link Message link Add to gist Remove
timotimo	i've never tried :)		Copy link Message link Add to gist Remove
samcv	i'm pretty sure that's true		Copy link Message link Add to gist Remove
	would make less sense it it wasn't		Copy link Message link Add to gist Remove
timotimo	stackoverflow would know. also, we want to be compatible to a whole bunch of platforms and their compilers		Copy link Message link Add to gist Remove
	to be fair, that's gcc on almost all platforms, otherwise clang or msvc	00:39	Copy link Message link Add to gist Remove
samcv	yeh it works perfect :) just tried with 🐧		Copy link Message link Add to gist Remove
	well i think C just puts whatever the fuck data you have in between		Copy link Message link Add to gist Remove
	so you can put really anything that uses 8 bits per thing		Copy link Message link Add to gist Remove
timotimo	oh		Copy link Message link Add to gist Remove
samcv	as long as it fits in 8 bits		Copy link Message link Add to gist Remove
timotimo	we could use strings for our 16bit pieces, then :P	00:40	Copy link Message link Add to gist Remove
samcv	since it's totally naive		Copy link Message link Add to gist Remove
	heh		Copy link Message link Add to gist Remove
timotimo	except it'll probably be confused by \0 in the middle, and it'll put a \0 at the end, too		Copy link Message link Add to gist Remove
samcv	madness		Copy link Message link Add to gist Remove
timotimo	anyway, thinking about this is what costs me a lot of sleep		Copy link Message link Add to gist Remove
	so i'll just drop it :)		Copy link Message link Add to gist Remove
samcv	it's not stored literally as a \0 though		Copy link Message link Add to gist Remove
	only in the table		Copy link Message link Add to gist Remove
	and that's as a char		Copy link Message link Add to gist Remove
	hahah!		Copy link Message link Add to gist Remove
	sleep well		Copy link Message link Add to gist Remove
timotimo	i'll try		Copy link Message link Add to gist Remove
	have a good day :)	00:41	Copy link Message link Add to gist Remove
MasterDuke	samcv: in radix(), for every char MVM_string_get_grapheme_at_nocheck() is called twice, which switches on the string's body.storage_type. so i pulled that check out and have a different while loop for every type	01:43	Copy link Message link Add to gist Remove
samcv	nice	01:44	Copy link Message link Add to gist Remove
MasterDuke	so it's definitely more efficient by work being done, but it makes the function about 3 times as big code-wise		Copy link Message link Add to gist Remove
	which is why i'm guessing it isn't all that much faster		Copy link Message link Add to gist Remove
samcv	hm	01:45	Copy link Message link Add to gist Remove
	link to code?		Copy link Message link Add to gist Remove
MasterDuke	diff: gist.github.com/MasterDuke17/58754...2751650bcd	01:47	Copy link Message link Add to gist Remove
01:50 pyrimidine joined
samcv	MasterDuke, would a list of unicode codepoints which have numeric values speed things up	02:02	Copy link Message link Add to gist Remove
	for the non-ascii side	02:03	Copy link Message link Add to gist Remove
	like if we had a switch		Copy link Message link Add to gist Remove
	actually the switch could have the ascii digits too		Copy link Message link Add to gist Remove
	then just go the length of the string and pass value to the switch of the grapheme	02:04	Copy link Message link Add to gist Remove
	and the compiler can optimize it		Copy link Message link Add to gist Remove
02:22 pyrimidine joined
MasterDuke	samcv: maybe	03:29	Copy link Message link Add to gist Remove
03:33 pyrimidine joined
samcv	MasterDuke, yeah i think that may be the fastest way to do it	06:49	Copy link Message link Add to gist Remove
07:04 pyrimidine joined 07:11 domidumont joined 07:18 domidumont joined 07:51 pyrimidine joined 08:22 domidumont joined 08:23 domidumont joined 08:44 zakharyas joined 08:55 pyrimidine joined 09:28 pyrimidine joined 09:42 Ven joined 10:28 pyrimidine joined 10:53 pyrimidine joined 11:20 ilmari[m] joined 11:43 pyrimidine joined 12:16 pyrimidine joined 12:35 Ven joined 13:41 pyrimidine joined 13:44 pyrimidi_ joined
samcv	.ask timotimo so I have added some indexes to names.c for every other codepoint	13:54	Copy link Message link Add to gist Remove
yoleaux2	samcv: I'll pass your message to timotimo.		Copy link Message link Add to gist Remove
samcv	see here: gist.github.com/samcv/ebbcf638b925...4dde1a510f	13:55	Copy link Message link Add to gist Remove
	also we don't store any codepoints without names anymore		Copy link Message link Add to gist Remove
	we call uninames with a codepoint number, if it returns 0 then uninames has generated the codepoint's name such as <control-0000>, otherwise it returns an offset, which is the offset for the total number of unicode names (not an offset for the codeme array per se)	13:57	Copy link Message link Add to gist Remove
	the name_index array holds the location in the codeme array of every other codepoint which HAS a name in the codeme table	13:58	Copy link Message link Add to gist Remove
	so the 0th item in the codeme table, is at 0, the 2nd one is at 4		Copy link Message link Add to gist Remove
14:30 pyrimidine joined 14:33 Ven joined 14:47 pyrimidine joined 15:01 pyrimidine joined 15:16 Ven joined 15:41 pyrimidine joined
samcv	i gotta go to bed, but I updated the gist, and the repo so it should be basically ready for you to see if you can get working	15:42	Copy link Message link Add to gist Remove
15:45 pyrimidine joined 15:58 domidumont joined 16:50 brrt joined
brrt	good *, #moarvm	16:50	Copy link Message link Add to gist Remove
brokenchicken	\o	16:51	Copy link Message link Add to gist Remove
jnthn	o/	16:52	Copy link Message link Add to gist Remove
brrt	\o	16:54	Copy link Message link Add to gist Remove
	so… what i wanted to report about		Copy link Message link Add to gist Remove
	is that explaining register allocation is actually really, really hard		Copy link Message link Add to gist Remove
	the processs itself isn't, but it requires a lot of assumptions and background about how compilation works in the first place	16:55	Copy link Message link Add to gist Remove
	so…. i'm meaning to write a blog post about having implemented linear scan, but i find myself unable to explain what i did		Copy link Message link Add to gist Remove
	this bothers me		Copy link Message link Add to gist Remove
brokenchicken	:)	17:00	Copy link Message link Add to gist Remove
brrt	maybe i can try here and if it works, i can transform it into a post		Copy link Message link Add to gist Remove
	basically; register allocation is all about ensuring that the right values are in the right registers in time for use by the instructions of the program	17:01	Copy link Message link Add to gist Remove
	there's two trivial ways to do that	17:04	Copy link Message link Add to gist Remove
	one is to compute and store every value ever in a register		Copy link Message link Add to gist Remove
	that is of course not correct for larger programs because the set of available registers is pretty small	17:05	Copy link Message link Add to gist Remove
	the second is to store-and-load every value ever from memory just before they are used		Copy link Message link Add to gist Remove
	that's correct (there is usually enough memory available to do this) but it's very much suboptimal, because a): your bytecode size grows by a bunch for all the load-and-store instructions, and b): memory operations are pretty slow if cached, and disastrously slow if not	17:06	Copy link Message link Add to gist Remove
	so what you want to do is to find the solution wherein you minimize the number of memory traffic required		Copy link Message link Add to gist Remove
	s/number/volume/		Copy link Message link Add to gist Remove
	so the first thing you need to know, is which values are necessary when	17:07	Copy link Message link Add to gist Remove
	and you typicallly have to take flow control into account with that		Copy link Message link Add to gist Remove
	i.e. you need to 'deliver' the value that was computing using whatever flow control the programmer has designed	17:08	Copy link Message link Add to gist Remove
	one of the solutions is to divide the variables into a set that you'll store in registesr, and a set that you'll keep in memory	17:09	Copy link Message link Add to gist Remove
	but that's not very cool either, since it's well possible that different control flow paths use different sets of variables; you might end up having to 'sacrifice' one path for the other	17:10	Copy link Message link Add to gist Remove
	so the solution to that is to stop looking at variables but just at the values used	17:11	Copy link Message link Add to gist Remove
	that's achieved by the SSA transformation		Copy link Message link Add to gist Remove
	SSA = single static assignment, and it transforms every variable assignment into a declaration-and-assignment of a new variable; it makes variables immutable	17:12	Copy link Message link Add to gist Remove
	or in other words, it replaces variables with their values		Copy link Message link Add to gist Remove
	the expression JIT already does this		Copy link Message link Add to gist Remove
	so to find out where a register is required for a value analyze the 'liveness' of program values; the result of this analysis is a 'live range' for every value used	17:23	Copy link Message link Add to gist Remove
	a live range is basically the set of all instructions (or in our case, things-that-represent-instructions) that either have the value as output (definitions) or as input (uses)	17:25	Copy link Message link Add to gist Remove
	that set can contain more than one definition - contrary to my earlier statement! - because definitions can be connected using PHI nodes		Copy link Message link Add to gist Remove
	and a PHI node is basically something that describes a value defined in two or more different control flow paths	17:26	Copy link Message link Add to gist Remove
nine	.oO(Memory traffic volume depends not only on how often in the code a memory slot is accessed but also how often that code is run (e.g. loops). So it's gonna be a trade off between local and global optimization)	17:27	Copy link Message link Add to gist Remove
brrt	that's exactly right :-)		Copy link Message link Add to gist Remove
	basically, you have to compute the expected cost of spilling		Copy link Message link Add to gist Remove
	decommuting &		Copy link Message link Add to gist Remove
nine	.tell brrt so as we're talking about a JIT, do you actually have some information about runtime behavior accessible in the register allocator? I.e. does it have an idea how often loops may run?	17:29	Copy link Message link Add to gist Remove
yoleaux2	nine: I'll pass your message to brrt.		Copy link Message link Add to gist Remove
18:00 pyrimidine joined 18:50 ggoebel joined 19:18 zakharyas joined 19:23 dogbert17 joined 19:26 ggoebel joined 20:29 pyrimidine joined 21:19 pyrimidine joined 21:49 domidumont joined 22:08 pyrimidi_ joined 22:18 Ven joined 22:26 ggoebel joined
MasterDuke	samcv: turns out replacing this ( github.com/MoarVM/MoarVM/blob/mast...#L375-L392 ) with a giant switch statement is 25% slower	22:33	Copy link Message link Add to gist Remove
22:33 Ven joined
MasterDuke	for 1m iterations of nqp::radix() it resulted in 300m more instructions, 250m more branches, 8m more missed branches	22:35	Copy link Message link Add to gist Remove

Please report any issues / comments / feature requests as an issue on App::Raku::Log.

Thank you!