#moarvm on 17 January 2017 - Raku Programming Language Log

00:09 Ven joined 00:49 Ven joined 01:09 Ven joined 01:12 vendethiel joined 01:28 Ven joined 01:48 Ven joined 02:08 Ven joined 02:29 Ven joined 02:46 Ven joined 02:48 ilbot3 joined 03:04 Ven joined 03:24 Ven joined 03:26 lizmat joined 03:45 Ven joined 04:04 Ven joined 04:24 Ven joined 04:44 Ven joined 05:04 Ven joined 05:24 Ven joined 05:44 Ven joined 06:04 Ven joined 06:24 Ven joined 06:50 Ven joined 07:02 brrt joined 07:13 domidumont joined 07:19 domidumont joined 07:21 Ven joined 07:33 geekosaur joined
nwc10	good *, jnthn	08:11
	jnthn: subtle attempt to get me to create a coffee slick - table football players send a ball ricocheting round the kitchen floor under me, to distract me	08:12
	and I feel that I'll use that as a feed line (er, drink) to spam the channel with the information that we're trying to recruit coffee drinking table football players: unternehmen.geizhals.at/about/de/jobs/	08:13
	(says so in the job ads)
brrt	good *, nwc10	08:17
arnsholt	I'm decent at table football (for a Norwegian, anyways), but I've already got a job (and not that keen on moving to Vienna =)	08:18
	I'm very good at drinking coffee, though
brrt	i suck at table football and at table tennis and most table sports	08:19
nwc10	I suck too (at all these things)
brrt	caffeine intake tolerance is reasonable
nwc10	I think that they tolerate me because I drink some coffee and can do Perl OK.
brrt	nwc10: you're situated in vienna, then?	08:21
	do they speak german all the time?
nwc10	no. because I slack at tech stuff in German
	and "for some value of" German
	(they have different words for some things, which they insist are correct)
	and the best bit - "so that word that I don't know, is it German German, Austrian German, or something anyone outside of Vienna would look at me funny for?"	08:22
	"don't know"
	(sometimes)
arnsholt	=D
nwc10	Vienna is interesting, particularly coming from London and having lived in Cambridge	08:23
	it's small enough to feel like Cambridge
	but I think I can just see (a corner of) the OPEC HQ (if I look down a particular gap from exactly the right place in the office)	08:24
	and it's the UN's number 3 city (after NY and Geneva)
	so it's not even just "a capital city"
	it's got some pretensions above that
brrt	is vienna expensive to live in?	08:25
nwc10	not by London standards :-)
	(this is hard to answer)	08:26
brrt	i guess, yeah.
	comparing 'costs of living' is really difficult across countries
nwc10	I forget where you'd need to go to find the numbers for the "Big Mac" index	08:27
	but more useful things like the ratio of "median property cost" to "median salary"
	or other stuff that tries to figure out how hard you need to run just to stand still
	no idea
08:35 zakharyas joined 08:37 Ven joined
arnsholt	nwc10: Big Mac index is The Economist, IIRC	08:50
nwc10	I can tell you that McDonalds here sells beer and takes Amex	08:51
	which partially makes up for the fact that it's McDonalds :-)
08:54 Ven joined
brrt	hehehe	09:04
arnsholt	Just like Denmark!	09:08
samcv	idk if you guys saw what i said in #perl6 but, we can save 1/3 of the size of the unicode name database if we compress as base 40
arnsholt	(Beer at McD's, that is)
samcv	which is pretty nice. saves 250KB out of a total of 787KB	09:09
arnsholt	Nice!
	Since you're fiddling with that kind of idea: Have you tried Huffman coding it too?	09:10
samcv	nope
	i mean the strings are short. so. idk
	though i guess the whole thing could be compressed together but, still	09:11
nwc10	runtime readonly access (from memory shared between processes) usually for the (most) win
samcv	also one thing that is annoying is that it takes about as much space to store the point indexes compared to the actual unicode data	09:12
	so I need to find a way to somehow compress that
	i mean maybe I could have a bunch of smaller structs? and one for each set of codepoints?	09:13
arnsholt	Point indexes?
samcv	so I would have to store much lower numbers and could use narrower ints. actually that's a fantastic idea. have been thinking what to do about that for a while
	that map each point to a column in the unicode data struct	09:14
arnsholt	Aha
samcv	well it depends how many we end up having tbh
09:14 Ven joined
samcv	it could be a lot and it gets expensive if it goes over the length of a short	09:15
	well even shorts are too big
	there's so many points
	either way, splitting up points with an offset so i can use the narrowest value will save a lot of space	09:16
nwc10	I'm not familiar enough with the data to know if this is a daft suggestion, but IIRC Unicode does tend to do things in 256 codepoint blocks	09:17
	so is there a saving if you do some sort of "long" pointer based on the block, and then a shorter offset table for each of the 256 code points in the block, and add them together?
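
A minimal sketch of the two-level lookup nwc10 describes, under stated assumptions: one wide base value per 256-codepoint block plus a one-byte delta per codepoint, added together. The names `block_base`, `cp_delta` and `row_index`, and all the table values, are hypothetical and are not taken from MoarVM or from samcv's generator.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_BLOCKS 2  /* hypothetical: only two 256-codepoint blocks */

/* One wide ("long") base index per block. Values are made up. */
static const uint32_t block_base[NUM_BLOCKS] = { 0, 70000 };

/* One narrow delta per codepoint inside each block; unset slots are 0. */
static const uint8_t cp_delta[NUM_BLOCKS][256] = {
    [0] = { [0x41] = 3, [0x42] = 4 },
    [1] = { [0x00] = 7 },
};

/* Row index = wide base of the block + narrow per-codepoint delta. */
static uint32_t row_index(uint32_t cp) {
    uint32_t block = cp >> 8;          /* which 256-codepoint block */
    assert(block < NUM_BLOCKS);
    return block_base[block] + cp_delta[block][cp & 0xFF];
}
```

This only pays off when the indexes within a block stay within 255 of the block's base; whether the real data behaves that way is exactly what the discussion leaves open.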
09:18 domidumont joined
samcv	that is sort of kinda truish. but the blocks are mostly irrelevant because they basically automatically sort themselves. because no row in the bitmap is identical	09:18
	so storing indexes to that bitmap is the bigger issue
	since there are 0xE01EF codepoints, well technically there are higher, but that's the highest named codepoint	09:19
	so that fits into a 20bit integer, but if we did that for all codepoints :P	09:20
	right now we _sort_ of do that	09:22
lizmat	but we're basically talking about ascending integer values ?
samcv	we still use up a 16bit integer * 52102
	even though we do apply offsets for ranges of codepoints
	yeah lizmat
	so i'm thinking of just splitting it so i only have to store a short instead	09:23
	and one of the reasons we ONLY store 52,000 is because for CJK ideographs and stuff the name is derived from its properties	09:24
	well. that's not exactly why but.
	but atm it's a huge if else tree	09:25
lizmat	I was reminded of some search engine internals I was involved with 14+ years ago	09:26
samcv	but yeah it does look like it does it by plane I think
lizmat	it was able to encode an offset with about .5 bit in the end	09:27
	(on average)
samcv	nice
	atm i think it's by block or something, it's divided up and i think each divided up section is deduplicated, but not the whole thing	09:28
09:29 Ven joined
jnthn	morning o/	09:37
samcv	morning jnthn
09:38 domidumont joined
samcv	also i'm guessing let's say I have a char *things[100] = { NULL, NULL.... }; it's going to take up the space of how many pointers? i mean it depends on how it stores where the pointers are to the pointers. because it has to store where the pointers are somewhere	09:44
jnthn	(nwc10 job add) I can say that Vienna seems really nice; when I was planning a move back to central Europe a couple of years back, it was on my shortlist of options. :)
	*ad
samcv	if anybody knows. i'm guessing obv you can't depend on anything, but worst case an array of half null pointers, the null pointers could cost the size of 2 pointers for every NULL?	09:45
jnthn	samcv: If you just have something like `static Foo *bar[] = { baz, NULL, wat, NULL };` you mean?	09:47
samcv	yep
jnthn	Pretty certain there's no compression of any kind on that
	NULL will take as much as any other pointer in the array
samcv	well i know AT LEAST it takes up the size of a pointer
	but somewhere it has pointers to point to the NULL pointers	09:48
jnthn	Since arrays are accessed by multiplying the element size by the index
samcv	err or any pointer
	ah ok. yeah
jnthn	So the storage of an array is elems * sizeof(elem_type)
samcv	kk, so all the pointers are all contiguous	09:49
jnthn	Yeah, a C array will be contiguous in virtual memory :)
samcv	so 1 pointer + the size of an array's worth of pointers
jnthn	nod
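
A small sketch of jnthn's point that an array occupies `elems * sizeof(elem_type)` and that a NULL slot costs as much as any other pointer. The table `names` and the helper functions are hypothetical illustrations, not MoarVM code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table: four pointer slots, two of them NULL. */
static const char *const names[4] = { "SPACE", NULL, "DIGIT ZERO", NULL };

/* Number of slots in the array. */
static size_t names_elems(void) {
    return sizeof names / sizeof names[0];
}

/* Bytes occupied by the pointer array itself (the strings live elsewhere). */
static size_t names_bytes(void) {
    return sizeof names;
}

/* Byte offset of slot i from the start of the array: i * element size. */
static size_t slot_offset(size_t i) {
    assert(i < names_elems());
    return (size_t)((const char *)&names[i] - (const char *)&names[0]);
}
```

On a typical 64-bit platform `names_bytes()` would be 32, with the two NULL slots accounting for 16 of those bytes; the exact pointer size is platform-dependent.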
samcv	so if i compress the strings, instead of a NULL pointer for something with no name having 8 bytes, it will be stored in 2 bytes instead :)	09:58
	in addition to the 1/3 size savings
10:07 dogbert17_ joined 10:08 Ven joined 10:16 brrt joined 10:28 Ven joined 10:48 Ven joined 11:08 Ven joined 11:23 Ven joined
timotimo	if you have a whole lot of prefixes like that "LATIN LOWER CASE LETTER", you can have a little table of that	11:25
	however
	if you can't just pass around a pointer into the big table of strings
	you have to malloc and free
	which ... ugh
	i imagine that problem would also exist if you use base40 for our strings	11:27
	hmm. but most of the time we're already creating a VMString from those things	11:28
	yeah, my worries are entirely unfounded. cool!	11:29
11:44 brrt joined 11:45 zakharyas joined 11:52 Ven joined
samcv	hmm for some reason storing the strings in base 40 is not any smaller. at least compiled size. it must have a way of storing a char * [1000] more efficiently than many short arrays	11:59
	not sure how to do it without pointers though. and having one array with all the pointers to the short arrays caused the file to be ridiculous.	12:00
	:\
	*file size
jnthn	samcv: Which file are you checking the size of?
samcv	unicode names file
jnthn	Yes, I meant compiled output.
samcv	as a char *unicode_names[2000] or whatever, compared to a bunch of unsigned short unicode_name_xx []	12:01
	yeah i'm talking about compiled
jnthn	Yes, which compiled file did you look at?
samcv	one I made?. all the file has in it is unicode names
jnthn	Ah, OK
	I thought maybe you'd built it into moar already
samcv	that is the only thing in it. and I even removed all NULL and empty values
jnthn	the size of moar wouldn't change
samcv	heh	12:02
jnthn	but libmoar.so would :)
timotimo	how do you store the base 40 thing? C doesn't support 40 bits per array element, so you'll have to do things manually with bit masks and shifts if they are to be stored tightly
samcv	timotimo, well
	this is my script github.com/samcv/UCD/blob/master/l...Base40.pm6
	you can store 3 characters inside two shorts
	err	12:03
	3 characters inside 1 short
timotimo	oh, base 40 is not 40 bits, duh
samcv	yeah
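
A minimal sketch of why three base 40 characters fit in one 16-bit value: 40 * 40 * 40 = 64000, which is below 65536. The alphabet and the names `B40_ALPHABET`, `b40_pack3` and `b40_unpack3` are assumptions made for illustration; samcv's Base40.pm6 and base40decode.c may order or choose the symbols differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 40-symbol alphabet. Index 0 is the padding/terminator,
 * followed by A-Z, 0-9, space, hyphen, and one spare symbol. */
static const char B40_ALPHABET[41] =
    "\0ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 -_";

/* Digit value (1..39) of a character, or -1 if it is not in the alphabet. */
static int b40_index(char c) {
    for (int i = 1; i < 40; i++)
        if (B40_ALPHABET[i] == c)
            return i;
    return -1;
}

/* Pack up to three characters into one 16-bit value, most significant
 * digit first. Missing characters are padded with digit 0. */
static uint16_t b40_pack3(const char *s, size_t len) {
    uint16_t v = 0;
    for (size_t i = 0; i < 3; i++) {
        int d = 0;
        if (i < len) {
            d = b40_index(s[i]);
            assert(d > 0);             /* character must be encodable */
        }
        v = (uint16_t)(v * 40 + d);    /* never exceeds 63999 */
    }
    return v;
}

/* Unpack one 16-bit value into up to three characters plus a NUL. */
static void b40_unpack3(uint16_t v, char out[4]) {
    out[2] = B40_ALPHABET[v % 40]; v /= 40;
    out[1] = B40_ALPHABET[v % 40]; v /= 40;
    out[0] = B40_ALPHABET[v % 40];
    out[3] = '\0';
}
```

Because padding uses digit 0 and that maps to `'\0'`, a short final group unpacks to a correctly terminated string without storing a separate length.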
	you can even do different case if you want to get fancy
timotimo	it's a bit late in the day to still be asleep inside your brain
samcv	and use one of the extra characters as a shift	12:04
	but C is compiling it to something much bigger, when it should really be 1/3 smaller in raw data
timotimo	you're actually spelling it "short"?	12:05
	i'm not sure if short is the same length everywhere
samcv	possible
timotimo	i'd go extra-sure and use MVMint8 or whatever
	do you know of dwarfdump? i've used it in the past a few times to get the actual size of things, but i'm not sure how well it deals with arrays	12:06
samcv	i have not used it before
	i mean it must be storing extra pointers or things to the arrays or whatever idk how else it would be the same final size, well actually 10% bigger	12:07
	and that's not making an array of pointers to these arrays
timotimo	just dwarfdump path/to/libmoar.so and go through a pager. it's a firehose of info, but searching for identifiers from the code can get you where you need to be
12:07 Ven joined
samcv	well i'm not compiling it into moar yet	12:07
timotimo	can you show me a diff or something?
samcv	just checking on it by itself
	stand by	12:08
	gist.github.com/ad1a2161645c2ac3b6...96245d8e7e here is names.str.c
timotimo	ah, yes, it's a big'un	12:09
samcv	ye that's the string one
	uploading the base 40 encoded one now
	gist.github.com/dd924ae9336bfb1605...956ded79ea here is that one
	i am storing the number of base 40 numbers as the 1st element
timotimo	ah, indeed	12:10
samcv	but that should be smaller than a 64bit pointer, and i'd think that it should be smaller
	since with the char * it has to store the pointers + the strings, but it must be packing the data differently? idk
	i've only checked compiled size
timotimo	it could be trying to ensure aligning
	you could hexdump the file to see if you can spot lots and lots of null bytes
samcv	yeah
	maybe it's not aligning the strings but is aligning the arrays hmm	12:11
timotimo	since we don't care a lot about aligned reads, you could make one big table with all the shorts in it and then assign &bigtable[offset] to all the uniname_* thingies
samcv	yeah
timotimo	give the C compiler less rope to hang us with
samcv	^
	one big table sounds like it would work	12:12
timotimo	you might be able to get the size of the C source to be a bit smaller by using 0x (if the number in decimal is longer than the number in hex)
samcv	though how do i figure out where to go in this table for a specific point. i'm guessing the number of chars of everything is pretty long. wait actually i already computed this h/o	12:13
	ok 272267 16 bit unsigned integers
	is all the names
	but then i'd have to store the index inside the big table to access them
timotimo	maybe we only need a linear scan or prefix sum once when we write the code out to the .c file?	12:14
	i.e. not actually store it, just compute it that one time from the lengths of string
	worst case we can go through the big table and jump from one entry to the next just because at the start it has the length already written in the table	12:15
	so we read 2, so we skip ahead 3, we read 6, we skip ahead 7, we read 5, we skip ahead 6, etc etc
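
The skip-ahead walk timotimo describes could be sketched as follows, assuming every entry in the big table starts with a count of the 16-bit words that follow it. The table contents and the name `entry_start` are illustrative only; a count of 0 stands for a codepoint with no name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical big table: each entry is a length word followed by
 * that many base 40 words. */
static const uint16_t bigtable[] = {
    2, 31041, 5000,
    6, 8963, 19253, 2409, 24597, 20858, 17600,
    0,                                   /* entry with no name */
    5, 28055, 32060, 15014, 59721, 29240,
};
#define BIGTABLE_LEN (sizeof bigtable / sizeof bigtable[0])

/* Index of the length word of the n-th entry (0-based), found by a
 * linear scan. Returns BIGTABLE_LEN if there is no such entry. */
static size_t entry_start(size_t n) {
    size_t pos = 0;
    while (n > 0 && pos < BIGTABLE_LEN) {
        /* the entry must not run past the end of the table */
        assert(pos + bigtable[pos] < BIGTABLE_LEN);
        pos += (size_t)bigtable[pos] + 1;  /* read 2 -> skip ahead 3 */
        n--;
    }
    return pos < BIGTABLE_LEN ? pos : BIGTABLE_LEN;
}
```

The scan costs time proportional to the entry number, which is why it is only attractive as a one-off pass, for example while filling a hash table or while generating the .c file.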
samcv	i am not familiar with linear scan
	hm	12:16
timotimo	oh, i mean just go through all elements with a for loop
samcv	yeah i mean when we want a name lookup we load a hash table anyway
timotimo	oh, right, we do
samcv	so could just start from 0, and store number of 16bit ints after, if we see a 0 then the char has no name
	if it's not 0 then load the name	12:17
timotimo	ah, we only need the list of starting points for the hash table anyway?	12:18
samcv	starting? well i would think we'd start at 0	12:19
	and then possibly skip certain ranges that are long enough to matter
timotimo	er, i mean, the location where each name starts
	like, do we need one array that gives us codepoint to position in table?
samcv	well if we just go through the structure once and load the hash we don't care where it starts	12:20
	no
	well. i think we can access the name from the cp with the hash, but you would know better than me
timotimo	only if we ever use the codepoint itself as a hash key
samcv	atm i think we lookup the name in the array of char *'s for looking up by cp and when looking up by name use the hash
	hm
timotimo	sounds like we do need that	12:21
samcv	we might already i'm not sure
	idk at least we supply it to the macro in two places
	i really don't know though
timotimo	our new array of codepoint-to-name might look like unsigned short *cp_to_name[] = {&bigtable[0], &bigtable[4], &bigtable[18], &bigtable[42], &bigtable[1337], ...}	12:22
	might want a macro for &bigtable[N] there ...
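
A hedged sketch of the codepoint-to-name array timotimo outlines, including the macro for `&bigtable[N]`. The macro name `NAME_AT`, the helper `name_words`, and the table contents are hypothetical; in practice the offsets would be emitted by the generator script when it writes the .c file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical big table of length-prefixed base 40 words. */
static const uint16_t bigtable[] = {
    2, 31041, 5000,                               /* starts at 0  */
    6, 8963, 19253, 2409, 24597, 20858, 17600,    /* starts at 3  */
    5, 28055, 32060, 15014, 59721, 29240,         /* starts at 10 */
};

/* Shorthand for a pointer to the entry starting at offset n. */
#define NAME_AT(n) (&bigtable[(n)])

/* One slot per codepoint; NULL means the codepoint has no stored name. */
static const uint16_t *const cp_to_name[] = {
    NAME_AT(0), NAME_AT(3), NULL, NAME_AT(10),
};
#define NUM_CODEPOINTS (sizeof cp_to_name / sizeof cp_to_name[0])

/* Returns the base 40 words for a codepoint and stores their count,
 * or returns NULL with a count of 0 when there is no name. */
static const uint16_t *name_words(size_t cp, size_t *count) {
    assert(cp < NUM_CODEPOINTS);
    const uint16_t *entry = cp_to_name[cp];
    if (entry == NULL) {
        *count = 0;
        return NULL;
    }
    *count = entry[0];
    return entry + 1;
}
```

Since each slot is a full pointer (8 bytes on common 64-bit targets), an array of integer offsets into `bigtable` would be narrower; that trade-off is a design choice the sketch does not settle.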
samcv	so store it in multiple big tables?
	are we going to be able to find the index in the data structure directly or have to scan through it and load it?	12:23
timotimo	we definitely can create the indices while generating the .c file
samcv	imo it should be fine if we need to load a hash for it, because using \c[whatever] already has to load it
	but there's so many codepoints
timotimo	and we should. the less stuff we have to initialize at startup time, the better
samcv	ok
	so similar to how i was thinking of splitting up the indexes from cp to the bitfield?	12:24
timotimo	i don't remember how you were going to do that, but ... probably?
samcv	so that we don't have to store really big numbers inside it and can use narrower types?
	yeah
	was going to split it up into as many would fit into a 16bit uint
timotimo	you mean so that the index fits into 16bit?	12:25
samcv	which would halve the number of bytes needed to store the index. because it ends up being much more than the data needed to store the property values
	yeah
timotimo	OK. if the index always fits into a whole number, that's good
samcv	we will have like 5k-10k bitfield rows, and then like
	huge number of codepoints
timotimo	because some codepoints share bitfield rows?	12:26
samcv	err wait what am i thinking about. 10k will fit into a 16bit fine, uhm
	most do
	if you dedup it properly	12:27
timotimo	content-addressed storage :D
samcv	but yeah the names take up MUCH more space than all the other content	12:29
	even without doing all these optimizations like splitting things up
timotimo	oh twitter	12:30
samcv	?
timotimo	someone tweets #perl6: how to use ... and Daily Tech Issues also tweets that exact thing
	no matter, just some random tangent	12:31
samcv	btw here is the C code that will convert from the base 40 numbers github.com/samcv/UCD/blob/master/base40decode.c	12:33
	and it's nice because we can extend it later and add more letters	12:34
timotimo	mhm, that looks simple enough
samcv	that '\n' there should prolly be a '-', but. we can remove the \0 ones and have one value be a shift
	and if it sees that character in front of another it will change case or access another character
	whatever we want really
timotimo	right, we basically do utf8 :)
samcv	hmm?	12:35
timotimo	not important
samcv	oh lol
timotimo	tbh, it's not like utf8 at all
	ok, so what's the current state of generating the .c from our list of names?
samcv	in moarvm now?
	or in my repo
timotimo	whatever's newest with our ideas and experiments	12:36
samcv	oh. well the base 40 is what we should try to go for, because 1/3 reduction in size
	and figure out some way to get the data into some way that won't waste space
timotimo	right	12:37
	do we have code to stash all our base40 values into one big table yet?
	and generate a second table that has a pointer into the big table for every codepoint?
	so we can just get_chars(bigtable[codepoint], buf) or something?	12:38
samcv	yeah i do
timotimo	cool. but that still doesn't give us a small .o file?
samcv	nope. it gets to be like 90MB
	if i remove the table of pointers though, it becomes like 5% more than just an array of char *'s	12:39
timotimo	oh	12:40
samcv	but you can checkout the repo i have. and run UCD-download.p6, then run perl6 ./UCD-gen.p6 --less=1000 or something
timotimo	don't forget if you have uniname_1, uniname_2, ... it'll also generate one entry in the symbol table for each of those
samcv	and it will generate a file in ./build/names.c	12:41
	gist.github.com/ad811b58480561061a...c69ded1e73
	this is it with --less=2000
	err i did 100. but yeah
	you can run 'make' to compile both names and bitfield.c	12:43
	bitfield.c, if you run the compiled file, bitfield. it will work fine
	print out the property values and chars for at least like non control characters up to 100 or something
	using the grapheme cluster break to figure out whether to print the character verbatim otherwise just show U+
timotimo	did you hexdump the resulting file when you compile names.c?	12:44
	it has a section in it that's just:
	00002a10: 756e 696e 616d 655f 3000 756e 696e 616d uniname_0.uninam
	00002a20: 655f 3200 756e 696e 616d 655f 3400 756e e_2.uniname_4.un
	00002a30: 696e 616d 655f 3600 756e 696e 616d 655f iname_6.uniname_
	00002a40: 3430 0075 6e69 6e61 6d65 5f34 3200 756e 40.uniname_42.un
	00002a50: 696e 616d 655f 3434 005f 4954 4d5f 6465 iname_44._ITM_de
	00002a60: 7265 6769 7374 6572 544d 436c 6f6e 6554 registerTMCloneT
	00002a70: 6162 6c65 0075 6e69 6e61 6d65 5f31 3800 able.uniname_18.
samcv	heh
timotimo	00002a80: 756e 696e 616d 655f 3132 0075 6e69 6e61 uniname_12.unina
	00002a90: 6d65 5f31 3400 756e 696e 616d 655f 3539 me_14.uniname_59
	00002aa0: 0075 6e69 6e61 6d65 5f35 3700 756e 696e .uniname_57.unin
samcv	no wonder it uses so much space
timotimo	00002ab0: 616d 655f 3136 0075 6e69 6e61 6d65 5f35 ame_16.uniname_5
	00002ac0: 3100 756e 696e 616d 655f 3130 0075 6e69 1.uniname_10.uni
	i expect that's where your overhead comes from
samcv	well then proof it must be smaller!	12:45	Copy link Message link Add to gist Remove
	well the underlying data :P		Copy link Message link Add to gist Remove
jnthn	o.O	12:46	Copy link Message link Add to gist Remove
	Is that debug data, or a linking table, or?		Copy link Message link Add to gist Remove
timotimo	the code was:		Copy link Message link Add to gist Remove
	unsigned short uniname_32[3] = {2,31041,5000};		Copy link Message link Add to gist Remove
	unsigned short uniname_33[7] = {6,8963,19253,2409,24597,20858,17600};		Copy link Message link Add to gist Remove
	unsigned short uniname_34[6] = {5,28055,32060,15014,59721,29240};		Copy link Message link Add to gist Remove
	unsigned short uniname_35[5] = {4,23253,3418,59969,11760};		Copy link Message link Add to gist Remove
	so yeah, linking data		Copy link Message link Add to gist Remove
	it might go away if we put "static" in front?		Copy link Message link Add to gist Remove
ilmari	const too?	12:47	Copy link Message link Add to gist Remove
timotimo	doesn't		Copy link Message link Add to gist Remove
ilmari	are you looking at the actual code/text segments, or debug info		Copy link Message link Add to gist Remove
timotimo	1651889 16K -rwxr-xr-x. 1 timo timo 14K Jan 17 13:43 names*	12:48	Copy link Message link Add to gist Remove
	1669101 16K -rwxr-xr-x. 1 timo timo 16K Jan 17 13:47 staticnames*		Copy link Message link Add to gist Remove
	this is without any flags, so shouldn't have -g, right?		Copy link Message link Add to gist Remove
	-O3 doesn't make it better		Copy link Message link Add to gist Remove
	OK, strip makes it go down to 8.3K		Copy link Message link Add to gist Remove
ilmari	use size, not ls		Copy link Message link Add to gist Remove
samcv	from how big?	12:49	Copy link Message link Add to gist Remove
	which file are you testing on?		Copy link Message link Add to gist Remove
ilmari	timotimo: how about const?		Copy link Message link Add to gist Remove
samcv	this is the 100 line file?		Copy link Message link Add to gist Remove
timotimo	i added const, it made it bigger		Copy link Message link Add to gist Remove
samcv	err 100 name file		Copy link Message link Add to gist Remove
	heh		Copy link Message link Add to gist Remove
timotimo	samcv: i took your last gist with names.c in it		Copy link Message link Add to gist Remove
samcv	this ? gist.github.com/samcv/ad811b584805...c69ded1e73		Copy link Message link Add to gist Remove
	kk		Copy link Message link Add to gist Remove
timotimo	precisely	12:50	Copy link Message link Add to gist Remove
samcv	ok i'm going to generate 2000 names. that may be better for comparison	12:51	Copy link Message link Add to gist Remove
timotimo	OK		Copy link Message link Add to gist Remove
samcv	and closer to real life		Copy link Message link Add to gist Remove
	100 is a little small		Copy link Message link Add to gist Remove
timotimo	well, as close as unicode gets to real life :P		Copy link Message link Add to gist Remove
ilmari	making it static consts takes it from 8992 to 7648	12:52	Copy link Message link Add to gist Remove
samcv	i go from 198 to 125K if i strip this		Copy link Message link Add to gist Remove
	<samcv> 100 is a little small		Copy link Message link Add to gist Remove
ilmari	text data bss dec hex filename		Copy link Message link Add to gist Remove
	2503 1408 8 3919 f4f constnames		Copy link Message link Add to gist Remove
	1115 2752 72 3939 f63 names		Copy link Message link Add to gist Remove
	1115 2752 72 3939 f63 staticnames		Copy link Message link Add to gist Remove
samcv	gist.github.com/7c95f29c5f89460f5b...dd72c3e689		Copy link Message link Add to gist Remove
	err here it is		Copy link Message link Add to gist Remove
ilmari	note how it moves from data to text, so it'll be mapped shared between processes	12:53	Copy link Message link Add to gist Remove
timotimo	that is desirable		Copy link Message link Add to gist Remove
ilmari	text data bss dec hex filename	12:54	Copy link Message link Add to gist Remove
	55983 16608 8 72599 11b97 constnames		Copy link Message link Add to gist Remove
	1115 71264 256 72635 11bbb names		Copy link Message link Add to gist Remove
	1115 71264 256 72635 11bbb staticnames		Copy link Message link Add to gist Remove
	the 2k-name one		Copy link Message link Add to gist Remove
samcv	all three of those are 2k name ones?	12:55	Copy link Message link Add to gist Remove
ilmari	yes		Copy link Message link Add to gist Remove
samcv	so we want static but not const?		Copy link Message link Add to gist Remove
ilmari	we want both, if they're actually constant	12:56	Copy link Message link Add to gist Remove
samcv	even if the size is bigger?		Copy link Message link Add to gist Remove
ilmari	the total size is 200 bytes bigger, but the actual data is moved from data (unshared) to text (shared)	12:57	Copy link Message link Add to gist Remove
samcv	ah ok		Copy link Message link Add to gist Remove
ilmari	so you'll save 55k per process		Copy link Message link Add to gist Remove
samcv	200 bytes is not much		Copy link Message link Add to gist Remove
	nice	12:58	Copy link Message link Add to gist Remove
nwc10	200 bytes should be enough for ilmari's lunch :-)		Copy link Message link Add to gist Remove
samcv	i get this warning though initialization discards ‘const’ qualifier from pointer target type	12:59	Copy link Message link Add to gist Remove
timotimo	right		Copy link Message link Add to gist Remove
	you need to put a const after the *		Copy link Message link Add to gist Remove
samcv	yeah. just noticed that		Copy link Message link Add to gist Remove
	damn you search and replace	13:00	Copy link Message link Add to gist Remove
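A minimal sketch of the declarations discussed above, assuming the layout from the pasted arrays (element 0 is the count, the rest are the values). The names `uniname_table`, `UNINAME_TABLE_LEN` and `uniname_length` are hypothetical and not taken from the gist. The warning quoted above is about the pointer target type, so the table's element type needs `const` before the `*`; a second `const` after the `*` additionally makes the pointers themselves constant, which may let the toolchain keep the table in read-only storage as well.

```c
#include <assert.h>
#include <stddef.h>

/* Per-name arrays: element 0 is the count, the rest are the values.
 * static keeps the symbols local to this file; const allows the
 * toolchain to place the data in read-only storage. */
static const unsigned short uniname_32[3] = {2, 31041, 5000};
static const unsigned short uniname_35[5] = {4, 23253, 3418, 59969, 11760};

/* Hypothetical lookup table. "const unsigned short *" matches the const
 * arrays (no discarded-qualifier warning); the const after the * makes
 * the pointers themselves constant too. */
static const unsigned short *const uniname_table[] = {
    uniname_32,
    uniname_35,
};

#define UNINAME_TABLE_LEN (sizeof uniname_table / sizeof uniname_table[0])

/* Hypothetical helper: number of values stored in table entry i. */
static size_t uniname_length(size_t i) {
    assert(i < UNINAME_TABLE_LEN);
    return uniname_table[i][0];
}
```

Running `size` on the compiled object, as suggested above, is the way to check where the data actually ends up on a given platform.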
timotimo	also, the file still contains all the uniname_* strings :(		Copy link Message link Add to gist Remove
ilmari	timotimo: that's just the debug info		Copy link Message link Add to gist Remove
timotimo	not when stripped, though		Copy link Message link Add to gist Remove
	'k		Copy link Message link Add to gist Remove
ilmari	which doesn't actually get mapped at runtie		Copy link Message link Add to gist Remove
	s/tie/time/		Copy link Message link Add to gist Remove
timotimo	x-wing vs runtime fighter		Copy link Message link Add to gist Remove
ilmari	randomascii.wordpress.com/2017/01/...nst-there/	13:01	Copy link Message link Add to gist Remove
timotimo	neat	13:02	Copy link Message link Add to gist Remove
13:09 Ven joined
timotimo	we could totally get sizes of moar and libmoar.so from statisfiable	13:09	Copy link Message link Add to gist Remove
	though ... we'd probably want per-moar-commit rather than per-rakudo-commit resolution there	13:10	Copy link Message link Add to gist Remove
samcv	that would be cool	13:16	Copy link Message link Add to gist Remove
timotimo	i asked in #whateverable	13:17	Copy link Message link Add to gist Remove
13:26 Ven joined
ilmari	nwc10: 🌯 time!	13:28	Copy link Message link Add to gist Remove
13:37 ilmari[m] joined
timotimo	hm, nqp-m -e '' takes about 14000 maxresidentk, a reduction of 250k would almost be noticeable :3	13:45	Copy link Message link Add to gist Remove
	but with rakudo ... you'd hardly feel it at all :(		Copy link Message link Add to gist Remove
samcv	timotimo, how do I go from a unsigned short *, and iterate over the values?	13:48	Copy link Message link Add to gist Remove
	do I have to do bitwise operations to do that?	13:49	Copy link Message link Add to gist Remove
	i have never tried to do this before... never needed to. just used normal ints		Copy link Message link Add to gist Remove
13:53 ggoebel joined
timotimo	depends	14:13	Copy link Message link Add to gist Remove
	if you want to use pointer arithmetic, i.e. p++, or if you can deal with an index into the thing		Copy link Message link Add to gist Remove
	i don't actually know what your current use case is, so i can't advise well	14:14	Copy link Message link Add to gist Remove
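A small sketch of the two options mentioned above (an index versus pointer arithmetic), assuming the count-first layout of the arrays pasted earlier in the log. No bitwise operations are needed: indexing or incrementing an `unsigned short *` already steps one element at a time. The function names `sum_by_index` and `sum_by_pointer` are hypothetical, and summing is only a stand-in for whatever is done with each value.

```c
#include <assert.h>
#include <stddef.h>

/* Sample array from the log: element 0 is the count of values. */
static const unsigned short uniname_35[5] = {4, 23253, 3418, 59969, 11760};

/* Option 1: walk the values with an index. */
static unsigned long sum_by_index(const unsigned short *data) {
    assert(data != NULL);
    unsigned long total = 0;
    size_t count = data[0];
    for (size_t i = 1; i <= count; i++)
        total += data[i];
    return total;
}

/* Option 2: walk the values with pointer arithmetic (p++). */
static unsigned long sum_by_pointer(const unsigned short *data) {
    assert(data != NULL);
    unsigned long total = 0;
    const unsigned short *p = data + 1;      /* first value */
    const unsigned short *end = p + data[0]; /* one past the last value */
    for (; p != end; p++)
        total += *p;
    return total;
}
```

Both functions visit the same elements; which one reads better depends on the calling code.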
samcv	storing pointers to the arrays in another array	14:15	Copy link Message link Add to gist Remove
	for the unicode names		Copy link Message link Add to gist Remove
timotimo	ah		Copy link Message link Add to gist Remove
	i'd spell that &otherarray[index]		Copy link Message link Add to gist Remove
	like, as constant		Copy link Message link Add to gist Remove
	actually, you could even store indices into the otherarray in an array and do an extra indirection when looking up stuff	14:23	Copy link Message link Add to gist Remove
samcv	what do you mean otherarray. you mean the array which contains pointers to the arrays?		Copy link Message link Add to gist Remove
timotimo	um		Copy link Message link Add to gist Remove
samcv	the data array or the indices array?		Copy link Message link Add to gist Remove
timotimo	i was still thinking of when i suggested to have all data in one gigantic array	14:24	Copy link Message link Add to gist Remove
samcv	yeah. i think that would be decent		Copy link Message link Add to gist Remove
	I should see how big that would end up		Copy link Message link Add to gist Remove
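A rough sketch of the "one gigantic array" idea, assuming the same count-first entries as the pasted arrays. All names here (`uniname_data`, `uniname_offset`, `UNINAME_COUNT`, `uniname_entry`) are hypothetical. Instead of one pointer per name, a second array stores the starting index of each entry, and the lookup does the extra indirection.

```c
#include <assert.h>
#include <stddef.h>

/* All entries concatenated; each entry starts with its count. */
static const unsigned short uniname_data[] = {
    2, 31041, 5000,               /* entry 0 starts at index 0 */
    4, 23253, 3418, 59969, 11760, /* entry 1 starts at index 3 */
};

/* Starting index of each entry in uniname_data. unsigned short only
 * reaches index 65535; a larger table would need a wider type. */
static const unsigned short uniname_offset[] = {0, 3};

#define UNINAME_COUNT (sizeof uniname_offset / sizeof uniname_offset[0])

/* Hypothetical lookup: pointer to the count element of entry i. */
static const unsigned short *uniname_entry(size_t i) {
    assert(i < UNINAME_COUNT);
    return &uniname_data[uniname_offset[i]];
}
```

Whether this ends up smaller than a table of pointers depends on the index width and the platform, so it would need measuring with `size` on the real data.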
14:52 Ven joined 15:04 zakharyas joined 15:23 Ven joined
timotimo	i'm listening :P	15:39	Copy link Message link Add to gist Remove
16:02 brrt joined 16:18 brrt joined 16:22 zakharyas joined 16:28 colomon joined 16:30 brrt joined 16:38 Ven joined 16:53 Ven joined 18:08 Ven joined 18:14 Geth joined 18:17 zakharyas joined 18:24 domidumont joined 18:34 domidumont joined 18:50 FROGGS joined
nine	What do you people use for profiling moar?	19:09	Copy link Message link Add to gist Remove
timotimo	perf for rough ideas of what's going on (with -g) and callgrind if i need more details	19:10	Copy link Message link Add to gist Remove
	aaw, perf c2c will be available starting with linux 4.10, but i'm still getting 4.9 kernels	19:13	Copy link Message link Add to gist Remove
nine	Is "MVM_CALLSTACK_REGIION_SIZE" really correct or is the double I a typo?	19:45	Copy link Message link Add to gist Remove
timotimo	it's being used consistently at least :)	19:46	Copy link Message link Add to gist Remove
Geth	MoarVM: 21d7d6e603 \| (Stefan Seifert)++ \| 2 files Fix typo in MVM_CALLSTACK_REGIONS_SIZE's name Hopefully reduces confusion and distraction for the next one to dig into this code	19:47	Copy link Message link Add to gist Remove
nine	So, where were I?	19:48	Copy link Message link Add to gist Remove
timotimo	were you singing death, death, death, death, devil, devil, evil, evil songs?	19:50	Copy link Message link Add to gist Remove
20:00 domidumont joined 20:10 Ven joined 20:26 Ven joined
jnthn	Heh, I can to read the thing three times to spot the doubled letter :P	20:28	Copy link Message link Add to gist Remove
	*I had		Copy link Message link Add to gist Remove
20:41 Ven joined 20:55 Ven joined 21:03 zakharyas joined 21:40 Ven joined 21:55 Ven joined 22:40 cygx joined
cygx	so there's some weirdness happening when trying to build dyncall (and dyncallback specifically) with the latest 32-bit version of Strawberry Perl	22:42	Copy link Message link Add to gist Remove
yoleaux2	30 Dec 2016 21:41Z <samcv> cygx: i will look at it as soon as I wake up again. if i misinterpreted the bug		Copy link Message link Add to gist Remove
cygx	for some reason I've yet to figure out, make thinks the C files should be compiled with C++		Copy link Message link Add to gist Remove
	consequently, it will first fail to find the dyncall header (as CFLAGS will not be picked up), and it will finally fail to link due to C++ name mangling	22:43	Copy link Message link Add to gist Remove
	good night o/	22:45	Copy link Message link Add to gist Remove
