#moarvm on 16 February 2014 - Raku Programming Language Log

github.com/moarvm/moarvm | IRC logs at irclog.perlgeek.de/moarvm/today Set by moderator on 28 October 2013.
dalek	MoarVM: 0017869 | jnthn++ | src/io/syncpipe.c: Avoid an assertion fail on Windows.	00:23
	MoarVM: d364f5e | jnthn++ | src/io/syncstream.c: Correct a thinko.
	MoarVM: 705d815 | jnthn++ | / (12 files): Support a custom separator in sync streams. Well, at least a 1-char one. Enough for the tests.	01:05
jnthn	The only thing left behind in MVMOSHandle from the previous design now is directory handles.	01:12
	I'll leave those for tomorrow. Once that's done, then IO refactor is done.	01:19
01:44 woolfy joined 02:46 FROGGS_ joined 03:03 cognominal joined
timotimo	hmm. when perl6-m calls MVM_repr_box_num for the very first time - which seems to come from the datetime setup stuff it does when loading the core.setting? - it already uses 70 megabytes of resident ram	03:38
	the very first call to box_int however is at only 3 megabytes of resident ram	03:39
04:06 colomon joined
dalek	MoarVM: 5f86fb0 | jimmy++ | src/ (6 files): Add const keyword	04:13
timotimo	gist.github.com/timo/9ec85de3f926f3f299dd	04:21
	i wonder if these share the pointer to string memory or if they all have their own copy of the actual string itself	04:23
dalek	MoarVM: bf9c708 | jimmy++ | tools/ (2 files): updated tools/windows1252_cp_gen.p6	04:26
timotimo	strings:	04:35
	0 [================================================== 7941
	1 [ 13
	MVMStringBody.flags of the strings in the gen2
	seems like there's a decent memory win to be had by investing a little more time into trying to build strings with 8 bits instead of 32 bits per character
	but first i'll try to get some rest before it becomes day again	04:36
nwc10	jnthn++ # Moar Pandas	07:30
FROGGS_	good morning	07:38
09:00 FROGGS_ joined 10:03 lizmat joined 10:05 lizmat joined 10:32 crab2313 joined
jnthn	timotimo: At the moment, our hash key computation works only on 32-bit wide, so it forces things to that before doing it.	11:03
	timotimo: So that piece needs a tweak first.	11:04
12:56 crab2314 joined 12:58 crab2313 joined
timotimo	ah, hm.	13:47
	where do i find the hash function then?	13:48
jnthn	3rdparty/uthash.h or so	13:49
	But it only cares about blobs of memory so far.
timotimo	ah, mhm.	13:50
	did you see how often some of the strings get duplicated?	13:51
	there's a whole lot of code in there :|	14:10
	i don't feel up to that task :(	14:25
	maybe there's some way to get a bit of dedup for the strings, though? perhaps we're accidentally copying in some place instead of re-using the string objects or something?	14:28
dalek	Heuristic branch merge: pushed 43 commits to MoarVM/gdb-support by timo	15:05
	MoarVM/gdb-support: b62c422 | (Timo Paulssen)++ | / (2 files): the gdb thing ought to live in tools/. it needs to be symlinked to where the moar binary lives anyway, so why not move it out of the way a little bit.
	Heuristic branch merge: pushed 25 commits to MoarVM by timo
15:25 crab2313 joined 16:36 d4l3k_ joined 17:04 rblackwe_ joined, dagurval_ joined
	timotimo is recording how much extra space utf8_decode wastes by not shrinking the buffer after decoding	17:24
	though it doesn't count that strings may get cleaned up, it's at about 64 bytes per string on average that we allocate too much
	but that's in compiling the setting	17:25
	otherwise it's at usually 33 bytes per string wasted
	for -e 'say 1' it outputs: 32000: 1083448 wasted (33.857750 per string)
	so 32k strings allocated, 1 megabyte wasted on that
	not terribly worrying, it seems to me
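The measurement described above can be sketched roughly as follows. This is only an illustration, not timotimo's actual instrumentation: the names `decode_waste_stats`, `record_decode` and `average_waste` are hypothetical, and it assumes the buffer holds 32-bit codepoints so that each unused slot counts as 4 wasted bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counters, not part of MoarVM. */
typedef struct {
    size_t strings;       /* number of decoded strings seen so far */
    size_t wasted_bytes;  /* allocated but unused buffer bytes */
} decode_waste_stats;

/* Record one decode: `capacity` slots were allocated, `used` were filled.
 * Assumes one 32-bit slot per codepoint. */
void record_decode(decode_waste_stats *stats, size_t capacity, size_t used) {
    assert(used <= capacity);
    stats->strings += 1;
    stats->wasted_bytes += (capacity - used) * sizeof(int32_t);
}

/* Average number of wasted bytes per decoded string (0 if none recorded). */
double average_waste(const decode_waste_stats *stats) {
    if (stats->strings == 0)
        return 0.0;
    return (double)stats->wasted_bytes / (double)stats->strings;
}
```

Feeding the figures quoted in the log (32000 strings, 1083448 bytes) into `average_waste` gives the same average of roughly 33.86 bytes per string.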
jnthn	True, though a meg win is kinda nice too	17:26
timotimo	i'll put in a realloc to see if it's only a meg or if the many copied strings make a bigger difference, or if the cleanups of the strings makes the difference smaller in practice	17:27
jnthn	Might also be checking what our first guess at the size is.	17:29
timotimo	16, eh?	17:30
	maybe.
	could even run-time tune that	17:31
	0.20user 0.02system 0:00.22elapsed 98%CPU (0avgtext+0avgdata 89432maxresident)k
	that's -e 'say 1' with shrunk strings
	0.18user 0.04system 0:00.22elapsed 99%CPU (0avgtext+0avgdata 90388maxresident)k
	that's without shrunk strings	17:32
jnthn	Oh...I think a better guess is the number of codepoints will be the number of bytes we're decoding?
	As that covers the ASCII case correctly
timotimo	right
dalek	MoarVM: 1c64183 | jnthn++ | src/6model/reprs/MVMOSHandle.h: Remove unused symbol.	17:33
	MoarVM: 198c5d7 | jnthn++ | src/io/dirops.c: Move directory listing to new IO scheme.
jnthn	Anyway, a one line (or so) addition to save a meg seems worth it to me.
timotimo	i only shrink if it would save more than 4 32-bit integers in the buffer	17:34
	i should measure how often the shrink is actually needed
	(crazy strings, they are!)
dalek	MoarVM: 40029f3 | jnthn++ | src/6model/reprs/MVMOSHandle.h: Finish cleanup of OSHandle. Now everything is switched over to the new IO model.
timotimo	also, *=2ing the buffer all the time may not be a good idea if we're in the last 5% of the source string ;)	17:35
	actually, wouldn't the utf8 source string always be bigger than the number of codepoints?
	so starting at 16 in any case would be unwise either way?
jnthn	Well, identical for ASCII.
	And an over-estimate for other things
	So yeah you could turn the resize into a "just make sure we don't overflow it" sanity check.	17:36
timotimo	i've done that now, the amount of wastage has decreased sharply
jnthn	it's more efficient too since we're not realloc'ing, which may have to copy	17:37
timotimo	32000: 1412 wasted (0.044125 per string, shrunk 54 times)	17:39
	this is the core.setting compilation
	0.20user 0.03system 0:00.23elapsed 99%CPU (0avgtext+0avgdata 89252maxresident)k	17:40
	another 200kbytes less
	i'll clean up the patch and commit it
	do we actually decode any utf16 in a regular program?	17:41
jnthn	No
timotimo	should i just copy the logic to the other encodings? latin1?
	actually, latin1 wouldn't ever need to do anything
jnthn	I'd hope latin1 just allocates the right size straight away	17:42
	So, another megabyte off with this?
dalek	MoarVM: aac8e0a | (Timo Paulssen)++ | src/strings/utf8.c: better estimate and perhaps shrink utf8 decoded buffers
timotimo	yes, another megabyte
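The sizing strategy discussed above (use the byte count as the initial estimate, keep the overflow check only as a sanity check, shrink only when more than 4 slots would be saved) can be sketched as below. This is a simplified illustration under stated assumptions, not the code from commit aac8e0a: `decode_utf8_sketch` is a hypothetical name, and the decoder rejects truncated input and bad continuation bytes but does not check for overlong encodings or surrogates.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not MoarVM's decoder. Decodes UTF-8 into 32-bit
 * codepoints. Returns a malloc'd buffer (caller frees) and stores the
 * codepoint count in *count_out, or returns NULL on malformed input or
 * allocation failure. */
int32_t *decode_utf8_sketch(const uint8_t *utf8, size_t bytes, size_t *count_out) {
    /* Every codepoint occupies at least one byte, so the byte count is an
     * upper bound on the codepoint count (and exact for pure ASCII). */
    size_t capacity = bytes ? bytes : 1;
    size_t count = 0, i = 0;
    int32_t *buf = malloc(capacity * sizeof(int32_t));
    if (!buf)
        return NULL;

    while (i < bytes) {
        uint8_t lead = utf8[i];
        int32_t cp;
        size_t extra, k;

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { free(buf); return NULL; }          /* invalid lead byte */

        if (extra > bytes - i - 1) { free(buf); return NULL; }  /* truncated */

        for (k = 1; k <= extra; k++) {
            if ((utf8[i + k] & 0xC0) != 0x80) { free(buf); return NULL; }
            cp = (cp << 6) | (utf8[i + k] & 0x3F);
        }
        i += extra + 1;

        /* With the upper-bound estimate, growing is never needed; this is
         * only the "make sure we don't overflow" sanity check. */
        assert(count < capacity);
        buf[count++] = cp;
    }

    /* Shrink only if it saves more than 4 32-bit slots. */
    if (capacity - count > 4) {
        int32_t *smaller = realloc(buf, (count ? count : 1) * sizeof(int32_t));
        if (smaller)
            buf = smaller;
    }

    *count_out = count;
    return buf;
}
```

For ASCII input the buffer is exactly the right size and no realloc happens; the shrink only triggers for text with enough multi-byte sequences to leave more than 4 slots unused.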
	how do you feel about "smallstring"? if the string is so short it fits into a pointer, set a flag and use the pointer instead of a buffer?	17:43
	oh, wait, we have 32bit big codepoints there	17:44
	not 8 bit codepoints
jnthn	Yeah...
	Maybe some day it's worth it.
timotimo	yup
	until then, some other way of deduplicating very short strings may be in order
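To make the "smallstring" idea and its limitation concrete, here is a rough sketch. The struct layout and the names `small_string_sketch`, `inline_capacity` and `store_inline_8bit` are purely hypothetical and do not reflect MVMStringBody; the sketch assumes a string representation with 8 bits per character, which the log says does not exist yet.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout, not MoarVM's MVMStringBody. */
typedef struct {
    union {
        void   *heap;                          /* separately allocated buffer */
        uint8_t inline_bytes[sizeof(void *)];  /* characters stored in place */
    } storage;
    uint32_t length;     /* number of characters */
    uint8_t  is_inline;  /* flag: 1 if the characters live in the pointer slot */
} small_string_sketch;

/* How many characters of the given width fit into the pointer slot. */
size_t inline_capacity(size_t bytes_per_char) {
    assert(bytes_per_char > 0);
    return sizeof(void *) / bytes_per_char;
}

/* Try to store an 8-bit string inline. Returns 1 on success, 0 if the
 * string is too long for the pointer slot. */
int store_inline_8bit(small_string_sketch *s, const uint8_t *chars, size_t n) {
    if (n > inline_capacity(1))
        return 0;
    memcpy(s->storage.inline_bytes, chars, n);
    s->length = (uint32_t)n;
    s->is_inline = 1;
    return 1;
}
```

On a 64-bit platform `inline_capacity(4)` is 2 while `inline_capacity(1)` is 8, which is why the trick is of limited use as long as every codepoint takes 32 bits.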
jnthn	I think we should try and clean up/fix what we have first...	17:45
timotimo	gist.github.com/timo/9ec85de3f926f3f299dd <- did you see this? this is just from looking at the nursery at some random point.
jnthn	How many times is the empty string duplicated?
timotimo	if i'm looking at it correctly, it seems like: often.
jnthn	Well, de-duplicating in the nursery isn't so worth it.
	I mean, those are most likely going to get collected
timotimo	right, but a whole bunch of strings ought to have found their way into the gen 2, too.	17:46
	i should actually properly investigate that
jnthn	Not if they're being produced afresh.
timotimo	mhm
jnthn	When was that nursery snapshot taken?	17:47
timotimo	i don't remember :(
	pretty early i guess
	in the empty program
jnthn	I wonder if all the :, <, prec, etc. are from <O(...)>	17:49
timotimo	well, the empty program doesn't call the allocator at all :)
	timotimo has a bigger nonsense-program	17:50
jnthn	gonna do dinner and stuff...will look at the STable repos issue afterwards.	17:51
timotimo	hm. string histogram seems b0rked ATM
	oh, yes, i only collect the flags, not the values right now	17:52
	i was editing the file pre-mv m)	17:55
	oh, i forgot to set the sampling rate up	17:57
	and i should also grab the MVMString from inside P6str.	18:00
	oh, i don't even have to, those are put into the memory separately and are just pointered at	18:01
jnthn	right	18:02
benabik	Compile errors on OS X with the new directory I/O bits. Most are fixed by adding an #include <dirent.h>, but one isn't...
	src/io/dirops.c:343:79: error: no member named 'encoding_type' in 'MVMIODirIter'
	return MVM_string_decode(tc, tc->instance->VMString, "", 0, data->encoding_type);
jnthn	whoa...just discovered I've been handed so many round tuits over the last few years' conferences, if I pile them all up I get a tower a foot high!	18:03
timotimo	:D	18:04
	gen 2 looks quite like this:	18:05
	'ctxsave' [================================================== 9
	'__6MODEL_CORE__' [============================================ 8
	'Regex' [============================================ 8
	the majority of strings appear twice or once
	before the very first nursery run, the nursery looks like this:
	'' [================================================== 102
	':' [============================================== 94
tadzik	:D
timotimo	'<' [======================================== 83
tadzik	why do we need such strings btw?	18:06
timotimo	for sad emoticons :<
tadzik	:<
timotimo	exactly!
tadzik	<:
	always look on the right side of life
	timotimo++ # awesome work	18:07
timotimo	thank you :3
	tadzik: it'll take a whole lot of time to nibble away the startup memory usage if it takes me like 1 week for each megabyte	18:09
	it's likely going to get harder over time, though
benabik	Ah. If I s/_type//, it seems to work.
	And dirent.h seems to exist on Linux, so I'll just #include it in a #ifndef _WIN32	18:10
dalek	MoarVM: da1ae55 | benabik++ | src/io/dirops.c: Fix directory I/O compilation on OS X. Possibly fixes errors on Linux as well.	18:19
timotimo	jnthn: i've run through a whole bunch of collections now and the gen2 seems to contain the same string at most like 10 times (though i think i'm not sampling all of the objects)	18:22
	(and 596 completely filled pages) (and 2 empty pages) (freelist with 25 entries)
	VMArray [================================================== 26254
	MVMString [=============== 8221
	P6opaque [======= 3940
	m: 256 * 596	18:23
camelia	( no output )
timotimo	m: say 256 * 596
camelia	rakudo-moar 230a54: OUTPUT«152576␤»
timotimo	that's about how many objects are in that size class
	m: say (26254 + 8221 + 3940) / 152576
camelia	rakudo-moar 230a54: OUTPUT«0.2517762␤»
timotimo	so we sample about 1/4th of the objects
	so, it's likely that the strings that appear most often only appear about 40 times	18:24
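The back-of-the-envelope estimate above can be written out as a small sketch. The function names are hypothetical, the figure of 256 objects per page is taken from the `256 * 596` in the log, and the scaling step assumes the sampler picks objects uniformly at random.

```c
#include <assert.h>
#include <stddef.h>

/* Fraction of the objects in a size class that the sampler actually saw. */
double sampled_fraction(size_t sampled, size_t pages, size_t objects_per_page) {
    size_t total = pages * objects_per_page;
    assert(total > 0);
    return (double)sampled / (double)total;
}

/* Scale a count observed in the sample up to an estimate for the whole
 * population. Only meaningful if sampling is uniform. */
double estimate_true_count(size_t observed, double fraction) {
    assert(fraction > 0.0);
    return (double)observed / fraction;
}
```

With 26254 + 8221 + 3940 sampled objects across 596 pages of 256 objects, the fraction comes out at about 0.25, so a string seen 10 times in the sample would be expected roughly 40 times overall.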
	timotimo turns the sampling up to 100%	18:28
	with a 100% sample rate looking at a random point in time i get:	18:54
	'dba' [================================================== 31	18:55
	'prec' [============================================= 28
	'assoc' [============================================= 28
	'' [======================================== 25
	so string duplication may not be a terribly big deal	18:56
	gist.github.com/timo/41d9eb48aea43a48d35a	19:05
	i suppose it may add up
	given the "long tail" etc
	MVMCode [================================================== 15827 ← i wonder if this is right?	19:09
jnthn	timotimo: Is that in gen2?	19:21
	timotimo: It's something the lexicals => locals work should help with, anyways.	19:22
	timotimo: The flattening part, anyway.
timotimo	gen2, aye.
jnthn	Well, the flattening work will help more with making less of 'em in the nursery, I suspect.	19:25
	timotimo: "at about 97 megabytes of ram usage all in all" in gist.github.com/timo/41d9eb48aea43a48d35a	19:29
	huh? :) The whole process is < 90MB of RAM now?
timotimo	huh huh?	19:30
	that's a random program that does stuff with a list over and over and throws the list away and makes a new one	19:31
	so it's more than 90 megabytes for that reason
jnthn	Oh!	19:36
	That explains it :)
21:50 benabik joined 21:59 tgt joined 22:38 tgt joined 23:28 crab2313 joined
