dalek MoarVM: 0017869 | jnthn++ | src/io/syncpipe.c:
Avoid an assertion fail on Windows.
00:23
MoarVM: d364f5e | jnthn++ | src/io/syncstream.c:
Correct a thinko.
MoarVM: 705d815 | jnthn++ | / (12 files):
Support a custom separator in sync streams.

Well, at least a 1-char one. Enough for the tests.
01:05
jnthn The only thing left behind in MVMOSHandle from the previous design now is directory handles. 01:12
I'll leave those for tomorrow. Once that's done, then IO refactor is done. 01:19
01:44 woolfy joined 02:46 FROGGS_ joined 03:03 cognominal joined
timotimo hmm. when perl6-m calls MVM_repr_box_num for the very first time - which seems to come from the datetime setup stuff it does when loading the core.setting? - it already uses 70 megabytes of resident ram 03:38
the very first call to box_int however is at only 3 megabytes of resident ram 03:39
04:06 colomon joined
dalek MoarVM: 5f86fb0 | jimmy++ | src/ (6 files):
Add const keyword
04:13
timotimo gist.github.com/timo/9ec85de3f926f3f299dd 04:21
i wonder if these share the pointer to string memory or if they all have their own copy of the actual string itself 04:23
dalek MoarVM: bf9c708 | jimmy++ | tools/ (2 files):
updated tools/windows1252_cp_gen.p6
04:26
timotimo strings: 04:35
0 [================================================== 7941
1 [ 13
MVMStringBody.flags of the strings in the gen2
seems like there's a decent memory win to be had by investing a little more time into trying to build strings with 8 bits instead of 32 bits per character
but first i'll try to get some rest before it becomes day again 04:36
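A minimal sketch of the 8-bit idea mentioned above, not MoarVM's actual string code: if every codepoint in a 32-bit buffer is below 256, the same text can be stored in a quarter of the space. The names `fits_in_8_bits` and `narrow_to_8_bits` are hypothetical and exist only for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if every codepoint fits in one byte. */
static int fits_in_8_bits(const uint32_t *codes, size_t count) {
    size_t i;
    for (i = 0; i < count; i++)
        if (codes[i] > 0xFF)
            return 0;
    return 1;
}

/* Hypothetical helper: copies a 32-bit buffer into a freshly allocated
 * 8-bit buffer. Returns NULL if a codepoint does not fit or if the
 * allocation fails. The caller frees the result. */
static uint8_t *narrow_to_8_bits(const uint32_t *codes, size_t count) {
    uint8_t *out;
    size_t i;
    if (!fits_in_8_bits(codes, count))
        return NULL;
    out = malloc(count ? count : 1);
    if (!out)
        return NULL;
    for (i = 0; i < count; i++)
        out[i] = (uint8_t)codes[i];
    return out;
}
```

For a short string such as 'prec' this drops the storage from 16 bytes to 4; whether the extra scan pays off overall would need measuring.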
nwc10 jnthn++ # Moar Pandas 07:30
FROGGS_ good morning 07:38
09:00 FROGGS_ joined 10:03 lizmat joined 10:05 lizmat joined 10:32 crab2313 joined
jnthn timotimo: At the moment, our hash key computation works only on 32-bit wide, so it forces things to that before doing it. 11:03
timotimo: So that piece needs a tweak first. 11:04
12:56 crab2314 joined 12:58 crab2313 joined
timotimo ah, hm. 13:47
where do i find the hash function then? 13:48
jnthn 3rdparty/uthash.h or so 13:49
But it only cares about blobs of memory so far.
timotimo ah, mhm. 13:50
did you see how often some of the strings get duplicated? 13:51
there's a whole lot of code in there :| 14:10
i don't feel up to that task :( 14:25
maybe there's some way to get a bit of dedup for the strings, though? perhaps we're accidentally copying in some place instead of re-using the string objects or something? 14:28
dalek Heuristic branch merge: pushed 43 commits to MoarVM/gdb-support by timo 15:05
MoarVM/gdb-support: b62c422 | (Timo Paulssen)++ | / (2 files):
the gdb thing ought to live in tools/.

it needs to be symlinked to where the moar binary lives anyway, so why not move it out of the way a little bit.
Heuristic branch merge: pushed 25 commits to MoarVM by timo
15:25 crab2313 joined 16:36 d4l3k_ joined 17:04 rblackwe_ joined, dagurval_ joined
timotimo is recording how much extra space utf8_decode wastes by not shrinking the buffer after decoding 17:24
though it doesn't count that strings may get cleaned up, it's at about 64 bytes per string on average that we allocate too much
but that's in compiling the setting 17:25
otherwise it's at usually 33 bytes per string wasted
for -e 'say 1' it outputs: 32000: 1083448 wasted (33.857750 per string)
so 32k strings allocated, 1 megabyte wasted on that
not terribly worrying, it seems to me
jnthn True, though a meg win is kinda nice too 17:26
timotimo i'll put in a realloc to see if it's only a meg or if the many copied strings make a bigger difference, or if the cleanups of the strings makes the difference smaller in practice 17:27
jnthn Might also be worth checking what our first guess at the size is. 17:29
timotimo 16, eh? 17:30
maybe.
could even run-time tune that 17:31
0.20user 0.02system 0:00.22elapsed 98%CPU (0avgtext+0avgdata 89432maxresident)k
that's -e 'say 1' with shrunk strings
0.18user 0.04system 0:00.22elapsed 99%CPU (0avgtext+0avgdata 90388maxresident)k
that's without shrunk strings 17:32
jnthn Oh...I think a better guess is the number of codepoints will be the number of bytes we're decoding?
As that covers the ASCII case correctly
timotimo right
dalek MoarVM: 1c64183 | jnthn++ | src/6model/reprs/MVMOSHandle.h:
Remove unused symbol.
17:33
MoarVM: 198c5d7 | jnthn++ | src/io/dirops.c:
Move directory listing to new IO scheme.
jnthn Anyway, a one line (or so) addition to save a meg seems worth it to me.
timotimo i only shrink if it would save more than 4 32-bit integers in the buffer 17:34
i should measure how often the shrink is actually needed
(crazy strings, they are!)
dalek MoarVM: 40029f3 | jnthn++ | src/6model/reprs/MVMOSHandle.h:
Finish cleanup of OSHandle.

Now everything is switched over to the new IO model.
timotimo also, *=2ing the buffer all the time may not be a good idea if we're in the last 5% of the source string ;) 17:35
actually, wouldn't the utf8 source string *always* be bigger than the number of codepoints?
so starting at 16 in any case would be unwise either way?
jnthn Well, identical for ASCII.
And an over-estimate for other things
So yeah you could turn the resize into an "just make sure we don't overflow it" sanity check. 17:36
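A rough sketch of the sizing idea discussed above, not the code in src/strings/utf8.c: every codepoint takes at least one byte of UTF-8, so the byte count is an upper bound on the codepoint count (exact for ASCII). The growth step then reduces to a sanity check, and the buffer is shrunk only when more than 4 slots would be saved. `DecodedString`, `decode_utf8` and `SHRINK_THRESHOLD` are hypothetical names, and input validation is left out.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical result type for this sketch. */
typedef struct {
    uint32_t *codes;  /* decoded codepoints */
    size_t    count;  /* slots in use */
    size_t    alloc;  /* slots allocated */
} DecodedString;

/* Only shrink if more than this many 32-bit slots are unused. */
#define SHRINK_THRESHOLD 4

/* Decodes well-formed UTF-8. Returns 0 on success, -1 on failure. */
static int decode_utf8(const unsigned char *bytes, size_t num_bytes,
                       DecodedString *out) {
    /* Upper bound: one codepoint per input byte. */
    size_t alloc = num_bytes ? num_bytes : 1;
    size_t count = 0, i = 0;
    uint32_t *codes = malloc(alloc * sizeof(uint32_t));
    if (!codes)
        return -1;

    while (i < num_bytes) {
        unsigned char b = bytes[i++];
        uint32_t cp;
        size_t extra;
        if (b < 0x80)      { cp = b;        extra = 0; }
        else if (b < 0xE0) { cp = b & 0x1F; extra = 1; }
        else if (b < 0xF0) { cp = b & 0x0F; extra = 2; }
        else               { cp = b & 0x07; extra = 3; }
        /* Fold in the continuation bytes, 6 bits each. */
        for (; extra > 0 && i < num_bytes; extra--)
            cp = (cp << 6) | (bytes[i++] & 0x3F);
        /* Sanity check only: the bound above means this never fires. */
        if (count >= alloc) {
            free(codes);
            return -1;
        }
        codes[count++] = cp;
    }

    /* Give back the slack, but only when it is worth a realloc. */
    if (alloc - count > SHRINK_THRESHOLD) {
        size_t new_alloc = count ? count : 1;
        uint32_t *shrunk = realloc(codes, new_alloc * sizeof(uint32_t));
        if (shrunk) {
            codes = shrunk;
            alloc = new_alloc;
        }
    }

    out->codes = codes;
    out->count = count;
    out->alloc = alloc;
    return 0;
}
```

With this shape an ASCII source never reallocates at all, and multi-byte input costs at most one shrinking realloc at the end instead of repeated doubling.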
timotimo i've done that now, the amount of wastage has decreased sharply
jnthn it's more efficient too since we're not realloc'ing, which may have to copy 17:37
timotimo 32000: 1412 wasted (0.044125 per string, shrunk 54 times) 17:39
this is the core.setting compilation
0.20user 0.03system 0:00.23elapsed 99%CPU (0avgtext+0avgdata 89252maxresident)k 17:40
another 200kbytes less
i'll clean up the patch and commit it
do we actually decode any utf16 in a regular program? 17:41
jnthn No
timotimo should i just copy the logic to the other encodings? latin1?
actually, latin1 wouldn't ever need to do anything
jnthn I'd hope latin1 just allocates the right size straight away 17:42
So, another megabyte off with this?
dalek MoarVM: aac8e0a | (Timo Paulssen)++ | src/strings/utf8.c:
better estimate and perhaps shrink utf8 decoded buffers
timotimo yes, another megabyte
how do you feel about "smallstring"? if the string is so short it fits into a pointer, set a flag and use the pointer instead of a buffer? 17:43
oh, wait, we have 32bit big codepoints there 17:44
not 8 bit codepoints
jnthn Yeah...
Maybe some day it's worth it.
timotimo yup
until then, some other way of deduplicating very short strings may be in order
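To make the "fits into a pointer" arithmetic above concrete, here is a small sketch, assuming nothing about MoarVM's real string layout: it only counts how many code units of each width fit in the storage of one pointer. `UNITS_PER_POINTER` and `SmallStringStorage` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* How many code units of the given type fit in one pointer's storage. */
#define UNITS_PER_POINTER(unit_type) (sizeof(void *) / sizeof(unit_type))

/* Hypothetical layout: either a pointer to a heap buffer, or the
 * characters themselves stored in the space the pointer would occupy. */
typedef union {
    uint32_t *buffer;
    uint32_t  inline32[UNITS_PER_POINTER(uint32_t)];
    uint8_t   inline8[UNITS_PER_POINTER(uint8_t)];
} SmallStringStorage;

/* Capacity of the inline form for 32-bit and 8-bit code units. */
static size_t inline_capacity_32(void) { return UNITS_PER_POINTER(uint32_t); }
static size_t inline_capacity_8(void)  { return UNITS_PER_POINTER(uint8_t); }
```

With 8-byte pointers that is 2 codepoints at 32 bits each versus 8 at 8 bits each, which is why the trick looks much less attractive while codepoints are stored 32 bits wide.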
jnthn I think we should try and clean up/fix what we have first... 17:45
timotimo gist.github.com/timo/9ec85de3f926f3f299dd <- did you see this? this is just from looking at the nursery at some random point.
jnthn How many times is the empty string duplicated?
timotimo if i'm looking at it correctly, it seems like: often.
jnthn Well, de-duplicating in the nursery isn't so worth it.
I mean, those are most likely going to get collected
timotimo right, but a whole bunch of strings ought to have found their way into the gen 2, too. 17:46
i should actually properly investigate that
jnthn Not if they're being produced afresh.
timotimo mhm
jnthn When was that nursery snapshot taken? 17:47
timotimo i don't remember :(
pretty early i guess
in the empty program
jnthn I wonder if all the :, <, prec, etc. are from <O(...)> 17:49
timotimo well, the empty program doesn't call the allocator at all :)
timotimo has a bigger nonsense-program 17:50
jnthn gonna do dinner and stuff...will look at the STable repos issue afterwards. 17:51
timotimo hm. string histogram seems b0rked ATM
oh, yes, i only collect the flags, not the values right now 17:52
i was editing the file pre-mv m) 17:55
oh, i forgot to set the sampling rate up 17:57
and i should also grab the MVMString from inside P6str. 18:00
oh, i don't even have to, those are put into the memory separately and are just pointered at 18:01
jnthn right 18:02
benabik Compile errors on OS X with the new directory I/O bits. Most are fixed by adding an #include <dirent.h>, but one isn't...
src/io/dirops.c:343:79: error: no member named 'encoding_type' in 'MVMIODirIter'
return MVM_string_decode(tc, tc->instance->VMString, "", 0, data->encoding_type);
jnthn whoa...just discovered I've been handed so many round tuits over the last few years' conferences, if I pile them all up I get a tower a foot high! 18:03
timotimo :D 18:04
gen 2 looks quite like this: 18:05
'ctxsave' [================================================== 9
'__6MODEL_CORE__' [============================================ 8
'Regex' [============================================ 8
the majority of strings appear twice or once
before the very first nursery run, the nursery looks like this:
'' [================================================== 102
':' [============================================== 94
tadzik :D
timotimo '<' [======================================== 83
tadzik why do we need such strings btw? 18:06
timotimo for sad emoticons :<
tadzik :<
timotimo exactly!
tadzik <:
always look on the right side of life
timotimo++ # awesome work 18:07
timotimo thank you :3
tadzik: it'll take a whole lot of time to nibble away the startup memory usage if it takes me like 1 week for each megabyte 18:09
it's likely going to get harder over time, though
benabik Ah. If I s/_type//, it seems to work.
And dirent.h seems to exist on Linux, so I'll just #include it in a #ifndef _WIN32 18:10
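The include guard described above could look roughly like this. It is a sketch, not the contents of src/io/dirops.c; `count_dir_entries` is a hypothetical helper added only so the guarded header is actually used.

```c
#include <stddef.h>

#ifndef _WIN32
#include <dirent.h>  /* POSIX: opendir, readdir, closedir */
#endif

/* Hypothetical helper: number of entries in a directory, or -1 if it
 * cannot be opened. */
static long count_dir_entries(const char *path) {
#ifndef _WIN32
    DIR *dir = opendir(path);
    long n = 0;
    if (!dir)
        return -1;
    while (readdir(dir) != NULL)
        n++;
    closedir(dir);
    return n;
#else
    /* Windows needs a different API; not covered by this sketch. */
    (void)path;
    return -1;
#endif
}
```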
dalek MoarVM: da1ae55 | benabik++ | src/io/dirops.c:
Fix directory I/O compilation on OS X

Possibly fixes errors on Linux as well.
18:19
timotimo jnthn: i've run through a whole bunch of collections now and the gen2 seems to contain the same string at most like 10 times (though i think i'm not sampling all of the objects) 18:22
(and 596 completely filled pages) (and 2 empty pages) (freelist with 25 entries)
VMArray [================================================== 26254
MVMString [=============== 8221
P6opaque [======= 3940
m: 256 * 596 18:23
camelia ( no output )
timotimo m: say 256 * 596
camelia rakudo-moar 230a54: OUTPUT«152576␤»
timotimo that's about how many objects are in that size class
m: say (26254 + 8221 + 3940) / 152576
camelia rakudo-moar 230a54: OUTPUT«0.2517762␤»
timotimo so we sample about 1/4th of the objects
so, it's likely that the strings that appear most often only appear about 40 times 18:24
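The estimate above can be replayed as a small calculation. This sketch assumes, as the "256 * 596" line does, that each of the 596 full pages holds 256 objects, and that the sample is spread evenly over them; the function names are hypothetical.

```c
/* Fraction of objects that ended up in the sample. */
static double sampled_fraction(long sampled, long total) {
    return (double)sampled / (double)total;
}

/* Scale a count seen in the sample up to the whole population,
 * assuming the sample is uniform. */
static double estimate_true_count(long observed, double fraction) {
    return (double)observed / fraction;
}
```

Feeding in 26254 + 8221 + 3940 sampled objects out of 152576 gives a fraction of roughly a quarter, so a string seen about 10 times in the sample would be expected about 40 times in total.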
timotimo turns the sampling up to 100% 18:28
with a 100% sample rate looking at a random point in time i get: 18:54
'dba' [================================================== 31 18:55
'prec' [============================================= 28
'assoc' [============================================= 28
'' [======================================== 25
so string duplication may not be a terribly big deal 18:56
gist.github.com/timo/41d9eb48aea43a48d35a 19:05
i suppose it may add up
given the "long tail" etc
MVMCode [================================================== 15827 ← i wonder if this is right? 19:09
jnthn timotimo: Is that in gen2? 19:21
timotimo: It's something the lexicals => locals work should help with, anyways. 19:22
timotimo: The flattening part, anyway.
timotimo gen2, aye.
jnthn Well, the flattening work will help more with making less of 'em in the nursery, I suspect. 19:25
timotimo: "at about 97 megabytes of ram usage all in all" in gist.github.com/timo/41d9eb48aea43a48d35a 19:29
huh? :) The whole process is < 90MB of RAM now?
timotimo huh huh? 19:30
that's a random program that does stuff with a list over and over and throws the list away and makes a new one 19:31
so it's more than 90 megabytes for that reason
jnthn Oh! 19:36
That explains it :)
21:50 benabik joined 21:59 tgt joined 22:38 tgt joined 23:28 crab2313 joined