dalek | MoarVM: 0017869 | jnthn++ | src/io/syncpipe.c: Avoid an assertion fail on Windows. |
00:23 | |
MoarVM: d364f5e | jnthn++ | src/io/syncstream.c: Correct a thinko. |
MoarVM: 705d815 | jnthn++ | / (12 files): Support a custom separator in sync streams. Well, at least a 1-char one. Enough for the tests. |
01:05 | ||
jnthn | The only thing left behind in MVMOSHandle from the previous design now is directory handles. | 01:12 | |
I'll leave those for tomorrow. Once that's done, then IO refactor is done. | 01:19 | ||
01:44
woolfy joined
02:46
FROGGS_ joined
03:03
cognominal joined
timotimo | hmm. when perl6-m calls MVM_repr_box_num for the very first time - which seems to come from the datetime setup stuff it does when loading the core.setting? - it already uses 70 megabytes of resident ram | 03:38 | |
the very first call to box_int however is at only 3 megabytes of resident ram | 03:39 | ||
04:06
colomon joined
dalek | MoarVM: 5f86fb0 | jimmy++ | src/ (6 files): Add const keyword |
04:13 | |
timotimo | gist.github.com/timo/9ec85de3f926f3f299dd | 04:21 | |
i wonder if these share the pointer to string memory or if they all have their own copy of the actual string itself | 04:23 | ||
dalek | MoarVM: bf9c708 | jimmy++ | tools/ (2 files): updated tools/windows1252_cp_gen.p6 |
04:26 | |
timotimo | strings: | 04:35 | |
0 [================================================== 7941 | |||
1 [ 13 | |||
MVMStringBody.flags of the strings in the gen2 | |||
seems like there's a decent memory win to be had by investing a little more time into trying to build strings with 8 bits instead of 32 bits per character | |||
but first i'll try to get some rest before it becomes day again | 04:36 | ||
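As an illustration of the 8-bit versus 32-bit point above, here is a minimal C sketch, not MoarVM code. It checks whether a buffer of 32-bit codepoints fits in one byte per character and, if so, builds the narrow copy, which takes a quarter of the space. The name narrow_if_possible and the plain int32_t/uint8_t buffers are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns a newly malloc'd 8-bit copy of `wide` if
 * every codepoint is in 0..255. Returns NULL if any codepoint does not
 * fit, or if allocation fails. The caller frees the result. */
static uint8_t *narrow_if_possible(const int32_t *wide, size_t count) {
    assert(wide != NULL || count == 0);

    /* First pass: make sure nothing needs more than 8 bits. */
    for (size_t i = 0; i < count; i++)
        if (wide[i] < 0 || wide[i] > 0xFF)
            return NULL;

    /* Second pass: copy into a buffer that is 4x smaller. */
    uint8_t *narrow = malloc(count ? count : 1);
    if (!narrow)
        return NULL;
    for (size_t i = 0; i < count; i++)
        narrow[i] = (uint8_t)wide[i];
    return narrow;
}
```

A string such as 'prec' would narrow from 16 bytes to 4, while any string containing a codepoint above 255 stays in the wide form.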
nwc10 | jnthn++ # Moar Pandas | 07:30 | |
FROGGS_ | good morning | 07:38 | |
09:00
FROGGS_ joined
10:03
lizmat joined
10:05
lizmat joined
10:32
crab2313 joined
jnthn | timotimo: At the moment, our hash key computation works only on 32-bit wide, so it forces things to that before doing it. | 11:03 | |
timotimo: So that piece needs a tweak first. | 11:04 | ||
12:56
crab2314 joined
12:58
crab2313 joined
timotimo | ah, hm. | 13:47 | |
where do i find the hash function then? | 13:48 | ||
jnthn | 3rdparty/uthash.h or so | 13:49 | |
But it only cares about blobs of memory so far. | |||
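To illustrate why hashing "blobs of memory" ties the hash to the storage width: the same three codepoints stored as 8-bit and as 32-bit values are different byte sequences, so a blob hash would give different keys. One possible way around that, sketched here in C, is to feed the hash one codepoint at a time at a fixed width. The hash below is FNV-1a, chosen only because it is short; it is not the function uthash.h uses, and the names fnv1a_step, hash_wide and hash_narrow are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mix one codepoint into an FNV-1a state, always as four bytes,
 * regardless of how the codepoint was stored. */
static uint32_t fnv1a_step(uint32_t h, uint32_t cp) {
    for (int shift = 0; shift < 32; shift += 8) {
        h ^= (cp >> shift) & 0xFF;
        h *= 16777619u;           /* FNV prime */
    }
    return h;
}

/* Hash a string stored with 32 bits per codepoint. */
static uint32_t hash_wide(const int32_t *cps, size_t n) {
    assert(cps != NULL || n == 0);
    uint32_t h = 2166136261u;     /* FNV offset basis */
    for (size_t i = 0; i < n; i++)
        h = fnv1a_step(h, (uint32_t)cps[i]);
    return h;
}

/* Hash a string stored with 8 bits per codepoint. */
static uint32_t hash_narrow(const uint8_t *cps, size_t n) {
    assert(cps != NULL || n == 0);
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++)
        h = fnv1a_step(h, (uint32_t)cps[i]);
    return h;
}
```

With this shape, 'dba' hashes to the same value whether it is stored narrow or wide, so no widening copy is needed before the lookup.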
timotimo | ah, mhm. | 13:50 | |
did you see how often some of the strings get duplicated? | 13:51 | ||
there's a whole lot of code in there :| | 14:10 | ||
i don't feel up to that task :( | 14:25 | ||
maybe there's some way to get a bit of dedup for the strings, though? perhaps we're accidentally copying in some place instead of re-using the string objects or something? | 14:28 | ||
dalek | Heuristic branch merge: pushed 43 commits to MoarVM/gdb-support by timo | 15:05 | |
MoarVM/gdb-support: b62c422 | (Timo Paulssen)++ | / (2 files): the gdb thing ought to live in tools/. it needs to be symlinked to where the moar binary lives anyway, so why not move it out of the way a little bit. |
Heuristic branch merge: pushed 25 commits to MoarVM by timo | |||
15:25
crab2313 joined
16:36
d4l3k_ joined
17:04
rblackwe_ joined,
dagurval_ joined
timotimo is recording how much extra space utf8_decode wastes by not shrinking the buffer after decoding | 17:24 | ||
though it doesn't count that strings may get cleaned up, it's at about 64 bytes per string on average that we allocate too much | |||
but that's in compiling the setting | 17:25 | ||
otherwise it's at usually 33 bytes per string wasted | |||
for -e 'say 1' it outputs: 32000: 1083448 wasted (33.857750 per string) | |||
so 32k strings allocated, 1 megabyte wasted on that | |||
not terribly worrying, it seems to me | |||
jnthn | True, though a meg win is kinda nice too | 17:26 | |
timotimo | i'll put in a realloc to see if it's only a meg or if the many copied strings make a bigger difference, or if the cleanups of the strings make the difference smaller in practice | 17:27 | |
jnthn | Might also be checking what our first guess at the size is. | 17:29 | |
timotimo | 16, eh? | 17:30 | |
maybe. | |||
could even run-time tune that | 17:31 | ||
0.20user 0.02system 0:00.22elapsed 98%CPU (0avgtext+0avgdata 89432maxresident)k | |||
that's -e 'say 1' with shrunk strings | |||
0.18user 0.04system 0:00.22elapsed 99%CPU (0avgtext+0avgdata 90388maxresident)k | |||
that's without shrunk strings | 17:32 | ||
jnthn | Oh...I think a better guess is the number of codepoints will be the number of bytes we're decoding? | ||
As that covers the ASCII case correctly | |||
timotimo | right | ||
dalek | MoarVM: 1c64183 | jnthn++ | src/6model/reprs/MVMOSHandle.h: Remove unused symbol. |
17:33 | |
MoarVM: 198c5d7 | jnthn++ | src/io/dirops.c: Move directory listing to new IO scheme. |
jnthn | Anyway, a one line (or so) addition to save a meg seems worth it to me. | ||
timotimo | i only shrink if it would save more than 4 32-bit integers in the buffer | 17:34 | |
i should measure how often the shrink is actually needed | |||
(crazy strings, they are!) | |||
dalek | MoarVM: 40029f3 | jnthn++ | src/6model/reprs/MVMOSHandle.h: Finish cleanup of OSHandle. Now everything is switched over to the new IO model. |
timotimo | also, *=2ing the buffer all the time may not be a good idea if we're in the last 5% of the source string ;) | 17:35 | |
actually, wouldn't the utf8 source string *always* be bigger than the number of codepoints? | |||
so starting at 16 in any case would be unwise either way? | |||
jnthn | Well, identical for ASCII. | ||
And an over-estimate for other things | |||
So yeah you could turn the resize into a "just make sure we don't overflow it" sanity check. | 17:36 | |
timotimo | i've done that now, the amount of wastage has decreased sharply | ||
jnthn | it's more efficient too since we're not realloc'ing, which may have to copy | 17:37 | |
timotimo | 32000: 1412 wasted (0.044125 per string, shrunk 54 times) | 17:39 | |
this is the core.setting compilation | |||
0.20user 0.03system 0:00.23elapsed 99%CPU (0avgtext+0avgdata 89252maxresident)k | 17:40 | ||
another 200kbytes less | |||
i'll clean up the patch and commit it | |||
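The approach discussed above can be sketched in C roughly as follows. This is not the code from src/strings/utf8.c, only an illustration of the two ideas: size the codepoint buffer from the input byte count (exact for ASCII, an over-estimate otherwise, so no growth step is needed), then shrink only when more than four 32-bit slots would be saved. The name decode_utf8 and the SHRINK_THRESHOLD constant are assumptions, and the decoder is simplified: it rejects truncated or mis-tagged sequences but does not check for overlong encodings or surrogates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SHRINK_THRESHOLD 4 /* shrink only if it saves more than 4 slots */

/* Hypothetical decoder: returns a malloc'd buffer of codepoints and stores
 * their number in *out_count. Returns NULL on malformed input or if
 * allocation fails. The caller frees the result. */
static int32_t *decode_utf8(const uint8_t *src, size_t bytes, size_t *out_count) {
    /* A codepoint takes at least one byte, so `bytes` is an upper bound. */
    size_t capacity = bytes ? bytes : 1;
    int32_t *buf = malloc(capacity * sizeof *buf);
    size_t count = 0, i = 0;
    if (!buf)
        return NULL;

    while (i < bytes) {
        uint8_t b = src[i];
        int extra;      /* number of continuation bytes expected */
        int32_t cp;
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else { free(buf); return NULL; }

        /* Reject sequences that run past the end of the input. */
        if ((size_t)extra > bytes - i - 1) { free(buf); return NULL; }
        for (int k = 1; k <= extra; k++) {
            if ((src[i + k] & 0xC0) != 0x80) { free(buf); return NULL; }
            cp = (cp << 6) | (src[i + k] & 0x3F);
        }
        i += (size_t)extra + 1;

        /* The "just make sure we don't overflow it" sanity check. */
        assert(count < capacity);
        buf[count++] = cp;
    }

    /* Give back the unused tail only when it is worth a realloc. */
    if (capacity - count > SHRINK_THRESHOLD) {
        int32_t *smaller = realloc(buf, (count ? count : 1) * sizeof *buf);
        if (smaller)
            buf = smaller;
    }
    *out_count = count;
    return buf;
}
```

For pure ASCII input the buffer is exactly the right size and the realloc branch is never taken, which matches the low "shrunk 54 times" figure reported above.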
do we actually decode any utf16 in a regular program? | 17:41 | ||
jnthn | No | ||
timotimo | should i just copy the logic to the other encodings? latin1? | ||
actually, latin1 wouldn't ever need to do anything | |||
jnthn | I'd hope latin1 just allocates the right size straight away | 17:42 | |
So, another megabyte off with this? | |||
dalek | MoarVM: aac8e0a | (Timo Paulssen)++ | src/strings/utf8.c: better estimate and perhaps shrink utf8 decoded buffers |
timotimo | yes, another megabyte | ||
how do you feel about "smallstring"? if the string is so short it fits into a pointer, set a flag and use the pointer instead of a buffer? | 17:43 | ||
oh, wait, we have 32bit big codepoints there | 17:44 | ||
not 8 bit codepoints | |||
jnthn | Yeah... | ||
Maybe some day it's worth it. | |||
timotimo | yup | ||
until then, some other way of deduplicating very short strings may be in order | |||
jnthn | I think we should try and clean up/fix what we have first... | 17:45 | |
timotimo | gist.github.com/timo/9ec85de3f926f3f299dd <- did you see this? this is just from looking at the nursery at some random point. | ||
jnthn | How many times is the empty string duplicated? | ||
timotimo | if i'm looking at it correctly, it seems like: often. | ||
jnthn | Well, de-duplicating in the nursery isn't so worth it. | ||
I mean, those are most likely going to get collected | |||
timotimo | right, but a whole bunch of strings ought to have found their way into the gen 2, too. | 17:46 | |
i should actually properly investigate that | |||
jnthn | Not if they're being produced afresh. | ||
timotimo | mhm | ||
jnthn | When was that nursery snapshot taken? | 17:47 | |
timotimo | i don't remember :( | ||
pretty early i guess | |||
in the empty program | |||
jnthn | I wonder if all the :, <, prec, etc. are from <O(...)> | 17:49 | |
timotimo | well, the empty program doesn't call the allocator at all :) | ||
timotimo has a bigger nonsense-program | 17:50 | ||
jnthn | gonna do dinner and stuff...will look at the STable repos issue afterwards. | 17:51 | |
timotimo | hm. string histogram seems b0rked ATM | ||
oh, yes, i only collect the flags, not the values right now | 17:52 | ||
i was editing the file pre-mv m) | 17:55 | ||
oh, i forgot to set the sampling rate up | 17:57 | ||
and i should also grab the MVMString from inside P6str. | 18:00 | ||
oh, i don't even have to, those are put into the memory separately and are just pointered at | 18:01 | ||
jnthn | right | 18:02 | |
benabik | Compile errors on OS X with the new directory I/O bits. Most are fixed by adding an #include <dirent.h>, but one isn't... | ||
src/io/dirops.c:343:79: error: no member named 'encoding_type' in 'MVMIODirIter' | |||
return MVM_string_decode(tc, tc->instance->VMString, "", 0, data->encoding_type); | |||
jnthn | whoa...just discovered I've been handed so many round tuits over the last few years' conferences, if I pile them all up I get a tower a foot high! | 18:03 | |
timotimo | :D | 18:04 | |
gen 2 looks quite like this: | 18:05 | ||
'ctxsave' [================================================== 9 | |||
'__6MODEL_CORE__' [============================================ 8 | |||
'Regex' [============================================ 8 | |||
the majority of strings appear twice or once | |||
before the very first nursery run, the nursery looks like this: | |||
'' [================================================== 102 | |||
':' [============================================== 94 | |||
tadzik | :D | ||
timotimo | '<' [======================================== 83 | ||
tadzik | why do we need such strings btw? | 18:06 | |
timotimo | for sad emoticons :< | ||
tadzik | :< | ||
timotimo | exactly! | ||
tadzik | <: | ||
always look on the right side of life | |||
timotimo++ # awesome work | 18:07 | ||
timotimo | thank you :3 | ||
tadzik: it'll take a whole lot of time to nibble away the startup memory usage if it takes me like 1 week for each megabyte | 18:09 | ||
it's likely going to get harder over time, though | |||
benabik | Ah. If I s/_type//, it seems to work. | ||
And dirent.h seems to exist on Linux, so I'll just #include it in a #ifndef _WIN32 | 18:10 | ||
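The include guard described above would look roughly like this in C. The guard itself is what the message describes; the small can_open_dir function is a hypothetical addition, included only so the fragment exercises something from dirent.h, and is not taken from src/io/dirops.c.

```c
#include <assert.h>
#include <stddef.h>

#ifndef _WIN32
#include <dirent.h>  /* opendir, readdir, closedir on POSIX systems */
#endif

/* Hypothetical helper: returns 1 if `path` can be opened as a directory,
 * 0 if it cannot, and -1 on Windows, where this sketch does nothing. */
static int can_open_dir(const char *path) {
    assert(path != NULL);
#ifndef _WIN32
    DIR *d = opendir(path);
    if (!d)
        return 0;
    closedir(d);
    return 1;
#else
    (void)path;
    return -1;
#endif
}
```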
dalek | MoarVM: da1ae55 | benabik++ | src/io/dirops.c: Fix directory I/O compilation on OS X. Possibly fixes errors on Linux as well. |
18:19 | |
timotimo | jnthn: i've run through a whole bunch of collections now and the gen2 seems to contain the same string at most like 10 times (though i think i'm not sampling all of the objects) | 18:22 | |
(and 596 completely filled pages) (and 2 empty pages) (freelist with 25 entries) | |||
VMArray [================================================== 26254 | |||
MVMString [=============== 8221 | |||
P6opaque [======= 3940 | |||
m: 256 * 596 | 18:23 | ||
camelia | ( no output ) | ||
timotimo | m: say 256 * 596 | ||
camelia | rakudo-moar 230a54: OUTPUT«152576» | ||
timotimo | that's about how many objects are in that size class | ||
m: say (26254 + 8221 + 3940) / 152576 | |||
camelia | rakudo-moar 230a54: OUTPUT«0.2517762» | ||
timotimo | so we sample about 1/4th of the objects | ||
so, it's likely that the strings that appear most often only appear about 40 times | 18:24 | ||
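The extrapolation above can be written out as a small C helper. It assumes, as the messages do, 256 objects per page and that the sampled objects are representative of the whole size class. The name estimate_total is made up for this sketch.

```c
#include <assert.h>

/* Hypothetical helper: scale a count seen in a sample up to the whole
 * population, assuming the sample is representative. */
static double estimate_total(double observed, double sampled, double population) {
    assert(sampled > 0.0);
    return observed * population / sampled;
}
```

With the figures from the log, 26254 + 8221 + 3940 = 38415 objects were sampled out of 256 * 596 = 152576, so a string seen 10 times in the sample comes out at a little under 40 occurrences overall.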
timotimo turns the sampling up to 100% | 18:28 | ||
with a 100% sample rate looking at a random point in time i get: | 18:54 | ||
'dba' [================================================== 31 | 18:55 | ||
'prec' [============================================= 28 | |||
'assoc' [============================================= 28 | |||
'' [======================================== 25 | |||
so string duplication may not be a terribly big deal | 18:56 | ||
gist.github.com/timo/41d9eb48aea43a48d35a | 19:05 | ||
i suppose it may add up | |||
given the "long tail" etc | |||
MVMCode [================================================== 15827 ← i wonder if this is right? | 19:09 | ||
jnthn | timotimo: Is that in gen2? | 19:21 | |
timotimo: It's something the lexicals => locals work should help with, anyways. | 19:22 | ||
timotimo: The flattening part, anyway. | |||
timotimo | gen2, aye. | ||
jnthn | Well, the flattening work will help more with making less of 'em in the nursery, I suspect. | 19:25 | |
timotimo: "at about 97 megabytes of ram usage all in all" in gist.github.com/timo/41d9eb48aea43a48d35a | 19:29 | ||
huh? :) The whole process is < 90MB of RAM now? | |||
timotimo | huh huh? | 19:30 | |
that's a random program that does stuff with a list over and over and throws the list away and makes a new one | 19:31 | ||
so it's more than 90 megabytes for that reason | |||
jnthn | Oh! | 19:36 | |
That explains it :) | |||
21:50
benabik joined
21:59
tgt joined
22:38
tgt joined
23:28
crab2313 joined