timotimo also, a tiny bit of compression would do wonders to our .moarvm files, since we decode that part anyway 00:01
like, a quite big chunk of the serialized blob is just AAA due to 64bit numbers that are 0 00:03
or "small"
jnthn We shouldn't be storing serialization stuff in .moarvm any more 00:24
Uh
As in, as strings we should not
Should look into that
I wish people would remember that if you compress, you can't mmap.
As soon as your 2 processes in, you're a 50% reduction ahead. 00:25
And s/your/you're/, but grammar is optional after drinking an eviltwin. : 00:26
*:)
As for recording where code was when a GC occurred - probably not useful. You just need to modify a tiny thing to make that happen at a different point. Or another thread can affect it. Maybe over a program that runs for a very long time you can get something interesting statistically, but the existing allocation profile already gives you data like that. 00:28
Put another way, GC is amortized over all allocations, so which one triggers it is not really so interesting. 00:29
timotimo OK, fair enough 01:13
my point about the base64 thing is: we can't mmap it anyway because it's base64 and we turn it into regular data right away 01:14
and going from base64 to regular binary data will get us from 4/3 down to 1
and since we are already base64-decoding before we read stuff from the serialized blob, we could just as well replace base64 with some compression scheme 01:25
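
[Editor's sketch, not part of the log] The size argument above can be checked with a few lines of Python. This is only an illustration under assumptions: the sample values and the little-endian 64-bit layout are hypothetical and are not taken from MoarVM's actual serialization format. It shows that base64 turns every 3 input bytes into 4 output characters, and that runs of zero bytes come out as runs of "A".

```python
import base64
import struct

# Hypothetical sample: nine small 64-bit integers (72 bytes, a multiple of 3,
# so the base64 output needs no padding).
values = [0, 1, 0, 2, 0, 0, 3, 0, 300]
raw = struct.pack("<9q", *values)      # little-endian signed 64-bit (assumed layout)
encoded = base64.b64encode(raw)

# base64 maps 3 bytes to 4 characters: the 4/3 overhead mentioned above.
ratio = len(encoded) / len(raw)

# Three zero-valued 64-bit numbers are 24 zero bytes, which encode to 32 'A's.
zero_blob = struct.pack("<3q", 0, 0, 0)
zero_encoded = base64.b64encode(zero_blob)

print(len(raw), len(encoded), ratio)
print(zero_encoded.decode("ascii"))
```

Storing `raw` directly in a binary section would remove the 4/3 factor; the zero bytes inside `raw` are a separate cost that base64 removal alone does not address.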
JimmyZ_ timotimo: I think jnthn++ meant that compressing the whole .moarvm file, like jar does, blocks mmap 02:46
and you meant compressing only the string part 02:47
timotimo yes, only the serialisation blobs 02:55
JimmyZ_ timotimo: + 1 to it :) 02:57
timotimo what kind of compression library can we use without adding a dependency that our users may not have or want? 02:59
I think RLE is not good enough
JimmyZ_ pack ? 03:00
as in, just write the strings as binary strings? 03:01
timotimo well yeah 03:04
we would put the serialised blob somewhere where binary data can live
but on top of that, having small numbers represented with fewer bytes may be worth something 03:05
JimmyZ_ that is, out of .moarvm files?
timotimo maybe just encoding every 64 bit chunk as a varint
no, inside these files
JimmyZ_ jnthn++ said 'We shouldn't be storing serialization stuff in .moarvm any more' 03:06
timotimo like, the bytecode section already contains binary data
JimmyZ_ I couldn't follow him
timotimo he corrected himself
he meant we should not use the string heap
JimmyZ_ so why? 03:07
timotimo so why what?
JimmyZ_ why we shouldn't store strings in .moarvm? 03:08
timotimo no no no
we should not store serialised objects in the strings 03:09
but strings will stay where they are
JimmyZ_ oh
timotimo :)
JimmyZ_ that is, replace base64 code with binary data in Strings heap? 03:10
timotimo I mean there is nothing wrong with storing the serialised blob as "packed" data 03:11
JimmyZ_ ok
timotimo but there are so many null bytes
we should store the serialised blob somewhere in the moarvm file that is not the string heap
make a new section for serialised blobs 03:12
JimmyZ_ fair enough
timotimo I am very tired and should go to bed soon
JimmyZ_ and still use base64 ?
good night
timotimo maybe you want to build a simple compression by using write varint 03:13
no, get rid of base64
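
[Editor's sketch, not part of the log] The varint idea discussed above can be sketched as follows. This assumes a generic unsigned LEB128-style encoding (7 payload bits per byte, high bit set when more bytes follow); it is not claimed to be the format MoarVM uses or would use, and `encode_varint` / `decode_varint` are hypothetical helper names. Negative numbers would need an extra step such as zigzag mapping, which is left out here.

```python
import struct


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, least significant first."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)   # high bit set: more bytes follow
        else:
            out.append(byte)          # high bit clear: last byte
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


# Hypothetical sample of mostly small numbers, as described in the discussion.
values = [0, 1, 0, 2, 0, 0, 3, 0, 300]
fixed = struct.pack("<9Q", *values)                    # 8 bytes per value
packed = b"".join(encode_varint(v) for v in values)    # 1 byte per small value

# Decode everything back to check the round trip.
decoded = []
pos = 0
while pos < len(packed):
    value, pos = decode_varint(packed, pos)
    decoded.append(value)

print(len(fixed), len(packed), decoded == values)
```

For this sample, the fixed-width form takes 72 bytes and the varint form takes 10, because every value below 128 fits in a single byte. Real savings depend on the actual distribution of values in the serialized blob.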
dalek arVM: a533a69 | jimmy++ | src/ (2 files):
Small fixes.
06:02
arVM: 228e07e | jimmy++ | / (5 files):
rename nodes_moar.h to nodes.h, since there is no other nodes_*.h files.
06:45
dalek arVM: 8fa0c7d | TimToady++ | src/6model/reprs/NFA. (2 files):
detect useless SUBRULE edges
08:32
JimmyZ_ timotimo: about base64, I didn't find how to replace it with varint :( 12:32
timotimo OK, maybe i'll have a look 12:36
the problem is also that we can't just put the data back into the string heap
or at least we shouldn't
timotimo that means we need to have another way to give the serialized blob to the deserialize instruction 12:36
JimmyZ_ why we can't? 12:37
jnthn But...we already *do* have that way 12:38
timotimo can the string heap store all-binary data without trouble?
jnthn That's why I can't figure out why you're seeing base64 strings.
The string heap stores strings in utf-8
The serialized data has its own binary section of the bytecode file 12:39
timotimo oh?
i must be speaking of something else, then
jnthn And has for ages
Well, we may have had some regression at some point that makes it do the base-64 thing...
timotimo check_and_dissect_input does a base64 decode on its input
and one of the instructions in a moarvm --dump of a module reads 00803 deserialize loc_1_str, loc_12_obj, loc_6_obj, loc_10_obj, loc_13_obj 12:40
where loc_1_str is the base64 encoded blob
JimmyZ_ Yeah, I saw it, I was even not sure what base64_encode encodes..
oh, it encodes serialized things 12:42
not strings 12:43
timotimo yes 12:44
that's what i was trying to explain %)
i was saying we store a serialized blob on the string heap
jnthn Yes, in all .moarvm files, or just a handful? 12:45
timotimo i shall see. 12:46
jnthn OK. I gotta head to the station fairly soon, but will be online on the train. :) 12:47
timotimo cairo has it
HTTP::Server::Async has a null string that it passes to deserialize 12:48
same with ::Request 12:49
could be about repossession conflicts, actually
SDL2::Raw has a base64 blob 12:50
zavolaj has a null string, too 12:51