00:47 lizmat left
01:55 MasterDuke_ joined
01:57 MasterDuke left
01:59 MasterDuke_ is now known as MasterDuke
Kaiepi | how much work is it to add support for a new encoding? | 02:16 | |
utf32 or wchar strings, i mean | 02:17 | ||
03:05 MasterDuke left
03:18 MasterDuke joined
05:28 robertle left
05:29 domidumont joined
05:36 domidumont left
05:37 domidumont joined
05:58 Kaiepi left
05:59 lizmat joined
06:04 Kaiepi joined
samcv | Kaiepi: a new encoding or to add a new type of string? | 06:04 | |
i mean we have 32 bit strings that are stored as 32 bit signed integers | 06:05 | ||
but that's not utf32. what are you trying to achieve | 06:06 | ||
Kaiepi | i meant support for strings using wchar_t | ||
samcv | like on the command line? | ||
what is this for. i'm confused | 06:07 | ||
Kaiepi | i wanted to write bindings for editline, but it uses wchar_t or ascii depending on how it's compiled | ||
samcv | are we talking about nativecall? | 06:08 | |
Kaiepi | yeah | ||
samcv | ok. that makes this make a lot more sense | ||
Kaiepi: it seems like a compiler dependent type | 06:09 | ||
Kaiepi | it is | ||
samcv | i mean do we need a wchar_t* that is null terminated? | 06:10 | |
Kaiepi | i made a pullreq that exposes wchar_t as a type for moar, jvm, and the js runtime | ||
samcv | Kaiepi: so do we need null terminated wchar_t*'s? | 06:11 | |
is that what we'd have to pass through to the external libraries? | 06:12 | ||
or is it another convention? |||
Kaiepi | i think they might use WEOL instead, but i'll need to check | ||
06:12 lizmat left
Kaiepi | going off the docs for wide strings on my system they use wide null characters | 06:18 | |
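A quick illustration of the wide-string convention being discussed, in plain C with no MoarVM or NativeCall specifics: wide strings end in a wide NUL, and the width of wchar_t is compiler- and platform-dependent, which is why it cannot simply be treated as utf32 everywhere.

```c
/* Plain C illustration (no MoarVM/NativeCall specifics): wide strings end in a
 * wide NUL, L'\0', and the width of wchar_t is compiler/platform dependent
 * (commonly 4 bytes on Unix-like systems, 2 on Windows). */
#include <stdio.h>
#include <wchar.h>

int main(void) {
    const wchar_t *s = L"hello";   /* terminated by a wide NUL */
    printf("sizeof(wchar_t) = %zu, wcslen(s) = %zu\n",
           sizeof(wchar_t), wcslen(s));
    return 0;
}
```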
06:18 domidumont left
06:19 brrt joined
brrt | \o | 06:34 | |
06:35 robertle joined
nwc10 | o/ | 06:35 | |
brrt | what i've found is that it is really difficult to explain the original need for invokish and friends to a non-moarvm-engineer audience | 06:38 |
Kaiepi | i'm willing to try to understand | 06:45 | |
brrt | :-) | 06:51 | |
well, it is no longer as necessary, since invokish was removed | 06:56 | ||
but the tl;dr is that MoarVM uses explicit control flow throughout; this makes implementation of things like resumable exceptions possible (unlike, say, implicit control flow through recursing over the C stack), | 06:58 ||
but the problem for the JIT is that as far as the 'C' stack is concerned, JIT compiled code is running 'on top of' the interpreter frame | |||
i.e. [ start | interpreter | JIT -> (grow upwards) | 06:59 | ||
now if the JIT compiled code needs a service from the interpreter (like, invoking a new frame), it needs to store its current state and return to the interpreter, rather than call into a new interpreter frame | |||
which means that we need to explicitly manage our current position and state in the JIT as well as in the interpreter | 07:00 ||
my recent patchset was about finding a much cleverer way to do this than we had before | |||
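A minimal sketch of the idea brrt describes, with made-up names (run_status, jit_state) rather than MoarVM's real API: when JIT-compiled code needs a service from the interpreter, it saves where it was and returns, so the single interpreter frame at the bottom of the C stack stays in charge instead of being recursed into.

```c
/* Minimal sketch (not MoarVM's actual API) of "explicit control flow":
 * JIT-compiled code never calls back into the interpreter; it records a
 * resume point and returns, and the interpreter loop re-enters it later. */
#include <stdio.h>

typedef enum { RUN_DONE, RUN_NEED_INVOKE } run_status;

typedef struct {
    int resume_label;   /* where to pick the JIT code back up */
} jit_state;

/* stand-in for a JIT-compiled routine that is resumable at saved labels */
static run_status jitted_code(jit_state *st) {
    switch (st->resume_label) {
    case 0:
        printf("jit: doing some work\n");
        st->resume_label = 1;        /* remember where we were ... */
        return RUN_NEED_INVOKE;      /* ... and fall back to the interpreter */
    case 1:
        printf("jit: resumed after the invoke\n");
        return RUN_DONE;
    }
    return RUN_DONE;
}

int main(void) {
    jit_state st = { 0 };
    /* the interpreter loop: service the request, then re-enter the JIT code */
    while (jitted_code(&st) == RUN_NEED_INVOKE)
        printf("interp: setting up the new frame on the interpreter's terms\n");
    return 0;
}
```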
07:36 zakharyas joined
07:47 zakharyas left, zakharyas joined
08:17 lizmat joined
08:30 domidumont joined
09:01 zakharyas left
10:03 zakharyas joined
10:13 brrt left
10:38 brrt joined
10:54 Kaiepi left
10:59 zakharyas left
11:14 lizmat left
11:17 lizmat joined
12:23 brrt left
12:36 Kaiepi joined
12:53 brrt joined
13:02 zakharyas joined
14:13 zakharyas left, zakharyas joined
14:36 scovit joined
Geth | MoarVM/pluggable-spesh: 22 commits pushed by (Jonathan Worthington)++ review: github.com/MoarVM/MoarVM/compare/3...0e61206253 | 14:59 |
jnthn | Just a rebase | 15:00 | |
15:09 robertle left
15:14 raiph joined
15:29 brrt left
15:34 domidumont left
Geth | MoarVM/pluggable-spesh: 9eebf10d05 | (Jonathan Worthington)++ | src/spesh/plugin.c | 15:59
Make spesh plugins work correctly with the JIT
We already put a deopt annotation on the prepargs that we are going to delete as part of the optimization. Steal that and put it on the guards, so that we get a viable position to use for deopt_offset. It doesn't matter that this will not result in a very precise position, since this is just used to answer the question of "which (if any) of the inlines are we in". All the guards will fall within the inlines.
16:58 raiph left
17:04 robertle joined
17:06 zakharyas left, zakharyas joined
17:30 domidumont joined
18:04 zakharyas left
18:32 lizmat left
18:47 lizmat joined
18:53 ggoebel left
19:00 ggoebel joined
19:07 domidumont left
19:12 brrt joined
19:47 zakharyas joined
robertle | folks, I am wondering whether you could help me understand something about GC in general. a bit off topic, but perhaps you find it interesting as well. | 19:59 | |
I am building a toy scheme interpreter, and I have some memory management and GC already. my heap objects are not movable, so that the immediate value on the stack or in another heap object can just be a direct pointer to the heap object. | 20:00 ||
20:00 brrt left
japhb | robertle: gchandbook.org/ | 20:01 | |
jnthn | It can be a direct pointer even if they are movable. :) | ||
(You just need to be able to find all the pointers :) | |||
robertle | this works, but I cannot compact my heap, I can't do generational GC, and my allocation is quite involved as I need to find an empty slot | ||
japhb | I was serious about that link. | ||
robertle | japhb: I'll definitely read it! | 20:02 | |
japhb | It (and its predecessor) are very good. :-) | ||
robertle | jnthn: right, but finding all pointers is quite hard too | ||
jnthn | I read that book before doing the MoarVM GC. It was *very* helpful :) | ||
robertle: Well, at the very least it enforces that you know certain things | 20:03 | ||
japhb | Heh | ||
robertle | scheme creates insane amounts of short-lived garbage in the form of cons cells, so I would like to play with a different strategy. one where I can just use a quick bump allocator into a new generation, and then gc-copy into a different generation | ||
jnthn | e.g. you can't just do a conservative stack scan | ||
japhb | robertle: half-space compacting nursery sounds like something useful | 20:04 | |
(MoarVM uses that too) | |||
jnthn | Yeah, I picked semispace for Perl 6 'cus it does the very same | ||
Tons of short-lived Scalars for example |||
And Str and Int and friends are immutable |||
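For reference, a minimal sketch of the "quick bump allocator into a new generation" mentioned above; the names (nursery, nursery_alloc) are illustrative and not taken from MoarVM or the toy interpreter.

```c
/* Sketch only: allocation into a semispace nursery is a pointer bump plus an
 * overflow check; when the nursery fills up, the caller runs a collection
 * (copying survivors out) and then retries. */
#include <stddef.h>

typedef struct {
    char *alloc;   /* next free byte in the nursery */
    char *limit;   /* one past the end of the nursery */
} nursery;

static void *nursery_alloc(nursery *n, size_t size) {
    size = (size + 15) & ~(size_t)15;   /* keep allocations 16-byte aligned */
    if (n->alloc + size > n->limit)
        return NULL;                    /* nursery full: collect, then retry */
    void *p = n->alloc;
    n->alloc += size;
    return p;
}
```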
robertle | the problem I currently have is that I seem to have a fundamental problem understanding something. | ||
20:05 brrt joined
japhb | jnthn: How many semispace copies before promote to second gen these days? | 20:05 | |
robertle | so my first attempt was to have my immediate values still be pointers, but to boxes in a fixed location. that box then points to the actual heap. this way I can move the actual entry | ||
jnthn | @japhb 1 (well, 2 if you count the copy into gen2) | ||
robertle | but that stinks big time: allocating the boxes is about as complicated as my previous allocation scheme | ||
jnthn | So there's just a flag in the header saying "we already saw this in the nursery" | 20:06 | |
robertle | unsurprising really, they are also on the heap and immovable..., just smaller | ||
japhb | nodnod | ||
robertle: You are going to recreate the history of GC manually if you don't watch out. ;-) | |||
jnthn | robertle: Yeah, I suspect you need to go all-in with moving | ||
japhb | s/manually/the hard way/ | ||
robertle | I tried to understand how this works in moar, but don't get it: | ||
if you have a value in say a register, and it refers to an object on the heap. does it refer directly to the heap object? or via an indirection? | 20:07 | ||
japhb | robertle: ... also, to the top of the object? To the GC header? To the object field that's the actual referenced thing? | 20:08 | |
robertle | not sure, moar is so much more complicated than my toy that I can't really say. but really, anything on the heap that can be moved and that gets GCed | 20:09 | |
what's the most simple thing that still ends up on the heap? | |||
japhb | I suspect Scheme has simple enough primitives that you don't need to take references to object fields, as you are taking a reference to an entire object or a primitive directly. | ||
robertle | correct, I only ever reference objects | 20:10 |
japhb | robertle: Careful to separate "implementation language's heap" (malloc and friends) and "target language's heap" (cons and friends) | ||
robertle | ok, I meant target language heap | 20:11 | |
jnthn | robertle: It refers directly to the object's memory, which starts with a predictable header | ||
japhb | And for the latter, the MoarVM equivalent is I believe MVMCollectable or so. | ||
jnthn | robertle: That header lets us find out what kind of thing it is, so we can in turn discover its pointers | ||
robertle | ok, that much I understand. | ||
jnthn | robertle: We also have metadata telling us which registers hold objects | ||
The big change in terms of what you work with inside of your GC is probably going to be that you get an extra LoI | 20:12 | ||
So the queue of things to process is not pointers to objects, but rather pointers to pointers to objects | |||
robertle | "LoI"? | ||
jnthn | So that the GC can update them | ||
Level of Indirection | |||
japhb | Level of Indirection | ||
jinx! | |||
jnthn | :) | ||
jnthn gotta go, but bbi20 | 20:13 | ||
japhb | o/ | ||
jnthn | (happy to continue answering questions then :)) | ||
robertle | hold on a moment. so in my very simple case I have values e.g. on the stack. in the case of e.g. a string, that value has a type tag and a pointer to the string object on the target language heap | 20:14 | |
you are saying that is basically the same in moar? the value in a register directly references target heap memory? |||
japhb | robertle: MoarVM has virtual registers. | 20:15 | |
There is separate handling of pointers in machine registers, which is what the MVMROOT macro is for. | 20:16 | ||
brrt | actually, being pedantic, MVMROOT is for collectable objects that don't live in registers, since registers are GC roots by default | 20:17 | |
japhb | (It marks that pointer as being in the "root set" which is the list of starting points, the roots of the forest of object graphs, that will eventually lead to all live objects in the target language's heap | ||
robertle | ok, but still. you have a pointer to the heap object. so if you move the object during GC, you update the register? | ||
japhb | ) | ||
brrt | and registers are marked as containing collectable objects | ||
japhb | brrt, apologies for that. | ||
brrt | my apologies for pedantry :-) | ||
robertle: correct. the GC manages all the pointers to objects that might be moved | 20:18 |
japhb | Yes, GC is generally done at one additional level of indirection, so it operates on pointers to pointers, because it needs the pointers to be *themselves* writable. | ||
brrt | so suppose object A refers to object B; the GC takes a pointer to the pointer from A to B | ||
robertle | right, we are getting closer to where my brain gets stuck: so your GC has a pointer to the register, so that it can update it when it moves the object the register points to | 20:19 | |
brrt | 'virtual' register, but yes | ||
robertle | but surely there can be multiple registers pointing to the same object | ||
and also slots in objects on the heap | |||
brrt | effectively more like a stack address | ||
yes | |||
so these are managed in an array of pointer pointers | 20:20 | ||
in c, this is a pointer pointer pointer :-) | |||
japhb | .oO( You had me at *** ... ) | 20:21 |
robertle | and also slots in objects on the heapto the same object? | ||
brrt | all of 'm | 20:22 | |
robertle | what? I didn't mean to write that last line, doesn't make any sense! what I wanted to say: | ||
brrt | also, because of generational GC, we also maintain all cross-generation pointers | ||
robertle | and also slots in objects on the heap that point to the same object? | ||
brrt | yes | ||
robertle | very simple case: one object on the heap, two registers that point to that object. GC kicks in. what happens? | 20:23 | |
brrt | GC creates a list of pointers to the registers | 20:24 | |
GC moves the object | |||
GC updates the registers through the pointers | |||
done | |||
(a 'list' being a contiguous array of memory) | 20:25 | ||
i.e. an array | |||
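The shape of that array, sketched in C with hypothetical names (obj, worklist) rather than MoarVM's actual structures: because objects move, the list holds the addresses of the references themselves, so each one can be rewritten in place after the move.

```c
/* Sketch only: the GC work list stores obj** entries - pointers to the slots
 * (registers, object fields) that hold object pointers - so the collector can
 * update each reference after it moves the object it points to.
 * A worklist initialised to all zeroes starts empty. */
#include <stdlib.h>

typedef struct obj obj;    /* whatever a heap object looks like */

typedef struct {
    obj ***items;          /* addresses of the slots that hold obj* values */
    size_t count, cap;
} worklist;

static void worklist_add(worklist *wl, obj **slot) {
    if (wl->count == wl->cap) {
        wl->cap = wl->cap ? wl->cap * 2 : 64;
        wl->items = realloc(wl->items, wl->cap * sizeof *wl->items);
    }
    wl->items[wl->count++] = slot;   /* e.g. the address of a register slot */
}
```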
robertle | ok, but I don't see how that is done efficiently: you'd have to update the list of pointers when you move the object, i.e. go through the whole list and update every pointer that points to the old location, for every object. | 20:26 |
I don't see how I can make that not quadratic | |||
brrt | it seems very linear time to me :-) | 20:27 | |
you only update each reference to the old position once | 20:28 | ||
and you don't need to look them up because you've stashed them in a list | |||
robertle | right, but you need to scan the list of all pointers. and you need to do that for each object moved | ||
brrt | but you scan all objects in a separate phase, once, then figure out which ones you need to move | 20:29 | |
robertle | ok, so I have a list of all objects that need moving. possibly with an old and new location | ||
brrt | yes. and you *also* have a list of all references to those objects | ||
robertle | hold on, that may be the critical part | 20:30 | |
where do you keep that list of per-object references? | |||
brrt | you create it while scanning the heap for all reachable objects | 20:32 | |
robertle | k, but you somehow have to link it from the object itself? | ||
brrt | every object you encounter, you maybe put it on the list to be moved, and all references to objects you encounter, you put on the list of references-to-update | ||
robertle | getting closer, this is great | 20:34 | |
so what data do you have as part of the heap object to support this? I would imagine you either need to store a pointer from the object to the list of references to it, or have a slot in which you temporarily store the new location | 20:35 | ||
dude that book isn't exactly cheap! but there you go... | 20:37 | ||
brrt | i'm confused now... what we have is a flag that says if something is first- or second-gen (i.e. moveable or not), and a function that lists all references within the object | 20:38 |
anyway, i'm afk. i wish you good luck :-) | |||
20:39 brrt left
20:52 zakharyas left
jnthn | robertle: Perhaps the key part missing here is the need for a forwarding pointer | 20:57 | |
robertle | explain please | ||
jnthn | If we have 3 references to the same object, then when we discover each of them we'll put a pointer to the location of the reference into our "todo" list | 20:58 |
We do need to update all of those references when we move the object, *but* we only want to move the object once | 20:59 | ||
robertle | ah, wait! | ||
don't say anymore | |||
I get it: when you move the object, you put a pointer in its place that refers to the new location. | 21:00 ||
right? | |||
jnthn | Yeah. I guess in your existing collector you already have some kind of "seen" bit in your object headers to know which objects you already marked, so you don't get stuck when marking self-referential data structures? | 21:01 | |
In a moving collector that bit really becomes an "I already moved this" | 21:02 | ||
robertle | yes, I have two bits of GC management per heap object | ||
jnthn | So there's two cases when processing a pointer to an object reference | ||
The first one is that you follow the pointer to the object and observe that it doesn't have the "already moved" bit set |||
In that case, you 1) copy the object's memory to the new location where it will live, 2) set the "already moved" bit, 3) save the new address inside the *old* memory of the object (which is safe because you already copied it), and 4) update the reference that you saw to point to the new location | 21:03 ||
The second case is that you follow the pointer to the object and see that the object was already moved. You then find the address written in 3) and do 4) | 21:04 ||
robertle | cool, understood. and I don't even need an extra bit: i currently have a "mark" bit. if I immediately move the object when I mark it, it's the same :) | 21:05 | |
jnthn | Right :) | ||
robertle | beautiful, that solves my problems and isn't even so much work to do from where I currently stand | ||
jnthn | In the MoarVM collector, the second case is handled here: github.com/MoarVM/MoarVM/blob/mast...ect.c#L214 | ||
There's only one line of actual work in there, the rest is all just debug/assertion :) | 21:06 | ||
And here is the first case: github.com/MoarVM/MoarVM/blob/mast...ect.c#L312 | 21:07 | ||
It's a bit more involved here as there's the generational thing going on, but you can see the memcpy and the bit being set | ||
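The two cases jnthn describes, sketched in C with hypothetical names (obj, process_slot, tospace_alloc); the real MoarVM code is in the collect.c links above, this is just the skeleton of the algorithm.

```c
/* Sketch of processing one work-list entry in a copying collector: 'slot' is
 * the address of a reference; either copy the object and leave a forwarder in
 * the old memory, or follow a forwarder that is already there. */
#include <stdlib.h>
#include <string.h>

#define OBJ_MOVED 0x1

typedef struct obj {
    unsigned flags;          /* includes the "already moved" bit */
    struct obj *forwarder;   /* new address, valid once OBJ_MOVED is set */
    size_t size;             /* total size of the object, payload included */
} obj;

/* stand-in for a bump allocation into tospace */
static void *tospace_alloc(size_t size) { return malloc(size); }

static void process_slot(obj **slot) {
    obj *o = *slot;
    if (o->flags & OBJ_MOVED) {
        *slot = o->forwarder;                 /* case 2: just follow the forwarder */
    } else {
        obj *copy = tospace_alloc(o->size);   /* case 1: copy it ... */
        memcpy(copy, o, o->size);
        o->flags |= OBJ_MOVED;                /* ... mark the old memory as moved ... */
        o->forwarder = copy;                  /* ... leave the new address behind ... */
        *slot = copy;                         /* ... and fix up this reference */
    }
}
```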
robertle | great, and it also solves one of my worries which is the extra memory I need for GC and heap management: most objects are cons cells, and they are just 16 bytes on the heap. I really need to avoid large management data | 21:08 |
jnthn | process_worklist is *the* heart of the MoarVM collector, really | ||
Pretty much everything else is just supporting that algorithm :) | 21:09 | ||
robertle | totally unrelated to this question: but does moar do any of the GC work in parallel with heap-mutating code? or at least do some stop-the-world GC work in parallel? | ||
jnthn | (Which in places is still quite tricky, for example sync-up of threads) | ||
No and yes | |||
The first case tends to be called a concurrent collector (it collects concurrent with the running code) | 21:10 | ||
robertle | yeah | ||
jnthn | The second is a parallel collector | ||
robertle | so moar is non-concurrent but parallel? | ||
jnthn | MoarVM's collector is parallel - multiple threads join in to do the collecting work - but it's barely concurrent | ||
At the end of the collection process, it reaches a point where individual threads can finish up their bit and then go on their way as soon as they have | 21:11 | ||
But for most of the time it needs the world to be stopped | |||
There's a latency/throughput trade-off here | 21:12 | ||
robertle | ok, and what did you mean with the "sync-up of threads" before? | ||
jnthn | Taking all the running threads and making them stop their work and come join in with GC | ||
That's what the stuff in src/gc/orchestrate.c does | |||
robertle | ok, I see | 21:13 | |
jnthn | But it also has to account for stuck threads | ||
e.g. those that are doing blocking I/O or sleep or waiting on a mutex | |||
In MoarVM, when going to wait on such a thing, the thread flags itself as unable to join in with GC | 21:14 |
robertle | ah, darn! my toy doesn't have threads yet, so I am good | ||
jnthn | Yeah, that makes things far easier :) | ||
When you do, you'll notice that being first to move an object is a data race, and will need a scheme to deal with it :) | |||
The MoarVM scheme is that every object has an owner, and we delegate its GC work to the thread responsible for its owner | 21:15 | ||
robertle | you mean with multiple threads doing GC? right, that's why I was asking... | ||
jnthn | That's what the "work passing" stuff in collect.c is doing | ||
A potential alternative is to race to write the forwarder pointer | 21:16 | ||
But there's a tricky detail there, I guess, because how do you know if the address there is a forwarder or not? | 21:17 | ||
One way to deal with that is to make every single object 1 pointer bigger so that can always be zeroed to indicate "not moved" | |||
But that makes every object bigger | |||
Another would be to require the platform can do a double-cas and then install the pointer and a header bit together | 21:18 | ||
But that may not be portable enough | |||
robertle | my heap is segmented into arenas, and each arena has its own list of things that still need to be done. so in theory I could shard by arena. I then would need to protect the list when appending/consuming... | ||
jnthn | Yeah | ||
Yeah, that's what MoarVM is doing I guess. Each thread has its own nursery |||
robertle | you would need the pointer and the header bit to be quite close together to be cas-able. I currently have them quite far apart | 21:19 | |
jnthn | It's the forwarder in the object and the header bit in the object, so they're close enough | ||
But you still need a double-cas operation | |||
robertle | btw, this is the toy: github.com/robertlemmen/keiryaku | 21:20 | |
jnthn | I guess yet another scheme is to have two bits, CAS in place the "I'm going to write a forwarder" bit, if you win then write the forwarder, then cas in an "I'm done writing the forwarder". If you lose the first race then you have to spin. | ||
I guess with enough clever you could use that as a fallback on platforms without double-cas and use the double-cas where possible :) | 21:21 | ||
But we'd actually then have a new problem if we did that in MoarVM | |||
robertle | ?? | 21:22 | |
jnthn | We rely on the things in a particular thread's fromspace (bit of memory we copy from) to land up in the same thread's tospace. | 21:23 | |
Which in turn means we know we'll always end up with no more in tospace than fromspace, so we can't ever overflow it | |||
We'd need a new scheme to deal with that | |||
Also, the current design tries to keep things local, and we'd lose that too | 21:24 | ||
So maybe the current work-pass scheme ain't so bad after all :) | |||
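A sketch of the two-bit scheme jnthn outlined above, using C11 atomics and hypothetical names; this is not MoarVM's work-passing code, just what "racing to write the forwarder" could look like without a double-CAS.

```c
/* Sketch only: threads race to CAS in a "claimed" bit; the winner copies the
 * object and then publishes a "done" bit, losers spin until the forwarder is
 * visible. Not MoarVM's actual scheme (MoarVM passes work to the owning thread). */
#include <stdatomic.h>
#include <stdlib.h>

enum { FWD_CLAIMED = 0x1, FWD_DONE = 0x2 };

typedef struct obj {
    _Atomic unsigned gc_bits;
    struct obj *_Atomic forwarder;
    size_t size;
} obj;

static void *tospace_alloc(size_t size) { return malloc(size); }  /* stand-in */

static obj *forward(obj *o) {
    unsigned expected = 0;
    if (atomic_compare_exchange_strong(&o->gc_bits, &expected, FWD_CLAIMED)) {
        obj *copy = tospace_alloc(o->size);        /* we won the race */
        copy->size = o->size;
        /* ... copy the rest of the payload into 'copy' here ... */
        atomic_store(&copy->gc_bits, 0);           /* the copy starts out unclaimed */
        atomic_store(&copy->forwarder, NULL);
        atomic_store(&o->forwarder, copy);
        atomic_fetch_or(&o->gc_bits, FWD_DONE);    /* publish the forwarder */
        return copy;
    }
    while (!(atomic_load(&o->gc_bits) & FWD_DONE)) /* we lost: wait for the winner */
        ;
    return atomic_load(&o->forwarder);
}
```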
robertle | hmm, all very difficult. do you have much in terms of benchmarks or so to test the GC against? | 21:25 | |
jnthn | I've mainly tuned it on real-world workloads, since we have plenty of those :-) | 21:26 | |
CORE.setting compilation is quite good for measuring how well generational things are working | |||
Hammering a multi-threaded web server isn't bad for looking at the threading case | 21:27 | ||
I rewrote the thread sync-up algorithm at some point after looking at profiles of the latter | |||
robertle | aren't they very fickle? I have an app at work for example (java). it creates shitloads of garbage, so really the main performance factor is the nursery size. if short-lived shit makes it into the real heap, then performance goes down the drain... | ||
jnthn | That's a factor too, though also a big nursery means longer collects and less cache locality in the collects | 21:28 | |
What I did find was very worth tuning was the decision about when to do full collects | 21:29 | ||
robertle | exactly. so more tuning required than one wants to do... | ||
more generally: are you happy with it's performance in the web service (i.e. cro I guess) case? | |||
jnthn | Pretty sure a tuning there got 10% off CORE.setting compilation by putting into MoarVM an algorithm that used the overall heap size to decide how often to collect gen2 | ||
robertle | I played with that a while ago because it's also what I do at work, and have to admit the results seemed a bit mixed to me | 21:30 | |
jnthn | Happy with performance with Cro overall? Not really yet, though it's easily handling my production workloads. | 21:31 |
But GC isn't much of a factor there, I don't think. | |||
robertle | what do you think are the big possible performance gains still to be had then? | ||
jnthn | A lot more of the cost comes from concurrency control overheads in Supply | ||
Which is in no small part because LEAVE is essential for safely releasing locks, but also a pig. | 21:32 | ||
I'm overall quite happy with the GC itself. We could do with escape analysis and "stack" allocation. | 21:33 | ||
Which would mean that a lot less things ever end up being the GCs job | |||
robertle | ah! I have read and tried to understand dybvig's scheme paper, which I believe is at its core about escape analysis and figuring out when stuff can safely be put on the stack | 21:34 |
but it's mind-boggling | |||
jnthn | The other thing is that we've just a bunch more things to do to turn Perl 6 programs into better quickened bytecode and/or machine code | ||
I'll probably spend much of next week re-working Scalar and assignment | 21:35 | ||
Because at the moment we can't optimize out assignment type checks or lower it into something cheap, despite it being a really common thing. | |||
robertle | hmm, yeah. I noticed that there is quite some work going into hand-crafting core functionality in nqp. shouldn't it be rakudo that generates reasonable code from perl6? | ||
jnthn | The problem is that a lot of the properties that stop it generating simpler code are things that aren't so easy to do static analysis on | 21:36 |
robertle | are you saying the language shouldn't be quite so late-binding? ;) | ||
jnthn | Not really, I'm saying that Rakudo's code-gen isn't really where we can make the gains | 21:37 | |
robertle | ok, but if that is the case, why does it help if people write clever nqp implementations of stuff? | ||
jnthn | And where we do need to generate different/better code, the main reason is so the VM can do a better job | 21:38 | |
Because they aren't writing Perl 6 code by that point, and thus avoiding various of the trickier-to-analyze things | |||
Same reason that NQP programs generally run faster | 21:40 | ||
robertle | hm, ok | ||
jnthn | Gotta go again for a bit | 21:41 | |
robertle | anyway thanks for the chat, the forwarding-pointer-in-old-heap location was exactly what I was missing. I am sure that is going to make it relatively easy to switch to a bump allocator and generational GC. | ||
I need to be in bed as well :) | 21:42 | ||
jnthn | :-) | ||
'night o/ | |||
22:26 robertle left
23:44 Kaiepi left