00:47 lizmat left 01:55 MasterDuke_ joined 01:57 MasterDuke left 01:59 MasterDuke_ is now known as MasterDuke
Kaiepi how much work is it to add support for a new encoding? 02:16
utf32 or wchar strings, i mean 02:17
03:05 MasterDuke left 03:18 MasterDuke joined 05:28 robertle left 05:29 domidumont joined 05:36 domidumont left 05:37 domidumont joined 05:58 Kaiepi left 05:59 lizmat joined 06:04 Kaiepi joined
samcv Kaiepi: a new encoding or to add a new type of string? 06:04
i mean we have 32 bit strings that are stored as 32 bit signed integers 06:05
but that's not utf32. what are you trying to achieve 06:06
Kaiepi i meant support for strings using wchar_t
samcv like on the command line?
what is this for. i'm confused 06:07
Kaiepi i wanted to write bindings for editline, but it uses wchar_t or ascii depending on how it's compiled
samcv are we talking about nativecall? 06:08
Kaiepi yeah
samcv ok. that makes this make a lot more sense
Kaiepi: it seems like a compiler-dependent type 06:09
Kaiepi it is
samcv i mean do we need a wchar_t* that is null terminated? 06:10
Kaiepi i made a pullreq that exposes wchar_t as a type for moar, jvm, and the js runtime
samcv Kaiepi: so do we need null terminated wchar_t*'s? 06:11
is that what we'd have to pass through to the external libraries? 06:12
or is there another convention
Kaiepi i think they might use WEOL instead, but i'll need to check
06:12 lizmat left
Kaiepi going off the docs for wide strings on my system they use wide null characters 06:18
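For reference, a minimal C sketch of the convention being discussed (names are illustrative only): a wchar_t* string ends in a wide null character, L'\0', and wchar_t's width is compiler- and platform-dependent:

    #include <stdio.h>
    #include <wchar.h>

    int main(void) {
        const wchar_t *s = L"editline";    /* wide string literal */
        size_t len = 0;
        while (s[len] != L'\0')            /* walk to the wide null terminator */
            len++;
        printf("%zu wide chars, %zu bytes each\n", len, sizeof(wchar_t));
        return 0;
    }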
06:18 domidumont left 06:19 brrt joined
brrt \o 06:34
06:35 robertle joined
nwc10 o/ 06:35
brrt what i've found is that it is really difficult to explain the original need for invokish and friends to a non-moarvm-engineer audience 06:38
Kaiepi i'm willing to try to understand 06:45
brrt :-) 06:51
well, it is no longer as necessary, since invokish was removed 06:56
but the tl;dr is that MoarVM uses explicit control flow throughout, and that this makes implementation of things like resumable exceptions possible (unlike, say, implicit control flow through recursing over the C stack), 06:58
but the problem for the JIT is that as far as the 'C' stack is concerned, JIT compiled code is running 'on top of' the interpreter frame
i.e. [ start | interpreter | JIT -> (grow upwards) 06:59
now if the JIT compiled code needs a service from the interpreter (like, invoking a new frame), it needs to store its current state and return to the interpreter, rather than call into a new interpreter frame
which means that we need to explicitly manage our current position and state in the JIT as well as in the interpreter 07:00
my recent patchset was about finding a much cleverer way to do this than we had before
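A rough C sketch of the control-flow pattern brrt describes; all names here are invented for illustration and this is not MoarVM's actual JIT interface. The JIT-compiled code saves a resume point and returns to the interpreter frame below it instead of calling into a new interpreter frame:

    /* Illustrative only; not MoarVM's actual structures. */
    typedef struct {
        int resume_label;   /* where in the JIT-compiled code to continue */
        /* ... saved temporaries would live here ...                       */
    } JITState;

    enum { JIT_DONE, JIT_NEED_INVOKE };

    static int jitted_code(JITState *state) {
        switch (state->resume_label) {
        case 0:
            /* ... straight-line compiled code ...                         */
            state->resume_label = 1;  /* remember where to pick up again   */
            return JIT_NEED_INVOKE;   /* unwind to the interpreter frame   */
        case 1:
            /* ... code that runs after the invocation has completed ...   */
            return JIT_DONE;
        }
        return JIT_DONE;
    }

    static void interpreter_frame(JITState *state) {
        /* The interpreter, not the JIT code, sets up and runs the new
           frame, then re-enters the JIT code at its saved label.          */
        while (jitted_code(state) == JIT_NEED_INVOKE) {
            /* invoke the requested frame here, on the interpreter's terms */
        }
    }

    int main(void) {
        JITState state = { 0 };
        interpreter_frame(&state);
        return 0;
    }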
07:36 zakharyas joined 07:47 zakharyas left, zakharyas joined 08:17 lizmat joined 08:30 domidumont joined 09:01 zakharyas left 10:03 zakharyas joined 10:13 brrt left 10:38 brrt joined 10:54 Kaiepi left 10:59 zakharyas left 11:14 lizmat left 11:17 lizmat joined 12:23 brrt left 12:36 Kaiepi joined 12:53 brrt joined 13:02 zakharyas joined 14:13 zakharyas left, zakharyas joined 14:36 scovit joined
Geth MoarVM/pluggable-spesh: 22 commits pushed by (Jonathan Worthington)++
review: github.com/MoarVM/MoarVM/compare/3...0e61206253
14:59
jnthn Just a rebase 15:00
15:09 robertle left 15:14 raiph joined 15:29 brrt left 15:34 domidumont left
Geth MoarVM/pluggable-spesh: 9eebf10d05 | (Jonathan Worthington)++ | src/spesh/plugin.c
Make spesh plugins work correctly with the JIT

We already put a deopt annotation on the prepargs that we are going to delete as part of the optimization. Steal that and put it on the guards, so that we get a viable position to use for deopt_offset. It doesn't matter that this will not result in a very precise position, since this is just used to answer the question of "which (if any) of the inlines are we in". All the guards will fall within the inlines.
15:59
16:58 raiph left 17:04 robertle joined 17:06 zakharyas left, zakharyas joined 17:30 domidumont joined 18:04 zakharyas left 18:32 lizmat left 18:47 lizmat joined 18:53 ggoebel left 19:00 ggoebel joined 19:07 domidumont left 19:12 brrt joined 19:47 zakharyas joined
robertle folks, I am wondering whether you could help me understand something about GC in general. a bit off topic, but perhaps you find it interesting as well. 19:59
I am building a toy scheme interpreter, and I have some memory management and GC already. my heap objects are not movable, so that the immediate value on the stack or in another heap object can just be a direct pointer to the heap object. 20:00
20:00 brrt left
japhb robertle: gchandbook.org/ 20:01
jnthn It can be a direct pointer even if they are movable. :)
(You just need to be able to find all the pointers :)
robertle this works, but I cannot compact my heap, I can't do generational GC, and my allocation is quite involved as I need to find an empty slot
japhb I was serious about that link.
robertle japhb: I'll definitely read it! 20:02
japhb It (and its predecessor) are very good. :-)
robertle jnthn: right, but finding all pointers is quite hard too
jnthn I read that book before doing the MoarVM GC. It was *very* helpful :)
robertle: Well, at the very least it enforces that you know certain things 20:03
japhb Heh
robertle scheme creates insane amounts of short-lived garbage in the form of cons cells, so I would like to play with a different strategy. one where I can just use a quick bump allocator into a new generation, and then gc-copy into a different generation
jnthn e.g. you can't just do a conservative stack scan
japhb robertle: half-space compacting nursery sounds like something useful 20:04
(MoarVM uses that too)
jnthn Yeah, I picked semispace for Perl 6 'cus it does the very same
Tons of short-live Scalars for example
And Str and Int and friends are immutable
*short-lived
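A minimal sketch of the bump-allocation-into-a-nursery idea robertle mentions, assuming a semispace design; the names, sizes and alignment here are illustrative, not keiryaku's or MoarVM's actual code:

    #include <stddef.h>
    #include <stdint.h>
    #include <stdlib.h>

    typedef struct {
        uint8_t *start;   /* base of the current tospace                      */
        uint8_t *alloc;   /* bump pointer: next free byte                     */
        uint8_t *end;     /* one past the end; reaching it triggers a collect */
    } Nursery;

    void nursery_init(Nursery *n, size_t size) {
        n->start = n->alloc = malloc(size);
        n->end   = n->start + size;
    }

    void *nursery_alloc(Nursery *n, size_t size) {
        size = (size + 15) & ~(size_t)15;    /* keep allocations aligned      */
        if (n->alloc + size > n->end)
            return NULL;                     /* caller runs a GC and retries  */
        void *p = n->alloc;
        n->alloc += size;
        return p;
    }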
robertle the problem I currently have is that I seem to have a fundamental problem understanding something.
20:05 brrt joined
japhb jnthn: How many semispace copies before promote to second gen these days? 20:05
robertle so my first attempt was to have my immediate values still be pointers, but to boxes in a fixed location. that box then points to the actual heap object. this way I can move the actual entry
jnthn @japhb 1 (well, 2 if you count the copy into gen2)
robertle but that stinks big time: allocating the boxes is about as complicated as my previous allocation scheme
jnthn So there's just a flag in the header saying "we already saw this in the nursery" 20:06
robertle unsurprising really, they are also on the heap and immovable..., just smaller
japhb nodnod
robertle: You are going to recreate the history of GC manually if you don't watch out. ;-)
jnthn robertle: Yeah, I suspect you need to go all-in with moving
japhb s/manually/the hard way/
robertle I tried to understand how this works in moar, but don't get it:
if you have a value in say a register, and it refers to an object on the heap. does it refer directly to the heap object? or via an indirection? 20:07
japhb robertle: ... also, to the top of the object? To the GC header? To the object field that's the actual referenced thing? 20:08
robertle not sure, moar is so much more complicated than my toy that I can't really say. but really, anything on the heap that can be moved and that gets GCed 20:09
what's the most simple thing that still ends up on the heap?
japhb I suspect Scheme has simple enough primitives that you don't need to take references to object fields, as you are taking a reference to an entire object or a primitive directly.
robertle correct, I only 20:10
ever reference objects
japhb robertle: Careful to separate "implementation language's heap" (malloc and friends) and "target language's heap" (cons and friends)
robertle ok, I meant target language heap 20:11
jnthn robertle: It refers directly to the object's memory, which starts with a predictable header
japhb And for the latter, the MoarVM equivalent is I believe MVMCollectable or so.
jnthn robertle: That header lets us find out what kind of thing it is, so we can in turn discover its pointers
robertle ok, that much I understand.
jnthn robertle: We also have metadata telling us which registers hold objects
The big change in terms of what you work with inside of your GC is probably going to be that you get an extra LoI 20:12
So the queue of things to process is not pointers to objects, but rather pointers to pointers to objects
robertle "LoI"?
jnthn So that the GC can update them
Level of Indirection
japhb Level of Indirection
jinx!
jnthn :)
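Sketching that extra level of indirection in C (types invented for illustration): the work list stores the addresses of the slots that hold object pointers, so the collector can rewrite those slots after moving the objects they refer to:

    #include <stddef.h>

    typedef struct Object Object;     /* opaque heap object                     */

    typedef struct {
        Object **slots[1024];         /* addresses of slots holding object ptrs */
        size_t   count;
    } Worklist;

    /* Each entry is a pointer to a pointer: *slot is the object itself, while
       slot lets the collector rewrite the reference after the object moves.    */
    void worklist_push(Worklist *wl, Object **slot) {
        if (wl->count < 1024)
            wl->slots[wl->count++] = slot;
    }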
jnthn gotta go, but bbi20 20:13
japhb o/
jnthn (happy to continue answering questions then :))
robertle hold on a moment. so in my very simple case I have values e.g. on the stack. in the case of e.g. a string, that value has a type tag and a pointer to the string object on the target language heap 20:14
you are saying that is basically the same in moar? the value in a register directly references target heap memory?
japhb robertle: MoarVM has virtual registers. 20:15
There is separate handling of pointers in machine registers, which is what the MVMROOT macro is for. 20:16
brrt actually, being pedantic, MVMROOT is for collectable objects that don't live in registers, since registers are GC roots by default 20:17
japhb (It marks that pointer as being in the "root set" which is the list of starting points, the roots of the forest of object graphs, that will eventually lead to all live objects in the target language's heap
robertle ok, but still. you have a pointer to the heap object. so if you move the object during GC, you update the register?
japhb )
brrt and registers are marked as containing collectable objects
japhb brrt, apologies for that.
brrt my apologies for pedantry :-)
robertle: correct. the GC manages all the pointers to objects that might be moved 20:18
japhb Yes, GC is generally done at one additional level of indirection, so it operates on pointers to pointers, because it needs the pointers to be *themselves* writable.
brrt so suppose object A refers to object B; the GC takes a pointer to the pointer from A to B
robertle right, we are getting closer to where my brain gets stuck: so your GC has a pointer to the register, so that it can update it when it moves the object the register points to 20:19
brrt 'virtual' register, but yes
robertle but surely there can be multiple registers pointing to the same object
and also slots in objects on the heap
brrt effectively more like a stack address
yes
so these are managed in an array of pointer pointers 20:20
in c, this is a pointer pointer pointer :-)
japhb .oO( You had me at *** ... ) 20:21
robertle and also slots in objects on the heapto the same object?
brrt all of 'm 20:22
robertle what? I didn't mean to write that last line, doesn't make any sense! what I wanted to say:
brrt also, because of generational GC, we also maintain all cross-generation pointers
robertle and also slots in objects on the heap point to the same object?
brrt yes
robertle very simple case: one object on the heap, two registers that point to that object. GC kicks in. what happens? 20:23
brrt GC creates a list of pointers to the registers 20:24
GC moves the object
GC updates the registers through the pointers
done
(a 'list' being a contiguous array of memory) 20:25
i.e. an array
robertle ok, but I don't see how that is done efficiently: when you move an object you'd have to go through the whole list and update every pointer that points to the old location, and do that for every object. 20:26
I don't see how I can make that not quadratic
brrt it seems very linear time to me :-) 20:27
you only update each reference to the old position once 20:28
and you don't need to look them up because you've stashed them in a list
robertle right, but you need to scan the list of all pointers. and you need to do that
for each object moved
brrt but you scan all objects in a separate phase, once, then figure out which ones you need to move 20:29
robertle ok, so I have a list of all objects that need moving. possibly with an old and new location
brrt yes. and you *also* have a list of all references to those objects
robertle hold on, that may be the critical part 20:30
where do you keep that list of per-object references?
brrt you create it while scanning the heap for all reachable objects 20:32
robertle k, but you somehow have to link it from the object itself?
brrt every object you encounter, you maybe put it on the list to be moved, and all references to objects you encounter, you put on the list of references-to-update
robertle getting closer, this is great 20:34
so what data do you have as part of the heap object to support this? I would imagine you either need to store a pointer from the object to the list of references to it, or have a slot in which you temporarily store the new location 20:35
dude that book isn't exactly cheap! but there you go... 20:37
brrt i'm confused now... what we have is a flag that says if something is first-or second gen (i.e. moveable or not), and a function that lists all references within the object 20:38
anyway, i'm afk. i wish you good luck :-)
20:39 brrt left 20:52 zakharyas left
jnthn robertle: Perhaps the key part missing here is the need for a forwarding pointer 20:57
robertle explain please
jnthn If we have 3 references to the same object, then when we discover each of them we'll put a pointer to the location of the reference into our "todo" list 20:58
We do need to update all of those references when we move the object, *but* we only want to move the object once 20:59
robertle ah, wait!
don't say anymore
I get it: when you move the object, you put a pointer in its place that refers to the new location. 21:00
right?
jnthn Yeah. I guess in your existing collector you already have some kind of "seen" bit in your object headers to know which objects you already marked, so you don't get stuck when marking self-referential data structures? 21:01
In a moving collector that bit really becomes an "I already moved this" 21:02
robertle yes, I have two bits of GC management per heap object
jnthn So there's two cases when processing a pointer to an object reference
The first one is you come, follow the pointer to the object, and observe that it doesn't have the "already moved" bit set
In that case, you 1) copy the object's memory to the new location where it will live, 2) set the "already moved" bit, 3) save the new address inside the *old* memory of the object (which is safe because you already copied it), and 4) update the reference that you saw to point to the new location 21:03
The second came is that you follow the pointer to the object and see that the object was already moved. You then find the address written in step 3 and then do step 4 21:04
*second case
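The two cases jnthn describes might look roughly like this in C; the layout and the allocation helper are assumptions for illustration, not MoarVM's actual structures:

    #include <stddef.h>
    #include <string.h>

    typedef struct Object {
        unsigned moved : 1;          /* the "I already moved this" bit          */
        struct Object *forward;      /* valid once moved: the new address       */
        size_t size;                 /* total size of the object in bytes       */
        /* ... payload follows ...                                              */
    } Object;

    /* Assumed helper: bump-allocates `size` bytes in the target space. */
    extern void *tospace_alloc(size_t size);

    void process_slot(Object **slot) {
        Object *obj = *slot;
        if (!obj->moved) {
            /* Case 1: first time this object is reached.                       */
            Object *copy = tospace_alloc(obj->size);
            memcpy(copy, obj, obj->size);   /* 1) copy to its new home          */
            obj->moved   = 1;               /* 2) mark the old memory as moved  */
            obj->forward = copy;            /* 3) forwarder in the *old* memory */
            *slot = copy;                   /* 4) update this reference         */
        } else {
            /* Case 2: already moved; chase the forwarder and update.           */
            *slot = obj->forward;
        }
    }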
robertle cool, understood. and I don't even need an extra bit: i currently have a "mark" bit. if I immediately move the object when I mark it, it's the same :) 21:05
jnthn Right :)
robertle beautiful, that solves my problems and isn't even so much work to do from where I currently stand
jnthn In the MoarVM collector, the second case is handled here: github.com/MoarVM/MoarVM/blob/mast...ect.c#L214
There's only one line of actual work in there, the rest is all just debug/assertion :) 21:06
And here is the first case: github.com/MoarVM/MoarVM/blob/mast...ect.c#L312 21:07
It's a bit more involved because there's the generational thing going on, but you can see the memcpy and the bit being set
robertle great, and it also solves one of my worries which is the extra memory I need for GC and heap management: most objects are cons cells, and they are just 16 bytes on the heap. I really need to avoid large per-object management data 21:08
jnthn process_worklist is *the* heart of the MoarVM collector, really
Pretty much everything else is just supporting that algorithm :) 21:09
robertle totally unrelated to this question: but does moar do any of the GC work in parallel with heap-mutating code? or at least do some stop-the-world GC work in parallel?
jnthn (Which in places is still quite tricky, for example sync-up of threads)
No and yes
The first case tends to be called a concurrent collector (it collects concurrent with the running code) 21:10
robertle yeah
jnthn The second is a parallel collector
robertle so moar is non-concurrent but parallel?
jnthn MoarVM's collector is parallel - multiple threads join in to do the collecting work - but it's barely parallel
oops
MoarVM's collector is parallel - multiple threads join in to do the collecting work - but it's barely concurrent
At the end of the collection process, it reaches a point where individual threads can finish up their bit and then go on their way as soon as they have 21:11
But for most of the time it needs the world to be stopped
There's a latency/throughput trade-off here 21:12
robertle ok, and what did you mean with the "sync-up of threads" before?
jnthn Taking all the running threads and making them stop their work and come join in with GC
That's what the stuff in src/gc/orchestrate.c does
robertle ok, I see 21:13
jnthn But it also has to account for stuck threads
e.g. those that are doing blocking I/O or sleep or waiting on a mutex
In MoarVM, when going to wait on such a think, the thread flags itself as unable to join in with GC 21:14
*thing
robertle ah, darn! my toy doesn't have threads yet, so I am good
jnthn Yeah, that makes things far easier :)
When you do, you'll notice that being first to move an object is a data race, and will need a scheme to deal with it :)
The MoarVM scheme is that every object has an owner, and we delegate its GC work to the thread responsible for its owner 21:15
robertle you mean with multiple threads doing GC? right, that's why I was asking...
jnthn That's what the "work passing" stuff in collect.c
is doing
A potential alternative is to race to write the forwarder pointer 21:16
But there's a tricky detail there, I guess, because how do you know if the address there is a forwarder or not? 21:17
One way to deal with that is to make every single object 1 pointer bigger so that it can always be zeroed to indicate "not moved"
But that makes every object bigger
Another would be to require the platform can do a double-cas and then install the pointer and a header bit together 21:18
But that may not be portable enough
robertle my heap is segmented into arenas, and each arena has its own list of things that still need to be done. so in theory I could shard by arena. I then would need to protect the list when appending/consuming...
jnthn Yeah
Yeah, that's what MoarVM is doing I guess. Each thread has its own nursery
robertle you would need the pointer and the header bit to be quite close together to be cas-able. I currently have them quite far apart 21:19
jnthn It's the forwarder in the object and the header bit in the object, so they're close enough
But you still need a double-cas operation
robertle btw, this is the toy: github.com/robertlemmen/keiryaku 21:20
jnthn I guess yet another scheme is to have two bits, CAS in place the "I'm going to write a forwarder" bit, if you win then write the forwarder, then cas in an "I'm done writing the forwarder". If you lose the first race then you have to spin.
I guess with enough clever you could use that as a fallback on platforms without double-cas and use the double-cas where possible :) 21:21
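A hedged sketch of that two-bit scheme using C11 atomics (illustrative only; this is not what MoarVM does): the winner of the first CAS copies the object, writes the forwarder, then sets a "done" bit, while losers spin until the forwarder is published:

    #include <stdatomic.h>

    enum { FWD_CLAIMED = 1u, FWD_DONE = 2u };

    typedef struct Object {
        _Atomic unsigned gc_flags;           /* claim + done bits               */
        struct Object *_Atomic forward;      /* published by the winning thread */
        /* ... payload ...                                                      */
    } Object;

    /* Assumed helper: copies the object into this thread's tospace. */
    extern Object *copy_object(Object *obj);

    Object *forward_object(Object *obj) {
        unsigned expected = 0;
        if (atomic_compare_exchange_strong(&obj->gc_flags, &expected, FWD_CLAIMED)) {
            /* We won the claim: copy, publish the forwarder, then say so.      */
            Object *copy = copy_object(obj);
            atomic_store(&obj->forward, copy);
            atomic_fetch_or(&obj->gc_flags, FWD_DONE);
            return copy;
        }
        /* We lost: spin until the winner has published the forwarder.          */
        while (!(atomic_load(&obj->gc_flags) & FWD_DONE))
            ;  /* spin */
        return atomic_load(&obj->forward);
    }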
But we'd actually then have a new problem if we did that in MoarVM
robertle ?? 21:22
jnthn We rely on the things in a particular thread's fromspace (bit of memory we copy from) to land up in the same thread's tospace. 21:23
Which in turn means we know we'll always end up with no more in tospace than fromspace, so we can't ever overflow it
We'd need a new scheme to deal with that
Also, the current design tries to keep things local, and we'd lose that too 21:24
So maybe the current work-pass scheme ain't so bad after all :)
robertle hmm, all very difficult. do you have much in terms of benchmarks or so to test the GC against? 21:25
jnthn I've mainly tuned it on real-world workloads, since we have plenty of those :-) 21:26
CORE.setting compilation is quite good for measuring how well generational things are working
Hammering a multi-threaded web server isn't bad for looking at the threading case 21:27
I rewrote the thread sync-up algorithm at some point after looking at profiles of the latter
robertle aren't they very fickle? I have an app at work for example (java). it creates shitloads of garbage, so really the main performance factor is the nursery size. if short-lived shit makes it into the real heap, then performance goes down the drain...
jnthn That's a factor too, though also a big nursery means longer collects and less cache locality in the collects 21:28
What I did find was very worth tuning was the decision about when to do full collects 21:29
robertle exactly. so more tuning required than one wants to do...
more generally: are you happy with its performance in the web service (i.e. cro I guess) case?
jnthn Pretty sure a tuning there got 10% off CORE.setting compilation by putting into MoarVM an algorithm that used the overall heap size to decide how often to collect gen2
robertle I played with that a while ago because it's also what I do at work, and have to admit the results seemed a bit mixed to me 21:30
jnthn Happy with Cro's performance overall? Not really yet, though it's easily handling my production workloads. 21:31
But GC isn't much of a factor there, I don't think.
robertle what do you think are the big possible performance gains still to be had then?
jnthn A lot more of the cost comes from concurrency control overheads in Supply
Which is in no small part because LEAVE is essential for safely releasing locks, but also a pig. 21:32
I'm overall quite happy with the GC itself. We could do with escape analysis and "stack" allocation. 21:33
Which would mean that a lot less things ever end up being the GCs job
robertle ah! I have read and tried to understand dybvig's scheme paper, which I believe is at its core about escape analysis and figuring out when stuff can safely be put on the stack 21:34
but it's mind-boggling
jnthn The other thing is that we've just a bunch more things to do to turn Perl 6 programs into better quickened bytecode and/or machine code
I'll probably spend much of next week re-working Scalar and assignment 21:35
Because at the moment we can't optimize out assignment type checks or lower it into something cheap, despite it being a really common thing.
robertle hmm, yeah. I noticed that there is quite some work going into hand-crafting core functionality in nqp. shouldn't it be rakudo that generates reasonable code from perl6?
jnthn The problem is that a lot of the properties that stop it from generating simpler code are things that aren't so easy to do static analysis on 21:36
robertle are you saying the language shouldn't be quite so late-binding? ;)
jnthn Not really, I'm saying that Rakudo's code-gen isn't really where we can make the gains 21:37
robertle ok, but if that is the case, why does it help if people write clever nqp implementations of stuff?
jnthn And where we do need to generate different/better code, the main reason is so the VM can do a better job 21:38
Because they aren't writing Perl 6 code by that point, and thus avoiding various of the trickier-to-analyze things
Same reason that NQP programs generally run faster 21:40
robertle hm, ok
jnthn Gotta go again for a bit 21:41
robertle anyway thanks for the chat, the forwarding-pointer-in-old-heap location was exactly what I was missing. I am sure that is going to make it relatively easy to switch to a bump allocator and generational GC.
I need to be in bed as well :) 21:42
jnthn :-)
'night o/
22:26 robertle left 23:44 Kaiepi left