dalek | Heuristic branch merge: pushed 25 commits to MoarVM/even-moar-jit by bdw | 05:50
05:50 brrt joined
brrt | good * again | 05:55
hey #perl6 | 05:59
eh | 06:00
#moarvm
honest question
is writing a binary heap obscure?
(an implementation of)
konobi | brrt: i'd have a look at libconcurrencykit for something like that | 06:01
brrt | well, i wasn't necessarily asking for a concurrent binary heap
although that is possible too, and fun | 06:02
konobi | this is concurrent in NUMA terms
brrt | okay, for the fun of it i'll check out libconcurrencykit, but that was not really my question
konobi | so dealing with the swathe of optimized versions that work across NUMA implementations
what are you trying to get out as an answer? | 06:03
brrt | 'yes, i wouldn't write a binary heap unless it was really necessary and i had done proper preparation', 'no, writing a binary heap is child's play', and the latter possibly followed by 'which is why i don't spend my time on childish things' or possibly 'i do it all the time' | 06:04
basically, i'm trying to assess if the people who claim that writing a binary heap is obscure - something i see quite a bit - are thinking in the same way about it as I am, or whether my perspective is warped somehow | 06:05
i know writing binary heaps is a simple exercise that nearly everybody does, or rather, that most people with a CS education do
it's not just about the binary heaps though :-) | 06:06
geekosaur | you may be seeing a division between people with formal CS / algo training and those of us without it | 06:07
brrt | hmm
okay
geekosaur doesn't even remember what a binary heap is ... self-taught, algorithms is still a shortcoming
brrt | a binary heap is a tree structure in which every node has two children and satisfies the heap property: every node is smaller than both of its children (or larger, for a max-heap) | 06:08
its usual implementation is in a linear array
hmm okay | 06:09
but that is confusing...
(and its purpose is to get at the smallest element in constant time while maintaining the heap through inserts and deletes in log(n) time) | 06:10
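(ed. note: for readers following along, a minimal sketch of the array-backed binary min-heap brrt describes -- illustrative C with invented names, not MoarVM code:)

    #include <stdlib.h>

    /* Node i's children live at 2*i+1 and 2*i+2; every node is smaller
     * than its children, so the minimum is always at index 0. */
    typedef struct { int *items; size_t num; size_t alloc; } Heap;

    static void heap_swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

    /* O(log n) insert: append, then sift up while smaller than the parent. */
    void heap_insert(Heap *h, int value) {
        if (h->num == h->alloc) {
            h->alloc = h->alloc ? h->alloc * 2 : 8;
            h->items = realloc(h->items, h->alloc * sizeof(int));
        }
        size_t i = h->num++;
        h->items[i] = value;
        while (i > 0 && h->items[i] < h->items[(i - 1) / 2]) {
            heap_swap(&h->items[i], &h->items[(i - 1) / 2]);
            i = (i - 1) / 2;
        }
    }

    /* O(log n) extract-min (heap must be non-empty): move the last
     * element to the root, then sift it down. */
    int heap_extract_min(Heap *h) {
        int min = h->items[0];
        h->items[0] = h->items[--h->num];
        size_t i = 0;
        for (;;) {
            size_t left = 2 * i + 1, smallest = i;
            if (left < h->num && h->items[left] < h->items[smallest])
                smallest = left;
            if (left + 1 < h->num && h->items[left + 1] < h->items[smallest])
                smallest = left + 1;
            if (smallest == i) break;
            heap_swap(&h->items[i], &h->items[smallest]);
            i = smallest;
        }
        return min;
    }

(Peeking at the minimum, h->items[0], is the constant-time operation mentioned above.)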
otoh, my perspective probably /is/ warped; i started out with C, from a book that had me do exercises on merge sort and quick sort, linked lists and trees just to get started | 06:14
if i compare that with what I later did making PHP websites, then, yes, binary heaps are comparatively obscure
decommute, apologies for being weird this morning & | 06:26
06:40 domidumont joined
06:44 domidumont joined
06:45 lizmat joined
08:10 zakharyas joined
10:12 TheLemonMan joined
arnsholt | Yeah, I think the division (re: binary heap) is mostly theoretical background vs. non-theoretical background | 10:13
A binary heap isn't *terribly* hard to implement, but it's a data structure whose utility isn't necessarily immediately obvious | 10:14
10:31 lizmat joined
moritz | the problem with big binary heaps is that they tend not to be very cache-friendly | 10:51
arnsholt | Yeah, there's a very interesting Poul-Henning Kamp article about that | 10:58
queue.acm.org/detail.cfm?id=1814327
konobi | yeah, it's why i had mentioned libck... as they have a variety of heap implementations, each with better relative performance based on workload (and data/graphs to provide evidence for that) | 11:08
11:09 nine joined, mtj_ joined, nwc10 joined
11:10 mtj__ joined
11:11 mtj_ joined
11:22 lizmat joined
12:01 lizmat joined
TheLemonMan | jnthn, moarvm's #145 can be closed | 12:17
and #175 too, I can't reproduce it anymore | 12:21
JimmyZ | #175 maybe needs a test | 12:35
12:38 lizmat joined
jnthn | Seems JimmyZ++ beat me to closing it :) | 12:40
JimmyZ | didn't close #175 yet, I think it needs a test in roast | 12:41
TimToady fondly remembers his first heap, written as part of a scheduler for a discrete event simulator | 13:09
written in Pascal, but definitely not a toy | 13:10
ran on a very expensive Amdahl, so speed was of the essence | 13:11
14:23 brrt joined
brrt thinks heaps are just about one of the most awesome data structures so is probably biased | 14:23
timotimo | heaps are cool | 14:24
the thing is, the heap that's presented in that article is also a heap
its allocation strategy is quite a bit more complicated than binary heaps', but as soon as just a single page has been swapped out and has to be grabbed from even an SSD, it outperforms the classic binary heap
brrt | yeah, it's a heap, and it is a binary heap in the sense that a heap node has two children | 14:25
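(ed. note: the cache argument in Kamp's article comes down to index arithmetic: in the classic layout, consecutive levels of the heap lie exponentially far apart, so a deep parent/child walk touches a different page at almost every step, while the B-heap keeps several levels together on one page. A tiny illustration, with assumed item and page sizes:)

    #include <stdio.h>

    /* Walk from a deep node up to the root of a classic binary heap and
     * print which (assumed 4 KiB) page each step lands on; nearly every
     * hop below the top few levels lands on a different page. */
    int main(void) {
        size_t item_size = 8, page = 4096;
        size_t i = 1000000;                 /* some deep node */
        for (;;) {
            printf("index %zu -> page %zu\n", i, i * item_size / page);
            if (i == 0) break;              /* reached the root */
            i = (i - 1) / 2;                /* hop to the parent */
        }
        return 0;
    }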
timotimo | on the other hand, you're writing our JIT. you probably instantiate that data structure at the beginning of jit compilation and purge it at the end
brrt | yes
timotimo | that's true
brrt | so, i'm kind of having the nagging feeling PHK overstated his achievements somewhat
and, yes, i'll kill my data structures as soon as i get the chance :-) | 14:26
timotimo | data structures must perish, yeee-art!
brrt | but, on the other hand, not too quickly | 14:27
i'm wondering if i can figure out the worst-case storage requirements for the definitions-and-uses array
to put it briefly: i can and do precompute the number of definitions-and-uses which i need | 14:28
i need definitions-and-uses to figure out where to insert stores and loads after spilling, and whether to split a live range
however, if i spill something, then the single live range covering it is split into N live ranges (one for each definition and one for each use) | 14:29
&lt;rubberduck&gt;: so suppose i have X live ranges, and all together they have Y definitions and Z uses... | 14:30
now, i know that X <= max(Y,Z)
timotimo | you think someone's going to purposefully write code that explodes the size requirements of the jit? :)
brrt | no, i think: how am i going to make space for all the 'tiny live ranges' that have split off from the main live range | 14:31
considering that the live ranges themselves are on a dynamic array so i can add them as i like
but i'd prefer not to have X tiny arrays, most of which contain 1 definition and 1 use, one for each of the definitions/uses per live range | 14:32
TimToady | is that typical?
brrt | in unoptimized expr jit code, yes
when i have written the optimizer, the case of single-definition-many-uses will be more common
TimToady backs away from the yak | 14:33
brrt | yak? :-)
TimToady | the one you're about to shave...
brrt | haha
yaks like these make projects like graal/truffle make sense | 14:34
but, anyway
what i do is allocate a single buffer of size Y + Z, and assign pointers into it to each of the live range objects | 14:35
however, i may have to spill some of the live ranges....
hmmm
let me think about that for a bit | 14:36
suppose we have a live range that is like this: { defs => [1,4,9], uses => [2,3,5,8,13] } | 14:37
spilling this live range means inserting stores after 1,4,9 and loads before 2,3,5,8,13
producing not 1 but 3 (for all defs) + 5 (for all uses) live ranges
(let's ignore the fact that 1 and 2 and 3 are just after one another) | 14:38
so what i get is: [ { defs => [1], uses => [1] }, { defs => [2], uses => [2] }, { defs => [3], uses => [3] }, ... ] -- where a 'def' at an old use refers to the _load_ i insert before it, and a 'use' at an old def refers to the _store_ i insert after it | 14:39
however, that at most makes Y + Z new live ranges, Z new definitions, and Y new uses | 14:41
does that make sense?
it means i can 'fragment' my live ranges as much as i like and never need more than Y + Z live ranges and (Y + Z) * 2 uses/definitions
and since memory is cheap, i'll allocate X + Y + Z live ranges, and will never have to reuse an old live range | 14:42
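(ed. note: a hypothetical sketch of the single-buffer scheme described above -- invented names, not MoarVM's actual register allocator code:)

    #include <stdint.h>

    /* All defs and uses live in one shared buffer of Y + Z slots;
     * each live range only holds pointers into it. */
    typedef struct {
        int32_t *defs;            /* pointer into the shared buffer */
        int32_t *uses;
        int32_t  num_defs, num_uses;
    } LiveRange;

    typedef struct {
        int32_t   *buffer;        /* Y + Z slots, allocated up front */
        LiveRange *ranges;        /* X + Y + Z slots: originals + fragments */
        int32_t    num_ranges;
    } RegAlloc;

    /* Spilling a range with y defs and z uses appends y + z tiny
     * fragments, each with one def and one use at the same sequence
     * position (the store after an old def, or the load before an old
     * use), so the preallocated X + Y + Z bound is never exceeded. */
    void spill(RegAlloc *alc, LiveRange *lr) {
        for (int32_t i = 0; i < lr->num_defs; i++) {
            LiveRange *frag = &alc->ranges[alc->num_ranges++];
            frag->defs = &lr->defs[i];   /* the original definition */
            frag->uses = &lr->defs[i];   /* the store inserted after it */
            frag->num_defs = frag->num_uses = 1;
        }
        for (int32_t i = 0; i < lr->num_uses; i++) {
            LiveRange *frag = &alc->ranges[alc->num_ranges++];
            frag->defs = &lr->uses[i];   /* the load inserted before it */
            frag->uses = &lr->uses[i];   /* the original use */
            frag->num_defs = frag->num_uses = 1;
        }
    }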
/splitting/ live ranges complicates that, though | 14:43
however, not by all that much, since much the same logic applies.... | 14:44
also
when i'm going to split a live range, that's usually because i'm going to spill part of it | 14:45
so it is feasible to allocate the 'spilled' part temporarily and then the usual logic applies again
on the other hand | 14:49
splitting is an advanced feature
doesn't have to be in there immediately | 14:50
timotimo | doesn't sound like it's optional :) | 14:53
brrt | spilling isn't optional. but splitting is being clever about spilling | 14:57
[Coke] is sad you cannot shave the yaks in FarCry 4.
brrt | i.e. suppose i need the register held by our fictional live range after sequence point 5; then all of [1,4] (defs) and [2,3,5] (uses) can still use the in-register version | 14:58
timotimo afkbbl
14:58 lizmat joined
15:18 domidumont joined
15:21 domidumont joined
16:24 domidumont joined
timotimo | jcdav.is/2016/09/01/How-the-JVM-com...r-strings/ - i guess we're still very far away from "string comparison is a big part of our performance", but this is still an interesting read | 17:39
hoelzro | timotimo: that guy's blog is really interesting | 17:53
I got hooked on it when I saw his post on how the JVM uses SIGSEGV tricks to implement a GC fence | 17:54
hoelzro doesn't know if that's the right term
jcdav.is/2015/11/09/More-JVM-Signal-Tricks/
18:03 zakharyas joined
TheLemonMan | iirc D uses a similar approach | 18:32
hoelzro | oh, really? that's cool | 18:35
18:38 domidumont joined
TheLemonMan | hoelzro, github.com/dlang/druntime/blob/mas...1910-L1913 | 18:40
hoelzro | hmm, interesting | 18:54
so the thread that's GC'ing (I presume) fires a SIGUSR1 at each thread to get them to pause?
how do they know that that thread is in any sort of consistent state?
TheLemonMan | if there's one thing modern databases taught us, it's that consistency is overrated :P | 19:01
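(ed. note: the suspend mechanism hoelzro describes works roughly like the POSIX sketch below -- hypothetical and heavily simplified; see the linked druntime source for the real implementation. iirc druntime's actual handler also saves the register state so the collector can scan it.)

    #include <pthread.h>
    #include <semaphore.h>
    #include <signal.h>

    static sem_t parked;   /* counted up once per suspended thread */

    /* SIGUSR1 handler: announce we're parked, then sleep until the
     * resume signal (SIGUSR2) arrives. sem_post is async-signal-safe. */
    static void on_suspend(int sig) {
        (void)sig;
        sem_post(&parked);
        sigset_t wait_mask;
        sigfillset(&wait_mask);
        sigdelset(&wait_mask, SIGUSR2);
        sigsuspend(&wait_mask);
    }

    /* SIGUSR2 handler: nothing to do; its delivery makes sigsuspend return. */
    static void on_resume(int sig) { (void)sig; }

    void init_suspend_signals(void) {
        sem_init(&parked, 0, 0);
        struct sigaction sa;
        sa.sa_flags = 0;
        sigemptyset(&sa.sa_mask);
        sa.sa_handler = on_suspend;
        sigaction(SIGUSR1, &sa, NULL);
        sa.sa_handler = on_resume;
        sigaction(SIGUSR2, &sa, NULL);
    }

    /* The collector signals each mutator thread, then waits until all
     * of them have confirmed they are parked. */
    void suspend_all(pthread_t *threads, int n) {
        for (int i = 0; i < n; i++) pthread_kill(threads[i], SIGUSR1);
        for (int i = 0; i < n; i++) sem_wait(&parked);
    }

    void resume_all(pthread_t *threads, int n) {
        for (int i = 0; i < n; i++) pthread_kill(threads[i], SIGUSR2);
    }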
timotimo | "At first glance, this doesn't seem to serve any real purpose right before the return. Rather than doing any useful work, it's just a way of trying to read from 0x7fce84071000, the designated polling page. Modifying access to this page lets the JVM stop threads cleanly - in places where the state of the world is well-known." | 19:03
jcdav.is/2015/11/09/More-JVM-Signal-Tricks/
jnthn | iiuc, you just try to deref something at each safepoint and if it SIGSEGVs or so then you enter the GC | 19:22
It avoids the branches around various Moar safepoints | 19:23
timotimo | right
jnthn | At the cost of needing platform-specific trickery :)
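(ed. note: a minimal sketch of that polling-page trick -- POSIX-specific, illustrative only, not MoarVM or HotSpot code:)

    #include <signal.h>
    #include <sys/mman.h>

    static volatile char *polling_page;

    /* Each safepoint is a single unconditional load -- no branch on
     * the fast path. */
    static inline void safepoint(void) { (void)*polling_page; }

    static void on_segv(int sig, siginfo_t *info, void *ctx) {
        (void)sig; (void)ctx;
        if (info->si_addr == (void *)polling_page) {
            /* a stop was requested: park here (e.g. wait on a barrier)
             * until the page is readable again, then return; the
             * faulting load is retried and now succeeds */
        }
        /* otherwise it's a genuine crash: restore the default handler
         * and re-raise */
    }

    void stop_the_world(void)   { mprotect((void *)polling_page, 4096, PROT_NONE); }
    void resume_the_world(void) { mprotect((void *)polling_page, 4096, PROT_READ); }

    void init_safepoints(void) {
        polling_page = mmap(NULL, 4096, PROT_READ,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        struct sigaction sa;
        sa.sa_flags = SA_SIGINFO;
        sigemptyset(&sa.sa_mask);
        sa.sa_sigaction = on_segv;
        sigaction(SIGSEGV, &sa, NULL);
    }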
timotimo | right
not something we need immediately
otoh, we can mprotect a page "don't access" to make other threads grind to a halt
when we want to dump backtraces of all threads and exit the interpreter immediately
probably not easy to make work right | 19:24
jnthn | Remember that this doesn't free you from solving the blocked-threads problem
timotimo | well, if we were going to crash and burn anyway ... :)
you mean threads blocked with something that sets them "blocked"? | 19:25
jnthn | A thread blocking on I/O, a mutex, etc. is not executing
Right
timotimo | the only big problem with that is when it goes on to immediately continue doing stuff
er, i suppose that's already obvious to you
jnthn | At the moment the Moar mechanism for that + safepoints is unified into a single flag :)
Held per thread for cache friendliness
timotimo | mhm, fair enough | 19:26
21:42 lizmat joined
22:45 dalek joined