01:29 cognominal joined
timotimo | hm, we currently don't measure MVMString allocations at all in the profiler. i wonder how we could do that properly | 01:32 | |
01:48 ilbot3 joined
02:13 lizmat joined
05:29 lizmat joined
06:46 domidumont joined
06:51 domidumont joined
07:54 lizmat joined
jnthn | timotimo: Just sticking in allocation logging instructions after all the string ops would be a start. | 09:48 | |
12:08 brrt joined
brrt | ehm, how worthwhile do any of you think it would be for me to refactor spesh allocation out of spesh into its own thing | 12:55 | |
jnthn | Well, are you going to use it for something else? :) | 12:57 | |
brrt | the jit more or less wants its own thing, because it has much shorter lifetime than the spesh graph | 12:59 | |
to be precise, 'structures generated by the compiler tend to have much shorter lifetimes than the spesh graph' | |||
arguably not the biggest worry | 13:00 | ||
timotimo | what i've seen for "extremely short lifetime" is having a circular buffer that gets cleared per-frame in OpenGL-like stuff; in particular for the nintendo 3ds | 13:01 | |
brrt | doesn't work, because i can't predict how much memory i'll need | 13:02 | |
circular buffers are / can be cute, though | 13:03 | ||
timotimo | well, if your current memory needs exceed a single buffer, build a second one | ||
if your lifetime is *really* short, you can use the stack :) | |||
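(The "build a second buffer" idea is essentially a region/arena allocator: grab fixed-size blocks as needed, hand out pieces, and free everything at once when the compile is done. A minimal sketch follows; the names are made up and this is not MoarVM's spesh allocator.)

    #include <stdlib.h>

    #define REGION_BLOCK_SIZE 4096

    typedef struct RegionBlock {
        struct RegionBlock *prev;   /* previously filled block, kept for bulk free */
        size_t              used;   /* bytes handed out from this block so far */
        char                data[REGION_BLOCK_SIZE];
    } RegionBlock;

    typedef struct { RegionBlock *top; } Region;

    /* Hand out size bytes; start a fresh block when the current one is full.
     * Alignment and requests larger than a block are ignored in this sketch. */
    static void *region_alloc(Region *r, size_t size) {
        if (!r->top || r->top->used + size > REGION_BLOCK_SIZE) {
            RegionBlock *b = malloc(sizeof(RegionBlock));
            b->prev = r->top;
            b->used = 0;
            r->top  = b;
        }
        void *p = r->top->data + r->top->used;
        r->top->used += size;
        return p;
    }

    /* Everything allocated from the region has the same lifetime and dies together. */
    static void region_destroy(Region *r) {
        while (r->top) {
            RegionBlock *prev = r->top->prev;
            free(r->top);
            r->top = prev;
        }
    }

    /* Usage: Region r = { NULL }; ... region_alloc(&r, n); ... region_destroy(&r); */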
jnthn | brrt: Hm, don't we typically discard the spesh graph right after the JIT? | 13:21 | |
brrt | hmmm | ||
true | |||
jnthn | brrt: Or are you saying you only need the memory for one phase of the JIT? | ||
brrt | ehm, well, ehm, | ||
let me think about the correct answer | |||
timotimo | we're still keeping memory around a bit for logging and such | ||
jnthn | If yes, you do get a win in terms of lower memory overhead but you have to be darn careful you don't get pointers in the wrong direction between the lifetime'd regions. | ||
timotimo: Yes, but we JIT after that :) | |||
timotimo | whereas we enter jit, use the memory, leave jit and kick it out immediately, i'd say | 13:22 | |
jnthn | That is, at the logging phase we're still interpreting. | ||
brrt | the thing i'm thinking about is the tile list and the value descriptors | ||
jnthn | Then we spesh based on the logged stuff, then we JIT. | ||
brrt | those are per-basic block | ||
timotimo | right | ||
brrt | at best | ||
you'll never have a pointer to any of these from the spesh graph | 13:23 | ||
13:29 zakharyas joined
brrt | it's probably not important enough; my second consideration was 'not having to walk through three object layers in order to get at the spesh pool' | 13:30 | |
timotimo | we could just pass the spesh pool along on the stack if we're so worried about performance | ||
or hope that the c compiler and/or cpu caches will make it work fine for us | 13:31 | ||
brrt | i'm not | 13:33 | |
i'm worried about convenience | |||
performance, what, me worry? | |||
timotimo | :D | 13:34 | |
well, we can still have macros | |||
brrt | ok,ok, i'll do something useful instead | 13:35 | |
... whenever i actually have time | |||
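(The "we can still have macros" idea, sketched: hide the three-layer walk behind a #define so call sites stay short whatever the final layout is. Every name below is invented for illustration.)

    #include <stddef.h>

    /* Hypothetical layering: thread context -> spesh state -> allocator pool. */
    typedef struct { char *mem; size_t used, size; } Pool;
    typedef struct { Pool *pool; }                    SpeshState;
    typedef struct { SpeshState *spesh_state; }       ThreadContext;

    /* Bump allocation with no overflow handling, just to give the macro a target. */
    static void *pool_alloc(Pool *p, size_t n) {
        void *out = p->mem + p->used;
        p->used += n;
        return out;
    }

    /* Call sites write SPESH_ALLOC(tc, n) and never spell out the pointer chase;
     * the compiler will usually hoist the loads anyway. */
    #define SPESH_POOL(tc)     ((tc)->spesh_state->pool)
    #define SPESH_ALLOC(tc, n) pool_alloc(SPESH_POOL(tc), (n))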
timotimo | i was wondering: with the short-string-cache, we could compile things like substr when we know the argument for length is 1 to immediately go through the cache in the jitted code and only hit the C function if we know the cache isn't hit | 13:37 | |
but that'd also mean we'll hit the cache twice. though the cache will already be in the cache :P | |||
hm, except, substr has to go through the grapheme iterator | |||
it could work for chr, though | 13:39 | ||
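(Roughly the fast-path/slow-path shape being proposed for chr: jitted code checks a small table of single-grapheme strings and only calls into C on a miss. The types and names below are stand-ins, not MoarVM's actual cache layout.)

    #include <stdint.h>

    typedef struct VMString        VMString;        /* stand-in for MVMString */
    typedef struct VMThreadContext VMThreadContext; /* stand-in for MVMThreadContext */

    #define SHORT_STRING_CACHE_SIZE 128  /* cached codepoint range is assumed */

    /* Interned single-grapheme strings, indexed by codepoint. */
    extern VMString *short_string_cache[SHORT_STRING_CACHE_SIZE];

    /* The existing C implementation, used only on a cache miss. */
    VMString *chr_slow_path(VMThreadContext *tc, int64_t cp);

    /* What the jitted code for chr could inline. */
    static inline VMString *chr_fast(VMThreadContext *tc, int64_t cp) {
        if (cp >= 0 && cp < SHORT_STRING_CACHE_SIZE && short_string_cache[cp])
            return short_string_cache[cp];
        return chr_slow_path(tc, cp);
    }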
brrt | better answer | ||
we can transform, at spesh or preferably JIT time, the substr, etc. ops into 'low level string operations code' | |||
these we can JIT fast | 13:40 | ||
timotimo | hm, you're suggesting we have some basic operations like "gimme a grapheme iterator", "destroy the grapheme iterator", "advance the iterator", ... | ||
brrt | kind of, yes | 13:41 | |
timotimo | that's not bad | ||
brrt | it's a lot of work | ||
timotimo | with the expr jit that would work much more simply than with current spesh | ||
brrt | right | 13:42 | |
(i may want to add a LOOP primitive, just to make it more like LISP) | |||
timotimo | :D | ||
brrt | actually, there are other, better reasons for it, but it is quite odd | 13:43 | |
timotimo | fair enough, yeah | ||
brrt | the better reasons are that we'd like to make it explicit which variables are updated in the loop, so that we can take them into account | ||
timotimo | oh? | 13:44 | |
is that for certain optimization techniques? | |||
or just better compilation or something? | |||
brrt | in this case, to keep the tree structure meaningful | ||
e.g. suppose we have a loop that updates two variables | |||
in lisp, that would be something like | 13:45 | ||
(loop ((x 1 (+ x 1)) (y 10 (- y 1)) (z 0 (+ z (* x y)))) (> x y)) | 13:46 | ||
something like that.. actually, there is supposed to be a body in there | |||
point is, the loop terminates when x is greater than y | 13:47 | ||
which node, in this case, defines x on output | |||
actually, i'm not sure that makes a lot of sense | 13:48 | ||
i'd think a LOOP would be void-valued in most cases | |||
better point: expression language doesn't support direct assignment to either x or y | 13:49 | ||
timotimo | i'm not sure i follow | ||
brrt | no, i'm not sure i do either | 13:50 | |
:-) | |||
timotimo | you don't mean how the ((x 1 ...)) part declares a starting value for x? | ||
brrt | my point is basically this: the expression 'tree' forms a DAG, right? | ||
timotimo | right, you had that point in the past | ||
and how things have to get replicated so that you can, for example, refer to the same value twice | 13:51 | ||
brrt | uhuh | ||
well, suppose i have a loop that calls do_foo (x,y) repeatedly | |||
i'll need to have something that refers to the notion of x and y inside the loop | 13:52 | ||
timotimo | right | ||
brrt | because they are not just their original values anymore | ||
the context of this was 'it would be cool if the expression jit could deal with multiple-basic block regions, preferably hot loops' | |||
low level string ops would fall into that category | 13:53 | ||
timotimo | mhm | ||
brrt | the answer in this case was 'we should have a looping structure to keep track of the changes' | ||
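(One way to picture the LOOP primitive under discussion: each loop-carried variable names its initial and update expressions explicitly, so the tree stays a meaningful DAG and never needs a raw assignment to x or y. This is a hypothetical layout, not the expr JIT's actual node format.)

    #include <stddef.h>

    typedef struct ExprNode ExprNode;  /* ordinary expression-tree node */

    /* One carried variable, in the spirit of (x 1 (+ x 1)). */
    typedef struct {
        ExprNode *init;     /* value on loop entry, e.g. the constant 1   */
        ExprNode *update;   /* value for the next iteration, e.g. (+ x 1) */
    } LoopVar;

    typedef struct {
        LoopVar  *vars;      /* the carried variables, made explicit         */
        size_t    num_vars;
        ExprNode *exit_cond; /* terminate when this is true, e.g. (> x y)    */
        ExprNode *body;      /* evaluated each iteration, likely void-valued */
    } LoopNode;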
timotimo | afl has certainly found a crapton of crashes, claiming a whole bunch of them are unique | 14:01 | |
but i suppose many still fall into the same category either way | 14:02 | ||
m: say ^593 .pick(3) | 14:03 | ||
camelia | rakudo-moar 9b579d: OUTPUT«(273 436 151)␤» | ||
timotimo | those are the crashes from the S1 crashes folder :) | 14:04 | |
a bunch of invalid writes of size 1 inside MVM_bytecode_finish_frame | 14:08 | ||
also, a calloc of size 0 | 14:09 | ||
this next one isn't as interesting, i bet. invalid write of size 8 coming from arg_s, so it's probably just a far-outta-range argument to arg_s | 14:13 | ||
jnthn | Hm, though the validator should really catch those | ||
14:14 lizmat joined
timotimo | yeah, i don't think it has code for that yet, though | 14:15 | |
and just looking at it from afar tells me it's annoying to write code in it | |||
jnthn | urgh, hottest day this year | 14:17 | |
timotimo | hopefully "ever", not just "so far" :) | 14:18 | |
jnthn | Well, tomorrow - if the forecasts hold up - it'll break. | ||
(Thunderstorms forecast) | |||
Wouldn't surprise me if it hits this temperature again later on in the year. | |||
timotimo | probably :| | 14:19 | |
jnthn wonders if it's time to get the fan | |||
timotimo | just yesterday i got a link to a good AC unit and i was surprised that, relatively speaking, it was kind of cheap | ||
but i guess it's also expensive in the "eats all your electrons" way | 14:20 | ||
or rather: brakes the wobbling of all the electrons between you and the nearest power plant | 14:22 | ||
jnthn | They all need some way to stick a pipe to outside, though, I think? | ||
timotimo | yeah | 14:23 | |
the same page that it was on also offers a ... well, it looks like a huge sock you put over the window and the hose goes through that | |||
i have no idea how it's supposed to work :) | 14:27 | ||
glue it onto the window frame, perhaps | |||
jnthn | The fan is deployed :) | 14:28 | |
Hmm...interesting memory corruption is interesting... | 14:37 | ||
Smells like GC | 14:42 | ||
timotimo | "EU now has 1 GB of free space" | ||
14:47 brrt joined
brrt | i should have studied so much more math when i had the chance... | 14:47 | |
jnthn | Duh, found it | 14:50 | |
dalek | MoarVM/new-multi-cache: 621fe27 | jnthn++ | src/6model/reprs/MVMMultiCache.c: Add missing MVM_ASSIGN_REF. Fixes missing write barriers when assigning into a multi-cache, which caused various crashes. | 14:52 | |
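(The shape of that bug, schematically: a bare pointer store from an old-generation object to a nursery object bypasses the generational write barrier, so the GC can move or collect the target while the cache still points at it. The macro is real; the field names below are placeholders, and the exact form should be checked against src/gc/wb.h.)

    /* Buggy -- no write barrier, so the GC never learns about the reference:
     *
     *     cache->body.node_target = new_entry;
     *
     * Fixed -- route the store through MVM_ASSIGN_REF so the barrier runs:
     *
     *     MVM_ASSIGN_REF(tc, &(cache_root->header), cache->body.node_target, new_entry);
     */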
brrt | jnthn++ | 14:53 | |
jnthn | Hopefully spectest comes out a bit better now :) | ||
Yup :) | 14:57 | ||
nwc10 | ASAN's verdict might still be "a dead mouse on your carpet" | 14:59 | |
(it's running) | |||
jnthn | Maybe, though if it is then the issue is well hidden | 15:01 | |
The spectest run had both FSA and GC debugging on | |||
OK, so...we have a new multi-dispatch cache :) | |||
Which knows about nameds, in theory :) | 15:02 | ||
timotimo | does it already make that benchmark liz recently showed a bunch of times faster? | ||
jnthn | No | ||
Because Rakudo's multi-dispatch code doesn't yet take advantage of it. | |||
It assumes it can't install things with named args into the cache. | 15:03 | ||
lizmat | jnthn: where would that need to be fixed? in src/Perl6 ? | ||
timotimo | ah, ok | ||
might be in the BOOTSTRAP or near that | |||
jnthn | lizmat: Yeah, src/Perl6/Metamodel/BOOTSTRAP.nqp or so | ||
lizmat: I need to look at the code there carefully 'cus it's been a little while | |||
lizmat | stuff like bind_ons_param and so ? | 15:04 | |
*one | |||
jnthn | No, that's sig binding | ||
Closer to find_best_dispatchee | |||
lizmat | ah, ok | ||
jnthn | lizmat: I *think* it'll need to tease apart "needs a bind check just to validate nameds" from "needs a bind check because of unpacks or constraints" | ||
Which I believe are conflated at the moment | 15:05 | ||
timotimo | oh, could find_best_dispatchee run super crazy often in many of my tests just because nameds are involved? | ||
lizmat | timotimo: yes | ||
timotimo | i should have known :) | ||
jnthn | Quite possibly. I mean, @a[$foo]:exists is the classic example | ||
And what triggered me to do something about it. | |||
timotimo | i have the feeling nobody bothered to tell me about that :P | ||
jnthn | Well, it's about to change, so... :P | 15:06 | |
timotimo | yay :) | ||
jnthn | The new cache is kinda interesting. | ||
It's structured as a tree rather than an array of arrays of type tuples | |||
So it should be a bit lighter on memory, and a bit faster to search in | 15:07 | ||
timotimo | very cool | ||
jnthn | It hashes on the memory address of the interned callsite | ||
To find the tree top | 15:08 | ||
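(A sketch of that lookup: interned callsites are unique per argument shape, so their address is a stable key; hash it into a bucket whose slot holds the root of the type-test tree. Bucket count and struct names here are made up.)

    #include <stdint.h>
    #include <stddef.h>

    #define MULTI_CACHE_BUCKETS 64            /* bucket count is assumed */

    typedef struct CacheNode CacheNode;       /* tree of argument-type tests */
    typedef struct { CacheNode *root[MULTI_CACHE_BUCKETS]; } MultiCache;

    /* Pointers are aligned, so shift away the low bits before taking a bucket. */
    static inline size_t callsite_bucket(const void *interned_callsite) {
        uintptr_t p = (uintptr_t)interned_callsite;
        return (size_t)((p >> 4) % MULTI_CACHE_BUCKETS);
    }

    static CacheNode *cache_tree_for(MultiCache *cache, const void *interned_cs) {
        return cache->root[callsite_bucket(interned_cs)];
    }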
It also doesn't have a size limit. | |||
timotimo | ooooh | 15:09 | |
jnthn | Which is a trade-off :) | ||
timotimo | right, we still don't have a clue how to clear those out | ||
jnthn | Well, one idea is just to say "if it gets huge, throw it away" | ||
timotimo | right | ||
only the ones that show up often will get back in, then | 15:10 | ||
jnthn | And it'll be reconstructed with stuff the program is currently interested in. | ||
It suffers a bit on megamorphs. | |||
timotimo | yeah | ||
jnthn | Though no worse than what it replaces. | ||
timotimo | surely | ||
jnthn | "in theory" | ||
:) | |||
In practice, not sure :) | |||
timotimo | we don't have diagnostics in place yet | ||
jnthn | Well, I added two | 15:11 | |
timotimo | Oh, ok! | ||
jnthn | Though they're #define'd things | ||
timotimo | that's fine in my opinion | ||
jnthn | One is for dumping the cache on each add to see what on earth is in there. | ||
Well, more like, to see the tree structure :) | |||
timotimo | oh, that's probably a bit noisy | ||
jnthn | The other just points out whenever the cache size hits a power of 2 size | ||
Starting at 32. So 32 entries, 64 entries, 128 entries, etc. | 15:12 | ||
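(That diagnostic is cheap to keep behind a #define: a count is a power of two exactly when it has a single bit set. The macro and function names below are invented.)

    #include <stdio.h>

    #define MULTICACHE_SIZE_LOG 1   /* set to 0 to compile the check away */

    static int is_power_of_two(size_t n) {
        return n != 0 && (n & (n - 1)) == 0;
    }

    /* Call after each addition to the cache. */
    static void note_cache_growth(size_t num_entries) {
    #if MULTICACHE_SIZE_LOG
        if (num_entries >= 32 && is_power_of_two(num_entries))
            fprintf(stderr, "multi cache reached %zu entries\n", num_entries);
    #endif
    }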
timotimo | so, how does it look on the global scale? we have a single tree that holds everything? | ||
jnthn | No, it's still one cache per proto | 15:13 | |
So they are GC-able in that sense. | |||
timotimo | ah, good | 15:14 | |
jnthn | The data structure is documented in MVMMultiCache.h :) | ||
timotimo | ah, this is an interesting design, like a tree of ops | 15:18 | |
and the way it gets safepoint-freed | 15:19 | ||
nwc10 | jnthn: ASAN tolerates your code. No mouse for you! :-) | 15:27 | |
jnthn | Phew! | ||
Don't want to be full for the karahi chicken I'll hopefully find the energy to make in this heat :) | 15:28 | ||
nwc10 | anyway, "lack of mouse" is pretty cool | ||
(er, sorry, not literally) | |||
jnthn++ | |||
I think you've nailed it. At least, at the MoarVM level | |||
jnthn | :) | 15:29 | |
Seems we get a *very* tiny improvement on the %h<a>:exists case simply out of the new cache saying "no" faster | 15:30 | ||
And so faster failing over | |||
Hm, turns out the simplest possible Rakudo patch isn't quite enough. | 15:33 | ||
I want to rest and make dinner, but gist.github.com/jnthn/b1b1a569c930...2b17a29edc is what I tried if anyone fancies figuring out why that doesn't work out | 15:35 | ||
(The reason could be nearly anywhere) | |||
afk for now | |||
16:18 domidumont joined
16:41 lizmat joined
17:31 brrt joined
brrt | yes, the tree structure is cool | 17:31 | |
hurray for trees-in-arrays | 17:32 | ||
17:37 harrow joined
17:41 brrt joined
brrt | stupid hot weather though | 17:47 | |
timotimo | yeah. with a bit of moisture and absolutely still-standing air .. it's not nice | 17:55 | |
18:42 brrt joined
19:09 FROGGS joined, lizmat joined
19:23 zakharyas joined
brrt | damnit, damnit to hell | 20:11 | |
grrrr | |||
ok, shall i tell you a story that will amuse you | |||
or not | 20:12 | ||
during register allocation, i need to insert loads and stores to ensure that values are in their correct place | |||
in order to load a value, i might need to spill a value | 20:13 | ||
whenever i spill a value, i spill it right after it is constructed | |||
i do that by noting the tile number that created it and inserting a 'spill' tile just after that | 20:14 | ||
however, it is just about possible that the load that overwrites it has to happen just before the next tile | 20:15 | ||
thus, we have tile i, {spill, load}, tile i + 1 | |||
since spill and load are not relatively ordered, it is just possible that i first load and then spill | 20:16 | ||
meaning i overwrite the value | 20:17 | ||
breakage follows | 20:18 | ||
FROGGS | uhh | 20:19 | |
jnthn | ouch | ||
timotimo | ah, whoops | ||
brrt | this is especially annoying, because i just simplified the tile editor by not requiring such relative orders | 20:20 | |
so i'm puzzling over how to solve this cheaply and yet robustly | 20:21 | ||
the power-solution would be to allow tile inserts to specify insert-after or insert-before relations to specific tiles, not to order numbers | |||
but that makes the insertion code really complex | |||
because it requires topological sort (i think) | 20:22 | ||
well, not really, really complex, but more complex than i want to have to debug | |||
jnthn | Of course, you can solve every compiler problem by adding another phase. Like, "spit out virtual registers you can have as many as you like of"..."allocate them" :P | 20:23 | |
timotimo | everything ought to be topologically sorted anyway %) | ||
brrt | the not-so-robust solution, but one which might be workable, is to give the register allocator an insert-counter | ||
that is true, except for too slow compilation | |||
case in point: scala | |||
jnthn | Right :) | ||
Yeah, every problem except too many passes. | 20:24 | ||
brrt | and the theory would then be, because i'm deciding to insert the load based on the fact that its value has already been spilled, the spill must have been inserted before the load | 20:25 | |
('its' means: the value in the register) | 20:26 | ||
thus, sorting by the insert-sequence number secondary to sorting by the tile order number would give us the necessary relative ordering | |||
but it feels rather breakable to me | |||
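(That ordering rule as a comparator: sort inserts by their position in the tile list first, and by the allocator's running insert counter second, so a spill issued before a load at the same position keeps running first. Struct and field names are made up.)

    #include <stdlib.h>

    typedef struct {
        int   position;   /* index of the tile this insert goes after    */
        int   sequence;   /* running counter from the register allocator */
        void *tile;       /* the spill or load tile being inserted       */
    } TileInsert;

    static int cmp_tile_insert(const void *a, const void *b) {
        const TileInsert *x = a, *y = b;
        if (x->position != y->position)
            return x->position < y->position ? -1 : 1;
        return (x->sequence > y->sequence) - (x->sequence < y->sequence);
    }

    /* Usage: qsort(inserts, num_inserts, sizeof(TileInsert), cmp_tile_insert); */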
jnthn | Yeah, it has a slightly fragile feel to me also | ||
brrt | its made more complicated because a pre-coloring pass (which is in the works) assigns registers in backwards order | 20:27 | |
timotimo | i'm minimizing a bunch of crashing test cases now, so that i can download them (and give them to y'all) | ||
brrt | i.e. you detect which register to assign a value based on the later consumers of that value | 20:28 | |
that will need some work with the register assignment logic, fwiw... | |||
timotimo | yeah, it's a bit more pull than push, isn't it? | ||
brrt | yes | ||
so that complicates matters, but i'm not sure it breaks them | 20:29 | ||
the insert-before relationship also requires that i track the tile that created a value descriptor | 20:30 | ||
otherwise, i can't easily find the tile that created my descriptor that i'm kicking out of the register | 20:31 | ||
i have half a mind to rename MVMJitValueDescriptor to MVMJitValue | 20:34 | ||
jnthn | Well, if you talk about them a lot then it's a load less verbose... | 20:35 | |
jnthn just went to take out the trash, and on returning to his apartment noticed it smells a little like walking into an Indian restaurant :) | |||
The air outside has been notably warmer/more humid than I have inside today, so I've been reluctant to open windows. :S | 20:36 | ||
brrt: I'm wondering a bit if things get simpler if the tiles were projected down to a linear bunch of instructions before doing the allocating? Or is that just restating the "add another pass" approach? | 20:38 | ||
brrt | tiles already are a linear list of instructions :-) | ||
so the pass is already there | 20:39 | ||
jnthn | oh :) | ||
brrt | well, they didn't used to be, they used to be tagged to the tree, but that gave lots of conceptual difficulties | 20:40 | |
jnthn | So in that sense the tiles "don't exist" by this point in some sense, we just have a linear bunch of instructions and a CFG? | ||
brrt | correct | 20:41 | |
i fear i have a terminology screwup again though | |||
when i say 'tile', i mean the thing that stands in for the generated bytecode | 20:42 | ||
i don't have a CFG for it, either, yet | 20:43 | ||
but that is in the plans | |||
the relation between the tree and the 'tile' is pretty weak at that point | 20:45 | ||
(lol @ indian restaurant) | 20:46 | ||
FROGGS | tsss, brits :P | 20:48 | |
jnthn | :P | 20:52 | |
Worth it...it tasted good :) | |||
timotimo | i think i'm maxing out hack with my niced processes ... | 20:54 | |
the first crash thingie is already taking ages to minimize ... | 21:02 | ||
the first stage is removing blocks, which is difficult in .moarvm files i think | 21:05 | ||
[+] Block removal complete, 18 bytes deleted. | 21:06 | ||
FROGGS | wow, 18 bytes :D | 21:07 | |
timotimo | out of half a megabyte | ||
FROGGS | that's like... more than I have fingers *g* | ||
gnight | 21:09 | ||
timotimo | it's super annoying that everything internet related, but especially ssh sessions, are suuuuper laggy right now :( | 21:10 | |
brrt done for tonight | 21:11 | ||
timotimo | now it's doing a second pass ... that'll take a long time again :\ | 21:30 | |
somehow it's removing more blocks now, though | |||
cool, xz'd it's now only 4.3K instead of the uncompressed 452K | 22:00 |