01:29 cognominal joined
timotimo hm, we currently don't measure MVMString allocations at all in the profiler. i wonder how we could do that properly 01:32
01:48 ilbot3 joined 02:13 lizmat joined 05:29 lizmat joined 06:46 domidumont joined 06:51 domidumont joined 07:54 lizmat joined
jnthn timotimo: Just sticking in allocation logging instructions after all the string ops would be a start. 09:48
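A minimal, self-contained sketch of that idea (the struct and opcode names here are invented for illustration, not MoarVM's actual spesh API): walk the instruction list and splice a "log an allocation" instruction in after every op known to produce a string.

    #include <stdlib.h>

    /* Hypothetical, simplified instruction node; the real spesh graph is richer. */
    typedef struct Ins {
        int opcode;
        int allocates_string;   /* set from the op table when the op makes a string */
        struct Ins *next;
    } Ins;

    #define OP_PROF_ALLOC_LOG 999  /* stand-in opcode for "log an allocation" */

    /* Insert a logging instruction after every string-allocating op. */
    static void insert_string_alloc_logging(Ins *first) {
        for (Ins *ins = first; ins; ins = ins->next) {
            if (ins->allocates_string) {
                Ins *log = calloc(1, sizeof(Ins));
                log->opcode = OP_PROF_ALLOC_LOG;
                log->next = ins->next;
                ins->next = log;
                ins = log;          /* skip over the node we just inserted */
            }
        }
    }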
12:08 brrt joined
brrt ehm, how worthwhile do any of you think it would be for me to refactor spesh allocation out of spesh into its own thing 12:55
jnthn Well, are you going to use it for something else? :) 12:57
brrt the jit more or less wants its own thing, because it has much shorter lifetime than the spesh graph 12:59
to be precise, 'structures generated by the compiler tend to live much shorter than the spesh graph'
arguably not the biggest worry 13:00
timotimo what i've seen for "extremely short lifetime" is having a circular buffer that gets cleared per-frame in OpenGL-like stuff; in particular for the nintendo 3ds 13:01
brrt doesn't work, because i can't predict how much memory i'll need 13:02
circular buffers are / can be cute, though 13:03
timotimo well, if your current memory needs exceed a single buffer, build a second one
if your lifetime is *really* short, you can use the stack :)
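A sketch of the kind of region (arena) allocator being discussed, where "build a second one" just means chaining another block (names are illustrative, not MoarVM's spesh allocator; alignment is ignored for brevity):

    #include <stdlib.h>

    /* Bump-allocate from a block, chain a new block when the current one is
     * full, and free everything in one go when the compile is done. */
    typedef struct Block { struct Block *prev; size_t used, size; char data[]; } Block;
    typedef struct { Block *cur; } Region;   /* initialize with { NULL } */

    static void *region_alloc(Region *r, size_t n) {
        if (!r->cur || r->cur->used + n > r->cur->size) {
            size_t size = 4096 > n ? 4096 : n;
            Block *b = malloc(sizeof(Block) + size);
            b->prev = r->cur; b->used = 0; b->size = size;
            r->cur = b;
        }
        void *p = r->cur->data + r->cur->used;
        r->cur->used += n;
        return p;
    }

    /* Throw the whole region away at the end of the short lifetime. */
    static void region_destroy(Region *r) {
        for (Block *b = r->cur; b; ) { Block *prev = b->prev; free(b); b = prev; }
        r->cur = NULL;
    }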
jnthn brrt: Hm, don't we typically discard the spesh graph right after the JIT? 13:21
brrt hmmm
true
jnthn brrt: Or are you saying you only need the memory for one phase of the JIT?
brrt ehm, well, ehm,
let me think about the correct answer
timotimo we're still keeping memory around a bit for logging and such
jnthn If yes, you do get a win in terms of lower memory overhead but you have to be darn careful you don't get pointers in the wrong direction between the lifetime'd regions.
timotimo: Yes, but we JIT after that :)
timotimo whereas we enter jit, use the memory, leave jit and kick it out immediately, i'd say 13:22
jnthn That is, at the logging phase we're still interpreting.
brrt the thing i'm thinking about is the tile list and the value descriptors
jnthn Then we spesh based on the logged stuff, then we JIT.
brrt those are per-basic block
timotimo right
brrt at best
you'll never have a pointer to any of these from the spesh graph 13:23
13:29 zakharyas joined
brrt it's probably not important enough; my second consideration was 'not having to walk through three object layers in order to get at the spesh pool' 13:30
timotimo we could just pass the spesh pool along on the stack if we're so worried about performance
or hope that the c compiler and/or cpu caches will make it work fine for us 13:31
brrt i'm not 13:33
i'm worried about convenience
performance, what, me worry?
timotimo :D 13:34
well, we can still have macros
brrt ok,ok, i'll do something useful instead 13:35
... whenever i actually have time
timotimo i was wondering: with the short-string-cache, we could compile things like substr, when we know the length argument is 1, to go straight through the cache in the jitted code and only hit the C function when the cache isn't hit 13:37
but that'd also mean we'll hit the cache twice. though the cache will already be in the cache :P
hm, except, substr has to go through the grapheme iterator
it could work for chr, though 13:39
brrt better answer
we can transform, at spesh or preferably JIT time, the substr, etc. ops into 'low level string operations code'
these we can JIT fast 13:40
timotimo hm, you're suggesting we have some basic operations like "gimme a grapheme iterator", "destroy the grapheme iterator", "advance the iterator", ...
brrt kind of, yes 13:41
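Roughly the shape of those primitives, as a toy, self-contained C sketch (the names and the flat-array string model are invented for illustration; MoarVM's real grapheme iterator API differs):

    /* Toy model: treat a string as a flat array of graphemes. */
    typedef struct { const int *graphemes; long pos, end; } GraphemeIter;

    static void gi_init(GraphemeIter *gi, const int *g, long start, long end) {
        gi->graphemes = g; gi->pos = start; gi->end = end;
    }
    static int  gi_has_more(const GraphemeIter *gi) { return gi->pos < gi->end; }
    static int  gi_advance(GraphemeIter *gi)        { return gi->graphemes[gi->pos++]; }

    /* substr($s, $pos, 1) decomposed into the primitives above: the JIT could
     * emit these small calls inline instead of one big substr C function. */
    static int substr_one(const int *graphemes, long len, long pos) {
        GraphemeIter gi;
        gi_init(&gi, graphemes, pos, len);
        return gi_has_more(&gi) ? gi_advance(&gi) : -1;
    }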
timotimo that's not bad
brrt it's a lot of work
timotimo with the expr jit that works much simpler than with current spesh
brrt right 13:42
(i may want to add a LOOP primitive, just to make it more like LISP)
timotimo :D
brrt actually, there are other, better reasons for it, but it is quite odd 13:43
timotimo fair enough, yeah
brrt the better reasons are that we'd like to make explicit which variables are updated in the loop, so that we can take them into account
timotimo oh? 13:44
is that for certain optimization techniques?
or just better compilation or something?
brrt in this case, to keep the tree structure meaningful
e.g. suppose we have a loop that updates two variables
in lisp, that would be something like 13:45
(loop ((x 1 (+ x 1)) (y 10 (- y 1)) (z 0 (+ z (* x y)))) (> x y)) 13:46
something like that.. actually, there is supposed to be a body in there
point is, the loop terminates when x is greater than y 13:47
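For comparison, a sketch of what that loop form means written out in C, assuming the usual Scheme do-loop reading in which every step expression sees the previous iteration's values; the point is that x, y and z are the only variables the loop is allowed to update:

    static long run_loop(void) {
        long x = 1, y = 10, z = 0;
        while (!(x > y)) {              /* (> x y) is the termination test */
            /* loop body would go here */
            long nx = x + 1, ny = y - 1, nz = z + x * y;
            x = nx; y = ny; z = nz;     /* steps use the previous iteration's values */
        }
        return z;
    }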
which node, in this case, defines x on output
actually, i'm not sure that makes a lot of sense 13:48
i'd think a LOOP would be void-valued in most cases
better point: expression language doesn't support direct assignment to either x or y 13:49
timotimo i'm not sure i follow
brrt no, i'm not sure i do either 13:50
:-)
timotimo you don't mean how the ((x 1 ...)) part declares a starting value for x?
brrt my point is basically this: the expression 'tree' forms a DAG, right?
timotimo right, you had that point in the past
and how things have to get replicated so that you can, for example, refer to the same value twice 13:51
brrt uhuh
well, suppose i have a loop that calls do_foo (x,y) repeatedly
i'll need to have something that refers to the notion of x and y inside the loop 13:52
timotimo right
brrt because they are not just their original values anymore
the context of this was 'it would be cool if the expression jit could deal with multiple-basic block regions, preferably hot loops'
low level string ops would fall into that category 13:53
timotimo mhm
brrt the answer in this case was 'we should have a looping structure to keep track of the changes'
timotimo afl has certainly found a crapton of crashes, claiming a whole bunch of them are unique 14:01
but i suppose many still fall into the same category either way 14:02
m: say ^593 .pick(3) 14:03
camelia rakudo-moar 9b579d: OUTPUT«(273 436 151)␤»
timotimo those are the crashes from the S1 crashes folder :) 14:04
a bunch of invalid writes of size 1 inside MVM_bytecode_finish_frame 14:08
also, a calloc of size 0 14:09
this next one isn't as interesting, i bet. invalid write of size 8 coming from arg_s, so it's probably just a far-outta-range argument to arg_s 14:13
jnthn Hm, though the validator should really catch those
14:14 lizmat joined
timotimo yeah, i don't think it has code for that yet, though 14:15
and just looking at it from afar tells me it's annoying to write code in it
jnthn urgh, hottest day this year 14:17
timotimo hopefully "ever", not just "so far" :) 14:18
jnthn Well, tomorrow - if the forecasts hold up - it'll break.
(Thunderstorms forecast)
Wouldn't surprise me if it hits this temperature again later on in the year.
timotimo probably :| 14:19
jnthn wonders if it's time to get the fan
timotimo just yesterday i got a link to a good AC unit and i was surprised that, relatively speaking, it was kind of cheap
but i guess it's also expensive in the "eats all your electrons" way 14:20
or rather: brakes the wobbling of all the electrons between you and the nearest power plant 14:22
jnthn They all need some way to stick a pipe to outside, though, I think?
timotimo yeah 14:23
the same page it was on also offers a ... well, it looks like a huge sock you put over the window and the hose goes through that
i have no idea how it's supposed to work :) 14:27
glue it onto the window frame, perhaps
jnthn The fan is deployed :) 14:28
Hmm...interesting memory corruption is interesting... 14:37
Smells like GC 14:42
timotimo "EU now has 1 GB of free space"
14:47 brrt joined
brrt i should have studied so much more math when i had the chance... 14:47
jnthn Duh, found it 14:50
dalek MoarVM/new-multi-cache: 621fe27 | jnthn++ | src/6model/reprs/MVMMultiCache.c:
Add missing MVM_ASSIGN_REF.

Fixes missing write barriers when assigning into a multi-cache, which caused various crashes.
14:52
brrt jnthn++ 14:53
jnthn Hopefully spectest comes out a bit better now :)
Yup :) 14:57
nwc10 ASAN's verdict might still be "a dead mouse on your carpet" 14:59
(it's running)
jnthn Maybe, though if it is then the issue is well hidden 15:01
The spectest run had both FSA and GC debugging on
OK, so...we have a new multi-dispatch cache :)
Which knows about nameds, in theory :) 15:02
timotimo does it already make that benchmark liz recently showed a bunch of times faster?
jnthn No
Because Rakudo's multi-dispatch code doesn't yet take advantage of it.
It assumes it can't install things with named args into the cache. 15:03
lizmat jnthn: where would that need to be fixed? in src/Perl6 ?
timotimo ah, ok
might be in the BOOTSTRAP or near that
jnthn lizmat: Yeah, src/Perl6/Metamodel/BOOTSTRAP.nqp or so
lizmat: I need to look at the code there carefully 'cus it's been a little while
lizmat stuff like bind_ons_param and so ? 15:04
*one
jnthn No, that's sig binding
Closer to find_best_dispatchee
lizmat ah, ok
jnthn lizmat: I *think* it'll need to tease apart "needs a bind check just to validate nameds" from "needs a bind check because of unpacks or constraints"
Which I believe are conflated at the moment 15:05
timotimo oh, could find_best_dispatchee run super crazy often in many of my tests just because nameds are involved?
lizmat timotimo: yes
timotimo i should have known :)
jnthn Quite possibly. I mean, @a[$foo]:exists is the classic example
And what triggered me to do something about it.
timotimo i have the feeling nobody bothered to tell me about that :P
jnthn Well, it's about to change, so... :P 15:06
timotimo yay :)
jnthn The new cache is kinda interesting.
It's structured as a tree rather than an array of array of type tuples
So it should be a bit lighter on memory, and a bit faster to search in 15:07
timotimo very cool
jnthn It hashes on the memory address of the interned callsite
To find the tree top 15:08
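A rough sketch of the shape of that idea (illustrative only; the actual layout packs the tree into an array and is documented in MVMMultiCache.h): hash the interned, pointer-comparable callsite's address to pick a tree root, then walk a tree keyed on the argument types until a candidate or a miss is reached.

    #define MULTI_CACHE_ROOTS 64

    typedef struct CacheNode {
        void             *expected;   /* e.g. an argument's type to match on */
        struct CacheNode *match;      /* where to go on a match */
        struct CacheNode *no_match;   /* where to go otherwise */
        void             *result;     /* chosen candidate, set on leaf nodes */
    } CacheNode;

    typedef struct {
        CacheNode *roots[MULTI_CACHE_ROOTS];
    } MultiCache;

    /* Hash on the callsite's memory address to find the tree top. */
    static size_t root_index(const void *interned_callsite) {
        return ((size_t)interned_callsite >> 4) % MULTI_CACHE_ROOTS;
    }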
It also doesn't have a size limit.
timotimo ooooh 15:09
jnthn Which is a trade-off :)
timotimo right, we still don't have a clue how to clear those out
jnthn Well, one idea is just to say "if it gets huge, throw it away"
timotimo right
only the ones that show up often will get back in, then 15:10
jnthn And it'll be reconstructed with stuff the program is currently interested in.
It suffers a bit on megamorphs.
timotimo yeah
jnthn Though no worse than what it replaces.
timotimo surely
jnthn "in theory"
:)
In practice, not sure :)
timotimo we don't have diagnostics in place yet
jnthn Well, I added two 15:11
timotimo Oh, ok!
jnthn Though they're #define'd things
timotimo that's fine in my opinion
jnthn One is for dumping the cache on each add to see what on earth is in there.
Well, more like, to see the tree structure :)
timotimo oh, that's probably a bit noisy
jnthn The other just points out whenever the cache size hits a power of 2 size
Starting at 32. So 32 entries, 64 entries, 128 entries, etc. 15:12
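A one-function sketch of that diagnostic (illustrative, not the actual MoarVM code): it fires whenever the entry count is a power of two at or above 32.

    #include <stdio.h>

    /* Report at 32, 64, 128, ... entries. */
    static void maybe_report_cache_size(size_t entries) {
        if (entries >= 32 && (entries & (entries - 1)) == 0)
            fprintf(stderr, "multi cache grew to %zu entries\n", entries);
    }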
timotimo so, how does it look on the global scale? we have a single tree that holds everything?
jnthn No, it's still one cache per proto 15:13
So they are GC-able in that sense.
timotimo ah, good 15:14
jnthn The data structure is documented in the MVMMultiCache.h :)
timotimo ah, this is an interesting design, like a tree of ops 15:18
and the way it gets safepoint-freed 15:19
nwc10 jnthn: ASAN tolerates your code. No mouse for you! :-) 15:27
jnthn Phew!
Don't want to be full for the karahi chicken I'll hopefully find the energy to make in this heat :) 15:28
nwc10 anyway, "lack of mouse" is pretty cool
(er, sorry, not literally)
jnthn++
I think you've nailed it. At least, at the MoarVM level
jnthn :) 15:29
Seems we get a *very* tiny improvement on the %h<a>:exists case simply out of the new cache saying "no" faster 15:30
And so faster failing over
Hm, turns out the simplest possible Rakudo patch isn't quite enough. 15:33
I want to rest and make dinner, but gist.github.com/jnthn/b1b1a569c930...2b17a29edc is what I tried if anyone fancies figuring out why that doesn't work out 15:35
(The reason could be nearly anywhere)
afk for now
16:18 domidumont joined 16:41 lizmat joined 17:31 brrt joined
brrt yes, the tree structure is cool 17:31
hurray for trees-in-arrays 17:32
17:37 harrow joined 17:41 brrt joined
brrt stupid hot weather though 17:47
timotimo yeah. with a bit of moisture and absolutely still-standing air .. it's not nice 17:55
18:42 brrt joined 19:09 FROGGS joined, lizmat joined 19:23 zakharyas joined
brrt damnit, damnit to hell 20:11
grrrr
ok, shall i tell you a story that will amuse you
or not 20:12
during register allocation, i need to insert loads and stores to ensure that values are in their correct place
in order to load a value, i might need to spill a value 20:13
whenever i spill a value, i spill it right after it is constructed
i do that by noting the tile number that created it and inserting a 'spill' tile just after that 20:14
however, it is just about possible that the load that overwrites it has to happen just before the next tile 20:15
thus, we have tile i, {spill, load}, tile i + 1
since spill and load are not relatively ordered, it is just possible that i first load and then spill 20:16
meaning i overwrite the value 20:17
breakage follows 20:18
FROGGS uhh 20:19
jnthn ouch
timotimo ah, whoops
brrt this is especially annoying, because i just simplified the tile editor by not requiring such relative orders 20:20
so i'm puzzling how to solve this cheaply and yet robustly 20:21
the power-solution would be to allow tile inserts to specify insert-after or insert-before relations to specific tiles, not to order numbers
but that makes the insertion code really complex
because it requires topological sort (i think) 20:22
well, not really, really complex, but more complex than i want to have to debug
jnthn Of course, you can solve every compiler problem with adding another phase. Like, "spit out virtual registers you can have as many as you like of"..."allocate them" :P 20:23
timotimo everything ought to be topologically sorted anyway %)
brrt the not-so-robust solution, but one which might be workable, is to give the register allocator an insert-counter
that is true, except for too slow compilation
case in point: scala
jnthn Right :)
Yeah, every problem except too many passes. 20:24
brrt and the theory would then be, because i'm deciding to insert the load based on the fact that its value has already been spilled, the spill must have been inserted before the load 20:25
('its' means: the value in the register) 20:26
thus, sorting by the insert-sequence number, secondary to sorting by the tile order number, would give us the necessary relative ordering
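A minimal sketch of that proposed ordering, with invented field names: inserted tiles are keyed primarily by the tile position they attach to, and ties are broken by the allocator's running insert counter, so a spill emitted before a load can never end up ordered after it.

    #include <stdlib.h>

    typedef struct {
        int order;       /* position in the original tile list */
        int insert_seq;  /* register allocator's running insert counter */
    } InsertedTile;

    static int cmp_inserted_tiles(const void *pa, const void *pb) {
        const InsertedTile *a = pa, *b = pb;
        if (a->order != b->order)
            return a->order < b->order ? -1 : 1;
        return a->insert_seq < b->insert_seq ? -1 : (a->insert_seq > b->insert_seq);
    }

    /* usage: qsort(tiles, n, sizeof(InsertedTile), cmp_inserted_tiles); */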
but it feels rather breakable to me
jnthn Yeah, it has a slightly fragile feel to me also
brrt its made more complicated because a pre-coloring pass (which is in the works) assigns registers in backwards order 20:27
timotimo i'm minimizing a bunch of crashing test cases now, so that i can download it (and give it to y'all)
brrt i.e. you detect which register to assign a value based on the later consumers of that value 20:28
that will need some work with the register assignment logic, fwiw...
timotimo yeah, it's a bit more pull than push, isn't it?
brrt yes
so that complicates matters, but i'm not sure it breaks them 20:29
the insert-before relationship also requires that i track the tile that created a value descriptor 20:30
otherwise, i can't easily find the tile that created my descriptor that i'm kicking out of the register 20:31
i have half a mind to rename MVMJitValueDescriptor to MVMJitValue 20:34
jnthn Well, if you talk about them a lot then it's a load less verbose... 20:35
jnthn just went to take out the trash, and on returning to his apartment noticed it smells a little like walking into an Indian restaurant :)
The air outside has been notably warmer/more humid than I have inside today, so been reluctant to open windows. :S 20:36
brrt: I'm wondering a bit if things get simpler if the tiles were projected down to a linear bunch of instructions before doing the allocating? Or is that just restating the "add another pass" approach? 20:38
brrt tiles already are a linear list of instructions :-)
so the pass is already there 20:39
jnthn oh :)
brrt well, they didn't used to be, they used to be tagged to the tree, but that gave lots of conceptual difficulties 20:40
jnthn So in that sense the tiles "don't exist" by this point in some sense, we just have a linear bunch of instructions and a CFG?
brrt correct 20:41
i fear i have a terminology screwup again though
when i say 'tile', i mean the thing that stands in for the generated bytecode 20:42
i don't have a CFG for it, either, yet 20:43
but that is in the plans
the relation between the tree and the 'tile' is pretty weak at that point 20:45
(lol @ indian restaurant) 20:46
FROGGS tsss, brits :P 20:48
jnthn :P 20:52
Worth it...it tasted good :)
timotimo i think i'm maxing out hack with my niced processes ... 20:54
the first crash thingie is already taking ages to minimize ... 21:02
the first stage is removing blocks, which is difficult in .moarvm files i think 21:05
[+] Block removal complete, 18 bytes deleted. 21:06
FROGGS wow, 18 bytes :D 21:07
timotimo out of half a megabyte
FROGGS that's like... more than I have fingers *g*
gnight 21:09
timotimo it's super annoying that everything internet related, but especially ssh sessions, is suuuuper laggy right now :( 21:10
brrt done for tonight 21:11
timotimo now it's doing a second pass ... that'll take a long time again :\ 21:30
somehow it's removing more blocks now, though
cool, xz'd it's now only 4.3K instead of the uncompressed 452K 22:00