github.com/moarvm/moarvm | IRC logs at colabti.org/irclogger/irclogger_logs/moarvm
Set by AlexDaniel on 12 June 2018.
jnthn oh, also 00:00
I guess it's feasible that the set we produce for box/unbox elimination might be worth calling set elimination on too
timotimo that's true
gnite 00:17
jnthn 'night 00:20
01:21 Kaypie joined, Kaiepi left 02:42 lizmat left 04:42 brrt joined
Geth MoarVM/jit-expr-refactor: 7 commits pushed by (Bart Wiegmans)++ 05:05
05:49 robertle left 07:10 zakharyas joined 07:31 brrt left 07:38 domidumont joined 07:43 domidumont left 07:45 domidumont joined 08:04 robertle joined 08:17 lizmat joined 08:54 brrt joined
Geth MoarVM/jit-expr-refactor: 27d7d533c7 | (Bart Wiegmans)++ | src/jit/expr.c
[JIT] Assign a size to CONST_LARGE and CONST_PTR

Otherwise if we use this in operators that require size information (e.g. EQ) we'll try to insert a CAST, which is nonsense.
This surfaced after rebasing on the new expression JIT templates, so, go team.
09:09
MoarVM/jit-expr-refactor: 708a36de72 | (Bart Wiegmans)++ | 2 files
[JIT] Mark tree nodes in template precompiler

Tree nodes (n) should be distinguished from large constant references (c), input operands (i), within-template links (l) and constant symbols or numbers (.). So that we can handle the nodes specially in the template application logic.
09:27 zakharyas left
jnthn o/ 09:27
nwc10 \o 09:28
lizmat o/ 09:32
brrt \o
Geth MoarVM: vendethiel++ created pull request #903:
fix minor typos
09:33
MoarVM/du-chains-and-opts: 2356999dbb | ven++ (committed using GitHub Web editor) | src/spesh/optimize.c
fix minor typos
09:34
MoarVM/du-chains-and-opts: 78f8402780 | (Jonathan Worthington)++ (committed using GitHub Web editor) | src/spesh/optimize.c
Merge pull request #903 from vendethiel/patch-1

fix minor typos
09:41 domidumont left
Geth MoarVM/du-chains-and-opts: 5f8fbec65a | (Jonathan Worthington)++ | src/spesh/usages.h
Turn off DU chain checks by default

Now that things seem to be working nicely, we'll disable it, so once this branch is merged users won't pay the price of the sanity checking slowing the optimizer down.
09:48
jnthn I think that branch can merge now 09:49
lizmat whee! :-)
brrt \o/
jnthn The DU chains themselves have been passing their sanity checks for a while. 09:50
There's many more things to do, but I don't think I need to do them in a branch. :)
lizmat
.oO( immediate dogfooding )
jnthn Stage parse : 48.649 09:51
I'm sure that's another second lower than it was
nwc10 hopes that dogfood isn't the lunch menu 09:52
Geth MoarVM/master: 54 commits pushed by (Jonathan Worthington)++, (Timo Paulssen)++, (Elizabeth Mattijsen)++, ven++
review: github.com/MoarVM/MoarVM/compare/1...6aece9b48d
10:01
jnthn Well, that took us some way past the 10,000 commit mark :) 10:02
samcv woo do we have a 10,000 commit party? 10:15
10:35 stmuk_ joined 10:36 stmuk left
jnthn :) 10:42
lizmat jnthn: are you going to bump ?
jnthn Next little mystery: why doesn't an invoke_v ever inline or optimize...
lizmat jnthn: or is it too soon ?
jnthn Well, I've commits probably coming soon in both MoarVM and NQP that do some further improvements :)
lizmat "probably coming soon" as in today ? 10:43
:-)
jnthn Yes :)
lizmat cool!
lizmat can wait
but only barely :-)
jnthn grmbl 10:55
So the various non-object-return invoke ops weren't marked :logged, which in turn meant they couldn't find any logged facts about the type of the invocant 10:56
We'll emit some more invoke_v after an improvement I've done in the QAST compiler 10:57
Adding :logged helps very nicely.
In that the calls then get inlined etc.
Alas, it breaks the NQP build
And the frame it claims breaks it doesn't even use any of the ops I stuck the annotations on 11:08
Grr, still not seeing what's up 11:31
lunch &
samcv jnthn: not sure if this is safe, but doing this cut time from 5 seconds to 4.5 secs on gist.github.com/samcv/9a65a586ea21...e-time-nqp 11:37
this is what i did gist.github.com/samcv/1955ad025b18...b3e1b3e00b
assuming worklist->include_gen2 never changes during the worklist being used 11:38
normally the outer if is: *item_to_add && (worklist->include_gen2 || !((*item_to_add)->flags & MVM_CF_SECOND_GEN)) 11:40
and i split it into one that assumes it's true, so it has one less if clause, and the other that assumes it's false and just checks one less variable
Geth MoarVM/jit-expr-refactor: ca667f53b6 | (Bart Wiegmans)++ | 5 files
[JIT] Remove cast information from expr ops table

We only need this in one place, so might as well define it there.
11:47
jnthn samcv: It doesn't change during the lifetime of the worklist 12:23
samcv ok
jnthn So such an opt is safe
Similar could be done for VMArray too 12:24
Those macros also should go into worklist.h, but I figure you just put them there for testing. Maybe could be written as static inline too 12:25
samcv yeah
jnthn But yeah, that's a nice optimization opportunity for aggregates
samcv++
nwc10 I forget if I'm asking a question that I've asked before - do we have anything like speed.pypy.org/ that touts how things are improving? 12:26
lizmat there's the bench repo ? 12:27
nwc10 although OMG that does have too little detail on the front page, and information overload immediately behind
I was more meaning pretty graphs that massage things to look impressive :-)
samcv it looks like i don't get too much more speedup past that point.. probably the rest taken up by hash iteration 12:30
gc is at 60% at the point i'm at now. which is still totally terrible but yeah
process_worklist is at 20% so it's not like MVMHash_gc_mark is quite as crazy as it was when i first started optimizing 12:31
(while MVM_hash_gc_mark is at 35% currently)
jnthn oh, heck, I found it 12:37
wow
There's an osrpoint right before a prepargs
We then stack up an sp_resolvecode and guard ahead of the prepargs 12:38
But the osrpoint location ends up still after those new instructions
So when we OSR, we end up without stuff we need 12:39
Nothing to do with invoke_v at all, it is just the first case that triggered this situation
nwc10 the "situation" being that the OSR is stored as "before this instruction", which spesh treats as "before this exact opcode" but really should be "before the opcodes that come to represent this logical unit of work" ? 12:40
jnthn Yes 12:41
nwc10 good. I did understand it well enough
does that mean that the fix is relatively simple? Or are there cases where "before this exact opcode" is correct, and others where "before this logical unit of work" are correct, so there needs to be more metadata stored?
jnthn I think it's as simple as checking for an OSR deopt annotation upon instruction insertion and moving it 12:42
nwc10 if so good, that was what I meant by the first case (but didn't know the detail of the "how")
the "relatively simple" :-)
jnthn Otherwise it'd imply we were inserting instructions with visible side-effects 12:43
Well, so long as you don't count deopt as a visible side-effect, which it is if you profile, but... :)
12:43 domidumont joined
jnthn That helps. There's now something ungood during CORE.setting build, but that might be my NQP change 12:57
hm, alas no 12:59
nwc10 do you have sufficient second machine to test the "simple" OSR fix alone on MoarVM master (through NQP to spectest?) 13:01
lizmat jnthn: any specifics
?
nwc10 because if that is a stand-alone bug fix, it's useful to have that done
(and would also confirm that the fix isn't exposing some other bug that CORE.setting was relying on)
jnthn Yeah, will try that alone also 13:02
But the error is "Void return not allowed to context requiring a return value"
And only happens under inlining 13:03
lizmat that doesn't ring any bell for me :-(
jnthn so seems very related to the allowing opt of invoke_v
Well, more opt of...
I wonder if uninlining isn't handling the situation well... 13:04
Hm, not obviously 13:06
lol 13:10
The inliner didn't delete the void return instruction
So it tried to return from the enclosing thing 13:11
timotimo that's funny :D 13:18
jnthn Easily fixed too
timotimo for sure
Geth MoarVM: 1371879313 | (Jonathan Worthington)++ | src/spesh/inline.c
Delete void return instruction when inlining
13:24
MoarVM: 7e625b031a | (Jonathan Worthington)++ | src/spesh/manipulate.c
Move OSR point backwards on instruction insertion

So that when we stack up resolve/guard instructions ahead of an instruction that is an OSR entry point, we will enter on OSR prior to the inserted instructions.
MoarVM: 1b1edfe5ff | (Jonathan Worthington)++ | 2 files
Mark invoke_v as :logged

So that we are able to resolve statistics about it and optimize the invocation.
13:25
13:44 brrt left
jnthn *sigh* Still so many tricky problems 13:44
I've been looking at `my $foo = "blah"; for ^10_000_000 { my int $i = $foo.chars }` as my example 13:48
Now it does successfully turn the decont_i into a set of the original unboxed value
However, there's two things that keep the boxed value alive 13:49
One is that it's guarded. We can prove the guard isn't needed, but don't yet do away with it.
The second is that the value was read the other side of the guard, thus meaning the other side of a deopt point 13:50
Which is now gone, but still, we don't know that it isn't used across other deopt points 13:51
So we'll need a better way to model deopt usages
timotimo so, i'm wondering: since the OSR we got from the int @a = ^10_000 was so subpar, under what circumstances does the same optimized code get used in the future? 13:59
i.e. if a program optimizes that STORE method via osr first, then uses it a bunch of times, will those times use the result of the osr? 14:00
jnthn I guess it depends on if the arg types match
timotimo is it intended that we're not using the arguments in that specific case for facts on the param ops? 14:01
jnthn Dunno, could just be something not yet done
timotimo i'm not terribly eager to look into osr code :D 14:02
it really just made a certain specialization, so that'd be why the args aren't used 14:05
the spesh log doesn't give a type tuple for the stats that get OSR hits; in fact, for that frame there are no type tuples at all 14:06
i must be misunderstanding something 14:14
14:44 brrt joined
samcv uthash is not too terrible, but it's also really weird. how we store the pointer to the hash table in every single hash item 14:50
not sure why the developers thought that was a great idea
and complicates deallocating the hash since you have to make sure you don't delete whichever item you use as the "head" of the hash. though that's arbitrary. and then deleting that special element becomes more expensive
i tried to refactor it to remove this, but after many hours i figured out it would probably be faster to add support for another hash library so i stopped in sadness 14:52
jnthn I'm not sure we should use any library by this point, just replace it with our own that's designed according to the properties we want 14:53
samcv yeah. i mean i may adapt something to our uses 14:54
jnthn I still think we want a single blob of memory
samcv well i *will* do that. but if i can help it i won't write my own
yep
jnthn And we only ever allocate a new one upon growth
samcv yep
jnthn If we then only ever read the address of that block once before doing any operations, and allocate it using the FSA + safepoint mechanism, then we fix the explosions under concurrent mis-use, I'd expect. 14:55
(That doesn't mean it'll be well behaved under concurrent mis-use, just not explosive. That's enough.)
samcv yeah
but hands down the worst design decision of uthash is having one master hash handle, and relying for everything on all handles containing references to the table 14:56
and to fix it i'd have to change uthash entirely (almost every single function/macro). plus do the work i'd have to do anyway changing moarvm code 14:57
brrt i think the 'table' bit of a hash table is relatively easy, it is the 'hash' bit that i've found hard :-) 14:58
samcv yeah 14:59
and also how it hands back values. like it uses the offset into the struct to return elements. and also it being full of macros
ok fine. that's just as insane a design decision. but the way they made it necessitates macros... 15:00
jnthn Well, I think the idea behind it was that you can use any C structure as your hash element 15:01
Which we don't really need 15:02
samcv yeah. that would be my pick for the worst part about it. the only reason it isn't is that it at least could be a benefit for some people 15:04
heh 15:05
15:07 robertle left 15:09 brrt left
[Coke] MoarVM panic: Internal error: zeroed target thread ID in work pass 15:47
The spawned command 'perl6' exited unsuccessfully (exit code: 17) 15:48
(this on a branch of docs. Will update everything and see if I can repro it.)
16:17 domidumont left
jnthn Finally think I've figured out a workable, more accurate, deopt tracking algorithm 17:10
Also for guards, I think we need to at least model them such that they introduce a new SSA version 17:16
So then we can distinguish the facts before the guard and the facts after the guard
timotimo that'd be nice
jnthn For the deopt, I found a few algorithms that totally won't work 17:17
But I think I've hit on one that maybe will 17:18
[Coke] (the rakudo update made it vanish, whee) 17:19
jnthn We keep track of writers with outstanding reads. When we hit a deopt point, we record that deopt point against all of the writers with outstanding reads
In the event of a loop, we encounter a read before a write. In the immediate we could disregard it, and then that'd mean the write followed by a deopt point would record that deopt point on the write. 17:21
Trouble is that the write then lives on forever and potentially gets other irrelevant deopt points marked on it. Hm. 17:25
Ah, maybe dropping them after we saw all the other reads will cut it 17:28
Hm, no, it won't, it'll be premature 17:29
Oh, I think we can do it by saying if we saw all the predecessors of the basic block that had the missing read, then we're good. 17:31
jnthn goes for some rest :) 17:33
17:49 domidumont joined 17:50 lizmat left 18:43 undersightable6 joined
samcv removing the previous pointer reduces mem usage by 8.7MB. (max usage 242MB originally) 18:51
from the hash script i've been testing 18:52
at least if /usr/bin/time is to be believed 18:56
that seems to line up closish with 1 million * 64 bits + a bit extra 18:58
jnthn Nice :) 19:00
19:07 domidumont left 19:15 Deknos joined 19:20 dogbert17 joined 19:33 robertle joined
dogbert17 hmm, I still have code which is considerably slower now than with 2018.06 19:46
timotimo what's it look like? 19:47
dogbert17 one observation is that the function taking the most time, according to perf, is now MVM_spesh_arg_guard_run instead of MVM_interp_run which was the costliest in 2018.06
timotimo can you get a spesh log and grep out all "waiting for" lines? 19:48
dogbert17 timotimo: want a gist
ok
19:48 Deknos left
timotimo just a shot in the dark 19:48
dogbert17 only ~10 lines like 'Was waiting 61216us for logs on the log queue.' 19:50
timotimo OK, that's good 19:51
dogbert17 this function, in fourth place, is new, compared to 2018.06 resolve_using_guards (5.82%) 19:52
in the perf report that is
.seen AlexDaniel 19:56
yoleaux I saw AlexDaniel 17:52Z in #perl6: <AlexDaniel> I was about to click 👍 too :)
19:56 stmuk joined
dogbert17 don't we have a bot which might be able to figure out when/if a perf regression occurred? 19:57
19:59 stmuk_ left 20:05 brrt joined
dogbert17 timotimo: want to have a look at the code? It might feel slightly familiar :) 20:05
timotimo sure
dogbert17 gist.github.com/dogbert17/f663fabb...3182b78c46
on my system 2018.06 takes ~7.5 sec while HEAD(Camelia takes ~10 sec 20:06
timotimo you're not comparing camelia vs your machine, are you? 20:07
benchable ought to be able to figure out where it changed 20:08
benchable6: gist.githubusercontent.com/dogbert...tfile1.txt
benchable6 timotimo, I cannot recognize this command. See wiki for some examples: github.com/perl6/whateverable/wiki/Benchable
timotimo benchable6: releases gist.githubusercontent.com/dogbert...tfile1.txt
benchable6 timotimo, Successfully fetched the code from the provided URL
timotimo, starting to benchmark the 31 given commits
20:08 lizmat joined
dogbert17 timotimo: no, I'm not comparing against Camelia, rather using the same version at home 20:09
timotimo ok 20:10
"releases" wasn't the best idea, i guess
that'll take what 5 minutes?
dogbert17 it shouldn't take too long ...
timotimo if it's 10 seconds, that'd be 310 seconds 20:11
dogbert17 hopefully it's worth the wait :)
but if it only tests releases I guess we won't learn much 20:12
timotimo also true
it'll "zoom in on performance differences", too
dogbert17 aha
benchable6 timotimo, «hit the total time limit of 240 seconds»
timotimo dangit
dogbert17 grr 20:13
timotimo benchable6: 2018.06,HEAD gist.githubusercontent.com/dogbert...tfile1.txt
benchable6 timotimo, Successfully fetched the code from the provided URL
timotimo, starting to benchmark the 2 given commits
dogbert17 clever
quite a bit of stuff has become faster recently but not everything 20:14
timotimo bbiab for dinner prep
benchable6 timotimo, ¦2018.06: «4.9331» ¦HEAD: «9.1578»
timotimo, benchmarked the given commits and found a performance difference > 10%, now trying to bisect
dogbert17 ooh, it seems to be working, amazing
20:16 brrt left
benchable6 timotimo, «hit the total time limit of 240 seconds» 20:17
nwc10 sigh, where's the `sudo benchmarkable` option? :-/ 20:18
MasterDuke it runs the code 5 times per commit, so anything longer than ~2-3s will likely timeout
though i should probably increase the timeout for benchable6 20:19
dogbert17 MasterDuke: that would be awesome, although I could change the script so it runs faster 20:22
benchable6: 2018.06,HEAD gist.githubusercontent.com/dogbert...tfile1.txt 20:25
benchable6 dogbert17, Successfully fetched the code from the provided URL
dogbert17, starting to benchmark the 2 given commits
nwc10 to be clear, I wasn't aware how sophisticated but easy benchable6 was. (Or at least, is supposed to be, if you don't give it too slow a code example)
benchable6 dogbert17, ¦2018.06: «4.9101» ¦HEAD: «9.1712» 20:26
dogbert17, benchmarked the given commits and found a performance difference > 10%, now trying to bisect
dogbert17 heh, those are the same times as before?
MasterDuke the numbers in the loops look the same 20:27
dogbert17 I edited the script, perhaps the url changed
yes it did, sigh 20:28
let's play the waiting game then
benchable6 dogbert17, «hit the total time limit of 240 seconds» 20:29
MasterDuke i just upped it to 300, i'll restart the bot
dogbert17 cool 20:30
20:30 benchable6 left
MasterDuke hm, it restarted, but i don't see that it rejoined... 20:32
20:32 benchable6 joined
MasterDuke there we go 20:32
dogbert17 benchable6: 2018.06,HEAD gist.githubusercontent.com/dogbert...tfile1.txt
benchable6 dogbert17, Successfully fetched the code from the provided URL
dogbert17, starting to benchmark the 2 given commits
dogbert17, ¦2018.06: «2.1404» ¦HEAD: «3.7665» 20:33
dogbert17, benchmarked the given commits and found a performance difference > 10%, now trying to bisect
dogbert17, gist.github.com/4cdc381083c9db8f4e...80f23001b8 20:36
MasterDuke github.com/rakudo/rakudo/commit/d3...c5d8165116 20:38
dogbert17 ok, so it's this one: github.com/rakudo/rakudo/commit/d3...c5d8165116
MasterDuke really should make those links... 20:39
dogbert17 might be ok then since it warns that there may be initial slowdown followed by cool opts at a later date
benchable6++ 20:40
20:59 japhb_ joined 21:01 japhb left 21:16 stmuk_ joined 21:18 stmuk left
timotimo the profiler is unhappy with this code :( 21:22
i think i see very deep recursion?
MasterDuke any thoughts/comments on github.com/MoarVM/MoarVM/pull/899 ? 22:06
dogbert17 timotimo: yes, that's why I thought you might remember seeing it before. It has been discussed earlier since it messed things up for the profiler. 22:25
timotimo mhm 22:26
dogbert17 perhaps not so easy to remember, you've probably looked at hundreds of perl6 snippets since :)
I believe you were a bit sceptical towards the junction that's used in the script 22:27
22:55 stmuk joined 22:57 stmuk_ left 23:22 lizmat left 23:25 lizmat joined
Geth MoarVM/new-deopt-point-algo: 7e1ab4b2bd | (Jonathan Worthington)++ | 5 files
Sketch out more precise deopt algorithm

As described in the commit body. It seems to do about the right thing, except in the case of loops, where it makes a gross overestimate of what needs to be preserved due to that part not yet being implemented. Includes dumping of what deopt indices are keeping instructions alive.
23:25
jnthn MasterDuke: Just glanced it, seems OK in principle 23:27
Though I'm a bit fried from trying to figure out a more precise deopt algo :)
sleep o/ 23:51
timotimo gnite jnthn 23:55