github.com/moarvm/moarvm | IRC logs at colabti.org/irclogger/irclogger_logs/moarvm Set by AlexDaniel on 12 June 2018. |
|||
jnthn | oh, also | 00:00 | |
I guess it's feasible that the set we produce for box/unbox elimination might be worth calling set elimination on too | |||
timotimo | that's true | ||
gnite | 00:17 | ||
jnthn | 'night | 00:20 | |
01:21
Kaypie joined,
Kaiepi left
02:42
lizmat left
04:42
brrt joined
|
|||
Geth | MoarVM/jit-expr-refactor: 7 commits pushed by (Bart Wiegmans)++ | 05:05 | |
05:49
robertle left
07:10
zakharyas joined
07:31
brrt left
07:38
domidumont joined
07:43
domidumont left
07:45
domidumont joined
08:04
robertle joined
08:17
lizmat joined
08:54
brrt joined
|
|||
Geth | MoarVM/jit-expr-refactor: 27d7d533c7 | (Bart Wiegmans)++ | src/jit/expr.c [JIT] Assign a size to CONST_LARGE and CONST_PTR Otherwise if we use this in operators that require size information (e.g. EQ) we'll try to insert a CAST, which is nonsense. This surfaced after rebasing on the new expression JIT templates, so, go team. |
09:09 | |
MoarVM/jit-expr-refactor: 708a36de72 | (Bart Wiegmans)++ | 2 files [JIT] Mark tree nodes in template precompiler Tree nodes (n) should be distinguished from large constant references (c), input operands (i), within-template links (l) and constant symbols or numbers (.). So that we can handle the nodes specially in the template application logic. |
|||
09:27
zakharyas left
|
|||
jnthn | o/ | 09:27 | |
nwc10 | \o | 09:28 | |
lizmat | o/ | 09:32 | |
brrt | \o | ||
Geth | MoarVM: vendethiel++ created pull request #903: fix minor typos |
09:33 | |
MoarVM/du-chains-and-opts: 2356999dbb | ven++ (committed using GitHub Web editor) | src/spesh/optimize.c fix minor typos |
09:34 | ||
MoarVM/du-chains-and-opts: 78f8402780 | (Jonathan Worthington)++ (committed using GitHub Web editor) | src/spesh/optimize.c Merge pull request #903 from vendethiel/patch-1 fix minor typos |
|||
09:41
domidumont left
|
|||
Geth | MoarVM/du-chains-and-opts: 5f8fbec65a | (Jonathan Worthington)++ | src/spesh/usages.h Turn off DU chain checks by default Now that things seem to be working nicely, we'll disable it, so once this branch is merged users won't pay the price of the sanity checking slowing the optimizer down. |
09:48 | |
jnthn | I think that branch can merge now | 09:49 | |
lizmat | whee! :-) | ||
brrt | \o/ | ||
jnthn | The DU chains themselves have been passing their sanity checks for a while. | 09:50 | |
There's many more things to do, but I don't think I need to do them in a branch. :) | |||
lizmat | .oO( immediate dogfooding ) |
||
jnthn | Stage parse : 48.649 | 09:51 | |
I'm sure that's another second lower than it was | |||
nwc10 hopes that dogfood isn't the lunch menu | 09:52 | ||
Geth | MoarVM/master: 54 commits pushed by (Jonathan Worthington)++, (Timo Paulssen)++, (Elizabeth Mattijsen)++, ven++ review: github.com/MoarVM/MoarVM/compare/1...6aece9b48d |
10:01 | |
jnthn | Well, that took us some way past the 10,000 commit mark :) | 10:02 | |
samcv | woo do we have a 10,000 commit party? | 10:15 | |
10:35
stmuk_ joined
10:36
stmuk left
|
|||
jnthn | :) | 10:42 | |
lizmat | jnthn: are you going to bump ? | ||
jnthn | Next little mystery: why doesn't an invoke_v ever inline or optimize... | ||
lizmat | jnthn: or is it too soon ? | ||
jnthn | Well, I've commits probably coming soon in both MoarVM and NQP that do some further improvements :) | ||
lizmat | "probably coming soon" as in today ? | 10:43 | |
:-) | |||
jnthn | Yes :) | ||
lizmat | cool! | ||
lizmat can wait | |||
but only barely :-) | |||
jnthn | grmbl | 10:55 | |
So the various non-object-return invoke ops weren't marked :logged, which in turn meant they couldn't find any logged facts about the type of the invocant | 10:56 | ||
We'll emit some more invoke_v after an improvement I've done in the QAST compiler | 10:57 | ||
Adding :logged helps very nicely. | |||
In that the calls then get inlined etc. | |||
Alas, it breaks the NQP build | |||
And the frame it claims breaks it doesn't even use any of the ops I stuck the annotations on | 11:08 | ||
Grr, still not seeing what's up | 11:31 | ||
lunch & | |||
samcv | jnthn: not sure if this is safe, but doing this cut time from 5 seconds to 4.5 secs on gist.github.com/samcv/9a65a586ea21...e-time-nqp | 11:37 | |
this is what i did gist.github.com/samcv/1955ad025b18...b3e1b3e00b | |||
assuming worklist->include_gen2 never changes during the worklist being used | 11:38 | ||
normally the outer if is: *item_to_add && (worklist->include_gen2 || !((*item_to_add)->flags & MVM_CF_SECOND_GEN)) | 11:40 | ||
and i split it into one that assumes true so it has one less if closure, and the other that assumes true and just checks one less variable | |||
Geth | MoarVM/jit-expr-refactor: ca667f53b6 | (Bart Wiegmans)++ | 5 files [JIT] Remove cast information from expr ops table We only need this in one place, so might as well define it there. |
11:47 | |
jnthn | samcv: It doesn't change during the lifetime of the worklist | 12:23 | |
samcv | ok | ||
jnthn | So such an opt is safe | ||
Similar could be done for VMArray too | 12:24 | ||
Those macros also should go into worklist.h, but I figure you just put them there for testing. Maybe could be written as static inline too | 12:25 | ||
samcv | yeah | ||
jnthn | But yeah, that's a nice optimization opportunity for aggregates | ||
samcv++ | |||
nwc10 | I forget if I'm asking a questiong that I've asked before - do we have anything like speed.pypy.org/ that touts how things are improving? | 12:26 | |
lizmat | there's the bench repo ? | 12:27 | |
nwc10 | although OMG that does have too little detail on the front page, and information overload immediately behind | ||
I was more meaning pretty graphs that massage things to look impressive :-) | |||
samcv | it looks like i don't get too much more speedup past that point.. probably the rest taken up by hash iteration | 12:30 | |
gc is at 60% at the point i'm at now. which is still totally terrible but yeah | |||
process_worklist is at 20% so it's not like MVMHash_gc_mark is quite as crazy as it was when i first started optimizing | 12:31 | ||
(while MVM_hash_gc_mark is at 35% currently) | |||
jnthn | oh, heck, I found it | 12:37 | |
wow | |||
There's an osrpoint right before a prepargs | |||
We then stack up an sp_resolvecode and guard ahead of the prepargs | 12:38 | ||
But the osrpoint location ends up still after those new instructions | |||
So when we OSR, we end up without stuff we need | 12:39 | ||
Nothing to do with invoke_v at all, it is just the first case that triggered this situation | |||
nwc10 | the "situation" being that the OSR is stored as "before this instruction", which spesh treats as "before this exact opcode" but really should be "before the opcodes that come to represent this logical unit of work" ? | 12:40 | |
jnthn | Yes | 12:41 | |
nwc10 | good. I did understand it well enough | ||
does that mean that the fix is relatively simple? Or are there cases where "before this exact opcode" is correct, and others where "before this logical unit of work" are correct, so there needs to be more metadata stored? | |||
jnthn | I think it's as simple as checking for an OSR deopt annotation upon instruction insertion and moving it | 12:42 | |
nwc10 | if so good, that was what I meant by the first case (but didn't know the detail of the "how") | ||
the "relatively simple" :-) | |||
jnthn | Otherwise it'd imply we were inserting instructions with visible side-effects | 12:43 | |
Well, so long as you don't count deopt as a visible side-effect, which it is if you profile, but... :) | |||
12:43
domidumont joined
|
|||
jnthn | That helps. There's now something ungood during CORE.setting build, but that might be my NQP change | 12:57 | |
hm, alas no | 12:59 | ||
nwc10 | do you have sufficient second machine to test the "simple" OSR fix alone on MoarVM master (through NQP to spectest?) | 13:01 | |
lizmat | jnthn: any specifics | ||
? | |||
nwc10 | because if that is a stand-along bug fix, it's useful to have that done | ||
(and would also confirm that the fix isn't exposing some other bug that CORE.setting was relying on) | |||
jnthn | Yeah, will try that alone also | 13:02 | |
But the error is "Void return not allowed to context requiring a return value" | |||
And only happens under inlining | 13:03 | ||
lizmat | that doesn't ring any bell for me :-( | ||
jnthn | so seems very related to the allowing opt of invoke_v | ||
Well, more opt of... | |||
I wonder if uninlining isn't handling the situation well... | 13:04 | ||
Hm, not obviously | 13:06 | ||
lol | 13:10 | ||
The inliner didn't delete the void return instruction | |||
So it tried to return from the enclosing thing | 13:11 | ||
timotimo | that's funny :D | 13:18 | |
jnthn | Easily fixed too | ||
timotimo | for sure | ||
Geth | MoarVM: 1371879313 | (Jonathan Worthington)++ | src/spesh/inline.c Delete void return instruction when inlining |
13:24 | |
MoarVM: 7e625b031a | (Jonathan Worthington)++ | src/spesh/manipulate.c Move OSR point backwards on instruction insertion So that when we stack up resolve/guard instructions ahead of an instruction that is an OSR entry point, then we will enter on OSR prior to the inserted instructions. |
|||
MoarVM: 1b1edfe5ff | (Jonathan Worthington)++ | 2 files Mark invoke_v as :logged So that we are able to resolve statistics about it and optimize the invocation. |
13:25 | ||
13:44
brrt left
|
|||
jnthn | *sigh* Still so many tricky problems | 13:44 | |
I've been looking at `my $foo = "blah"; for ^10_000_000 { my int $i = $foo.chars }` as my example | 13:48 | ||
Now it does successfully turn the decont_i into a set of the original unboxed value | |||
However, there's two things that keep the boxed value alive | 13:49 | ||
One is that it's guarded. We can prove the guard isn't needed, but don't yet do away with it. | |||
The second is that the value was read the other side of the guard, thus meaning the other side of a deopt point | 13:50 | ||
Which is now gone, but still, we don't know that it isn't used across other deopt points | 13:51 | ||
So we'll need a better way to model deopt usages | |||
timotimo | so, i'm wondering: since the OSR we got from the int @a = ^10_000 was so subpar, under what circumstances does the same optimized code get used in the future? | 13:59 | |
i.e. if a program optimizes that STORE method via osr first, then uses it a bunch of times, will those times use the result of the osr? | 14:00 | ||
jnthn | I guess it depends on if the arg types match | ||
timotimo | is it intended that we're not using the arguments in that specific case for facts on the param ops? | 14:01 | |
jnthn | Dunno, could just be something not yet done | ||
timotimo | i'm not terribly eager to look into osr code :D | 14:02 | |
it really just made a certain specialization, so that'd be why the args aren't used | 14:05 | ||
the spesh log doesn't give a type tuple for the stats that get OSR hits; in fact, for that frame there are no type tuples at all | 14:06 | ||
i must be misunderstanding something | 14:14 | ||
14:44
brrt joined
|
|||
samcv | uthash is not too terrible, but it's also really weird. how we store the pointer to the hash table in every single hash item | 14:50 | |
not sure why the developers though that was a great idea | |||
and complicates deallocating the hash since you have to make sure you don't delete whichever item you use as the "head" of the hash. though that's arbitrary. and then deleting that special element becomes more expensive | |||
i tried to refactor it to remove this, but after many hours i figured out it would probably be faster to add support for another hash library so i stopped in sadness | 14:52 | ||
jnthn | I'm not sure we should use any library by this point, just replace it with our own that's desinged according to the properties we want | 14:53 | |
samcv | yeah. i mean i may adapt something to our uses | 14:54 | |
jnthn | I still think we want a single blob of memory | ||
samcv | well i *will* do that. but if i can help it i won't write my own | ||
yep | |||
jnthn | And we only ever allocate a new one upon growth | ||
samcv | yep | ||
jnthn | If we then only ever read the address of that block once before doing any operations, and allocate it using the FSA + safepoint mechanism, then we fix the explosions under concurrent mis-use, I'd expect. | 14:55 | |
(That doesn't mean it'll be well behaved under concurrent mis-use, just not explosive. That's enough.) | |||
samcv | yeah | ||
but hands down the worst design decision of uthash is having one master hash handle, and relying in everything thta all handles contain references to the table | 14:56 | ||
and to fix it i'd have to change uthash entirely (every single function/macro almost). plus do the work i'd have to do anyway changing moarvm code | 14:57 | ||
brrt | i think the 'table' bit of a hash table is relatively easy, it is the 'hash' bit that i've found hard :-) | 14:58 | |
samcv | yeah | 14:59 | |
and also how it assigns values. like it uses the offset of the struct to return elements. and also having it being full of macros | |||
ok fine. that's just as insane a design decision. but how they made it necesitates macros... | 15:00 | ||
jnthn | Well, I think the idea behind it was you that you can use any C structure has your hash | 15:01 | |
Uh, hash element | |||
Which we don't really need | 15:02 | ||
samcv | yeah. that would be my first worst part about it. only reason it isn't is because that has a reason it could be a benefit for some people | 15:04 | |
heh | 15:05 | ||
15:07
robertle left
15:09
brrt left
|
|||
[Coke] | MoarVM panic: Internal error: zeroed target thread ID in work pass | 15:47 | |
The spawned command 'perl6' exited unsuccessfully (exit code: 17) | 15:48 | ||
(this on a branch of docs. Will update everything and see if I can repro it.) | |||
16:17
domidumont left
|
|||
jnthn | Finally think I've figured out a workable, more accurate, deopt tracking algorithm | 17:10 | |
Also for gurads, I think we need to at least model such that they introduce a new SSA version | 17:16 | ||
So then we can distinguish the facts before the guard and the facts after the guard | |||
timotimo | that'd be nice | ||
jnthn | The the deopt, I found a few algorithms that totally won't work | 17:17 | |
But I think I've hit on one that maybe will | 17:18 | ||
[Coke] | (the rakudo update made it vanish, whee) | 17:19 | |
jnthn | We keep track of writers with outstanding reads. When we hit a deopt point, we record that deopt point against all of the writers with outstanding reads | ||
In the event of a loop, we encounter a read before a write. In the immediate we could disregard it, and then that'd mean the write followed by a deopt point would record that deopt point on the write. | 17:21 | ||
Trouble is that the write then lives on forever and potentially gets other not relevant deopt points makekd on it. Hm. | 17:25 | ||
*marked | |||
Ah, maybe dropping them after we saw all the other reads will cut it | 17:28 | ||
Hm, not, it won't, it'll be premature | 17:29 | ||
Oh, I think we can do it by saying if we saw all the predecessors of the basic block that had the missing read, then we're good. | 17:31 | ||
jnthn goes for some rest :) | 17:33 | ||
17:49
domidumont joined
17:50
lizmat left
18:43
undersightable6 joined
|
|||
samcv | removing the previous pointer reduces mem usage by 8.7MB. (max usage 242MB originally) | 18:51 | |
from the hash script i've been testing | 18:52 | ||
at least if /usr/bin/time is to be believed | 18:56 | ||
that seems to line up closish with 1 million * 64 bits + a bit extra | 18:58 | ||
jnthn | Nice :) | 19:00 | |
19:07
domidumont left
19:15
Deknos joined
19:20
dogbert17 joined
19:33
robertle joined
|
|||
dogbert17 | hmm, I still have code which is considerably slower now than with 2018.06 | 19:46 | |
timotimo | what's it look like? | 19:47 | |
dogbert17 | one observation is that the function taking the most time, according to perf, is now MVM_spesh_arg_guard_run instead of MVM_interp_run which was the costliest in 2018.06 | ||
timotimo | can you get a spesh log and grep out all "waiting for" lines? | 19:48 | |
dogbert17 | timotimo: want a gist | ||
ok | |||
19:48
Deknos left
|
|||
timotimo | just a shot in the dark | 19:48 | |
dogbert17 | only ~10 lines like 'Was waiting 61216us for logs on the log queue.' | 19:50 | |
timotimo | OK, that's good | 19:51 | |
dogbert17 | this function, in fourth place, is new, compared to 2018.06 resolve_using_guards (5.82%) | 19:52 | |
in the perf report that is | |||
.seen AlexDaniel | 19:56 | ||
yoleaux | I saw AlexDaniel 17:52Z in #perl6: <AlexDaniel> I was about to click 👍 too :) | ||
19:56
stmuk joined
|
|||
dogbert17 | don't we have a bot which might be able to figure out when/if a perf regression occured? | 19:57 | |
19:59
stmuk_ left
20:05
brrt joined
|
|||
dogbert17 | timotimo: want to have a look at the code? It might feel slightly familiar :) | 20:05 | |
timotimo | sure | ||
dogbert17 | gist.github.com/dogbert17/f663fabb...3182b78c46 | ||
on my system 2018.06 takes ~7.5 sec while HEAD(Camelia takes ~10 sec | 20:06 | ||
timotimo | you're not comparing camelia vs your machine, are you? | 20:07 | |
benchable ought to be able to figure out where it changed | 20:08 | ||
benchable6: gist.githubusercontent.com/dogbert...tfile1.txt | |||
benchable6 | timotimo, I cannot recognize this command. See wiki for some examples: github.com/perl6/whateverable/wiki/Benchable | ||
timotimo | benchable6: releases gist.githubusercontent.com/dogbert...tfile1.txt | ||
benchable6 | timotimo, Successfully fetched the code from the provided URL | ||
timotimo, starting to benchmark the 31 given commits | |||
20:08
lizmat joined
|
|||
dogbert17 | timotimo: no, I'n not comparing against Camelia rather using the same version at home | 20:09 | |
timotimo | ok | 20:10 | |
"releases" wasn't the best idea, i guess | |||
that'll take what 5 minutes? | |||
dogbert17 | it shouldn't take too long ... | ||
timotimo | if it's 10 seconds, that'd be 310 seconds | 20:11 | |
dogbert17 | hopefully it's woth the wait :) | ||
*worth | |||
but if it only tests releases I guess we won't learn much | 20:12 | ||
timotimo | also true | ||
it'll "zoom in on performance differences", too | |||
dogbert17 | aha | ||
benchable6 | timotimo, «hit the total time limit of 240 seconds» | ||
timotimo | dangit | ||
dogbert17 | grr | 20:13 | |
timotimo | benchable6: 2018.06,HEAD gist.githubusercontent.com/dogbert...tfile1.txt | ||
benchable6 | timotimo, Successfully fetched the code from the provided URL | ||
timotimo, starting to benchmark the 2 given commits | |||
dogbert17 | clever | ||
quite a bit of stuff has become faster recently but not everything | 20:14 | ||
timotimo | bbiab for dinner prep | ||
benchable6 | timotimo, ¦2018.06: «4.9331» ¦HEAD: «9.1578» | ||
timotimo, benchmarked the given commits and found a performance difference > 10%, now trying to bisect | |||
dogbert17 | ooh, it seems to be working, amazing | ||
20:16
brrt left
|
|||
benchable6 | timotimo, «hit the total time limit of 240 seconds» | 20:17 | |
nwc10 | sigh, where's the `sudo benchmarkable` option? :-/ | 20:18 | |
MasterDuke | it runs the code 5 times per commit, so anything longer than ~2-3s will likely timeout | ||
though i should probably increase the timeout for benchable6 | 20:19 | ||
dogbert17 | MasterDuke: that would be awesome, although I could change the script so it runs faster | 20:22 | |
benchable6: 2018.06,HEAD gist.githubusercontent.com/dogbert...tfile1.txt | 20:25 | ||
benchable6 | dogbert17, Successfully fetched the code from the provided URL | ||
dogbert17, starting to benchmark the 2 given commits | |||
nwc10 | to be clear, I wasn't aware how sophisticated but easy benchable6 was. (Or at least, is supposed to be, if you don't give it too slow a code example) | ||
benchable6 | dogbert17, ¦2018.06: «4.9101» ¦HEAD: «9.1712» | 20:26 | |
dogbert17, benchmarked the given commits and found a performance difference > 10%, now trying to bisect | |||
dogbert17 | heh, those are the same times as before? | ||
MasterDuke | the numbers in the loops look the same | 20:27 | |
dogbert17 | I edited the script, perhaps the url changed | ||
yes it did, sigh | 20:28 | ||
let's play the waiting game then | |||
benchable6 | dogbert17, «hit the total time limit of 240 seconds» | 20:29 | |
MasterDuke | i just upped it to 300, i'll restart the bot | ||
dogbert17 | cool | 20:30 | |
20:30
benchable6 left
|
|||
MasterDuke | rm, it restarted, but i don't see it rejoined... | 20:32 | |
*hm | |||
20:32
benchable6 joined
|
|||
MasterDuke | there we go | 20:32 | |
dogbert17 | benchable6: 2018.06,HEAD gist.githubusercontent.com/dogbert...tfile1.txt | ||
benchable6 | dogbert17, Successfully fetched the code from the provided URL | ||
dogbert17, starting to benchmark the 2 given commits | |||
dogbert17, ¦2018.06: «2.1404» ¦HEAD: «3.7665» | 20:33 | ||
dogbert17, benchmarked the given commits and found a performance difference > 10%, now trying to bisect | |||
dogbert17, gist.github.com/4cdc381083c9db8f4e...80f23001b8 | 20:36 | ||
MasterDuke | github.com/rakudo/rakudo/commit/d3...c5d8165116 | 20:38 | |
dogbert17 | ok, so it's this one: github.com/rakudo/rakudo/commit/d3...c5d8165116 | ||
MasterDuke | really should make those links... | 20:39 | |
dogbert17 | might be ok then since it warns that there may be initial slowdown followed by cool opts at a later date | ||
benchable6++ | 20:40 | ||
20:59
japhb_ joined
21:01
japhb left
21:16
stmuk_ joined
21:18
stmuk left
|
|||
timotimo | the profiler is unhappy with this code :( | 21:22 | |
i think i see very deep recursion? | |||
MasterDuke | any thoughts/comments on github.com/MoarVM/MoarVM/pull/899 ? | 22:06 | |
dogbert17 | timotimo: yes, that's why I thought you might remember seeing it before. It has been discussed earlier since it messed things up for the profiler. | 22:25 | |
timotimo | mhm | 22:26 | |
dogbert17 | perhaps not so easy to remember, you've probably looked at hundreds of perl6 snippets since :) | ||
I believe you were a bit sceptical towards the junction that's used in the script | 22:27 | ||
22:55
stmuk joined
22:57
stmuk_ left
23:22
lizmat left
23:25
lizmat joined
|
|||
Geth | MoarVM/new-deopt-point-algo: 7e1ab4b2bd | (Jonathan Worthington)++ | 5 files Sketch out more precise deopt algorithm As described in the commit body. It seems to do about the right thing, except in the case of loops, where it makes a gross overestimate of what needs to be preserved due to that part not yet being implemented. Includes dumping of what deopt indices are keeping instructions alive. |
23:25 | |
jnthn | MasterDuke: Just glanced it, seems OK in principle | 23:27 | |
Though I'm a bit fried from trying to figure out a more precise deopt algo :) | |||
sleep o/ | 23:51 | ||
timotimo | gnite jnthn | 23:55 |