01:52 ilbot3 joined 02:44 http_GK1wmSU joined 02:47 http_GK1wmSU left 04:12 vendethiel joined 04:56 lizmat joined 06:01 brrt joined 07:11 brrt joined
brrt good * #moarvm 08:06
08:13 zakharyas joined 08:30 vendethiel- joined 08:33 harrow joined 08:48 vendethiel joined 08:57 edehont joined
brrt interestingly, my bug is insensitive to gc debugging 08:59
so i think i'm adding a pointer somewhere
to garbage
jnthn moarning o/ 09:10
Second to last day of the heatwave... 09:13
brrt moarning jnthn 09:20
jnthn o/ brrt
How's your pointer to garbage hunt going?
brrt not super good 09:21
my second bisect had a slightly different frame (without ASAN) that is responsible for breakage 09:22
although, it might arguably still be the same frame, given that the bb numbers match up
yeah, it's the same frame after all 09:23
but it has a different frame number
i wonder if i changed something silly
i'm going to run a third bisect, and if it comes up the same, then i'm happy 09:24
(i tried installing gc debugging, but it made virtually no difference)
09:40 vendethiel joined
Geth MoarVM: 25419ccf97 | (Jonathan Worthington)++ | 7 files
When logging invokes, don't store closures.

This can, in some cases, seriously extend the lifetimes of objects that were closed over. Instead, just log what we'll really want: the static frame of the invocation target (which actually simplifies the handling in the stats code) and whether the caller was its outer (this will be useful for future optimizations).
09:54
jnthn Nice. In the read a million lines benchmark that wins us a bit back by meaning we no longer do 2 full GC runs due to things getting promoted 09:58
brrt well, third bisect agrees with first 10:14
jnthn lunch; bbiab 11:23
11:28 markmont joined 11:34 TimToady joined 11:52 AlexDani` joined
jnthn back 11:57
12:16 brrt joined 12:18 dogbert17 joined 12:22 AlexDani` joined 13:00 brrt joined 13:06 zakharyas joined
Geth MoarVM: f66c82641a | (Jonathan Worthington)++ | 3 files
Move stack simulation types into header.

So we'll be able to refer to them outside of the spesh stats source file.
13:53
MoarVM: e354d1f901 | (Jonathan Worthington)++ | 4 files
Start keeping simulated stack around between logs.

So we can continue incrementing OSR points and accumulating data in long-lived frames. At least, that's the theory; it doesn't seem to quite manage that yet. It does, however, seem to avoid various bad stack depth values, so the ordering of specializations works out looking more like it should.
MoarVM: e12d52c251 | (Jonathan Worthington)++ | 4 files
Record and insert rw-ness into type tuples.

Gets us able to resolve more multis and do more inlining again.
14:13
14:15 zakharyas joined 14:31 brrt joined
Geth MoarVM: d865eb5172 | (Jonathan Worthington)++ | src/spesh/optimize.c
Fix "close enough to inline" check.

99% was meant to be sufficient, but never was due to > instead of >=.
14:35
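A minimal sketch of the kind of off-by-one comparison that commit d865eb5172 describes, written in Python rather than MoarVM's C. The function names, the integer-percentage arithmetic, and the constant are assumptions for illustration only; the real check in src/spesh/optimize.c is not reproduced here.

```python
# Hypothetical names; this is NOT the code from src/spesh/optimize.c.
THRESHOLD_PERCENT = 99  # assumed: "99% was meant to be sufficient"


def close_enough_strict(hits, total):
    """Buggy variant: with '>' a target seen in exactly 99% of calls is rejected."""
    return total > 0 and (100 * hits) // total > THRESHOLD_PERCENT


def close_enough_inclusive(hits, total):
    """Fixed variant: '>=' accepts a target seen in exactly 99% of calls."""
    return total > 0 and (100 * hits) // total >= THRESHOLD_PERCENT
```

Under the integer-percentage assumption, the strict form only ever accepts 100%, which matches the "never was" in the commit message.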
MoarVM: 9f9c4b4673 | (Jonathan Worthington)++ | src/spesh/stats.c
Fix accounting error in callsite stats.

Led to spurious planning of unrequired certain specializations.
14:49 Slaash joined 14:51 Slaash left 14:56 travis-ci joined
travis-ci MoarVM build errored. Jonathan Worthington 'Fix accounting error in callsite stats. 14:56
travis-ci.org/MoarVM/MoarVM/builds/261031926 github.com/MoarVM/MoarVM/compare/e...9c4b4673f2
14:56 travis-ci left
lizmat just as I was about to do a version bump 14:58
Zoffix github error 15:01
Zoffix is getting angry unicorns in the browser too
15:07 brrt joined
jnthn ah, just a git clone error 15:16
japhb wonders from what source data (if any) github.com/status is computed from 15:18
brrt okay, i may have found the source of my weird, weird bug
and it may be embarrassing 15:19
japhb brrt: Congrats on finding it nonetheless!
brrt maybe :-P
jnthn brrt: haha, can't beat my d865eb5172 in terms of embarrassing find today :P
brrt anyway, turns out, carg will put arguments on the stack for anything that isn't a int or a ptr 15:20
which is ehm, surprising
Geth MoarVM/even-moar-jit: 82ca7faf91 | (Bart Wiegmans)++ | 3 files
ARGLIST on should not be fussy about type

I think a casual user could be forgiven for not caring about the difference about int and ptr and reg (although if portability is a concern, it is a real difference). So the code for selecting arg locations should not be fussy about that difference (the ABI isn't).
This appears to fix the bugs in CORE.setting compilation reported by nwc10++, but I'm not 100% positive since ASAN on OS X does not quite meet my expectations.
15:30
japhb brrt: Is it ready for merge after that fix? 15:34
brrt if it passes ASAN, then i'd hope so 15:35
strictly speaking that was a template bug, not a compiler bug, but the unfussing should help a bit
but i mean, we've got to draw the line somewhere :-) 15:37
japhb Indeed! 15:38
jnthn m: say 1741 / 349 15:41
camelia 4.988539
jnthn m: say 1741 / 351
camelia 4.960114
jnthn Hmm
A bit of dodgy arithmetic somewhere in the stats, methinks...
lizmat
.oO( off by two )
15:42
jnthn Off by a factor of 5 actually 15:44
lizmat 349 vs 351 :-) 15:46
jnthn Oh, I was trying to see if 1741 was an exact multiple of the other two
brrt (351 is 3^3*13, 349 is prime) 15:53
Geth MoarVM: 8153063fa0 | (Jonathan Worthington)++ | src/spesh/stats.c
Bump spesh stats version on updates.

Otherwise we'll never do planning for them, and so never produce new specializations. This was the final issue blocking OSR of long-lived frames that are hot loops, but highly costly on their first call (by, for example, spending time in the multi-dispatch slow path for some calls).
15:56
jnthn Hmmm 15:58
Well, weekend's pondering: can we do a better job of exception handler representation in the CFG than anchoring them all from the entry block? 16:00
16:01 sivoais joined
jnthn I think we surely can 16:05
Though at a cost
We'd need to insert actual BB breaks at all invokish/throwish (which arguably we should anyway)
timotimo those are a lot of ops
jnthn Then when we're in the region of a handler, we make all such BBs have the exception handler as a successor
timotimo: It is, but this is just a graph representation, so the only real cost is at spesh time 16:06
timotimo i'm not sure how many of our optimizations require there to be no bb break between things
right
jnthn Well, and for us reading the graph
I don't think many of them do
Because we have an SSA representation
The problem is that if you consider something like
my $iterator := $seq.iterator;
while (my $val := $iterator.pull-one) !=:= IterationEnd { ...stuff... } 16:07
We'd really like to be able to spesh that pull-one call
But we can't 16:08
Because $iterator isn't known because of a PHI merge from the 0 node
Because "redo" exceptions take you to the start of the loop
But we don't have a precise representation of such things
timotimo ah, redo, right
lizmat jnthn: and $iterator, aka a lexical ?
ah, eyes :-(
jnthn lizmat: Actually they're not lexicals in the code I'm considering here :) 16:09
(Lowered to registers) 16:10
Anyway, because we stick an edge from 0 from all handlers, we conservatively figure we don't know what's in the register
So we can't do anything about that findmethod op except the usual monomorphic cache trick 16:11
Which of course means we don't know the coderef so we can't sp_fastinvoke_o it (or inline it were it small enough) 16:12
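A toy model of the merge problem jnthn describes above, in Python and not taken from MoarVM. The block names, the `"Iterator"` type label, and the dictionary-based fact table are all hypothetical. It only shows why an extra edge from block 0 into a handler target makes a PHI-style merge lose the known type.

```python
UNKNOWN = None  # no type fact is known for the register


def merge(facts):
    """PHI-style merge: keep a type only if every predecessor agrees on it."""
    first = facts[0]
    return first if all(f == first for f in facts) else UNKNOWN


def type_at(preds, facts_out):
    """Type of the register on entry to a block, given its predecessor blocks."""
    return merge([facts_out[p] for p in preds])


# Known type of the $iterator register at the END of each block (assumed names).
facts_out = {
    "entry": UNKNOWN,          # block 0: register not assigned yet
    "loop_setup": "Iterator",  # after $seq.iterator
    "loop_body": "Iterator",   # body never reassigns the register
}

# Conservative form: every handler target also gets an edge from the entry block.
conservative = type_at(["entry", "loop_setup", "loop_body"], facts_out)

# Precise form: only blocks inside the handler's region that can throw
# (here just the loop body) reach the "redo" target, plus the normal edge.
precise = type_at(["loop_setup", "loop_body"], facts_out)
```

In this model `conservative` comes out unknown while `precise` keeps the type, which is the difference between falling back to a cache lookup and being able to resolve the method up front.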
Geth MoarVM: 8325f0117b | (Jonathan Worthington)++ | src/spesh/stats.c
Better no-arg and no-object-arg callsite handling.

Give these all a single type specialization. This means that we can now use logged information in performing the optimization of them.
16:15
16:16 TimToady joined
jnthn Well, tests seem happy and stuff :) 16:19
So, I think that's my spesh hackery for this week. 16:21
Overall status now is that we have (a) a much better data set to work on, (b) a LOT less spesh bugs and various historical workarounds cleared up, (c) the ability to inline/fastinvoke more things already (and potential for a bunch more), (d) are doing the work on a background thread 16:24
16:25 travis-ci joined
travis-ci MoarVM build passed. Jonathan Worthington 'Bump spesh stats version on updates. 16:25
travis-ci.org/MoarVM/MoarVM/builds/261063197 github.com/MoarVM/MoarVM/compare/9...53063fa0b2
16:25 travis-ci left
nwc10 so even single threaded "traditional" code can exploit both cores of one's smartwatch... 16:25
Zoffix \o/
jnthn This has been...a lot of work.
Zoffix jnthn++
nwc10 jnthn++
I hope that past jnthn has rewarded future jnthn with a suitably filled beer fridge 16:26
timotimo nwc10: not terribly likely that the usage of the second processor will go up very far
unless we make our spesh optimizer do a whole lot more stuff
nwc10 which is now a more sensible option to choose/explore, because it's not a direct slowdown on (under loaded) multi-core machines 16:27
but "make the CPU hot because we can" isn't an end in itself :-) 16:28
We could make the spesh thread mine bitcoin when it would otherwise be idle, and donate to www.perlfoundation.org/perl_6_core_...pment_fund 16:29
this is neither cost effective nor ethical
jnthn Talking of TPF... 16:30
Well, or said fund
Around 4 weeks ago I was told the grant was approved, at which point I put aside all other work so I could focus full time on getting this spesh stuff done, which it really needed. Then a couple of days after I'd dug in, I was told they would announce it after completing one internal legal procedure. 16:31
So anyway, I've actually been working for about 4 weeks full time before TPF put up the announcement. 16:32
Which means there's rather *less* than 200 hours left by this point.
nwc10 "yay"
Zoffix Understandable. 16:36
We need to get jnthn++ more money \o/
I think I know of a way to get a bit, actually. Gonna think about it over the weekend. 16:37
like 1-2 grand
16:41 edehont joined
jnthn :) 16:41
jnthn wonders how many weeks escape analysis and related opts would need... 16:42
nwc10 1) what would be the estimated benefit 16:43
2) does most of the benefit only happen near the end of the work, or is some of it incremental?
and i guess even
3) does it open up more stuff that other folks could do
jnthn Good question. My guess on (1) is "fairly high" for Perl 6 in that at the moment: 16:44
nwc10 (not that it's obvious who might be other folks)
jnthn my $a = foo(); bar($a);
Where bar is sub bar($b) { }
GC allocates a Scalar container
That kind of thing is very common 16:45
nwc10 .tell brrt 'works on "my" machine' - ie ASAN views origin/even-moar-jit as boring.
jnthn So I suspect a *lot* of Scalars would be possible to avoid allocating using the GC
Probably a number of temporary box/unbox too 16:46
2 - probably partly incremental in that you can do the inter-procedural bit later, but given nearly everything in Perl 6 is a procedure call... :)
(Before inlining) 16:47
Granted you'd do this after inlining
timotimo well, we already do inlining :)
jnthn Right :) 16:49
I mean you'd inline *then* perform the analysis
timotimo oh, yes indeed
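A deliberately tiny escape-analysis sketch in Python for the `my $a = foo(); bar($a)` case discussed above. The instruction tuples and op names are invented for illustration and ignore aliasing entirely; nothing here reflects how MoarVM's spesh represents code. It only shows why doing the analysis after inlining can make the Scalar allocation avoidable.

```python
# Ops through which a freshly allocated object may outlive the current frame.
ESCAPING_OPS = {"call", "store", "return"}


def escaping_allocations(instructions):
    """Return the registers whose allocation is passed to an escaping op."""
    allocated = set()
    escaped = set()
    for op, *operands in instructions:
        if op == "alloc":
            allocated.add(operands[0])
        elif op in ESCAPING_OPS:
            escaped.update(r for r in operands if r in allocated)
    return escaped


before_inlining = [
    ("alloc", "a"),          # Scalar container for $a
    ("assign", "a", "tmp"),  # tmp is assumed to hold the result of foo()
    ("call", "bar", "a"),    # bar($a): the container is handed to another frame
]

after_inlining = [
    ("alloc", "a"),
    ("assign", "a", "tmp"),
    ("read", "a"),           # inlined body of bar only reads the value
]
```

Here `escaping_allocations(before_inlining)` reports `a`, while the inlined version reports nothing, so in this toy model the container could be replaced by a plain register.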
16:50 travis-ci joined
travis-ci MoarVM build passed. Jonathan Worthington 'Better no-arg and no-object-arg callsite handling. 16:50
travis-ci.org/MoarVM/MoarVM/builds/261069689 github.com/MoarVM/MoarVM/compare/8...25f0117b78
16:50 travis-ci left
timotimo the spvm author wrote a post "spvm is now 6x faster than perl 5.26" and says that'll go up to 20x when a jit is implemented 16:50
16:50 brrt joined
timotimo i'm hoping to see the actual code they used to get to that number 16:51
jnthn Yeah, I was curious about the claim, but figured I can wait for the results :) 16:52
timotimo i see moarvm.org/measurements has entries for recent days, too. they are all 0 bytes big, though
nwc10 I think that it might be "6x faster than Perl 5.26 for the things that spvm can do" but this part is unclear 16:53
jnthn I figured it was "for a very specific set of benchmarks" 16:54
timotimo yeah, spvm is "for fast computations", which i assume means arithmetic and such
brrt nwc10: thats good news 16:55
timotimo i wonder if there's anything we can do to get faster random numbers
nwc10 the impression I got (not sure if I read this, inferred it, or plain got it wrong) was that the intent was to work outwards and increase the scope of the things that spvm can do
timotimo building a list from ^100 .roll(1_000_000) takes us about 6.3 seconds on my machine
running [+] on that afterwards takes only 0.8s which is nice, but not stellar 16:56
brrt i did see some broken spectests when running
timotimo aha, 67% time spent in a push-all inside range.pm, which is osr'd + speshed, but not jitted 16:57
brrt on even-moar-jit
timotimo hah, we don't jit rand_I 16:59
that's easy!
jnthn nwc10: Yeah, keeping the speed up while adding features is the tricky part :) 17:01
nwc10: I glanced at how scope handling worked and it looked like it'd need a good re-work for closures, for example.
timotimo i wonder if we should have a specific non-bigint rand function that can do completely without Int 17:03
jnthn Righty, I'm off to cook 17:04
bbl
timotimo good cookin'
nwc10 that's a lot more thorough than my level of inspection. (clearly I slack)
timotimo actually, "just" a special case inside bigint_rand for smallbigint would be a big win already 17:06
oh, we use mp_rand, though
jitting this made hardly a difference, clearly because it spends most of its time doing other stuff 17:17
fascinating. even though $!min is 0, adding that to the result of rand_I takes about 16.8% of the total time inside push-all 17:19
it's probably rather bad that we force the bigint to full big int before every single rand_I 17:21
hum. we're using mp_mod to get a number in the range that the user was asking for 17:36
isn't that a bad idea for distributions of random numbers?
gtg, but i'll continue making rand_I faster inside moarvm 17:41
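A short Python sketch of the concern timotimo raises about taking a random value modulo the requested range. It is a generic illustration of modulo bias and of rejection sampling as one common remedy; it does not describe what mp_rand, mp_mod, or MoarVM actually do, and how much the bias matters in practice depends on how many random bits are drawn relative to the modulus, which the log does not state.

```python
from collections import Counter
from fractions import Fraction


def modulo_distribution(source_size, n):
    """Exact distribution of x % n when x is uniform over range(source_size)."""
    counts = Counter(x % n for x in range(source_size))
    return {k: Fraction(c, source_size) for k, c in counts.items()}


def rejection_sample(draw, source_size, n):
    """Uniform value in range(n): discard draws that fall in the uneven tail.

    `draw` is any zero-argument callable returning a value uniform over
    range(source_size); it is a stand-in for a real random source.
    """
    limit = source_size - (source_size % n)  # largest multiple of n that fits
    while True:
        x = draw()
        if x < limit:
            return x % n
```

For example, with a 3-bit source (8 values) and `n = 3`, the plain modulo gives 0 and 1 a probability of 3/8 each but 2 only 1/4, whereas the rejection variant discards the draws 6 and 7 and so treats all three results equally.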
17:42 hoelzro joined
Geth MoarVM: MasterDuke17++ created pull request #624:
Fix spelling in comments
17:42
MoarVM: d1951981ef | MasterDuke17++ (committed using GitHub Web editor) | src/math/bigintops.c
Fix spelling in comments
17:43
MoarVM: 604da4d062 | lizmat++ (committed using GitHub Web editor) | src/math/bigintops.c
Merge pull request #624 from MasterDuke17/patch-2

Fix spelling in comments
nwc10 jnthn: ASAN does not wish to comment on MoarVM at this time. (neither master nor even-moar-jit generate any excitement) 18:13
lizmat wonders if this is a good moment to bump 18:35
nwc10 while jnthn is eating? :-)
[Coke] GO GO GO GO
18:38 brrt joined
japhb lizmat: Given that jnthn++ called it a week, I'd say go for it. 18:43
brrt if you bump, i'll merge master, and create a pull request 18:56
jnthn OMG. even-moar-jit time? :D 19:00
19:00 robertle joined
timotimo \o/ 19:03
brrt uhuh
i can't promise a completely clean spectest
jnthn Is it dirtier than before? :) 19:05
Zoffix :o 19:07
wow cool :D
brrt to be honest, i don't run master so often :-) 19:08
Geth MoarVM/even-moar-jit: 26 commits pushed by MasterDuke17++, (Jonathan Worthington)++, lizmat++, (Bart Wiegmans)++
review: github.com/MoarVM/MoarVM/compare/8...14c3f6820a
jnthn 2017.06 did a huge I/O overhaul. 2017.07 did a ton of string/Unicode stuff. 2017.08 will be huge amount of spesh changes + huge amount of JIT changes + re-implemented GC sync up :P 19:09
Crazy times ;) 19:10
Zoffix :)
timotimo so ... with my changes to sidestep bigint arithmetic for small rand_I max values i get different numbers for the same srand initialization value 19:13
should i bother trying to make that compatible?
jnthn: opinions? 19:14
i mean, you can switch tommath between rand and arc4random with a define flag 19:15
jnthn Hmmm
I guess it behaves predictably still, just different values? 19:16
timotimo yeah
jnthn Probably it's OK 19:17
timotimo thanks, let me measure the speedup now
huh, i thought this was faster before 19:19
oh, because of the .say
ah, yes
Geth MoarVM: bdw++ created pull request #625:
Merge new 'expression' JIT backend
19:20
timotimo m: say 1.5 / 5.5 19:21
camelia 0.272727
timotimo almost but not quite 4x
m: say (87512 + 87472 + 94660 + 94552 + 94692) / (103676 + 103752 + 103564 + 104184 + 104068) 19:22
camelia 0.8837618
timotimo that's the memory usage difference 19:23
[Coke] timotimo: between what and what?
timotimo srand(1); say [+] ^1000 .roll(1_000_000) 19:24
brrt okay, weekend for me 19:25
Geth MoarVM/speed_up_rand_I: db4997d87c | (Timo Paulssen)++ | 4 files
jit rand_I and also give it a smallbigint path

almost 4x faster for max values that fit in 32bit
timotimo [Coke]: ^- this is the patch; the jitting itself made hardly a difference 19:26
jnthn brrt++ # now I've got my weekend reading, I guess :) 19:30
timotimo huh, infix:<+> is 50% interped, 50% jitted 19:36
how does that happen
Zoffix brrt, how long have you worked on that branch? (github shows earliest as Jan 2016, but says it's showing only latest 250 commits)
timotimo no bails in the jitlog at least 19:40
19:45 brrt joined
brrt Zoffix: June 2015 iirc 19:45
jnthn: have fun :-)
Zoffix wow. brrt++ 19:47
brrt and its been over 300 commits for sure
20:39 AlexDaniel joined
nwc10 ASAN considers origin/even-moar-jit to be unworthy of comment 21:44
(we do not live in interesting times)
21:44 markmont joined
japhb Speak for yourself, nwc10. ;-) 21:59
22:05 yoleaux joined
timotimo my branch survives spectests 22:09
but i haven't tried asan
trying now 22:11
oh
nwc10: you turn off the leak detector, right?
cool, it's happy 22:31
Geth MoarVM: db4997d87c | (Timo Paulssen)++ | 4 files
jit rand_I and also give it a smallbigint path

almost 4x faster for max values that fit in 32bit
22:33
MoarVM: 600c2e9cf2 | (Timo Paulssen)++ | 4 files
Merge branch 'speed_up_rand_I'

adds a smallbigint path to MVM_bigint_rand so we can get around allocating an mp_int to hold the max value as real bigint and doing a mod on the mp_int to get the real value.
MasterDuke timotimo, brrt: is this something that the jit could do directly in assembly? github.com/MoarVM/MoarVM/blob/mast...#L620-L639 22:37
timotimo the expr jit can surely do this 22:38
is pow_i currently just unjitted and thus breaks some jit compilation?
MasterDuke currently unjitted, haven't checked any jit logs 22:39