github.com/moarvm/moarvm | IRC logs at colabti.org/irclogger/irclogger_logs/moarvm Set by AlexDaniel on 12 June 2018. |
|||
00:04
elcaro left
00:05
elcaro joined
|
|||
timotimo | m: say 4605054973 * 100 / 4594950376 | 00:11 | |
camelia | 100.21990655335 | ||
timotimo | m: say 100 * (4605054973 R/ 4594950376) | ||
camelia | 99.78057597446 | ||
01:30
leont left,
leont joined
03:04
leont left
04:04
linkable6 left,
evalable6 left
04:06
evalable6 joined
04:07
linkable6 joined
|
|||
nwc10 | jnthn: oops, yes, was supposed to be marked ready for review. It is now | 06:35 | |
MasterDuke: nice. Particularly the LLi miss improvement | |||
that's actually bigger than my guess. If we can find more things like that, it will all add up. | 06:36 | ||
08:07
Altai-man joined
08:12
sena_kun joined
08:14
Altai-man left
|
|||
MasterDuke | yeah, not sure how noticeable it was in time, but lots of little optimizations add up | 08:56 | |
09:03
zakharyas joined
09:05
Altai-man joined
09:07
sena_kun left
09:09
domidumont joined
10:35
frost-lab joined
|
|||
timotimo | if we could cheaply sort worklists by repr ... | 10:54 | |
jnthn | The number of reprs is fixed, so you could have per-repr worklists...but why? :) | 10:58 | |
10:58
Kaiepi left,
Kaiepi joined
|
|||
timotimo | keep the instruction cache hot by running the same repr's gc_mark over and over | 10:59 | |
jnthn | Wonder if that's a measurable improvement... | 11:00 | |
timotimo | in cachegrind, probably, in wallclock, probably not | ||
nine | I fear the only way to know is to try | ||
jnthn | I guess the downside is it would scatter object graphs more | ||
timotimo | ah, since new objects are allocated in the nursery again | 11:01 | |
jnthn | Just that we move objects as we encounter them in the list, and so objects that reference each other are sometimes adjacent in the worklist and so end up copied into the other semispace or into gen2 one after the other, although it's less true of gen2 given the free list | 11:02 | |
So you *might* lose some memory cache hits | |||
timotimo | it's also less true of gen2 given that different reprs are likely to have slightly different sizes | 11:03 | |
jnthn | But I've no idea how these two effects would play off each other | ||
Or if either is even going to be significant | |||
timotimo | yeah | ||
nwc10 | it sounds like quite a bit of work (or am I wrong on that part?) and complexity (or wrong?) for potentially marginal gain. | 11:04 | |
timotimo | yeah | ||
i wouldn't mind a factor of 2 speedup for the gc, but this is not how we get that | 11:05 | ||
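The per-repr worklist idea discussed above could be sketched roughly as follows. This is not MoarVM code: `Obj`, `Bucket`, `ReprWorklists`, `REPR_COUNT`, and `BUCKET_CAPACITY` are hypothetical names invented for illustration, and setting `marked` stands in for a repr's real `gc_mark`. The sketch only shows the bucketing, so that all objects of one repr are processed back to back.

```c
#include <assert.h>
#include <stddef.h>

#define REPR_COUNT 4        /* hypothetical: the real count differs, but is fixed */
#define BUCKET_CAPACITY 64  /* hypothetical fixed capacity, to keep the sketch short */

typedef struct {
    unsigned repr_id;       /* which representation this object uses */
    int      marked;        /* set once the object has been processed */
} Obj;

typedef struct {
    Obj   *items[BUCKET_CAPACITY];
    size_t count;
} Bucket;

typedef struct {
    Bucket per_repr[REPR_COUNT];   /* one worklist per repr */
} ReprWorklists;

static void worklists_init(ReprWorklists *wl) {
    for (unsigned r = 0; r < REPR_COUNT; r++)
        wl->per_repr[r].count = 0;
}

/* Append an object to the worklist of its repr; returns 0 if that bucket is full. */
static int worklists_add(ReprWorklists *wl, Obj *o) {
    Bucket *b;
    assert(o->repr_id < REPR_COUNT);
    b = &wl->per_repr[o->repr_id];
    if (b->count == BUCKET_CAPACITY)
        return 0;
    b->items[b->count++] = o;
    return 1;
}

/* Drain bucket by bucket, so the same (stand-in) mark routine runs repeatedly.
 * Records the repr ID of each processed object in `order` (up to order_cap)
 * and returns the number of objects processed. */
static size_t worklists_drain(ReprWorklists *wl, unsigned *order, size_t order_cap) {
    size_t n = 0;
    for (unsigned r = 0; r < REPR_COUNT; r++) {
        Bucket *b = &wl->per_repr[r];
        for (size_t i = 0; i < b->count; i++) {
            b->items[i]->marked = 1;   /* stand-in for the repr's gc_mark */
            if (n < order_cap)
                order[n] = r;
            n++;
        }
        b->count = 0;
    }
    return n;
}
```

Note that draining by repr changes the processing order relative to insertion order, which is exactly the effect jnthn points out could scatter object graphs.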
nwc10 | usually that sort of speed up is "better algorithm" but that's never easy, even if it's possible. | 11:06 | |
timotimo | well, there is a whole research field for "better algorithm" in GC | ||
but many of those better algorithms are not trivial to adapt to a whole system | 11:07 | |
lizmat | there's even a book about it: www.bookdepository.com/The-Garbage...gKUB_D_BwE | ||
oops | |||
timotimo | like when you have to add not only write barriers but also read barriers to all your gc-object-using C code when you change to a concurrent GC | ||
was concurrent the word for when the gc runs while mutators also run? | |||
lizmat | www.bookdepository.com/The-Garbage...1420082791 # better link | 11:08 | |
timotimo | 1kg of book for almost a hundred bucks | ||
nwc10 | jnthn pointed me at this a few weeks ago: sqlite.1065341.n5.nabble.com/50-fas...78082.html -- ... is 50% faster than the 3.7.17 release | 11:09 | |
from 16 months ago. That is to say, it does 50% more work using the same | |||
jnthn | I read that book (well, most of it) before working on the MoarVM GC :) | ||
nwc10 | number of CPU cycles. | ||
a lot of small wins can add up. | 11:10 | ||
timotimo | right | ||
nwc10 | as to read barriers, my fear would be "and then the code is even more complex, and fewer people understand it, and more time is lost to bugs than was gained from speedup" | ||
timotimo | yes, absolutely | ||
getting more and more code ported from C to nqp or similar would be a way to get this smoothed out | 11:11 | ||
that's also not easy, either | |||
lizmat | fwiw, I'm considering moving the shaped array code to Raku land | 11:15 | |
to get more flexibility | |||
the code basically predates Christmas and has been untouched basically since then | |||
and we have now better HLL optimizing | 11:16 | ||
and it would make it easier to port shaped arrays to new VM's | |||
(think .NET :-) | |||
MasterDuke | are you talking about code that's currently in moarvm or nqp> | ||
? | |||
jnthn | Except that the CLR and JVM both natively provide shaped arrays too | 11:17 | |
lizmat | yes | ||
jnthn | And they are more efficient there than resizable ones | ||
Heck, at the VM level that's probably true in MoarVM also; it doesn't have to do any resize check logic | |||
lizmat | fact is, that shaped arrays are still at least 5x as slow as unshaped arrays | 11:18 | |
in Raku land :-( | |||
jnthn | Yes, but that appears to be related to type checking and method resolution issues. | ||
lizmat | well, yes | ||
but that doesn't matter to people wanting to use it :-) | |||
jnthn | Anyway, big -12 | 11:19 | |
uh, -1 | |||
Unless we find we somehow can't fix the type/method issues | 11:20 | ||
lizmat | well, it's my intent to document all of the related nqp ops first | ||
and grok how that part of Rakudo actually works | |||
my prototype atm is about 5x as fast as the current shaped array performance | 11:21 | |
and might well get merged with the current backend implementation if we can fix the type checking / resolution issues | 11:22 | ||
jnthn | Well, does the new thing use the multidim repr? That's the key part to the VM having a clue what to do with it | 11:25 | |
MasterDuke | running `my $a; for 1..5 -> $x { for 1..5 -> $y { $a = $x gcd $y } }; say now - INIT now; say $a` with MVM_SPESH_DISABLE=1, why in the world would it end up in this branch three times? github.com/MoarVM/MoarVM/blob/mast...ops.c#L453 | ||
jnthn | (And getting the compact memory layout) | ||
MasterDuke: Maybe the numbers produced by `now` are big enough? | 11:27 | ||
m: say now.WHAT | |||
camelia | (Instant) | ||
jnthn | m: say now.^mro | ||
camelia | ((Instant) (Cool) (Any) (Mu)) | ||
jnthn | I forget how Instant is represented though | ||
MasterDuke | but i'm gcd'ing `$x` and `$y`, not `now` | 11:28 | |
jnthn | If now is involving rational arithmetic anywhere that uses gcd internally | 11:29 | |
lizmat | jnthn: it would use a single array, with a single index internally, so it would be compact | ||
timotimo | can always bt and mvm_dump_backtrace | ||
jnthn | Maybe breakpoint it and...what timo said | ||
lizmat | jnthn: in any case, I'm exploring this in module space :-) | 11:31 | |
jnthn | ok | 11:32 | |
Geth | MoarVM/update-docs: ae5f7ad447 | (Elizabeth Mattijsen)++ | 8 files Update some docs to Raku era Unless they're specifically historically inclined. |
11:54 | |
MoarVM: lizmat++ created pull request #1394: Update some docs to Raku era |
11:55 | ||
MasterDuke | interesting. i ran Daniel Lemire's benchmark code of a bunch of different gcd implementations. there is a version that takes half the time as moarvm's implementation. but if i stick it in moarvm, my example gets 1s slower (1.7s -> 2.7s) | 12:14 | |
m: my $a; for 1..5_000 -> int $x { for 1..5_000 -> int $y { $a = $x gcd $y } }; say now - INIT now; say $a | |||
camelia | 3.8432893 5000 |
||
MasterDuke | hm. with spesh disabled current is still about 1s faster, but the absolute times have increased (10.4s -> 11.9s) | 12:16 | |
oh wait, i might be reading his benchmark results backwards | 12:19 | ||
12:22
zakharyas left
|
|||
MasterDuke | would using __builtin_ctz() be a portability problem for moarvm? | 12:50 | |
looks like it's not available in visual studio | 12:58 | ||
but it's only 1s faster when doing 100_000_000 gcds, this probably isn't worth it | 13:04 | ||
lizmat | how is memory doing ? | 13:05 | |
that could be another reason ? | |||
I mean gcd gets used a lot for Rats | |||
MasterDuke | memory should be identical | 13:06 | |
13:07
sena_kun joined
13:09
Altai-man left
|
|||
jnthn | MasterDuke: typical way with unportable things is a probe to see if it's available, and a fallback approach if not | 13:27 | |
MasterDuke | yeah, looks like there's a _BitScanReverse that can be used instead. but all told it doesn't seem worth the trouble right now | 13:30 | |
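For illustration, a probe-plus-fallback arrangement for a ctz-based binary GCD might look like the sketch below. It is not MoarVM's implementation and not necessarily the variant from the benchmark mentioned above; `count_trailing_zeros` and `binary_gcd` are hypothetical names, and the probe here is a plain compiler-macro check rather than a configure-time test. On MSVC the count-trailing-zeros counterpart is `_BitScanForward`; `_BitScanReverse` locates the highest set bit instead.

```c
#include <stdint.h>

#if defined(__GNUC__) || defined(__clang__)
/* GCC and Clang: compiler builtin; the result is undefined for v == 0. */
static unsigned count_trailing_zeros(uint32_t v) {
    return (unsigned)__builtin_ctz(v);
}
#elif defined(_MSC_VER)
#include <intrin.h>
/* MSVC: _BitScanForward stores the index of the lowest set bit. */
static unsigned count_trailing_zeros(uint32_t v) {
    unsigned long index;
    _BitScanForward(&index, v);
    return (unsigned)index;
}
#else
/* Portable fallback: shift until the lowest bit is set (v must be nonzero). */
static unsigned count_trailing_zeros(uint32_t v) {
    unsigned n = 0;
    while ((v & 1u) == 0) {
        v >>= 1;
        n++;
    }
    return n;
}
#endif

/* Binary (Stein) GCD on 32-bit unsigned values. */
static uint32_t binary_gcd(uint32_t a, uint32_t b) {
    unsigned shift;
    if (a == 0) return b;
    if (b == 0) return a;
    shift = count_trailing_zeros(a | b);  /* power of two common to both */
    a >>= count_trailing_zeros(a);        /* a is now odd */
    do {
        b >>= count_trailing_zeros(b);    /* b is now odd */
        if (a > b) {                      /* keep a <= b */
            uint32_t t = a;
            a = b;
            b = t;
        }
        b -= a;                           /* difference of two odds is even, or zero */
    } while (b != 0);
    return a << shift;
}
```

The three branches keep the call site identical on every compiler, which is the point of the fallback approach jnthn describes.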
13:44
Geth left
13:45
Geth joined
14:03
lucasb joined
14:06
bartolin left,
bartolin joined
14:09
zakharyas joined
14:25
leont joined
14:27
frost-lab left
|
|||
lizmat | some unexpected timings: github.com/Raku/nqp/issues/685 | 15:33 | |
jnthn | Can you try using $a? | 15:38 | |
Or declaring it outside of the loop? | |||
(I suspect spesh will be dropping the atpos entirely) | |||
lizmat | try using $a ? | 15:39 | |
jnthn | Yes, at the moment it's an unused variable, and atpos is a pure operation | ||
lizmat | .670 vs .1458 | 15:40 | |
.670 vs 1.458 | |||
so no change really, the slow one being a little faster ? | 15:41 | ||
not even that | |||
jnthn | OK, was curious how much that would be part of it | ||
I suspect it's the extra allocations | |||
lizmat | are there other side effects to nqp::shift? | ||
jnthn | Well, the point of shift is to have an effect :) | 15:42 | |
lizmat | I found one case in nqp where an iterator is used to iterate over a list just for the number of elements in the list | ||
so not actually using the nqp::shift($iter) value | |||
jnthn | Huh, in a place it could just use nqp::elems? | ||
lizmat | yes | ||
jnthn | oops | ||
lizmat | vm/moar/QAST/QASTRegexCompilerMAST.nqp line 303 | 15:43 | |
if nqp::iterator / nqp::shift would be faster than manual indexing, a lot of Rakudo internals could also benefit from that, fwiw | 15:47 | ||
also: changing the nqp::list to a nqp::list_i, makes it worse | 15:48 | ||
aah... oops | 15:49 | ||
jnthn | I wondered how much it could be GC overhead of allocating iterator objects, but it's not | ||
lizmat | hmmm... looks like it does make things way worse | 15:50 | |
jnthn | So yeah, certainly room for improvement | ||
lizmat | $ time nqp -e 'my $l := nqp::list_i(1,2,3,4,5,6,7,8,9,10); my int $j := 10000000; while $j-- { my $iter := nqp::iterator($l); nqp::while($iter, my int $a := nqp::shift($iter)) }' # 2.830 | ||
jnthn | Yes, because shift returns an object | ||
So it's doing a box/unbox every element | |||
lizmat | ah... | ||
jnthn | Anyway, given it's not GC overhead, then it's the shift/boolification that wants a look | 15:51 | |
lizmat | looks like... | 15:52 | |
would be nice if an nqp::iterator / nqp::shift combo would be faster | |||
jnthn | Probably can be | 15:53 | |
lizmat | should I tackle that one case where nqp::shift() is not being used ? | ||
jnthn | Yeah, go for it | ||
lizmat | will do | ||
github.com/Raku/nqp/commit/829f1d42f9 | 15:57 | ||
nine | lizmat: using nqp::shift_i instead of nqp::shift in your example is 60 % faster | 15:59 | |
lizmat | ahhh.... so feels like effectively, nqp::iterator($list) is about the same as nqp::clone($list) ? | 16:00 | |
jnthn | No, it just creates an object with an index and a pointer to the list | 16:01 | |
lizmat | fwiw: I was working on documenting nqp ops, and found nqp::iterator listed under list ops, rather than hash ops | 16:03 | |
jnthn | It's both, I guess | ||
lizmat | yes, and then I remembered why I wasn't using it for lists | ||
because it was slower | |||
I hadn't realized how much slower | 16:04 | ||
jnthn | But still being used enough to make it worth speeding up? | 16:05 | |
lizmat | afaik, nqp::iterator is *not* used in the core on lists because it was slower | ||
however, that means that a lot of the NQP code is doing manual indexing, which is more error prone from a maintenance point of view | 16:06 | ||
basically anything that runs over an IterationBuffer or a $!reified in Rakudo | |||
nine | FWIW I don't see anything that's obviously slow about nqp::iterator | 16:07 | |
lizmat | and there's quite a lot of that | ||
but shouldn't we look at nqp::shift() ? | |||
nine | Oh, now I do | ||
nqp::atpos can be devirtualized by the JIT. nqp::shift on an nqp::iterator however always goes through REPR(target)->pos_funcs.at_pos | 16:08 | ||
So the shift on the iterator is devirtualized, but not the following at_pos | |||
jnthn | That plus the integer addition and comparison are JIT straight into assembly code, whereas the boolification is maybe still a C function call | 16:09 | |
nine | So keeping track of the iteration position and using nqp::atpos in HLL is actually a perfect example of how using smaller, less powerful operations leads to better optimization opportunities for the VM | ||
jnthn | That plus another example of how things change when you have a JIT rather than interpret everything | 16:10 | |
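The difference between a virtual and a devirtualized element access can be sketched in plain C. All names here (`Obj`, `PosFuncs`, `array_at_pos`, `sum_virtual`, `sum_direct`) are hypothetical stand-ins rather than MoarVM's actual structures. The first loop goes through a function-pointer table on every access; the second calls the concrete function directly, which is what lets a compiler or JIT inline it.

```c
#include <stddef.h>

typedef struct Obj Obj;

/* Hypothetical per-representation function table. */
typedef struct {
    long (*at_pos)(const Obj *obj, size_t index);
} PosFuncs;

struct Obj {
    const PosFuncs *funcs;   /* looked up at run time on the virtual path */
    const long     *slots;
    size_t          elems;
};

/* Concrete accessor for a simple array-like object; out of range yields 0. */
static long array_at_pos(const Obj *obj, size_t index) {
    return index < obj->elems ? obj->slots[index] : 0;
}

static const PosFuncs array_funcs = { array_at_pos };

/* Virtual path: the callee is only known through the table at run time. */
static long sum_virtual(const Obj *obj) {
    long total = 0;
    for (size_t i = 0; i < obj->elems; i++)
        total += obj->funcs->at_pos(obj, i);
    return total;
}

/* Devirtualized path: the concrete type is assumed known, so the call is direct. */
static long sum_direct(const Obj *obj) {
    long total = 0;
    for (size_t i = 0; i < obj->elems; i++)
        total += array_at_pos(obj, i);
    return total;
}
```

Both functions compute the same result; only the dispatch differs.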
lizmat | so, would that be easily fixable? | 16:11 | |
nine | And spesh which takes more credit for devirtualization | ||
lizmat | or shall we deprecate support for nqp::iterator(List) | ||
nine | It's certainly possible to extend spesh and the JIT to do deep devirtualization and get rid of the boolification slow down. But then just not using nqp::iterator would get us to the same place much easier. | 16:13 | |
lizmat | yeah, feels like a lot of work to get to a point we already are in most cases | 16:14 | |
otoh, those were the explicit cases of using nqp::iterator | |||
when we say "for @array { }" would that not codegen to a nqp::iterator thing? | |||
jnthn | In NQP, yes | 16:15 | |
lizmat | that happens a lot more in NQP | ||
jnthn | I wonder how we can move that to an iterator object in NQP code | ||
Should be quite possible | 16:16 | ||
And then rely on inlining to make it cheaper | |||
Should probably check that it really does come out just as well | |||
lizmat | not quite following what should be quite possible | ||
nine | To implement an iterator object in pure NQP | 16:17 | |
jnthn | Replacing the use of nqp::iterator in NQP for array iterations | ||
So we can drop VM-levels support for nqp::iterator(List) | 16:18 | ||
*level | |||
lizmat | class ListIterator { has $!list; has int $index = -1; method shift() ... } ? | 16:19 | |
nine | That + has $limit = nqp::elems($!list); | 16:21 | |
lizmat | I seem to recall that there is no point in doing that, as the nqp::elems() gets optimized pretty quickly | ||
nine | It's necessary for keeping the same semantics though. | 16:23 | |
Now if we even want to keep those semantics is another question | |||
lizmat | huh? why would that be needed for keeing the same semantiics ? | 16:24 | |
*keeping | |||
MasterDuke | fwiw, it looks like there are (at least) a couple of cases of `nqp::iterator(@...)` in the rakudo core | 16:25 | |
lizmat | MasterDuke: there are? | ||
hmmm. | |||
nine | It's a difference when the array gets changed during the loop (push or pop) | ||
lizmat | aah... ok, and nqp::shift() doesn't follow that currently? | ||
then yes | |||
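The semantic difference nine mentions, between an iterator that caches the element count at creation and one that re-reads it on every step, can be shown with a small C sketch. `List`, `ListIter`, and the helper functions are hypothetical names made up for this illustration, and the sketch does not claim to reproduce what nqp::iterator does internally beyond "an index and a pointer to the list".

```c
#include <stddef.h>

typedef struct {
    int   *data;
    size_t elems;
} List;

/* An iterator is just an index and a pointer to the list, plus a cached limit. */
typedef struct {
    const List *list;
    size_t      index;
    size_t      limit;   /* element count captured when the iterator was created */
} ListIter;

static ListIter iter_new(const List *list) {
    ListIter it = { list, 0, list->elems };
    return it;
}

/* Snapshot semantics: stop at the element count seen at creation time. */
static int iter_has_next_snapshot(const ListIter *it) {
    return it->index < it->limit;
}

/* Live semantics: re-read the element count on every check. */
static int iter_has_next_live(const ListIter *it) {
    return it->index < it->list->elems;
}

static int iter_shift(ListIter *it) {
    return it->list->data[it->index++];
}

/* Iterate a 3-element list, pushing one extra element during the first step.
 * Returns how many elements the loop visited. */
static size_t visit_count(int use_snapshot) {
    int storage[8] = { 10, 20, 30 };
    List list = { storage, 3 };
    ListIter it = iter_new(&list);
    size_t visited = 0;
    while (use_snapshot ? iter_has_next_snapshot(&it) : iter_has_next_live(&it)) {
        (void)iter_shift(&it);
        if (visited == 0)
            storage[list.elems++] = 40;   /* push during the loop */
        visited++;
    }
    return visited;
}
```

With the cached limit the loop visits the three original elements; with the live check it also visits the pushed one. A pop during the loop would be the mirror case, where a cached limit would need an extra bounds check to stay safe.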
MasterDuke | looks like 9 where it's explicitly an '@'-sigiled variable, and a couple more where it's probably a list even if not '@' | 16:30 | |
lizmat | yeah... looking at them now | 16:32 | |
17:00
rypervenche left
17:03
rypervenche joined
17:06
Altai-man joined
17:08
sena_kun left
17:35
patrickb joined
18:01
domidumont left
18:58
zakharyas left
20:49
zakharyas joined
21:07
sena_kun joined
21:09
Altai-man left
21:26
patrickb left
|
|||
Geth | MoarVM: ae5f7ad447 | (Elizabeth Mattijsen)++ | 8 files Update some docs to Raku era Unless they're specifically historically inclined. |
21:46 | |
MoarVM: a595d9ddc4 | (Jonathan Worthington)++ (committed using GitHub Web editor) | 8 files Merge pull request #1394 from MoarVM/update-docs Update some docs to Raku era |
|||
21:55
zakharyas left,
sena_kun left
21:56
sena_kun joined
22:24
sena_kun left,
sena_kun joined
22:35
sena_kun left
|