github.com/moarvm/moarvm | IRC logs at colabti.org/irclogger/irclogger_logs/moarvm Set by AlexDaniel on 12 June 2018. |
00:01
lizmat joined,
p6bannerbot sets mode: +v lizmat
00:06
lizmat left
00:09
dogbert11 joined
00:10
p6bannerbot sets mode: +v dogbert11
00:13
dogbert17 left
01:14
fake_space_whale joined,
p6bannerbot sets mode: +v fake_space_whale
01:21
MasterDuke left
01:40
ZzZombo left
02:35
ZzZombo joined,
p6bannerbot sets mode: +v ZzZombo,
ZzZombo left
06:04
domidumont joined
06:05
p6bannerbot sets mode: +v domidumont
06:14
lizmat joined,
p6bannerbot sets mode: +v lizmat
06:16
patrickb joined,
p6bannerbot sets mode: +v patrickb
06:20
fake_space_whale left
06:49
lizmat left
07:01
lizmat joined,
p6bannerbot sets mode: +v lizmat
07:06
lizmat left
07:21
robertle joined
07:22
p6bannerbot sets mode: +v robertle
07:45
domidumont left
07:48
domidumont joined,
p6bannerbot sets mode: +v domidumont
07:56
lizmat joined,
p6bannerbot sets mode: +v lizmat
09:37
ZofBot left,
huggable left,
p6bannerbot left,
buggable left
10:40
ZzZombo joined
10:42
ZzZombo_ joined,
ZzZombo_ left,
ZzZombo_ joined
10:46
ZzZombo left,
ZzZombo_ is now known as ZzZombo
10:54
robertle left
11:22
p6bannerbot joined,
ChanServ sets mode: +o p6bannerbot,
ZofBot joined,
p6bannerbot sets mode: +v ZofBot,
huggable joined,
buggable joined
11:23
p6bannerbot sets mode: +v huggable,
p6bannerbot sets mode: +v buggable
11:43
Kaiepi left
11:44
Kaiepi joined
11:45
p6bannerbot sets mode: +v Kaiepi
12:49
scovit left
12:55
scovit joined,
p6bannerbot sets mode: +v scovit
12:57
brrt joined
12:58
p6bannerbot sets mode: +v brrt
brrt | \o | 12:58 | |
jnthn | o/ brrt | 12:59 | |
brrt | ohai jnthn | ||
I find that I'm not sure how pass-by-reference works in nativecall | 13:00 | ||
i would have expected that we'd pass a pointer to the MVMRegister in args | 13:01 | ||
but that doesn't appear to be how it works | |||
jnthn | I don't know, alas | 13:24 | |
nine++ probably does | |||
13:25
AlexDaniel left
13:26
AlexDaniel joined,
p6bannerbot sets mode: +v AlexDaniel
13:35
scovit left
Geth | MoarVM/vectorization: 8 commits pushed by (Timo Paulssen)++ | 14:08 | |
timotimo | lizmat: ^- here's the op i was talking about | 14:09 | |
lizmat | timotimo: does it come with documentation in ops.markdown ? | 14:10 | |
timotimo | not yet | ||
my num @a = 1e0..500_000e0; my num @b = 500_000e0...1e0; my num $c = 5e0; my num @out; my $time = now; for ^500_000 { @out[$_] = @a[$_] + @b[$_] * $c; }; say now - $time; say @out[99] | 14:11 | ||
evalable6 | 0.34655233 2499605 | ||
timotimo | use nqp; my num @a = 1e0..500_000e0; my num @b = 500_000e0...1e0; my num @c = 5e0; my num @out; my $time = now; nqp::vectorapply(@b, @c, @b, 95, 1, 64); nqp::vectorapply(@a, @b, @out, 93, 0, 64); say now - $time; say @out[99] | ||
those are roughly equivalent | 14:12 | ||
because 95 is mul_n and 93 is add_n | |||
one of them is a cross operator, the one with a 1 in between, the other is a zip operator, the one with a 0 in between | |||
lizmat | that looks pretty cool | 14:13 | |
timotimo | what i'd like you to have a look at is: | ||
make @out = @a Z+ @b X* $c turn into vectorapply calls | |||
they currently only work for 64bit wide arrays of int and num, and if it's a cross operator the smaller one has to be a native array, too, of the right kind and size, with only one element | 14:14 | ||
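A pure-Python model of the `vectorapply` semantics described above may help in reading the two calls. It is a sketch, not MoarVM's C implementation. The opcode table holds only the two numbers given in the log (93 for add_n, 95 for mul_n). The zip-mode behaviour on arrays of unequal length, where the result is truncated to the shorter input as Raku's `Z` does, is an assumption.

```python
import operator

# Only the two opcode numbers mentioned in the log; a real op table would be larger.
OPS = {93: operator.add, 95: operator.mul}


def vectorapply(a, b, out, opcode, cross, bits=64):
    """Model of nqp::vectorapply(a, b, out, opcode, cross, bits) as described in the log.

    cross=0 zips a and b elementwise; cross=1 applies the single element of b
    to every element of a. The result replaces the contents of out, which may alias a or b.
    """
    if bits != 64:
        raise ValueError("only 64-bit wide int and num arrays are supported")
    op = OPS[opcode]
    if cross:
        if len(b) != 1:
            raise ValueError("cross mode expects a one-element right-hand array")
        result = [op(x, b[0]) for x in a]
    else:
        # Assumption: stop at the shorter input, as Raku's Z does.
        result = [op(x, y) for x, y in zip(a, b)]
    out[:] = result  # result is built first, so out may safely alias a or b
    return out


n = 500_000
a = [float(i) for i in range(1, n + 1)]   # my num @a = 1e0..500_000e0
b = [float(i) for i in range(n, 0, -1)]   # my num @b = 500_000e0...1e0
c = [5.0]                                 # my num @c = 5e0
out = []
vectorapply(b, c, b, 95, 1)    # @b = @b X* @c, in place
vectorapply(a, b, out, 93, 0)  # @out = @a Z+ @b
print(out[99])                 # 2499605.0, the same value evalable6 printed
```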
lizmat | intriguing! :-) looks very cool | ||
timotimo | \o/ | 14:15 | |
lizmat | fwiw, I was first going to take a stab at documenting the new MAIN interface and write tests for it | ||
timotimo | sure! | ||
no hurry :) | |||
lizmat | and then I was planning to have a look at R#2360, attempting to fix nqp::p6store | ||
synopsebot | R#2360 [open]: github.com/rakudo/rakudo/issues/2360 my %*FOO is Set = <a b c> dies | ||
timotimo | the vectorapply version of that code can run 300 times and still finish a tiny bit faster than the for ^500_000 version | 14:17 | |
lizmat | and before all of that, first some sun / wind / cycling& | ||
timotimo: so you're saying that's potentially 300x as fast ? | |||
timotimo | maybe i'll figure out soon-ish why it's even faster to have $c replaced with a 500_000 element @c array and using @c[$_] as well | 14:18 | |
yeah, and potentially about 1.5kx faster than using Z+ and X* | 14:19 | ||
mhhh, my num @a = 1e0..500_000e0; takes about no time at all, but my num @a = 500_000e0...1e0; takes about 10 seconds; we recently optimized special cases of ... for for loops, surely we can put that into the push_all for the ... iterator, too :) | 14:24 | ||
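For a constant-step numeric sequence, the fast path suggested here could compute the element count once and fill the target directly. That avoids producing the values one at a time (lizmat points out that `...` is still implemented as a gather/take combo). The following is a hypothetical Python sketch. `push_all_arithmetic` is an illustrative name, not a Rakudo method, and it covers only an explicit start, step and numeric endpoint. Inexact float steps such as 0.1 would need more care than this shows.

```python
import math


def push_all_arithmetic(start, end, step, target):
    """Hypothetical fast path: append start, start+step, ... without passing end."""
    if step == 0:
        raise ValueError("step must be non-zero")
    count = math.floor((end - start) / step) + 1
    if count > 0:
        # One closed-form pass, no per-element iterator round trip.
        target.extend(start + i * step for i in range(count))
    return target
```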
14:54
fake_space_whale joined
14:55
p6bannerbot sets mode: +v fake_space_whale
15:21
domidumont left
15:22
tadzik left,
tadzik joined
15:23
p6bannerbot sets mode: +v tadzik
15:27
brrt left
15:36
lizmat left
16:02
lizmat joined,
p6bannerbot sets mode: +v lizmat
16:06
lizmat left
16:26
patrickb left
16:33
shareable6 left,
reportable6 left,
committable6 left,
quotable6 left,
squashable6 left,
reportable6 joined,
shareable6 joined,
committable6 joined,
quotable6 joined,
squashable6 joined,
evalable6 left,
bisectable6 left,
evalable6 joined,
bisectable6 joined
16:34
p6bannerbot sets mode: +v reportable6,
p6bannerbot sets mode: +v shareable6,
p6bannerbot sets mode: +v committable6,
p6bannerbot sets mode: +v quotable6,
p6bannerbot sets mode: +v squashable6,
p6bannerbot sets mode: +v evalable6,
p6bannerbot sets mode: +v bisectable6
16:36
lizmat joined,
p6bannerbot sets mode: +v lizmat
16:39
releasable6 left,
notable6 left,
greppable6 left,
releasable6 joined,
notable6 joined,
greppable6 joined
16:40
p6bannerbot sets mode: +v releasable6,
p6bannerbot sets mode: +v notable6,
p6bannerbot sets mode: +v greppable6
16:42
unicodable6 left,
unicodable6 joined
16:43
p6bannerbot sets mode: +v unicodable6
16:49
ankitkk left,
ankitkk joined
16:50
p6bannerbot sets mode: +v ankitkk
16:51
brrt joined
16:52
p6bannerbot sets mode: +v brrt
17:00
robertle joined
17:01
p6bannerbot sets mode: +v robertle
17:03
domidumont joined,
p6bannerbot sets mode: +v domidumont
brrt | timotimo++ pretty cool work | 17:08 | |
17:10
fake_space_whale left,
domidumont left
lizmat | timotimo: afaik, ... is still a gather / take combo | 17:12 | |
17:23
fake_space_whale joined
17:24
p6bannerbot sets mode: +v fake_space_whale
nine | brrt: but....that should be exactly how it works? | 17:32 | |
brrt: that's also why I added a getarg op for reading the value back from the args buffer | |||
brrt | oh, really | 17:37 | |
..... so, I don't have to add a 'copy-back-to-frame' for rw arguments | 17:38 | ||
that's good news | |||
that simplifies things tremendously | |||
nine++ | 17:39 | ||
nine | My initial implementation just read the value from the local with lots of assumptions about which local that might be. But that was a tiny bit too fragile ;) | 17:44 | |
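The mechanism nine describes can be sketched with Python's `ctypes`: the caller passes a pointer into the args buffer, then reads the value back from that buffer after the call (which is what the getarg op does). This rests on two assumptions. `Register` is a simplified stand-in for MoarVM's `MVMRegister` union, and a `ctypes` callback plays the part of a C library function with an rw int parameter.

```python
import ctypes


class Register(ctypes.Union):
    # Simplified stand-in for a VM register slot (MoarVM's real type is MVMRegister).
    _fields_ = [("i64", ctypes.c_int64), ("n64", ctypes.c_double), ("ptr", ctypes.c_void_p)]


# Signature of a native function taking an int64 by reference: void f(int64_t *).
RW_INT_FN = ctypes.CFUNCTYPE(None, ctypes.POINTER(ctypes.c_int64))


@RW_INT_FN
def increment(p):
    # The callee writes through the pointer, as a C function would.
    p[0] += 1


def call_rw(fn, value):
    args = (Register * 1)()
    args[0].i64 = value  # marshal the argument into the args buffer
    slot = ctypes.cast(ctypes.pointer(args[0]), ctypes.POINTER(ctypes.c_int64))
    fn(slot)             # pass a pointer to the buffer slot, not a copy of the value
    return args[0].i64   # like getarg: read the updated value back from the buffer


print(call_rw(increment, 41))  # 42
```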
timotimo | lizmat: OK! | 17:46 | |
brrt | yeah, i can imagine :-) | 17:47 | |
timotimo | so i'm using nine's example profile data again, and the "paths" data for one function that appears in 522 call sites was a proud ~12 megabytes, which my program took about one and a half minutes to put together into a json blob | ||
with a whole lot of memory usage | 17:48 | ||
i.e. when i tried it earlier, it tried to dump core because it reached the maximum my ram had to offer | |||
17:48
evalable6 left
timotimo | that's not quite acceptable %) | 17:48 | |
17:48
evalable6 joined
17:49
shareable6 left
timotimo | also, it'll be interesting to build the flame graph data when there's theoretically hundreds of megabytes of data in there | 17:49 | |
17:49
p6bannerbot sets mode: +v evalable6
timotimo | brrt: you think the vectorization branch is an acceptable way forward? it's surely not optimal, but it's certainly faster than what our zip/cross ops currently can do | 17:51 | |
brrt | I have totally not reviewed it | 17:52 | |
timotimo | it's probably more efficient to try to do all operations on each little bunch of data? | ||
rather than going through all data with one operation, then through all data with another | |||
and it's surely wasteful to require intermediate arrays to be made | 17:53 | ||
brrt | hmmmm | 17:54 | |
timotimo | though if every operation only goes from two arrays to one, i'd assume most of the time you can have at most one temporary array? | ||
brrt | in honesty you may have exceeded my expertise :-) | ||
timotimo | haha | ||
i have no expertise either, that's why i just let the C compiler do 100% of the work | |||
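The alternative timotimo describes is loop fusion combined with strip-mining: every operation is applied to one small bunch of data before moving on. Below is a Python sketch for `out = a + b * c`, the expression used earlier in the log. Pure Python will not show the cache benefit; the sketch only illustrates the loop structure. The chunk size of 4096 is an arbitrary assumption, and both versions assume `a` and `b` have equal length.

```python
def two_pass(a, b, c):
    # One operation over all data at a time: needs a full-size temporary array.
    tmp = [y * c for y in b]
    return [x + t for x, t in zip(a, tmp)]


def fused_chunked(a, b, c, chunk=4096):
    # All operations on one chunk before moving on; the only temporary is chunk-sized.
    out = [0.0] * len(a)
    for start in range(0, len(a), chunk):
        stop = min(start + chunk, len(a))
        tmp = [b[i] * c for i in range(start, stop)]
        for offset, i in enumerate(range(start, stop)):
            out[i] = a[i] + tmp[offset]
    return out
```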
brrt | scarily, I'm getting good at writing adhoc jit templates | 17:57 | |
not the most portable of skills.. | |||
dogbert11 | brrt: do you have any theories as to why some spectest files fails when run with MVM_JIT_EXPR_DISABLE=1 ? | ||
brrt | dogbert11: nope, can you point me to the right ones? | 17:58 | |
dogbert11 | brrt; try running - MVM_JIT_EXPR_DISABLE=1 ./perl6 t/spec/S05-mass/properties-block.t | ||
brrt | huh, that's funny | 17:59 | |
dogbert11 | I thought so too. quite strange | ||
brrt | goes away with MVM_JIT_DISABLE=1 | 18:00 | |
okay, I can probably figure that out | |||
I'll put it somewhere on my todo list | |||
dogbert11 | ++brrt | ||
brrt | .oO( we need an inverse jit bisect ) | 18:01 | |
I need to fixup jit bisect anyway ... | |||
anyway, I'll have to do all that later, afk for now :-) | 18:02 | ||
18:03
brrt left
|
|||
timotimo | oh, the cro process is still at like 3.9 gigs RSS | 18:05 | |
japhb | yikes | 18:27 | |
timotimo | oh lord, this can't be right | 18:29 | |
the json was being created with :pretty | 18:30 | ||
that's pretty bad for a deeeeeeeply nested structure | |||
routine-paths in 2.7811155 | 18:31 | ||
routine-paths json in 2.95350603: 263873 characters | |||
^- with :!pretty | |||
routine-paths in 2.910559 | |||
routine-paths json in 120.8043488: 13517443 characters | |||
^- with :pretty | |||
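These numbers suggest why `:pretty` hurts so much on a deeply nested structure. Every output line is indented in proportion to its depth, so for a deep chain the indentation alone grows roughly quadratically with depth, while the compact form grows linearly. The Python sketch below uses the standard `json` module as a stand-in for the Raku JSON serializer. The node layout is made up for illustration.

```python
import json


def nested_chain(depth):
    # A single call path 'depth' frames deep, innermost node built first.
    node = {"id": depth, "children": []}
    for i in range(depth - 1, 0, -1):
        node = {"id": i, "children": [node]}
    return node


def sizes(depth):
    data = nested_chain(depth)
    compact = json.dumps(data, separators=(",", ":"))
    pretty = json.dumps(data, indent=2)
    return len(compact), len(pretty)


for depth in (10, 50, 200):
    compact_len, pretty_len = sizes(depth)
    print(depth, compact_len, pretty_len, round(pretty_len / compact_len, 1))
```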
japhb | timotimo: When you're doing really serious vector/matrix/tensor operations, beyond a certain point runtime will be utterly dominated by memory hierarchy effects. Chunking large arrays so that all operations on a given set of data fit in fast caches makes a huge difference (consider e.g. multiplying a pair of 8k x 8k matrices). | ||
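The chunking japhb describes takes its classic form in matrix multiplication as loop tiling (cache blocking). Each `block` x `block` tile of the operands is reused while it is still in cache. Plain Python lists will not show the speedup because interpreter overhead dominates, so the sketch below only illustrates the loop structure. The block size of 32 is an arbitrary assumption.

```python
def matmul_naive(A, B):
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            aik = A[i][k]
            for j in range(p):
                C[i][j] += aik * B[k][j]
    return C


def matmul_blocked(A, B, block=32):
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    # Walk the matrices tile by tile so each tile of A, B and C is reused while it is hot in cache.
    for ii in range(0, n, block):
        for kk in range(0, m, block):
            for jj in range(0, p, block):
                for i in range(ii, min(ii + block, n)):
                    Ai, Ci = A[i], C[i]
                    for k in range(kk, min(kk + block, m)):
                        aik, Bk = Ai[k], B[k]
                        for j in range(jj, min(jj + block, p)):
                            Ci[j] += aik * Bk[j]
    return C
```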
timotimo | japhb: sadly, that means much more work :) | 18:32 | |
japhb | timotimo: Actually ... maybe not. It may be that if you want to do that sort of thing, we instead automate using one of the fast linear algebra libraries. | ||
timotimo | true | 18:33 | |
japhb | Don't get me wrong, I think your current research is very useful. I was just answering your question earlier about vectorization of large volumes of data. | ||
timotimo | alternatively, maybe the liboil compiler would actually be nice to put into moar | 18:34 | |
yeah, i think i got you right :) | |||
TBF with the stuff i've implemented so far, i don't think matrix multiplication is particularly possible to implement | 18:35 | ||
japhb | timotimo: Have you looked at PDL from the Perl 5 world? | 18:36 | |
timotimo | i have not | ||
japhb | It's interesting just from the point of view of the things it makes easy, and the magic it does behind the scenes to make that fast-ish. | 18:37 | |
timotimo | i've looked a little into numpy | 18:38 | |
japhb | But it was not trying to do true CPU vectorization, rather just able to pump large multidim arrays into optimized C routines | ||
timotimo | scipy has a thing that lets you write C++ code using some c++ library that does multidim arrays that you can slice every which way | ||
japhb | It could not, for example, hold a candle to the real C/C++ fast linear algebra stuff. Still, it beat the blazes off doing things element-wise. | ||
timotimo | last time i looked it was barely documented, barely hackable if you want very specific behaviour of the compiler, and apparently hadn't been touched in a couple of years | 18:39 | |
got a tree with 162338 nodes | 18:42 | ||
routine-paths in 272.2697089 | |||
oh jeez here we go | |||
in comparison, the stuff i pasted above had "got a tree with 5755 nodes" | 18:43 | ||
routine-paths json in 90.608918: 7433700 characters | 18:44 | ||
japhb | m: say 162338 / 5755, 272.2697089 / 2.7811155 | ||
camelia | 28.20816797.899461169 | ||
japhb | m: say 162338 / 5755, ' ', 272.2697089 / 2.7811155 | ||
camelia | 28.208167 97.899461169 | ||
timotimo | now chrome is chugging along on the json and the react component tree | ||
japhb | Hmmm, some nonlinear effects there, but at least not O(n**2) | ||
18:45
shareable6 joined
timotimo | aye, you must imagine the call graph and we've got a set of leaf nodes | 18:45 | |
japhb | Are you sorting the keys? Looks like there might be an N log N effect | ||
(Just staring at the ratios) | 18:46 | ||
timotimo | and the code goes via the parent ids towards the known roots | ||
japhb | Ah, yeah, that would do it | ||
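japhb's reading of the two ratios can be made concrete by fitting an exponent to the two measurements quoted above: 5755 vs 162338 nodes, and 2.7811155 s vs 272.2697089 s. This is a back-of-the-envelope sketch with only two data points. The effective exponent comes out around 1.37. That means the growth exceeds the roughly 39x an n log n model would predict, but stays far below the roughly 800x of n².

```python
import math

nodes_small, nodes_large = 5755, 162338
secs_small, secs_large = 2.7811155, 272.2697089

node_ratio = nodes_large / nodes_small   # ~28.2, as camelia printed
time_ratio = secs_large / secs_small     # ~97.9

# If time ~ n**k, then k = log(time_ratio) / log(node_ratio).
exponent = math.log(time_ratio) / math.log(node_ratio)

# What n log n and n**2 growth would have predicted for the larger tree.
nlogn_ratio = node_ratio * math.log(nodes_large) / math.log(nodes_small)
quadratic_ratio = node_ratio ** 2

print(f"exponent ~ {exponent:.2f}, n log n predicts {nlogn_ratio:.0f}x, "
      f"n^2 predicts {quadratic_ratio:.0f}x, observed {time_ratio:.1f}x")
```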
18:46
p6bannerbot sets mode: +v shareable6
timotimo | i should be able to construct an sql query that picks every "current node"'s parent rather than going node-by-node | 18:46 | |
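A recursive common table expression is one way to fetch the parents of all current nodes in a single query rather than one node at a time. The sketch below uses Python's `sqlite3`. The `calls(id, parent_id, routine_id)` table and its sample rows are a hypothetical simplification, not the actual schema of the MoarVM profiler's SQL output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (id INTEGER PRIMARY KEY, parent_id INTEGER, routine_id INTEGER)")
# Hypothetical call tree: routine 12 is called at two different call sites (ids 3 and 4).
rows = [(1, None, 10), (2, 1, 11), (3, 2, 12), (4, 1, 12), (5, 4, 13)]
conn.executemany("INSERT INTO calls VALUES (?, ?, ?)", rows)


def paths_to_root(conn, routine_id):
    # One recursive query walks every call site of the routine up to its root.
    query = """
    WITH RECURSIVE path(start_id, node_id, parent_id, depth) AS (
        SELECT id, id, parent_id, 0 FROM calls WHERE routine_id = ?
        UNION ALL
        SELECT path.start_id, calls.id, calls.parent_id, path.depth + 1
        FROM path JOIN calls ON calls.id = path.parent_id
    )
    SELECT start_id, node_id FROM path ORDER BY start_id, depth
    """
    result = {}
    for start_id, node_id in conn.execute(query, (routine_id,)):
        result.setdefault(start_id, []).append(node_id)
    return result
```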
19:07
Kaiepi left,
Kaiepi joined
19:08
p6bannerbot sets mode: +v Kaiepi
diakopter | heh portable | 19:58 | |
20:26
Kaiepi left
20:36
Kaiepi joined,
p6bannerbot sets mode: +v Kaiepi
20:42
Kaiepi left,
Kaiepi joined
20:43
p6bannerbot sets mode: +v Kaiepi
21:49
squashable6 left
21:50
squashable6 joined,
p6bannerbot sets mode: +v squashable6
21:53
squashable6 left,
squashable6 joined
21:54
p6bannerbot sets mode: +v squashable6
timotimo | i'm not sure where to stop adding "vectorized" stuff. like, i think coercing an array of int to an array of num and vice versa seems very useful to have | 23:29 | |
but coercing int or num to str ... useful for sure, but not appropriate for the vectorapply op, i don't think | 23:30 | ||
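The int-to-num and num-to-int coercions proposed here map one 64-bit native array onto another of the same length. The sketch below uses Python's standard `array` module, with typecodes `q` and `d` for 64-bit ints and doubles on common platforms. Truncation toward zero for num-to-int is an assumption about the desired semantics, and NaN or infinite values would need an explicit policy.

```python
from array import array


def ints_to_nums(src):
    # Every element becomes a double; ints beyond 2**53 may lose precision.
    return array("d", (float(x) for x in src))


def nums_to_ints(src):
    # Assumption: truncate toward zero; int() raises on NaN and infinities.
    return array("q", (int(x) for x in src))
```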
23:54
greppable6 left,
greppable6 joined
23:55
p6bannerbot sets mode: +v greppable6