github.com/moarvm/moarvm | IRC logs at colabti.org/irclogger/irclogger_logs/moarvm
Set by AlexDaniel on 12 June 2018.
00:01 lizmat joined, p6bannerbot sets mode: +v lizmat 00:06 lizmat left 00:09 dogbert11 joined 00:10 p6bannerbot sets mode: +v dogbert11 00:13 dogbert17 left 01:14 fake_space_whale joined, p6bannerbot sets mode: +v fake_space_whale 01:21 MasterDuke left 01:40 ZzZombo left 02:35 ZzZombo joined, p6bannerbot sets mode: +v ZzZombo, ZzZombo left 06:04 domidumont joined 06:05 p6bannerbot sets mode: +v domidumont 06:14 lizmat joined, p6bannerbot sets mode: +v lizmat 06:16 patrickb joined, p6bannerbot sets mode: +v patrickb 06:20 fake_space_whale left 06:49 lizmat left 07:01 lizmat joined, p6bannerbot sets mode: +v lizmat 07:06 lizmat left 07:21 robertle joined 07:22 p6bannerbot sets mode: +v robertle 07:45 domidumont left 07:48 domidumont joined, p6bannerbot sets mode: +v domidumont 07:56 lizmat joined, p6bannerbot sets mode: +v lizmat 09:37 ZofBot left, huggable left, p6bannerbot left, buggable left 10:40 ZzZombo joined 10:42 ZzZombo_ joined, ZzZombo_ left, ZzZombo_ joined 10:46 ZzZombo left, ZzZombo_ is now known as ZzZombo 10:54 robertle left 11:22 p6bannerbot joined, ChanServ sets mode: +o p6bannerbot, ZofBot joined, p6bannerbot sets mode: +v ZofBot, huggable joined, buggable joined 11:23 p6bannerbot sets mode: +v huggable, p6bannerbot sets mode: +v buggable 11:43 Kaiepi left 11:44 Kaiepi joined 11:45 p6bannerbot sets mode: +v Kaiepi 12:49 scovit left 12:55 scovit joined, p6bannerbot sets mode: +v scovit 12:57 brrt joined 12:58 p6bannerbot sets mode: +v brrt
brrt \o 12:58
jnthn o/ brrt 12:59
brrt ohai jnthn
I find that I'm not sure how pass-by-reference works in nativecall 13:00
i would have expected that we'd pass a pointer to the MVMRegister in args 13:01
but that doesn't appear to be how it works
jnthn I don't know, alas 13:24
nine++ probably does
13:25 AlexDaniel left 13:26 AlexDaniel joined, p6bannerbot sets mode: +v AlexDaniel 13:35 scovit left
Geth MoarVM/vectorization: 8 commits pushed by (Timo Paulssen)++ 14:08
timotimo lizmat: ^- here's the op i was talking about 14:09
lizmat timotimo: does it come with documentation in ops.markdown ? 14:10
timotimo not yet
my num @a = 1e0..500_000e0; my num @b = 500_000e0...1e0; my num $c = 5e0; my num @out; my $time = now; for ^500_000 { @out[$_] = @a[$_] + @b[$_] * $c; }; say now - $time; say @out[99] 14:11
evalable6 0.34655233
2499605
timotimo use nqp; my num @a = 1e0..500_000e0; my num @b = 500_000e0...1e0; my num @c = 5e0; my num @out; my $time = now; nqp::vectorapply(@b, @c, @b, 95, 1, 64); nqp::vectorapply(@a, @b, @out, 93, 0, 64); say now - $time; say @out[99]
those are roughly equivalent 14:12
because 95 is mul_n and 93 is add_n
the call with a 1 as the flag argument is a cross operator; the one with a 0 is a zip operator
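A minimal C sketch of the two shapes described above: a zip op pairs elements index-by-index, while a cross op broadcasts a one-element operand across the other array. The function names and layout here are illustrative only, not MoarVM's actual vectorapply implementation:

```c
#include <stddef.h>

/* zip semantics: out[i] = a[i] + b[i], both operands of length n
   (roughly what an add_n zip apply would do) */
static void zip_add_n(const double *a, const double *b, double *out, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* cross semantics: out[i] = a[i] * b[0], where b is the smaller,
   one-element operand being broadcast (roughly a mul_n cross apply) */
static void cross_mul_n(const double *a, const double *b, double *out, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] * b[0];
}
```

With plain contiguous loops like these, the C compiler is free to auto-vectorize with SIMD instructions, which is where the speedup comes from.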
lizmat that looks pretty cool 14:13
timotimo what i'd like you to have a look at is:
make @out = @a Z+ @b X* $c turn into vectorapply calls
they currently only work for 64-bit-wide arrays of int and num; for a cross operator, the smaller operand also has to be a native array of the right kind, with only one element 14:14
lizmat intriguing! :-) looks very cool
timotimo \o/ 14:15
lizmat fwiw, I was first going to take a stab at documenting the new MAIN interface and write tests for it
timotimo sure!
no hurry :)
lizmat and then I was planning to have a look at R#2360, attempting to fix nqp::p6store
synopsebot R#2360 [open]: github.com/rakudo/rakudo/issues/2360 my %*FOO is Set = <a b c> dies
timotimo the vectorapply version of that code can run 300 times and still finish a tiny bit faster than the for ^500_000 version 14:17
lizmat and before all of that, first some sun / wind / cycling&
timotimo: so you're saying that's potentially 300x as fast ?
timotimo maybe i'll figure out soon-ish why it's even faster to have $c replaced with a 500_000 element @c array and using @c[$_] as well 14:18
yeah, and potentially about 1.5kx faster than using Z+ and X* 14:19
mhhh, my num @a = 1e0..500_000e0; takes about no time at all, but my num @a = 500_000e0...1e0; takes about 10 seconds; we recently optimized special cases of ... for for loops, surely we can put that into the push_all for the ... iterator, too :) 14:24
14:54 fake_space_whale joined 14:55 p6bannerbot sets mode: +v fake_space_whale 15:21 domidumont left 15:22 tadzik left, tadzik joined 15:23 p6bannerbot sets mode: +v tadzik 15:27 brrt left 15:36 lizmat left 16:02 lizmat joined, p6bannerbot sets mode: +v lizmat 16:06 lizmat left 16:26 patrickb left 16:33 shareable6 left, reportable6 left, committable6 left, quotable6 left, squashable6 left, reportable6 joined, shareable6 joined, committable6 joined, quotable6 joined, squashable6 joined, evalable6 left, bisectable6 left, evalable6 joined, bisectable6 joined 16:34 p6bannerbot sets mode: +v reportable6, p6bannerbot sets mode: +v shareable6, p6bannerbot sets mode: +v committable6, p6bannerbot sets mode: +v quotable6, p6bannerbot sets mode: +v squashable6, p6bannerbot sets mode: +v evalable6, p6bannerbot sets mode: +v bisectable6 16:36 lizmat joined, p6bannerbot sets mode: +v lizmat 16:39 releasable6 left, notable6 left, greppable6 left, releasable6 joined, notable6 joined, greppable6 joined 16:40 p6bannerbot sets mode: +v releasable6, p6bannerbot sets mode: +v notable6, p6bannerbot sets mode: +v greppable6 16:42 unicodable6 left, unicodable6 joined 16:43 p6bannerbot sets mode: +v unicodable6 16:49 ankitkk left, ankitkk joined 16:50 p6bannerbot sets mode: +v ankitkk 16:51 brrt joined 16:52 p6bannerbot sets mode: +v brrt 17:00 robertle joined 17:01 p6bannerbot sets mode: +v robertle 17:03 domidumont joined, p6bannerbot sets mode: +v domidumont
brrt timotimo++ pretty cool work 17:08
17:10 fake_space_whale left, domidumont left
lizmat timotimo: afaik, ... is still a gather / take combo 17:12
17:23 fake_space_whale joined 17:24 p6bannerbot sets mode: +v fake_space_whale
nine brrt: but....that should be exactly how it works? 17:32
brrt: that's also why I added a getarg op for reading the value back from the args buffer
brrt oh, really 17:37
..... so, I don't have to add a 'copy-back-to-frame' for rw arguments 17:38
that's good news
that simplifies things tremendously
nine++ 17:39
nine My initial implementation just read the value from the local with lots of assumptions about which local that might be. But that was a tiny bit too fragile ;) 17:44
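A toy C model of the mechanism nine describes: for an rw native argument, the args buffer hands the callee a pointer into the register, so the callee's write lands in place and reading the register back afterwards (what the getarg op is for) observes the updated value, with no separate copy-back step. The struct and function names below are hypothetical, not MoarVM's real internals:

```c
#include <stdint.h>

/* Illustrative register union, loosely modeled on an MVMRegister */
typedef union {
    int64_t i64;
    double  n64;
    void   *ptr;
} Register;

/* A native function taking an int argument by reference ("is rw") */
static void native_increment(int64_t *arg) {
    *arg += 1;  /* writes straight through the pointer */
}

/* Caller passes a pointer to the register, then "getargs" the result:
   the register already holds the updated value after the call. */
static int64_t call_with_rw_arg(Register *reg) {
    native_increment(&reg->i64);
    return reg->i64;  /* reads the value back from the args buffer */
}
```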
timotimo lizmat: OK! 17:46
brrt yeah, i can imagine :-) 17:47
timotimo so i'm using nine's example profile data again, and the "paths" data for one function that appears in 522 call sites was a proud ~12 megabytes, which my program took about one and a half minutes to put together into a json blob
with a whole lot of memory usage 17:48
i.e. when i tried it earlier, it tried to dump core because it reached the maximum my ram had to offer
17:48 evalable6 left
timotimo that's not quite acceptable %) 17:48
17:48 evalable6 joined 17:49 shareable6 left
timotimo also, it'll be interesting to build the flame graph data when there's theoretically hundreds of megabytes of data in there 17:49
17:49 p6bannerbot sets mode: +v evalable6
timotimo brrt: you think the vectorization branch is an acceptable way forward? it's surely not optimal, but it's certainly faster than what our zip/cross ops currently can do 17:51
brrt I have totally not reviewed it 17:52
timotimo it's probably more efficient to try to do all operations on each little bunch of data?
rather than going through all data with one operation, then through all data with another
and it's surely wasteful to require intermediate arrays to be made 17:53
brrt hmmmm 17:54
timotimo though if every operation only goes from two arrays to one, i'd assume most of the time you can have at most one temporary array?
brrt in honesty you may have exceeded my expertise :-)
timotimo haha
i have no expertise either, that's why i just let the C compiler do 100% of the work
brrt scarily, I'm getting good at writing adhoc jit templates 17:57
not the most portable of skills..
dogbert11 brrt: do you have any theories as to why some spectest files fail when run with MVM_JIT_EXPR_DISABLE=1 ?
brrt dogbert11: nope, can you point me to the right ones? 17:58
dogbert11 brrt: try running - MVM_JIT_EXPR_DISABLE=1 ./perl6 t/spec/S05-mass/properties-block.t
brrt huh, that's funny 17:59
dogbert11 I thought so too. quite strange
brrt goes away with MVM_JIT_DISABLE=1 18:00
okay, I can probably figure that out
I'll put it somewhere on my todo list
dogbert11 ++brrt
brrt .oO( we need an inverse jit bisect ) 18:01
I need to fixup jit bisect anyway ...
anyway, I'll have to do all that later, afk for now :-) 18:02
18:03 brrt left
timotimo oh, the cro process is still at like 3.9 gigs RSS 18:05
japhb yikes 18:27
timotimo oh lord, this can't be right 18:29
the json was being created with :pretty 18:30
that's pretty bad for a deeeeeeeply nested structure
routine-paths in 2.7811155 18:31
routine-paths json in 2.95350603: 263873 characters
^- with :!pretty
routine-paths in 2.910559
routine-paths json in 120.8043488: 13517443 characters
^- with :pretty
japhb timotimo: When you're doing really serious vector/matrix/tensor operations, beyond a certain point runtime will be utterly dominated by memory hierarchy effects. Chunking large arrays so that all operations on a given set of data fit in fast caches makes a huge difference (consider e.g. multiplying a pair of 8k x 8k matrices).
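A sketch of the chunking idea japhb describes, assuming a fused `out = a + b * c` kernel: instead of streaming over the full arrays once per operation (and materializing intermediates), process the data in cache-sized chunks so each stretch of memory is touched while still hot. `CHUNK` and the function name are made up for illustration:

```c
#include <stddef.h>

#define CHUNK 4096  /* elements per chunk; tune so a chunk fits in L1/L2 cache */

/* Fused, chunked evaluation of out[i] = a[i] + b[i] * c.
   Fusing the add and multiply into one pass avoids an intermediate
   array; chunking keeps the working set cache-resident. */
static void fused_madd_chunked(const double *a, const double *b, double c,
                               double *out, size_t n) {
    for (size_t base = 0; base < n; base += CHUNK) {
        size_t end = (base + CHUNK < n) ? base + CHUNK : n;
        for (size_t i = base; i < end; i++)
            out[i] = a[i] + b[i] * c;
    }
}
```

For genuinely large problems (like the 8k x 8k matrix multiply japhb mentions), a tuned BLAS-style library with real cache blocking would still win by a wide margin; this only shows the basic shape of the technique.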
timotimo japhb: sadly, that means much more work :) 18:32
japhb timotimo: Actually ... maybe not. It may be that if you want to do that sort of thing, we instead automate using one of the fast linear algebra libraries.
timotimo true 18:33
japhb Don't get me wrong, I think your current research is very useful. I was just answering your question earlier about vectorization of large volumes of data.
timotimo alternatively, maybe the liboil compiler would actually be nice to put into moar 18:34
yeah, i think i got you right :)
TBF with the stuff i've implemented so far, i don't think matrix multiplication is really implementable 18:35
japhb timotimo: Have you looked at PDL from the Perl 5 world? 18:36
timotimo i have not
japhb It's interesting just from the point of view of the things it makes easy, and the magic it does behind the scenes to make that fast-ish. 18:37
timotimo i've looked a little into numpy 18:38
japhb But it was not trying to do true CPU vectorization, rather just able to pump large multidim arrays into optimized C routines
timotimo scipy has a thing that lets you write C++ code using some c++ library that does multidim arrays that you can slice every which way
japhb It could not, for example, hold a candle to the real C/C++ fast linear algebra stuff. Still, it beat the blazes off doing things element-wise.
timotimo last time i looked it was barely documented, barely hackable if you want very specific behaviour of the compiler, and apparently hadn't been touched in a couple of years 18:39
got a tree with 162338 nodes 18:42
routine-paths in 272.2697089
oh jeez here we go
in comparison, the stuff i pasted above had "got a tree with 5755 nodes" 18:43
routine-paths json in 90.608918: 7433700 characters 18:44
japhb m: say 162338 / 5755, 272.2697089 / 2.7811155
camelia 28.20816797.899461169
japhb m: say 162338 / 5755, ' ', 272.2697089 / 2.7811155
camelia 28.208167 97.899461169
timotimo now chrome is chugging along on the json and the react component tree
japhb Hmmm, some nonlinear effects there, but at least not O(n**2)
18:45 shareable6 joined
timotimo aye, picture the call graph: we've got a set of leaf nodes 18:45
japhb Are you sorting the keys? Looks like there might be an N log N effect
(Just staring at the ratios) 18:46
timotimo and the code goes via the parent ids towards the known roots
japhb Ah, yeah, that would do it
18:46 p6bannerbot sets mode: +v shareable6
timotimo i should be able to construct an sql query that picks every "current node"'s parent rather than going node-by-node 18:46
19:07 Kaiepi left, Kaiepi joined 19:08 p6bannerbot sets mode: +v Kaiepi
diakopter heh portable 19:58
20:26 Kaiepi left 20:36 Kaiepi joined, p6bannerbot sets mode: +v Kaiepi 20:42 Kaiepi left, Kaiepi joined 20:43 p6bannerbot sets mode: +v Kaiepi 21:49 squashable6 left 21:50 squashable6 joined, p6bannerbot sets mode: +v squashable6 21:53 squashable6 left, squashable6 joined 21:54 p6bannerbot sets mode: +v squashable6
timotimo i'm not sure where to stop adding "vectorized" stuff. like, i think coercing an array of int to an array of num and vice versa seems very useful to have 23:29
but coercing int or num to str ... useful for sure, but not appropriate for the vectorapply op, i don't think 23:30
23:54 greppable6 left, greppable6 joined 23:55 p6bannerbot sets mode: +v greppable6