00:06 vendethiel joined
timotimo by using nqp::bindpos directly in the code, i get the true potential of this code 00:33
which is a median 48 FPS with a 95th percentile of 50.9 fps (and a 5th percentile of 34.3 fps) 00:34
and only two GC runs during the whole thing, even though i'm filling an array with frame timings
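For readers following the numbers: the median and percentile figures above can be derived from an array of per-frame timings roughly as sketched below. This is a minimal illustration, not the code timotimo ran; the function name `fps_percentiles` and the nearest-rank percentile method are assumptions made for the example.

```python
import math

def fps_percentiles(frame_times_s, points=(5, 50, 95)):
    """Turn per-frame durations (in seconds) into FPS values and
    return the requested percentiles using the nearest-rank method."""
    fps = sorted(1.0 / t for t in frame_times_s if t > 0)
    if not fps:
        raise ValueError("need at least one positive frame time")
    result = {}
    for p in points:
        # nearest rank: the smallest sample such that at least p% of
        # all samples are less than or equal to it
        rank = max(1, math.ceil(p / 100 * len(fps)))
        result[p] = fps[rank - 1]
    return result

if __name__ == "__main__":
    # four hypothetical frame durations: 10, 20, 40 and 80 fps
    print(fps_percentiles([0.1, 0.05, 0.025, 0.0125]))
```

Note that a slow frame gives a low FPS value, so the 5th percentile is the "bad frames" end of the distribution.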
00:46 vendethiel joined
timotimo by turning Bool.roll into rand_n(1e0) < 0.5e0 i get between 158 and 83 fps 00:50
(though 83 is the 5th percentile, the 25th percentile and 75th percentile are 141.5 and 154 fps respectively)
diakopter what is fps in this case 01:01
timotimo frames per second of building a 320x240px big image of black and white pixels
in this case i'm using a pixel format that has 8 bits per pixel: 3, 3 and 2 bits for R, G and B respectively
diakopter ah 01:02
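As an illustration of the 8-bit format mentioned above, here is a sketch of packing a colour into a single RGB332 byte, assuming red occupies the top three bits, green the middle three and blue the low two. The helper name `pack_rgb332` is made up for this example and is not taken from the SDL2 bindings being discussed.

```python
def pack_rgb332(r, g, b):
    """Pack 8-bit-per-channel colour into one byte: RRRGGGBB."""
    # keep the 3 most significant bits of red and green, 2 of blue
    return ((r >> 5) << 5) | ((g >> 5) << 2) | (b >> 6)

WIDTH, HEIGHT = 320, 240

# one byte per pixel in RGB332, three bytes per pixel in RGB888
rgb332_frame = bytearray(WIDTH * HEIGHT)
rgb888_frame = bytearray(WIDTH * HEIGHT * 3)

if __name__ == "__main__":
    print(hex(pack_rgb332(255, 255, 255)))  # white
    print(len(rgb332_frame), len(rgb888_frame))
```

A 320x240 frame is therefore 76800 bytes in RGB332 and 230400 bytes in RGB888.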
timotimo now that i don't have the p6box_i problem any more (because i uglified the code) i can try RGB888 again 01:05
that gives me a peak framerate of 173 and median 159 01:06
that doesn't seem right :D
but what do i know, perhaps it's faster because the hardware likes that format much more? but the functions that'd be responsible for moving that texture data around didn't even show up in my profiles 01:07
diakopter I don't see how it wouldn't be right 01:09
doesn't seem too fast to me
timotimo it's not fast, no
there's still lots and lots of overhead 01:10
diakopter I mean, thousands of frames per second might be too fast
timotimo yeah
13.18% "self" time in get_int says perf 01:13
that seems to be the candidate from P6opaque maybe? i can't really tell from the disassembly alone, as for some reason it doesn't seem to show the source file name 01:14
yeah, looks like it is 01:15
7.8% self-time spent in generating more numbers from our tiny mersenne twister random number generator
5.5% self-time spent in sc_get_object. interesting.
3.4% and 3.2% in MVM_sc_get_sc_object and MVM_sc_get_sc 01:16
the first time i see something from i965_dri.so (intel graphics card driver) show up is at 0.25% children and self-time 01:17
i'm heading towards bed 01:19
01:50 vendethiel joined 04:14 vendethiel joined 06:08 vendethiel joined 07:30 vendethiel- joined 07:45 domidumont joined 07:47 FROGGS joined 07:51 domidumont joined 08:06 zakharyas joined
lizmat timotimo: I'm surprised at that difference with Bool.roll, as that is defined as: 08:35
multi method roll(Bool:U:) { nqp::p6bool(nqp::isge_n(nqp::rand_n(2e0), 1e0)) }
looks to me like that should get inlined, and then only the p6bool would be extra?
which I was told, was a very cheap operation? (apparently not?)
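The two coin-flip formulations being compared are logically equivalent, which is why the speed difference is surprising. The sketch below only demonstrates that equivalence; it says nothing about what MoarVM actually generates, and the function names are invented for the example.

```python
import random

def roll_scaled(rng):
    """Same shape as the quoted Bool.roll: a number in [0, 2) compared against 1."""
    return rng.random() * 2.0 >= 1.0

def roll_threshold(rng):
    """Same shape as the hand-written version: a number in [0, 1) compared against 0.5."""
    return rng.random() < 0.5

if __name__ == "__main__":
    rng = random.Random(7)
    flips = 10_000
    heads = sum(roll_scaled(rng) for _ in range(flips))
    print(heads / flips)  # close to 0.5 for either formulation
```

For the same underlying draw the two functions give opposite answers, but both are fair coins, so they are interchangeable for filling an image with random black and white pixels.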
09:12 vendethiel joined 09:30 zakharyas joined
jnthn p6bool is cheap; will have to look at exactly what code we're producing after optimization there... 10:02
10:11 vendethiel joined
timotimo oh. roll isn't being jitted, nor inlined. it *could* be the jit doesn't know about it yet, eh? 12:54
jnthn Is it being specialized? 13:00
timotimo the frame? yes 13:04
jnthn k
Maybe we don't JIT rand_n yet?
timotimo p6bool could probably be specialized into a branch with two wval or getspeshslot calls at the end of 'em 13:05
we do jit randscale_n, which is what rand_n turns into, iiuc 13:06
jnthn ah, k 13:07
Guess we'll have to look at the baillog then :)
timotimo huh. i see a method roll be jitted here 13:09
let's see ... 13:10
it shows up as explicitly spesh-entered in the routines tab, though
could be the same kind of failure we're having related to counting the int objects that get boxed by returning from postcircumfix:<[ ]> or ASSIGN-POS 13:11
perhaps multiple consecutive inlines break something in our pipeline
i randomly saw atposref_i doesn't get jitted; that could have some effect on the performance of lowercase-int arrays 13:13
ASSIGN-POS 13:15
perl#sources/51E302443A2C8FF185ABC10CA1E5520EFEE885A1 (NativeCall::Types):83
^- this. i like this.
/home/timo/perl6/ecosystem/SDL2_raw-p6/lib/.precomp/73D4D4C93FE31C1CCF2B20A1F7D157ECB4E85699.1456073030.9528/B2/B2B2B60DC07CD98A96F95F0EDE4CD9790DD4C56D:<unknown> <- this ... not so much 13:16
BinGOs is 'compiling src/core/interp.o' supposed to take a long time? 14:07
FROGGS BinGOs: with clang, yes 14:08
takes seconds with gcc, but potentially minutes with clang
timotimo yeah, clang is obnoxiously slow at that one
FROGGS (it is a huuuuuuge switch statement) 14:09
jnthn Well, huge computed goto in clang/gcc also
BinGOs 23409 bingos 1 103 0 98708K 93252K CPU2 2 5:21 100.00% clang
FROGGS yeah
BinGOs sighs.
jnthn optimizers gone wild
BinGOs okay I shall exercise patience.
geekosaur there's a file in the clang/llvm source code that translates ARM assembler opcodes. it can take hours to compile with clang 14:10
BinGOs I have another 3 CPUs
14:27 colomon joined
BinGOs This is Rakudo version 2016.02-5-g96a1954 built on MoarVM version 2016.02 15:12
yay.
lizmat :-) 15:13
15:27 vendethiel joined 16:59 colomon joined 17:29 vendethiel joined 18:22 patrickz joined 19:22 FROGGS joined 19:34 colomon joined 20:14 domidumont joined 20:59 TimToady joined 21:10 colomon joined
timotimo what's the major difficulty i'm likely not seeing for moving freeing of stuff into a separate thread? 22:20
jnthn Well, the icky races mostly
What if it's still freeing when the mutator triggers another collection, etc. 22:21
timotimo mhm
what kind of data structure do we have that'd be good for passing the data around? just an array with a semaphore? perhaps one inbox-array per thread that's under the control of the freeing thread?
jnthn We don't need to pass data around really...
Well, hm 22:22
It's a bit tricky :)
timotimo there needs to be some kind of command queue or something
jnthn Sorta
I was thinking of running a general "service" thread that can do this, but also spesh
Need to think about it when I've not been busy thinking about $dayjob stuff :)
timotimo ah, right, right 22:23
maybe it's not something i should try to tackle without some design work from you up front
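To make the "command queue" idea concrete, here is a minimal sketch of a service thread that drains an inbox of batches to free. It is not MoarVM's design (none had been settled in this conversation); the class `FreeService` and its methods are hypothetical. The `wait_idle` call is one simple answer to the race jnthn raises: the mutator blocks until all pending frees are done before starting another collection.

```python
import queue
import threading

class FreeService:
    """Hypothetical service thread that releases batches of objects."""

    def __init__(self, release):
        self._release = release          # callback that frees one item
        self._inbox = queue.Queue()      # thread-safe command queue
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            batch = self._inbox.get()
            try:
                if batch is None:        # sentinel: stop the thread
                    return
                for item in batch:
                    self._release(item)
            finally:
                self._inbox.task_done()

    def submit(self, batch):
        """Hand a batch of dead objects to the service thread."""
        self._inbox.put(list(batch))

    def wait_idle(self):
        """Block until every submitted batch has been processed."""
        self._inbox.join()

    def shutdown(self):
        self._inbox.put(None)
        self._worker.join()

if __name__ == "__main__":
    freed = []
    svc = FreeService(freed.append)
    svc.submit(["obj-a", "obj-b"])
    svc.wait_idle()                      # e.g. before the next collection
    print(freed)
    svc.shutdown()
```

A real implementation would have to decide whether blocking the mutator like this gives back most of the time saved, which is part of the design work mentioned above.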
thoughts about a heap explorer and potential ways to implement it have been terrorizing me when i tried to sleep myself to health during the day 22:24
i was thinking that perhaps the explorer might want to be a separate process. with ptrace on linux it could attach to the moarvm process and then read /proc/$pid/mem with seek and read to read out data 22:27
but that's hardly cross-platform
OTOH, if i have to stop the whole thing anyway, might as well have something inside moar's process (dynamically loaded comes to mind) and dig around in its own memory, because that's bound to be quite a bit faster than going through syscalls 22:38
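For reference, the out-of-process approach boils down to a seek and a read on `/proc/<pid>/mem`. The sketch below is Linux-only and reads the current process's own memory so that it runs without ptrace permissions; inspecting another process would additionally require attaching to it or equivalent privileges. The function name `read_process_memory` is an assumption for this example.

```python
import ctypes
import os

def read_process_memory(pid, address, length):
    """Read `length` bytes at `address` from /proc/<pid>/mem (Linux only).

    `pid` may be a number or the string "self".
    """
    with open(f"/proc/{pid}/mem", "rb", buffering=0) as mem:
        mem.seek(address)
        return mem.read(length)

if __name__ == "__main__":
    # put some known bytes at a known address in our own process
    buf = ctypes.create_string_buffer(b"heap snapshot")
    if os.path.exists("/proc/self/mem"):
        print(read_process_memory("self", ctypes.addressof(buf), 13))
```

Every such read is a syscall, which is the overhead the in-process alternative would avoid.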
23:01 BinGOs joined, timotimo joined, harrow joined, ashleydev joined, lnx joined, geekosaur joined, camelia joined, [Coke] joined, moritz joined, mst joined, cognominal joined, xiaomiao joined, leedo joined, nebuchadnezzar joined, pochi joined, ilmari joined, jnthn joined, nwc10 joined, retupmoca joined, dalek joined, _longines joined, synopsebot6 joined, hoelzro joined, flussence joined, sivoais joined, avar joined, FROGGS joined 23:03 pyrimidine joined, orbus joined, mtj_ joined, khagan joined, colomon joined 23:06 patrickz joined, ggoebel16 joined, lizmat joined, nine joined, ChanServ joined 23:07 lizmat_ joined 23:09 vendethiel joined