00:06
vendethiel joined
|
|||
timotimo | by using nqp::bindpos directly in the code, i get the true potential of this code | 00:33 | |
which is a median 48 FPS with a 95th percentile of 50.9 fps (and a 5th percentile of 34.3 fps) | 00:34 | ||
and only two GC runs during the whole thing, even though i'm filling an array with frame timings | |||
00:46
vendethiel joined
|
|||
timotimo | by turning Bool.roll into rand_n(1e0) < 0.5e0 i get between 158 and 83 fps | 00:50 | |
(though 83 is the 5th percentile, the 25th percentile and 75th percentile are 141.5 and 154 fps respectively) | |||
diakopter | what is fps in this case | 01:01 | |
timotimo | frames per second of building a 320x240px big image of black and white pixels | ||
in this case i'm using a pixelmode that has 8 bits per pixel, 332 bits for RGB respectively | |||
diakopter | ah | 01:02 | |
timotimo | now that i don't have the p6box_i problem any more (because i uglified the code) i can try RGB888 again | 01:05 | |
that gives me a peak framerate of 173 and median 159 | 01:06 | ||
that doesn't seem right :D | |||
but what do i know, perhaps it's faster because the hardware likes that format much more? but the functions that'd be responsible for moving that texture data around didn't even show up in my profiles | 01:07 | ||
diakopter | I don't see how it wouldn't be right | 01:09 | |
doesn't seem too fast to me | |||
timotimo | it's not fast, no | ||
there's still lots and lots of overhead | 01:10 | ||
diakopter | I mean, thousands of frames per second might be too fast | ||
timotimo | yeah | ||
13.18% "self" time in get_int says perf | 01:13 | ||
that seems to be the candidate from P6opaque maybe? i can't really tell from the disassembly alone, as for some reason it doesn't seem to show the source file name | 01:14 | ||
yeah, looks like it is | 01:15 | ||
7.8% self-time spent in generating more numbers from our tiny mersenne twister random number generator | |||
5.5% self-time spent in sc_get_object. interesting. | |||
3.4% and 3.2% in MV_sc_get_sc_object and MVM_sc_get_sc | 01:16 | ||
the first time i see something from i965_dri.so (intel graphics card driver) show up is at 0.25% children and self-time | 01:17 | ||
i'm heading towards bed | 01:19 | ||
01:50
vendethiel joined
04:14
vendethiel joined
06:08
vendethiel joined
07:30
vendethiel- joined
07:45
domidumont joined
07:47
FROGGS joined
07:51
domidumont joined
08:06
zakharyas joined
|
|||
lizmat | timotimo: I'm surprised at that difference with Bool.roll, as that is defined as: | 08:35 | |
multi method roll(Bool:U:) { nqp::p6bool(nqp::isge_n(nqp::rand_n(2e0), 1e0)) | |||
looks like to me that should get inlined, and then only the p6bool would be extra ? | |||
which I was told, was a very cheap operation? (apparently not?) | |||
09:12
vendethiel joined
09:30
zakharyas joined
|
|||
jnthn | p6bool is; will have to look at exactly what code we're producing after optimization there... | 10:02 | |
10:11
vendethiel joined
|
|||
timotimo | oh. roll isn't being jitted, nor inlined. it *could* be the jit doesn't know about it yet, eh? | 12:54 | |
jnthn | Is it being specialized? | 13:00 | |
timotimo | the frame? yes | 13:04 | |
jnthn | k | ||
Maybe we don't JIT rand_n yet? | |||
timotimo | p6bool could probably be specialized into a branch with two wval or getspeshslot calls at the end of 'em | 13:05 | |
we do jit randscale_n, which is what rand_n turns into, iiuc | 13:06 | ||
jnthn | ah, k | 13:07 | |
Guess we'll have to look at the baillog then :) | |||
timotimo | huh. i see a method roll be jitted here | 13:09 | |
let's see ... | 13:10 | ||
it shows up as explicitly spesh-entered in the routines tab, though | |||
could be the same kind of failure we're having related to counting the int objects that get boxed by returning from postcircumfix:<[ ]> or ASSIGN-POS | 13:11 | ||
perhaps multiple consecutive inlines break something in our pipeline | |||
i randomly saw atposref_i doesn't get jitted; that could have some effect on the performance of lowercase-int arrays | 13:13 | ||
ASSIGN-POS | 13:15 | ||
perl#sources/51E302443A2C8FF185ABC10CA1E5520EFEE885A1 (NativeCall::Types):83 | |||
^- this. i like this. | |||
/home/timo/perl6/ecosystem/SDL2_raw-p6/lib/.precomp/73D4D4C93FE31C1CCF2B20A1F7D157ECB4E85699.1456073030.9528/B2/B2B2B60DC07CD98A96F95F0EDE4CD9790DD4C56D:<unknown> <- this ... not so much | 13:16 | ||
BinGOs | is 'compiling src/core/interp.o' supposed to take a long time? | 14:07 | |
FROGGS | BinGOs: with clang, yes | 14:08 | |
takes seconds with gcc, but potentially minutes with clang | |||
timotimo | yeah, clang is obnoxiously slow at that one | ||
FROGGS | (it is a huuuuuuge switch statement) | 14:09 | |
jnthn | Well, huge computed goto in clang/gcc also | ||
BinGOs | 23409 bingos 1 103 0 98708K 93252K CPU2 2 5:21 100.00% clang | ||
FROGGS | yeah | ||
BinGOs sighs. | |||
jnthn | optimizers gone wild | ||
BinGOs | okay I shall exercise patience. | ||
geekosaur | there's a file in the clang/llvm source code that translates ARM assembler opcodes. it can take hours to compile with clang | 14:10 | |
BinGOs | I have another 3 CPUs | ||
14:27
colomon joined
|
|||
BinGOs | This is Rakudo version 2016.02-5-g96a1954 built on MoarVM version 2016.02 | 15:12 | |
yay. | |||
lizmat | :-) | 15:13 | |
15:27
vendethiel joined
16:59
colomon joined
17:29
vendethiel joined
18:22
patrickz joined
19:22
FROGGS joined
19:34
colomon joined
20:14
domidumont joined
20:59
TimToady joined
21:10
colomon joined
|
|||
timotimo | what's the major difficulty i'm likely not seeing for moving freeing of stuff into a separate thread? | 22:20 | |
jnthn | Well, the icky races mostly | ||
What if it's still freeing when the mutator triggers another collection, etc. | 22:21 | ||
timotimo | mhm | ||
what kind of datastructure do we have that'd be good for passing the data around? just an array with a semaphor? perhaps one inbox-array per thread that's under the control of the freeing thread? | |||
jnthn | We don't need to pass data around really... | ||
Well, hm | 22:22 | ||
It's a bit tricky :) | |||
timotimo | there needs to be some kind of command queue or something | ||
jnthn | Sorta | ||
I was thinking of running a general "service" thread that can do this, but also spesh | |||
Need to think about it when I've not been busy thinking about $dayjob stuff :) | |||
timotimo | ah, right, right | 22:23 | |
maybe it's not something i should try to tackle without some design work from you up front | |||
thoughts about a heap explorer and potential ways to implement it have been terrorizing me when i tried to sleep myself to health during the day | 22:24 | ||
i was thinking that perhaps the explorer might want to be a separate process. with ptrace on linux it could attach to the moarvm process and then read /proc/$pid/mem with seek and read to read out data | 22:27 | ||
but that's hardly cross-platform | |||
OTOH, if i have to stop the whole thing anyway, might as well have something inside moar's process (dynamically loaded comes to mind) and dig around in its own memory, because that's bound to be quite a bit faster than going through syscalls | 22:38 | ||
23:01
BinGOs joined,
timotimo joined,
harrow joined,
ashleydev joined,
lnx joined,
geekosaur joined,
camelia joined,
[Coke] joined,
moritz joined,
mst joined,
cognominal joined,
xiaomiao joined,
leedo joined,
nebuchadnezzar joined,
pochi joined,
ilmari joined,
jnthn joined,
nwc10 joined,
retupmoca joined,
dalek joined,
_longines joined,
synopsebot6 joined,
hoelzro joined,
flussence joined,
sivoais joined,
avar joined,
FROGGS joined
23:03
pyrimidine joined,
orbus joined,
mtj_ joined,
khagan joined,
colomon joined
23:06
patrickz joined,
ggoebel16 joined,
lizmat joined,
nine joined,
ChanServ joined
23:07
lizmat_ joined
23:09
vendethiel joined
|