01:41 FROGGS_ joined 01:58 ilbot3 joined
samcv well i have not succeeded in getting gcc to vectorize it 02:11
MasterDuke `#pragma ivdep` doesn't work? 02:12
samcv doesn't do anything nope
MasterDuke huh 02:16
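A note on the pragma discussed above: GCC spells its version `#pragma GCC ivdep`, while the unprefixed `#pragma ivdep` is the Intel compiler spelling. GCC ignores pragmas it does not recognize and only warns under `-Wunknown-pragmas`. The log does not say which spelling was tried, so this may or may not be the cause. A minimal sketch, assuming a hypothetical `add_into` function that is not from MoarVM:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example, not MoarVM code: add src into dst element-wise.
 * "#pragma GCC ivdep" tells GCC to assume the loop has no loop-carried
 * dependencies that would prevent vectorization. */
void add_into(int32_t *dst, const int32_t *src, size_t n) {
    size_t i;
#pragma GCC ivdep
    for (i = 0; i < n; i++)
        dst[i] += src[i];
}
```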
samcv also nicely lldb can do it using just sse1/2 not even needing more recent ones. tried for a long time on gcc. and using debug messages and it doesn't seem to know how to assign from one type to another 02:36
i saved my old version which i haven't pushed anywhere yet. but that doesn't vectorize, though it's faster than what we do now 02:38
MasterDuke unfortunately i know almost absolutely nothing about vectorizing code. hopefully someone else can help 02:44
samcv it could be that gcc doesn't support it. though i may try in a debian chroot since my distro uses a hardened gcc 02:48
MasterDuke what distro? 02:54
samcv Sabayon
it's based on gentoo
MasterDuke i've heard of it, never used it 02:55
samcv yeah no difference in a chroot. it vectorizes some things. but not what i want 03:00
and the gcc on debian testing is 7.3, i have version 6 on mine atm 03:06
though i'll try a couple more options
MasterDuke: success! src/strings/ops.c:280:17: note: loop vectorized 03:25
had to simplify more code to get it to work and use gcc 7 but i got it!
japhb samcv++ # Persistence! 03:31
samcv it's vectorizing the copy from 32->8 bits and the 8->32 bit ones. but not the bitwise operation that checks if the 32bit string fits in 8 bits 03:38
it tries to tell me " note: reduction: not commutative/associative: " (luckily the gcc 7 vectorization debug messages are 30% better than gcc 6). though i guess it's just not detecting it? since the order shouldn't matter for OR 03:40
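The 32->8 and 8->32 bit copies mentioned above are simple narrowing and widening loops. A minimal sketch of that loop shape, assuming hypothetical helper names and signed element types; this is not the actual code in src/strings/ops.c:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, not the actual MoarVM code.
 * To see GCC's vectorizer reports ("note: loop vectorized"),
 * compile with something like:
 *   gcc -O3 -fopt-info-vec-optimized -fopt-info-vec-missed -c file.c */

/* Narrow 32-bit values to 8 bits. The caller must already know
 * that every value fits in 8 bits. */
void narrow_32_to_8(int8_t *restrict dst, const int32_t *restrict src, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = (int8_t)src[i];
}

/* Widen 8-bit values back to 32 bits (sign-extending). */
void widen_8_to_32(int32_t *restrict dst, const int8_t *restrict src, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}
```

The `restrict` qualifiers are an assumption of this sketch: they promise the compiler that the buffers do not overlap, which usually makes such loops easier to vectorize.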
wow. well i got it vectorized but I had to do something pretty stupid to get it to work 04:02
int val2 = ((active_blob[i] & 0xffffff80) + 0x80) & (0xffffff80-1);
val |= val2;
^ this vectorizes with gcc 7 but just doing val |= ... doesn't...
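Working through the arithmetic of the expression above: the per-element term is zero exactly when the top 25 bits of the value are all zero or all one, i.e. when the value lies in -128..127. OR-ing the terms together therefore gives zero only if every element fits in a signed 8-bit slot. The log does not explain why GCC 7 vectorizes this form but not a plain `val |= ...`. A minimal sketch, assuming a hypothetical function name and unsigned arithmetic to avoid signed overflow:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not the actual MoarVM code.
 * Returns 0 if every element of buf is in -128..127,
 * and a nonzero value otherwise. */
uint32_t any_outside_int8(const int32_t *buf, size_t n) {
    uint32_t val = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t x = (uint32_t)buf[i];
        /* Keep the top 25 bits, add 0x80 so "all ones" wraps to 0 and
         * "all zeros" becomes 0x80, then clear bit 7. Only values whose
         * top 25 bits are mixed leave a nonzero result. */
        val |= ((x & 0xffffff80u) + 0x80u) & (0xffffff80u - 1u);
    }
    return val;
}
```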
i go from 16->13 seconds on gcc6 on that string test file i've been playing around with. i think clang5 goes from 11->5.5 seconds with the changes 04:30
going to check gcc 7 now
06:00 mojca joined
nine samcv: I hope on gcc 7 it didn't go from 16 seconds to 2 hours ;) 06:43
07:06 domidumont joined 07:13 domidumont joined 08:18 domidumont joined
dogbert17 timotimo: I've tried to break your 'dont_gc_in_spesh' branch but I have failed :) 12:10
timotimo cool 12:11
dogbert17 still interested in a new Coverity Scan though :) 12:12
RT #130370 13:14
13:59 AlexDaniel joined 14:58 undersightable6 joined 15:02 zakharyas joined 15:07 dalek joined, p6lert joined, synopsebot joined, Geth_ joined 15:35 Kaypie joined 16:53 mojca1 joined 17:32 zakharyas joined
timotimo impressive. one Bridge i see in this speshlog could be 2 instructions, but is 7 ... ok maybe it would have to be 3 because takedispatcher probably can't just be thrown out? 18:39
and again there's this case where it does p6oget_o twice in a row 18:41
Empty ends up capturing lexicals and immediately after that overwriting the register it put that in with an argument's value 18:44
huh. there's no code in there, just a class body that was put there with a BEGIN phaser 18:45
i mean, it's only 40.5 milliseconds from 64k calls; 0.04% of run time 18:46
for some reason its push-all actually grabs the argument value, even though it's just a $ value 18:47
oh, that'd be the self, i imagine
lizmat how difficult would it be to JIT template nqp::ctx ?
timotimo no, it has two getargs, one being the self; self isn't even used, the body is empty
lizmat timotimo: it's preventing "Promise.start" from being JITted 18:48
timotimo that's a BAIL?
lizmat BAIL: op <ctx>
timotimo i see 18:49
lizmat is what it says in the JIT log
timotimo looks trivial
gimme a sec
ah, a bit harder than i imagined, but no big deal 18:50
what's your test code? 18:53
lizmat another op that blocks DYNAMIC from being JITted is getlexreldyn
await do for ^100000 { start { } }
my estimate is that ^^^ would become 7% faster 18:55
timotimo ok, let me look at getlexreldyn real quick
eh, that's one of the ops that wants a repr check in front
they are ugly to implement 18:56
lizmat ah, ok. well, let me just put them in a ticket then
Geth_ MoarVM: fc1bb31263 | (Timo Paulssen)++ | 2 files
jit ctx (to benefit Promise.start)
timotimo sure 18:57
oh, you bumped right away? 18:59
maybe i should have at least run "make test" 19:00
lizmat ah... ok, well, let's see :-)
Moar / NQP build ok so far 19:01
rakudo builds ok, make test ok 19:04
timotimo OK
should be safe, then
lizmat whee, Promise.start now jits :-) 19:06
timotimo that's good 19:07
lizmat alas, the difference in performance is not as much as I hoped :-( 19:08
timotimo what. what. WHAT 19:09
there's a PHI in the middle of a block
that is *not* right
lizmat something related to my question ? 19:10
timotimo no, just in the speshlog for that example code 19:12
lizmat should I make a ticket for that ? 19:14
fwiw, spectest is fine
timotimo yeah, a moarvm ticket, maybe titled "spesh sometimes leaves PHI nodes in the middle of blocks"
it's not dangerous; PHI doesn't end up in the generated code. but it is a sign that somewhere in spesh's code there's a mild confusion 19:15
lizmat and we don't want that
timotimo yeah, it could mean something or it could mean nothing
lizmat so, "append_ins: <PHI>" is what's wrong ? 19:16
timotimo no, that's just a consequence; you'd have to look at the speshlog instead 19:17
lizmat ah, ok
timotimo PHI nodes only stay out of the jit log because i put code in to hide them, but it only looks at the beginning of BBs, which is the only place that should ever have PHI nodes
ok what is this about; there's a call to set_docee in the start method? 19:20
ok, it's inlined
lizmat set_docee?? 19:21
timotimo oh perhaps it's block's clone or something
lizmat yeah, looks like
that's in Block.WHY, but I don't see a WHY in the profile ? 19:22
timotimo it's inlined, but that doesn't necessarily mean it gets called 19:23
lizmat probably in BOOTSTRAP, line 1719 19:24
which is actually the most often called block in the profile (looks like 10x for each start { }) 19:25
the Block apparently has a $!why ??? 19:26
timotimo it most likely doesn't 19:27
i didn't say set_docee was being called, just that it shows up in the speshlog
lizmat ah, ok 19:28
timotimo so if it is run on a block that does have a $!why it will properly be copied
lizmat but to appear in the spesh log, Block.clone would have to be called, right ? 19:31
jnthn timotimo: Branch elimination + BB fusing 20:07
Presuming the PHI only reads one input
Geth_ MoarVM/spesh-refactor-iffy: 7 commits pushed by (Bart Wiegmans)++ 20:25
20:26 mojca1 left 21:00 Kaiepi joined
timotimo so far i've only seen them with one in and one out, yeah 21:01
but it feels really wrong to have them left over
you know, i once had a branch that threw out single-arg phi nodes. it made things violently explode :) 21:02
lizmat and another Perl 6 Weekly hits the Net: p6weekly.wordpress.com/2018/03/31/...-released/ 22:05
22:42 Kaypie joined 22:57 Kaypie joined 23:16 ZofBot joined 23:34 SourceBaby joined
samcv lizmat++ 23:41