01:41
FROGGS_ joined
01:58
ilbot3 joined
samcv | well i have not succeeded in getting gcc to vectorize it | 02:11 | |
MasterDuke | `#pragma ivdep` doesn't work? | 02:12 | |
samcv | doesn't do anything nope | ||
MasterDuke | huh | 02:16 | |
samcv | also, nicely, lldb can do it using just sse1/2, not even needing more recent ones. i tried for a long time on gcc, and going by the debug messages it doesn't seem to know how to assign from one type to another | 02:36 | |
i saved my old version which i haven't pushed anywhere yet. but that doesn't vectorize, though it's faster than what we do now | 02:38 | ||
MasterDuke | unfortunately i know almost absolutely nothing about vectorizing code. hopefully someone else can help | 02:44 | |
samcv | it could be that gcc doesn't support it. though i may try in a debian chroot since my distro uses a hardened gcc | 02:48 | |
MasterDuke | what distro? | 02:54 | |
samcv | Sabayon | ||
it's based on gentoo | |||
MasterDuke | i've heard of it, never used it | 02:55 | |
samcv | yeah no difference in a chroot. it vectorizes some things. but not what i want | 03:00 | |
and the gcc on debian testing is 7.3, i have version 6 on mine atm | 03:06 | ||
though i'll try a couple more options | |||
MasterDuke: success! src/strings/ops.c:280:17: note: loop vectorized | 03:25 | ||
had to simplify more code to get it to work and use gcc 7 but i got it! | |||
japhb | samcv++ # Persistence! | 03:31 | |
samcv | it's vectorizing the copy from 32->8 bits and the 8->32 bit ones. but not the bitwise operation that checks if the 32bit string fits in 8 bits | 03:38 | |
it tries to tell me " note: reduction: not commutative/associative: " (luckily the gcc 7 vectorization debug messages are 30% better than gcc 6). though i guess it's just not detecting it? since with OR the order shouldn't matter | 03:40 | ||
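For readers following along, here is a minimal sketch of the two kinds of copy loop mentioned above (32->8 bit and 8->32 bit). The function names, the `restrict` qualifiers, and the choice of `int32_t`/`int8_t` are assumptions for illustration, not the code at src/strings/ops.c. With GCC, building with `-O3 -fopt-info-vec` is one way to get reports of the kind quoted above; whether a given loop vectorizes depends on the compiler version and target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: narrow 32-bit units to 8 bits.
 * The caller is assumed to have checked that every value fits. */
static void narrow_32_to_8(const int32_t *restrict src,
                           int8_t *restrict dst, size_t n) {
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = (int8_t)src[i];
}

/* Hypothetical helper: widen 8-bit units back to 32 bits (sign-extending). */
static void widen_8_to_32(const int8_t *restrict src,
                          int32_t *restrict dst, size_t n) {
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}
```

Both loops have a fixed trip count, no early exit, and no dependency between iterations, which is roughly the shape auto-vectorizers handle best.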
wow. well i got it vectorized but I had to do something pretty stupid to get it to work | 04:02 | ||
int val2 = ((active_blob[i] & 0xffffff80) + 0x80) & (0xffffff80-1); | |||
val |= val2; | |||
^ this vectorizes with gcc 7 but just doing val |= ... doesn't... | |||
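A sketch of how the expression above can serve as a "fits in 8 bits" check, assuming the units are signed 32-bit values and the target is a signed 8-bit range. Working through the arithmetic, the masked-add-mask result is zero exactly when the value lies in -128..127, so OR-ing it across the string and comparing to zero answers the question without an early exit. The function names and the use of `uint32_t` for the accumulator are assumptions; this is not the MoarVM source, and no claim is made about which compilers vectorize it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branch-free check using the expression from the log (hypothetical wrapper).
 * Unsigned arithmetic is used so the wrap-around on "+ 0x80" is well defined. */
static int fits_in_8_bits(const int32_t *blob, size_t n) {
    assert(blob != NULL || n == 0);
    uint32_t val = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t val2 = (((uint32_t)blob[i] & 0xffffff80u) + 0x80u)
                        & (0xffffff80u - 1u);
        val |= val2;   /* stays 0 only if every unit is in -128..127 */
    }
    return val == 0;
}

/* Straightforward reference with an early exit, for comparison. */
static int fits_in_8_bits_reference(const int32_t *blob, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (blob[i] < -128 || blob[i] > 127)
            return 0;
    return 1;
}
```

The reference version is easier to read but its early `return` gives the loop a data-dependent exit, which tends to get in the way of auto-vectorization.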
i go from 16->13 seconds on gcc6 on that string test file i've been playing around with. i think clang5 goes from 11->5.5 seconds with the changes | 04:30 | ||
going to check gcc 7 now | |||
06:00
mojca joined
nine | samcv: I hope on gcc 7 it didn't go from 16 seconds to 2 hours ;) | 06:43 | |
07:06
domidumont joined
07:13
domidumont joined
08:18
domidumont joined
dogbert17 | timotimo: I've tried to break your 'dont_gc_in_spesh' branch but I have failed :) | 12:10 | |
timotimo | cool | 12:11 | |
dogbert17 | still interested in a new Coverity Scan though :) | 12:12 | |
RT #130370 | 13:14 | ||
13:59
AlexDaniel joined
14:58
undersightable6 joined
15:02
zakharyas joined
15:07
dalek joined,
p6lert joined,
synopsebot joined,
Geth_ joined
15:35
Kaypie joined
16:53
mojca1 joined
17:32
zakharyas joined
timotimo | impressive. one Bridge i see in this speshlog could be 2 instructions, but is 7 ... ok maybe it would have to be 3 because takedispatcher probably can't just be thrown out? | 18:39 | |
and again there's this case where it does p6oget_o twice in a row | 18:41 | ||
Empty ends up capturing lexicals and immediately after that overwriting the register it put that in with an argument's value | 18:44 | ||
huh. there's no code in there, just a class body that was put there with a BEGIN phaser | 18:45 | ||
i mean, it's only 40.5 milliseconds from 64k calls; 0.04% of run time | 18:46 | ||
for some reason its push-all actually grabs the argument value, even though it's just a $ value | 18:47 | ||
oh, that'd be the self, i imagine | |||
lizmat | how difficult would it be to JIT template nqp::ctx ? | ||
timotimo | no, it has two getargs, one being the self; self isn't even used, the body is empty | ||
lizmat | timotimo: it's preventing "Promise.start" from being JITted | 18:48 | |
timotimo | that's a BAIL? | ||
lizmat | BAIL: op <ctx> | ||
timotimo | i see | 18:49 | |
lizmat | is what it says in the JIT log | ||
timotimo | looks trivial | ||
gimme a sec | |||
ah, a bit harder than i imagined, but no big deal | 18:50 | ||
what's your test code? | 18:53 | ||
lizmat | another op that blocks DYNAMIC from being JITted is getlexreldyn | ||
await do for ^100000 { start { } } | |||
my estimate is that ^^^ would become 7% faster | 18:55 | ||
timotimo | ok, let me look at getlexreldyn real quick | ||
eh, that's one of the ops that wants a repr check in front | |||
they are ugly to implement | 18:56 | ||
lizmat | ah, ok. well, let me just put them in a ticket then | ||
ok? | |||
Geth_ | MoarVM: fc1bb31263 | (Timo Paulssen)++ | 2 files: jit ctx (to benefit Promise.start) | ||
timotimo | sure | 18:57 | |
oh, you bumped right away? | 18:59 | ||
maybe i should have at least run "make test" | 19:00 | ||
lizmat | ah... ok, well, let's see :-) | ||
Moar / NQP build ok so far | 19:01 | ||
rakudo builds ok, make test ok | 19:04 | ||
timotimo | OK | ||
should be safe, then | |||
lizmat | whee, Promise.start now jits :-) | 19:06 | |
timotimo | that's good | 19:07 | |
lizmat | alas, the difference in performance is not as much as I hoped :- | 19:08 | |
timotimo | what. what. WHAT | 19:09 | |
there's a PHI in the middle of a block | |||
that is *not* right | |||
lizmat | something related to my question ? | 19:10 | |
timotimo | no, just in the speshlog for that example code | 19:12 | |
lizmat | should I make a ticket for that ? | 19:14 | |
fwiw, spectest is fine | |||
timotimo | yeah, a moarvm ticket, maybe titled "spesh sometimes leaves PHI nodes in the middle of blocks" | ||
it's not dangerous; PHI doesn't end up in the generated code. but it is a sign that somewhere in spesh's code there's a mild confusion | 19:15 | ||
lizmat | and we don't want that | ||
timotimo | yeah, it could mean something or it could mean nothing | ||
lizmat | so, "append_ins: <PHI>" is what's wrong ? | 19:16 | |
timotimo | no, that's just a consequence; you'd have to look at the speshlog instead | 19:17 | |
lizmat | ah, ok | ||
timotimo | PHIs normally don't show up in the jit log because i put code in to hide them, but it only looks at the beginning of BBs, which is the only place that should ever have PHI nodes | ||
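To make the invariant concrete, here is a minimal sketch of a check that PHI nodes appear only at the start of a basic block. The `Ins` and `BB` structs, the opcode constants, and `count_misplaced_phi` are all hypothetical stand-ins for illustration; they are not MoarVM's actual spesh data structures or API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes and instruction list, for illustration only. */
enum { OP_PHI = 1, OP_OTHER = 2 };

typedef struct Ins {
    int opcode;
    struct Ins *next;
} Ins;

typedef struct BB {
    Ins *first_ins;
} BB;

/* Count PHI instructions that occur after the first non-PHI instruction. */
static size_t count_misplaced_phi(const BB *bb) {
    assert(bb != NULL);
    const Ins *ins = bb->first_ins;
    size_t misplaced = 0;

    /* Skip the leading run of PHIs; these are in the allowed position. */
    while (ins != NULL && ins->opcode == OP_PHI)
        ins = ins->next;

    /* Any PHI from here on sits in the middle of the block. */
    for (; ins != NULL; ins = ins->next)
        if (ins->opcode == OP_PHI)
            misplaced++;

    return misplaced;
}
```

A check like this could run as a debug-only assertion after each optimization pass, which would point at the pass that left the stray PHI behind.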
ok what is this about; there's a call to set_docee in the start method? | 19:20 | ||
ok, it's inlined | |||
lizmat | set_docee?? | 19:21 | |
timotimo | oh perhaps it's block's clone or something | ||
lizmat | yeah, looks like | ||
that's in Block.WHY, but I don't see a WHY in the profile ? | 19:22 | ||
timotimo | it's inlined, but that doesn't necessarily mean it gets called | 19:23 | |
lizmat | probably in BOOTSTRAP, line 1719 | 19:24 | |
which is actually the most often called block in the profile (looks like 10x for each start { }) | 19:25 | ||
the Block apparently has a $!why ??? | 19:26 | ||
timotimo | it most likely doesn't | 19:27 | |
i didn't say set_docee was being called, just that it shows up in the speshlog | |||
lizmat | ah, ok | 19:28 | |
timotimo | so if it is run on a block that does have a $!why it will properly be copied | ||
lizmat | but to appear in the spesh log, Block.clone would have to be called, right ? | 19:31 | |
jnthn | timotimo: Branch elimination + BB fusing | 20:07 | |
Presuming the PHI only reads one input | |||
Geth_ | MoarVM/spesh-refactor-iffy: 7 commits pushed by (Bart Wiegmans)++ | 20:25 | |
20:26
mojca1 left
21:00
Kaiepi joined
timotimo | so far i've only seen them with one in and one out, yeah | 21:01 | |
but it feels really wrong to have them left over | |||
you know, i once had a branch that threw out single-arg phi nodes. it made things violently explode :) | 21:02 | ||
lizmat | and another Perl 6 Weekly hits the Net: p6weekly.wordpress.com/2018/03/31/...-released/ | 22:05 | |
22:42
Kaypie joined
22:57
Kaypie joined
23:16
ZofBot joined
23:34
SourceBaby joined
samcv | lizmat++ | 23:41 |