Geth | MoarVM: 32d66d5683 | (Samantha McVey)++ | src/strings/ops.c Speed up index 50% for flat haystack and diff type needle Convert the needle to match the haystack's data type when encountering a flat haystack (very common since Perl 6 flattens the haystack during regex). Also fix a problem in the 32 bit memmem loop. The loop runs memmem again ... (9 more lines) |
00:15 | |
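The commit message above combines two ideas. First, widen an 8-bit needle to the haystack's 32-bit element type. Second, search the raw bytes and retry whenever a byte-level match lands off a 4-byte boundary, which is the case the fixed 32 bit memmem loop handles. The C sketch below illustrates both under stated assumptions. The function names are hypothetical, int8_t/int32_t stand in for MVMGrapheme8/MVMGrapheme32, and find_bytes is a portable stand-in for memmem. This is not the code in src/strings/ops.c.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Portable stand-in for memmem(): first occurrence of needle in hay, or NULL. */
static const char *find_bytes(const char *hay, size_t hay_len,
                              const char *needle, size_t needle_len) {
    for (size_t i = 0; i + needle_len <= hay_len; i++)
        if (memcmp(hay + i, needle, needle_len) == 0)
            return hay + i;
    return NULL;
}

/* Index of an 8-bit needle inside a 32-bit haystack, or -1 if absent.
   int8_t/int32_t stand in for MoarVM's MVMGrapheme8/MVMGrapheme32. */
long index_32_with_8bit_needle(const int32_t *hay, size_t hay_len,
                               const int8_t *needle, size_t needle_len) {
    if (needle_len == 0) return 0;
    if (needle_len > hay_len) return -1;

    /* Step 1: convert the needle to the haystack's element type. */
    int32_t *wide = malloc(needle_len * sizeof *wide);
    if (!wide) return -1;
    for (size_t i = 0; i < needle_len; i++)
        wide[i] = needle[i];               /* sign extension keeps values */

    /* Step 2: byte-level search; a hit only counts on a 4-byte boundary. */
    const char *base  = (const char *)hay;
    const char *start = base;
    const char *end   = base + hay_len * sizeof *hay;
    size_t nbytes = needle_len * sizeof *wide;
    long result = -1;
    const char *hit;
    while ((hit = find_bytes(start, (size_t)(end - start),
                             (const char *)wide, nbytes)) != NULL) {
        size_t off = (size_t)(hit - base);
        if (off % sizeof *hay == 0) {
            result = (long)(off / sizeof *hay);
            break;
        }
        start = hit + 1;   /* misaligned match: search again just after it */
    }
    free(wide);
    return result;
}
```

The alignment check matters because one grapheme's byte pattern can straddle two neighbouring 32-bit elements. For example, the bytes of 0x100 followed by 0 contain the little-endian bytes of 1 at offset 1.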
00:16
AlexDaniel joined
MasterDuke | samcv: src/strings/ops.c: In function ‘MVM_string_index’: src/strings/ops.c:498:51: error: pointer of type ‘void *’ used in arithmetic [-Werror=pointer-arith] && ( start_ptr = mm_return_32 + 1) /* Set the new start pointer right after where we left off */ | 00:21 | |
samcv | yep working on it | 00:22 | |
Geth | MoarVM: afdcad424e | (Samantha McVey)++ | src/strings/ops.c Use char* for pointer addition to please MSVC |
00:23 | |
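The GCC error above comes from adding to a void pointer. ISO C only defines pointer arithmetic on complete object types. GCC accepts void * arithmetic as an extension, treating the element size as 1, and -Wpointer-arith combined with -Werror turns that into an error. MSVC rejects it outright. The usual portable fix, which matches the follow-up commit's title, is to do the byte arithmetic on char *. The helper name below is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Advance p by nbytes bytes. Converting to char * (1-byte units) first keeps
   this valid ISO C instead of relying on GCC's void * arithmetic extension. */
const void *advance_bytes(const void *p, size_t nbytes) {
    return (const char *)p + nbytes;
}
```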
MasterDuke | not just msvc, gcc complained for me | ||
samcv | ah ok | ||
MasterDuke | hm, didn't really seem to make a difference for me on that code from earlier | 00:24 | |
samcv | MasterDuke: yeah it doesn't change that code because it is already doing memmem on 8 bit haystack and 8bit needle i believe | ||
but should improve a ton of other code | 00:25 | ||
MasterDuke | ah, the 32bit part gets converted to 8bit when flattened? | ||
samcv | yeah | ||
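MasterDuke's question implies that a flattened string can be stored in 8-bit form when every grapheme fits. In that case an index call compares an 8-bit haystack with an 8-bit needle. The minimal sketch below shows that narrowing check. It assumes the 8-bit form is a signed byte, as MVMGrapheme8 appears to be, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 and fills out8 if every grapheme fits in a signed 8-bit value.
   Otherwise returns 0 and leaves out8 untouched, so the caller keeps the
   32-bit representation. */
int try_narrow_to_8(const int32_t *in, size_t n, int8_t *out8) {
    for (size_t i = 0; i < n; i++)
        if (in[i] < INT8_MIN || in[i] > INT8_MAX)
            return 0;
    for (size_t i = 0; i < n; i++)
        out8[i] = (int8_t)in[i];
    return 1;
}
```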
MasterDuke | oh well, a good optimization anyway | 00:26 | |
samcv | it makes indexing a single codepoint needle 2x as fast, cool | 00:33 | |
index with word needle from 2.0685436 to 0.3240215 | |||
that's pretty big change | |||
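The word-needle timings samcv quotes (units not given in the log) work out to roughly a 6.4x speedup, or about 84% less time:

```latex
\frac{2.0685436}{0.3240215} \approx 6.38,
\qquad
1 - \frac{0.3240215}{2.0685436} \approx 0.843
```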
this is my test file gist.github.com/adc8d50df303e457ed...b85b66e94e | 00:35 | ||
japhb | samcv: Ready to bump nqp/rakudo? | ||
samcv | that is fine with me. i gotta go to dinner but feel free | 00:36 | |
0.5x-3x faster it seems heh for the newly handled conditions | 00:37 | ||
not bad | |||
01:25
travis-ci joined
travis-ci | MoarVM build failed. Samantha McVey 'Speed up index 50% for flat haystack and diff type needle | 01:25 | |
travis-ci.org/MoarVM/MoarVM/builds/359632075 github.com/MoarVM/MoarVM/compare/1...d66d568376 | |||
01:25
travis-ci left
01:54
FROGGS joined
01:56
ilbot3 joined
AlexDaniel | well, there we go: R#1667 | 02:07 | |
synopsebot | R#1667 [open]: github.com/rakudo/rakudo/issues/1667 [perf] Some string benchmark | ||
MasterDuke | AlexDaniel: are those numbers for the other langs from the article? or did you run them yourself? | 02:10 | |
AlexDaniel | I ran them myself, of course | ||
MasterDuke | cool | ||
AlexDaniel | edited the issue a bit to clarify that… | 02:11 | |
MasterDuke | oh, samcv said 32d66d5683 wouldn't really help there | ||
i think looking at the code inspired the change though | 02:12 | ||
AlexDaniel | oh | ||
MasterDuke: where? | 02:13 | ||
MasterDuke | AlexDaniel: irclog.perlgeek.de/moarvm/2018-03-29#i_15978231 | 02:14 | |
AlexDaniel | ok | 02:16 | |
03:52
nativecallable6 joined,
reportable6 joined,
quotable6 joined
03:54
camelia joined
04:25
bartolin joined
04:49
notable6 joined
samcv | MasterDuke, AlexDaniel` working on the collapse_strands issue now | 05:12 | |
going to extend the grapheme iterator functions to allow moving to the next strand. what we already do is start with a MVMGrapheme8 string and iterate the string into it. if we get something that won't fit in 8 bits we abort, copy that 8 bit buffer into a 32 bit buffer, then continue with the iterator | 05:14 | ||
copying from an 8bit buffer into a 32 bit buffer is pretty fast, much much faster than using an iterator, since it's a very tight loop. so what we will do is use memcpy to copy any 8bit strands into the new buffer instead of using the iterator | |||
then using a new graphemeiterator_next_strand function to move to the next strand/repetition after we've done the memcpy | 05:15 | ||
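The collapse plan described above can be sketched under simplifying assumptions. The sketch uses a hypothetical strand type that points at either an 8-bit or a 32-bit buffer, with no repetitions and no grapheme iterator. It starts with an 8-bit output buffer and memcpys 8-bit strands straight in. On the first grapheme that does not fit, it switches to a 32-bit buffer by widening the graphemes collected so far in a tight loop. Allocation failures are not handled.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical strand: a view into either an 8-bit or a 32-bit buffer. */
typedef struct {
    int is8;                 /* 1: data8 is valid, 0: data32 is valid */
    const int8_t  *data8;
    const int32_t *data32;
    size_t len;
} strand;

/* Collapsed result: exactly one of buf8/buf32 is non-NULL. */
typedef struct {
    int8_t  *buf8;
    int32_t *buf32;
    size_t len;
} flat;

static void widen(const int8_t *src, int32_t *dst, size_t n) {
    for (size_t i = 0; i < n; i++) dst[i] = src[i];   /* tight widening loop */
}

flat collapse_strands(const strand *s, size_t nstrands) {
    size_t total = 0;
    for (size_t i = 0; i < nstrands; i++) total += s[i].len;

    /* Optimistically start with an 8-bit buffer (NULL checks omitted). */
    flat out = { malloc(total ? total : 1), NULL, total };
    size_t pos = 0;
    for (size_t i = 0; i < nstrands; i++) {
        if (s[i].is8) {                               /* fast path: memcpy */
            memcpy(out.buf8 + pos, s[i].data8, s[i].len);
            pos += s[i].len;
            continue;
        }
        size_t j = 0;
        for (; j < s[i].len; j++) {
            int32_t g = s[i].data32[j];
            if (g < INT8_MIN || g > INT8_MAX) break;
            out.buf8[pos++] = (int8_t)g;
        }
        if (j == s[i].len) continue;                  /* whole strand fit */

        /* Something doesn't fit: widen what we have, finish in 32 bits. */
        out.buf32 = malloc(total * sizeof *out.buf32);
        widen(out.buf8, out.buf32, pos);
        free(out.buf8);
        out.buf8 = NULL;
        memcpy(out.buf32 + pos, s[i].data32 + j, (s[i].len - j) * sizeof(int32_t));
        pos += s[i].len - j;
        for (i++; i < nstrands; i++) {
            if (s[i].is8)
                widen(s[i].data8, out.buf32 + pos, s[i].len);
            else
                memcpy(out.buf32 + pos, s[i].data32, s[i].len * sizeof(int32_t));
            pos += s[i].len;
        }
        break;
    }
    return out;
}
```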
MasterDuke: and i think i figured out how to get converting 8bit to 32bit strings to use vector SIMD operations | 05:22 | ||
not going to alter that until i finish this. but would be cool if that can speed things up a lot | 05:23 | ||
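samcv does not say which SIMD approach he has in mind. One possibility is SSE2, shown in the sketch below, which widens signed 8-bit graphemes to 32-bit. Each byte is duplicated into a 16-bit lane and arithmetically shifted right by 8 to sign-extend it, and the same trick then widens 16-bit lanes to 32-bit. The function name is hypothetical. A scalar loop handles the tail and non-SSE2 targets.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Sign-extend n int8_t values into int32_t, 16 at a time when SSE2 exists. */
void widen_8_to_32(const int8_t *src, int32_t *dst, size_t n) {
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 16 <= n; i += 16) {
        __m128i bytes = _mm_loadu_si128((const __m128i *)(src + i));
        /* byte b -> 16-bit lane (b << 8 | b), then >> 8 arithmetic = sign-extended b */
        __m128i lo16 = _mm_srai_epi16(_mm_unpacklo_epi8(bytes, bytes), 8);
        __m128i hi16 = _mm_srai_epi16(_mm_unpackhi_epi8(bytes, bytes), 8);
        /* same trick from 16-bit to 32-bit lanes */
        __m128i w0 = _mm_srai_epi32(_mm_unpacklo_epi16(lo16, lo16), 16);
        __m128i w1 = _mm_srai_epi32(_mm_unpackhi_epi16(lo16, lo16), 16);
        __m128i w2 = _mm_srai_epi32(_mm_unpacklo_epi16(hi16, hi16), 16);
        __m128i w3 = _mm_srai_epi32(_mm_unpackhi_epi16(hi16, hi16), 16);
        _mm_storeu_si128((__m128i *)(dst + i), w0);
        _mm_storeu_si128((__m128i *)(dst + i + 4), w1);
        _mm_storeu_si128((__m128i *)(dst + i + 8), w2);
        _mm_storeu_si128((__m128i *)(dst + i + 12), w3);
    }
#endif
    for (; i < n; i++)   /* remaining elements (or everything without SSE2) */
        dst[i] = src[i];
}
```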
6 | 05:58 | ||
06:27
domidumont joined
06:32
robertle joined
06:33
domidumont joined
06:53
dogbert17 joined
07:48
zakharyas joined
08:00
zakharyas joined
08:07
zakharyas joined
08:30
dogbert17 joined
09:00
zakharyas joined
09:07
zakharyas joined
09:59
brrt joined
brrt | good * | 09:59 | |
10:07
zakharyas joined
timotimo | yo brrt :) | 10:19 | |
brrt: did you see my question about interp_cur_op and the jit? | 10:21 | ||
brrt | i did not | ||
timotimo | basically, i'm writing code that'll be running when throwpayloadlexcaller runs and i'm wondering if using interp_cur_op gives me a sensible idea of what handlers (only inlines are relevant for this) we're currently in | 10:22 | |
brrt | it does not | 10:23 | |
we should have a function spesh_get_inline_by_position | |||
and a delegate, jit_get_inline_by_position | |||
timotimo | i need the amount of inlines we're in at that point, though :) | ||
brrt | if the current frame is JITTed | ||
then you need a spesh_get_inline_depth(inline_nr) | 10:24 | ||
in general, though | |||
screw interp_cur_op :-P | |||
especially from the PoV of the JIT | |||
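The sketch below is a rough model of brrt's suggested spesh_get_inline_depth. That name was proposed in the discussion, and this log does not show it as an existing function. The model assumes each inline records the bytecode range it occupies in the specialized frame and that nested inlines have nested ranges. Under those assumptions, the depth at a position is the number of ranges containing it. The struct and function are hypothetical. A JITted frame would first need its native position mapped back to bytecode, which is presumably what the jit_ delegate would be for.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inline table entry: the bytecode range [start, end) that an
   inlined callee occupies inside the specialized caller. */
typedef struct {
    uint32_t start;
    uint32_t end;
} inline_range;

/* Number of inlines whose range contains pos, i.e. how many inlined frames
   are active at that bytecode offset. */
size_t inline_depth_at(const inline_range *inlines, size_t n, uint32_t pos) {
    size_t depth = 0;
    for (size_t i = 0; i < n; i++)
        if (inlines[i].start <= pos && pos < inlines[i].end)
            depth++;
    return depth;
}
```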
timotimo | i just need something, anything ;) | 10:25 | |
doesn't have to be interp_cur_op | |||
brrt | why do you need to know about the inline structure? | 10:31 | |
what do you need to know about it | |||
timotimo | well, there's this thing in the profiler where we call prof_exit whenever we leave a frame | 10:32 | |
but if a frame is left via throwpayloadlexcaller, we skip over a prof_exit command | |||
(they are inserted into the bytecode) | |||
so when we unwind, we have to realize that and properly remove the exact right amount of inlined frames | |||
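The sketch below is a toy version of the bookkeeping timotimo describes, using hypothetical types. The profiler keeps a shadow call stack fed by its enter/exit hooks. When throwpayloadlexcaller skips the prof_exit ops, the unwinder must pop exactly the inlined calls still open at the throw site before popping the physical frame. The sketch does not cover how the number of open inlines is obtained, for example from an inline-depth query like the one brrt suggests.

```c
#include <stddef.h>

#define PROF_MAX_DEPTH 64   /* arbitrary fixed capacity for the sketch */

/* Hypothetical profiler shadow-stack entry. */
typedef struct {
    const char *name;
    int is_inline;   /* 1 if this call was inlined into its caller */
} prof_node;

typedef struct {
    prof_node nodes[PROF_MAX_DEPTH];
    size_t depth;
} prof_stack;

/* Record entry into a (possibly inlined) call; ignored when the stack is full. */
void prof_enter(prof_stack *st, const char *name, int is_inline) {
    if (st->depth == PROF_MAX_DEPTH) return;
    st->nodes[st->depth].name = name;
    st->nodes[st->depth].is_inline = is_inline;
    st->depth++;
}

/* Unwinding skipped the prof_exit ops, so pop the inlined calls that were
   still open at the throw site, then the physical frame that contains them. */
void prof_unwind_frame(prof_stack *st, size_t open_inlines) {
    while (open_inlines > 0 && st->depth > 0 &&
           st->nodes[st->depth - 1].is_inline) {
        st->depth--;
        open_inlines--;
    }
    if (st->depth > 0)
        st->depth--;
}
```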
lizmat | hmmmm "perl6 --profile -Msnapper -e 'start {}; sleep 1" reliably segfaults for me | 10:38 | |
should I make a ticket, timotimo? | 10:39 | ||
timotimo | huh, that's funny | 10:44 | |
it calls an extop that's out of bounds or something? | |||
lizmat | perhaps starting a thread at compile time ? | 10:47 | |
hmmm... | |||
timotimo | no, threads just call into existing bytecode, and even if we do bytecode generation at run time it goes through the validator | 10:48 | |
we're possibly jumping into bytecode at an improper alignment and reading one byte off to the side or something | |||
10:50
dalek joined,
Geth joined,
p6lert joined,
synopsebot joined
10:51
SourceBaby_ joined
10:52
SourceBaby joined
11:47
domidumont joined
12:00
Voldenet joined
13:23
zakharyas joined
13:40
zakharyas joined
14:01
Util joined
14:07
AlexDaniel joined
14:27
zakharyas joined
14:29
FROGGS joined
dogbert17 | .seen timotimo | 15:31 | |
yoleaux | I saw timotimo 11:14Z in #perl6-dev: <timotimo> eating memory really, really, really fast is often an infinite recursion | ||
16:07
zakharyas joined
16:12
domidumont joined
16:13
zakharyas joined
16:54
zakharyas joined
17:04
zakharyas joined
17:13
zakharyas joined
18:28
Kaiepi joined
Kaiepi | i still don't quite understand what was meant by github.com/MoarVM/MoarVM/pull/824#...-375955795 | 18:39 | |
can someone explain in more detail? | 18:40 | ||
19:16
Kaiepi joined
19:27
zakharyas joined
19:30
robertle_ joined
19:47
Kaiepi joined
19:48
zakharyas joined
timotimo | i'm not sure if niner is correct in his assumption here | 19:51 | |
but he has done a whole lot of nativecall hacking, whereas i did not | |||
Kaiepi | i need to test more cases, but i was able to get this to work with the additional changes to nqp and rakudo needed hastebin.com/niqucumaju.cpp | 19:56 | |
yeah, it complains about malformed utf8 when i test for Str with characters outside ascii's range | 20:07 | ||
timotimo | well, you'll still need to set its encoding to something other than utf8 | 20:12 | |
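One plausible reason for the malformed-UTF-8 complaint, assuming the wide-character buffer is being decoded as UTF-8, is that wchar_t text is not UTF-8. wchar_t is typically 32 bits (UTF-32) on Linux and 16 bits (UTF-16) on Windows. For example, é is the single code unit 0xE9, and the byte 0xE9 is a UTF-8 lead byte that must be followed by two continuation bytes. ASCII code units, by contrast, become the ASCII byte plus zero bytes, which still decode. That would match the symptom of only non-ASCII strings failing. The structural validator below is a hypothetical helper and skips overlong and surrogate checks.

```c
#include <stddef.h>
#include <wchar.h>

/* 1 if buf[0..len) has well-formed UTF-8 structure, else 0
   (overlong encodings and surrogates are not rejected here). */
int looks_like_utf8(const unsigned char *buf, size_t len) {
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t extra;
        if (c < 0x80) extra = 0;
        else if ((c & 0xE0) == 0xC0) extra = 1;
        else if ((c & 0xF0) == 0xE0) extra = 2;
        else if ((c & 0xF8) == 0xF0) extra = 3;
        else return 0;                        /* stray continuation or bad lead byte */
        if (len - i - 1 < extra) return 0;    /* truncated sequence */
        for (size_t k = 1; k <= extra; k++)
            if ((buf[i + k] & 0xC0) != 0x80) return 0;
        i += extra + 1;
    }
    return 1;
}
```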
Kaiepi | like this? hastebin.com/kimeginata.pl | 20:30 | |
oh i got it | 20:33 | ||
or not | 20:34 | ||
it only works the first time running test...? hastebin.com/ijobayuluz.pl | 20:36 | ||
lizmat | if the JIT log says something like: "Cannot get template for: gt_n", it means what it says, right? | 20:43 | |
that nobody has implemented a JIT template for nqp::gt_n ? | |||
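The toy model below illustrates what that log line suggests, with hypothetical names throughout. The expression JIT looks up a template per op, and an op without one produces exactly that message and gets no template. The table contents are placeholders, not real MoarVM templates, and the model does not cover what the JIT does next.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical per-op table; a NULL template means nobody has written one. */
typedef struct {
    const char *op;
    const char *tmpl;
} op_template;

static const op_template templates[] = {
    { "add_i", "<placeholder template for add_i>" },
    { "gt_n",  NULL },
};

/* Returns the template for op, or logs a message like the one lizmat saw
   and returns NULL when there is none. */
const char *get_template(const char *op, FILE *log) {
    for (size_t i = 0; i < sizeof templates / sizeof templates[0]; i++)
        if (strcmp(templates[i].op, op) == 0 && templates[i].tmpl != NULL)
            return templates[i].tmpl;
    fprintf(log, "Cannot get template for: %s\n", op);
    return NULL;
}
```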
20:50
FROGGS joined
timotimo | do we have floating point stuff in the expr jit yet? | 20:54 | |
21:03
Kaiepi joined
lizmat | other terms I got: clone, prepargs, sp_findmeth, checkarity, coerce_ni, bindattrs_o etc | 21:04 | |
lizmat takes an early night | 21:15 | ||
21:16
Kaiepi joined
Kaiepi | for the wchar_t stuff, how would i go about debugging what's going on in moarvm? | 21:30 | |
should moar be built with -j<core count> by default? | 22:13 | ||
i tested make -j8 and moar built much more quickly without any of the files getting compiled out of order | 22:16 | ||
timotimo | i usually make -j which basically starts all jobs immediately | 22:39 | |
it's nice and fast and doesn't go wrong at all ever | |||
(in moarvm) | |||
22:46
lizmat joined
22:47
MasterDuke joined
Kaiepi | i think i might leave it to the --make flag, or add a --makejobs flag | 23:13 | |
detecting how many cores to use isn't feasible for certain oses without Sys::CPU or Sys::Info being available in their package manager | 23:14 | ||
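On the platforms MoarVM commonly targets, the core count is available from the C library without extra modules. Linux, macOS and the BSDs provide sysconf(_SC_NPROCESSORS_ONLN), a widespread extension rather than strict POSIX, and Windows provides GetSystemInfo. Configure.pl is Perl, so the C sketch below only shows what the detection amounts to. The function name is hypothetical, and the function falls back to 1 when the count cannot be determined.

```c
#if defined(_WIN32)
#include <windows.h>
#else
#include <unistd.h>
#endif

/* Number of online CPUs, or 1 if it cannot be determined. */
long online_cpus(void) {
#if defined(_WIN32)
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    return si.dwNumberOfProcessors > 0 ? (long)si.dwNumberOfProcessors : 1;
#elif defined(_SC_NPROCESSORS_ONLN)
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
#else
    return 1;
#endif
}
```

A build script could then pass something like `-j` followed by this value to make, while still letting a user override it explicitly.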
MasterDuke | timotimo: i just realized your branch to fix large profiles might mean i can profile the rakudo build again | ||
timotimo | oh, you can try | ||
it's not entirely correct, though | 23:15 | ||
MasterDuke | gonna spin up the vm and give it a shot | ||
timotimo: `Stage optimize : Profiling is already started at <unknown>:1 (<ephemeral file>:) ...` | 23:30 | ||
timotimo | oh, look, that's fascinating | 23:31 | |
MasterDuke | oh. i had --profile-stage=optimize | ||
timotimo | so maybe that feature is currently also busted. one more for the road | 23:32 | |
MasterDuke | `MoarVM panic: Profiler lost sequence` with just --profile-compile | ||
timotimo | can you turn off inlining? | ||
MasterDuke | hasn't died yet... | 23:35 | |
heh, `Stage parse : 604.735` | 23:44 | ||
timotimo: huh. just --profile-compile didn't die, and took much longer like it usually does, but no profile was created | 23:52 | ||
this is with your branch and MVM_SPESH_INLINE_DISABLE=1 |