Geth MoarVM: 32d66d5683 | (Samantha McVey)++ | src/strings/ops.c
Speed up index 50% for flat haystack and diff type needle

Convert the needle to match the haystack's data type when encountering a flat haystack (very common since Perl 6 flattens the haystack during regex).
Also fix a problem in the 32 bit memmem loop. The loop runs memmem again ... (9 more lines)
00:15
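A minimal sketch of the idea in the commit message above, assuming a flat 32-bit haystack and an 8-bit needle: widen the needle once so both sides have the same element width, then compare raw memory. The name `index_8_in_32` and the use of `int8_t`/`int32_t` are illustrative assumptions, not MoarVM's actual code, which the log says relies on memmem.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not MoarVM's API: find an 8-bit needle in a flat
 * 32-bit haystack by first widening the needle to 32 bits.
 * Returns the index of the first match, or -1 if there is none
 * (or if allocation fails). */
static long index_8_in_32(const int32_t *hay, size_t hay_len,
                          const int8_t *needle, size_t needle_len) {
    int32_t *wide;
    size_t i;
    long result = -1;

    if (needle_len == 0)
        return 0;
    if (needle_len > hay_len)
        return -1;

    /* Convert the needle to the haystack's element type once, up front. */
    wide = malloc(needle_len * sizeof *wide);
    if (!wide)
        return -1;
    for (i = 0; i < needle_len; i++)
        wide[i] = needle[i];

    /* With matching widths, each candidate position is a plain memcmp. */
    for (i = 0; i + needle_len <= hay_len; i++) {
        if (memcmp(hay + i, wide, needle_len * sizeof *wide) == 0) {
            result = (long)i;
            break;
        }
    }
    free(wide);
    return result;
}
```

The one-time widening costs a small allocation, but it lets the inner comparison work on same-width buffers instead of converting element by element at every position.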
00:16 AlexDaniel joined
MasterDuke samcv: src/strings/ops.c: In function ‘MVM_string_index’: src/strings/ops.c:498:51: error: pointer of type ‘void *’ used in arithmetic [-Werror=pointer-arith] && ( start_ptr = mm_return_32 + 1) /* Set the new start pointer right after where we left off */ 00:21
samcv yep working on it 00:22
Geth MoarVM: afdcad424e | (Samantha McVey)++ | src/strings/ops.c
Use char* for pointer addition to please MSVC
00:23
MasterDuke not just msvc, gcc complained for me
samcv ah ok
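For context on the compiler error quoted above: arithmetic on `void *` is a GNU extension rather than standard C, so gcc reports it under -Wpointer-arith and other compilers may reject it. A small sketch of the usual workaround, with a hypothetical helper name, is to cast to `char *` before adding a byte offset:

```c
#include <stddef.h>

/* Hypothetical helper: step a generic pointer forward by `bytes` bytes.
 * The cast to char * makes the addition well defined in standard C. */
static void *advance_bytes(void *p, size_t bytes) {
    return (char *)p + bytes;
}
```

Note that the offset is counted in bytes here, so stepping over one 32-bit element means adding `sizeof(int32_t)`, not 1.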
MasterDuke hm, didn't really seem to make a difference for me on that code from earlier 00:24
samcv MasterDuke: yeah it doesn't change that code because it is already doing memmem on 8 bit haystack and 8bit needle i believe
but should improve a ton of other code 00:25
MasterDuke ah, the 32bit part gets converted to 8bit when flattened?
samcv yeah
MasterDuke oh well, a good optimization anyway 00:26
samcv it makes indexing a single codepoint needle 2x as fast, cool 00:33
index with word needle from 2.0685436 to 0.3240215
that's pretty big change
this is my test file gist.github.com/adc8d50df303e457ed...b85b66e94e 00:35
japhb samcv: Ready to bump nqp/rakudo?
samcv that is fine with me. i gotta go to dinner but feel free 00:36
0.5x-3x faster it seems heh for the newly handled conditions 00:37
not bad
01:25 travis-ci joined
travis-ci MoarVM build failed. Samantha McVey 'Speed up index 50% for flat haystack and diff type needle 01:25
travis-ci.org/MoarVM/MoarVM/builds/359632075 github.com/MoarVM/MoarVM/compare/1...d66d568376
01:25 travis-ci left 01:54 FROGGS joined 01:56 ilbot3 joined
AlexDaniel well, there we go: R#1667 02:07
synopsebot R#1667 [open]: github.com/rakudo/rakudo/issues/1667 [perf] Some string benchmark
MasterDuke AlexDaniel: are those numbers for the other langs from the article? or did you run them yourself? 02:10
AlexDaniel I did run myself of course
MasterDuke cool
AlexDaniel edited the issue a bit to clarify that… 02:11
MasterDuke oh, samcv said 32d66d5683 wouldn't really help there
i think looking at the code inspired the change though 02:12
AlexDaniel oh
MasterDuke: where? 02:13
MasterDuke AlexDaniel: irclog.perlgeek.de/moarvm/2018-03-29#i_15978231 02:14
AlexDaniel ok 02:16
03:52 nativecallable6 joined, reportable6 joined, quotable6 joined 03:54 camelia joined 04:25 bartolin joined 04:49 notable6 joined
samcv MasterDuke, AlexDaniel` working on the collapse_strands issue now 05:12
going to extend the grapheme iterator functions to allow me to move to the next strand. what we already do is start with a MVMGrapheme8 string, and iterate the string into it. if we get something that won't fit in 8 bits we abort and copy that 8 bit buffer into a 32 bit buffer then continue with the iterator 05:14
copying from an 8bit buffer into a 32 bit buffer is pretty fast. much much faster than using an iterator since it's a very tight loop. instead what we will do is use memcpy to copy any 8bit strands into the new buffer instead of using the iterator
then using a new graphemeiterator_next_strand function to move to the next strand/repetition after we've done the memcpy 05:15
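The plan described above can be sketched roughly as follows. This is an illustration under stated assumptions, not MoarVM's implementation: the `Strand` and `Flat` types and the `collapse` function are hypothetical, repetitions are ignored, and graphemes are modelled as `int8_t` and `int32_t`. Whole 8-bit strands are copied with memcpy while the output is still 8-bit, and the output is widened to 32 bits the first time a grapheme does not fit.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types for illustration; MoarVM's real structures differ. */
typedef struct {
    int is_8bit;        /* nonzero: data is int8_t[], otherwise int32_t[] */
    size_t len;         /* number of graphemes in this strand */
    const void *data;
} Strand;

typedef struct {
    int is_8bit;
    size_t len;
    void *data;         /* caller frees; NULL on allocation failure */
} Flat;

static int fits_8bit(int32_t g) { return g >= -128 && g <= 127; }

static Flat collapse(const Strand *s, size_t n) {
    Flat out = {1, 0, NULL};
    size_t total = 0, pos = 0, i, j, k;
    int8_t *b8;
    int32_t *b32 = NULL;    /* stays NULL while the output is still 8-bit */

    for (i = 0; i < n; i++)
        total += s[i].len;
    b8 = malloc(total ? total : 1);
    if (!b8)
        return out;

    for (i = 0; i < n; i++) {
        if (s[i].is_8bit) {
            const int8_t *src = s[i].data;
            if (!b32) {
                /* Fast path: 8-bit strand into 8-bit output. */
                memcpy(b8 + pos, src, s[i].len);
                pos += s[i].len;
            } else {
                for (j = 0; j < s[i].len; j++)
                    b32[pos++] = src[j];
            }
        } else {
            const int32_t *src = s[i].data;
            for (j = 0; j < s[i].len; j++) {
                if (!b32 && !fits_8bit(src[j])) {
                    /* First grapheme that does not fit: widen what we have. */
                    b32 = malloc(total * sizeof *b32);
                    if (!b32) { free(b8); return out; }
                    for (k = 0; k < pos; k++)
                        b32[k] = b8[k];
                    free(b8);
                    b8 = NULL;
                }
                if (b32) b32[pos++] = src[j];
                else     b8[pos++] = (int8_t)src[j];
            }
        }
    }
    out.is_8bit = (b32 == NULL);
    out.len = pos;
    out.data = b32 ? (void *)b32 : (void *)b8;
    return out;
}
```

The widening loop from 8-bit to 32-bit is the tight loop mentioned in the discussion, and it is also the part that could plausibly be vectorized.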
MasterDuke: and i think i figured out how to get converting 8bit to 32bit strings to use vector SIMD operations 05:22
not going to alter that until i finish this. but would be cool if that can speed things up a lot 05:23
6 05:58
06:27 domidumont joined 06:32 robertle joined 06:33 domidumont joined 06:53 dogbert17 joined 07:48 zakharyas joined 08:00 zakharyas joined 08:07 zakharyas joined 08:30 dogbert17 joined 09:00 zakharyas joined 09:07 zakharyas joined 09:59 brrt joined
brrt good * 09:59
10:07 zakharyas joined
timotimo yo brrt :) 10:19
brrt: did you see my question about interp_cur_op and the jit? 10:21
brrt i did not
timotimo basically, i'm writing code that'll be running when throwpayloadlexcaller runs and i'm wondering if using interp_cur_op gives me a sensible idea of what handlers (only inlines are relevant for this) we're currently in 10:22
brrt it does not 10:23
we should have a function spesh_get_inline_by_position
and a delegate, jit_get_inline_by_position
timotimo i need the amount of inlines we're in at that point, though :)
brrt if the current frame is JITTed
then you need a spesh_get_inline_depth(inline_nr) 10:24
in general, though
screw interp_cur_op :-P
especially from the PoV of the JIT
timotimo i just need something, anything ;) 10:25
doesn't have to be interp_cur_op
brrt why do you need to know about the inline structure? 10:31
what do you need to know about it
timotimo well, there's this thing in the profiler where we call prof_exit whenever we leave a frame 10:32
but if a frame is left via throwpayloadlexcaller, we skip over a prof_exit command
(they are inserted into the bytecode)
so when we unwind, we have to realize that and properly remove the exact right amount of inlined frames
lizmat hmmmm "perl6 --profile -Msnapper -e 'start {}; sleep 1'" reliably segfaults for me 10:38
should I make a ticket, timotimo? 10:39
timotimo huh, that's funny 10:44
it calls an extop that's out of bounds or something?
lizmat perhaps starting a thread at compile time ? 10:47
hmmm...
timotimo no, threads just call into existing bytecode, and even if we do bytecode generation at run time it goes through the validator 10:48
we're possibly jumping into bytecode at an improper alignment and reading one byte off to the side or something
10:50 dalek joined, Geth joined, p6lert joined, synopsebot joined 10:51 SourceBaby_ joined 10:52 SourceBaby joined 11:47 domidumont joined 12:00 Voldenet joined 13:23 zakharyas joined 13:40 zakharyas joined 14:01 Util joined 14:07 AlexDaniel joined 14:27 zakharyas joined 14:29 FROGGS joined
dogbert17 .seen timotimo 15:31
yoleaux I saw timotimo 11:14Z in #perl6-dev: <timotimo> eating memory really, really, really fast is often an infinite recursion
16:07 zakharyas joined 16:12 domidumont joined 16:13 zakharyas joined 16:54 zakharyas joined 17:04 zakharyas joined 17:13 zakharyas joined 18:28 Kaiepi joined
Kaiepi i still don't quite understand what was meant by github.com/MoarVM/MoarVM/pull/824#...-375955795 18:39
can someone explain in more detail? 18:40
19:16 Kaiepi joined 19:27 zakharyas joined 19:30 robertle_ joined 19:47 Kaiepi joined 19:48 zakharyas joined
timotimo i'm not sure if niner is correct in his assumption here 19:51
but he has done a whole lot of nativecall hacking, whereas i did not
Kaiepi i need to test more cases, but i was able to get this to work with the additional changes to nqp and rakudo needed hastebin.com/niqucumaju.cpp 19:56
yeah, it complains about malformed utf8 when i test for Str with characters outside ascii's range 20:07
timotimo well, you'll still need to set its encoding to something other than utf8 20:12
Kaiepi like this? hastebin.com/kimeginata.pl 20:30
oh i got it 20:33
or not 20:34
it only works the first time running test...? hastebin.com/ijobayuluz.pl 20:36
lizmat if the JIT log says something like: "Cannot get template for: gt_n", it means what it says, right? 20:43
that nobody has implemented a JIT template for nqp::gt_n ?
20:50 FROGGS joined
timotimo do we have floating point stuff in the expr jit yet? 20:54
21:03 Kaiepi joined
lizmat other terms I got: clone, prepargs, sp_findmeth, checkarity, coerce_ni, bindattrs_o etc 21:04
lizmat takes an early night 21:15
21:16 Kaiepi joined
Kaiepi for the wchar_t stuff, how would i go about debugging what's going on in moarvm? 21:30
should moar be built with -j<core count> by default? 22:13
i tested make -j8 and moar built much more quickly without any of the files getting compiled out of order 22:16
timotimo i usually make -j which basically starts all jobs immediately 22:39
it's nice and fast and doesn't go wrong at all ever
(in moarvm)
22:46 lizmat joined 22:47 MasterDuke joined
Kaiepi i think i might leave it to the --make flag, or add a --makejobs flag 23:13
detecting how many cores to use isn't feasible for certain oses without Sys::CPU or Sys::Info being available in their package manager 23:14
MasterDuke timotimo: i just realized your branch to fix large profiles might mean i can profile the rakudo build again
timotimo oh, you can try
it's not entirely correct, though 23:15
MasterDuke gonna spin up the vm and give it a shot
timotimo: `Stage optimize : Profiling is already started at <unknown>:1 (<ephemeral file>:) ...` 23:30
timotimo oh, look, that's fascinating 23:31
MasterDuke oh. i had --profile-stage=optimize
timotimo so maybe that feature is currently also busted. one more for the road 23:32
MasterDuke `MoarVM panic: Profiler lost sequence` with just --profile-compile
timotimo can you turn off inlining?
MasterDuke hasn't died yet... 23:35
heh, `Stage parse : 604.735` 23:44
timotimo: huh. just --profile-compile didn't die, and took much longer like it usually does, but no profile was created 23:52
this is with your branch and MVM_SPESH_INLINE_DISABLE=1