Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes.
Set by lizmat on 24 May 2021.
00:02 reportable6 left
jnthnwrthngtn ggoebel: No, and curiously it manages to have no overlap with the things I tested :) 00:03
Though it's not quite testing the same thing I think; github.com/dyu/ffi-overhead/blob/m...ewplus.cpp is for example using the Node.js C API, whereas all the ones I tested didn't require writing any such C or C++ binding code 00:08
I tried with the Node.js FFI module with a more comparable API and it took 1 minute 57 seconds. I'm guessing that isn't the speediest option :) 00:09
(Although "if I google <language> FFI and pick the first sensible looking option" is arguably a good strategy for picking a representative FFI impl to benchmar...) 00:13
Sleep o/ 00:17
moon-child I heard luajit cffi can be faster than c when calling into shared libs, because c has to go through plt and luajit can avoid that 00:19
timo i quickly wrote a/the abs calling example and in the spesh log i see we're still creating an Int needlessly before doing the call. we do use the register that was used to create that Int, however, so that's good 00:38
1% of time spent in abs, says perf 00:43
14.28% spent in gc_free, 10.22% in gc_collect_free_nursery_uncopied, 6.49% in gc_allocate_nursery
ah, with the perf map i now see 34.5% spent in <unit>, which is where the for loop got inlined into 00:45
00:46 patrickb left
timo it's using a lexical for my integer value, so there's more to be won there 00:58
replacing abs($x) with abs(999) doesn't make things noticeably faster 01:01
empty loop with the same amount of iterations takes 1.33s compared to the 2.3s with abs($x) 01:03
01:05 reportable6 joined
timo a loop of 500_000_000 written naively in C takes 1.25s 01:13
this is with -fno-builtin 01:14
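The exact C loop timo timed was not posted. Below is a guess at what such a naive baseline might look like; `sum_abs_calls` is a hypothetical name. It calls libc `abs()` once per iteration. Building with something like `cc -O2 -fno-builtin` is meant to keep each `abs()` a real call into libc instead of an inlined instruction sequence.

```c
#include <stdlib.h>

/* Hypothetical baseline: one libc abs() call per loop iteration.
 * Build with e.g. `cc -O2 -fno-builtin` so abs() stays a real call. */
long long sum_abs_calls(long long iterations)
{
    long long total = 0;
    for (long long i = 0; i < iterations; i++) {
        /* The argument changes every iteration, so the call cannot be
         * hoisted out of the loop; summing keeps the result live. */
        total += abs((int)(i % 1000) - 500);
    }
    return total;
}
```

Timing `sum_abs_calls(500000000)` from a small `main` would give a baseline comparable in spirit to the 500_000_000-iteration loop above.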
Geth MoarVM/spesh_unbox_i_cpointer: 97b3af3ebf | (Timo Paulssen)++ | src/6model/reprs/CPointer.c
Optimize passing CPointer to nativecall ever so slightly

We emit an unbox_i operation in a compiled nativecall body in order to get the pointer value to pass to the native function. Without a spesh method on the CPointer repr, this would be interpreted as, and JIT-compiled into, a call to CPointer's get_int.
Instead of a call, we just emit a spesh op to do the memory offset and read for us, which the JIT also likes very much.
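A rough C analogy for the commit above, not MoarVM's actual code: `FakeCPointer`, `cpointer_get_int` and `read_at_offset` are made up for illustration. The generic path goes through a function pointer, the way an unspecialized `unbox_i` ends up calling the REPR's `get_int`; the specialized path is a single load at an offset known in advance, which is the kind of thing a JIT can emit inline.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical object layout: a header followed by a body holding the pointer. */
typedef struct { void *ptr; } FakeCPointerBody;
typedef struct { uint64_t header; FakeCPointerBody body; } FakeCPointer;

/* Generic path: the caller only has a function pointer to call. */
typedef int64_t (*get_int_fn)(const void *obj);

int64_t cpointer_get_int(const void *obj) {
    return (int64_t)(intptr_t)((const FakeCPointer *)obj)->body.ptr;
}

/* Specialized path: the offset is known ahead of time, so this is one load. */
int64_t read_at_offset(const void *obj, size_t offset) {
    void *p;
    memcpy(&p, (const char *)obj + offset, sizeof p);
    return (int64_t)(intptr_t)p;
}
```

Both `read_at_offset(&cp, offsetof(FakeCPointer, body.ptr))` and a call through a `get_int_fn` pointing at `cpointer_get_int` yield the same value; only the second involves an indirect call.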
02:05 evalable6 left, linkable6 left 02:07 linkable6 joined 02:32 frost joined 04:05 linkable6 left 04:06 linkable6 joined 04:08 evalable6 joined 06:00 ggoebel left 06:02 reportable6 left 06:04 reportable6 joined
nine timo: So....our loop with a native call is basically about a factor 6 off native C's performance? That's not that bad at all :) 07:53
timo well, hopefully we can also get there without forcing a C-style `loop` loop on our users 07:54
nine Btw. with that BEGIN EVAL issue in NativeCall sorted out, I measured a new record of 7.135s for csv-ip5xs.pl with 1_000_000 lines 07:56
Best I got before new-disp was around 14s
timo oh wow
nine I can remember when I used 10000 lines when benchmarking and got around the same time :) 07:57
08:03 dogbert17 left 08:04 dogbert17 joined, frost left
timo interesting. here are a couple of BBs in a row where the first one has the next four as its direct successors, because there's an FH Goto in each of them. however, none of the ops in the first two can throw. the first one has two "set", the second one has just "unless_i" 08:04
i wonder if we can win anything anywhere if we remove such useless connections when we detect that 08:05
phis would get smaller for one
nine Isn't it this way because of deopt?
08:07 frost joined
timo maybe you mean how the very first BB has every BB that can osr as a successor? 08:09
nine Rather that we can't simplify that easily. But maybe this is one of the rare cases where we actually can :) 08:10
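A minimal sketch of the edge pruning timo is wondering about, using a made-up `BasicBlock` struct rather than MoarVM's spesh graph. It assumes each successor edge is already tagged with whether it exists only because of an exception handler goto, and that a per-block "can any op throw" flag is known. As nine points out, a real implementation would also have to confirm that deoptimization does not depend on those edges; this sketch ignores that.

```c
#include <stddef.h>

#define MAX_SUCC 8

/* Hypothetical basic block: successor indices plus a flag per edge. */
typedef struct {
    int can_throw;                  /* 1 if any op in the block may throw */
    size_t num_succ;
    int succ[MAX_SUCC];             /* indices of successor blocks */
    int succ_is_handler[MAX_SUCC];  /* 1 if the edge exists only for a handler goto */
} BasicBlock;

/* Remove handler-only edges from blocks that cannot throw.
 * Returns the number of edges removed. */
size_t prune_handler_edges(BasicBlock *bbs, size_t n) {
    size_t removed = 0;
    for (size_t b = 0; b < n; b++) {
        BasicBlock *bb = &bbs[b];
        if (bb->can_throw)
            continue;               /* handler edges may be taken; keep them */
        size_t keep = 0;
        for (size_t i = 0; i < bb->num_succ; i++) {
            if (bb->succ_is_handler[i]) {
                removed++;          /* no op here can throw, so never taken */
                continue;
            }
            bb->succ[keep] = bb->succ[i];
            bb->succ_is_handler[keep] = 0;
            keep++;
        }
        bb->num_succ = keep;
    }
    return removed;
}
```

Fewer predecessors per block would, as noted above, also mean smaller phis after SSA construction.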
timo hmpf, so the particles example from sdl2::raw has one single update loop, and there's a branch in it for when a particle in a given slot has zero-or-negative lifetime, then there's a chance for a new particle to be spawned in that place 08:19
unfortunately, all the dispatches from that branch get "never dispatched"
but the dispatch_o are rewritten to sp_dispatch_o, which don't log
right now we don't get a second chance here
nine There are no deopts from these non-optimal branches either, are there? 08:29
timo you mean in practice or just from what ops there are? 08:30
nine in practice 08:33
timo lemme log dat
ah, i couldn't find the corresponding spot because i was looking at the wrong code 08:41
the deopt comes from a dispatch_v raku-sink 08:43
simple-args-proto also gets deopted a fair bit, where it guardconcs against BOOTInt 08:46
interesting. it grabs the value of $*COMPILING_CORE_SETTING and that ends up in the guardconc 08:47
so it used to be set to 0, but then it was not set any more? 08:48
and it got speshed during compilation where the dynvar was present
nine Sounds like a simple fix is to just always set that dynvar? 08:52
timo then it will leak into userspace 09:03
Geth MoarVM/new-disp-nativecall-libffi: 29 commits pushed by (Stefan Seifert)++, (Nicholas Clark)++, (Timo Paulssen)++
review: github.com/MoarVM/MoarVM/compare/0...d92825147a
timo sorry i got booted off the 'net 09:09
09:12 linkable6 left, evalable6 left
nine Right...then of course the obvious solution is to unset it, instead of setting it to 0 in token comp_unit 09:13
Geth MoarVM/new-disp-nativecall-libffi: 4 commits pushed by (Stefan Seifert)++, (Timo Paulssen)++ 09:21
timo i guess! 09:27
unless lookup of nonexistent dynvars is very slow
nine With unset I mean set it to NQPMu or whatever we get in the dispatcher when the variable is never set 09:29
Anyway this seems to be the perfect case for MasterDuke++'s work on throwing away spesh cands when there are too many deopts 09:30
timo simple-args-proto doesn't get deopted that much ... or at least the other one gets deopted so much that anything else is drowned out 09:33
but yes, agreed on MasterDuke's work 09:39
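A minimal sketch of the general idea behind throwing away specializations that deopt too often. The threshold, struct and function names are hypothetical and not taken from MasterDuke's branch. A real VM selects candidates by checking guards; returning the first live candidate here is a simplification.

```c
#include <stddef.h>

/* Hypothetical threshold; the real branch may use a different heuristic. */
#define DEOPT_LIMIT 64

typedef struct {
    int discarded;          /* 1 once this candidate should no longer be used */
    unsigned deopt_count;   /* deopts observed from this candidate so far */
} SpecCandidate;

/* Record one deoptimization; retire the candidate once it hits the limit. */
void note_deopt(SpecCandidate *c) {
    if (++c->deopt_count >= DEOPT_LIMIT)
        c->discarded = 1;
}

/* Index of the first candidate still in use, or -1 meaning
 * "run the unspecialized code (and perhaps gather fresh statistics)". */
int select_candidate(const SpecCandidate *cands, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (!cands[i].discarded)
            return (int)i;
    return -1;
}
```

In the `$*COMPILING_CORE_SETTING` case above, a candidate specialized while the dynvar was set would keep failing its guard at runtime and would eventually be retired instead of deopting forever.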
it boggles me, kinda, how/why it breaks the way it does
at some point i should RR it out. like really buckle up and try to really focus 09:41
nine Yes. I only really started making progress on the return-from-nested-runloop issue when I fired up rr and single stepped all the way from the last callback to where it broke 09:52
Seeing it live in action made it quite obvious
10:03 MasterDuke left 10:06 MasterDuke joined 10:14 linkable6 joined
jnthnwrthngtn moarning o/ 10:14
lizmat o/ jnthnwrthngtn
10:15 evalable6 joined
jnthnwrthngtn timo: fwiw, my example just passed a literal to abs (knowing that it can't know to constant fold with FFI...) 10:16
timo OK. i tried both $_, $x, and a 999, the difference was very small 10:17
MasterDuke jnthnwrthngtn: i'm running a spectest now that i'm back on my desktop, but the rabbit hole i mentioned in github.com/MoarVM/MoarVM/pull/1600 is all the other encoders, which also don't take into account `start` or `length` 10:21
all tests now pass 10:22
jnthnwrthngtn MasterDuke: Yeah, I'm more inclined to just remove start/length arguments from all of them except maybe those where it serves some internal purpose (like you had). 10:29
Rather than fix it
I mean, substr exists :)
MasterDuke in that case, the function could(should?) be removed 10:31
jnthnwrthngtn What led you to want the start argument to the ASCII one? I'm guessing you discovered this for a reason 10:34
We could in theory keep it for that if it's useful, and remove the others.
Although if it's not a performance-critical place arguably just substr it for the ASCII one too
MasterDuke some silly experimentation after seeing lizmat++'s benchmark where `my int $a = $foo.Int` was slower than `my int $a = str $ = $foo` 10:35
the second form uses MVM_coerce_s_i, which is just strtoll, but the first goes via MVM_radix 10:36
lizmat on that note: if we would have a CCLASS_DECIMAL, we could make Str.Int 3.5x as fast for strings < 19 chars 10:37
MasterDuke so i was attempting to use strtoll in MVM_radix (assuming several conditions hold)
lizmat CCLASS_DECIMAL being just 0,1,2,3,4,5,6,7,8,9
MasterDuke and since MVM_radix takes an offset, i was trying to encode to a C string (for strtoll) just the correct substring of the MVMString passed 10:38
lizmat CCLASS_NUMERIC also includes non-ascii numerics, which *do* break on the my int $ = my str $ = "123" fast path
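A sketch of the ASCII-only check lizmat describes, written in C over a byte buffer; the function names are hypothetical and there is no `CCLASS_DECIMAL` in this code. Restricting input to at most 18 ASCII digits, matching the "< 19 chars" limit above, guarantees the value fits in a signed 64-bit integer, so the digit loop needs no overflow check. Signs, underscores and non-ASCII digits are deliberately not handled and would fall back to the general path.

```c
#include <stddef.h>

/* ASCII-only decimal class: exactly '0'..'9', nothing else. */
static int is_decimal_digit(int c) { return c >= '0' && c <= '9'; }

/* 1 when every byte is '0'..'9' and there are at most 18 of them,
 * so the value is guaranteed to fit in a signed 64-bit integer. */
int qualifies_for_fast_int(const char *s, size_t len) {
    if (len == 0 || len > 18)
        return 0;
    for (size_t i = 0; i < len; i++)
        if (!is_decimal_digit((unsigned char)s[i]))
            return 0;
    return 1;
}

/* Digits-only parse; only call after qualifies_for_fast_int() said yes. */
long long fast_decimal_value(const char *s, size_t len) {
    long long v = 0;
    for (size_t i = 0; i < len; i++)
        v = v * 10 + (s[i] - '0');
    return v;
}
```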
MasterDuke so passed the offset given to MVM_radix to MVM_string_ascii_encode_substr and noticed it wasn't doing anything 10:39
jnthnwrthngtn One could in theory look at if the string's representation claims to be ASCII or similar and then use the buffer directly 10:40
Then it's zero copy and a cheap check, at the cost of a little encapsulation breakage. 10:41
That said, if things we call expect zero termination, that won't quite work
MasterDuke i assume strtoll expects a zero-terminated string, but i guess a memcpy into a new buffer 1 bigger would still be faster 10:45
jnthnwrthngtn Yeah, and can (with length check also as a condition for applying the opt) alloca it too 10:46
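A sketch of the copy-then-`strtoll` approach discussed above, for a buffer that is not zero-terminated. It swaps `alloca` for a fixed-size stack array, which gives the same "bounded stack copy" effect portably; `MAX_FAST_DIGITS` and `parse_int_fast` are hypothetical names. Note that `strtoll` also accepts leading whitespace and a sign, so the end-pointer check only guarantees the whole slice was consumed.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cap on how much we are willing to copy onto the stack. */
#define MAX_FAST_DIGITS 32

/* Parse buf[0, len) as a base-10 integer. Returns 1 and stores the value
 * on success; returns 0 if the fast path does not apply. */
int parse_int_fast(const char *buf, size_t len, long long *out) {
    char tmp[MAX_FAST_DIGITS + 1];
    char *end;
    if (len == 0 || len > MAX_FAST_DIGITS)
        return 0;
    memcpy(tmp, buf, len);
    tmp[len] = '\0';                      /* strtoll needs a terminator */
    errno = 0;
    long long v = strtoll(tmp, &end, 10);
    if (errno == ERANGE || end != tmp + len)
        return 0;                         /* overflow or trailing junk */
    *out = v;
    return 1;
}
```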
MasterDuke right. but hm, would we expect a lot of ascii? i think even a literal string given on the command line is *not* ascii (i.e., when i was using MVM_string_ascii_encode_substr it wasn't using the `if (str->body.storage_type == MVM_STRING_GRAPHEME_ASCII) {` branch) 10:50
jnthnwrthngtn Dunno without looking at whether the utf8 decoder tracks this, tbh. 10:56
MasterDuke do we have any functions to check whether "something" (e.g., an MVMString, array of bytes) is composed of just ascii chars? 11:00
guess MVM_string_buf32_can_fit_into_8bit is close 11:05
moon-child means all the codepoints are <255? 11:06
MasterDuke github.com/MoarVM/MoarVM/blob/mast....h#L66-L77 11:07
could always implement lemire.me/blog/2020/07/21/avoid-ch...-matters/, but i wonder if MVM_string_buf32_can_fit_into_8bit is good enough? don't know what strtoll will do if it sees a non-ascii char 11:12
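A sketch of the word-at-a-time ASCII check from the linked Lemire post, over raw bytes; `is_ascii` is a hypothetical helper, not a MoarVM function. Eight bytes are tested at once by masking the high bit of each byte; `memcpy` avoids unaligned-load problems, and a byte-wise tail loop handles the remainder.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return 1 if every byte in data[0, len) is < 0x80, else 0. */
int is_ascii(const void *data, size_t len) {
    const unsigned char *s = data;
    size_t i = 0;
    for (; i + 8 <= len; i += 8) {
        uint64_t word;
        memcpy(&word, s + i, sizeof word);    /* safe unaligned load */
        if (word & 0x8080808080808080ULL)     /* any high bit set? */
            return 0;
    }
    for (; i < len; i++)                      /* leftover bytes */
        if (s[i] & 0x80)
            return 0;
    return 1;
}
```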
12:02 reportable6 left 12:05 reportable6 joined
MasterDuke why do non-native strings seem to get created with a single strand? e.g., in `use nqp; my $a = "123"; say nqp::radix(10, $a, 0, 0)[0]`, $a has a storage_type == MVM_STRING_STRAND 12:09
12:34 squashable6 left 12:50 Kaiepi left 12:59 Kaiepi joined, Kaiepi left 13:00 Kaiepi joined, Kaiepi left, Kaiepi joined
MasterDuke and then why in the world does str.body.storage.strands.blob_string.body.num_graphs == 111? 13:01
13:02 Kaiepi left, Kaiepi joined 13:03 Kaiepi left, Kaiepi joined 13:05 ggoebel joined, Kaiepi left
MasterDuke i ran that example under rr, put a breakpoint in MVM_radix, then did `watch -l str->body.num_strands` and reverse-continue'd 13:06
it broke and shows the value going from 1 to 0 so i ran a backtrace 13:07
and it's in process_worklist -> ... -> MVM_frame_takeclosure ? 13:08
timo that looks odd, why is it storage.strands.blob_string, wouldn't it have to have a [0] in there somewhere? 13:10
MasterDuke same with strands[0] 13:11
timo can you double-check the REPR? 13:15
MasterDuke (rr) p REPR(str)->name 13:16
$40 = 0x7f9f74bb2731 "MVMString"
(rr) p REPR(str.body.storage.strands[0])->name
$42 = 0x7f9f74bb2731 "MVMString"
timo OK 13:19
but it's also changing from unrelated code, so perhaps the object got collected in the meantime and reused or something? 13:20
13:26 Kaiepi joined
MasterDuke oh, maybe the string originally comes from nibbling the source code? 13:51
timo oh, for some reason i thought we were talking about the return value of radix, but that isn't a string 13:52
yeah i think we actually do have that, many strings being a reference into the source code they're compiled from. but only if they haven't gone through serialization
lizmat fwiw, I discussed this the other day with samcv 13:59
substr will just index into a string, and thus keep the big string alive 14:00
we discussed 2 ways of "separating" a substr from its source:
and encode(decode())
(or was it decode(encode()) :-) 14:01
jnthnwrthngtn But nqp::indexingoptimized is probably the best bet?
lizmat is that exposed? 14:02
jnthnwrthngtn m: use nqp; say nqp::indexingoptimized('foo')
camelia foo
lizmat hmmm... even documented: github.com/Raku/nqp/blob/master/do...goptimized
jnthnwrthngtn I think it's a no-op in the situation that there's no strands 14:03
lizmat still, not exposed at HLL level
or otherwise used in the setting 14:04
MasterDuke does .copy do it?
Geth MoarVM: 592cc85489 | niner++ (committed using GitHub Web editor) | 44 files
New disp nativecall (#1595)

With the new dispatch terminal, NativeCall will be able to benefit from the more efficient argument passing conventions. It will also benefit from dispatch programs tailored to the callsite, i.e. being able to replace more costly checks for containers with cheap guards.
Adds a new variant MVM_nativecall_dispatch which understands the new dispatcher argument passing convention to avoid allocating and populating an argument array for every call.
The dispatcher calling convention allows for unboxed values to get passed to a native function. Need to handle those in MVM_nativecall_dispatch instead of blindly assuming that we always get objects. ... (24 more lines)
14:08 linkable6 left
lizmat nine: I guess I should also merge the NQP PR now, right? 14:08
MasterDuke does it need a bump to go along with it? 14:09
lizmat MasterDuke: am bumping atm
14:10 linkable6 joined
lizmat nqp builds and tests fine without it 14:11
timo do we have any nativecall tests in nqp? 14:12
i think we don't
lizmat anyways, the Rakudo PR requires it, so I'll merge that now as well 14:14
14:15 linkable6 left 14:17 linkable6 joined 14:22 linkable6 left 14:24 frost left 14:25 linkable6 joined
nine yes, please :) 14:27
lizmat testing ... 14:30
nine: all bumped now 14:40
hmmm... so on strings: nqp::clone(native str) is basically a noop, right ? 14:45
because it will still refer to the same underlying static string? 14:46
jnthnwrthngtn No, clone is an object op, so you'll be boxing it to a Str and then cloning the Str, which will indeed refer to the same underlying static string.
So it's more of a wasteop :)
lizmat indeed... :-)
ok, so if we would create a method clone(Str:D:) { nqp::indexingoptimized($!value) } 14:47
that would sorta create the semantics that we would want to expose ?
jnthnwrthngtn Ah, in detaching it from the original string it was substr'd from? Yes. 14:49
Note there is a side-effect though
lizmat ah?
jnthnwrthngtn The compact representation of 'x' x 10000 relies on strands also, so you'd be exploding that into a flat string 14:50
Probably not a big deal
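A toy model of why flattening costs memory for `'x' x 10000`: a repeated strand stores one slice plus a repeat count, while a flat string needs every element. `RepeatStrand` and its functions are hypothetical and not MoarVM's `MVMStringStrand` layout; here `count` is the total number of copies.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical strand: slice [start, end) of a flat buffer, repeated. */
typedef struct {
    const int32_t *blob;  /* backing buffer (not owned) */
    size_t start, end;    /* slice of the blob */
    size_t count;         /* total number of copies of the slice, >= 1 */
} RepeatStrand;

size_t strand_length(const RepeatStrand *s) {
    return (s->end - s->start) * s->count;
}

/* Materialize into a newly allocated flat buffer; caller frees. */
int32_t *strand_flatten(const RepeatStrand *s) {
    size_t piece = s->end - s->start;
    size_t total = strand_length(s);
    int32_t *out = malloc((total ? total : 1) * sizeof(int32_t));
    if (!out)
        return NULL;
    for (size_t r = 0; r < s->count; r++)
        memcpy(out + r * piece, s->blob + s->start, piece * sizeof(int32_t));
    return out;
}
```

The strand for `'x' x 10000` references a single element; flattening it allocates 10000 elements.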
lizmat well, yeah... 14:51
jnthnwrthngtn Especially as that optimization isn't part of the language spec, just something MoarVM does
lizmat the thing is in the IRC log parsing, I currently slurp the whole file, and then start parsing from there
this implies that all substrings taken from that (such as nick names, and messages) are substrings into that slurped string, and thus are keeping that alive 14:52
I'm still not sure how much of an issue that is memory wise
MasterDuke what if you make them natives? in my example above, making the variable a native str means it isn't a strand when it gets to MVM_radix 14:53
lizmat but I do know that the whole slurped string will be stored with 4-byte ordinals because of the linefeeds in them
however, most of the messages will be just ASCII, so could be stored as 1-byte ordinals 14:54
which would save a lot of memory
MasterDuke: native or not, would not make a difference at that level ?
MasterDuke dunno
lizmat a Str is just a class Str { has str $!value }
aka just a wrapper around a native string 14:55
jnthnwrthngtn Indeed, strands exist at the `str` level
MasterDuke well, however it happens, in `use nqp; my $a = "123"; say nqp::radix(10, $a, 0, 0)[0]`, $a has a storage_type == MVM_STRING_STRAND. but if it's `my str $a <...>` the storage_type is not MVM_STRING_STRAND 14:57
lizmat so nqp::radix is changing the type of its source ? 14:58
that feels... unexpected ?
MasterDuke radix isn't changing anything, i just mean when $a is passed into radix it's whichever type 14:59
lizmat and what is nqp::getattr($a,Str,'$!value') ? 15:00
MasterDuke don't know what you mean? MVM_radix doesn't know anything about $a, just that it was given an MVMString 15:02
nine lizmat: I'd bet that all substrings of that large string will be in 4-byte representation as well. I don't think we go back implicitly from full unicode to ASCII as doing so requires a scan of the full string
lizmat but an nqp::indexingoptimized(nqp::substr(...)) might ? 15:03
MasterDuke no, it just copies whatever the bodies are composed of if they're all the same type, otherwise uses a grapheme iterator 15:04
lizmat ok, so the only way to potentially go from 4byte to 1byte is to decode(encode()) ? 15:06
MasterDuke oh wait, that iterator might actually do it
yeah, if the resulting string can be 8bit the iterator converts everything to that 15:07
lizmat but an nqp::indexingoptimized(nqp::substr(...)) would convert to 8bit when it can 15:08
MasterDuke it looks like yes 15:09
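A sketch of what "detaching" a substring means in memory terms: copy the slice into a buffer the result owns, so the large parent string can be freed, and narrow to 1-byte storage when every unit fits. `FlatStr`, `detach_substr` and the 0..255 narrowing rule are assumptions for illustration; MoarVM's own 8-bit rule and storage types may differ.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flat string that owns its buffer (8- or 32-bit units). */
typedef struct {
    int is_8bit;
    size_t len;
    union { uint8_t *b8; int32_t *b32; } data;
} FlatStr;

/* This sketch's narrowing rule: every unit must be in 0..255. */
static int fits_8bit(const int32_t *units, size_t len) {
    for (size_t i = 0; i < len; i++)
        if (units[i] < 0 || units[i] > 255)
            return 0;
    return 1;
}

/* Copy parent[offset, offset + len) into a fresh buffer so the result no
 * longer keeps the parent alive. Returns 1 on success, 0 on OOM. */
int detach_substr(const int32_t *parent, size_t offset, size_t len, FlatStr *out) {
    const int32_t *src = parent + offset;
    size_t alloc_len = len ? len : 1;      /* avoid malloc(0) */
    out->len = len;
    out->is_8bit = fits_8bit(src, len);    /* full scan of the slice */
    if (out->is_8bit) {
        out->data.b8 = malloc(alloc_len);
        if (!out->data.b8) return 0;
        for (size_t i = 0; i < len; i++)
            out->data.b8[i] = (uint8_t)src[i];
    } else {
        out->data.b32 = malloc(alloc_len * sizeof(int32_t));
        if (!out->data.b32) return 0;
        memcpy(out->data.b32, src, len * sizeof(int32_t));
    }
    return 1;
}

void flat_str_free(FlatStr *s) {
    if (s->is_8bit) free(s->data.b8);
    else free(s->data.b32);
}
```

The `fits_8bit` scan is the cost nine mentions: going back to a narrow representation requires looking at every unit, which is why it only happens when a new buffer is being built anyway.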
lizmat cool :-)
perhaps a Str.substr( :$clone!) candidate would be in order ? 15:10
jnthnwrthngtn Maybe :detach ?
MasterDuke or :isolate?
lizmat sure, works for me :-)
:detach feels more natural to me, fwiw 15:11
jnthnwrthngtn That it's achieved under the hood by cloning is an impl detail (and one we might some day choose to avoid by exposing a detach version right away)
lizmat it described the action
lizmat will do a PR
jnthnwrthngtn :detached also works because it describes the goal
MasterDuke :clipped, :snipped, :cut-out, :exiled, :ostracized 15:13
lizmat yeah, but you don't know whether it did that?
MasterDuke :excised
lizmat so feels to me that :detach describes intent
not guarantee the result
MasterDuke :sanctified 15:15
lots of options for this concept
jnthnwrthngtn Bless that sanctified little string
lizmat meh 15:16
.oO( @bikeshedders>>.exile )
15:17 dogbert17 left
MasterDuke huh. my initial experiment seemed to show that trying strtoll was a pretty big speedup, but now i'm not seeing it... 15:20
lizmat ok, lots of Str.substr candidates... will contemplate the right course of action there 15:21
15:26 dogbert17 joined 15:33 ggoebel left
MasterDuke if i up the number of iterations i'm benchmarking i start to see a speedup, but not sure it's worth the complexity 15:34
no, nevermind, it's actually a slight slowdown 15:35
15:36 squashable6 joined
[Coke] :( 15:37
MasterDuke maybe my earlier benchmarking was flawed (but i thought i was checking the right branches were being taken) 15:41
maybe the cost of MVM_radix is really all in the overhead of allocating the array for the results 15:42
15:46 Kaiepi left 15:47 Kaiepi joined
lizmat yeah, I mean the algo is pretty simple :-) 15:50
MasterDuke ah, looks like my change makes it faster for longer strings. 4 chars is slower, but by 10 chars it's slightly faster 15:56
lizmat I'd say radix is generally used on smaller numbers, aka smaller strings ?
16:00 patrickb joined
MasterDuke well, MVM_radix is limited anyway depending on the base, since it stores the result in an MVMint64 16:01
lizmat well, yes... because it only knows about natives anyway ? 16:08
MasterDuke MVM_bigint_radix is the arbitrary size version
lizmat exposed as nqp::radix_I right ? 16:17
MasterDuke yep
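A sketch of why a native radix parse is bounded by its 64-bit result, as MasterDuke notes for `MVM_radix`: each step computes `acc * base + digit`, and the loop must stop before that exceeds `INT64_MAX`. `parse_radix_i64` is a hypothetical helper, not MoarVM's code, and it does not reproduce `nqp::radix`'s return array; an arbitrary-size variant (like `nqp::radix_I`) would keep going with a bigint instead.

```c
#include <stddef.h>
#include <stdint.h>

static int digit_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'z') return c - 'a' + 10;
    if (c >= 'A' && c <= 'Z') return c - 'A' + 10;
    return -1;
}

/* Parse s[0, len) in the given base (2..36) into *out. Stops at the first
 * non-digit. Sets *overflow and stops if the next digit would exceed
 * INT64_MAX. Returns the number of characters consumed. */
size_t parse_radix_i64(const char *s, size_t len, int base,
                       int64_t *out, int *overflow) {
    int64_t acc = 0;
    size_t i = 0;
    *overflow = 0;
    if (base < 2 || base > 36) { *out = 0; return 0; }
    for (; i < len; i++) {
        int d = digit_value(s[i]);
        if (d < 0 || d >= base)
            break;                          /* not a digit in this base */
        if (acc > (INT64_MAX - d) / base) { /* acc * base + d would overflow */
            *overflow = 1;
            break;
        }
        acc = acc * base + d;
    }
    *out = acc;
    return i;
}
```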
interesting. radix_I is marked :pure, but radix isn't 16:20
jnthnwrthngtn: think github.com/MoarVM/MoarVM/commit/ed...845cacL277 was just an oversight? 16:21
16:25 rypervenche left 16:32 rypervenche joined
Geth MoarVM: MasterDuke17++ created pull request #1605:
Mark nqp::radix as :pure
japhb Mentioning here because it's likely to be NativeCall related: github.com/jonathanstowe/Crypt-Sod...h/issues/2 18:01
(Last commit in that repo appears to be from March) 18:02
18:02 reportable6 left
jnthnwrthngtn MasterDuke: hm, is that link meant to go to nfarunalt? 18:19
Probably radix on the next line. I think radix is pure 18:20
Yeah, it totally has to be
MasterDuke yeah, see github.com/MoarVM/MoarVM/pull/1605
jnthnwrthngtn Because all its inputs are immutable native values :)
Geth MoarVM: 9d5bcfc5d5 | (Daniel Green)++ | 2 files
Mark nqp::radix as :pure

I suspect this was just overlooked in ed9db7251d18d15bbf645e0a8b6cdb2654a32ece (where nqp::radix_I *was* marked
MoarVM: c779c6320a | (Jonathan Worthington)++ (committed using GitHub Web editor) | 2 files
Merge pull request #1605 from MasterDuke17/mark_radix_as_pure_like_radix_I

Mark nqp::radix as :pure
19:03 reportable6 joined
timo .o( wonder if more-pea will be a rough rebase ) 19:49
MasterDuke ugh, reminds me i need to rebase remove-spesh-opt-if-too-many-deopts 19:50
lizmat nine: in case you missed this on #raku / #raku-dev: it looks like ip5xs went from 1.41 to 1.14, and from 14.99 to 6.24 for ip5xs-20 20:13
[Coke] ... heh. Was coming here to say that. :) 20:33
added another bug 20:38
lizmat hmmm.... it looks like the nativecall merge broke my Ecosystem::Archive update functionality 21:14
starts eating memory pretty fast without recovering
too tired to look at it now... will do so tomorrow 21:15
sena_kun hmm, it seems the new nativecall introduced some regressions 21:28
nine lizmat: Ecosystem::Archive's run-tests script reports ALL OK here 21:34
sena_kun: if you mean the Blin output, at least the Inline::Perl5 version used there is outdated. 0.57 is out and works 21:35
sena_kun nine, yup, but there are plenty of other modules failing bisecting to this other than Inline::Perl5. :S 21:38
jnthnwrthngtn A few distinct failure modes too, by the looks of it
sena_kun don't want to be the one who brings in bad news, but on the bright side breaking things means we are moving somewhere. 21:39
sena_kun sleep&
jnthnwrthngtn sena_kun++ # blin run
nine Well, that's what we have blin for :) Will start debugging those tomorrow 21:40
jnthnwrthngtn Indeed, I'd have been surprised if a change of this magnitude didn't lead to blin finding something :)
sena_kun nine, good luck, thanks for your continuous contribution. 21:41
nine thanks
22:58 vrurg_ joined 23:01 vrurg left
lizmat nine: the test script so far only tests if the module compiles 23:05
23:28 vrurg_ left, vrurg joined