| IRC logs at
Set by AlexDaniel on 12 June 2018.
timotimo if i understood right, tracy can now also be used without cooperation of the target application as a sampling profiler much like perf 00:15
02:34 frost-lab joined 02:59 lucasb left
nine Yesterday I learned something about spesh: even with MVM_SPESH_BLOCKING and MVM_SPESH_NODELAY a frame may get hundreds of times before getting optimized. The reason is that spesh logs are collected and only shipped once they contain (the default of) 16384 entries. 06:57
Getting a new log full of entries is what then triggers the planner and creation of candidates. 06:58
What if we set that to 1, to get immediate spesh action? Well, for one you're now operating on geologic time frames. Aside from that you'll get speshed versions with minimal facts, since at most there can be a single log entry (i.e. argument type or so). 06:59
A corollary is that even with the default of 16384 log entries, we create frames where we only have a part of the possible facts. But that's not so bad since usually a frame must be called a lot before getting speshed. And a single run with partial information probably won't change anything. 07:01
MasterDuke jnthn: mentioned playing with number of spesh log entries as a possible solution for something recently, i don't remember exactly what, but i do remember taking a quick look and not knowing if the numbers should be adjusted up or down 07:03
nine Depends on the issue I guess ;) 07:04
MasterDuke didn't mean to actually address jnthn there, but i guess it still works
maybe the problem of infix:<**> not getting inlined?
oh no, it was the change in performance and number of frames created with the now-reverted decont logging change 07:06
using Tux's test-t as the code to run 07:07
starting at 07:08
"Could try tweaking the log sizes or number of outstanding log buffers a thread can have before it stops" - jnthn
timotimo nine: yeah, a common thing where this happens is a frame with two long-running loops in it; we osr the first loop with no facts about the second, and currently, afaik, don't osr the second since after the osr optimization no osrpoints remain; not sure if any guards exist at that point in the second loop to deopt us out of the non-optimal version of that loop 07:15
nine MasterDuke: since your change increases the number of log entries made, I'd guess increasing the log since would be worth a try. But then, lowering the count so spesh reacts faster may also help. It's just something you have to try. 07:17
timotimo mhhh endless tunables 07:19
MasterDuke timotimo: yeah, there have been a couple people bit recently by trying to benchmark different ways of doing something in a single script (i.e., run first way in a loop  collect end - start time, then run second way in a loop and collect end - start time), where it looked like the second way was slower, but it wasn't if they were run in isolation
nine: ok, i'll just adjust them wildly in every direction and see what happens 07:20
timotimo we have a mechanism that accelerates speshing after entering a different(/new) compunit, don't we? 07:21
nine indeed 07:29
timotimo not sure if that has to do with immediately submitting a non-full spesh log or something else 07:31
MasterDuke timotimo: it looks like you still need to include the tracy files in your build process though? you can't use it to do sampling in a completely standalone fashion?
hm, says it doesn't require instrumentation... 07:32
07:54 sena_kun left 08:02 sena_kun joined
MasterDuke timotimo: have successfully captured a tracy profile of a rakudo script? 08:10
timotimo didn't have time to try it yet 08:11
08:20 zakharyas joined
MasterDuke hm. cloned to 3rdparty/tracy. added `-I3rdparty/tracy` to CINCLUDES in the Makefile. added `3rdparty/tracy/TracyC.h` to HEADERS in the Makefile. ran `TRACY_ENABLE=1 make -j12 install`. but it doesn't seem to have worked 08:23
ooo, looks like gcc 11 is supposed to be released next week 08:26
timotimo is TracyC a header-only thing or so? 08:30
MasterDuke think so 08:32
timotimo that does look fine 08:35
what do you mean by "doesn't seem to have worked" tho? 08:36
it doesn't look like the header itself does much. also, i think there's a TRACY_ENABLED define you have to have defined everywhere
MasterDuke i did `TRACY_NO_EXIT=1 raku -e 'say "hi"'` while the listener was running 08:37
but it didn't say anything about collecting any data
timotimo see also this part of the manual 08:38
Unless you’re able to perform automatic call stack sampling(see chapter 3.13.5), you will have to manually instrument the application. All the user-facing interface iscontained in thetracy/Tracy.hppheader file.Manual instrumentation is best started with adding markup to the main loop of the application, alongwith a few function that are called there.
chapter 3.13.5 points to another chapter that tells you that automatic call stack sampling only works when your privileges are elevated, i.e. run-as-root 08:39
MasterDuke had forgotten that, but re-ran with sudo and no change. i'm already building with --debug=3, so should be good there 08:40
timotimo putting TracyAlloc and TracyFree in our MVM_malloc/MVM_calloc and MVM_free functions would already be interesting on its own 08:41
also, you may have to compile the thing in capture/ and that'd give you a binary you can run your actual program with
MasterDuke oh, 3.12 has some relevant info
"Tracy is written in C++, so you will need to have a C++ compiler and link with C++ standard library, 08:42
even if your program is strictly pure C."
timotimo fun
this may be why i wasn't too thrilled last time i tried to get tracy into moar's build process 08:43
MasterDuke well, added `-lstdc++` to LDLIBS, but still nothing 08:44
timotimo another avenue to pursue, which doesn't have to do with tracy, is chrome's tracer frontend: 08:45
did you build the thing in capture/ yet?
MasterDuke i have a system install of tracy
timotimo cool 08:46
MasterDuke it's in the aur
timotimo aur lord and saviaur 08:48
MasterDuke also tried adding `#include "TracyC.H"` to src/main.c, but no change 08:57
timotimo now i understand what capture is here for, ok.
MasterDuke enough for now, time for some other stuff
timotimo well, i assume there'll have to be some -l as well for the tracy code to actually be in our program 08:58
09:50 sena_kun left 09:58 sena_kun joined
jnthn Note that if you set the spesh log buffer size to 1 then you'll also get a huge amount of inter-thread communication, since it sends the buffers off to the spesh thread 10:35
10:37 Kaiepi left, Kaiepi joined
MasterDuke jnthn, nine, et al.: do you have any thoughts re ? it speeds up a CI run by ~10m, but apparently exposes some still-existing problems with precomp on windows 10:42
10:43 Kaeipi joined 10:44 Kaiepi left
MasterDuke i could just remove the `-j2` from the windows prove calls, but that is where the greatest speedup was gained... 10:56
sena_kun Anyone to review ? It is still a release blocker and we still can make a release this weekend. 11:21
lizmat jnthn ^^
nine jnthn: like I said: you'll then be working on geological time scales :) 11:35
11:52 frost-lab left, frost-lab joined 11:55 zakharyas left 12:20 lizmat_ joined 12:24 lizmat left, lizmat_ is now known as lizmat 13:34 zakharyas joined 14:11 domidumont joined 14:12 frost-lab left
MasterDuke ugh. either i'm just not thinking well at all today, or i just shouldn't be let near the azure pipelines in general 15:23
16:49 zakharyas left 17:19 domidumont left 17:22 zakharyas joined
nine So I changed spesh log handling a bit to always send a log on setting a return value. This makes sure that with MVM_SPESH_BLOCKING and MVM_SPESH_NODELAY a frame is specialized after one full run. With this I finally managed to reproduce the problem in gdb and even rr :) 17:57
The reason I couldn't do so earlier really did come down to a kind of butterfly effect. Small differences in the environment (variables) caused spesh logs to fill up at different times and instead of specializing that frame after only one run we specialized after 4 runs, got different facts and the problem went away. 17:58
18:02 brrt joined
timotimo haha told you 18:07
nine We do a frightening amount of processing on those ENV vars... 18:10
brrt good * #moarvm 18:13
japhb Weird, overnight I got a CI failure for CBOR::Simple ( because, as far as I can tell, NaN has a different bitstring on Windows? I can't see any obvious error in OP(writenum) in src/core/interp.c that would constitute a serialization bug but perhaps someone here knows a portability issue that I don't? 18:17
timotimo is it signaling vs non-signaling nans? 18:18
there are a whole load of possible nans
brrt I think there are several different NaN bitstrings possible iirc
yeah, what timotimo said :-)
nwc10 IIRC there are 2**54-2 possible NaN bitstrings
no, 2**52-2
brrt I once read about it and then forgot about it 18:19
timotimo it's like a whole cloister up in there
nwc10 er, wait, it ought to be 2**53 - you have the 52 mantissa bits and the 1 sign bit, but all the mantissa bits zero is Inf or -Inf
japhb Wait, looking at the produced value again it looks like a sign flip?
timotimo convert* 18:20
convent* 18:21
japhb I tried looking for the actual IEEE754 spec, but of course it's behind a paywall because professional societies
timotimo i'm sure wikipedia has a bit on it that's good enough 18:22
brrt doesn't scihub have it?
18:25 zakharyas left 18:26 lucasb joined
japhb timotimo: Wikipedia ( just seems to claim that the sign bit "doesn't matter". I kinda wonder if I should just force it 0 when serializing. 18:26
nwc10 Some crazy (or maybe crappy) platform managed to output NaN and -NaN 18:27
This doesn't make sense, and I can't think that it's valid by any spec (or at least any sane spec). Having been looking at floating point stringification code, it seems to be a bug (possibly entrenched on the platform) of handling the sign bit a little too early. (ie you should be NaN, sign bit, infinity, hurrah, it's normal or subnormal) 18:28
although I think you have to handle 0 early, else you'll render it like a subnormal. 18:29
japhb I'm thinking since CBOR is a cross-platform spec, that if it wants 0 in the sign bit for NaN, that's what I'm going to give it. :-) 18:30
timotimo whats a sea boar 18:31
japhb timotimo: if you're serious, :-P if you're punning. :-)
timotimo is there an extension for comments in the blob 18:33
brrt huh, interesting 18:34
japhb timotimo: Not as such, as far as I can tell, but given that it has tag extensions for entire MIME messages, I'm sure we could figure out something. Besides, there's first come first served extension space. :-) 18:42
japhb pushes to trigger a CI run and see if the workaround does it: 18:44
japhb finds it somewhat humorous that Perl got the first block of tag extensions past the RFC-required set 18:47
*cough* 18:50
timotimo oh, oopsie-daisies 18:52
18:58 brrt left
japhb Fixed! Now to make actual *progress*. :-) 18:58
19:00 lizmat_ joined 19:01 lizmat left 19:02 lizmat_ is now known as lizmat
timotimo "extension 69. this extension is only there to represent the numbers 69, and 420, as well as the ASCII string 'nice' alternatively 'nice.'" 19:02
19:21 cog joined 19:27 brrt joined
nine MasterDuke: what problems does -j2 cause? 19:29
though that particular problem hasn't happened since 19:30
nine Ah, yes. Windows doesn't make these things easy... 19:32
MasterDuke after all that spam from me earlier, should be in a decent state now if anybody has opinions (the third and fourth commits are the more interesting ones) 19:33
20:13 zakharyas joined
nine I finally reproduced it in a golf! 20:14
MVM_SPESH_NODELAY=1 MVM_SPESH_OSR_DISABLE=1 MVM_SPESH_BLOCKING=1 ./rakudo-m -e 'my $foo; my $bar; sub foo(\i) { i //=; i.list.elems}; my $total; $total += foo($foo); $total += foo($bar); say $total;'
This is with express log shipping enabled
And I finally understand why it breaks! 20:25
MasterDuke nice 20:33
nine At the first run, the facts we find for that function are that i is a Scalar container which deconts to an Any type object, which is entirely correct. We construct a proper guard tree to make sure this holds. 20:36
But during the run of the function, we actually assign a List object into that container. But the call i.list is made on the original argument register version r0(2). Spesh facts for that register version still claim it deconts to an Any, but at that point it wouldn't. It's now a List. 20:38
Yet we optimize the findmeth into sp_getspeshslot r5(3), sslot(7) # [001] method lookup of 'list' on a Any 20:39
Containers essentially break the assumptions behind SSA as suddenly what we know about a register version is not stable at all. 20:41
20:43 zakharyas left
MasterDuke huh 20:44
nine Seems like this is one of those "how could we have gotten so far and not come across this?" issues 20:45
MasterDuke easy fix? 20:46
nine Right now I don't see any way to fix at all. Besides no longer relying on decont facts which would be terrible for performance. 20:47
jnthn nine: I thought all of that was ripped out long ago. 20:51
Except for at the function entry point (the spesh arg guard)
nine jnthn: what do you mean with "all of that"? 20:53
jnthn nine: Facts about what something deconts to 20:54
nine Well the container in question _is_ an argument
jnthn Anyway, the reason that we "get away with" so much is that I already eliminated this everywhere else, afaik 20:56
nine And now we know even for arguments it can be problematic :/ is the spesh log btw.
jnthn Wait, are we talking about parameters or the arguments to a call we're making? 20:59
nine The value progresses through param_rp_o -> r1(2) -> hllize -> r2(2) -> set -> r1(3) -> set -> r0(2) -> decont -> r4(3) -> findmeth
But at the point of the findmeth we alredy changed the parameter's contents.
Sorry, I can never keep parameters and arguments apart :/ 21:00
jnthn I'm trying to remember why we still need that there.
nine The thing that changes is the incoming container.
That's the \i argument of sub foo I posted above
jnthn Also: of course the reason we so rarely see this is because arguments are ro by default 21:01
And so we just decont them once at the start and that's it 21:02
Fun realization: with new-disp, we can introspect the signature of the callee and arrange for decont to happen on the *callee* side when we see the target argument is readonly, which means that for infix:<+> we'd only need an (Int,Int) specialization, not all the combinations of (Scalar[Int], Int) and so forth. Even better, that will mean way more EA possibilities even when we can't inline. 21:04
As for the current problem: I'd suggest just not setting decont_type
And seeing if there's any significant upshot 21:05
I'd kinda forgotten we do it there. Thing is that if we *do* decont then it's logged, so we'd get a guard, but guards that are met are cheap
nine So basically rip out the support for argument decont types? 21:06
jnthn Yes, in src/spesh/args.c
nine So another case of fixing stuff by removing code :D 21:07
jnthn I mean, the spesh arg guard will still care about checking them; I don't recall if it reuses the MVMSpeshFacts data structure at some point along the way
Hopefully in a couple of months I'll be deleting quite a bit more :)
The new-disp branch already has spesh plugins removed :)
nine is looking forward to debug exiting new bugs 21:08
jnthn After comes the multi cache, the method cache, all the serialization and deserialization code around that, sp_findmethod, the findmethod op, the can op... :)
Sigh, yes, my apologies in advance. I fear I'm only a little wiser and certainly no less fallible than I was when I wrote the stuff you're debugging now. 21:09
otoh, I hope the new design, covering more cases with a single mechanism, will mean less code and so less places for bugs to hide 21:10
At least, the VM surface area should be smaller.
21:11 brrt left
nine And the bug disappears... It's like magic! 21:16
jnthn :) 21:20
I wish I knew why I left that behind.
I remember tossing it out for every other case and there was some reason I didn't do that one.
21:20 dogbert17 joined 21:24 dogbert11 left
[Coke] maybe rationale left in the commit ? 21:35
nine++ so good