github.com/moarvm/moarvm | IRC logs at colabti.org/irclogger/irclogger_logs/moarvm
Set by AlexDaniel on 12 June 2018.
Geth MoarVM/ryu: 6 commits pushed by (Nicholas Clark)++ 05:45
nwc10 good *, #moarvm 05:46
japhb Good way to say hello, with a push! :-) 05:49
patrickb o/ 06:52
nwc10 \o 06:54
Geth MoarVM: patrickbkr++ created pull request #1484:
CI: Update package index before installing packages
07:03
patrickb nwc10: I hope the above will fix the ryu PR. 07:04
nwc10 I was about to ask you exactly that :-)
patrickb Work on CIs tends to be a lot of pain. I have quite a bit of sympathy for MasterDuke and his attempt to improve the Azure chain. 07:07
MasterDuke++ 07:08
Geth MoarVM: 19db00f75d | (Patrick Böker)++ | azure-pipelines.yml
CI: Update package index before installing packages

This should fix missing package errors.
07:36
MoarVM: 09c4c4d427 | (Patrick Böker)++ (committed using GitHub Web editor) | azure-pipelines.yml
Merge pull request #1484 from patrickbkr/ci-update-package-index

CI: Update package index before installing packages
patrickb It now installs gdb 10. So the fix seems to actually have been correct. 07:37
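(For context: the usual fix on Debian-based CI images is to refresh the apt package index before installing anything. A minimal azure-pipelines.yml sketch of the pattern; the step layout and package list here are illustrative, not the actual patch:)

    steps:
      - script: |
          sudo apt-get update
          sudo apt-get install -y gdb
        displayName: 'Update package index, then install packages'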
Geth MoarVM/ryu: 6 commits pushed by (Nicholas Clark)++ 07:47
nwc10 and now rebased onto that
timotimo you know, jnthn, if we know that a frame we just hit the instrumentation barrier on (in order to validate it) is going to call into another soon after, we could totally queue the other frame for immediate verification, perhaps off-thread, and win the tiniest amount of latency 09:14
jnthn Given a lot of frames are quite small, I think off-thread might not be a win, in that the coordination could dominate, but one could go on bytecode size 09:24
timotimo i wonder what workload i had where a lot of time was spent in verification. perhaps "the empty program" 09:25
jnthn But yeah, since the instrumentation has to walk the bytecode anyway...
There's a memory trade-off also iirc, because bytecode validation depends on the frame being fully deserialized and I think it also creates annotation maps 09:26
And just because one frame can be statically seen as referencing another doesn't mean that it will actually call it
For example, CATCH blocks 09:27
timotimo cdn.discordapp.com/attachments/557...nknown.png here's the moar heapanalyzer with its bytecode validation zones
jnthn In which case we risk doing work ahead of time that we'd never really do
jnthn The long tail is interesting there 09:29
jnthn I wonder what it took 9ms to validate 09:30
I should really install that Tracy thing when I get to tuning up new-disp 09:31
Looks pretty amazing in terms of what you can visualize and find out 09:32
timotimo i'm putting in reporting for what exactly is being validated right now so we can see
can't reproduce the 9 right now 09:35
jnthn I guess this can be sensitive to context switches and other sources of load, or does it somehow account for that? 09:44
timotimo i would have seen it if i had zoomed to the zone 09:48
and i think the "self time" and "running time" and other stats also show that
it could be that the mainline of nqp/lib/QAST.moarvm takes a bit long 09:51
hm, though for that i only got the name, not the filename, which i should have gotten if it were that file 09:52
that was the last load bytecode region before that validation however 09:54
timotimo or i have to see it like a stack 09:55
ok i have the filename in it as well now 09:58
nwc10 patrickb: yes, your fix fixed it 09:59
and I realise that was the "most context-free message so far today" 10:00
oops. Azure, apt and d'oh!
timotimo MASTOps' frame with uuid 827 wins with 993 μs, next up is nqp/lib/QAST.moarvm's <mainline> with 777 μs, then core.c.setting.moarvm's 19141 with 766 μs, then it drops a bit to 400 μs for the core setting's unit
that very first one is the frame that has a boatload of locals and the code is just getcode + takeclosure + getcode + takeclosure et cetera followed by checkarity, paramnamesused, then wval + bindlex a lot, and then it sets up integer arrays by pushing const_i64_16 values into it over and over 10:02
and then a hash with string keys, where integers are boxed, but it's not even caching the hllboxtype_i; it just gets it over and over again :D 10:03
it's got 16.67k instructions in total 10:04
also 866 registers
wonder if anything keeps us from getting these arrays and hashes created at compile-time and properly serialized 10:05
nwc10 I've not used Bloaty McBloatface, but from what I've read it tells you the size of various sections. Is there any easy way to do that for MoarVM compiled bytecode files? So one can see whether a change moves stuff between sections? Or what total size the various different bytecode tags add up to? 10:06
timotimo what does "bytecode tags" mean to you here? 10:07
just 10.15k instructions in the <mainline> of QAST.moarvm, but it looks even more wasteful in some places 10:08
nwc10 good question. Wasn't clear. Say, how much is serialised arrays vs., say, serialised code. Except I realise that this is a daft idea because everything contains everything else
timotimo haha, yeah true
timotimo do we do any good on repeated findmeth with the same name on the same object (grabbed fresh with a wval each time) when every findmeth is in a different spot in a frame we only run a single time? 10:17
right now our QAST.nqp ends up with a boatload of top-level calls to QAST::MASTOperations.add_core_moarop_mapping, for example, and that's a wval + decont + findmeth every time, though we could perhaps bind QAST::MASTOperations itself to a local, have the method found once up front, and call it over and over again 10:19
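(A rough NQP sketch of that hoisting idea; QAST::MASTOperations and add_core_moarop_mapping are from the discussion above, while the local names and the example ops are made up:)

    # do the wval + decont + findmeth work once...
    my $ops   := nqp::decont(QAST::MASTOperations);
    my $addop := nqp::findmethod($ops, 'add_core_moarop_mapping');

    # ...then reuse the method object for every mapping
    $addop($ops, 'add_i', 'add_i');
    $addop($ops, 'sub_i', 'sub_i');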
jnthn Even better would be to build the mapping data structure at BEGIN time, but maybe something blocks us on that 10:25
timotimo we currently use closures to create these core moarop mappers; serializing closures isn't a problem tho 10:26
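(jnthn's BEGIN-time idea as a minimal Raku-style sketch; the constant name and entries are hypothetical. A constant is computed during compilation and serialized with the compilation unit, so precompiled code deserializes the finished data structure instead of rebuilding it in the mainline:)

    # built once at compile time, serialized, never rebuilt at startup
    my constant %CORE-MOAROPS = %( add_i => 'add_i', sub_i => 'sub_i' );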
why are these operations even still there now that we have the big hash of writer subs 10:27
i guess for argument count validation and such
jnthn And sometimes name discrepancies, I guess 10:28
timotimo that's right 10:29
not just sometimes, actually kind of a lot. many related to underscores 10:32
timotimo anyway, if these two frames are improved to rely on serialization instead of mainline execution, imagine the milliseconds this could save 11:08
lizmat would take any msecs saved at startup
timotimo at least 1 millisecond from validation, and i don't have any measurement for how long these two mainlines take to execute by themselves
lizmat where would I need to make changes?
timotimo the trickier one would be in nqp's vm/moar/QAST/QASTOperationsMAST.nqp 11:10
huh. the MASTOps frame that i thought was the one taking so long to validate is the big begin block that makes up the entire moarvm/lib/MAST/Ops.nqp file 11:12
i don't see it get invoked by anything in the dump of MASTOps.moarvm either 11:14
does the frame being something else's outer cause it to be validated?
lizmat things like %hll_inlinability{$hll} := {} unless nqp::existskey(%hll_inlinability, $hll); 11:15
could benefit from using nqp::ifnull
reducing number of lookups
but I guess that's only once per op anyway
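(lizmat's suggestion as a sketch, relying on nqp::atkey returning null for a missing key and nqp::bindkey returning the value it binds; %for-hll is a made-up name:)

    my %for-hll := nqp::ifnull(
        nqp::atkey(%hll_inlinability, $hll),
        nqp::bindkey(%hll_inlinability, $hll, nqp::hash())
    );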
timotimo that lives in add_core_moarop_mapping or so? 11:16
as long as we run that code on startup, a tiny improvement like that could help. if we can get it to run during compile instead, the win isn't as big 11:18
i mean, the win from making it run at compile time is big; the win from using ifnull in that case won't be as big 11:22
timotimo cdn.discordapp.com/attachments/557...nknown.png - guess what you can just nativecall into the TracyC functions, like ___tracy_emit_message 11:52
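(A minimal Raku NativeCall sketch of that; the signature matches ___tracy_emit_message(const char*, size_t, int) from TracyC.h, while the library name is an assumption that depends on how the Tracy client was built:)

    use NativeCall;

    # void ___tracy_emit_message(const char* txt, size_t size, int callstack)
    sub ___tracy_emit_message(Str, size_t, int32)
        is native('TracyClient')   # assumed library name; adjust to your Tracy build
        {*}

    my $msg = 'hello from MoarVM';
    ___tracy_emit_message($msg, $msg.encode.bytes, 0);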
tracy is telling me "sampling is disabled due to non-native scheduler clock. are you running under a VM?" and i don't know what that means. 11:54
lizmat not running on a physical machine ? 11:55
but under valgrind? or in a container ?
timotimo yeah but this is on hardware 11:56
timotimo no VMs that i'm aware of 11:57
lizmat maybe Tracy knows something you don't ? 11:58
can you actually *see* the hardware ?
timotimo yeah 11:59
i mean, i see the enclosure it's inside of
lizmat ok :-) 12:05
nine timotimo: the kernel can use a variety of clock sources. What does /sys/devices/system/clocksource/clocksource0/current_clocksource say about it? 13:05
timotimo it says hpet 13:06
High Precision Extreme Timer?
the other available one is acpi_pm
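(Those come straight from sysfs; a quick Raku check of the current and available clock sources, using the paths from the discussion:)

    my $cs = '/sys/devices/system/clocksource/clocksource0';
    say 'current:   ', "$cs/current_clocksource".IO.slurp.chomp;
    say 'available: ', "$cs/available_clocksource".IO.slurp.chomp;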
for masterduke, tracy's sampling worked fine, i wonder what the clock source is on that machine? 13:07
nine on mine it's tsc 14:28
timotimo Traditional Source of Clocks 14:30
what clock sources are available on your machine, nine? 14:31
my desktop also has tsc active and has tsc, hpet, and acpi_pm available
(the system i'm working from right now is my laptop)
nine tsc hpet acpi_pm 14:34
timotimo tracy dev points out that the flag "cap_user_time_zero" isn't set when doing perf_event_open; it's not clear how to get that set, but it's possibly NYI in the kernel for my hardware? 14:55
MasterDuke timotimo: /sys/devices/system/clocksource/clocksource0/current_clocksource is tsc for me 20:44
MasterDuke .tell jnthn in case you missed them, some questions here colabti.org/irclogger/irclogger_lo...-04-30#l53 21:25
tellable6 MasterDuke, I'll pass your message to jnthn
MasterDuke oh, looks like i may have figured out this azure pipeline enough to get it a bit simplified, assuming people like the changes 21:51
MasterDuke spoke a little too soon... 21:55