github.com/moarvm/moarvm | IRC logs at colabti.org/irclogger/irclogger_logs/moarvm | Set by AlexDaniel on 12 June 2018.
00:05 Kaiepi joined
00:38 Kaiepi left
01:54 lizmat joined
02:22 bloatable6 joined
02:39 Util joined
03:51 Kaiepi joined
Geth | MoarVM/ryu: 6 commits pushed by (Nicholas Clark)++ | 05:45
nwc10 | good *, #moarvm | 05:46
japhb | Good way to say hello, with a push! :-) | 05:49
06:11 sivoais left
06:39 sivoais joined
06:50 patrickb joined
patrickb | o/ | 06:52
nwc10 | \o | 06:54
Geth | MoarVM: patrickbkr++ created pull request #1484: CI: Update package index before installing packages | 07:03
patrickb | nwc10: I hope the above will fix the ryu PR. | 07:04
nwc10 | I was about to ask you exactly that :-)
patrickb | Work on CIs tends to be a lot of pain. I have quite a bit of sympathy for MasterDuke and his attempt to improve the Azure chain. | 07:07
MasterDuke++ | 07:08
Geth | MoarVM: 19db00f75d | (Patrick Böker)++ | azure-pipelines.yml: CI: Update package index before installing packages. This should fix missing package errors. | 07:36
MoarVM: 09c4c4d427 | (Patrick Böker)++ (committed using GitHub Web editor) | azure-pipelines.yml: Merge pull request #1484 from patrickbkr/ci-update-package-index: CI: Update package index before installing packages
patrickb | It now installs gdb 10. So the fix seems to actually have been correct. | 07:37
07:39 lizmat left
Geth | MoarVM/ryu: 6 commits pushed by (Nicholas Clark)++ | 07:47
nwc10 | and now rebased onto that
07:53 sena_kun left, lizmat joined
07:59 sena_kun joined
08:06 LizBot joined
08:28 zakharyas joined
timotimo | you know, jnthn, if we know that a frame we just hit the instrumentation barrier on in order to validate it is going to call into another soon after, we could totally queue the other frame for immediate verification, perhaps off-thread, and win the tiniest amount of latency | 09:14
jnthn | Given that a lot of frames are quite small, I think off-thread might not be a win, in that the coordination could dominate, but one could go on bytecode size | 09:24
timotimo | i wonder what workload i had where verification was a lot of the time spent. perhaps "the empty program" | 09:25
jnthn | But yeah, since the instrumentation has to walk the bytecode anyway...
There's a memory trade-off also, iirc, because bytecode validation depends on the frame being fully deserialized and I think it also creates annotation maps | 09:26
And just because one frame can be statically seen as referencing another doesn't mean that it will actually call it
For example, CATCH blocks | 09:27
timotimo | cdn.discordapp.com/attachments/557...nknown.png here's the moar heapanalyzer with its bytecode validation zones
jnthn | In which case we risk doing work ahead of time that we'd never really do
09:28 frost-lab joined
jnthn | The long tail is interesting there | 09:29
09:30 cog_ left
jnthn | I wonder what took 9ms to validate | 09:30
I should really install that Tracy thing when I get to tuning up new-disp | 09:31
Looks pretty amazing in terms of what you can visualize and find out | 09:32
timotimo | i'm putting in reporting for what exactly is being validated right now so we can see
can't reproduce the 9ms right now | 09:35
jnthn | I guess this can be sensitive to context switches and other sources of load, or does it somehow account for that? | 09:44
timotimo | i would have seen it if i had zoomed to the zone | 09:48
and i think the "self time" and "running time" and other stats also show that
it could be that the mainline of nqp/lib/QAST.moar takes a bit long | 09:51
hm, though for that i only got the name, not the filename, which i should have gotten if it were that file | 09:52
that was the last load bytecode region before that validation, however | 09:54
09:55 domidumont joined
timotimo | or i have to see it like a stack | 09:55
ok i have the filename in it as well now | 09:58
nwc10 | patrickb: yes, your fix fixed it | 09:59
and I realise "most context free message so far today" | 10:00
oops. Azure, apt and d'oh!
timotimo | MASTOps' frame with uuid 827 wins with 993 μs, next up is nqp/lib/QAST.moarvm's <mainline> with 777 μs, then comes core.c.setting.moarvm's 19141 with 766 μs, then it drops a bit to 400 μs for the core setting's unit
that very first one is the frame that has a boatload of locals, and the code is just getcode + takeclosure + getcode + takeclosure et cetera, followed by checkarity, paramnamesused, then wval + bindlex a lot, and then it sets up integer arrays by pushing const_i64_16 values into them over and over | 10:02
and then a hash with string keys, where integers are boxed, but it's not even caching the hllboxtype_i, it just gets it over and over again :D | 10:03
it's got 16.67k instructions in total | 10:04
also 866 registers
wonder if anything keeps us from getting these arrays and hashes created at compile time and properly serialized | 10:05
nwc10 | I've not used Bloaty McBloatface, but from what I've read it tells you the size of various sections. Is there any easy way to do that for MoarVM compiled bytecode files? So one can see whether a change moves stuff between sections? Or what total size the various different bytecode tags add up to? | 10:06
timotimo | what does "bytecode tags" mean to you here? | 10:07
just 10.15k instructions in the <mainline> of QAST.moarvm, but it looks even more wasteful in some places | 10:08
nwc10 | good question. Wasn't clear. Say, how much is serialised arrays vs, say, serialised code. Except I realise that this is a daft idea because everything contains everything else
timotimo | haha, yeah true
10:13 samcv left, samcv joined
timotimo | do we do any good on repeated findmeth with the same name on the same object (grabbed fresh with a wval each time) when every findmeth is in a different spot in a frame we only run a single time? | 10:17
right now our QAST.nqp ends up with a boatload of top-level calls to QAST::MASTOperations.add_core_moarop_mapping, for example, and that's a wval + decont + findmeth every time, though we could perhaps bind QAST::MASTOperations itself to a local, have the method found once up front, and call it over and over again | 10:19
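(A rough NQP sketch of that hoisting idea, for illustration only: the op names and call sites below are stand-ins rather than the actual patch, and it assumes nqp::findmethod plus calling the resolved code object with an explicit invocant, which is the usual way NQP method objects are invoked directly.)

    # Resolve the type object and the method once, up front,
    # instead of emitting wval + decont + findmeth for every call.
    my $ops := QAST::MASTOperations;
    my $add := nqp::findmethod($ops, 'add_core_moarop_mapping');

    # Each mapping then reuses the already-resolved code object.
    $add($ops, 'add_i', 'add_i');
    $add($ops, 'sub_i', 'sub_i');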
jnthn | Even better would be to build the mapping data structure at BEGIN time, but maybe something blocks us on that | 10:25
timotimo | we currently use closures to create these core moarop mappers; serializing closures isn't a problem tho | 10:26
why are these operations even still there now that we have the big hash of writer subs | 10:27
i guess for argument count validation and such
jnthn | And sometimes name discrepancies, I guess | 10:28
timotimo | that's right | 10:29
not just sometimes, actually kind of a lot. many related to underscores | 10:32
10:42 frost-lab left
timotimo | anyway, if these two frames are improved to rely on serialization instead of mainline execution, imagine the milliseconds this could save | 11:08
lizmat would take any msecs saved at startup
timotimo | at least 1 millisecond from validation, and i don't have any measurement for how long these two mainlines take to execute by themselves
lizmat | where would I need to make changes?
timotimo | the trickier one would be in nqp's vm/moar/QAST/QASTOperationsMAST.nqp | 11:10
huh. the MASTOps frame that i thought was the one taking so long to validate is the big begin block that makes up the entire moarvm/lib/MAST/Ops.nqp file | 11:12
i don't see it get invoked by anything in the dump of MASTOps.moarvm either | 11:14
does the frame being something else's outer cause it to be validated?
lizmat | things like %hll_inlinability{$hll} := {} unless nqp::existskey(%hll_inlinability, $hll); | 11:15
could benefit from using nqp::ifnull
reducing the number of lookups
but I guess that's only once per op anyway
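(A minimal sketch of what that nqp::ifnull rewrite could look like, assuming ifnull only evaluates its second operand when the first is null, and that nqp::bindkey can be used in expression position to yield the bound value, as it commonly is in NQP/Rakudo code; the hash and variable names are just copied from the snippet above.)

    # Before: an existskey probe plus a separate keyed access.
    %hll_inlinability{$hll} := {} unless nqp::existskey(%hll_inlinability, $hll);

    # After: a single atkey, binding a fresh hash only when the key is absent.
    my %for-hll := nqp::ifnull(
        nqp::atkey(%hll_inlinability, $hll),
        nqp::bindkey(%hll_inlinability, $hll, nqp::hash())
    );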
timotimo | that lives in add_core_moarop_mapping or so? | 11:16
as long as we run that code on startup, a tiny improvement like that could help. if we can get it to run during compile instead, the win isn't as big | 11:18
i mean, the win from making it run at compile time is big, the win from using ifnull in that case won't be as big | 11:22
11:27 zakharyas left
11:44 patrickb left
timotimo | cdn.discordapp.com/attachments/557...nknown.png - guess what, you can just nativecall into the TracyC functions, like ___tracy_emit_message | 11:52
tracy is telling me "sampling is disabled due to non-native scheduler clock. are you running under a VM?" and i don't know what that means. | 11:54
lizmat | not running on a physical machine? | 11:55
but under valgrind? or in a container?
timotimo | yeah, but this is on hardware | 11:56
11:57 MasterDuke left
timotimo | no VMs that i'm aware of | 11:57
lizmat | maybe Tracy knows something you don't? | 11:58
can you actually *see* the hardware?
timotimo | yeah | 11:59
i mean, i see the enclosure it's inside of
12:02 domidumont left
lizmat | ok :-) | 12:05
nine | timotimo: the kernel can use a variety of clock sources. What does /sys/devices/system/clocksource/clocksource0/current_clocksource say about it? | 13:05
timotimo | it says hpet | 13:06
High Precision Extreme Timer?
the other available one is acpi_pm
for masterduke, tracy's sampling worked fine, i wonder what the clock source is on that machine? | 13:07
13:15 Nornie28 joined
13:19 Nornie28 left
13:24 LizBot left
14:07 domidumont joined
nine | on mine it's tsc | 14:28
timotimo | Traditional Source of Clocks | 14:30
what clock sources are available on your machine, nine? | 14:31
my desktop also has tsc active, and has tsc, hpet, and acpi_pm available
(the system i'm working from right now is my laptop)
nine | tsc hpet acpi_pm | 14:34
14:47 zakharyas joined
timotimo | the tracy dev points out that the flag "cap_user_time_zero" isn't set when doing perf_open; it isn't clear how to get that set, but possibly it's NYI in the kernel for my hardware? | 14:55
15:08 zakharyas left
15:09 zakharyas joined
16:07 domidumont left
16:22 domidumont joined
17:06 lizmat left
17:16 lizmat joined
17:18 domidumont left
17:24 LizBot joined
18:04 dogbert17 left
18:07 dogbert17 joined
18:08 LizBot left, LizBot joined
18:47 zakharyas left
19:56 zakharyas joined
20:18 zakharyas left
20:32 cog_ joined
20:43 MasterDuke joined
MasterDuke | timotimo: /sys/devices/system/clocksource/clocksource0/current_clocksource is tsc for me | 20:44
20:50 unicodable6 left, tellable6 joined
20:51 evalable6 joined, greppable6 left, releasable6 left, committable6 joined
20:52 shareable6 joined, sourceable6 joined
20:53 bisectable6 joined, statisfiable6 left
20:54 statisfiable6 joined
20:55 unicodable6 joined
20:56 tellable6 left
20:57 releasable6 joined
21:05 sourceable6 left, benchable6 left, sourceable6 joined, bloatable6 left
21:07 nativecallable6 joined, tellable6 joined, bloatable6 joined, greppable6 joined
MasterDuke | .tell jnthn in case you missed them, some questions here colabti.org/irclogger/irclogger_lo...-04-30#l53 | 21:25
tellable6 | MasterDuke, I'll pass your message to jnthn
21:49 evalable6 left, shareable6 left, statisfiable6 left, bisectable6 left, unicodable6 left, committable6 left, sourceable6 left, bloatable6 left, tellable6 left, releasable6 left, coverable6 left, greppable6 left, nativecallable6 left
MasterDuke | oh, looks like i may have figured out this azure pipeline enough to get it a bit simplified, assuming people like the changes | 21:51
21:51 unicodable6 joined, sourceable6 joined
21:52 notable6 joined, greppable6 joined, releasable6 joined, statisfiable6 joined, committable6 joined
21:53 evalable6 joined, tellable6 joined
MasterDuke | spoke a little too soon... | 21:55
22:36 Kaiepi left
23:15 Kaiepi joined