github.com/moarvm/moarvm | IRC logs at colabti.org/irclogger/irclogger_logs/moarvm Set by AlexDaniel on 12 June 2018. |
|||
00:19
squashable6 left
00:22
squashable6 joined
00:45
ggoebel joined
00:53
lucasb left
03:14
ggoebel left
05:05
robertle left
06:49
camelia left
06:50
camelia joined
06:53
domidumont joined
07:14
discord6 left,
discord6 joined
07:15
sena_kun joined
07:18
releasable6 left,
releasable6 joined
07:20
sivoais_ joined,
sivoais left
07:26
hankache joined
07:53
hankache left
08:42
sena_kun left
09:24
domidumont left
11:42
shareable6 joined
12:11
domidumont joined
12:20
ggoebel joined
13:50
lucasb joined
15:01
sena_kun joined
|
|||
dogbert17 | .seen nine | 15:05 | |
tellable6 | dogbert17, I saw nine 2019-10-16T15:04:38Z in #perl6: <nine> In my experience, getting such feedback is a good time to evaluate my own behaviour, regardless of whether I feel that it should be directed at me or not | ||
15:23
robertle joined
15:53
hankache joined
|
|||
nine | I'm here ;) | 16:05 | |
timotimo | .o( MVM_op_get_mark could get optimized ) | 16:34 | |
8,483,131 self time before, 4,759,885 after, oh my | 16:35 | ||
jnthn | That speeds up bytecode validation? | 16:38 | |
timotimo | little bit | ||
is there a tool that gives you C code for a lookup when given a lookup table? | |||
is "perfect hashing" the right keyword here? | 16:39 | ||
dogbert17 | nine: are you still there :) | 16:42 | |
timotimo | OK, so callgrind gives me the distribution of returns from MVM_op_get_mark | ||
nine | dogbert17: now you're making me really curious why you're asking for me ;) | 16:43 | |
timotimo | 179k -a (op >= 135 && op < 140), 4k op == 157, 1.5k ".s" (i.e. spesh ops), 76k op == 23, 71k op == 34 ":j", 71k op >= 51 && op < 56 ".r", 69k op == 127 "+a" and 303k op >= 128 && op <= 135 "*a", 169k " " (everything else) | 16:44 | |
dogbert17 | I just wanted to tell you that I've drawn a blank wrt gist.github.com/dogbert17/26d3e358...e163246bc7 | ||
timotimo | at the moment i have it split between >= 135 and < 135 | 16:45 | |
dogbert17 | I couldn't really understand your comment from yesterday: "read_setup looks fishy. Looks to me like MVM_io_eventloop_add_active_work can allocate. Would need to MVMROOT t there" | ||
timotimo | that doesn't seem optimal, the <135 case seems overrepresented | ||
dogbert17 | it looks ok to me but I must have missed something | ||
nine | dogbert17: ah, sorry, my comment wasn't entirely correct. There is an issue, but it's not about t as that's not yet initialized. | 16:47 | |
This can allocate: github.com/MoarVM/MoarVM/blob/mast...udp.c#L173 | |||
timotimo | 4,744,146 to 4,873,498, not an improvement | ||
nine | So *async_task is the thing that may be moved there and that needs MVMROOTing | ||
dogbert17 | aha | ||
nine | Otherwise t may get initialized with an out of date pointer | 16:48 | |
dogbert17 | I could definitely try that | ||
nine | It may not cause the error you see but it's definitely a bug | 16:49 | |
timotimo | m: say 4759885 * 100 / 685470196 | ||
camelia | 0.6943970763 | 16:50 | |
timotimo | m: say8483131 * 100 / 685470196 | ||
camelia | ===SORRY!=== Error while compiling <tmp> Two terms in a row at <tmp>:1 ------> say8483131 *ā 100 / 685470196 expecting any of: infix infix stopper postfix statement end stateā¦ |
||
timotimo | m: say 8483131 * 100 / 685470196 | ||
camelia | 1.2375637992 | ||
timotimo | that was earliest version vs currently "best" version | ||
maybe i should bisect the range again | |||
dogbert17 | nine: fixed it locally but, as you suspected, it didn't fix the problem | 16:51 | |
nine | dogbert17: so do you know where t->body.schedulee gets initialized in the failing example? | 16:58 | |
timotimo | bleehhh | ||
MVMROOT2 is the line, callgrind reports "26 calls to repr_alloc_init, 7375 call to repr_get_str, 52 calls to MVM_fixed_size_alloc, 35561 calls to MVM_repr_elems, 135800 calls to MVM_coerce_simple_intify, 35535 calls to MVM_fixed_size_alloc, 209226 calls to MVM_repr_at_pos_o" | 16:59 | ||
gee, thanks | |||
dogbert17 | I tried to figure that out by using breakpoint in gdb, all I've seen so far is that github.com/MoarVM/MoarVM/blob/mast...udp.c#L537 is called several times an the third time it tends to fail on line 573 | 17:00 | |
timotimo | that one "line" is doing a lot of heavy lifting obviously | ||
nine | dogbert17: I'd probably just read te source to find that out. Someone must cause do_setup_setup to be called and that's probably where t gets set up | 17:03 | |
dogbert17 | I guess I'll continue my hunt wut you're welcome to join :) | 17:06 | |
*but | |||
nine: this line is called: github.com/MoarVM/MoarVM/blob/mast...udp.c#L660 | 17:08 | ||
nine | dogbert17: that looks quite ok. It's even a bit too paranoid. That MVMROOT5 there could just be MVMROOT(tc, task, ... as the other variables are not used anymore. | 17:17 | |
dogbert17: what if an outdated pointer is already passed to MVM_io_socket_udp_async? You could add an assertion there | |||
dogbert17 | do you mean something like this? github.com/MoarVM/MoarVM/blob/mast...list.h#L50 | 17:23 | |
nine | there's an MVM_GC_ASSERT_NOT_FROMSPACE or such | 17:24 | |
dogbert17 | aha, will look for it | ||
found it | 17:25 | ||
nine | You can sprinkle that around in the code to narrow down the place where things go wrong | 17:26 | |
timotimo | MVM_serialization_read_ref at 197,147,665 incl / 35,392,385 self before, 192,316,491 incl / 40,777,091 self after | 17:30 | |
i devirtualized integer reading a little bit | 17:31 | ||
maybe MVM_repr_get_by_id isn't better than grabbing the REPR out of the object | 17:32 | ||
17:33
domidumont left
|
|||
timotimo | 190,765,710 incl / 40,777,091 self afterwards | 17:33 | |
dogbert17 | so far I can see that 'task', i.e. ssi->async_task, is busted here: github.com/MoarVM/MoarVM/blob/mast...udp.c#L540 | ||
nine | dogbert17: great! Is uv__finish_close in that backtrace as well? | 17:34 | |
dogbert17 | #0 MVM_panic (exitCode=1, messageFormat=0x7ffff76323c8 "Collectable %p in fromspace accessed") at src/core/exceptions.c:832 | ||
#1 0x00007ffff74a4138 in do_setup_setup (handle=0x7fffe4002630) at src/io/asyncsocketudp.c:543 | |||
#2 0x00007ffff760ed25 in uv__finish_close (handle=0x7fffe4002630) at 3rdparty/libuv/src/unix/core.c:286 | 17:35 | ||
I put the assert on line 543 | |||
nine | dogbert17: hey...wait a minute. Why does the setup_op_table not have a gc_mark entry? | 17:41 | |
dogbert17 | perhaps it should | 17:43 | |
nine | I'm pretty sure that if during the life time of that task a GC may get triggered (like it can) the task needs a gc_mark function | 17:44 | |
17:45
hankache left,
hankache joined
|
|||
dogbert17 | and this? github.com/MoarVM/MoarVM/blob/mast...ket.c#L727 | 17:45 | |
nine | I don't see why not | 17:48 | |
Oh, now I do. There's a generic gc_mark in MVMAsyncTask | 17:49 | ||
That takes care of queue, schedulee, cancel_notify_queue and cancel_notify_schedulee | |||
timotimo | hrm. kcachegrind likes to read the source files over and over again | 17:51 | |
or maybe i'm Doing It Wrong | |||
dogbert17 | but setup_op_table still needs a mark function ? | 17:52 | |
nine | dogbert17: ooh....only now I realized. You already mentioned ssi->async_task being busted. That's not t->body.schedulee but t itself. And that's kept in this SocketSetupInfo structure. But who marks the async_task pointer in that structure during GC? | 17:53 | |
dogbert17 | perhaps no one | 17:54 | |
nine | That would be setup_gc_mark's job | ||
So now we know why we need such a function in asyncsocketudp but not in asyncsocket | |||
dogbert17 | indeed | 17:57 | |
timotimo | hurm. REPR(obj)->ass_funcs.bind_key(...) is faster than MVMHash_bind_key even though the latter doesn't have to do a function pointer deref | 17:59 | |
nine | timotimo: what the? | 18:00 | |
timotimo | the MVMHash_bind_key adds 100,000 estimated cycles | ||
30753 calls via the ass_funcs gives me 13,168,926 estimated cycles, and via MVMHash_bind_key it's 13,347,851 | 18:01 | ||
maybe it's important that the reprops struct is const? | |||
nine | timotimo: why is there even a bind_key function in MVMHash.c? Could just as well put that code into MVMHash_bind_key directly and use that in the repr_ops | 18:03 | |
timotimo | yeah | ||
let me do that | 18:04 | ||
13,192,210 for MVMHash_bind_key now, i should probably also double-check how stable the numbers are | 18:05 | ||
nine | yeah, since it's only estimations | 18:07 | |
timotimo | Ir events are easily going up and down 100k out of the 496 mill | 18:08 | |
18:09
hankache left
|
|||
timotimo | so comparing current state with master: | 18:10 | |
m: say 496,122,671 * 100 / 507,117,300 # total cycle estimation cost of a perl6 -e 'say 0.1 + 0.2 - 0.3' | 18:11 | ||
camelia | 496122132.34714117300 | ||
timotimo | haha oops | ||
m: say 496_122_671 * 100 / 507_117_300 # total cycle estimation cost of a perl6 -e 'say 0.1 + 0.2 - 0.3' | |||
camelia | 97.83193573 | ||
timotimo | m: say 189_796_594 * 100 / 197_356_704 # inclusive cycles of MVM_serialization_read_ref | 18:12 | |
camelia | 96.169316853 | ||
timotimo | m: say 72_546_296 * 100 / 76_316_992 # inclusive cycles of MVM_validate_static_frame | 18:13 | |
camelia | 95.05916585 | ||
18:13
MasterDuke joined
|
|||
timotimo | m: say 151_149_442 * 100 / 157_972_415 # inclusive cycles of deserialize | 18:14 | |
camelia | 95.680908594 | ||
timotimo | of course deserialize includes read_ref | ||
dogbert17 | nine: soemthing like 'static void setup_gc_mark(MVMThreadContext *tc, void *data, MVMGCWorklist *worklist) { SocketSetupInfo *ssi = (SocketSetupInfo *)data; MVM_gc_worklist_add(tc, worklist, &ssi->async_task); }' | 18:17 | |
timotimo | only even calling into MVM_intcache_get if the number falls into the right range is also a good idea | 18:20 | |
from 356k calls to 169k calls | 18:21 | ||
but only from 2_961_990 to 2_028_775 cycles (because most of the extra calls returned NULL very early) | |||
nine | dogbert17: makes sense, yes | 18:23 | |
dogbert17 | and it actually seems to work | 18:24 | |
nine | yeah! \o/ | ||
dogbert17: nice work! | |||
dogbert17 | nine: do we need to MVM_gc_worklist_add(tc, worklist, <something else>); | 18:25 | |
nine++ for excellent help | |||
should we attack t/spec/S17-procasync/encoding.t now? MasterDuke mentioned something about a cast | 18:27 | ||
here's the gist of the matter: gist.github.com/dogbert17/e4cb7b4d...73a49c7727 | 18:28 | ||
MasterDuke | is there a safe way to cast `MVMGrapheme32 *` into `char *`? | ||
dogbert17 waits with baited breath for an answer :) | 18:29 | ||
dogbert17 sigh should of course be bated | 18:30 | ||
timotimo | what is your actual use case? | ||
if you just cast, you'll get quadruple-spaced rubbish text if you just print it | |||
MasterDuke | heh, wondered if i should have said something, but you caught it | ||
github.com/MoarVM/MoarVM/blob/mast...#L520-L537 | |||
timotimo | if you just MVM_free the result of casting, it doesn't matter at all | 18:31 | |
which is what throw_adhoc_free does with it | |||
m: say 494_254_583 * 100 / 507_117_300 | 18:34 | ||
camelia | 97.46356178 | ||
timotimo | i mean, it ain't an order of magnitude, but i'll take it | ||
should also look at whether the cycle count is reduced with my other branch that stores a bit of optional info in the discriminator number | 18:35 | ||
serialization_read_int is the function on second place after MVM_serialization_read_ref (in self cycle counts) | 18:36 | ||
but it's also called 938k times compared to read_ref's 714k times | |||
read_ref calls it 539k times, deserialize_stable calls it 181.8k times and deserialize calls it another 85k times | 18:37 | ||
of course the only way to gain a real amount of performance is to have to deserialize less during startup | 18:46 | ||
Geth | MoarVM: dogbert17++ created pull request #1195: Add missing gc_mark entry to setup_op_table |
18:49 | |
18:49
reportable6 joined
|
|||
MasterDuke | well, there's something about that code that's wrong. if i revert github.com/MoarVM/MoarVM/blame/mas...#L450-L468 back to before my change (i.e., to just `MVM_free(buffer); MVM_exception_throw_adhoc(tc, "Malformed UTF-8"); break;` valgrind doesn't complain anymore | 18:50 | |
nine | dogbert17, MasterDuke: pos is the index into char *bytes. Why is it used to index into *buffer? There's "count" for that | 18:51 | |
buffer will often by smaller than bytes | 18:52 | ||
MasterDuke | nine: are my indices correct (aside from using the wrong variable)? | 18:54 | |
because just changing pos to count still has the exact same behavior | 18:55 | ||
nine | may be off by one? Btw. why are you reading buffer in the first place and not bytes? | 18:56 | |
Buffer will contain only the stuff that was correct if I understand this correctly | |||
MasterDuke | i very well could be wrong, i originally wrote that quite a while ago | 18:57 | |
nine | I get totally different valgrind errors when I replace pos with count | 18:59 | |
MasterDuke | hm. maybe i need to do a make clean and start trying these changes over | 19:00 | |
nine | MasterDuke: also you shouldn't check bufsize >= 3 but count >= 3 | 19:09 | |
Otherwise count - 2 may well be negative if there's an error right at the start of the stream | |||
MasterDuke | but if i really should be getting stuff out of bytes then should it be pos >= 3? | 19:10 | |
nine | think so | 19:11 | |
MasterDuke | i.e., keep using pos (and use its size in the if/elsif), but index into bytes | ||
nine | also with pos you'll definitely have to subtract 1, because it indiscriminately incs it in decode_utf8_byte(&state, &codepoint, bytes[pos++]) | ||
MasterDuke | or just change the conditionals to 4, 3, 2? | 19:12 | |
oh, you mean in the indexing... | 19:13 | ||
nine++, valgrind doesn't complain anymore | 19:18 | ||
guess i should check that those are the right/useful things to be printing though... | 19:19 | ||
nine | one step at a time :) | 19:20 | |
Geth | MoarVM: db9b770f60 | (Jan-Olof Hendig)++ | src/io/asyncsocketudp.c Add missing gc_mark entry to setup_op_table Given that a gc might happen during the lifetime of the async connect task we need to add a gc_mark function to its ops table. This fixes an intermittent fromspace error which could otherwise occur. nine++ for helping out with this |
19:22 | |
MoarVM: 37f1f0beff | niner++ (committed using GitHub Web editor) | src/io/asyncsocketudp.c Merge pull request #1195 from dogbert17/add-missing-gc-marking Add missing gc_mark entry to setup_op_table |
|||
19:25
sena_kun left
|
|||
timotimo | hm, can we use the "no write barrier" versions of bindings in deserialize.c? | 19:30 | |
nine | timotimo: you mean because everything's gonna be in gen2 anyway? | 19:35 | |
timotimo | are they? | 19:37 | |
hm, yeah, gc_allocate_object in the call graph goes directly to gen2_allocate_zeroed | 19:38 | ||
allocate_nursery is only called 24k times in total in my profile | |||
actually, here's a gc_allocate_zeroed that has a caller in MVM_serialization_read_ref / work_loop / demand_object | 19:39 | ||
yeah, can't rely on that | 19:46 | ||
if we know in one function that every allocation will be in gen2, then maybe we can do all allocations there directly and not hit write barriers maybe? | 19:47 | ||
but mostly we're doing read_ref, which could give us an object that had been created outside of our little bubble | |||
dogbert17 | libuv 1.33 is out github.com/libuv/libuv/blob/v1.x/ChangeLog | 20:30 | |
MasterDuke | libtommath 1.2.0 is going to be released soon. it won't have my speedups for mp_to_radix, but hopefully those should make it into the next release (which will most likely be 2.0.0) | 20:43 | |
timotimo | cool! | 20:51 | |
looking forward to that | |||
MasterDuke | arg. what's some bad utf-8 i can test with? | 20:53 | |
timotimo | i'd just create a Buf from random ints between 0 and 255 | ||
maybe try 255 over and over | 20:54 | ||
20:54
ggoebel left
|
|||
timotimo | m: Buf.new(255, 255, 255, 255).decode("utf8").say | 20:54 | |
camelia | Malformed UTF-8 at line 1 col 1 in block <unit> at <tmp> line 1 |
||
MasterDuke | ugh, that hits a different error path | 20:55 | |
20:56
moon_child left
|
|||
MasterDuke | oh, but maybe i could improve that error now i have a better idea of what to do... | 20:56 | |
Geth | MoarVM/mildly-faster-startup: 7 commits pushed by (Timo Paulssen)++
|
20:58 | |
timotimo | there are different kinds of broken utf8 | ||
21:01
moon_child joined
|
|||
MasterDuke | oh! since i'm no longer reading from `buffer`, i can just reinstate the `MVM_free(buffer)`. but i'm guessing freeing `bytes` actually is a bad idea | 21:02 | |
21:23
ggoebel joined
|
|||
Geth | MoarVM/mildly-faster-startup: 4 commits pushed by (Timo Paulssen)++ | 21:25 | |
timotimo | ^- that very last one is only worth like 2k out of the 495mil | 21:26 | |
but ... *shrug* | |||
read_ref is 2.5mil above read_int in terms of self, but i've been shoveling cycles from inclusive to self so it's not really going to shift over | 21:28 | ||
MasterDuke | what do all the commits do for startup time? | ||
timotimo | all of them together? | ||
MasterDuke | yeah | ||
timotimo | probably almost nothing | ||
my "time" only goes two decimal places | |||
and the result is noisy | |||
MasterDuke | you don't have /usr/bin/time? | ||
timotimo | so all i have is the cycle estimates from callgrind | ||
that seems to be the same thing | 21:29 | ||
that is GNU Time | |||
MasterDuke | oh yeah, it's plain time that has 3 decimal places | 21:30 | |
timotimo | that'd probably be bash time or something? | ||
MasterDuke | yeah | ||
m: Buf.new(100, 101, 255, 254, 253, 252, 251).decode("utf8").say | |||
camelia | Malformed UTF-8 at line 1 col 3 in block <unit> at <tmp> line 1 |
||
MasterDuke | is "Malformed UTF-8 near bytes fffffffe fffffffd at line 1 col 3" any better? | 21:31 | |
timotimo | m: say .sum / .elems given <0.126 0.139 0.124 0.125 0.125 0.124 0.123 0.131> | 21:32 | |
camelia | 0.127125 | ||
21:32
ggoebel left
|
|||
timotimo | m: say .sum / .elems given <0.143 0.152 0.139 0.138 0.139 0.139 0.142 0.150> | 21:33 | |
camelia | 0.14275 | ||
timotimo | those were the "after" values for real, and user | ||
m: say .sum / .elems given <126 125 130 123 123 123 127 128> | 21:34 | ||
camelia | 125.625 | ||
timotimo | m: say .sum / .elems given <144 144 143 142 143 139 144 140> | ||
camelia | 142.375 | ||
timotimo | those are the before values (but i skipped the 0. because typing) | ||
so i guess it got a little slower actually? | |||
21:38
patrickb joined
|
|||
timotimo | maybe it's the 139/152 "outlier" in the after times | 21:38 | |
m: say .sum / .elems given <0.126 0.124 0.125 0.125 0.124 0.123 0.131> | |||
camelia | 0.125429 | ||
timotimo | m: say .sum / .elems given <0.143 0.139 0.138 0.139 0.139 0.142 0.150> | ||
camelia | 0.141429 | ||
timotimo | that'd make it a tiny tiny bit faster | ||
we should look into getting that branch ready for prime-time that puts perl6.moarvm into the binary to be loaded by the OS immediately on startup | 21:39 | ||
i don't remember how much it was, but i think it had a significantly positive impact on startup time | |||
MasterDuke | all/any improvement to start up time is good | 21:40 | |
how about "Malformed UTF-8 near bytes fe fd at line 1 col 3"? | |||
timotimo | cool | 21:42 | |
m: say .sum / .elems given <506698217 506806723 507023689 506843017> | 21:48 | ||
camelia | 506842911.5 | ||
timotimo | m: say .sum / .elems given <493795874 493827445 493842729 493967884> | 21:49 | |
camelia | 493858483 | ||
timotimo | m: say 493858483 * 100 / 506842911.5 | ||
camelia | 97.43817498373 | ||
timotimo | m: say 100 - (493858483 * 100 / 506842911.5) | ||
camelia | 2.56182501627 | ||
timotimo | juuuust above 2.5% fewer Ir events measured by callgrind | 21:50 | |
which hopefully have a correlation to real timing | |||
MasterDuke | cool | ||
timotimo | (this is with spesh turned off btw, because that's extra noisy) | ||
but yeah, what we want is 50% faster or 95% faster or 99.5% faster | 21:51 | ||
MasterDuke | would SPESH_BLOCKING be enough less noisy to be useful? | ||
timotimo | yeah, i should try that | ||
i was too lazy | |||
yeah that doesn't look so bad | 21:53 | ||
rebuilding master now to compare | |||
m: say .sum / .elems given <830933453 830956253 830991918 830882743 830904953> | 21:54 | ||
camelia | 830933864 | ||
timotimo | m: say .sum / .elems given <843944655 843913123 844031628 543946032 843832973> | 21:56 | |
camelia | 783933682.2 | ||
timotimo | m: say 830933864 * 100 / 83933682.2 | ||
camelia | 989.9885745749 | ||
timotimo | oops | ||
i put a wrong number in there | 21:57 | ||
m: say .sum / .elems given <843944655 843913123 844031628 843832973 843832973> | |||
camelia | 843911070.4 | ||
timotimo | m: say 100 * 843911070.4 / 830933864 | ||
camelia | 101.561761647 | ||
timotimo | m: say 100 * 843911070.4 / 830933864 - 100 | 21:58 | |
camelia | 1.561761647 | ||
timotimo | not surprising that the percentage difference is lower, because none of the things i changed should apply to the spesh thread | ||
one difference between with spesh and without spesh is that with spesh has _int_malloc at the very top for Self time | 22:01 | ||
Geth | MoarVM: MasterDuke17++ created pull request #1196: Show correct values in encoding errors |
||
22:34
patrickb left
22:42
sivoais_ left,
sivoais joined
23:06
MasterDuke left
|