timotimo | looks like that's only json | 00:00 | |
09-race.t with my new nqp-sql-writer code has 256.98user 3.10system 4:16.19elapsed 101%CPU (0avgtext+0avgdata 6539068maxresident)k | 00:02 | ||
342 megs | |||
timotimo | oops it's all in one line :D | 00:05 | |
249.53user 3.09system 4:08.68elapsed 101%CPU (0avgtext+0avgdata 6421036maxresident)k - 343 megs | 00:06 | ||
it looks like 2 megabytes per second | 00:08 | ||
watching "watch ls" i mean | 00:09 | ||
248.08user 3.09system 4:07.04elapsed 101%CPU (0avgtext+0avgdata 6419576maxresident)k | 00:11 | ||
343 megs | |||
so time-wise that's almost stable; the first one is an outlier in memory, time, and size of the resulting file | |||
this is master: | 00:21 | ||
312.53user 18.73system 5:27.15elapsed 101%CPU (0avgtext+0avgdata 6465144maxresident)k - 347 megs | |||
MasterDuke | oh, that is better. same maxresident, but faster | 00:22 | |
(compared to master) | |||
timotimo | 310.59user 19.06system 5:25.21elapsed 101%CPU (0avgtext+0avgdata 6406816maxresident)k - 349 megs | 00:27 | |
it's rather strange that the files end up being bigger | 00:28 | ||
but since it's far from reproducible i can't just diff two of them | |||
watching "watch ls" gives me the impression it's 1.5 megs per second now | |||
305.59user 19.15system 5:20.84elapsed 101%CPU (0avgtext+0avgdata 6397604maxresident)k - 348 megs | 00:35 | ||
MasterDuke | 2mb/s seems pretty slow | 00:39 | |
have you tried doing a perf record while it's writing? | |||
timotimo | 303.75user 19.31system 5:19.01elapsed 101%CPU (0avgtext+0avgdata 6407984maxresident)k - 349 megs | 00:41 | |
somehow it gets better and better | |||
but see how drastic the difference in system time is? | |||
i can run a perf record | 00:43 | ||
maybe upping the number of items that have to be in the list before i print it out would help reduce the system time a little more | |||
though to be fair, halving 3 seconds will not make a noticeable impact here | 00:44 | |
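The batching idea above, sketched in C for illustration only: accumulate items in a buffer and hand them to the OS in larger chunks, so less time is spent crossing into the kernel. The real writer is NQP/Raku code, and none of the names or the file name below come from it.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical batched writer: instead of one write per logged item,
 * collect items in a buffer and hand them to fwrite() in larger chunks. */
#define BATCH_SIZE (64 * 1024)

static char   batch[BATCH_SIZE];
static size_t batch_used = 0;

static void flush_batch(FILE *out) {
    if (batch_used) {
        fwrite(batch, 1, batch_used, out);
        batch_used = 0;
    }
}

static void emit_item(FILE *out, const char *item) {
    size_t len = strlen(item);
    if (batch_used + len > BATCH_SIZE)   /* flush before the buffer would overflow */
        flush_batch(out);
    memcpy(batch + batch_used, item, len);
    batch_used += len;
}

int main(void) {
    FILE *out = fopen("profile.sql", "w");
    if (!out)
        return 1;
    for (int i = 0; i < 100000; i++)
        emit_item(out, "INSERT INTO calls VALUES (1, 2, 3);\n");
    flush_batch(out);   /* flush whatever is left at the end */
    fclose(out);
    return 0;
}
```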
japhb | timotimo: The "elapsed" shrank a whole bunch too .... | 00:46 | |
timotimo | well, yeah | 00:50 | |
it was an all-around improvement it seems like | |||
41 megs of perf.data | 00:51 | ||
the cli tool is taking a bit to chug through that | |||
huh, maybe perf should read the data in 4k blobs rather than 1k | 00:53 | ||
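A small aside on the block-size remark: reading the same file in 4 KiB chunks needs roughly a quarter of the read() calls that 1 KiB chunks do, which matters when most of the time is spent in the kernel. A toy C illustration of that arithmetic (not how perf actually reads its data):

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Count how many read() calls it takes to consume a file at a given block size. */
static long count_reads(const char *path, size_t block) {
    char buf[4096];
    int fd = open(path, O_RDONLY);
    long reads = 0;
    ssize_t got;
    if (fd < 0)
        return -1;
    while ((got = read(fd, buf, block)) > 0)
        reads++;
    close(fd);
    return reads;
}

int main(void) {
    printf("1 KiB blocks: %ld reads\n", count_reads("perf.data", 1024));
    printf("4 KiB blocks: %ld reads\n", count_reads("perf.data", 4096));
    return 0;
}
```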
MasterDuke | huh, i routinely get 200mb+ perf.data files and it doesn't take too long to open them | 00:56 | |
timotimo | it doesn't seem to update its UI to reflect the progress | 00:57 | |
MasterDuke | isn't there a progress bar while it's loading? | 00:59 | |
timotimo | it is | ||
it doesn't go past 949k | |||
939k | |||
the data it reads is also odd, lots and lots of mentions of nvidia drivers | |||
kernel symbols also | 01:00 | ||
MasterDuke | i'll try recording it also | 01:13 | |
"Killed" | 01:20 | ||
but it only took about 10s to run a report on the 1.3g file it had made at that point | |||
timotimo | hey it now shows 1804k | 01:22 | |
fwiw htop shows the vast majority of time is spent in "kernel" | 01:28 | ||
2M! | 01:29 | ||
*shrug* | 01:31 | ||
^C and bed | |||
MasterDuke | later... | 01:39 | |
.tell timotimo my perf report shows 80% spent in MVM_profile_instrumented_mark_data | 02:06 | ||
yoleaux | MasterDuke: I'll pass your message to timotimo. | ||
brrt | ohai #moarvm | 11:06 | |
i had a thought regarding the write barrier for bindlex | 11:07 | ||
well, two thoughts | |||
thought number 1 - we make a 'special' named template for a write barrier and apply it to the expression for bindlex | |||
thought number 2 - we create a WRITE_BARRIER node that implements ... a write barrier | 11:08 | ||
thought number 2b - we create a STORE_WRITE_BARRIER operator (not node) that implements a store-with-a-write-barrier | |||
so the idea of the first option was that it'd be an extension of what we already have (namely, the macro for the write barrier, just 'exported' to the runtime) | 11:09 | |
.tell MasterDuke that (FLAGVAL (ALL ...)) does not compile well yet since it'd need to be translated to (IF (ALL ...) (CONST 1) (CONST 0)) | 11:10 | ||
yoleaux | brrt: I'll pass your message to MasterDuke. | ||
brrt | and we don't do that translation just yet | 11:11 | |
it's on the TODO list | |||
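For readers unfamiliar with the expression JIT: FLAGVAL over an ALL just means materialising a conjunction as an integer 0 or 1, which is what the proposed (IF (ALL ...) (CONST 1) (CONST 0)) rewrite computes. In plain C terms (an analogy only; the real translation operates on JIT expression trees, not C):

```c
/* C analogy of (IF (ALL a b) (CONST 1) (CONST 0)):
 * evaluate the conjunction, then materialize it as 0 or 1. */
int flagval_of_all(int cond_a, int cond_b) {
    return (cond_a && cond_b) ? 1 : 0;
}
```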
huh, that gives me an idea | 11:12 | ||
to implement (COND) operators and to translate ALL/ANY to these | 11:13 | ||
anyway | |||
the idea of the second option is that the write barrier is below the abstraction barrier provided by the VM; that if we were ever to change the implementation, we'd need to implement them consistently; that a store-with-write-barrier is a common enough operation to do (and perhaps optimize) | 11:14 | ||
(the idea being that we'd have an in-memory form of (COND) that would be (COND ((TEST (...) LABEL)) ((TEST ... LABEL2) (BLOCK (label1 ...) (label2 ...)))) | 11:20 | ||
also, one potentially nice thing about the second option, is that we *might* be able to get away with not spilling the operands if the write barrier is their last usage | 11:22 | ||
(and i've just checked the lego jit implementation of the write barrier in bindlex, and it's implemented after the store, so that's actually a feasible idea) | 11:29 | ||
i think i'm in favor of the STORE_WRITE_BARRIER operator | 11:30 | |
i'll think about it some more. advice appreciated | 11:31 | ||
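A conceptual sketch of the store-with-write-barrier from option 2b, for readers following along: in a generational collector, storing a reference to a young object into an old object has to record the old object so the next nursery collection can trace that reference, and per the lego JIT observation above the barrier can run after the store. Everything below is illustrative C with made-up names, not MoarVM's actual barrier code.

```c
#include <stdio.h>

typedef struct Object Object;
struct Object {
    unsigned flags;     /* assume 0x1 marks "lives in the old generation" */
    Object  *field;     /* some reference-typed slot */
};

#define IN_OLD_GEN 0x1u

/* hypothetical remembered-set hook; a real VM would add the object to an
 * inter-generational root list here */
static void remember_old_to_young(Object *old_obj) {
    printf("remembering %p\n", (void *)old_obj);
}

static void store_with_write_barrier(Object *container, Object **slot,
                                     Object *value) {
    *slot = value;                                  /* the store ...            */
    if ((container->flags & IN_OLD_GEN) &&          /* ... then the barrier:    */
        value && !(value->flags & IN_OLD_GEN))      /* only old -> young stores */
        remember_old_to_young(container);           /* need remembering         */
}

int main(void) {
    Object old_obj   = { IN_OLD_GEN, NULL };
    Object young_obj = { 0, NULL };
    store_with_write_barrier(&old_obj, &old_obj.field, &young_obj);
    return 0;
}
```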
timotimo | whoops, the heap snapshot for the json_fast race benchmark is 1.1 gigs big | 12:33 | |
it's only got 44 snapshots in it :| | |||
m: say 1 * 1024 / 44 | 12:35 | ||
camelia | 23.272727 | ||
timotimo | not wildly unreasonable, but still rather gigantic | 12:36 | |
m: say 1.1 * 1024 / 60 | 12:44 | ||
camelia | 18.773333 | ||
timotimo | can that be right? the heap analyzer does almost 19 megabytes per second of data reading? | 12:45 | |
that's pretty good | |||
how can it only need 60 seconds to do a "summary all" for this 1.1 gig file? | 12:49 | ||
MasterDuke | .tell brrt i did actually try rewriting it to `(template: ishash (if (all (nz $1) (eq (^getf (^repr $1) MVMREPROps ID) (const ("E MVM_REPR_ID_MVMHash) int_sz))) (const 1 int_sz) (const 0 int_sz)))`, but it still gives an MVM_oops in the same spot | 12:50 | |
yoleaux | 11:10Z <brrt> MasterDuke: that (FLAGVAL (ALL ...)) does not compile well yet since it'd need to be translated to (IF (ALL ...) (CONST 1) (CONST 0)) | ||
MasterDuke: I'll pass your message to brrt. | |||
MasterDuke | timotimo: it's multi-threaded, right? | 12:51 | |
timotimo | and how! | ||
especially "summary all" uses many jobs at the same time | 12:52 | ||
(but limited using a "ticket" system to prevent massive ram usage) | |||
but it still balloons up to about 3 gigs of ram usage - even though I try to clear out the data in between snapshots! | 12:53 | ||
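The "ticket" scheme described above is essentially a counting semaphore: a job must take a ticket before loading a snapshot's data and gives it back when done, so at most N snapshots are held in memory at once. A generic C sketch of the pattern (the heap analyzer itself is Raku; all names and numbers below are invented):

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

/* At most MAX_TICKETS workers may hold snapshot data in memory at once;
 * the rest block in sem_wait() until a ticket is returned. */
#define MAX_TICKETS   4
#define NUM_SNAPSHOTS 44

static sem_t tickets;

static void *summarize_snapshot(void *arg) {
    long id = *(long *)arg;
    sem_wait(&tickets);              /* acquire a ticket */
    printf("loading + summarizing snapshot %ld\n", id);
    /* ... parse the snapshot, build the summary, free the parsed data ... */
    sem_post(&tickets);              /* return the ticket */
    return NULL;
}

int main(void) {
    pthread_t workers[NUM_SNAPSHOTS];
    long ids[NUM_SNAPSHOTS];
    sem_init(&tickets, 0, MAX_TICKETS);
    for (long i = 0; i < NUM_SNAPSHOTS; i++) {
        ids[i] = i;
        pthread_create(&workers[i], NULL, summarize_snapshot, &ids[i]);
    }
    for (long i = 0; i < NUM_SNAPSHOTS; i++)
        pthread_join(workers[i], NULL);
    sem_destroy(&tickets);
    return 0;
}
```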
MasterDuke | timotimo: you know, istr that i perf recorded writing out a large sql profile around when i introduced that option, and back then it was something related to IO (e.g., writing or encoding bytes) that was top of the exclusive list, not MVM_profile_instrumented_mark_data | 13:03 | |
MasterDuke | timotimo: any thoughts about github.com/MoarVM/MoarVM/pull/812 ? | 15:23 | |
timotimo | we could probably use the start time of the profile itself there, it's in there somewhere ;) | 15:30 | |
i'm writing version 2 of the initial grant report and introductory blog post | 15:32 | ||
i'm almost to 1k words again, but this time i feel like i covered most stuff | 15:33 | ||
MasterDuke | ooo, looking forward to it | ||
timotimo | wanna read the current state? | 15:34 | |
MasterDuke | sure | ||
timotimo | wakelift.de/p/df27709e-95d7-433e-ac...b61914e92/ - does this work for you? | ||
MasterDuke | yeah | ||
"that It took" | 15:35 | ||
timotimo | probably started out as "that I took" | 15:38 | |
the spell checker is complaining about things like "blog" and ā and 64 and 3ā and 19 | 15:39 | ||
MasterDuke | looks good | 15:40 | |
timotimo: i thought about using the profile start, but i don't think it's 100% accurate, since the spesh worker could start before the profiling starts (and i think it does in the test case we're using)? | 15:44 | |
timotimo | yeah, it can, but i believe the user is interested in how much time during the profiled period was spent in spesh | 15:45 | |
MasterDuke | i could add something like `if (ptd->cur_spesh_start_time == 0) ptd->cur_spesh_start_time = ptd->start_time` in MVM_profiler_log_spesh_end | 15:49 | |
timotimo | that sounds good to me | 15:50 | |
MasterDuke | hm, or should i just set ptd->cur_spesh_start_time to the same value as ptd->start_time is set to in get_thread_data? | 15:57 | |
MasterDuke | timotimo: github.com/MoarVM/MoarVM/pull/812 updated | 16:18 | |
timotimo | oh, that's what you meant by that | 16:19 |
it's not really a problem, but it's strange to have the spesh_start_time set on every thread, because it's only ever read from one thread | |||
MasterDuke | hm. think it is better as that conditional in MVM_profiler_log_spesh_end? | 16:20 | |
timotimo | yeah | 16:21 | |
MasterDuke | updated | 16:29 | |
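For context, the bug and fix under discussion, reduced to a standalone C illustration: if spesh work began before profiling did, no spesh start was ever logged, so the recorded start time is still 0 and "now minus 0" produces an absurd duration; guarding with the profile's own start time bounds it. The field names follow the snippet quoted above, but the struct and function here are illustrative, not the actual MoarVM source.

```c
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint64_t start_time;            /* when profiling began */
    uint64_t cur_spesh_start_time;  /* when the current spesh run began, 0 if never logged */
    uint64_t spesh_time;            /* accumulated time attributed to spesh (hypothetical) */
} ProfileThreadData;

static void log_spesh_end(ProfileThreadData *ptd, uint64_t now) {
    if (ptd->cur_spesh_start_time == 0)
        ptd->cur_spesh_start_time = ptd->start_time;   /* the fix discussed above */
    ptd->spesh_time += now - ptd->cur_spesh_start_time;
}

int main(void) {
    ProfileThreadData ptd = { .start_time = 1000, .cur_spesh_start_time = 0 };
    log_spesh_end(&ptd, 1500);
    printf("spesh time attributed: %llu\n", (unsigned long long)ptd.spesh_time);
    return 0;
}
```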
timotimo | my blog now has a "logo" | 16:41 | |
MasterDuke | a peeled orange? | 16:43 | |
timotimo | i have no idea | ||
wonder if i should try to get the title into one line | |||
hah, non-breaking spaces ftw? | 16:44 | ||
looks terrible, it goes over the line %) | |||
now it fits on my end | 16:45 | ||
timotimo | MasterDuke: i changed the title from "Timotimo typed this!" (a fun alliteration) to "my Timotimo \this", which is literally a timotimo-typed this! | 16:46 | |
MasterDuke | heh. i do see the s overlapping the vertical line | 16:47 | |
not now | 16:48 | ||
dogbert17 | o/ can a spesh log point out a spesh related bug or are they mostly useful for other things? | 16:58 | |
nine | .tell brrt in general, higher level semantics (like a STORE_WRITE_BARRIER) make it easier for a compiler to apply optimizations, so I'd favor that. | 17:03 | |
yoleaux | nine: I'll pass your message to brrt. | ||
MasterDuke | nine: weren't you saying something recently about some new optimizations you found on a train ride? have those already been committed? | 17:07 | |
timotimo | dogbert17: you can figure out if a piece of code has been misoptimized by looking at the before/after, and you can check out why a certain specialization exists or doesn't exist by looking at the logs and specialization plans | 17:13 | |
dogbert17 | hmm, but the log is over 200 megs. Should I try to find something in the log pointing at the src line where the problem occurs? | 17:42 | |
timotimo | you can | 17:43 | |
dogbert17 | let me check ... | ||
timotimo | i usually search for the routine name if it's sufficiently rare | ||
but you'll find the name pop up in logs of routines that call the one you're searching for | |||
dogbert17 | I think it is | ||
timotimo | so maybe search for "Specialization.*your_routine" | 17:45 | |
MasterDuke | timotimo: btw, anything else for github.com/MoarVM/MoarVM/pull/812 ? | 17:46 | |
timotimo | i think i'm happy with that | ||
MasterDuke | cool, mind merging? | 17:47 | |
timotimo | sure | 17:48 | |
Geth | MoarVM: 760b0913e8 | MasterDuke17++ (committed by timo) | src/profiler/log.c
Fix for gigantic and wrong spesh time in profiles
Because spesh workers could start before profiling starts, `MVM_profiler_log_spesh_end` could be called before any `MVM_profiler_log_spesh_start` calls. We were then logging current time - 0, instead of current time - spesh start time. Fix by setting the spesh start time to the profiling start time if it's 0.
MasterDuke | thanks | ||
timotimo | "fix for wrong profile timos" :P | ||
MasterDuke | heh | 17:49 | |
dogbert17 | if I read this right (probably not), it specializes the routine twice during the run. Is that possible? | |||
timotimo | very possible | 17:51 | |
usually for different callsites, for example for calls with one argument, or for calls with two arguments, or for calls with one named argument called "bob" | |||
dogbert17 | the first time I see: | 17:53 | |
After: | |||
Spesh of 'get-proxy' (cuid: 95, file: site#sources/FD28A8E22DFE16B70B757D9981C7B6C25543060C (HTTP::UserAgent):368) | |||
Callsite 0xb76f0904 (2 args, 2 pos) | |||
and the second: | 17:54 | ||
After: | |||
Spesh of 'get-proxy' (cuid: 95, file: site#sources/FD28A8E22DFE16B70B757D9981C7B6C25543060C (HTTP::UserAgent):368) | |||
the two arguments disappeared | |||
timotimo | if it doesn't have a "callsite" line, it's probably a "certain optimization" | ||
dogbert17 | there is no such line the second time | 17:55 | |
timotimo | i.e. "there were many different calls, but none of them accounted for a significant portion, so just compile a catch-all with basic optimizations" | ||
look upwards for the specialization plan | |||
it should explain what's going on | |||
dogbert17 | spam warning | 17:57 | |
Observed type specialization of 'get-proxy' (cuid: 95, file: site#sources/FD28A8E22DFE16B70B757D9981C7B6C25543060C (HTTP::UserAgent):368) | |||
The specialization is for the callsite: | |||
Callsite 0xb76f0904 (2 args, 2 pos) | |||
Positional flags: obj, obj | |||
It was planned for the type tuple: | |||
Type 0: HTTP::UserAgent (Conc) | |||
Type 1: HTTP::Request (Conc) | |||
Which received 1 hits (100% of the 1 callsite hits). | |||
The maximum stack depth is 22. | |||
timotimo | i'd prefer a paste service of some kind | ||
dogbert17 | agreed :-) | ||
is that the plan or did I miss it | 17:58 | ||
or are you referring to the 'before' part followed by BB 0, BB 1 etc | 18:00 | |
timotimo | no, just that you pasted so much directly into irc | 20:23 | |
MasterDuke | i think his question was about "look upwards for the specialization plan" | 20:27 | |
timotimo | oh, sorry | ||
i was afk for a long while :( | 20:28 | ||
and will be again | |||
but yeah, that's the specialization plan | |||
there'll be another one for the same routine | |||
probably further up above still | |||
dogbert17 | timotimo: I'm learning :-) check out gist.github.com/dogbert17/adf0f171...f404f7d4fb when/if you have time | 20:33 | |
timotimo | so the commits that brought this bug to you seem to indicate something goes wrong with inlining | 21:37 | |
in which case it'd be prudent to see if the code from the routine that troubles you got inlined somewhere | 21:38 | ||
huh, line 384 is in there twice; is that what gets inlined? or is it being inlined into get-connection? | 21:45 | ||
should be possible to figure out where exactly it goes wrong with spesh bisection; does jit-bisect do that or do we have a separate script? | 21:46 | ||
dogbert17 | timotimo: have you tried running the code ? | 21:50 | |
that way you'll have a complete spesh log not the excerpts I gisted | |||
timotimo | i have not yet | 21:51 | |
dogbert17 | I ran the spesh_bisect tool on a slightly modified version (10 iterations then exit) and it said: MVM_SPESH_LIMIT=3873 | 22:05 | |
timotimo | OK, then the last specialization in the spesh log is likely the culprit | 22:14 | |
or at least a kind of "tipping point" | 22:15 | ||
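For readers unfamiliar with spesh bisection: MVM_SPESH_LIMIT caps how many specializations MoarVM will apply, and the bisect tool searches for the point at which behaviour changes, so the reported number points at the specialization that tips things over. A rough sketch of the counting idea (purely illustrative, not MoarVM's implementation):

```c
#include <stdio.h>
#include <stdlib.h>

/* Count specializations as they are produced and stop applying them once
 * an env-var limit is reached, so a bisect script can narrow down which
 * specialization introduces a problem. */
static long spesh_limit = -1;   /* -1 means "no limit" */
static long spesh_count = 0;

static void read_limit(void) {
    const char *env = getenv("MVM_SPESH_LIMIT");
    if (env)
        spesh_limit = atol(env);
}

static int may_apply_specialization(void) {
    spesh_count++;
    return spesh_limit < 0 || spesh_count <= spesh_limit;
}

int main(void) {
    long applied = 0;
    read_limit();
    for (int i = 0; i < 5000; i++)
        if (may_apply_specialization())
            applied++;   /* ... install specialization number spesh_count here ... */
    printf("applied %ld of %ld candidate specializations\n", applied, spesh_count);
    return 0;
}
```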