| timotimo | looks like that's only json | 00:00 | |
| 09-race.t with my new nqp-sql-writer code has 256.98user 3.10system 4:16.19elapsed 101%CPU (0avgtext+0avgdata 6539068maxresident)k | 00:02 | ||
| 342 megs | |||
| timotimo | oops it's all in one line :D | 00:05 | |
| 249.53user 3.09system 4:08.68elapsed 101%CPU (0avgtext+0avgdata 6421036maxresident)k - 343 megs | 00:06 | ||
| it looks like 2 megabytes per second | 00:08 | ||
| watching "watch ls" i mean | 00:09 | ||
| 248.08user 3.09system 4:07.04elapsed 101%CPU (0avgtext+0avgdata 6419576maxresident)k | 00:11 | ||
| 343 megs | |||
| so time-wise that's almost stable; the first one is an outlier in memory, time, and size of the resulting file |||
| this is master: | 00:21 | ||
| 312.53user 18.73system 5:27.15elapsed 101%CPU (0avgtext+0avgdata 6465144maxresident)k - 347 megs | |||
| MasterDuke | oh, that is better. same maxresident, but faster | 00:22 | |
| (compared to master) | |||
| timotimo | 310.59user 19.06system 5:25.21elapsed 101%CPU (0avgtext+0avgdata 6406816maxresident)k - 349 megs | 00:27 | |
| it's rather strange that the files end up being bigger | 00:28 | ||
| but since it's far from reproducible i can't just diff two of them | |||
| watching "watch ls" gives me the impression it's 1.5 megs per second now | |||
| 305.59user 19.15system 5:20.84elapsed 101%CPU (0avgtext+0avgdata 6397604maxresident)k - 348 megs | 00:35 | ||
| MasterDuke | 2mb/s seems pretty slow | 00:39 | |
| have you tried doing a perf record while it's writing? | |||
| timotimo | 303.75user 19.31system 5:19.01elapsed 101%CPU (0avgtext+0avgdata 6407984maxresident)k - 349 megs | 00:41 | |
| somehow it gets better and better | |||
| but see how drastic the difference in system time is? | |||
| i can run a perf record | 00:43 | ||
| maybe upping the number of items that have to be in the list before i print it out would help reduce the system time a little more | |||
| though to be fair, halving 3 seconds will not make a noticeable impact here | 00:44 | ||
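The batching idea timotimo floats here is the usual syscall-amortization trick: each small write is a kernel round-trip, which is exactly what shows up as system time. A minimal C sketch of the pattern (all names below are illustrative, not MoarVM's):

```c
/* Illustrative sketch (names are not MoarVM's): amortizing syscalls by
 * batching output. Writing each record individually pays one kernel
 * round-trip per record, which shows up as system time; collecting
 * records in a buffer and flushing when it fills pays that cost once
 * per batch. Assumes individual records are smaller than BATCH_SIZE. */
#include <string.h>
#include <unistd.h>

#define BATCH_SIZE (64 * 1024)

static char   buf[BATCH_SIZE];
static size_t buf_used = 0;

static void flush_batch(int fd) {
    size_t off = 0;
    while (off < buf_used) {            /* write() may be partial */
        ssize_t n = write(fd, buf + off, buf_used - off);
        if (n <= 0)
            break;                      /* a real writer would handle errors */
        off += (size_t)n;
    }
    buf_used = 0;
}

static void emit_record(int fd, const char *rec, size_t len) {
    if (buf_used + len > BATCH_SIZE)    /* no room: hit the kernel once */
        flush_batch(fd);
    memcpy(buf + buf_used, rec, len);   /* otherwise just queue it */
    buf_used += len;
}
```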
| japhb | timotimo: The "elapsed" shrank a whole bunch too .... | 00:46 | |
| timotimo | well, yeah | 00:50 | |
| it was an all-around improvement it seems like | |||
| 41 megs of perf.data | 00:51 | ||
| the cli tool is taking a bit to chug through that | |||
| huh, maybe perf should read the data in 4k blobs rather than 1k | 00:53 | ||
| MasterDuke | huh, i routinely get 200mb+ perf.data files and it doesn't take too long to open them | 00:56 | |
| timotimo | it doesn't seem to update its UI to reflect the progress | 00:57 | |
| MasterDuke | isn't there a progress bar while it's loading? | 00:59 | |
| timotimo | it is | ||
| it doesn't go past 949k | |||
| 939k | |||
| the data it reads is also odd, lots and lots of mentions of nvidia drivers | |||
| kernel symbols also | 01:00 | ||
| MasterDuke | i'll try recording it also | 01:13 | |
| "Killed" | 01:20 | ||
| but it only took about 10s to run a report on the 1.3g file it had made at that point | |||
| timotimo | hey it now shows 1804k | 01:22 | |
| fwiw htop shows the vast majority of time is spent in "kernel" | 01:28 | ||
| 2M! | 01:29 | ||
| *shrug* | 01:31 | ||
| ^C and bed | |||
| MasterDuke | later... | 01:39 | |
| .tell timotimo my perf report shows 80% spent in MVM_profile_instrumented_mark_data | 02:06 | ||
| yoleaux | MasterDuke: I'll pass your message to timotimo. | ||
|||
| brrt | ohai #moarvm | 11:06 | |
| i had a thought regarding the write barrier for bindlex | 11:07 | ||
| well, two thoughts | |||
| thought number 1 - we make a 'special' named template for a write barrier and apply it to the expression for bindlex | |||
| thought number 2 - we create a WRITE_BARRIER node that implements ... a write barrier | 11:08 | ||
| thought number 2b - we create a STORE_WRITE_BARRIER operator (not node) that implements a store-with-a-write-barrier | |||
| so the idea of the first option was that it'd be an extension of what we already have (namely, the macro for the write barrier, just 'exported' to the runtime) | 11:09 | ||
| .tell MasterDuke that (FLAGVAL (ALL ...)) does not compile well yet since it'd need to be translated to (IF (ALL ...) (CONST 1) (CONST 0)) | 11:10 | ||
| yoleaux | brrt: I'll pass your message to MasterDuke. | ||
| brrt | and we don't do that translation just yet | 11:11 | |
| it's on the TODO list | |||
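For context, (FLAGVAL x) asks for the 0/1 value of a flag-producing test tree, and the missing translation brrt describes is the rewrite into a conditional. In rough C terms (purely an analogy, not the template DSL itself):

```c
/* Rough C analogy only; this is not the expr template DSL.
 * (FLAGVAL (ALL a b)) must become (IF (ALL a b) (CONST 1) (CONST 0)),
 * which is exactly what C's && does when used as a value: */
int flagval_all(int a, int b) {
    return (a && b) ? 1 : 0;    /* the IF/CONST form of the ALL test */
}
```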
| huh, that gives me an idea | 11:12 | ||
| to implement (COND) operators and to translate ALL/ANY to these | 11:13 | ||
| anyway | |||
| the idea of the second option is that the write barrier is below the abstraction barrier provided by the VM; that if we were ever to change the implementation, we'd need to implement them consistently; that a store-with-write-barrier is a common enough operation to do (and perhaps optimize) | 11:14 | ||
| (the idea being that we'd have an in-memory form of (COND) that would be (COND ((TEST (...) LABEL)) ((TEST ... LABEL2) (BLOCK (label1 ...) (label2 ...)))) | 11:20 | ||
| also, one potentially nice thing about the second option, is that we *might* be able to get away with not spilling the operands if the write barrier is their last usage | 11:22 | ||
| (and i've just checked the lego jit implementation of the write barrier in bindlex, and it's implemented after the store, so that's actually a feasible idea) | 11:29 | ||
| i think i'm in favor of the STORE_WRITE_BARRIER operator | 11:30 | ||
| i'll think about it some more. advice appreciated | 11:31 | ||
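A sketch of what a fused store-with-write-barrier does in a generational collector; all names below are hypothetical stand-ins, not MoarVM's actual macros, and the store-then-barrier ordering follows brrt's lego-JIT observation above:

```c
/* Hypothetical sketch of a fused store-with-write-barrier; the names
 * are illustrative, not MoarVM's. In a generational GC, storing a
 * nursery pointer into an old object must record the old object in
 * the remembered set so the nursery collector can still find the
 * reference. Doing the store first matches the lego JIT ordering. */
#include <stddef.h>

typedef struct Obj {
    unsigned gen;               /* 0 = nursery, 1+ = old generation */
} Obj;

/* Stub: a real GC would append holder to its remembered set. */
static void remember(Obj *holder) { (void)holder; }

static void store_with_write_barrier(Obj *holder, Obj **slot, Obj *value) {
    *slot = value;                                 /* the store itself */
    if (holder->gen != 0 && value != NULL && value->gen == 0)
        remember(holder);    /* old object now points into the nursery */
}
```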
| timotimo | whoops, the heap snapshot for the json_fast race benchmark is 1.1 gigs big | 12:33 | |
| it's only got 44 snapshots in it :| | |||
| m: say 1 * 1024 / 44 | 12:35 | ||
| camelia | 23.272727 | ||
| timotimo | not wildly unreasonable, but still rather gigantic | 12:36 | |
| m: say 1.1 * 1024 / 60 | 12:44 | ||
| camelia | 18.773333 | ||
| timotimo | can that be right? the heap analyzer does almost 19 megabytes per second of data reading? | 12:45 | |
| that's pretty good | |||
| how can it only need 60 seconds to do a "summary all" for this 1.1 gig file? | 12:49 | ||
| MasterDuke | .tell brrt i did actually try re-writing it to `(template: ishash (if (all (nz $1) (eq (^getf (^repr $1) MVMREPROps ID) (const ("E MVM_REPR_ID_MVMHash) int_sz))) (const 1 int_sz) (const 0 int_sz)))`, but it still gives an MVM_oops in the same spot | 12:50 | ||
| yoleaux | 11:10Z <brrt> MasterDuke: that (FLAGVAL (ALL ...)) does not compile well yet since it'd need to be translated to (IF (ALL ...) (CONST 1) (CONST 0)) | ||
| MasterDuke: I'll pass your message to brrt. | |||
| MasterDuke | timotimo: it's multi-threaded, right? | 12:51 | |
| timotimo | and how! | ||
| especially "summary all" uses many jobs at the same time | 12:52 | ||
| (but limited using a "ticket" system to prevent massive ram usage) | |||
| but it still balloons up to about 3 gigs of ram usage - even though I try to clear out the data in between snapshots! | 12:53 | ||
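The "ticket" throttle timotimo describes is essentially a counting semaphore: a job may only hold a snapshot's data in memory while it holds one of N tickets, capping peak RAM at roughly N snapshots' worth. The heap analyzer itself is written in Raku, so this generic POSIX C sketch only shows the shape of the idea (all names hypothetical):

```c
/* Generic POSIX sketch of ticket-limited parallelism; not the heap
 * analyzer's actual code. A counting semaphore hands out TICKETS
 * permits, so at most TICKETS snapshots are in memory at once.
 * Compile with -pthread. */
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

#define TICKETS   4     /* snapshots allowed in memory at once */
#define SNAPSHOTS 44

static sem_t tickets;

static void *summarize_snapshot(void *arg) {
    long id = (long)arg;
    sem_wait(&tickets);   /* take a ticket before loading the snapshot */
    printf("snapshot %ld: load, summarize, free\n", id);
    sem_post(&tickets);   /* data released, hand the ticket back */
    return NULL;
}

int main(void) {
    pthread_t th[SNAPSHOTS];
    sem_init(&tickets, 0, TICKETS);
    for (long i = 0; i < SNAPSHOTS; i++)
        pthread_create(&th[i], NULL, summarize_snapshot, (void *)i);
    for (long i = 0; i < SNAPSHOTS; i++)
        pthread_join(th[i], NULL);
    sem_destroy(&tickets);
    return 0;
}
```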
| MasterDuke | timotimo: you know, istr that i perf recorded writing out a large sql profile around when i introduced that option, and back then it was something related to IO (e.g., writing or encoding bytes) that was top of the exclusive list, not MVM_profile_instrumented_mark_data | 13:03 | |
| MasterDuke | timotimo: any thoughts about github.com/MoarVM/MoarVM/pull/812 ? | 15:23 | |
| timotimo | we could probably use the start time of the profile itself there, it's in there somewhere ;) | 15:30 | |
| i'm writing version 2 of the initial grant report and introductory blog post | 15:32 | ||
| i'm almost to 1k words again, but this time i feel like i covered most stuff | 15:33 | ||
| MasterDuke | ooo, looking forward to it | ||
| timotimo | wanna read the current state? | 15:34 | |
| MasterDuke | sure | ||
| timotimo | wakelift.de/p/df27709e-95d7-433e-ac...b61914e92/ - does this work for you? | ||
| MasterDuke | yeah | ||
| "that It took" | 15:35 | ||
| timotimo | probably started out as "that I took" | 15:38 | |
| the spell checker is complaining about things like "blog" and ā and 64 and 3ā and 19 | 15:39 | ||
| MasterDuke | looks good | 15:40 | |
| timotimo: i thought about putting the profile start, but i don't think it's 100% accurate. since the spesh worker could (and i think is in the test case we're using) start before the profiling starts? | 15:44 | ||
| timotimo | yeah, it can, but i believe the user is interested in how much time during the profiled period was spent in spesh | 15:45 | |
| MasterDuke | i could add something like `if (ptd->cur_spesh_start_time == 0) ptd->cur_spesh_start_time = ptd->start_time` in MVM_profiler_log_spesh_end | 15:49 | |
| timotimo | that sounds good to me | 15:50 | |
| MasterDuke | hm, or should i just set ptd->cur_spesh_start_time to the same value as ptd->start_time is set to in get_thread_data? | 15:57 | |
| MasterDuke | timotimo: github.com/MoarVM/MoarVM/pull/812 updated | 16:18 | |
| timotimo | oh, that's what you meant by that | 16:19 | ||
| it's not really a problem, but it's strange to have the spesh_start_time set on every thread, because it's only ever read from one thread | |||
| MasterDuke | hm. think it is better as that conditional in MVM_profiler_log_spesh_end? | 16:20 | |
| timotimo | yeah | 16:21 | |
| MasterDuke | updated | 16:29 | |
| timotimo | my blog now has a "logo" | 16:41 | |
| MasterDuke | a peeled orange? | 16:43 | |
| timotimo | i have no idea | ||
| wonder if i should try to get the title into one line | |||
| hah, non-breaking spaces ftw? | 16:44 | ||
| looks terrible, it goes over the line %) | |||
| now it fits on my end | 16:45 | ||
| timotimo | MasterDuke: i changed the title from "Timotimo typed this!" (a fun alliteration) to "my Timotimo \this", which is literally a timotimo-typed this! | 16:46 | |
| MasterDuke | heh. i do see the s overlapping the vertical line | 16:47 | |
| not now | 16:48 | ||
| dogbert17 | o/ can a spesh log point out a spesh related bug or are they mostly useful for other things? | 16:58 | |
| nine | .tell brrt in general, higher level semantics (like a STORE_WRITE_BARRIER) make it easier for a compiler to apply optimizations, so I'd favor that. | 17:03 | |
| yoleaux | nine: I'll pass your message to brrt. | ||
| MasterDuke | nine: weren't you saying something recently about some new optimizations you found on a train ride? have those already been committed? | 17:07 | |
| timotimo | dogbert17: you can figure out if a piece of code has been misoptimized by looking at the before/after, and you can check out why a certain specialization exists or doesn't exist by looking at the logs and specialization plans | 17:13 | |
| dogbert17 | hmm, but the log is over 200 megs. Should I try to find something in the log pointing at the src line where the problem occurs? | 17:42 | |
| timotimo | you can | 17:43 | |
| dogbert17 | let me check ... | ||
| timotimo | i usually search for the routine name if it's sufficiently rare | ||
| but you'll find the name pop up in logs of routines that call the one you're searching for | |||
| dogbert17 | I think it is | ||
| timotimo | so maybe search for "Specialization.*your_routine" | 17:45 | |
| MasterDuke | timotimo: btw, anything else for github.com/MoarVM/MoarVM/pull/812 ? | 17:46 | |
| timotimo | i think i'm happy with that | ||
| MasterDuke | cool, mind merging? | 17:47 | |
| timotimo | sure | 17:48 | |
| Geth | MoarVM: 760b0913e8 | MasterDuke17++ (committed by timo) | src/profiler/log.c Fix for gigantic and wrong spesh time in profiles Because spesh workers could start before profiling starts, `MVM_profiler_log_spesh_end` could be called before any `MVM_profiler_log_spesh_start` calls. We were then logging current time - 0, instead of current time - spesh start time. Fix by setting the spesh start time to the profiling start time if it's 0. |
||
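A hedged reconstruction of the merged fix's logic, pieced together from the commit message and the discussion above; the zero-check is MasterDuke's conditional quoted from earlier, while the surrounding function body is an assumption (it needs MoarVM's internal headers and is not the verbatim patch):

```c
/* Hedged reconstruction, not the verbatim patch. The spesh worker can
 * already be mid-job when profiling begins, so the first
 * MVM_profiler_log_spesh_end may arrive with cur_spesh_start_time
 * still 0, making the logged duration "current time - 0". */
void MVM_profiler_log_spesh_end(MVMThreadContext *tc) {
    MVMProfileThreadData *ptd = get_thread_data(tc);

    /* If spesh started before profiling did, count from the start of
     * the profiled period instead of from 0. */
    if (ptd->cur_spesh_start_time == 0)
        ptd->cur_spesh_start_time = ptd->start_time;

    ptd->spesh_time += uv_hrtime() - ptd->cur_spesh_start_time;
}
```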
| MasterDuke | thanks | ||
| timotimo | "fix for wrong profile timos" :P | ||
| MasterDuke | heh | 17:49 | |
| dogbert17 | if I read this right (probably not) it specializes the routine twice during the run. Is that possible? | |||
| timotimo | very possible | 17:51 | |
| usually for different callsites, for example for calls with one argument, or for calls with two arguments, or for calls with one named argument called "bob" | |||
| dogbert17 | the first time I see: | 17:53 | |
| After: | |||
| Spesh of 'get-proxy' (cuid: 95, file: site#sources/FD28A8E22DFE16B70B757D9981C7B6C25543060C (HTTP::UserAgent):368) | |||
| Callsite 0xb76f0904 (2 args, 2 pos) | |||
| and the second: | 17:54 | ||
| After: | |||
| Spesh of 'get-proxy' (cuid: 95, file: site#sources/FD28A8E22DFE16B70B757D9981C7B6C25543060C (HTTP::UserAgent):368) | |||
| the two arguments disappeared | |||
| timotimo | if it doesn't have a "callsite" line, it's probably a "certain optimization" | ||
| dogbert17 | there is no such line the second time | 17:55 | |
| timotimo | i.e. "there were many different calls, but none of them accounted for a significant portion, so just compile a catch-all with basic optimizations" | ||
| look upwards for the specialization plan | |||
| it should explain what's going on | |||
| dogbert17 | spam warning | 17:57 | |
| Observed type specialization of 'get-proxy' (cuid: 95, file: site#sources/FD28A8E22DFE16B70B757D9981C7B6C25543060C (HTTP::UserAgent):368) | |||
| The specialization is for the callsite: | |||
| Callsite 0xb76f0904 (2 args, 2 pos) | |||
| Positional flags: obj, obj | |||
| It was planned for the type tuple: | |||
| Type 0: HTTP::UserAgent (Conc) | |||
| Type 1: HTTP::Request (Conc) | |||
| Which received 1 hits (100% of the 1 callsite hits). | |||
| The maximum stack depth is 22. | |||
| timotimo | i'd prefer a paste service of some kind | ||
| dogbert17 | agreed :-) | ||
| is that the plan or did I miss it | 17:58 | ||
| or are you referring to the 'before' part followed by BB 0, BB 1 etc | 18:00 | ||
| timotimo | no, just that you pasted so much directly into irc | 20:23 | |
| MasterDuke | i think his question was about "look upwards for the specialization plan" | 20:27 | |
| timotimo | oh, sorry | ||
| i was afk for a long while :( | 20:28 | ||
| and will be again | |||
| but yeah, that's the specialization plan | |||
| there'll be another one for the same routine | |||
| probably further up above still | |||
| dogbert17 | timotimo: I'm learning :-) check out gist.github.com/dogbert17/adf0f171...f404f7d4fb when/if you have time | 20:33 | ||
| timotimo | so the commits that brought this bug to you seem to indicate something goes wrong with inlining | 21:37 | |
| in which case it'd be prudent to see if the code from the routine that troubles you got inlined somewhere | 21:38 | ||
| huh, line 384 is in there twice; is that what gets inlined? or is it being inlined into get-connection? | 21:45 | ||
| should be possible to figure out where exactly it goes wrong with spesh bisection; does jit-bisect do that or do we have a separate script? | 21:46 | ||
| dogbert17 | timotimo: have you tried running the code ? | 21:50 | |
| that way you'll have a complete spesh log not the excerpts I gisted | |||
| timotimo | i have not yet | 21:51 | |
| dogbert17 | I ran the spesh_bisect tool on a slightly modified version (10 iterations then exit) and it said: MVM_SPESH_LIMIT=3873 | 22:05 | |
| timotimo | OK, then the last specialization in the spesh log is likely the culprit | 22:14 | |
| or at least a kind of "tipping point" | 22:15 | ||