timotimo looks like that's only json 00:00
09-race.t with my new nqp-sql-writer code has 256.98user 3.10system 4:16.19elapsed 101%CPU (0avgtext+0avgdata 6539068maxresident)k 00:02
342 megs
00:03 buggable joined 00:04 evalable6 joined
timotimo oops it's all in one line :D 00:05
249.53user 3.09system 4:08.68elapsed 101%CPU (0avgtext+0avgdata 6421036maxresident)k - 343 megs 00:06
it looks like 2 megabytes per second 00:08
watching "watch ls" i mean 00:09
248.08user 3.09system 4:07.04elapsed 101%CPU (0avgtext+0avgdata 6419576maxresident)k 00:11
343 megs
so time-wise that's almost stable; the first one is an outlier in memory, time, and size of the resulting file
this is master: 00:21
312.53user 18.73system 5:27.15elapsed 101%CPU (0avgtext+0avgdata 6465144maxresident)k - 347 megs
MasterDuke oh, that is better. same maxresident, but faster 00:22
(compared to master)
timotimo 310.59user 19.06system 5:25.21elapsed 101%CPU (0avgtext+0avgdata 6406816maxresident)k - 349 megs 00:27
it's rather strange that the files end up being bigger 00:28
but since it's far from reproducible i can't just diff two of them
watching "watch ls" gives me the impression it's 1.5 megs per second now
305.59user 19.15system 5:20.84elapsed 101%CPU (0avgtext+0avgdata 6397604maxresident)k - 348 megs 00:35
MasterDuke 2mb/s seems pretty slow 00:39
have you tried doing a perf record while it's writing?
timotimo 303.75user 19.31system 5:19.01elapsed 101%CPU (0avgtext+0avgdata 6407984maxresident)k - 349 megs 00:41
somehow it gets better and better
but see how drastic the difference in system time is?
i can run a perf record 00:43
maybe upping the number of items that have to be in the list before i print it out would help reduce the system time a little more
though to be fair, halving 3 seconds will not make a noticeable impact here 00:44
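(The batching idea above trades a little memory for fewer write syscalls, which is where the system time goes. A minimal C sketch of the pattern; the real writer is NQP code, and every name here is invented:)

```c
#include <stdio.h>

#define BATCH 4096                    /* rows to collect before flushing */

static const char *pending[BATCH];
static int n_pending = 0;

/* Write everything collected so far in one burst of I/O. */
static void flush_rows(FILE *out) {
    for (int i = 0; i < n_pending; i++)
        fputs(pending[i], out);
    n_pending = 0;
}

/* Queue a row; the file is only touched once per BATCH rows. */
static void emit_row(FILE *out, const char *row) {
    pending[n_pending++] = row;
    if (n_pending == BATCH)
        flush_rows(out);
}
```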
japhb timotimo: The "elapsed" shrank a whole bunch too .... 00:46
timotimo well, yeah 00:50
it was an all-around improvement it seems like
41 megs of perf.data 00:51
the cli tool is taking a bit to chug through that
huh, maybe perf should read the data in 4k blobs rather than 1k 00:53
MasterDuke huh, i routinely get 200mb+ perf.data files and it doesn't take too long to open them 00:56
timotimo it doesn't seem to update its UI to reflect the progress 00:57
MasterDuke isn't there a progress bar while it's loading? 00:59
timotimo it is
it doesn't go past 949k
939k
the data it reads is also odd, lots and lots of mentions of nvidia drivers
kernel symbols also 01:00
MasterDuke i'll try recording it also 01:13
"Killed" 01:20
but it only took about 10s to run a report on the 1.3g file it had made at that point
01:22 unicodable6 joined
timotimo hey it now shows 1804k 01:22
fwiw htop shows the vast majority of time is spent in "kernel" 01:28
2M! 01:29
*shrug* 01:31
^C and bed
MasterDuke later... 01:39
.tell timotimo my perf report shows 80% spent in MVM_profile_instrumented_mark_data 02:06
yoleaux MasterDuke: I'll pass your message to timotimo.
02:58 ilbot3 joined 07:04 AlexDaniel joined 08:17 robertle joined 10:38 ilmari[m] joined 10:40 yoleaux joined 11:03 brrt joined
brrt ohai #moarvm 11:06
i had a thought regarding the write barrier for bindlex 11:07
well, two thoughts
thought number 1 - we make a 'special' named template for a write barrier and apply it to the expression for bindlex
thought number 2 - we create a WRITE_BARRIER node that implements ... a write barrier 11:08
thought number 2b - we create a STORE_WRITE_BARRIER operator (not node) that implements a store-with-a-write-barrier
so the idea of the first option was that it'd be an extension of what we already have (namely, the macro for the write barrier, just 'exported' to the runtime) 11:09
.tell MasterDuke that (FLAGVAL (ALL ...)) does not compile well yet since it'd need to be translated to (IF (ALL ...) (CONST 1) (CONST 0)) 11:10
yoleaux brrt: I'll pass your message to MasterDuke.
brrt and we don't do that translation just yet 11:11
it's on the TODO list
huh, that gives me an idea 11:12
to implement (COND) operators and to translate ALL/ANY to these 11:13
anyway
the idea of the second option is that the write barrier is below the abstraction barrier provided by the VM; that if we were ever to change the implementation, we'd need to implement them consistently; that a store-with-write-barrier is a common enough operation to do (and perhaps optimize) 11:14
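(A rough C sketch of what a store-with-write-barrier does in a generational collector; the types and the remembered-set hook are invented for illustration, not MoarVM's actual API:)

```c
#include <stdbool.h>

typedef struct Obj {
    bool second_gen;   /* has this object been promoted out of the nursery? */
} Obj;

/* Hypothetical remembered-set hook: record an old object that now points
 * into the nursery so the next minor collection can still find that edge. */
static void remember(Obj *container) { (void)container; }

/* The store and its barrier as one operation, which is what makes it a
 * candidate for optimization (e.g. skipping the check when the value's
 * generation is known at compile time). */
static void store_with_barrier(Obj *container, Obj **slot, Obj *value) {
    if (container->second_gen && value && !value->second_gen)
        remember(container);
    *slot = value;
}
```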
(the idea being that we'd have an in-memory form of (COND) that would be (COND ((TEST (...) LABEL)) ((TEST ... LABEL2) (BLOCK (label1 ...) (label2 ...))))) 11:20
also, one potentially nice thing about the second option, is that we *might* be able to get away with not spilling the operands if the write barrier is their last usage 11:22
(and i've just checked the lego jit implementation of the write barrier in bindlex, and it's implemented after the store, so that's actually a feasible idea) 11:29
i think i'm in favor of the STORE_WRITE_BARRIER operator 11:30
i'll think about it some more. advice appreciated 11:31
timotimo whoops, the heap snapshot for the json_fast race benchmark is 1.1 gigs big 12:33
it's only got 44 snapshots in it :|
m: say 1 * 1024 / 44 12:35
camelia 23.272727
timotimo not wildly unreasonable, but still rather gigantic 12:36
m: say 1.1 * 1024 / 60 12:44
camelia 18.773333
timotimo can that be right? the heap analyzer does almost 19 megabytes per second of data reading? 12:45
that's pretty good
how can it only need 60 seconds to do a "summary all" for this 1.1 gig file? 12:49
MasterDuke .tell brrt i did actually try re-writing it to `(template: ishash (if (all (nz $1) (eq (^getf (^repr $1) MVMREPROps ID) (const (&QUOTE MVM_REPR_ID_MVMHash) int_sz))) (const 1 int_sz) (const 0 int_sz)))`, but it still gives an MVM_oops in the same spot 12:50
yoleaux 11:10Z <brrt> MasterDuke: that (FLAGVAL (ALL ...)) does not compile well yet since it'd need to be translated to (IF (ALL ...) (CONST 1) (CONST 0))
MasterDuke: I'll pass your message to brrt.
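(Read back into C, that template computes roughly the following; the REPR macro and MVM_REPR_ID_MVMHash are MoarVM's, the rest of the scaffolding is assumed:)

```c
#include "moar.h"   /* assumes building inside the MoarVM tree */

/* What the ishash template expresses: non-null, and the REPR ID matches. */
static MVMint64 is_hash(MVMObject *obj) {
    return obj != NULL && REPR(obj)->ID == MVM_REPR_ID_MVMHash;
}
```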
MasterDuke timotimo: it's multi-threaded, right? 12:51
timotimo and how!
especially "summary all" uses many jobs at the same time 12:52
(but limited using a "ticket" system to prevent massive ram usage)
but it still balloons up to about 3 gigs of ram usage - even though I try to clear out the data in between snapshots! 12:53
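(The "ticket" system is essentially a counting semaphore that caps how many snapshot jobs run at once, bounding peak memory. The analyzer itself is Perl 6, so this standalone C sketch with invented names only shows the pattern:)

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

#define N_SNAPSHOTS 44   /* one job per snapshot, as in the file above */
#define MAX_TICKETS 4    /* at most this many parse jobs in flight */

static sem_t tickets;

static void *parse_snapshot(void *arg) {
    sem_wait(&tickets);                    /* block until a ticket frees up */
    printf("parsing snapshot %ld\n", (long)arg);
    sem_post(&tickets);                    /* hand the ticket back */
    return NULL;
}

int main(void) {
    pthread_t t[N_SNAPSHOTS];
    sem_init(&tickets, 0, MAX_TICKETS);
    for (long i = 0; i < N_SNAPSHOTS; i++)
        pthread_create(&t[i], NULL, parse_snapshot, (void *)i);
    for (long i = 0; i < N_SNAPSHOTS; i++)
        pthread_join(t[i], NULL);
    return 0;
}
```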
MasterDuke timotimo: you know, istr that i perf recorded writing out a large sql profile around when i introduced that option, and back then it was something related to IO (e.g., writing or encoding bytes) that was top of the exclusive list, not MVM_profile_instrumented_mark_data 13:03
13:38 greppable6 joined
MasterDuke timotimo: any thoughts about github.com/MoarVM/MoarVM/pull/812 ? 15:23
timotimo we could probably use the start time of the profile itself there, it's in there somewhere ;) 15:30
i'm writing version 2 of the initial grant report and introductory blog post 15:32
i'm almost to 1k words again, but this time i feel like i covered most stuff 15:33
MasterDuke ooo, looking forward to it
timotimo wanna read the current state? 15:34
MasterDuke sure
timotimo wakelift.de/p/df27709e-95d7-433e-ac...b61914e92/ - does this work for you?
MasterDuke yeah
"that It took" 15:35
timotimo probably started out as "that I took" 15:38
the spell checker is complaining about things like "blog" and – and 64 and 3⅓ and 19 15:39
MasterDuke looks good 15:40
timotimo: i thought about putting in the profile start, but i don't think it's 100% accurate, since the spesh worker could (and i think does in the test case we're using) start before the profiling starts? 15:44
timotimo yeah, it can, but i believe the user is interested in how much time during the profiled period was spent in spesh 15:45
MasterDuke i could add something like `if (ptd->cur_spesh_start_time == 0) ptd->cur_spesh_start_time = ptd->start_time` in MVM_profiler_log_spesh_end 15:49
timotimo that sounds good to me 15:50
MasterDuke hm, or should i just set ptd->cur_spesh_start_time to the same value as ptd->start_time is set to in get_thread_data? 15:57
16:14 buggable joined
MasterDuke timotimo: github.com/MoarVM/MoarVM/pull/812 updated 16:18
timotimo oh, that's what you meantby that 16:19
it's not really a problem, but it's strange to have the spesh_start_time set on every thread, because it's only ever read from one thread
MasterDuke hm. think it is better as that conditional in MVM_profiler_log_spesh_end? 16:20
timotimo yeah 16:21
MasterDuke updated 16:29
timotimo my blog now has a "logo" 16:41
MasterDuke a peeled orange? 16:43
timotimo i have no idea
wonder if i should try to get the title into one line
hah, non-breaking spaces ftw? 16:44
looks terrible, it goes over the line %)
now it fits on my end 16:45
16:46 domidumont joined
timotimo MasterDuke: i changed the title from "Timotimo typed this!" (a fun alliteration) to "my Timotimo \this", which is literally a timotimo-typed this! 16:46
MasterDuke heh. i do see the s overlapping the vertical line 16:47
not now 16:48
dogbert17 o/ can a spesh log point out a spesh related bug or are they mostly useful for other things? 16:58
nine .tell brrt in general, higher level semantics (like a STORE_WRITE_BARRIER) make it easier for a compiler to apply optimizations, so I'd favor that. 17:03
yoleaux nine: I'll pass your message to brrt.
MasterDuke nine: weren't you saying something recently about some new optimizations you found on a train ride? have those already been committed? 17:07
timotimo dogbert17: you can figure out if a piece of code has been misoptimized by looking at the before/after, and you can check out why a certain specialization exists or doesn't exist by looking at the logs and specialization plans 17:13
dogbert17 hmm, but the log is over 200 megs. Should I try to find something in the log pointing at the src line where the problem occurs? 17:42
timotimo you can 17:43
dogbert17 let me check ...
timotimo i usually search for the routine name if it's sufficiently rare
but you'll find the name pop up in logs of routines that call the one you're searching for
dogbert17 I think it is
timotimo so maybe search for "Specialization.*your_routine" 17:45
MasterDuke timotimo: btw, anything else for github.com/MoarVM/MoarVM/pull/812 ? 17:46
timotimo i think i'm happy with that
MasterDuke cool, mind merging? 17:47
timotimo sure 17:48
Geth MoarVM: 760b0913e8 | MasterDuke17++ (committed by timo) | src/profiler/log.c
Fix for gigantic and wrong spesh time in profiles

Because spesh workers could start before profiling starts,
`MVM_profiler_log_spesh_end` could be called before any
`MVM_profiler_log_spesh_start` calls. We were then logging
`current time - 0` instead of `current time - spesh start time`.
Fix by setting the spesh start time to the profiling start time
if it's 0.
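(Pieced together from the discussion, the fix amounts to something like this; the function name and the two fields are taken from the chat and commit message above, while the surrounding body is only a guess at MoarVM's internals:)

```c
/* Sketch of the guard in src/profiler/log.c. */
void MVM_profiler_log_spesh_end(MVMThreadContext *tc) {
    MVMProfileThreadData *ptd = get_thread_data(tc);
    /* A spesh worker may have started before profiling did, so no
     * matching _start was ever logged and the start time is still 0. */
    if (ptd->cur_spesh_start_time == 0)
        ptd->cur_spesh_start_time = ptd->start_time;
    /* ...then "current time - spesh start time" is added to the total... */
}
```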
MasterDuke thanks
timotimo "fix for wrong profile timos" :P
MasterDuke heh 17:49
dogbert17 if I read this right (probably not) it specializes the routine twice during the run. Is that possible?
timotimo very possible 17:51
usually for different callsites, for example for calls with one argument, or for calls with two arguments, or for calls with one named argument called "bob"
dogbert17 the first time I see: 17:53
After:
Spesh of 'get-proxy' (cuid: 95, file: site#sources/FD28A8E22DFE16B70B757D9981C7B6C25543060C (HTTP::UserAgent):368)
Callsite 0xb76f0904 (2 args, 2 pos)
and the second: 17:54
After:
Spesh of 'get-proxy' (cuid: 95, file: site#sources/FD28A8E22DFE16B70B757D9981C7B6C25543060C (HTTP::UserAgent):368)
the two arguments disappeared
timotimo if it doesn't have a "callsite" line, it's probably a "certain optimization"
dogbert17 there is no such line the second time 17:55
timotimo i.e. "there were many different calls, but none of them accounted for a significant portion, so just compile a catch-all with basic optimizations"
look upwards for the specialization plan
it should explain what's going on
dogbert17 spam warning 17:57
Observed type specialization of 'get-proxy' (cuid: 95, file: site#sources/FD28A8E22DFE16B70B757D9981C7B6C25543060C (HTTP::UserAgent):368)
The specialization is for the callsite:
Callsite 0xb76f0904 (2 args, 2 pos)
Positional flags: obj, obj
It was planned for the type tuple:
Type 0: HTTP::UserAgent (Conc)
Type 1: HTTP::Request (Conc)
Which received 1 hits (100% of the 1 callsite hits).
The maximum stack depth is 22.
timotimo i'd prefer a paste service of some kind
dogbert17 agreed :-)
is that the plan or did I miss it 17:58
or are you referring to the 'before' part followed by BB 0, BB 1 etc 18:00
18:42 zakharyas joined 18:47 zakharyas joined 18:52 zakharyas joined 18:55 zakharyas joined 19:03 zakharyas1 joined 19:07 zakharyas joined 19:17 zakharyas joined
timotimo no, just that you pasted so much directly into irc 20:23
MasterDuke i think his question was about "look upwards for the specialization plan" 20:27
timotimo oh, sorry
i was afk for a long while :( 20:28
and will be again
but yeah, that's the specialization plan
there'll be another one for the same routine
probably further up above still
dogbert17 timotimo: I'm learning :-) check out gist.github.com/dogbert17/adf0f171...f404f7d4fb when/if you have time 20:33
timotimo so the commits that brought this bug to you seem to indicate something goes wrong with inlining 21:37
in which case it'd be prudent to see if the code from the routine that troubles you got inlined somewhere 21:38
huh, line 384 is in there twice; is that what gets inlined? or is it being inlined into get-connection? 21:45
should be possible to figure out where exactly it goes wrong with spesh bisection; does jit-bisect do that or do we have a separate script? 21:46
dogbert17 timotimo: have you tried running the code? 21:50
that way you'll have a complete spesh log, not the excerpts I gisted
timotimo i have not yet 21:51
dogbert17 I ran the spesh_bisect tool on a slightly modified version (10 iterations then exit) and it said: MVM_SPESH_LIMIT=3873 22:05
timotimo OK, then the last specialization in the spesh log is likely the culprit 22:14
or at least a kind of "tipping point" 22:15
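(For reference, spesh bisection is a plain binary search over MVM_SPESH_LIMIT: find the smallest limit at which the failure shows up. A generic C sketch; fails_at is a hypothetical callback that runs the program under a given limit and reports whether the bug reproduced:)

```c
/* Assumes behavior is monotone in the limit: once the limit is high
 * enough to admit the bad specialization, every higher limit fails too. */
static int bisect_limit(int lo, int hi, int (*fails_at)(int)) {
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (fails_at(mid))
            hi = mid;        /* bug reproduces: culprit is at or below mid */
        else
            lo = mid + 1;    /* clean run: culprit is above mid */
    }
    return lo;               /* e.g. 3873 in the run above */
}
```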
22:40 Kaiepi joined