01:40 benchable6 joined 01:57 ilbot3 joined 02:49 greppable6 joined
Geth MoarVM/grisu: af2eb8a7f7 | (Zoffix Znet)++ | src/math/grisu.c
More post-Grisu3 Num renderer polishing

  - Fix a couple of rendering issues introed in
   earlier mod[^1]
  - Add more cases of handling decimal positions
   like we had before Grisu3 stuff.
  [1] github.com/MoarVM/MoarVM/commit/8841c4241b
03:44
MoarVM: zoffixznet++ created pull request #823:
Stringify Num using Grisu3 algo / Generalize mp_get_double() routine
04:06
MoarVM/master: 5 commits pushed by (Zoffix Znet)++
MoarVM: Kaiepi++ created pull request #824:
Allow NativeCall support for wchar_t
05:09
05:32 SourceBaby joined 05:35 SourceBaby joined 05:41 SourceBaby joined 05:42 SourceBaby joined 05:44 SourceBaby joined
samcv so I've decided after much thinking that while i'll retain the shiftjis name in the shiftjis decode/encode functions, the name of the encoding is going to be referenced as windows 06:43
windows-932 since that's the official name of the extensions that it uses, and make sure we don't encounter issues if or when i add support for the baseline shiftjis standard 06:44
though maybe not the most common name it's known by, it's accurate
Geth MoarVM: 4d3fc2818d | (Zoffix Znet)++ | src/math/bigintops.c
Fix handling of denormals in nqp::div_In

Makes 3.1x faster: nqp::div_In with 2**1020 Ints
   19.3x faster: nqp::div_In with 2**1020100020 Ints
Fixes RT#130155: rt.perl.org/Ticket/Display.html?id=130155 Fixes RT#130154: rt.perl.org/Ticket/Display.html?id=130154 Fixes RT#130153: rt.perl.org/Ticket/Display.html?id=130153 ... (19 more lines)
07:00
synopsebot RT#130155 [new]: rt.perl.org/Ticket/Display.html?id=130155 [BUG] Rat operations give bogus underflow
synopsebot RT#130154 [new]: rt.perl.org/Ticket/Display.html?id=130154 [BUG] Int/Int gives bogus underflow
RT#130153 [new]: rt.perl.org/Ticket/Display.html?id=130153 [9999][BUG] Int**Int yields bogus overflow
07:01 evalable6 joined
Geth MoarVM: a5ed7ea5ed | (Zoffix Znet)++ | src/math/bigintops.c
Tweak naming of double mantissa size define

Tis teh bits; ain't digits
07:08
MoarVM/master: 5 commits pushed by (Samantha McVey)++ 07:31
08:06 domidumont joined 08:12 domidumont joined 08:32 lizmat joined 09:48 evalable6 joined
MasterDuke timotimo: here's another --profile segv. `$ = (1,2,3,4,5).max for ^100_000` 10:18
backtrace (minus ~87k lines) gist.github.com/MasterDuke17/a0ced...445d90da23 10:32
10:34 Ven`` joined 10:57 Ven`` joined
timotimo 87k lines? holy crap. maybe that's actually a stack overflow? 11:29
MasterDuke it was just repeats of `#87351 0x00007ffff76c955c in dump_call_graph_node (tc=tc@entry=0x555555758c60, pds=pds@entry=0x7fffffffd540, pcn=0x5555557bb610) at src/profiler/instrument.c:420` 11:30
timotimo yeah
not the same addresses though, right?
MasterDuke correct, different addresses for pcn 11:31
timotimo i have no explanation for this deep of a call graph yet 11:33
MasterDuke runs just fine with MVM_SPESH_INLINE_DISABLE=1 11:36
timotimo i wonder if my recent inline fix was just bogus? 11:38
MasterDuke you definitely fixed something, right? maybe there's just more to fix that was uncovered by that change 11:41
timotimo the fix got in right *after* the release, though, right?
MasterDuke i think. fwiw, i'm at HEAD 11:43
timotimo ok so at least it's actually a stack overflow, not some other kind of crash 11:52
so, the good news is, i should probably be able to reduce the stack size of dump_call_graph_node a little 11:59
MasterDuke so we get more iterations in before it overflows? 12:00
timotimo yeah 12:02
it should definitely not grow that big, though
actually, perhaps i can't do better than 144 bytes 12:05
wait, did i just confuse bits and bytes again %) 12:06
OK, the one 144-byte frame is now two frames; the one that'll remain on the stack more often is 80 bytes, the other one is 128 12:19
not quite as good as i had hoped, maybe i can do something about it yet.
MasterDuke nice 12:21
timotimo cool, 64 and 128 now 12:23
MasterDuke nicer 12:24
timotimo it now reaches the "write profile file" stage 12:26
12:27 benchable6 joined
timotimo just 160 megs! 12:28
MasterDuke does it actually write it out?
timotimo yup
MasterDuke good deal 12:29
Geth MoarVM/profile_dump_less_stack_usage: 90b05c81b9 | (Timo Paulssen)++ | src/profiler/instrument.c
profile: extract recursion loop for smaller stack frames

dumping call graphs used to put 144 bytes onto the stack for every slice of recursion, now it'll deposit just 64 bytes for every slice and put a 128 byte frame on top to do most of the actual work.
12:31
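(For context: at roughly 87k recursion levels, 144 bytes per level works out to about 12 MB of C stack, comfortably past a typical 8 MB default, so shrinking the per-level frame directly raises the depth the dumper can survive. Below is a minimal sketch of the pattern the commit describes — a slim recursive walker plus a separate worker that holds the bulky locals — using hypothetical names and a toy node type, not MoarVM's actual structures.)

```c
#include <stdio.h>

/* Hypothetical call-graph node; stands in for the profiler's real node type. */
typedef struct CGNode {
    unsigned int    id;
    unsigned int    num_children;
    struct CGNode **children;
} CGNode;

/* Worker: holds the bulky locals needed to emit one node.  It returns before
 * any recursion happens, so its larger frame is never stacked once per level. */
static void dump_one_node(const CGNode *node, unsigned int depth) {
    char line[96];                                   /* bulky locals live here */
    snprintf(line, sizeof line, "%*snode %u", (int)(depth * 2), "", node->id);
    puts(line);
}

/* Recursion loop: kept as slim as possible, so each level of call-graph depth
 * only costs a couple of pointers and an int on the C stack. */
static void dump_call_graph(const CGNode *node, unsigned int depth) {
    dump_one_node(node, depth);                      /* heavy work elsewhere */
    for (unsigned int i = 0; i < node->num_children; i++)
        dump_call_graph(node->children[i], depth + 1);
}

int main(void) {
    CGNode  leaf   = { 2, 0, NULL };
    CGNode *kids[] = { &leaf };
    CGNode  root   = { 1, 1, kids };
    dump_call_graph(&root, 0);
    return 0;
}
```

(One caveat with this split: a compiler may inline a small static worker back into the walker and merge the frames again, so a real patch may need to keep them apart explicitly, e.g. with a noinline attribute.)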
timotimo MasterDuke: could you have a look at what the stack actually looks like? i'll try to enjoy the last bit of sun on the balcony 12:32
MasterDuke what do you mean? 12:33
with your branch?
timotimo: gist updated with output of `info frame` at the segv 12:36
timotimo oh, i meant the profiled call stack 12:52
so we have almost 100k of each pull-one, infix:<cmp>, infix:«>», return, max, iterator-and-first, is-lazy, iterator, ReifiedList, one anonymous routine, new, and SET-SELF, the last 4 out of Rakudo/Iterator 12:57
and almost 100k of the -e frame
it seems to think that the -e frame is calling itself over and over 12:58
MasterDuke huh. think it's a problem in how the profiler is recording the data? or in how spesh is doing inlining? 13:10
timotimo not sure yet 13:11
14:02 AlexDaniel joined 14:22 greppable6 joined 15:00 robertle joined
timotimo what was the second-to-last profile example that segfaulted? :| 15:58
15:58 committable6 joined
dogbert11 second to last? 15:59
from me or MasterDuke?
timotimo i'm not entirely sure :|
dogbert11 perhaps this gist.github.com/dogbert17/750ffbf9...72b70779d8 16:01
timotimo let's see 16:03
dogbert11 do you have a new theory? 16:04
timotimo that used to segfault?
dogbert11 yes
timotimo doesn't any more :)
dogbert11 timotimo++
it worked if you turned off inlining 16:05
timotimo same as the one today, then
dogbert11 perhaps your fix works there as well 16:06
timotimo it did
dogbert11 cooool
timotimo but it just makes it non-explosive
the call graph being so huge is still bogus
dogbert11 aha, one mystery left then
I have one small script where the profile get 150 megs 16:07
timotimo that's likely the same underlying issue 16:08
16:09 benchable6 joined
dogbert11 if it helps, I can tune a script so that the profile, albeit buggy, is quite small 16:10
timotimo 93.2% (5054407876196421ms) 16:11
*sigh*
Infinity% (14.95ms)
oops, did i say it doesn't crash any more 16:12
it just doesn't crash reliably
dogbert11 FWIW if I run the profile under valgrind it does not SEGV 16:22
and the generated profile doesn't look bogus 16:24
timotimo the call graph is suspiciously deep 16:27
dogbert11 are you referring to the long spike on that page 16:28
timotimo yeah
got something to show you
once i figure out how to work this GUI here
dogbert11 clicking it gives 'push-at-least SETTING::src/core/Iterator.pm6:49 ' 16:30
timotimo i.imgur.com/htmC15q.png
see how the structure is suspiciously similar?
dogbert11 yes 16:31
timotimo i think we're accidentally forgetting to handle a prof_exit and thus recursing too deep 16:32
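(A toy illustration of timotimo's theory — a hypothetical recorder, not MoarVM's actual profiler code: the cursor into the call graph only comes back up on an exit event, so one dropped exit per call turns repeated calls into ever-deeper nesting, which later blows the C stack when the tree is dumped recursively.)

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical call-graph recorder: a cursor walks a tree of nodes, one
 * level down per "enter" event and one level up per "exit" event. */
typedef struct Node {
    const char  *name;
    struct Node *parent;
    struct Node *child;                      /* one child slot suffices here */
    unsigned     calls;
} Node;

static Node *prof_enter(Node *cur, const char *name) {
    if (cur->child && strcmp(cur->child->name, name) == 0) {
        cur->child->calls++;                 /* repeated call: reuse the node */
        return cur->child;
    }
    Node *n = calloc(1, sizeof *n);          /* first call: add a child node  */
    n->name = name;
    n->parent = cur;
    n->calls = 1;
    cur->child = n;
    return n;
}

static Node *prof_exit(Node *cur) { return cur->parent; }

static unsigned depth(const Node *cur) {
    unsigned d = 0;
    while (cur->parent) { cur = cur->parent; d++; }
    return d;
}

int main(void) {
    Node  root = { "root", NULL, NULL, 0 };
    Node *cur  = &root;

    /* Properly paired events: the cursor returns to the root every time,
     * so 100k calls to the same frame stay one level deep. */
    for (int i = 0; i < 100000; i++) {
        cur = prof_enter(cur, "-e");
        cur = prof_exit(cur);
    }
    printf("paired enter/exit: depth %u\n", depth(root.child));  /* 1 */

    /* Drop the exits: every call nests under the previous one and the graph
     * gains a level per call - the same shape as the -e frame that appears
     * to call itself ~100k times in the broken profile. */
    for (int i = 0; i < 100000; i++)
        cur = prof_enter(cur, "-e");
    printf("exits dropped:     depth %u\n", depth(cur));         /* 100000 */
    return 0;
}
```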
dogbert11 the allocations page also looks a bit strange
there are many lines mentioning the same thing, e.g. Rakudo::Iterator::CountOnlyBoolOnlyDelegate is mentioned 40-50 times 16:34
timotimo that does seem wrong, yeah
dogbert11 instead of being aggregated
same with '<anon|23>+{Rakudo::Iterator::CountOnlyBoolOnlyDelegate[<anon|33>]}' and 'Rakudo::Iterator::CountOnlyBoolOnlyDelegate[<anon|33>]' 16:36
timotimo it's small enough that i can do it without --minimal, i.e. get the function names in the graph, too 16:42
dogbert11 does that give you any new clues? 16:43
timotimo let's see
hm, so, the implementation of List:D's ACCEPTS is deeply recursive 17:31
this list's accepts is called from junction's ACCEPTS 17:32
let's just say that at some point this code would have a stack trace that'd have a hundred or so ACCEPTS in a row 17:33
just replacing the ~~ there with eq makes the profile really tiny 17:38
i think we see that one type so many times because we actually create many types of that name 17:39
dogbert11 shouldn't they be aggregated? 17:41
timotimo not if they are different actual types 17:42
MasterDuke so does anything very recursive blow up profiles? 17:44
17:47 FROGGS_ joined
timotimo yep 17:47
thing is, we create one role for every time we mix in the CountOnlyBoolOnlyDelegate 17:50
which is a role that just forwards a call to bool-only and a call to count-only to another iter object's method of the same name
i'd think it'd be better to have an attribute mixed in by the role that stores this delegate target
i'm not sure why the name shows up as anon|23 and anon|33, like, how do these numbers get so short? 17:51
MasterDuke that would just help/fix that particular bit of code, right? 17:54
timotimo i expect this causes a bit of performance degradation in everything that uses this particular piece of rakudo iterator tech 17:55
MasterDuke i'm kind of impressed, `sub f($n) { $n * ($n < 2 ?? $n !! f($n - 1)) }; say f(40_000)` only created a 19mb profile
timotimo also, it's not the cause for the huge profile; the use of ~~ against lists/junctions is
MasterDuke ah
dogbert11 the program actually generates a MoarVM panic if MVM_GC_DEBUG=2 17:57
message is 'MoarVM panic: Invalid assignment (maybe of heap frame to stack frame?)'
timotimo yeah, i was hunting that yesterday
dinner preparation time now, though
No such method '!set-delegate-target' for invocant of type '<anon|23>+{Rakudo::Iterator::CountOnlyBoolOnlyDelegate}'. Did you mean '!set-delegate-target'? 18:11
:|
got it down 18:16
one Rakudo::Iterator::CountOnlyBoolOnlyDelegate
one <anon|23>+{Rakudo::Iterator::CountOnlyBoolOnlyDelegate}
so ... 18:24
mixin goes from 21.6ms inclusive time down to 2.77ms inclusive time
with the same amount of calls to it :)
notably because generate_mixin now gets called only once, rather than the 39 times that mixin itself is called
same for set_is_mixin and setup_mixin_cache 18:25
oh, wow
from 2 gc runs down to 1
dogbert11 what are you up to ? 18:27
timotimo i'm not sure if these two profiles were measuring the same code 18:28
18:36 robertle joined 18:52 Kaiepi joined
timotimo i'll probably change it back a tiny bit so that it's actually parameterized on something, but only on the target iterator's type 19:15
that could give us better specializations, i think 19:16
though we may not be calling bool-only or count-only a million times in regular code 19:17
19:22 bisectable6 joined
Geth MoarVM: 67e5093f0e | (Timo Paulssen)++ | src/debug/debugserver.c
only suspend on actually must-suspend breakpoints
20:12
FROGGS jnthn: I found something interesting when hunting the DBDish::mysql instability 20:37
this causes a double free quite often: github.com/MoarVM/MoarVM/blob/mast...rp.c#L5040
timotimo oh, huh! 20:38
jnthn Oops...but also, huh...shouldn't that only be set during type creation? 20:39
timotimo probably should, yeah 20:40
FROGGS aye
timotimo .o( put a lock on it )
FROGGS these two got in: 20:41
// debugname = 0x7fffe44d7010 "NativeCall::Types::Pointer[MoarVM::Guts::REPRs::MVMArrayB]"
// debugname = 0x7fff7c080fa0 "Array[DBDish::mysql::Native::MYSQL_BIND]"
20:42 lizmat joined
FROGGS okay, in Pointer.^parameterize we call .^set_name, which calls that op 20:42
that's still during type creation, right?
timotimo should be, yeah
the only way to double-free there is to get two threads into the tiny space between MVM_free(STABLE(obj)->debug_name) and STABLE(obj)->debug_name = debugname 20:43
which is possible if the encode_C_string causes GC i suppose
FROGGS that's what I thought too
timotimo this would be a time for helgrind if it weren't so noisy due to many false-positives :( 20:44
FROGGS wow, yes, it says a lot 20:52
hmmm 20:54
MVM_string_utf8_encode_C_string cannot cause GC, right? I mean, it returns a char* 20:55
timotimo oh, yes
jnthn It'd seem to mean that two threads are trying to concurrently set the name 21:00
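(A stripped-down sketch of the window described above, plus one possible way to close it with an atomic pointer swap. The struct and function names here are placeholders, not MoarVM's actual code, and a plain lock around name-setting — the "put a lock on it" option — would work just as well.)

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Placeholder for the part of a type's stable that holds the debug name. */
typedef struct {
    _Atomic(char *) debug_name;
} FakeSTable;

/* The racy shape discussed above:
 *
 *     free(st->debug_name);          <-- a second thread can free the same
 *     st->debug_name = new_name;         pointer inside this window
 *
 * With an atomic exchange, exactly one thread obtains (and frees) any given
 * old pointer, so the double free goes away.  Readers still holding the old
 * name across the swap would need extra protection, hence the lock option. */
static void set_debug_name(FakeSTable *st, const char *name) {
    char *fresh = strdup(name);                              /* encode step   */
    char *old   = atomic_exchange(&st->debug_name, fresh);   /* publish       */
    free(old);                                               /* freed exactly once */
}

int main(void) {
    FakeSTable st;
    atomic_init(&st.debug_name, NULL);
    set_debug_name(&st, "NativeCall::Types::Pointer[MoarVM::Guts::REPRs::MVMArrayB]");
    set_debug_name(&st, "Array[DBDish::mysql::Native::MYSQL_BIND]");
    free(atomic_load(&st.debug_name));
    return 0;
}
```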
FROGGS the type named Array[DBDish::mysql::Native::MYSQL_BIND] is on DBDish::mysql::Connection, which gets need'ed, not use'ed... 21:06
but need just loads at compile time without doing any imports? 21:07
okay, changing it to "use" makes no difference, but that's no surprise 21:10
timotimo did you grab some tracebacks?
FROGGS a backtrace from gdb? 21:11
timotimo call MVM_dump_backtrace(tc)
^- literally type that into gdb
FROGGS right
(gdb) call MVM_dump_backtrace(tc) 21:14
Invalid cast.
dogbert11 is call the same as p ?
timotimo just to be sure, tc is available in your currently selected frame?
no, p is call + print
FROGGS (gdb) p tc 21:15
$1 = 1.4616321449683623412809166386416848
timotimo lolwat
tc is supposed to be a pointer to a MVMThreadContext :)
of course we don't have full control over everybody's code
so maybe you're inside mysql's code or something?
FROGGS (gdb) p MVM_dump_backtrace 21:16
$2 = {void (MVMThreadContext *)} 0x7ffff76ac870 <MVM_dump_backtrace>
so, that's in place at least
lizmat timotimo: re github.com/rakudo/rakudo/commit/9f...1f0bfR1430
that being a private method, is that correct?
timotimo oops, it is not correct! 21:17
FROGGS MVM_interp_run (tc=0x5cf6, tc@entry=0x7fffdc0f3290, initial_invoke=0x0, invoke_data=0x6, invoke_data@entry=0x7fffdc13c4b0) at src/core/interp.c:5040
lizmat also: github.com/rakudo/rakudo/commit/9f...1f0bfR1436
FROGGS which tc shall I use?
lizmat ok,
timotimo: will fix it :-)
timotimo too late
this does probably mean that this is untested by the spec test suite 21:18
otherwise i wouldn't have gotten a PASS
lizmat yeah :-)
timotimo lizmat++
i wish we could get the supervisor process to not allocate anything on the heap :) 21:19
FROGGS timotimo: look gist.githubusercontent.com/FROGGS/...tfile1.txt
timotimo oh, huh. i wonder if NativeHelpers needs to have its guts updated perhaps? 21:20
it does kind of do terrible things with internal structs if i remember correctly
the code start { }; sleep 60 will run the GC 8 times 21:21
ah, getrusage-total allocates, of course
lizmat timotimo: perhaps we shouldn't use a sub for that ? 21:22
timotimo no, it's the nqp::getrusage op 21:23
lizmat ah, ok
timotimo the profiler didn't add allocation logging to that op yet; it'll show up in the next profile i'll take
FROGGS okay, here we create types at runtime: github.com/salortiz/NativeHelpers-...ob.pm6#L27 21:24
timotimo hm. we don't actually have anything to log the allocation of nested objects. like getrusage will create a hash with a bunch of numbers in it, but we'll only count the hash 21:26
ah, not a hash, a BOOTIntArray 21:27
that's a lot better than i thought
5.8k objects in the 60 seconds i slept
in theory we could change getrusage to write to an int array you pass it, so we could re-use that … 21:28
lizmat that sounds like an excellent plan
this would sit well with race conditions I guess
timotimo how do you mean? 21:29
lizmat ah, no, somehow I was thinking it used one buffer internally (probably P5 thinking)
the whole issue was that it created a new one each time, right ?
timotimo oh, no, we'd keep the array in our "user" code 21:30
a perl6-level getrusage sub would allocate the array and fill it immediately
lizmat and giving it a list_i puts the responsibility in the hands of the dev
timotimo but the TPS could re-use the same object over and over
lizmat right
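(The reuse idea at the C level, using the real getrusage(2) call but a hypothetical fill-a-caller-buffer interface — the eventual nqp-level op may look different. The buffer is allocated once by the caller and refilled on every poll, so a supervisor loop that samples it causes no per-sample allocations.)

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/resource.h>
#include <sys/time.h>

/* Fill a caller-provided buffer instead of allocating a fresh array on every
 * call; a polling loop (like the thread-pool supervisor) can keep one buffer
 * alive and reuse it for every sample. */
enum { RU_UTIME_US, RU_STIME_US, RU_MAXRSS_KB, RU_FIELDS };

static int fill_rusage(int64_t out[RU_FIELDS]) {
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    out[RU_UTIME_US]  = (int64_t)ru.ru_utime.tv_sec * 1000000 + ru.ru_utime.tv_usec;
    out[RU_STIME_US]  = (int64_t)ru.ru_stime.tv_sec * 1000000 + ru.ru_stime.tv_usec;
    out[RU_MAXRSS_KB] = ru.ru_maxrss;        /* kilobytes on Linux */
    return 0;
}

int main(void) {
    int64_t buf[RU_FIELDS];                  /* allocated once, reused forever */
    for (int i = 0; i < 3; i++) {
        if (fill_rusage(buf) == 0)
            printf("user+sys: %lld us\n",
                   (long long)(buf[RU_UTIME_US] + buf[RU_STIME_US]));
    }
    return 0;
}
```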
timotimo FWIW, there are other objects we allocate a whole lot more of
lizmat such as ?
timotimo Num, Scalar, BOOTCode, NumLexRef, BOOTHash, IntLexRef, IntAttrRef are all above 10k 21:31
Num at 88k
lizmat hmmm... we shouldn't have any Nums :-(
timotimo you think?
lizmat aaaahhhh lemme see 21:32
timotimo we do have a bunch of native num attributes, those shouldn't be making Num objects, but that's simply a case of "we box just to immediately unbox" i bet
lizmat there *is* a infix:<*> involved 21:33
timotimo well, from the allocation numbers i see 41k from Int.pm6's Bridge method, 17.6k from Num.pm6's infix:</>, 17.5k from Real.pm6's infix:</> and *then* Num.pm6's infix:<*> 21:34
a big portion of Scalar allocations seem to come from iterating over a Range, it seems like 21:35
i.e. prefix:<^>, iterator from Range.pm6, line 1703 from Rakudo/Iterator.pm6, SET-SELF from 5 lines below that, 11.7k Scalars in Range.pm6's SET-SELF, and 17.6k from Range.pm6's new 21:36
i'll be AFK for a bit
anyway, the range in question is most probably from prod-affinity-workers 21:38
sounds like an easy win to me 21:39
lizmat looks 21:40
timotimo perl6 --profile -e 'start { }; sleep 60' and then look at the profiler's allocations tab 21:43
lizmat timo: I think I got a good idea what's going on there 21:59
timotimo: but am too tired now to tinker with this right now
(another 6 hours in the car today)
so, will look at it tomorrow 22:00
timotimo OK, maybe i'll just do it right now :) 22:09
you rubberducked good, though :) 22:10
just by replacing the for ^worker-list.items with a loop ( ) loop i got it down to 6 gc runs in the same time it did 8 before 22:20
i might need to run 2 minutes to get more precise measurements 22:21
MasterDuke is it faster?
timotimo it runs pretty much exactly 60 seconds :P 22:22
MasterDuke heh 22:23
timotimo i should have run it with "time" to get proper cpu time measurements
The profiled code ran for 60005.54ms. Of this, 29.98ms were spent on garbage collection (that's 0.05%).
The profiled code ran for 60005.6ms. Of this, 28.19ms were spent on garbage collection (that's 0.05%).
that's before -> after, so somehow it got the tiniest bit slower. which i'll just call "noise" :) 22:24
now i'm down to 5 collections
124238 (60.21%) 22:26
56587 (61.84%)
50127 (59.79%)
interesting development (this is call frames in total and percentage eliminated via inlining) 22:27
aha, i see why getrusage-total isn't being jitted. it's mostly being entered inlined via a frame that also has nqp::cpucores, which isn't jitted 22:30
Geth MoarVM: b1f64db89b | (Zoffix Znet)++ | src/core/coerce.c
Add missing include for Grisu3 dtoa function

Fixes github.com/MoarVM/MoarVM/issues/825 M#825
22:32
synopsebot M#825 [open]: github.com/MoarVM/MoarVM/issues/825 implicit function declaration compiler warning for function ‘dtoa_grisu3’
timotimo hah, now it's bailing on sleep 22:34
lovely! 22:42
jit-compiled frames: 96.93% (120908) 22:43
22:50 greppable6 joined
timotimo cool, got rid of some NumLexRefs 22:50
and Num on top of that 22:51
both prod-affinity-workers and .sum allocate a hash for named arguments even though they don't get passed any; wonder why that happens 22:53
probably deopt annoyingness 22:55
huh, prod-affinity-workers doesn't show up in the spesh log :| 22:57
MasterDuke nice. i've never really figured out how to reduce allocations of things
22:57 evalable6 joined
timotimo oh, no, it is in there, i just misspelt it 22:58
oh, wrong again 22:59
funny, it speshes prod-affinity-workers, but only if profiling is turned on. so maybe the profiling overhead makes the cpu usage go up a tiny bit and makes the scheduler decide to create an additional worker? 23:02
MasterDuke what if instead you start some other thread doing random work? 23:05
timotimo then the impact on gc runs won't be as visible 23:09
running a 120s profile now 23:10
changed a few / to nqp::div_n
oh, huh, 5 gc runs for 120s as well 23:13
oh wow
m: say [+] 53084, 17770, 17760, 13444, 6303, 5959, 5948
camelia 120268
23:13 committable6 joined
timotimo m: say [+] 24873, 12066, 11896, 11877, 11867, 11861, 11857 23:14
camelia 96297
timotimo m: say (120268 * 2) R/ 96297
camelia 0.4003434
timotimo so we're now only allocating 40% as many objects - though i didn't account for how big each object is
MasterDuke that's a big reduction in count! 23:15
timotimo aye
i'll run 5 minutes now 23:16
jnthn was rather not happy about the thought of making the ThreadPoolScheduler's code less readable, though 23:18
MasterDuke i would think if it's been proven bug-free for a while now he'd be more amenable to optimizing it 23:19
can always make a PR for comments 23:20
timotimo OK, so 8 GC runs over 5 minutes, that's not so bad 23:21
whoops, i made it no longer allocate as many BOOTCode (hardly any anymore) but also made prod-affinity-workers unjittable 23:27
MasterDuke "Timotimo's Choice" 23:28
timotimo queuepoll's not jitted :) 23:29
MasterDuke jit all the ops!! 23:30
timotimo yeah why not :P 23:33
vim shouldn't let me ctrl-p into the install/ folder %) 23:38
actually, there shouldn't be an install folder under moarvm/ anyway
got the frame jitted again, yay 23:40
23:43 Kaiepi joined
MasterDuke and not allocating as many BOOTCodes? 23:48
timotimo 71 over the course of the whole run 23:49
MasterDuke cool beans 23:53
timotimo i.imgur.com/3upvzFE.png
the tiniest difference %)
huh, the bytecode at the end is actually bigger 23:56
OK, the devirtualized calls are actually more arguments to put on the stack 23:57
but the call itself is more direct
m: say 30290 - 29954 23:59
camelia 336
timotimo that is not much
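(A generic illustration of that trade-off — nothing here is MoarVM's actual JIT or REPR interface: the indirect, "virtual" shape calls through a function pointer with a single packed argument, while the devirtualized shape calls the concrete function directly, so each call site has more argument setup but no indirect branch.)

```c
#include <stdio.h>

/* Indirect shape: call through a function pointer looked up at run time,
 * passing one packed argument block. */
typedef struct { void *tc, *obj; long index, value; } PackedArgs;
typedef void (*GenericOp)(PackedArgs *);

static void bind_pos_generic(PackedArgs *a) {
    printf("generic: obj=%p idx=%ld val=%ld\n", a->obj, a->index, a->value);
}

/* Devirtualized shape: the concrete function is known when the code is
 * generated, so the call is direct, but its wider signature means more
 * arguments to set up at every call site. */
static void bind_pos_direct(void *tc, void *obj, long index, long value) {
    (void)tc;
    printf("direct:  obj=%p idx=%ld val=%ld\n", obj, index, value);
}

int main(void) {
    long       storage[4] = { 0 };
    GenericOp  op = bind_pos_generic;          /* resolved at run time        */
    PackedArgs a  = { NULL, storage, 2, 42 };
    op(&a);                                    /* indirect call, one argument */
    bind_pos_direct(NULL, storage, 2, 42);     /* direct call, four arguments */
    return 0;
}
```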