MasterDuke returning to my previous question. maybe a `--for-packaging` MoarVM build option that creates both a moar and moar-debug executable? 00:01
and also editing perl6-(gdb|valgrind)-m to wrap the calls to the moar executable in something like `if [-d /path/to/moar-debug] then $MOAR = /path/to/moar-debug else $MOAR = /path/to/moar elif`
* s/-d/-e/
and changing topics yet again, this is what heaptrack says now for that test script i.imgur.com/mb3gwYv.png 00:04
AlexDaniel MasterDuke: hey, is it the memory issue? 00:40
MasterDuke: if yes, have you seen this? rt.perl.org/Ticket/Display.html?id...xn-1481074
timotimo must make sure that perl6-gdb-m won't ever use an outdated moar because the user has recompiled and maybe forgotten the flag the second time or something 00:43
MasterDuke i'm not too too worried about users who do re-compile themselves, they should know what they're doing 00:55
more the distro package user who wants to supply a useful error report
timotimo mhm 00:56
bedtime, bbl
MasterDuke AlexDaniel: yeah, that latest heaptrack report is at HEAD 00:58
AlexDaniel MasterDuke: with moar on HEAD also? So it's not fixed? 00:59
MasterDuke yeah, it's improved 01:00
actually, it (HEAD on all three) has the lowest bytes allocated in total (according to heaptrack( 01:04
anybody around know what is the max size that can be allocated with the fixed size allocator? 01:14
01:52 ilbot3 joined
MasterDuke jnthn, timotimo: is this completely wrong? gist.github.com/MasterDuke17/df4f8...2b233ae294 fwiw, it passes nqp's `make m-test` and rakudo's `make m-test m-spectest` and here's what heaptrack thinks of that test script afterward i.imgur.com/toaZC85.png 03:37
03:53 pharv_ joined 04:43 geekosaur joined 07:22 pharv_ joined 08:21 Voldenet joined 09:24 lizmat joined 09:37 dogbert17 joined
lizmat so what causes an param_rp_o ? 10:31
timotimo lizmat: "required positional, object" 11:02
jnthn MasterDuke: It's in fixedsizealloc.h, but it will be a good bit smaller than the 64KB that libuv requests. The FSA falls back to malloc. 11:13
MasterDuke: Meaning that I'd be somewhat surprised if your patch changed much 11:14
Unless libuv's request is somehow more realistic for you? 11:15
lizmat: param_rp_o is what almost every required positional parameter that's not a native type compiles into. 11:16
lizmat yeah, figured the native bit
:-)
timotimo is there any worth in trying to teach MVM_spesh_args to continue in some cases?
like it could keep param_rp_{i,n,s} around, but turn a param_rp_o into a sp_* when the callsite says there's an object there 11:17
also checkarity could go, for example
it doesn't have to bail out completely just because it couldn't lower a param_rp_i or something 11:18
lizmat method a(int $a) # doesn't JIT
method a(Int:D $a) # does JIT
jnthn I'd need to see what cases it's managing to miss out on 11:22
timotimo aye, the speshlog would be helpful 11:40
MasterDuke jnthn: i hardcode on_alloc to use 1024, instead of the suggested_size. according to heaptrack, the "bytes allocated in total" goes from 1.1g on HEAD to 664m with my patch, "peak memory consumption" goes from 387m to 257m, and on_alloc goes from 480m at the top of the "most memory allocated" column to not even showing up anymore 11:44
jnthn Hmm, but...then we're optimizing for things that don't do any output buffering. 11:46
And so send us lots of small things
And yeah, on_alloc will go away 'cus all of its allocations will fall under the fixed size allocator... 11:47
MasterDuke what would be a good test case to benchmark with?
jnthn Hm, I guess Perl 5 has output buffering on by default, so just a one-liner to produce a bunch of output? 11:48
Hm, does on_alloc get the handle? 11:49
Also the patch makes every array 8 bytes bigger, which will add up 11:50
hah, it *does* get the handle
MasterDuke yeah, i added: + MVMThreadContext *tc = (MVMThreadContext *)((SpawnInfo *)handle->data)->tc;
jnthn So we could use the last size rounded up to a power of 2
As a predictor
That way we'd tune nicely to things that buffer *and* things that don't 11:51
Though I guess we'd never tune "upwards" again
MasterDuke the last size is in the handle?
i wasn't sure if adding that flag was necessary, or just changing the MVM_free's procops.c to MVM_fixed_size_free's was sufficient 11:53
jnthn Yeah, last size in the handle 11:56
11:56 evanm joined
jnthn Yes, MVMArray takes ownership of the buffer to avoid a copy 11:56
So it would need to release it in the correct way
Oh, and there's a further problem 11:57
If the user pushes onto the buffer
We're screwed
As it'll try to realloc
MasterDuke hm, can that code check the flag? 11:58
jnthn Yeah but...I still don't much feel like making MVMArray 8 bytes bigger for this. Wonder if we can steal a header flag for it instead... 12:02
I think there's two patches here really though
The first to try and get better size choices
Which I'd feel quite unconcerned by
And the one to switch some uses of MVMArray to use the FSA
Which I'm a bit more uncomfortable with 12:03
nine glibc recently got a much faster malloc using a thread local pool
jnthn nine: What happens if you free it on another thread? 12:04
(A likelihood in this use case...)
MVMArray really does need to switch over to using the FSA, though 12:05
Because then we can use the free at safepoint mechanism when it grows
Which would be a step towards avoiding SEGVs if it's abused from multiple threads 12:06
So I guess starting to move that way would be reasonable
We could steal a header flag instead of making every array bigger for it, though :) 12:07
MasterDuke ok, i'll continue playing around with what i started
what do you mean header flag?
jnthn MVMCollectable has a set of flags 12:08
It's a bit field
We use them for GC bits, object ID handling, and similar
But there are some spare ones
We could designate one or two of them as "per REPR"
MasterDuke ah, cool, i'll look there
nine jnthn: no idea :)
jnthn Probably nothing good :) 12:09
Alrighty, lunch time :)
nine jnthn: I figure it must be safe, otherwise that would be a change of semantics
WRT JITing NativeCall it feels like we ought to not do the usual "JIT it when it's hot" thing driven by spesh, but initiate JIT compilation manually on first call of a native sub 12:43
timotimo we can bump up the urgency of a task when we see a nativecallinvoke 12:47
we still need to collect a bit of stats before we'll do anything much
nine What stats would we need? 12:51
Btw. I do wonder why native subs aren't JITed right now. There is code for mapping nativecallinvoke to MVM_nativecall_invoke at least 12:52
timotimo could be because of flattening callsites perhaps 13:04
nine How I do I find that ou? 13:10
out
13:37 HTTP_____GK1wmSU joined 13:39 HTTP_____GK1wmSU left
timotimo hmm 13:45
if you don't mind debugspam ... :)
somewhere there's a check for callsite having is_interned 13:46
it could be tied to the multi dispatch cache 13:48
which bails out if the callsite in question doesn't have is_interned set
jnthn Flattening will break most opts 13:49
timotimo we once had an idea where we could fake up/find an interned callsite if the result of flattening is eligible 13:55
haven't tried implementing that yet 13:56
one thing is the call-me-forwarder which we can't make much smarter because it's shared by everything that has a CALL_ME
that one uses flattening to pass the arguments on, for example 13:57
so that call won't be optimized, sadly
nine It's good that NativeCall doesn't use that anymore then 14:01
At least not anymore for everything but the very first call to a sub 14:02
timotimo oh, i didn't realize we were able to throw that out 14:06
nine timotimo: github.com/rakudo/rakudo/commit/46...ea9779a104 and then github.com/rakudo/rakudo/commit/cd...dbe1f44b1b 14:10
Geth MoarVM: 35c4130001 | (Stefan Seifert)++ | src/jit/graph.c
JIT nativecallcast
14:12
nine I think that ^^^ is my first contribution to something JITy :)
MasterDuke jnthn: i don't see anything quite so obvious as handle->last_size anywhere, do you mean something like sizeof(handle->body.data)? 14:21
timotimo sizeof can only handle compile-time sizes 14:22
MasterDuke right, i guess handle->body.ssize 14:23
without the sizeof 14:24
timotimo is handle a VMArray? 14:26
MasterDuke no, uv_handle_t
->body is definitly wrong 14:27
timotimo uv_handle_t doesn't have any kind of ssize i don't think
i'm not sure we're allowed to access uv structs or if their contents are implementation details we're not supposed to change ourselves
MasterDuke "jnthn: Yeah, last size in the handle"
i'm trying to find the size of the last read 14:28
irclog.perlgeek.de/moarvm/2017-08-12#i_15005128 here for a little ways is my conversation with jnthn 14:29
Geth MoarVM: 9a3563aebc | (Stefan Seifert)++ | 2 files
JIT trunc_i32
14:33
MoarVM: a0b6e0dced | (Stefan Seifert)++ | src/jit/graph.c
Move trunc_i32 case to where the op fits the comment
14:34
timotimo oh, we can (or maybe already do) store the last size in the SpawnInfo 14:35
if we store both the amount of data read and the size of buffer we last allocated we can be clever about growing our "recommendation" 14:37
i.e. if we've filled up our last buffer completely, go at least one step up for the next one
MasterDuke i don't see a size in SpawnInfo
timotimo then we can put one (or two as i suggested) there
MasterDuke k 14:38
timotimo did you understand why using the fsa for the on_alloc thing threw on_alloc out of the ranking completely?
MasterDuke yeah, i expected that 14:39
timotimo OK
sadly it's not helpful information at all :(
anyway, i gotta be AFK for a lil' bit
MasterDuke it confirmed to me that i'd done it kind of correctly
timotimo hm, i wouldn't call it that ... 14:40
it's really just hiding what's going on
because heaptrack doesn't understand our allocators it just counts everything that uses the fsa as a single thing
MasterDuke "done it" == use the fsa correctly, in this case
timotimo oh
MasterDuke but yeah, it'd be nice if heaptrack could be told about the fsa and be more useful 14:42
Geth MoarVM: 427546dc83 | (Stefan Seifert)++ | 2 files
JIT extend_i32
nine Funny that trunc_i32 and extend_i32 are actually the same code 14:43
But it kinda makes sense. With trunc_i32 you want to make sure that the highest 32 bit are zeroed and with extend_i32 you want to make sure that the highest 32 bits don't contain garbage, i.e. they are zeroed
Zoffix Indentation in that commit looks borked 14:44
MasterDuke "Heaptrack can in principle also profile such applications, but it requires code changes to annotate the memory pool implementation." wonder if they mean code change in heaptrack or the application?
Zoffix puts one point towards "SPACES" team in the Grand Debate
14:56 japhb_ joined
Geth MoarVM: 4e7264bc37 | (Stefan Seifert)++ | src/jit/emit_x64.dasc
Fix indentation

  Thanks to Zoffix++ for noticing
14:56
japhb_ nine: Does extend_i32 really mean "zero top bits" or "sign extend top bits"?
I would assume extend_u32 would use the same code for both, but not extend_i32 14:57
nine japhb_: excellent point! Probably means the latter
The cdq instruction should do what I need but I have to get off the tram now 15:00
15:01 evanm joined
timotimo yeah, sign extend 15:05
extend_u32 would only zero them out
Geth MoarVM: 60149d3fde | (Stefan Seifert)++ | src/jit/emit_x64.dasc
Fix JITed extend_i32 with negative numbers

Of course we have to sign-extend instead of just zeroing out the higher bits. Many thanks to japh++ for actually thinking :)
15:08
nine My first 64 bit assembly code and my first assembly code this century :)
timotimo nice 15:09
MasterDuke millenium also, right?
timotimo i didn't do any assembly before the turn of the century i believe
nine Tru :)
Ok, the rest of my day will belong to my girlfriend. Wrote the fix in the tram station and Walking home now :) 15:10
jnthn MasterDuke: Sorry I wasn't clear; I meant to add a field storing the last size we actually got :) 15:22
MasterDuke heh, timotimo++ straightened me out. working on it now. what do you think of his idea of storing both the size we got and the size we last allocated? 15:29
timotimo also, if we get like 1/4th of the amount we allocated space for we could realloc the buffer to fit so we don't always copy, but we do copy if it's worth a bunch of memory 15:30
may be a bad idea, of course 15:31
15:48 evanm joined
timotimo hmm. even though the nativecall optimizing compiler uses setcodename i don't see any names in the jitlog for example 15:56
but the things do get jitted, which is very nice
it'd be cool if the generated pieces of code could use locals instead of lexicals 16:01
MasterDuke timotimo: i've lost the point of keeping the two values. why not just always allocate the next power of two greater than the last amount read? 16:19
timotimo oh, if we exactly fill up we'll still go up one power of two? 16:21
that'd be fine, the n
MasterDuke well, why go up in that case? 16:22
timotimo imagine we had 10000000 bytes available to stuff in a buffer
last time we had 128 bytes
there's no way to signal how many bytes we "missed" 16:23
MasterDuke right...
timotimo but we can know that the buffer was fully used up
and then grow
if we only build power-of-two buffers anyway, we don't really have to store what we allocated last time, only how much was actually received 16:24
and that ought to be fine
16:42 pharv_ joined
Geth MoarVM: MasterDuke17++ created pull request #631:
Alloc proc read buffer based on amount last read
17:06
MasterDuke timotimo, jnthn: how's ^^^ look? 17:07
timotimo hm, feels like "next power of two" ought to be easier than that 17:20
stackoverflow has this solution, but also says log2 is highly optimized for modern processors so could also use that and ceil 17:21
17:21 MasterDuke_ joined
timotimo probably no reason to change it 17:21
a little comment about it might be nice 17:22
BBL again
17:26 MasterDuke joined
MasterDuke hm. my patch, which wasn't noticeably slower for 5 iterations, is for 20. ~63s before, ~84s after 17:38
17:50 MasterDuke joined 17:51 ilmari[m] joined
timotimo oh? huh?! 17:58
have you tried debugspamming what values you get for size? 17:59
maybe we need to put a minimum in there?
MasterDuke but, changing to pow(2, ceil(log2(size)) is much faster
17:59 d4l3k_ joined
timotimo oh, really? 18:01
MasterDuke both the bit twiddline and pow+ceil+log2 version give same number, but pow version is about the same runtime as without the patch
wait, or not... 18:02
hm, i've been testing on my laptop, maybe i need to switch to the desktop
afk for a while, will re-test on desktop and report back 18:03
or if someone else runs some tests that would be much appreciated also... 18:04
timotimo thanks for your work
18:06 camelia joined
timotimo dinner time \o/ 18:09
19:46 D33P-B00K joined 19:48 D33P-B00K left
MasterDuke odd. the absolute numbers are a lot worse on the desktop, but the relative change is about the same. ~122s before my patch, ~140s after (same for bit twiddling and pow+ceil+log2) 20:00
though memory used went from 885352maxresident)k to 517192maxresident)k 20:01
all according to /usr/bin/time with that test script and 20 iterations
hm, but adding `if (size < 128) size = 128;` to my patch seems to be the best of both worlds. ~100s and 495656maxresident)k 20:08
20:09 dogbert17 joined 20:30 pharv_ joined
timotimo just submitted a recetn moarvm build to coverity 20:46
51 outstanding defects, it says 20:47
that's outstanding! 20:48
MasterDuke any easy pickings from their report? 20:52
timotimo eh
MasterDuke ooo, reproduced the better numbers on the laptop with adding `if (size < 128) size = 128;` 20:55
timotimo good good 20:56
21:00 zakharyas joined
Geth MoarVM: 7c5a9b4709 | (Bart Wiegmans)++ | src/jit/emit_x64.dasc
Use 32 bit auto-truncating for trunc_i

Because dynasm passes arguments as 32 bit integers the code with a large constant probably doesn't do what we mean anyway, and the desired behaviour is implicit in 32 bit operations on 64 bit registers, so lets use that instead.
21:29
eater I'm getting the error `invalid argument` 21:30
and it's just so generic 21:31
I dont even know where it's coming from
timotimo right, we're probably strerror-ing something
is that really all you're getting?
eater timotimo: input: termbin.com/li9m, output: termbin.com/h8x5 21:32
timotimo ah it's your unix domain socket stuff 21:35
are you familiar with gdb at all?
eater like 3 familiar 21:36
I know how to add breakpoints now
timotimo okay, do you have different line numbers from what's on github right now? for asyncsocket.c?
i.e. did you change stuff there?
eater termbin.com/2xod
these are my changes currently
so yea 21:37
timotimo ah, ok
eater hmm, I'm gonna try breaking at MVM_exception_throw_adhoc 21:38
maybe that gives more info :o
timotimo nope
since it's async we're stringifying the error we get from libuv and sending it to the callback
eater wel
at least I got a call
:)
timotimo i expect this is in read_setup 21:39
especially the part where a comment says "Error; need to notify"
however, since there's the MVMROOT macro there, you won't get proper breakpoints or stepping in gdb
because of course that's not possible >:(
eater :(
timotimo i mean it's somewhat trivial to replace MVMROOT with the corrept temp_push and temp_pop invocations 21:42
21:43 lizmat joined
eater what do you mean? 21:43
because it's not trivial enough for me apparently :p
timotimo hehe
"MVMROOT(tc, foo, {" becomes MVM_gc_temp_root_push(tc, (MVMCollectable**)&foo); 21:44
and "});" becomes MVM_gc_temp_root_pop(tc) i think
eater yay macro's!
timotimo yup
if they at least really were just text substitutions
you know what, i'll build a stage into our compiler that turns MVMROOT into the equivalent 21:45
eater but what does MVM_gc_temp_root_{push,pop} do? 21:46
timotimo how much do you want to learn about GC? :)
eater well, whatever you want me to learn :D
timotimo OK, so the GC we have is a "moving GC", i.e. the addresses of objects may change when the GC has run 21:47
the temp roots are how we tell the GC that our C code is holding on to some of these objects 21:48
so when the GC moves stuff around it'll also update our local variables for us
(also such a root helps keep objects alive, for example if you just created them in the C code)
does that make sense? i'd probably have to elaborate more 21:49
eater it makes sense
timotimo anyway, if you don't root an object, the C code might end up accessing memory where the object used to be
and any changes will no longer have any effect
eater but I'm geuss at each MVMROOT statement it's gonna retrieve the new updated addresses/locations? 21:50
*guessing 21:51
or is it in a MVMROOT saying "hold these for me"
and when you leave it's dead?
timotimo not quite 21:52
eater or at least the reference counter goes down something like that
timotimo the GC code will actually change what's at that address
so you only pay the price if an actual GC run happens in between
we don't have reference counts
the temp roots is just a little stack the threadcontext holds 21:53
that's also the reason for naming it push and pop
eater I see :) 21:54
timotimo the mvmroot thing is nice because you can't accidentally forget to pop wrong 21:55
eater yeah I can understand it's neat
timotimo 1) the nested { } prevent you from calling pop but actually popping a different variable
2) you can't forget to remove a pop (or reduce the number of things to be popped for the multi-pop variant) 21:56
yeah, we're already in agreement and i keep babbling on, aren't i
eater haha 21:57
it's fine :)
at least I know you're very happy with it :D
21:58 pharv_ joined
timotimo anyway, if you do the replacement with a push and a pop, you'll be able to debug into thee 22:00
there*
eater well it atleast doesn't come from read_setup 22:01
timotimo OK, but it could be any of those 22:02
since all it does is stringify uv_strerror (or whatever) and sends it back to be thrown
eater should I add more info?
could be helpful in the future :)
timotimo we can make the error message better in rakudo itself, rather than in moar
i.e. take the \err we get passed in and concat like "failed to accept" or "failed to receive" or whatever 22:03
eater for rakudo listening is just 1 step
so I don't know if that works 22:04
it was listen_setup :)
timotimo right
these functions all belong to one of the nqp::async* functions 22:05
eater yeah but it's also one function there
even in Moar it is 22:06
:')
init, bind and listen
timotimo aren't there just a single error condition in each of those?
eater github.com/MoarVM/MoarVM/blob/mast...ket.c#L758 22:07
this looks like 3 conditions to me
I feel like I'm angering alot of people tho 22:09
writing C in Atom :'))
timotimo ooooh
you think we can use the , operator to have a "int stage" in there? 22:10
shall i try to write something up? 22:11
eater hmm
also an option
termbin.com/5o6x 22:12
this is my temp. hack :)
altho I don't know how I should do string concat :') 22:13
sprintf breaks everything apperantly
timotimo ah that's better
less terrible than my idea
sprintf should work there, just gotta have a Big Enough Bufferā„¢
eater ah! 22:14
I was using sprintf in PHP style
aka
timotimo ah, "returns the string for me"
eater sprintf(format, datas...) -> string
timotimo yeah, no :D 22:15
eater yep
would be too easy ofcourse :p
timotimo yeah, c isn't like that
eater anyway, I think the best solution would to puut the whole error handling in it's own function
and do all calls in seperate if's
timotimo you could do that, yeah 22:16
don't forget to MVMROOT the arguments to your function
eater "error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]" 22:19
WHY
timotimo because we have to support MSVC
eater refactors everything for 1 debug statement
timotimo feel free to put orphan { } in your code to make this work
eater ah! 22:20
I got my datas tho :)
"uv said: invalid argument, init return: 0, bind return: -22, listen return: 0"
timotimo would you look at that 22:22
eater I think I found the issue 22:25
it's trying to set tcp_nodelay :)
timotimo aha! 22:26
is there a different flag we'd have to set if they're unix domain sockets? 22:27
eater well
I'm currently looking if it's actually an issue
or
/the/ issue at hand
timotimo right
eater and tbh
timotimo did strace help? 22:28
eater maybe I should do so :')))
timotimo ooooh
nodelay isn't about async, it's about the algorithm to handle congestion and such
docs.libuv.org/en/v1.x/pipe.html - perhaps we have to use different functions actually 22:30
eater yea
I think so too
altho you can UDP or TCP over unix sockets
:'))
timotimo hmm, true
eater so don't know how they handle that
timotimo well, actually
the things are called SOCK_STREAM and SOCK_DGRAM 22:31
it doesn't explicitly mention TCP or UDP :P
eater oh
true!
timotimo also, this has windows support, too 22:32
eater yea
so
asyncsocketpipe.c? :D
timotimo hmm
i wonder if there's a lot more duplication if we split it compared to just handling the differences in asyncsocket.c
eater hmm 22:34
well I know at the start already if it's tcp or a pipe 22:35
timotimo yeah
eater and of most tcp functions are pipe counter parts
so I can make alot of ifs
timotimo and the read callbacks and such should all only handle stream_ parts of the api
on_connection, for example, only deals in stream functions 22:36
eater I can replace uv_tcp_t everywhere with uv_stream_T 22:37
and only cast when needded
timotimo right
22:48 ilbot3 joined
timotimo MasterDuke: the minimum size check can also move before the bit twiddling 22:49
i.e. putting the chaih of |= into an else { }
MasterDuke ah, nice 22:50
geekosaur TCP_NODELAY dsables the Nagle algorithm, which you typicaly want to do for interactive sessions (ssh) but not for other things 22:51
with it set, short packets will be less efficient but responsiveness will be better 22:52
for local sockets you don't care
so only use it for INET
timotimo MasterDuke: it's quite possible the c compiler would do that anyway :D
geekosaur they're not 22:53
TCP and UDP ride on top of IP
(likewise TCP6/UDP6 over IP6)
local sockets don't use ot ned them; you are either using stream or datagram behavior
*or need
eater yep :) I can call listeninfo->socket->type :)
that has the info I need
anyway, I'm drunk and it's 1am so I'm gonna sleep 22:54
thanks for the help timotimo <3
timotimo sure!
thanks for working on moarvm
23:11 lizmat joined 23:12 sivoais joined
MasterDuke jnthn, timotimo: ugexe++ just commented on github.com/MoarVM/MoarVM/pull/631, but i don't know the answer 23:36
timotimo hm, i didn't think of default-read-elems as a thing for async stuff 23:40
because we're not "pulling this much data", we're "being given up to this much data" 23:41
MasterDuke i don't think i've seen that variable before
timotimo appears in Handle.pm
23:48 ilmari[m] joined