Geth nqp: b8e416f365 | MasterDuke17++ (committed using GitHub Web editor) | docs/ops.markdown
Fix typo
00:18 lucasb left
Geth ¦ problem-solving: vrurg assigned to jnthn Issue Provide better interface for dispatchers 00:44
00:51 patrickz joined 00:53 ricky007 left 00:55 patrickb left
patrickz .tell rba I have a change in that you can pull at your convenience. 01:05
tellable6 patrickz, I'll pass your message to rba
01:06 patrickz left 01:07 rypervenche left
rba patrickz: got it. time is passing by very quickly tonight... 01:08
tellable6 2020-03-25T21:38:37Z #raku-dev <patrickb> rba I have a new rakubrew release: version 7. As always, can you upload?
2020-03-25T22:45:22Z #raku-dev <patrickb> rba I just uploaded a last minute bugfix release of rakubrew: version 8. Can you upload both releases (7 and 8)? Same URL. Thank you!
01:10 rypervenche joined 02:38 upupbb-user3 left 03:03 upupbb-user3 joined 03:10 upupbb-user3 left 05:14 upupbb-user3 joined 06:00 upupbb-user3 left
lizmat Files=1306, Tests=111225, 212 wallclock secs (29.13 usr 8.10 sys + 2981.77 cusr 264.14 csys = 3283.14 CPU) 08:47
that's about 100 CPU seconds *more* than last spectest, and 7 wallclock secs more 08:48
is this the takenextdispatcher opt taken out?
if not, something else changed badly in the past 48 hours
nine No, that's not committed yet 08:49
lizmat let's see what test-t has to say
08:53 hungrydonkey joined
MasterDuke could be the new freeing of stuff. some were just during --full-cleanup, but others were just plain leaks 08:56
nine: btw, did you see any of my chat with timotimo last night? heaptrack says that one of the places where the most allocations happen when compiling rakudo is in asplice(), with the most calls happening in add() and write_string_heap() in QASTCompilerMAST.nqp 08:59
any ideas for how to reduce the number of allocations? 09:01
those two produce almost 600k calls to asplice() each, compared to 41k for the next biggest 09:02
nine I guess instead of an array of buffers, we could write them into one large buffer 09:06
09:10 upupbb-user3 joined
Geth nqp: 7cfcb66f09 | (Daniel Green)++ | src/vm/moar/QAST/QASTCompilerMAST.nqp
Use already calculated value instead of nqp::elems
nine It's hard to say if it does make a difference in actual runtime. If it does, it's quite small 09:22
MasterDuke oh, you just implemented it? 09:23
09:24 leont joined
MasterDuke i wasn't necessarily trying to improve runtime speed, but reduce the amount of memory used/allocated (though that may help speed). i.e., heaptrack reports 24gb allocated just when compiling CORE.c, with a peak RSS of 14gb 09:26
Geth nqp: 8170a23f51 | (Stefan Seifert)++ | src/vm/moar/QAST/QASTCompilerMAST.nqp
Save a bunch of allocations in MoarVM::StringHeap

All we did with the @!strings array was add to it and collect the strings into the bytecode buffer at the end of compilation. Instead of a list of buffers, we may as well collect them into one buffer that will later be written into the mbc buffer in one go. This way we don't have to allocate for every single string on the heap but can rely on the VM to grow the buffer in sensible steps.
nine MasterDuke: can you re-test with this? 09:34
timotimo nine: do you have a clue how well your "does this need utf8" check performs? 09:35
also, is nqp::getcp_s actually capable of outputting negative values for synthetics?
i thought synthetics were meant to never leak out of moarvm's internals 09:36
nine well they do in this case 09:38
I did have a clue, as I profiled this code extensively. But I can't say for sure anymore :/ It's too long ago. I think it did still hurt 09:39
timotimo OK 09:40
i know liz has lots of experience rewriting raku code with nqp statements and ops to get better performance; i don't know how well it translates to actual nqp code since the semantics and performance characteristics differ quite a bit 09:41
nine Too bad nqp::encode assumes it owns the target array and simply replaces the slots (and doesn't even free a previously existing one!). Otherwise we could encode into $!strings directly, saving a lot more allocations
timotimo i think i'll just look at spesh's output for the routine to see if i spot anything 09:42
lizmat so, I'm starting to lean towards not having a 2020.03 release 09:44
timotimo if it really turns out to take a chunk of compile time, a special "has any non-latin1/ascii codepoints" check could become a moarvm op
lizmat nqp::isascii 09:45
nine timotimo: I'd be disappointed if we couldn't compile that code to something that performs in the same ball park as such an op 09:46
timotimo at the very least we don't know how to eliminate bounds checks yet :)
lizmat an op could directly go at the internal representation? which would not be possible when written in nqp ?
or are you saying that nqp::getcp_s is doing exactly that ? 09:47
nine nqp::getcp_s is already just a call to MVM_string_get_grapheme_at which returns the internal int32 09:48
lizmat m: use nqp; my str $a = "\r\n"; dd nqp::getcp_s($a,0) # indeed
camelia -1
lizmat ok 09:49
09:50 travis-ci joined
travis-ci NQP build failed. Stefan Seifert 'Save a bunch of allocations in MoarVM::StringHeap' 09:50
09:50 travis-ci left
lizmat restarted the one failing job 09:50
timotimo i haven't looked but i could imagine getcp_s will branch over and over on the string's storage type, which will be predicted very effectively, but that's still time wasted
nine True. It will also check every time that the index is within the string 09:52
lizmat nqp looks good, should I bump it to rakudo ?
nine I don't think this change is worth a bump by itself 09:55
lizmat oki 09:59
timotimo you know, a made-up character class could be used for the not-ascii / not-latin1 thing; we already have the ops for the mechanism
but nine was right, the code for the loop doesn't look all that bad 10:00
lizmat travis is green again 10:01
timotimo the add-string method doesn't inline StringHeap's add; a quick check for the presence of the string in it could be worth something 10:02
i.e. a very small method that can be inlined for sure
don't know how often it gets called in total 10:04
string heap had 26256 attempts for 3736 strings 10:08
^- that's in World.nqp
Stage mbc : string heap had 255978 attempts for 27911 strings 10:09
^- that's the core setting
m: say 27911 * 100 / 255978
camelia 10.9036714
timotimo so about 9 out of 10 will be rejected by the existskey check and quick lookup 10:10
Geth nqp: samcv++ created pull request #607:
Unmangle mast strings
nqp: c09765df9a | (Stefan Seifert)++ | src/vm/moar/QAST/QASTCompilerMAST.nqp
Split MoarVM::StringHeap into an inlineable add and slow add-internal

Method add was too large to be inlined, but 9 out of 10 calls added already existing strings, i.e. didn't need the large part of the method. Split the method into a quick check and one that actually adds a new string.
  timotimo++ for the idea!
timotimo i may want to point out that the factor isn't quite as big in smaller programs 10:15
lizmat m: dd "a" ... 10 # that feels, LTA 10:19
camelia Decrement out of range
in block <unit> at <tmp> line 1
nine But worst case we'll get about the same performance as before, won't we? add will be inlined and we perform one call to add-internal. If add doesn't get inlined, the code's not hot enough for that to make a difference
timotimo nine: quite possibly, yeah 10:20
nine timotimo: speaking about memory. How would I go about finding out why moar-ha reports a constant ~40M heap size in a program that's several 100M RSS and clearly growing without bounds?
timotimo mhh, something is being allocated outside of what moar believes is its heap 10:21
that's either manual mallocs that moar hasn't a clue about, or some REPR that has a wrong or missing unmanaged_size function
lizmat something in libuv perhaps ?
or another library that MoarVM uses? 10:22
timotimo ah, i don't think i've got unmanaged size functions for any of the async task reprs
nine timotimo: well according to heaptrack about half of the growing memory is allocated by the gen2 allocator and the other half by the fixed size allocator 10:23
timotimo AsyncTask has another level of indirection with the ops
turning on the FSA debug gives precise stack traces, because it'll malloc for every allocation rather than every time a page happens to be filled up by an allocation 10:24
annoyingly - at least the last time i checked - there's no way to ask heaptrack to treat our own & special allocation functions as allocation functions to analyze 10:27
on the other hand, when gen2 grows without bound, that's more interesting; the heap analyzer - sort of by necessity - works on the heap after everything unreachable has been tossed out 10:30
we never free completely empty gen2 pages currently 10:31
it's possible that every time more garbage than the last time is explicitly immediately allocated into the gen2, but that seems kind of unlikely?
but anything that's alive in the gen2 should definitely show up in the moar heapanalyzer 10:33
and of course we have different size classes in gen2 and the fsa, so it's possible but unlikely that objects being allocated are just so diverse in their size that every size bin gets at least one page? 10:34
nine The test program just does `for ^1000000 { }` with `new` being Perl 5: `sub new { return bless {}; }`. Of course there's a bit of Inline::Perl5 code in between, but still it does the same thing over and over 10:38
10:51 Altai-man_ joined
Geth rakudo: a10140f189 | (Elizabeth Mattijsen)++ | src/core.c/Exception.pm6
Make error message a bit more readable
MasterDuke nine: small reduction in numbers. before: 27.2m calls to allocation functions, 29.97gb allocated, 14.0gb peak rss. after: 26.6m calls to allocation functions, 23.5gb allocated, 13.8gb peak rss 11:51
but asplice pretty much drops off the map in heaptrack. was 4.3gb allocated, now 74.2kb 11:53
lizmat 25% less allocated seems like a nice win ? 11:54
MasterDuke oops, sorry, looked at the wrong thing. asplice goes from 4.3gb allocated to 4.0gb allocated
but now the number of calls to asplice is cut in half. only ~600k by add-internal(), instead of ~600k each by add() and write_string_heap() 12:00
nine Yeah, won't be able to get rid of those asplice calls unless we teach nqp::encode to append to an existing buffer 12:17
MasterDuke i noticed that too (encode not taking an existing buffer). think it should be just a change in the moarvm implementation? or a new flag? or even a new op? 12:20
12:25 hungrydonkey left
nine I think it could just be changed in MoarVM. We'd have to check if the array already has slots allocated and copy the encoded string into those instead of using it as the slots of the array 12:27
lizmat I'm looking in awe of what is being done here... but is this still to get the release out ? 12:28
or have we already decided that the 2020.03 release is a no-go ? 12:29
if the latter, I think we need to tell sena_kun
12:29 sena_kun joined
lizmat fwiw, I would be in favour of skipping a release 12:29
nine lizmat: this is just for fun. I've spent a lot of time on the whole takenextdispatcher situation yesterday and pretty much ran out of ideas on how to proceed. 12:30
12:30 Altai-man_ left 12:31 hungrydonkey joined
MasterDuke nine: what should body.start get set to in the existing buffer case? it's currently explicitly set to 0 12:38
timotimo MasterDuke: it'd be kept the same 12:39
in order to append to the end, only the amount has to change 12:40
which set_size_internal should do for us
MasterDuke so still set to 0? or unchanged from whatever it was in the existing buffer?
think encode should call set_size_internal() or just assume the receiving buffer is big enough? 12:41
timotimo steal some logic from whatever append does 12:44
i'm assuming by append we really do mean append, for encode i mean
it'd already be a small win if we can re-use an existing buffer for encode by overwriting the entirety of it
though we'll want to have something in place so that after encoding a really really long string we don't keep a huge buffer around for a couple hundred tiny strings 12:45
but appending an encoding result to the end of an existing buffer is what would make the string heap relatively un-allocaty 12:47
MasterDuke nqp: my $a :=; my $b := nqp::encode("abc", "utf8", $a); nqp::encode("123", "utf8", $b); say(nqp::elems($b)); say(nqp::decode($b, "utf8")) 12:50
camelia encode requires an empty array
at <tmp>:1 (<ephemeral file>:<mainline>)
from gen/moar/stage2/NQPHLL.nqp:1913 (/home/camelia/rakudo-m-inst-1/share/nqp/lib/NQPHLL.moarvm:eval)
from gen/moar/stage2/NQPHLL.nqp:2118 (/home/camelia/rakudo-m-inst-1/share/…
MasterDuke locally i get `6␤abc123` 12:51
timotimo that sounds like the right semantics
12:56 hungrydonkey left
MasterDuke i just added `if (((MVMArray *)buf)->body.slots.any) memmove(((MVMArray *)buf)->body.slots.i8 + ((MVMArray *)buf)->body.elems, (MVMint8 *)encoded, output_size);`. but won't that stomp over whatever random memory is after the slots? 12:57
timotimo yes, you will have to set its element count before you do that 12:58
MasterDuke riiiggghhhttt...forget i asked anything and just assume i did that already... 13:02
ugh, how do i get src/strings/ops.c to know about set_elems() in 6model/reprs/VMArray.h? 13:08
nine REPR(buf)->pos_funcs.set_elems 13:10
MasterDuke right! 13:12
timotimo reprconv.h also has MVM_repr_set_elems or something similar that does the lookup for you and is hopefully inlined 13:13
13:16 upupbb-user3 left
MasterDuke ok, it still needs some polishing, but let's try testing with add-internal(). so i'm thinking it's probably going to change to write directly to $!strings 13:20
oh. but the encoded size of the string is supposed to go before the string itself... 13:22
nine Yes, you'll have to skip that and fill it in later 13:27
MasterDuke hm. so `setelems($!strings, <current size + the size of a uint32>); encode(..., $!strings); setelems(<back to the end of the previous string>); write_uint32(<size of new string>); setelems(<back to end of $!strings>); write_uint8(0) while $pad--` ? 13:28
what's the size of a uint32 in elems? 1? 13:30
nine setelems, encode, nqp::writeuint($!strings, $pos-where-the-size-should-be)
MasterDuke ah, write_uint32_at 13:31
doh, the size is right there in the existing code. nine++
jnthn About the nextdispatcher thing: I'm getting the impression that it's tricky to back it out, and we need time to work out an optimization strategy for it (or maybe doing the whole dispatcher thing quite differently). I guess if we can't defer the nextdispatcher changes until the next release, then probably our only option is to scrub this one, because a release with such a huge performance regression is probably not helpful. 13:35
I'll have time to look at how we can better do the optimization of it next week.
(Though implementing it may be a bigger job.) 13:36
lizmat nine vrurg timotimo Kaiepi sena_kun comments?
MasterDuke cool. rakudo just built ok. doing a spectest and then i'll see what heaptrack has to say 13:38
sena_kun I am in favor of delaying the release. We usually pluck out some regressions (lizmat++), so the state of the current HEAD is rather clean than not (not taking the dispatcher issue, the windows segfault etc. into account).
timotimo my first thought is: if we take some pains to toss nextdispatcherfor or whatever exactly was new, and release, then people will get some bugfixes and speedups for upgrading, that they would otherwise not be able to have
but since i'm not going to put the work in, i can hardly demand someone do it :) 13:39
nine Whatever we do it will end up in quite a bit of work. So we need a good plan to avoid investing a lot of work in the wrong direction.
MasterDuke if we reasonably believe we can get things fixed+not slower by the time of next release i'm in favor of skipping this one. if not, maybe it's worth backing out the entirety of the new dispatcher work and doing a release sooner 13:41
Kaiepi i'm ok with delaying
MasterDuke spectest passed
nine Even backing out the changes is non-trivial
sena_kun We can invest time in doing even more fixes and speedups (or look at network stuff), and do an even better release, rather than paying double the price. (and I am the person who claimed monthly releases are important, you know) 13:42
nine sena_kun: hey, I was not just claiming but massively arguing for monthly releases :) In this case I just don't see a way to have one
sena_kun :) 13:43
MasterDuke what about releasing from a commit right before the dispatcher merge? might be only a few changes/improvements, but it gets something out the door 13:44
sena_kun Actually, I see a way of simply taking some really old commit (before merges), maybe two weeks after 2020.02 and cherry-pick some commits there, but it'll be a minor one.
lizmat all of the work I did on CompUnit can easily wait another release 13:47
sena_kun: that actually sounds like a good idea 13:48
the native shaped array fix could go in
MasterDuke heaptrack reports pretty much everything the same 13:49
lizmat most of jnthn's commits in rakudo are ok I think
nine MasterDuke: but where are those asplice calls coming from now?
MasterDuke haven't checked that yet 13:50
lizmat sena_kun: if you have a proposal for a cut-off point, I would gladly supply the commit shas that I think could go in anyways
should be around 10 I think
MasterDuke hm, the 600k remaining after nine's last commits are gone 13:52
but heaptrack still says 4gb due to asplice 13:53
guess i should record the diff of the old and new size requested
sena_kun lizmat, I am kind of reluctant to think about that, because I don't like a history where one has `release-point ... a looooot of things ... release-announcement`. I am not sure what the risks are here. When I did a point release for musl it took some time and I had only a single commit to backport, but here we have much more. 13:54
lizmat ok, that's fine by me as well :-) 13:55
sena_kun Alas, I don't know where should I draw a line between "I'm lazy to do it after a very rough week" and "The profit is fixed and potential losses are any size we might be lucky/unlucky". 13:56
sena_kun tries to hack on openssl 13:57
nine Note that a newly created release branch with lots of cherry picked commits would have seen ~0 testing 13:58
lizmat thinks she's looking at a staring contest 14:00
jnthn Depends on what sena_kun thinks of it, but we could also - imagining we have a solution for the dispatcher thing within a week or so - consider a release in 2 weeks time (so make the April one a bit earlier).
sena_kun jnthn, I am considering this and it looks like a rather great option (assuming nobody falls over trying to get a solution for the dispatcher within the week-or-so time limit). 14:01
My stance on this is that as long as you don't increase number of regressed modules over time (i.e. blin && fix it once in a week or so), you can release any week if internals are good enough. 14:02
14:05 lucasb joined
MasterDuke ok, most asplice calls have an increase in ssize of 0 (79k calls), the next most common are 256 and 128, with 5.8k and 4.2k calls respectively 14:08
but the biggest increases in ssize are 6037504 and 5844992, then 5844992 and 499712 and 389120 and 188416 14:09
ok, that 6m number is suspicious. i noticed heaptrack now reports 6.54mb leaked, up from 300k before 14:11
timotimo this is the "encode directly into an existing buffer" change?
MasterDuke yeah, plus changing how the string heap is created 14:12
should i gist some diffs?
nine might help
timotimo puts matchsticks between eyelids
MasterDuke 14:14
the heaptrack was taken with those fprintfs commented out 14:15
nine MasterDuke: need to free encoded after memmove 14:16
MasterDuke and after all that work i did finding leaks the past couple days... 14:19
ok, now numbers are the same, just without that 6mb leak 14:21
the big numbers are all due to assemble_to_file 14:28
14:28 Altai-man_ joined
timotimo right, it's one buffer that keeps growing 14:28
and in the case of the core setting it'll end up at about 15 megabyte
i haven't looked; does assemble_to_file concatenate buffers and then write the result to a file? because in that case the total buffer doesn't need to be built, it could just write every part of the whole with one write call 14:29
14:31 sena_kun left
MasterDuke 14:32
lizmat looks like lines 2648-2650 are superfluous ? 14:34
MasterDuke looks like yes. but it's probably faster to assemble one big buffer in memory and write it all at once than do many small writes. guess it depends on speed/memory use 14:35
^^^ was to timotimo 14:36
lizmat MasterDuke: yeah, got that :-)
wouldn't OS buffering take care of that for you ? 14:37
MasterDuke but yeah, those look unused...
possibly lots more syscalls though, which might be slower even if buffered 14:38
nine: did you do any benchmarking of the writing?
.oO( only 5 Rakuers missing until /r/rakulang turns into a teapot )
MasterDuke teapot? 14:40
lizmat :-) 14:41
MasterDuke ah, number of readers 14:43
lizmat :-) 14:47
nine MasterDuke: no, I didn't try without that big buffer 14:49
MasterDuke hm, might not get to it this evening, but maybe i'll try changing to write (at least somewhat) on the fly and see what the differences are 14:55
15:09 synthmeat left 15:32 synthmeat joined 16:16 cognominal joined 16:29 sena_kun joined 16:31 Altai-man_ left, sena_kun left 16:54 sena_kun joined
Geth rakudo: 4bbe308bef | (Ben Davies)++ | 3 files
Make a bunch of traits around 2% faster

The performance benefits of not creating any symbols for unused named parameters aren't huge, but they exist.
moritz nice trick! 17:44
Geth nqp: a10d8a394d | (Daniel Green)++ | src/vm/moar/QAST/QASTCompilerMAST.nqp
Remove some unused variables
nine MasterDuke: LOL...quite obvious, isn't it :) 18:09
MasterDuke lizmat++ for spotting it 18:10
vrurg Unfortunately, I'm late for the discussion. But I left some thoughts on what could be done for dispatchers in 18:13
Geth rakudo: a4fbbfa334 | (Ben Davies)++ | 4 files
Make a minor optimization to a bunch of STORE methods

Same method of optimization as in 4bbe308befcfddbef5b5ca1634b524ed673449db.
Kaiepi those are the two hot spots i can think of offhand where that optimization would be useful, there are probably more out there 18:24
18:28 Altai-man_ joined 18:31 sena_kun left 19:03 MasterDuke left 19:13 MasterDuke joined
Geth rakudo: 2f8538ed05 | (Ben Davies)++ | src/core.c/Parameter.pm6
Don't loop over @!named_names for *% parameters in Parameter.raku

  @!named_names should always be null for slurpy named parameters.
20:29 sena_kun joined 20:30 Altai-man_ left
Kaiepi jnthn, could you review ? 20:45
the pr you commented on a while ago is something i plan on revisiting later 20:46
jnthn Kaiepi: Will try and do it tomorrow. Does it make any backward-incompatible changes, or just additions? 21:07
Kaiepi additions 21:08
should be backwards compatible 21:09
21:15 Kaiepi left 21:16 Kaiepi joined 21:17 Kaiepi left 21:18 Kaiepi joined 21:19 Kaiepi left 21:20 Kaiepi joined 21:21 Kaiepi left 21:31 Kaiepi joined 21:32 Kaiepi left, Kaiepi joined 21:35 Kaiepi left 21:38 Kaiepi joined 22:28 Altai-man_ joined 22:31 sena_kun left 22:35 lucasb left 22:36 leont left 22:55 Altai-man_ left 23:14 rba[m] joined 23:23 b2gills left 23:28 b2gills joined