github.com/moarvm/moarvm | IRC logs at colabti.org/irclogger/irclogger_logs/moarvm
Set by AlexDaniel on 12 June 2018.
00:01 tellable6 joined 00:18 anatofuz joined 00:28 Kaiepi left 00:32 Kaiepi joined 00:52 lucasb left 00:54 anatofuz left 00:55 anatofuz joined 00:56 anatofuz left 01:09 anatofuz joined 01:14 anatofuz left 01:23 Kaiepi left, Kaiepi joined 01:39 anatofuz joined 01:44 anatofuz left 01:45 anatofuz joined 01:49 anatofuz_ joined, anatofuz left 01:52 anatofuz_ left 01:53 anatofuz joined 01:58 anatofuz left, anatofuz_ joined 02:05 anatofuz_ left 02:07 anatofuz joined 02:12 anatofuz left 02:13 anatofuz joined 02:16 anatofuz left 02:17 anatofuz joined, anatofuz left 02:19 AlexDaniel` left 02:29 AlexDaniel` joined 02:39 anatofuz joined 02:40 anatofuz left, anatofuz joined 02:41 anatofuz left 02:44 anatofuz joined 02:50 tbrowder left, mtj left 02:56 tbrowder joined, mtj joined 03:03 anatofuz left, anatofuz joined 03:05 anatofuz left 03:06 squashable6 left 03:07 anatofuz joined 03:08 squashable6 joined 03:12 anatofuz left 03:13 anatofuz joined 03:18 anatofuz_ joined, tbrowder left, mtj left 03:21 anatofuz left 03:24 tbrowder joined, mtj joined 03:29 anatofuz_ left 03:31 anatofuz joined 03:37 anatofuz_ joined 03:40 anatofuz_ left 03:41 anatofuz left 03:44 anatofuz joined 03:48 anatofuz left 05:12 anatofuz joined 05:14 anatofuz left 05:16 anatofuz joined 05:30 anatofuz left 05:31 anatofuz joined 05:57 domidumont joined 06:11 anatofuz left
nine timotimo: yeah, already saw that 06:14
tellable6 2016-12-19T04:30:04-05:00 #moarvm <lizmat> niner: I've reverted f777352 so people can continue on, see backlog
nine 2016? That's a little late, isn't it? 06:16
06:17 anatofuz joined
nwc10 "T04:30:04-05:00" - that's "morning", isn't it? :-) 06:23
06:42 anatofuz left 06:58 anatofuz joined 07:09 anatofuz left 07:11 anatofuz joined 07:21 sena_kun joined 07:31 anatofuz left 07:48 anatofuz joined
lizmat wow, I have no idea what that was about: it's not a rakudo commit 08:28
tellable6 2019-09-17T02:47:40Z #perl6-dev <vrurg> lizmat I have fulfilled your request on R#3174. Could you re-review, pls? Besides, Dict is now implementable with just 8 lines of code. :)
synopsebot R#3174 [open]: github.com/rakudo/rakudo/pull/3174 [data types] Provide means of preserving the decontainerization of values on Hash -> Map coercion
lizmat .tell vrurg I have
tellable6 lizmat, I'll pass your message to vrurg
nwc10 lizmat: It's MoarVM commit 71bbf628ff3cf2821f3b11b363fa1105f5f3fed4 08:31
and then see cac2a57c045fade4695fa4eed350e4bb9ece0a8f 08:32
I decided to brute force look for it in several repositories
lizmat ok... I couldn't bring up the energy to do that this early in the morning (for me) 08:33
nwc10 it's more fun than work (currently)
please don't tell work
08:35 anatofuz left
lizmat won't 08:36
08:57 brrt joined 09:27 zakharyas joined 09:40 brrt left 10:29 sena_kun left 11:32 anatofuz joined, anatofuz left 11:33 anatofuz joined 12:09 sena_kun joined 12:12 sena_kun left
jnthn fwiw, am currently working on a new algorithm for choosing what to type-specialize on, so that when we have things like `push` and `ASSIGN-POS` that vary wildly in only one arg, we an still specialize on the others. 12:19
12:19 sena_kun joined 12:24 sena_kun left 12:29 sena_kun joined, anatofuz left 12:30 sena_kun left
jnthn m: say 1.276 / 2.581 12:40
camelia 0.494382
timotimo that's a nice number
jnthn Yes 12:41
It's for this script:
for ^10_000 {
for Int, Str, Num, Rat, Bool {
my @a = $_ xx 100
}
}
First number is with my changes to produce derived type specializations
Still need to tidy up, but that's quite an encouraging initial result
japhb No kidding! 12:42
tellable6 2019-04-04T18:06:39Z #perl6 <jmerelo> japh: that's rich :-)
japhb jnthn++
japhb has no idea what tellable6 is referring to 12:43
Oh, I see the problem! It thinks japh == japhb 12:44
12:44 anatofuz joined
japhb (We're totally different people.) 12:44
moritz that's what you want us to believe anyway! :D
japhb Well, I suppose we've probably never been in the same room at the same time, so I guess there's that to feed the conspiracy theorists .... 12:45
12:48 squashable6 left, anatofuz left
jnthn MoarVM panic: Spesh arg guard: unimplemented sparse guard case 12:49
ugh, guess I'll need to figure this one out
12:53 squashable6 joined 13:19 lucasb joined 13:25 AlexDaniel is now known as messagestoi, messagestoi is now known as AlexDaniel
jnthn The guard tree is always quite the headache. Having just about remembered how it works, I couldn't figure out how part of the code to update it functioned...turns out it functions *wrongly* in some cases. 13:47
Such that if we produce two specializations for types, and then a certain (untyped) specialization, then wehn we install the untyped one we make the typed ones unreachable
Meaning we lose the benefit of having done them 13:48
MasterDuke does that happen frequently? 13:49
timotimo ouch!
13:56 zakharyas left 13:57 sena_kun joined
jnthn MasterDuke: Well, enough that I can find examples of it in specializations produced while compiling NQP code quite easily 13:57
MasterDuke ouch
jnthn However, many hot things are monomorphic, and don't get bit by it 13:58
timotimo do you already have a visualizer for the guard trees? if not, want me to hack up a speshlog-to-graphviz script for you?
jnthn timotimo: There's a textual dump format which I can follow already :) 13:59
timotimo OK. i find it just a little hard to follow sometimes
but i probably also don't have to, so that's good
jnthn So does the code that updates it, apparently ;)
timotimo :)
MasterDuke timotimo: btw, did you ever figure out that call-site interning thing from the starts-with example the other day?
timotimo i tried to put an interning step into spesh, but it was doomed since the list of callsites in a CU is MVMCallsite* rather than MVMCallsite** 14:00
but there could be a different prepargs instruction that takes a literal pointer to an MVMCallsite
jnthn The PR that introduced all the | usage isn't good anyway, since it also breaks introspection 14:05
timotimo should be fine to only use it for interned callsites, since i think they do not move. i'd have to double-check that first
jnthn urgh, it's not obvious how to do this thing right at all... 14:09
timotimo is it actually important that the guard tree is append-only? 14:10
or add-only or whatever
would replacing it entirely and freeing at a safepoint be good enough?
jnthn We already do produce/install a new one, so in principle, we could totally reproduce it each time, yes
But even *then* I don't quite know how much that helps :) 14:11
Oh... 14:27
The tree isn't wrong after all
We visit the certain result node, but then record it as a fallback result
But continue looking at the others
timotimo ah 14:28
nine Trying to find out why Inline::Perl5 fails so hard at running in multiple threads, I've so far found 2 missing MVMROOTs (code_pair_configure_container_spec and MVM_6model_parametric_parameterize) and a dead lock (finish_parameterizing) 14:29
MasterDuke is it at all possible to statically find missing MVMROOTs? 14:30
nine MasterDuke: actually.....I'm working on a GCC plugin to do exactly that
MasterDuke oh, ++nine 14:31
timotimo great!
i wanted to figure out how clang static analyzer works
but i didn't get far at all
nine Well, there's a GCC plugin for writing GCC plugins in Python. That makes it much simpler 14:32
timotimo oh nice 14:33
when is inline.perl6 :) :)
ah, finally i remembered what i was thinking about as i was falling asleep
so, let's say CORE.c.setting has 85.6k times a 64bit signed int with value 0 in it, and 14k times value 1, 3.9k times value 2, and 1.7k times value 3 14:48
and let's say i special-case 64bit signed integers with one specific WHAT in the serialization, or even give it a slot to put "the default int" in there 14:49
would that be something?
let's see how often VM_INT crops up in serialization there 14:52
Kaiepi can someone review github.com/MoarVM/MoarVM/pull/1166 ? 14:53
nwc10 timotimo: I forget. Currently how is that 64bit signed int stored? and how large is it? 14:54
timotimo we have a variable-sized integer implementation in the serialization format
so the value would be just one byte
but there's also a reference to which STable the object should have 14:55
nwc10 yes, I remember that. Not sure what git blame says about it :-)
aha.
the STable
timotimo which stores, among other things, the bit size, and signedness
nwc10 IIRC the number of serialisation tags was less than about 32
timotimo and of course the type object itself
nwc10 so there were spare tags that could be used
but they are a finite resource, IIRC
it's maybe 4 years since I looked at this stuff 14:56
but I hope I'm asking the right questions
14:57 AlexDaniel left 14:58 AlexDaniel joined
timotimo VM_INT are much, much rarer in the core setting 14:58
14:58 AlexDaniel left, AlexDaniel joined, AlexDaniel left, AlexDaniel joined
timotimo could be that the right approach is to not have so many different integer objects in the first place; something must be bypassing the int cache 15:03
otherwise all the references would be to the same object for each value
15:03 AlexDaniel left
nwc10 I wasn't aware of the int cache. I think that that's a much better thing to figure out first 15:04
sorry if I'm not being that helpful at actually "doing"
15:05 AlexDaniel joined, AlexDaniel left, AlexDaniel joined
timotimo my brain doesn't work particularly well with "do important things first" 15:07
nwc10 I wouldn't have said that. I would have said instead that "hey, I could add more code to do this cool thing" is far more tempting than "I should go for a long, possibly fruitless trip trying to debug something strange" 15:09
because most humans prefer the first.
15:11 AlexDani` joined 15:12 Geth left, Geth joined 15:13 AlexDaniel left
timotimo i think the intcache construction function also wants an mvmroot in it 15:16
15:21 domidumont left 15:23 Geth left, Geth joined
timotimo ah, interesting. every p6opaque calls serialize on its flattened stables, that's where at least some set_int with non-intcached values come from 15:31
autobox_int seems to miss the int cache somehow 15:35
box_slurpy_named doesn't consult the int cache, and neither does result_set_int i think 15:42
looks like i'm getting no set_int with in-range values any more 15:45
1218559009 14288 -rw-r--r--. 1 timo timo 14630232 Sep 17 17:49 blib/CORE.c.setting.moarvm 15:50
1218559452 14288 -rw-r--r--. 1 timo timo 14630232 Sep 17 17:53 blib/CORE.c.setting.moarvm 15:53
huh.
don't know how to properly differentiate in "serialize" if it's as part of a P6opaque or a free-standing P6int object 16:00
global variables! 16:03
*no output* 16:07
nice
but that also means i might have misunderstood the output at the very beginning when i thought there were oodles of P6int objects free-standing in the serialized blob 16:08
that would explain why the file didn't change size at all
oh well. at least we'll now do less GC in many cases
Geth MoarVM/derived-specializations: 4 commits pushed by (Jonathan Worthington)++ 16:15
jnthn Turns out that the derived specializations thing I have is good at making some things worse, as well as some things better... 16:17
nine But why? 16:18
jnthn Mostly immediately, because it'll make only one specialization per callsite in the first batch of stats 16:22
Or at least, that's the case I see in the benchmark I have
I think probably it should make a pass to see if it can deal with everything as an observed type specialization, and only then apply the derived algorithm
While I tried to get the first to fall out of the second as a degenerate case 16:23
16:30 AlexDani` is now known as AlexDaniel, AlexDaniel left, AlexDaniel joined
timotimo i think the impact from my int cache additions/fixes are felt mostly in parsing/compilation maybe? 17:06
MasterDuke if it's a positive impact then that's a great place to feel it 17:09
17:10 robertle joined
timotimo i should count the number of GCs in setting compilation over a few runs, once with my patch, once without it 17:15
18:08 MasterDuke left 18:19 MasterDuke joined
timotimo m: say .sum / .elems given <1552 1539 1540 1545 1541 1540 1533 1539 1539 1546 1538> 18:22
camelia 1541.090909 18:23
MasterDuke if that's the average % speedup in parsing rakudo from your patch i'll buy you a whole case of beer 18:25
timotimo sorry to disappoint 18:26
that's just the number of GC runs without the patch
MasterDuke heh. i can dream...
timotimo the first value is already lower than the lowest from "without the patch" 18:27
at that amount, any improvement below 100 will not be felt very strongly i'm afraid 18:30
18:44 AlexDaniel left 18:45 AlexDaniel joined, AlexDaniel left, AlexDaniel joined
timotimo m: say .sum / .elems given <1529 1504 1516 1517 1504 1517 1529 1517 1529 1512 1516 1517 1529 1517 1517 1519 1512 1523 1517 1504 1520> 18:59
camelia 1517.380952
MasterDuke nice 19:00
timotimo m: say .sum / .elems given <60.562 59.698 58.751 61.310 58.739 65.250 66.078 65.651 64.736 60.581 61.384 64.785 65.736 65.861 66.420 61.923 66.317 65.960 65.940 65.179 65.983> 19:02
camelia 63.659238
timotimo m: say .sum / .elems given <63.266 64.417 64.230 64.489 63.768 63.854 65.326 65.366 64.261 64.285 64.564 65.502 64.626 64.003 64.952 64.830 64.434 64.717 65.065 65.389 63.673 64.322>
camelia 64.515409
timotimo not necessarily significant, the machine wasn't idle while it did the compilations 19:03
19:14 squashable6 left 19:15 squashable6 joined
timotimo but just under a second faster would certainly be welcome 19:18
MasterDuke nice
Geth MoarVM: e9e0731aca | (Timo Paulssen)++ | src/core/intcache.c
make sure intcache type is rooted
19:23
MoarVM: 55b6f73450 | (Timo Paulssen)++ | src/core/args.c
make sure autobox on return and slurpy nameds uses int cache
timotimo survived stresstest, so i think it's good
would you like to do some timing runs of core setting compilation, MasterDuke?
MasterDuke i was in the middle of attempting the truffle rebase, but i may have to put that on pause anyway 19:27
so yeah, i'll do a couple
timotimo oh, of course
only if it's convenient to do right now
19:43 domidumont joined 19:45 Ven`` joined
MasterDuke timotimo: how did you get the actual compiling command line to show? 19:49
timotimo Configure.pl with --no-silent-somethingsomething 19:50
20:00 domidumont left
MasterDuke m: say .sum / .elems given <37.664 37.086 37.301 37.570 37.559> # before 20:03
camelia 37.436
MasterDuke m: say .sum / .elems given <36.941 36.867 37.056 37.113> # after
camelia 36.99425
MasterDuke timotimo: ^^^ looks like half-a-second faster 20:05
timotimo oh, huh. 20:06
i guess that's nice
MasterDuke heh. probably take an hour or two off of Xliff's compile times
but yeah, any improvement is great 20:07
20:22 zakharyas joined, sena_kun left 20:36 zakharyas left 20:46 Geth_ joined, Geth left, robertle left 20:48 Geth joined, Geth_ left
timotimo right now i'm looking into saving a little space for VMArrays of objects where the STables are mostly the same 20:50
21:00 anatofuz joined
AlexDaniel .seen japh 21:01
tellable6 AlexDaniel, I saw japh 2019-04-04T03:54:49Z in #perl6: <japh> .tell jmerelo so I asked $wife if she could print the pdf on her laser printer at work, few hours later she called me up, freaking out, ""dude, did you ask me to print the whole internet?""
AlexDaniel japhb: ohā€¦ you'reā€¦ different people?
I was looking at undelivered messages and manually corrected it 21:02
nine: also that 2016 message was sent to ā€œninerā€
.tell niner foo
tellable6 AlexDaniel, I haven't seen niner around, did you mean nine?
AlexDaniel now it does that, so it won't happen again
.tell I accidentally delivered your message to japhb! Sorry! colabti.org/irclogger/irclogger_lo...09-17#l121 21:03
tellable6 AlexDaniel, I haven't seen I around, did you mean I_?
AlexDaniel .tell japh I accidentally delivered your message to japhb! Sorry! colabti.org/irclogger/irclogger_lo...09-17#l121
tellable6 AlexDaniel, I'll pass your message to japh
nine So it seems like the new mixin type object makes it into gen2 but then gets collected and freed and later reappears in a register 21:11
But this only ever happens when multiple threads are active
MasterDuke if perf is showing `merge_graphs` as the second-most expensive function after `MVM_interp_run`, what would i look for in the spesh logs to diagnose why? 21:13
timotimo merge_graphs is from inlining i believe 21:15
have a look at the times the individual spesh's take
m: say 14628272 * 100 / 14630232 21:16
camelia 99.98660308
timotimo m: say 14628272 - 14630232
camelia -1960
timotimo almost 2 kbytes saved by not writing the length of empty string/int/num BOOTArrays
er, not num i think?
but object
ooooh, there's loads of empty hashes tho 21:19
MasterDuke Spesh of 'lexical_vars_to_locals' (cuid: 62, file: gen/moar/Optimizer.nqp:712) 21:20
Specialization took 43835us (total 71072us)
JIT was not successful and compilation took 22303us
timotimo that seems slow-ish 21:21
m: say 14619432 * 100 / 14630232 21:23
camelia 99.9261803
timotimo m: say 14619432 - 14630232
camelia -10800
MasterDuke next slowest is 36111us, then 27799us 21:24
timotimo so how do these look? do they have big inlines? 21:27
MasterDuke 359 BBs 21:30
pages and pages of facts
timotimo so it has many, many registers and versions thereof 21:31
21:31 Geth_ joined
MasterDuke yep 21:31
though i'm getting a big feeling of deja vu. i'm looking at install-core-dist.p6 again, i think i've brought this exact stuff up before
21:31 Geth left
MasterDuke 164 spesh slots 21:32
r211(0): usages=0, flags=0 DeadWriter 21:33
a quick glance shows some registers with up to 54 versions
oops, 57 21:34
21:35 brrt joined
timotimo so, we serialize a whole load of hashes that share large sets of keys 21:40
maybe there's an opportunity there
gist.githubusercontent.com/timo/11...tfile1.txt 21:47
this shows keys of the hashes that are serializedin the core setting 21:48
MasterDuke definite dupes 21:49
timotimo if i read it correctly, the sum of the hash sizes is almost 470k?
there won't be a simple/sensible way to remove the values in there, but the keys could be deduped 21:50
so at most like a third of this could perhaps be made to go away
MasterDuke good deal 21:52
timotimo at the very most
will be just a little tricky to make work with lazy deserialization and such
looks like there's actually only 1761 distinct sets of keys out of 6946 hashes 22:02
so there'd already be a win without finding "the biggest set that is fully in our given list of keys so we only have to put what's left" 22:04
22:05 anatofuz left 22:17 brrt left
MasterDuke nice 22:21
btw, what does it mean that there are lots of registers/versions/spesh slots? is there some value that it's unusual to go above? 22:27
timotimo nah, just means the code that was inlined was big 22:28
big and complicated
if there's only a lookup array of four slots there could be 1884 + 1761 + 784 + 736 + 516 hashes that don't have their own key lists 22:30
m: say 1884 + 1761 + 784 + 736 + 516 22:31
camelia 5681
timotimo m: say (1884 + 1761 + 784 + 736 + 516) * 100 / 6945
camelia 81.799856
timotimo m: say 6945 - (1884 + 1761 + 784 + 736 + 516)
camelia 1264
MasterDuke hm, there are a bunch of things there are being inlined into that one function multiple times. does that mean anything? 22:33
timotimo not necessarily
MasterDuke e.g., Spesh of 'scope' (cuid: 108, file: gen/moar/stage2/QASTNode.nqp:601), Spesh of 'new' (cuid: 94, file: gen/moar/stage2/QASTNode.nqp:490)
timotimo how big are these?
MasterDuke in what way? 22:34
timotimo for example bytecode size
MasterDuke scope = Frame size: 154 bytes, new = Frame size: 206 bytes 22:35
timotimo i think that's not huge 22:36
22:37 anatofuz joined
jnthn iirc, it doesn't dedupe spesh slots on inline, and it adds registers per inline and there's nothing to compact those either 22:44
MasterDuke are any of those things easy? or, which would have the greatest impact? 22:48
jnthn Neither is easy 22:51
Spesh slot de-dupe because you have to know which things are possible to de-dupe without breaking things (sp_findmethod writes into spesh slots for caching, for example) 22:52
And we don't keep that information
(If if we did, it'd cost more memory to keep it, which we'd need all the time, not just when the frame is in use) 22:53
And the register thing because that's intimately tied with how we do deopt
timotimo a bitfield ought to be enough for that, but allocating memory for that will give a bit more overhead anyway 22:54
jnthn (It means we can just copy the chunk of the registers out into the new frame upon uninlining)
The register one is very likely the most worthwhile 22:56
timotimo aye, the size of frames is used to allocate frames, whereas the spesh slots are per spesh candidate 22:57
jnthn But you get to re-work deopt as part of the fun :)
And need new GC mapping info
etc.
So I think "desirable, but not easy" :)
Don't let me stop you trying, but I figure "you'll end up changing deopt and the GC" is the owed warning :) 22:58
MasterDuke that sounds like something i will cheer someone else on to do then
timotimo jnthn: how do you feel about refering to earlier hash objects in a serialized blob to share lists of keys between objects?
jnthn timotimo: Hmmm...scared, but I am of the serializer generally ;) 22:59
(How) would it interact with lazy deserialization?
timotimo it'd either have a reference to the object it steals keys from, and then demand_object-s it 23:00
or it would have the offset where the keys start and reads the keys itself
jnthn Does it even need that, or...right
How much does it cost in the serializer to figure out when it can use this?
And how much space do you think it'll save in the serialization output? 23:01
timotimo OK, so in the core setting there's 6945 hashes 23:02
1264 of those have "their own" set of keys, all others could share by only looking up to 8 hashes backwards
let me add up the sizes of these hashes real quick 23:03
jnthn Oh, a recent list...neat 23:04
That's also quite cheap
I guess for arrays of hashes that have the same shape this goes pretty well
timotimo when the number of keys doesn't match, it'll immediately know to fail 23:05
comparing the list of keys is really easy since the keys are already sorted anyway
though i'm not sure if it's worth trying to get rid of the sorting by using a hash for the hash
i guess we'd already have the hash objects in question around
oh, i need to row back a little on the results 23:09
1761 less hashes have shared keys 23:10
m: say 54161 + 5535 + 10777 + 4877
camelia 75350
timotimo this is the number of string references that would be possible to save 23:11
string references are either 2 or 4 bytes
but the first 0x7FFFFFFF strings are 2 bytes big 23:12
so 150 kbytes could be saved with this
jnthn Hm, that's something 23:15
timotimo m: say 75350 * 2 * 100 / 14619432
camelia 1.0308198
timotimo just one percent :[
jnthn That's including bytecode too, mind 23:21
timotimo yeah
do we know how big the share of bytecode is?
i mean the file says how big it is
btw, heap snapshot format version 3 requires zstd to be available; if moar doesn't find it during Configure.pl time it'll fall back to heap snapshot version 2 23:22
23:24 anatofuz left, Ven`` left, anatofuz joined
timotimo wondering whether the dependency should stay optional 23:28
23:29 anatofuz left 23:31 anatofuz joined
jnthn timotimo: Hmm, how different are they aside from the compression? 23:34
sleep, back tomorrow o/ 23:36
23:37 anatofuz left 23:38 anatofuz joined
Kaiepi can someone review github.com/MoarVM/MoarVM/pull/1166 tomorrow? idr expect it to be merged until after the next release, but i want to get the stuff that needs to be fixed out of the way before then 23:39
23:43 anatofuz left
timotimo quite a bit different 23:47
23:54 anatofuz joined