github.com/moarvm/moarvm | IRC logs at colabti.org/irclogger/irclogger_logs/moarvm
Set by AlexDaniel on 12 June 2018.
00:06 Kaiepi joined 00:34 sivoais left, sivoais joined 02:26 squashable6 left 02:30 squashable6 joined 05:40 Woodi left 05:41 domidumont joined 05:59 squashable6 left 06:02 squashable6 joined 06:05 Elronnd joined 06:18 brrt joined 06:23 Woodi joined 07:08 robertle joined 07:10 lizmat joined 07:20 lizmat left 08:25 zakharyas joined
brrt \o 08:45
nwc10 o/ 09:11
brrt our memmem.h header isn't really acting like a proper header 09:13
09:23 domidumont1 joined, brrt left 09:26 domidumont left
jnthn So while looking at the profile of a Rat benchmark, I spotted some easy wins (provided spectest agrees) in our bigint stuff 09:31
09:47 brrt joined
Geth MoarVM: c5545770ca | (Jonathan Worthington)++ | src/math/bigintops.c
Use mp_set_int when we know we have a 32-bit int

It's quite a lot cheaper.
09:49
MoarVM: aad6fdc0bd | (Jonathan Worthington)++ | 3 files
Greatly reduce allocation churn on bigint ops

When we get an operation where we have a big integer and a small big integer, we have to upgrade the small one to do the operation. We used to `malloc` and `mp_init` a fresh `mp_int` to use every time, and then
  `mp_clear` and `free` it at the end. We can get rid of this cost by
holding a few `mp_int` objects on the `ThreadContext` and re-using them for all of these operations. This shaves around a third off a Rat addition benchmark.
MoarVM/more-pea: 38 commits pushed by (Jonathan Worthington)++
review: github.com/MoarVM/MoarVM/compare/e...91f9c14d5d
10:01
timotimo whoa 10:17
10:17 sena_kun joined 10:26 klapperl left 10:33 klapperl joined
jnthn Might have found another 15% off that improved time too 10:37
brrt :-o 10:38
jnthn And that's before further improvements in the EA to devirt some more bigint operations :) 10:39
brrt apparently, memmem segv's if it's argument is null 10:40
... I think if the argument to MVM_VECTOR_CONTAINS is NULL, and MVM_VECTOR_SIZE is 0, then we should not crash 10:41
timotimo yeah
brrt which is what happens now 10:42
but in my patch it does crash
oh, hang on, I see the problem
Geth MoarVM: 2fe2f588fd | (Jonathan Worthington)++ | src/math/bigintops.c
Use a cheaper way to force_bigint

This does rely on mp_digit being at least 32-bit, but we rely on that elsewhere in multiple other places already. Gives ~15% off a Rat addition benchmark.
10:47
MoarVM/more-pea: 38 commits pushed by (Jonathan Worthington)++
review: github.com/MoarVM/MoarVM/compare/e...32709e8988
10:48
brrt - problem nr. 1: I used MVM_VECTOR_ELEM_SIZE for the haystack size, rather than MVM_VECTOR_SIZE
jnthn oops :) 10:49
brrt - problem nr. 2: memmem will treat the needle as a pointer to the data it'll compare, whereas the current implementation was looking for the needle-as-value (with ==) 10:52
so I need to pass &t rather than t 10:53
timotimo hm. are you sure that memmem is the right thing for MVM_VECTOR_CONTAINS though? since it'll advance by 1 byte during search, if i'm not mistaken? 10:57
brrt jnthn: gist.github.com/bdw/993188860117ef...f454ffaa1d
timotimo: is that so.... well... what's the alternative? 10:58
I thought memmem was optimized and stuff
timotimo it is optimized!
brrt but idk how :-D
timotimo we do have memmem32, which is a memmem that goes by 4 bytes instead of 1 bytes
brrt hmm 10:59
I'll leave that for you to discuss
timotimo the thing is
brrt is listening
timotimo the optimization of memmem relies on the needle being able to mismatch at a given position in an "interesting" way 11:00
i.e. if you have a needle "bbbbbbbb" and a haystack of "aaaaaaaaaaaabbbbbbbb" you can look at the 8th byte, see that it's a, and know that the match can't be at position 8 or earlier at all
but - unless i'm misunderstanding what VECTOR_CONTAINS is supposed to do - you're supposed to get back the index of an element 11:01
brrt ok.... well.. is there a default, library optimized, version of 'find this exact match at these offsets'
timotimo those don't overlap AFAIK?
11:01 zakharyas left
brrt no, MVM_VECTOR_SEARCH is supposed to return a pointer 11:01
timotimo right, a pointer to an element 11:02
brrt ok, my understanding is wrong, then
because I need to search only the boundaries
timotimo and since VECTOR_SEARCH is passing MVM_VECTOR_ELEM_SIZE, it can't be used to search for two consecutive elements either
brrt I think that's normal, though
but what I don't want, is a half-item 11:03
so memmem is, indeed, not what I want
timotimo that's just a peculiarity of the API we have
i mean the API you're offering
maximum shrug ;)
brrt :-)
timotimo i don't think you can get much faster than what the compiler turns MVM_VECTOR_CONTAINS into. that said
the previous version of MVM_VECTOR_CONTAINS uses ==, does that work properly with structs in all cases? 11:04
does it compile to something memcmp-like?
i sometimes get confused as to what you're allowed to do with structs when it comes to assignment and comparison
11:08 brrt left
jnthn lunch; bbl 11:10
jnthn back 11:54
timotimo can't wait to see where that "another 15%" could be from 11:59
Kaiepi the stuff that gets discussed here always seems like arcane wizardry 12:00
timotimo you can tell by the fact that i'm a part of it that this is all completely learnable
Kaiepi lol
i haven't really looked into moarvm much outside of bsd compatibility 12:01
jnthn timotimo: It was from the "use a cheaper way to force_bigint" 12:03
timotimo oh, i see
yeah, the commit message even mentions the 15% figure again
jnthn grmbl, I was going to make us devirt/avoid boxing with gcd_I, but turns out I've discovered a case where we're too conservative in BB merging... 12:05
And so refuse to do things that would be safe enough
timotimo aaw 12:16
but that also means when that is fixed, other optimizations might be able to step in in other, similar situations? 12:17
along with exprjit boundaries being disappered
jnthn Yeah
Hmmmm.
Implemented what I thought would be a solution and it does something awful.
12:23 Guest5089 joined
jnthn (It was 'cus I did something silly.) 12:29
12:37 zakharyas joined
Guest5089 hmm, SEGV in several spectests 12:49
12:52 brrt joined
Guest5089 perhaps that's what jnthn is grumbling about above 12:52
disappears with MVM_JIT_DISABLE=1 12:54
timotimo you're on the more-pea branch?
Guest5089 master
timotimo oh
this actually landed on master
Guest5089 do you think that was a mistake? 12:55
timotimo i don't think so, but if it segfaults, that's not so good
Guest5089 Thread 1 "moar" received signal SIGSEGV, Segmentation fault. 12:57
jnthn The bigint stuff went into master, since it's not related to PEA
Guest5089 0x00007ffff766e1cd in force_bigint (tc=0x557e4360, body=0x555556f65e80, idx=1) at src/math/bigintops.c:170
brrt timotimo: I think you're allowed to use '==' on structs 12:58
timotimo that's cool
so it surely even ignores padding bytes?
Guest5089 jnthn: try running t/spec/S32-temporal/DateTime.t
brrt not sure if it does ignore padding bytes, no 12:59
jnthn Guest5089: I had clean spectest
Distracted with some other things for now
Guest5089 that's odd, perhaps something is wrong with my install, hmm
timotimo can you find what exact access makes the thing go asplode? 13:00
Guest5089 sec
timotimo i have a dirty moarvm locally, so rebuilding would be quite a hassle
actually, i might have to go afk for a bit soon
Guest5089 timotimo: gist.github.com/dogbert17/83ae9222...873328c3bb 13:01
timotimo huh, how does that segfault? 13:05
13:05 MasterDuke left
timotimo the tc seems odd 13:05
Guest5089 indeed 13:06
timotimo i mean, it's the jit, so the stack gets a little screwed up 13:07
but still
Guest5089 (gdb) p *tc
Cannot access memory at address 0x557e4360
brrt that does $r14 say 13:08
Guest5089 sigh, valgrind seems to hide the problem 13:10
(gdb) p $r14 13:11
$1 = 93824994348208
timotimo m: say 93824994348208.base(16)
camelia 555555758CB0
timotimo that's the proper tc pointer
jnthn /* Call the bigint comparison function. */ 13:13
| mov ARG1, tc;
| mov ARG2, WORK[reg_b];
| mov ARG3, WORK[reg_c];
Should that tc not be TC ?
e.g. the register holding it?
brrt: Can you spot anything silly in gist.github.com/jnthn/faa05a408bd9...97740f4da0 ? 13:15
timotimo you're working directly with eax, ecx, edx? 13:16
jnthn Yeah. 'cus cdq and div work on specific registers 13:17
timotimo ah
jnthn TMP3 and TMP4 are r8 and r9, so no conflicts there
timotimo of course they do
right, TMP1 and TMP2 are rcx and rdx 13:18
jnthn Yeah, ain't using those :)
timotimo OK, that's fine at least
i assume the code above the last hunk is also fine, since calling MVM_bigint_fallback_mul probably already worked fine?
i wonder if | jo >1 is fine without braces around it 13:19
jnthn ~Yeah
timotimo good
good to hear dynasm does The Right Thing there
with regards to amount of statements
jnthn Wrapping it in braces doesn't help. no 13:20
timotimo OK
jnthn The erorr is utterly strange
timotimo i've seen worse back when i didn't know about floating point registers in the calling convention :P
jnthn Cannot invoke null object 13:21
at SETTING::src/core/Rat.pm6:49 (/home/jnthn/dev/MoarVM/install/bin/../share/perl6/runtime/CORE.setting.moarvm:DIVIDE_NUMBERS)
from SETTING::src/core/Rat.pm6:133 (/home/jnthn/dev/MoarVM/install/bin/../share/perl6/runtime/CORE.setting.moarvm:infix:<+>)
timotimo where sometimes it just so happened to have the right value in the register regardless
does turning off the exprjit make a difference btw?
jnthn No
timotimo OK 13:23
just to make sure there are no gremlins; make clean everywhere? rakudo does have some extops, but i don't think they access MVMThreadContext fields directly 13:30
brrt jnthn: will look 13:34
jnthn I've been putting jmp >1 at various places (that is, esape to the fallback)
brrt why `xor eax, edx` 13:35
and, it may be the case that the '4:' label is reused somewhere? 13:36
jnthn brrt: Technique taken from helloacm.com/optimized-abs-functio...-assembly/ 13:37
Do labels have to be numbers, or can they be names?
But yeah, I've been putting in a jmp >1 in various places (e.g. escape to the slow path)
brrt global labels can be names
what I typically do is put an 'int 3' and step through 13:38
but that works in lldb, not in gdb
jnthn And if I put it before the jnz >4 all is well
If put it after the jnz >4 (e.g. when we should have fallen out of the loop) I get the error
How do the label numbers actually work?
brrt luajitish.
but to be more precise, there's a 10-item array of offsets that are computed in one of the three dynasm phases 13:39
and updated
jnthn If I use the label 5 instead of 4 I get a SEGV
brrt Yes, I think I made label 5 special 13:40
jnthn 6 also segvs
timotimo 4 non-special labels should be enough for anybody
jnthn 7 too :P
Well, 4 is busted also ;)
timotimo d'oh :)
brrt hmm 13:41
jnthn Just differently
brrt :-D
jnthn But yeah, it seems to be something about the labels
brrt actually, I think label 4 should be safe to use
oh, hang on
this is a loop backwards, isn't it
jnthn I don't have to do anything special 'cus it's a back branch, do I?
Yes
brrt yes, you do 13:42
no, I think label 4 should be
corsix.github.io/dynasm-doc/instructions.html
under 'Jump Targets'
a backwards jump is spelled 'jmp <4' 13:43
jnthn oh!
brrt there's a mechanical (i'm going to avoid callit it 'sane') reason for that
jnthn And there we have it
Thanks, brrt :) 13:44
Thanks timotimo for ideas also :)
brrt :-)
timotimo glad to see it was figured out :)
jnthn The devirt of that is worth something; I didn't do the div_I yet so it's materializing it sharpish 13:45
Geth MoarVM/more-pea: f7c7db200a | (Jonathan Worthington)++ | src/spesh/pea.c
Avoid some spurious PEA bailouts

When we reach a merge point in the graph, if the allocation was only seen on one side and not the other, then it cannot possibly need a materialization on the side where it wasn't yet created. We don't insert such materializations yet, but rather bail out; still, fixing this means we bail out in some cases where we didn't need to do so.
13:53
jnthn That was the thing that I needed to do so we even attempted gcd_I devirt :) 13:54
Now, dare I attempt div_I... :) 13:55
hah, it won't help yet, for reasons 14:04
Geth MoarVM/more-pea: 0e2be43fc8 | (Jonathan Worthington)++ | src/spesh/facts.c
A couple more facts
14:05
14:30 domidumont joined 14:33 domidumont1 left 14:36 mornfall left 14:37 mornfall joined 14:38 lizmat joined 14:40 lizmat_ joined 14:43 lizmat left 14:47 lizmat_ left 14:49 robertle left 14:58 brrt left 15:23 domidumont left 15:55 zakharyas left
Geth MoarVM/more-pea: 5eff90b3ce | (Jonathan Worthington)++ | 10 files
Devirtualization and EA for further bigint ops

  * gcd_I, with the nice tight GCD algorithm inlined direclty into the
   machine code
  * cmp_I, eq_I, ne_I, lt_I, le_I, gt_i, and ge_I
15:55
16:27 lizmat joined
jnthn I think we do Rat + in about half the time we did before I started on this bigint EA work, plus my early bigint opts. 16:27
*earlier
And not finished yet :) 16:28
Though we do have a slight problem in that I was hoping entire Rat objects would go away, not just the calcs inside them
But it all goes rather awkwardly because of the Num failover
Or at least, that's what'll block it later 16:29
16:29 lizmat left
jnthn Still, if we can eliminate all the intermediate allocations during the calc, we're a long way ahead. 16:29
16:30 lizmat joined 16:55 robertle joined 16:56 domidumont joined
Geth MoarVM/more-pea: da088863a4 | (Jonathan Worthington)++ | 10 files
Devirtualize and EA for neg_I, abs_I
17:09
MoarVM/more-pea: 201e8e6b0b | (Jonathan Worthington)++ | src/spesh/pea.c
Log which bigint ops were optimized in EA
jnthn I guess the only common ops left to go are div and mod 17:11
But not today :)
17:13 domidumont left
Geth MoarVM: 248e2980af | (Jonathan Worthington)++ | src/jit/x64/emit.dasc
Fix wrong argument in JIT of cmp_I
17:39
jnthn That should fix the SEGV that was reported earlier
It's been wrong for ages, and nobody noticed until today when I used the tc argument :P 17:40
m: say 306 / 491 17:43
camelia 0.623218
jnthn Quite a lot less GC runs in that Rat benchmark now also :)
timotimo very nice 17:45
jnthn Next week I'll take a look at inserting materializations at merge points that require them, and also at fixing up the representation we produce to get rid of the broken SSA issues 17:48
And probably put in div_I and mod_I devirt/EA too, though if anyone wants to steal that part (or indeed any of the remaning operators) then go ahead. :) 17:49
Note that the branch isn't in a very good state in terms of stuff working :)
I'm getting towards the point where I've done the features I want in it, and then the focus will switch to fixing it up and preparing for merge.
There's at least one problem with deopt left. 17:50
17:50 zakharyas joined
jnthn Off to rest :) o/ 17:53
18:56 zakharyas left 19:31 lizmat left 19:41 scovit left 19:48 scovit joined 22:20 MasterDuke joined, MasterDuke left, MasterDuke joined 23:50 Elronnd left