timotimo that's right. 00:00
jnthn And could easily make for internal frag.
timotimo sure. maybe we should build an mp_init that'll get the initial size passed to it
aren't we treating mp_int as immutable anyway? 00:01
timotimo goes to bed 00:08
jnthn also 00:14
00:59 jnap joined 01:23 jnap joined 02:59 flussence joined 03:31 flussence joined 04:40 cognominal joined 04:50 lue joined 05:20 colomon joined 06:38 cxreg joined 07:00 FROGGS joined 07:04 dagurval joined
nwc10 jnthn: looks like you got the one I was hitting 07:26
thanks 07:27
FROGGS ohh, clearly I need to backlog 07:28
good morning
08:30 odc joined
nwc10 jnthn++ # better than 30Gb of RAM 09:15
jnthn nwc10: yay 09:18
nwc10: Does that mean we can apply the refs_frames on STables fix?
nwc10 I believe so 09:19
works on my machine
not sure how much it does/doesn't speed things up
having trouble measuring it reliably
but go for it, although the commit message I had about "previous commit" is now a bit wrong
was going to try the big GC change now
jnthn It was on the mailing list, yes?
nwc10 but low on sleep (couldn't sleep)
not sure :-)
jnthn: was definitely here: paste.scsys.co.uk/299937 09:20
but "previous commit" is actually bbf922e66fd7e71ef529f1624c5cdd57e3b6d90a 09:21
jnthn ah, didn't see it on the list
Thanks
nwc10 and at least one of the others you found was being concealed by the bug
jnthn Ugh...
nwc10 I hadn't sent it yet, because I wasn't sure that it worked
it does now
yes, Ugh :-(
the torture is good at finding missing temproots
and reasonable at write barriers
but its "memory" is limited
jnthn was glad to not have to get up early today 09:22
Well, just got a CORE.setting build in 60.5s, which I *think* may be lowest I've seen so far. 09:28
nwc10 one check on Linux suggested it used a tad less max memory for the setting
jnthn New best spectest time also... 296s. 09:35
nwc10 OK, so it is a win. Good
how fast is Hello World now? :-)
jnthn C:\consulting\rakudo>timecmd perl6-m -e "say 'Hello, world'" 09:38
Hello, world
command took 0:0:0.27 (0.27s tota
C:\consulting\rakudo>timecmd perl -E "use Moose; say 'Hello, world'"
Hello, world
command took 0:0:0.22 (0.22s tota
nwc10 they still win. :-( 09:39
at least, "this week"
dalek MoarVM: bd1186f | nicholas++ | src/gc/roots.c:
Should not use REPR() on an STable.

The code was buggily assuming that every collectable was an Object when checking to see if Objects referenced frames. Not all collectables are Objects, but by chance of how Objects and STables are laid out in memory, REPR() on an STable would give a pointer to a sufficiently valid Object that the code didn't fail. However, a side effect of this was the code ends up thinking that every STable is an Object that references frames, and hence needs to be tracked in gen2roots[]. This slows things down, and concealed several bugs, fixed in previous commits.
09:40
jnthn I think the other thing you're working on will have more impact on performance.
Since we can probably allocate around 10% more objects per GC run. 09:41
And get better cache locality due to smaller objects.
nwc10 git submodules-- # really mess with my rebase workflow 09:45
MoarVM build time is still More Than Awesome 09:54
jnthn: I should have found this one for you last night: paste.scsys.co.uk/300516 10:07
still not sure if memcpy() is the best idea - maybe an explicit loop?
but that's micro-optimising
er, memset()
jnthn Yeah, go for clarity for now. Actually, the bytecode assembler is by a long way the fastest bit of compilation right now anyway. 10:09
So it really is micro. :)
nwc10 when I'm cleaning this code up, should I take out the assert()s? 10:10
jnthn There's no cost to an optimized build to leave them in? 10:11
nwc10 I think one might need to define -DNDEBUG to remove them
jnthn If you think they'll be helpful for future debugging, I don't mind them staying. 10:12
OK, if we do that too, then they can stay.
nwc10 I think that they will
jnthn 24th 10:26
oops
FROGGS 25rd 10:27
masak 26st 10:50
nwc10 jnthn: incoming to the list. enjoy :-) 10:52
"this week" :-) 10:56
dalek MoarVM/nwc10/feature/gc-header-shrink: d713ab9 | nicholas++ | src/gc/collect.c: 10:57
MoarVM/nwc10/feature/gc-header-shrink: In process_worklist(), assert() that various assignments are unneeded.
MoarVM/nwc10/feature/gc-header-shrink:
MoarVM/nwc10/feature/gc-header-shrink: Also assert that if the pointer is in to-space already, it is never marked as
MoarVM/nwc10/feature/gc-header-shrink: gen2.
moritz for the convenience of the moarvm hackers, I've imported all those patches into a branch 10:58
... and killed dalek :-)
nwc10 thanks
10:58 dalek joined
nwc10 there were only 10 10:58
moritz dalek doesn't rate-limit 10:59
and freenode kicks :-)
dalek MoarVM: 96e503b | nicholas++ | src/mast/compiler.c:
In form_string_heap(), ensure that the padding bytes are initialised.

Whilst we never read this memory, it's useful to ensure it is initialised, otherwise valgrind (and similar tools) will rightly complain that we're writing garbage to disk.
With this change, compiling the setting runs without error under valgrind.
11:00
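The padding fix described in the commit above can be sketched in isolation. This is a minimal illustration, not the code in form_string_heap(): the helper name `write_padded` and the 4-byte alignment are assumptions made for the example. It copies the string bytes and then zero-fills the padding with memset(), so that no uninitialised bytes reach the output buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not MoarVM's form_string_heap()): copy `len` bytes of
 * string data into `out`, then zero-fill up to the next multiple of 4.
 * The 4-byte alignment is an assumption for this sketch.
 * `out` must have room for the padded length, which is returned. */
size_t write_padded(unsigned char *out, const unsigned char *data, size_t len) {
    size_t padded = (len + 3) & ~(size_t)3;  /* round up to a multiple of 4 */
    memcpy(out, data, len);                  /* the real payload */
    memset(out + len, 0, padded - len);      /* initialise the padding bytes */
    return padded;
}
```

An explicit loop over the padding bytes would do the same job; with at most three bytes to fill, the choice is about clarity rather than speed.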
FROGGS nice! 11:11
nwc10++
jnthn nwc10: Did you include a patch to add -DNODEBUG? 11:13
nwc10 jnthn: no.
can I delegate that to FROGGS? :-)
FROGGS wut? 11:14
so, every assert() should be #ifndef'd by NODEBUG ?
is it common to use NODEBUG instead of the opposite? 11:15
nwc10 FROGGS: no, look at /usr/include/assert.h 11:16
jnthn FROGGS: No, you just need to add the define
nwc10 if you define NDEBUG
jnthn I dunno how things are on MSVC by default
nwc10 not NODEBUG
FROGGS ahh
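The NDEBUG behaviour discussed above can be shown with a small standalone sketch. It is not MoarVM code; `probe`, `count_with_checks` and `count_without_checks` are names made up for the example. Standard C has <assert.h> redefine assert() according to whether NDEBUG is defined at the point of each inclusion, so when NDEBUG is set the argument of an assertion is not even evaluated.

```c
/* Assertions enabled: NDEBUG is not defined when <assert.h> is included. */
#undef NDEBUG
#include <assert.h>

static int calls;                        /* how many times probe() ran */
static int probe(void) { calls++; return 1; }

int count_with_checks(void) {
    calls = 0;
    assert(probe());                     /* evaluated, so probe() runs */
    return calls;
}

/* Assertions disabled: equivalent to compiling this part with -DNDEBUG. */
#define NDEBUG
#include <assert.h>

int count_without_checks(void) {
    calls = 0;
    assert(probe());                     /* expands to ((void)0) */
    return calls;
}

/* Restore checking for whatever follows in this translation unit. */
#undef NDEBUG
#include <assert.h>
```

Here count_with_checks() returns 1 and count_without_checks() returns 0, which is also why an assert() should never carry a side effect the program depends on.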
jnthn OK
CORE.setting memory consumption is less :)
nwc10 that was the plan.
"I love it when a plan comes together" 11:17
FROGGS and it gets built in like no time?
nwc10 (I'll skip the cigar. Could i just have a whisky instead?)
but not right now as I feel a bit crap
FROGGS no cigar no whisky, sir
jnthn nwc10: And...you finally broke 1 minute CORE.setting build time on my box \o/
nwc10 Awesome
you'll have lots of time for bloggage :-)
jnthn And a little time off startup too
Down to 0.24s.
nwc10 Moose still wins? 11:18
jnthn By 0.02s.
nwc10 getting there. But would it be easier to send patches to knobble Moose? :-)
don't tell anyone
jnthn lemme do a spectest time :)
FROGGS jnthn: what perl is that you're using for the moose test? 11:19
FROGGS guesses it is an ActiveState 5.14 or so 11:20
jnthn Another 7s off 11:24
289s.
And it may be the asserts are running
nwc10 probably
and those non-inline functions
which could probably go
jnthn FROGGS: yeah, looks like 11:25
FROGGS jnthn: before you blog about MoarVM beating Moose, let us compare the timings against a 5.18 or better, okay? :o) 11:26
jnthn nwc10++
This is great work.
Also I see 5MB more off memory use of a single Rakudo.
FROGGS and then again, our threads suck *g*
FROGGS hides
jnthn FROGGS: We're not beating it yet. :)
FROGGS: But yes, if we're going to say that it'll need careful analysis. 11:27
FROGGS: What's the status on char name lookup stuff? You gave up debugging it for now?
FROGGS yes, I gave up 11:28
:/
maybe I can continue after a diakopterish brainstorming
lunch &
nwc10 got through NQP on an x86 Debian system 11:29
so runs on both kinds of OS and both kinds of architecture :-)
jnthn ;)
nwc10 I don't know how good our "timing" setup is. 11:38
In that, objects are (I think) often larger than a CPU cache line
and flags is accessed often, but serialisation context not so often 11:39
so I wondered if putting that union at the start would help reduce L1 cache misses a tad
jnthn Hm, interesting thought... 11:40
nwc10 x86 Linux max RSS for setting down by 6.8%
jnthn It'd be sort-of nice to do away with ->sc entirely
I'm just not sure how best to do it. 11:41
I mean, we'd have to maintain a huge lookup table somewhere.
nwc10 I don't know if it wins you anything
jnthn Could use a flag bit for "is it in an SC" I guess
It wins you another pointer off every runtime-created object...
nwc10 I don't know if the forwarder can share memory with anything else, during GC runs 11:42
jnthn Well, you evacuate the entire object before writing ->forwarder into the old one... 11:45
nwc10 OK. You still need the falgs
flags
jnthn Right
But you can use the body 11:46
nwc10 so you're talking about re-using the space for the REPR or the STABLE
or yes, the body
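The idea of letting the forwarder share memory with the body can be sketched as follows. This is a hedged illustration only: `DemoObject`, `DEMO_FLAG_FORWARDED`, `demo_evacuate` and `demo_resolve` are hypothetical names, and the layout is not MoarVM's. The whole object is copied first, and only then is the first pointer-sized slot of the old body overwritten with the new address, with a flag bit recording that this happened.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_FLAG_FORWARDED 1u   /* hypothetical flag bit */

/* Hypothetical object layout: a small header plus a body that is assumed
 * to be at least one pointer wide. */
typedef struct DemoObject {
    uint32_t flags;
    uint32_t size;
    void    *body[2];
} DemoObject;

/* Evacuate `from` into `to`. The copy happens first, so nothing in the old
 * body is still needed when its first slot is reused as the forwarder. */
DemoObject *demo_evacuate(DemoObject *from, DemoObject *to) {
    memcpy(to, from, sizeof *from);
    from->body[0] = to;
    from->flags |= DEMO_FLAG_FORWARDED;
    return to;
}

/* Follow the forwarder if the object has been moved. */
DemoObject *demo_resolve(DemoObject *obj) {
    return (obj->flags & DEMO_FLAG_FORWARDED) ? (DemoObject *)obj->body[0] : obj;
}
```

The flags word still has to survive in the old object, since it is what tells a reader that the body now holds a forwarder rather than data.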
jnthn You know a collectable is either an object or an STable.
So then there's always at least a pointer's worth in the body, even if it's a type object
A full NQP build is now 50s and a Rakudo one 92s on my box. 11:47
nwc10 I seem to be fighting with rsync on the x86 machine
(which is cpan.etla.org/ )
so I can't get an idea of CPU change
jnthn + MVM_CF_IN_GEN2_ROOT_LIST = 32, 11:50
+
+ /* GC has found this object to be live. */
+ MVM_CF_SECOND_GEN_LIVE = 64
Mighta been more consistent to name the second flag MVM_CF_GEN2_LIVE :)
nwc10 yes, I didn't think of that.
nwc10 goes for noms 11:51
jnthn Really happy about the cleanup in + MVM_CF_IN_GEN2_ROOT_LIST = 32, 11:52
+
+ /* GC has found this object to be live. */
argh
Really happy about the cleanup in 3ca1fcd74 also. 11:53
timotimo o/ 12:10
i see we're improving speed and memory usage today
when exactly do we need to look at the sc anyway? only when we want to serialize out some stuff, right? 12:21
jnthn There's the SC write barriers too 12:23
timotimo ah, that's for reposession?
jnthn But that doesn't need the SC itself
At least, in the common "it's not in one" case 12:24
timotimo: Correct
timotimo the most optimal thing we could do is to only store the sc in the objects if we know we're meaning to serialize out what we're building at the end :P
jnthn Anyway, I think that there's probably bigger wins to be had for now. 12:32
timotimo probably
timotimo casts Summon Bigger Fish 12:33
jnthn needs to do a few more $dayjob things for a bit 12:37
tadzik timotimo: ask and you shall receive :P 3.bp.blogspot.com/_QcNJfTQTukE/TK-7...trade4.JPG 12:38
timotimo hehe.
nwc10 jnthn: problem I see with "big hash" is that it's probably 3 pointers of storage for every SC pointer saved 12:40
best hack I could think of over lunch was
0) wait until we have inline functions so that we can assert a bunch of sanity
1) for nursery objects, store the SC in the pointer before the object. This avoids issues about moving 12:41
2) for oversized objects, store the SC in the pointer before the object. This just makes life simpler
3) for everything gen2, have two sets of storage, one for things with SC, one for things without
oh, and typing this in, I guess
"store the SC in the pointer before it" which is simpler than the plan I had
but I'm not going to hack on this this month. Or maybe next. 12:42
jnthn *nod* 12:46
Yeah...it's trickier to get a win on this.
nwc10 but it feels like a hack 12:52
and there's probably cleaner low hanging fruit to be plucked 12:53
jnthn indeed
Pro tip: before wondering why your benchmark is so slow, make sure it doesn't contain an infinite loop. 14:14
Even the CLR doesn't do *those* fast...
nwc10 :-)
14:19 jnap joined
masak .oO( why does that joke never get old? because it never terminates! ) 14:20
jnthn Typically, I managed to get the infinite loop in the locking code, not the lock-free code... 14:36
moritz lock-free xor loop-free 14:39
14:49 ggoebel1114 joined
jnthn walk & 14:51
timotimo run & 14:53
tadzik sleep & 15:04
FROGGS work & 15:05
masak eat, pray, love & 15:11
jnthn .. 15:22
moritz &&&
jnthn now has beer in le fridge again :) 15:24
nwc10 yay! 15:30
I have me in the bed, because that feels the best place
jnthn: had a thought on the way home - hiding the SC *before* the collectable doesn't work, as the code needs to march through the nursery based on size
but hiding it at the end would work
the complication is that you have to know at allocation time whether you need a space for a SC 15:31
(could just arrange for everything in the nursery to have space for one, and tidy up at gen2 promotion)
all feels like a hack
improving code gen, unblocking Panda and unblocking Star seem to be more important
$ 15:32
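The "hide the SC at the end" idea can be sketched with a toy bump allocator. Everything here is hypothetical (`DemoHeader`, `DEMO_HAS_SC`, the 256-byte nursery, the 8-byte rounding); it only illustrates why a trailing slot keeps the nursery walkable by size, and why the decision has to be made at allocation time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_HAS_SC 1u                       /* hypothetical flag bit */

typedef struct DemoHeader { uint32_t flags; uint32_t size; } DemoHeader;

static unsigned char nursery[256];           /* toy nursery */
static size_t nursery_used;

/* Bump-allocate an object; if `sc` is non-NULL, reserve a trailing slot
 * for it. The choice is fixed here, at allocation time. */
unsigned char *demo_alloc(size_t body_size, void *sc) {
    DemoHeader h;
    size_t size = sizeof h + body_size + (sc ? sizeof sc : 0);
    size = (size + 7) & ~(size_t)7;          /* round up to 8 bytes */
    if (nursery_used + size > sizeof nursery) return NULL;
    unsigned char *obj = nursery + nursery_used;
    h.flags = sc ? DEMO_HAS_SC : 0;
    h.size  = (uint32_t)size;
    memcpy(obj, &h, sizeof h);
    if (sc) memcpy(obj + size - sizeof sc, &sc, sizeof sc);
    nursery_used += size;
    return obj;
}

/* Read the trailing SC slot, or NULL if the object has none. */
void *demo_get_sc(const unsigned char *obj) {
    DemoHeader h;
    void *sc = NULL;
    memcpy(&h, obj, sizeof h);
    if (h.flags & DEMO_HAS_SC) memcpy(&sc, obj + h.size - sizeof sc, sizeof sc);
    return sc;
}

/* March through the nursery using only each object's size field. */
size_t demo_count_objects(void) {
    size_t n = 0, off = 0;
    while (off < nursery_used) {
        DemoHeader h;
        memcpy(&h, nursery + off, sizeof h);
        off += h.size;
        n++;
    }
    return n;
}
```

Because every object still begins with its header, the walk never needs to know whether the previous object carried an SC slot; a slot placed before the header would break that.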
jnthn Yeah. If it's speed and memory wins we're after, the Int improvements are more worthwhile. 15:34
And yes, code-gen improvements. 15:35
timotimo lol, i jogg'd 15:37
FROGGS *g* 15:57
16:20 colomon joined 17:03 rurban_ joined 17:46 cognominal joined
dalek MoarVM: 22cfff8 | nicholas++ | src/gc/collect.c:
In process_worklist(), assert() that various assignments are unneeded.

Also assert that if the pointer is in to-space already, it is never marked as gen2.
MoarVM: e05bf62 | jnthn++ | build/setup.pm:
Define NDEBUG in optimized builds.

Means that we will not check assert()s introduced in recent commits in optimized builds.
18:14
18:14 dalek joined
MoarVM: 0f2077b | jnthn++ | src/ (5 files):
Rename a flag for consistency.
18:18
jnthn nwc10: Failed to find a way your work wasn't an improvement, so it's in. Also did the NDEBUG thing and the flag rename I mentioned earlier. 18:20
nwc10 Cool, thanks for tidying it up 18:26
jnthn Thanks for doing the hard work :) 18:28
nwc10 thanks for the couple of key insights and fixes that got me unstuck
18:32 FROGGS joined
PerlJam idly wonders if TPF would sponsor work on moarvm via the hague grants 18:35
(of course, that presumes also that there is some hague money left) 18:36
19:04 camelia joined
nwc10 does anyone have a good idea of what things block Panda working on Moar? And what things block Star? 19:17
FROGGS NativeCall blocks Star
nwc10 it's a bit frustrating that Moar isn't able to take its place alongside Parrot as a first class backend for "end users"
FROGGS and I'd guess that openpipe might block panda 19:18
nwc10 ah
and sockets?
FROGGS ohh, yeah
jnthn yeah, and sockets 19:22
And our handful of failing tests 19:23
nwc10 so it's not massive, but 3 parts are quite hard 19:27
and likely to scare people away from attempting them
FROGGS I hope that openpipe will be done this weekend 19:28
nwc10 and then Panda can be tested with local tarballs?
FROGGS I dunno 19:29
nwc10 OK. I'll stop asking stupid questions :-)
FROGGS hehe
:o)
dalek MoarVM/jnthn_bigint_opt: dd8c35e | jnthn++ | src/ (3 files):
Start holding mp_int * in P6bigint.
21:03
timotimo but i had code to do that, too 21:04
you didn't need to re-do that
jnthn oh...
I...didn't think it was working? 21:05
timotimo too late now
it works all right
it's essentially the same thing you have now plus a bit of WIP stuff that doesn't work at all yet
jnthn oh...
OK.
jnthn is trying to do little steps towards working
as in, working 32-bit storage 21:06
timotimo oh 21:07
i have that, too
sorry about forgetting that
er ... that is ... i have a union that does the thing on 32bit and 64bit systems and little/big endian
jnthn Yeah, I more meant actually storing stuff that way? 21:08
timotimo yeah, that i do not have yet 21:09
jnthn Anyway, gimme a bit more time...I'll see if I can get this working-ish.
timotimo it seemed like a huge blob of work without a good handle to pull at
or a loose thread to start unraveling the whole thing at
jnthn Yeah, that's kinda why I've jumped in. I could see it was being a frustrating amount of effort, when I'd hoped it would be a relatively isolated and not horrible task. 21:11
dalek MoarVM/wip-openpipe: a594b94 | (Tobias Leich)++ | src/io/procops.c:
disable hacky :rp/:wp switch
MoarVM/wip-openpipe: 03647f4 | (Tobias Leich)++ | src/io/procops.c:
openpipe spawns using a shell
MoarVM/wip-openpipe: 9652087 | (Tobias Leich)++ | src/io/procops.c:
disable ipc communication, this blows up on windows
MoarVM/wip-openpipe: e92e012 | (Tobias Leich)++ | src/ (2 files):
use uv_process_close on windows (instead of waitpid)
FROGGS C:\MoarVM>perl6-m -e "say +qx{dir}.lines" 21:12
39
$ perl6-m -e 'say +qqx{ls}.lines'
49
jnthn \o/ 21:13
FROGGS jnthn: I decided to make qx{} work and care about IO::Pipe later
jnthn ok :)
FROGGS so we can only read its stdout atm
I'll deal with upcoming issues when panda needs more functionality :o) 21:14
timotimo sounds great, though!
FROGGS would be nice if someone could review the branch in MoarVM and rakudo 21:15
github.com/MoarVM/MoarVM/compare/wip-openpipe
github.com/rakudo/rakudo/compare/wip-openpipe
I am too tired to spot any issues 21:17
so everybody please be extra careful 21:18
dalek MoarVM/jnthn_bigint_opt: 4078a7c | jnthn++ | src/ (3 files):
Introduce union for holding smallint in P6bigint.

Not using the smallint part yet.
21:37
MoarVM/jnthn_bigint_opt: 8e9761e | jnthn++ | src/6model/reprs/P6bigint. (2 files):
Teach a few things to handle smallint case.

We never actually create that case yet, so really we can only test the checking code doesn't get false positives so far.
21:39 lizmat_ joined
dalek MoarVM/jnthn_bigint_opt: b930b29 | jnthn++ | src/math/bigintops.c:
Make get_bigint complain if it sees a smallint.
22:12
MoarVM/jnthn_bigint_opt: 182960c | jnthn++ | src/6model/reprs/P6bigint.c:
Start producing smallint for empty and box cases.

At this point, everything fails in an orderly way, with "Incomplete smallint handling!". The task from here is just to fix things until we never see this error again. :-)
MoarVM/jnthn_bigint_opt: d1246e5 | jnthn++ | src/math/bigintops.c:
Handle smallint in bigint to/from string.
22:29
jnthn That's the first 5 of 60-bigint.t passing again. :) 22:30
timotimo i would love to continue your work now, but i've contracted quite a sizable headache today 22:32
jnthn timotimo: I'll do a little bit more, then :)
timotimo so i'm taking the next tram home and getting a bunch of sleep
jnthn timotimo: Get well soon
timotimo i'll give it my all 22:33
actually i'll have to wait another 10 minutes 22:34
so i'll let dalek make me happy with good news :3
dalek MoarVM/jnthn_bigint_opt: 5e5e592 | jnthn++ | src/6model/reprs/P6bigint.h:
Fix "fits in 32 bits" test.
23:28
MoarVM/jnthn_bigint_opt: 90160f3 | jnthn++ | src/math/bigintops.c:
Add smallint handling to various basic ops.

Gets us up to 8 passing tests in NQP's 60-bigint.t. Also adds much of the infrastructure we will need to do the rest.
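The smallint approach in the commits above can be sketched roughly as below. This is an assumption-laden illustration, not the P6bigint layout: `DemoInt`, `demo_fits_in_32`, `demo_box` and `demo_add_small` are hypothetical names, and the big case is left as a placeholder pointer instead of a real mp_int. The point is the "fits in 32 bits" test and doing the arithmetic in a wider type, so that overflow is detected instead of wrapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical body: either a 32-bit value stored inline or a pointer to a
 * heap-allocated big integer (left opaque in this sketch). */
typedef struct DemoInt {
    int is_small;
    union {
        int32_t small;
        void   *big;     /* placeholder for something like an mp_int * */
    } u;
} DemoInt;

/* The "fits in 32 bits" test, done on a 64-bit value. */
int demo_fits_in_32(int64_t v) {
    return v >= INT32_MIN && v <= INT32_MAX;
}

/* Box a value: inline if it fits, otherwise mark it as needing a bigint.
 * A real implementation would allocate the big integer here. */
DemoInt demo_box(int64_t v) {
    DemoInt r;
    if (demo_fits_in_32(v)) {
        r.is_small = 1;
        r.u.small  = (int32_t)v;
    } else {
        r.is_small = 0;
        r.u.big    = NULL;
    }
    return r;
}

/* Add two small values in 64 bits. Returns 1 and stores the sum in *out if
 * it still fits, or 0 if the caller has to fall back to the bigint path. */
int demo_add_small(int32_t a, int32_t b, int32_t *out) {
    int64_t sum = (int64_t)a + (int64_t)b;
    if (!demo_fits_in_32(sum)) return 0;
    *out = (int32_t)sum;
    return 1;
}
```

Mixed cases (one operand small, one big) are where the two code paths multiply; forcing the small operand up to a bigint first is the simplest way to handle them, at the cost of an allocation.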
jnthn timotimo: Hopefully I've done enough so far to show a way forward. 23:30
timotimo: Some things we'll likely want to take the force_bigint approach on. 23:31
timotimo thank you, that seems like a good foundation to work with 23:32
i'll hit the hay now
jnthn Rest well, feel better. 23:33
timotimo that's the plan, anyway :)
jnthn :)
timotimo how do you feel about using the mp_foobar_d forms if we have a big int and a smallint? 23:34
it *might* not be safe depending on the data format used for mp_digit, though 23:35
but we might get around allocating a bigint in a bunch of cases where we do things with one big and one small int
imagine a for loop that adds 1 in every iteration and starts in the big integer range, for example 23:36
anyway. mandatory bedrest now :P 23:37
jnthn timotimo: We *could* but I thought the code was already involved enough with the two cases. 23:40
timotimo that's true. there'd have to be measurements.
jnthn timotimo: And I think that case will be quite rare too
timotimo maybe it'd also be something worth considering in the specializer 23:41
if that's possible at all in a non-tracing jit type of deal
jnthn Doing a loop up to 2147483647 is rather large...
timotimo oh well. now that you phrase it that way :)
timotimo disappears in a puff of optimism
jnthn o/ 23:42