timotimo | that's right. | 00:00 | |
jnthn | And could easily make for internal frag. | ||
timotimo | sure. maybe we should build an mp_init that'll get the initial size passed to it | ||
aren't we treating mp_int as immutable anyway? | 00:01 | ||
timotimo goes to bed | 00:08 | ||
jnthn also | 00:14 | ||
00:59
jnap joined
01:23
jnap joined
02:59
flussence joined
03:31
flussence joined
04:40
cognominal joined
04:50
lue joined
05:20
colomon joined
06:38
cxreg joined
07:00
FROGGS joined
07:04
dagurval joined
|
|||
nwc10 | jnthn: looks like you got the one I was hitting | 07:26 | |
thanks | 07:27 | ||
FROGGS | ohh, clearly I need to backlog | 07:28 | |
good morning | |||
08:30
odc joined
|
|||
nwc10 | jnthn++ # better than 30Gb of RAM | 09:15 | |
jnthn | nwc10: yay | 09:18 | |
nwc10: Does that mean we can apply the refs_frames on STables fix? | |||
nwc10 | I believe so | 09:19 | |
works on my machine | |||
not sure how much it does/doesn't speed things up | |||
having trouble measuring it reliably | |||
but go for it, although the commit message I had about "previous commit" is now a bit wrong | |||
was going to try the big GC change now | |||
jnthn | It was on the mailing list, yes? | ||
nwc10 | but low on sleep (couldn't sleep) | ||
not sure :-) | |||
jnthn: was definitely here: paste.scsys.co.uk/299937 | 09:20 | ||
but "previous commit" is actually bbf922e66fd7e71ef529f1624c5cdd57e3b6d90a | 09:21 | ||
jnthn | ah, didn't see it on the list | ||
Thanks | |||
nwc10 | and at least one of the others you found was being concealed by the bug | ||
jnthn | Ugh... | ||
nwc10 | I hadn't sent it yet, because I wasn't sure that it worked | ||
it does now | |||
yes, Ugh :-( | |||
the torture is good at finding missing temproots | |||
and reasonable at write barriers | |||
but its "memory" is limited | |||
jnthn was glad to not have to get up early today | 09:22 | ||
Well, just got a CORE.setting build in 60.5s, which I *think* may be lowest I've seen so far. | 09:28 | ||
nwc10 | one check on Linux suggested it used a tad less max memory for the setting | ||
jnthn | New best spectest time also... 296s. | 09:35 | |
nwc10 | OK, so it is a win. Good | ||
how fast is Hello World now? :-) | |||
jnthn | C:\consulting\rakudo>timecmd perl6-m -e "say 'Hello, world'" | 09:38 | |
Hello, world | |||
command took 0:0:0.27 (0.27s tota | |||
C:\consulting\rakudo>timecmd perl -E "use Moose; say 'Hello, world'" | |||
Hello, world | |||
command took 0:0:0.22 (0.22s tota | |||
nwc10 | they still win. :-( | 09:39 | |
at least, "this week" | |||
dalek | MoarVM: bd1186f | nicholas++ | src/gc/roots.c: Should not use REPR() on an STable. The code was buggily assuming that every collectable was an Object when checking to see if Objects referenced frames. Not all collectables are Objects, but by chance of how Objects and STables are laid out in memory, REPR() on an STable would give a pointer to a sufficiently valid Object that the code didn't fail. However, a side effect of this was the code ends up thinking that every STable is an Object that references frames, and hence needs to be tracked in gen2roots[]. This slows things down, and concealed several bugs, fixed in previous commits. |
09:40 | |
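The bug described in that commit message can be sketched roughly as follows. This is a minimal illustration, not MoarVM's actual code: the type names, the `CF_STABLE` flag value, and the `collectable_refs_frames` helper are all hypothetical. The idea is simply to consult the header flags before treating a collectable as an Object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag marking a collectable as an STable rather than an Object. */
enum { CF_STABLE = 2 };

/* Common header shared by every collectable. */
typedef struct { uint32_t flags; } Collectable;

/* Hypothetical REPR record: does this representation reference frames? */
typedef struct { int refs_frames; } Repr;

typedef struct { Collectable header; const Repr *repr; } STable;
typedef struct { Collectable header; STable *st; } Object;

/* Only Objects are asked about frame references. An STable is rejected up
 * front instead of being reinterpreted as an Object by accident. */
int collectable_refs_frames(const Collectable *c) {
    assert(c != NULL);
    if (c->flags & CF_STABLE)
        return 0;
    return ((const Object *)c)->st->repr->refs_frames;
}
```

Without the flag check, the cast on the last line would also be applied to STables, which is the kind of accidental reinterpretation the commit message describes.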
jnthn | I think the other thing you're working on will have more impact on performance. | ||
Since we can probably allocate around 10% more objects per GC run. | 09:41 | ||
And get better cache locality due to smaller objects. | |||
nwc10 | git submodules-- # really mess with my rebase workflow | 09:45 | |
MoarVM build time is still More Than Awesome | 09:54 | ||
jnthn: I should have found this one for you last night: paste.scsys.co.uk/300516 | 10:07 | ||
still not sure if memcpy() is the best idea - maybe an explicit loop? | |||
but that's micro-optimising | |||
er, memset() | |||
jnthn | Yeah, go for clarity for now. Actually, the bytecode assembler is by a long way the fastest bit of compilation right now anyway. | 10:09 | |
So it really is micro. :) | |||
nwc10 | when I'm cleaning this code up, should I take out the assert()s? | 10:10 | |
jnthn | There's no cost to an optimized build to leave them in? | 10:11 | |
nwc10 | I think one might need to define -DNDEBUG to remove them | ||
jnthn | If you think they'll be helpful for future debugging, I don't mind them staying. | 10:12 | |
OK, if we do that too then they can stay. | |||
nwc10 | I think that they will | ||
jnthn | 24th | 10:26 | |
oops | |||
FROGGS | 25rd | 10:27 | |
masak | 26st | 10:50 | |
nwc10 | jnthn: incoming to the list. enjoy :-) | 10:52 | |
"this week" :-) | 10:56 | ||
dalek | MoarVM/nwc10/feature/gc-header-shrink: d713ab9 | nicholas++ | src/gc/collect.c: | 10:57 | |
MoarVM/nwc10/feature/gc-header-shrink: In process_worklist(), assert() that various assignments are unneeded. | |||
MoarVM/nwc10/feature/gc-header-shrink: | |||
MoarVM/nwc10/feature/gc-header-shrink: Also assert that if the pointer is in to-space already, it is never marked as | |||
MoarVM/nwc10/feature/gc-header-shrink: gen2. | |||
moritz | for the convenience of the moarvm hackers, I've imported all those patches into a branch | 10:58 | |
... and killed dalek :-) | |||
nwc10 | thanks | ||
10:58
dalek joined
|
|||
nwc10 | there were only 10 | 10:58 | |
moritz | dalek doesn't rate-limit | 10:59 | |
and freenode kicks :-) | |||
dalek | MoarVM: 96e503b | nicholas++ | src/mast/compiler.c: In form_string_heap(), ensure that the padding bytes are initialised. Whilst we never read this memory, it's useful to ensure it is initialised, otherwise valgrind (and similar tools) will rightly complain that we're writing garbage to disk. With this change, compiling the setting runs without error under valgrind. |
11:00 | |
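A rough sketch of the padding fix, assuming a string heap in which each entry is a 4-byte length followed by the bytes, padded to a 4-byte boundary. The layout, the `ALIGN4` macro, and the `append_padded` name are assumptions for illustration and are not taken from form_string_heap() itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round n up to the next multiple of 4. */
#define ALIGN4(n) (((n) + 3u) & ~(size_t)3u)

/* Append one string to the heap at offset pos and return the new offset.
 * The length prefix is written in native byte order for simplicity. */
size_t append_padded(uint8_t *heap, size_t pos, const char *bytes, uint32_t len) {
    size_t padded = ALIGN4((size_t)len);
    assert(heap != NULL && padded - len < 4);

    memcpy(heap + pos, &len, sizeof len);
    pos += sizeof len;
    memcpy(heap + pos, bytes, len);

    /* The padding is never read back, but zeroing it means no
     * uninitialised bytes are written out with the buffer. */
    memset(heap + pos + len, 0, padded - len);
    return pos + padded;
}
```

With at most three padding bytes per string, an explicit loop would do the same job as `memset()`; as noted in the discussion, the difference is a micro-optimisation.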
FROGGS | nice! | 11:11 | |
nwc10++ | |||
jnthn | nwc10: Did you include a patch to add -DNODEBUG? | 11:13 | |
nwc10 | jnthn: no. | ||
can I delegate that to FROGGS? :-) | |||
FROGGS | wut? | 11:14 | |
so, every assert() should be #ifndef'd by NODEBUG ? | |||
is it common to use NODEBUG instead of the opposite? | 11:15 | ||
nwc10 | FROGGS: no, look at /usr/include/assert.h | 11:16 | |
jnthn | FROGGS: No, you just need to add the define | ||
nwc10 | if you define NDEBUG | ||
jnthn | I dunno how things are on MSVC by default | ||
nwc10 | not NODEBUG | ||
FROGGS | ahh | ||
jnthn | OK | ||
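For reference, the standard macro is `NDEBUG`, and `<assert.h>` redefines `assert()` according to its current state each time the header is included. The sketch below shows that behaviour inside one file; the helper names are made up for the example, and defining the macro in the source stands in for passing `-DNDEBUG` to the compiler.

```c
#include <stdio.h>

static int evaluations = 0;

/* Counts how often an assert() actually evaluates its argument. */
static int sanity_check(void) { evaluations++; return 1; }

#define NDEBUG              /* same effect as -DNDEBUG on the command line */
#include <assert.h>
static int run_with_ndebug(void) {
    evaluations = 0;
    assert(sanity_check()); /* expands to ((void)0): argument not evaluated */
    return evaluations;
}

#undef NDEBUG
#include <assert.h>         /* re-including restores the checking assert() */
static int run_without_ndebug(void) {
    evaluations = 0;
    assert(sanity_check()); /* evaluated; would abort if it returned 0 */
    return evaluations;
}
```

Because the argument is not evaluated at all under `NDEBUG`, an `assert()` must never contain a side effect the program depends on.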
CORE.setting memory consumption is less :) | |||
nwc10 | that was the plan. | ||
"I love it when a plan comes together" | 11:17 | ||
FROGGS | and it gets built in like no time? | ||
nwc10 | (I'll skip the cigar. Could i just have a whisky instead?) | ||
but not right now as I feel a bit crap | |||
FROGGS | no cigar no whisky, sir | ||
jnthn | nwc10: And...you finally broke 1 minute CORE.setting build time on my box \o/ | ||
nwc10 | Awesome | ||
you'll have lots of time for bloggage :-) | |||
jnthn | And a little time off startup too | ||
Down to 0.24s. | |||
nwc10 | Moose still wins? | 11:18 | |
jnthn | By 0.02s. | ||
nwc10 | getting there. But would it be easier to send patches to knobble Moose? :-) | ||
don't tell anyone | |||
jnthn | lemme do a spectest time :) | ||
FROGGS | jnthn: what perl is that you're using for the moose test? | 11:19 | |
FROGGS guesses it is an ActiveState 5.14 or so | 11:20 | ||
jnthn | Another 7s off | 11:24 | |
289s. | |||
And it may be the asserts are running | |||
nwc10 | probably | ||
and those non-inline functions | |||
which could probably go | |||
jnthn | FROGGS: yeah, looks like | 11:25 | |
FROGGS | jnthn: before you blog about MoarVM beating Moose, let us compare the timings against a 5.18 or better, okay? :o) | 11:26 | |
jnthn | nwc10++ | ||
This is great work. | |||
Also I see 5MB more off memory use of a single Rakudo. | |||
FROGGS | and then again, our threads suck *g* | ||
FROGGS hides | |||
jnthn | FROGGS: We're not beating it yet. :) | ||
FROGGS: But yes, if we're going to say that it'll need careful analysis. | 11:27 | ||
FROGGS: What's the status on char name lookup stuff? You gave up debugging it for now? | |||
FROGGS | yes, I gave up | 11:28 | |
:/ | |||
maybe I can continue after a diakopterish brainstorming | |||
lunch & | |||
nwc10 | got through NQP on an x86 Debian system | 11:29 | |
so runs on both kinds of OS and both kinds of architecture :-) | |||
jnthn | ;) | ||
nwc10 | I don't know how good our "timing" setup is. | 11:38 | |
In that, objects are (I think) often larger than a CPU cache line | |||
and flags is accessed often, but serialisation context not so often | 11:39 | ||
so I wondered if putting that union at the start would help reduce L1 cache misses a tad | |||
jnthn | Hm, interesting thought... | 11:40 | |
nwc10 | x86 Linux max RSS for setting down by 6.8% | ||
jnthn | It'd be sort-of nice to do away with ->sc entirely | ||
I'm just not sure how best to do it. | 11:41 | ||
I mean, we'd have to maintain a huge lookup table somewhere. | |||
nwc10 | I don't know if it wins you anything | ||
jnthn | Could use a flag bit for "is it in an SC" I guess | ||
It wins you another pointer off every runtime-created object... | |||
nwc10 | I don't know if the forwarder can share memory with anything else, during GC runs | 11:42 | |
jnthn | Well, you evacuate the entire object before writing ->forwarder into the old one... | 11:45 | |
nwc10 | OK. You still need the falgs | ||
flags | |||
jnthn | Right | ||
But you can use the body | 11:46 | ||
nwc10 | so you're talking about re-using the space for the REPR or the STABLE | ||
or yes, the body | |||
jnthn | You know a collectable is either an object or an STable. | ||
So then there's always at least a pointer's worth in the body, even if it's a type object | |||
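The idea of letting the forwarding pointer reuse body space might look something like the sketch below. It is an illustration under assumptions, not MoarVM's layout: the `Header` fields, the `CF_FORWARDED` flag, and the `evacuate` helper are hypothetical. The flags stay in the header, and the first pointer-sized slot of the old body is overwritten only after the whole object has been copied.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag: the object has been moved, follow body.forwarder. */
enum { CF_FORWARDED = 1 };

typedef struct {
    uint16_t flags;
    uint16_t size;          /* total size in bytes, header included */
} Header;

typedef struct {
    Header header;
    union {
        void *forwarder;    /* meaningful only once CF_FORWARDED is set */
        void *first_slot;   /* ordinary body data before that */
    } body;
} Object;

/* Copy the object into to_space, then leave a forwarding address behind. */
Object *evacuate(Object *old, void *to_space) {
    Object *copy;
    assert(old->header.size >= sizeof(Object));

    if (old->header.flags & CF_FORWARDED)
        return old->body.forwarder;

    copy = memcpy(to_space, old, old->header.size);
    old->body.forwarder = copy;        /* old body is dead from here on */
    old->header.flags |= CF_FORWARDED;
    return copy;
}
```

This only works because every collectable is assumed to have at least one pointer's worth of body, which is the point made above about type objects.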
A full NQP build is now 50s and a Rakudo one 92s on my box. | 11:47 | ||
nwc10 | I seem to be fighting with rsync on the x86 machine | ||
(which is cpan.etla.org/ ) | |||
so I can't get an idea of CPU change | |||
jnthn | + MVM_CF_IN_GEN2_ROOT_LIST = 32, | 11:50 | |
+ | |||
+ /* GC has found this object to be live. */ | |||
+ MVM_CF_SECOND_GEN_LIVE = 64 | |||
Mighta been more consistent to name the second flag MVM_CF_GEN2_LIVE :) | |||
nwc10 | yes, I didn't think of that. | ||
nwc10 goes for noms | 11:51 | ||
jnthn | Really happy about the cleanup in + MVM_CF_IN_GEN2_ROOT_LIST = 32, | 11:52 | |
+ | |||
+ /* GC has found this object to be live. */ | |||
argh | |||
Really happy about the cleanup in 3ca1fcd74 also. | 11:53 | ||
timotimo | o/ | 12:10 | |
i see we're improving speed and memory usage today | |||
when exactly do we need to look at the sc anyway? only when we want to serialize out some stuff, right? | 12:21 | ||
jnthn | There's the SC write barriers too | 12:23 | |
timotimo | ah, that's for repossession? | ||
jnthn | But that doesn't need the SC itself | ||
At least, in the common "it's not in one" case | 12:24 | ||
timotimo: Correct | |||
timotimo | the most optimal thing we could do is to only store the sc in the objects if we know we're meaning to serialize out what we're building at the end :P | ||
jnthn | Anyway, I think that there's probably bigger wins to be had for now. | 12:32 | |
timotimo | probably | ||
timotimo casts Summon Bigger Fish | 12:33 | ||
jnthn needs to do a few more $dayjob things for a bit | 12:37 | ||
tadzik | timotimo: ask and you shall receive :P 3.bp.blogspot.com/_QcNJfTQTukE/TK-7...trade4.JPG | 12:38 | |
timotimo | hehe. | ||
nwc10 | jnthn: problem I see with "big hash" is that it's probably 3 pointers of storage for every SC pointer saved | 12:40 | |
best hack I could think of over lunch was | |||
0) wait until we have inline functions so that we can assert a bunch of sanity | |||
1) for nursery objects, store the SC in the pointer before the object. This avoids issues about moving | 12:41 | ||
2) for oversized objects, store the SC in the pointer before the object. This just makes life simpler | |||
3) for everything gen2, have two sets of storage, one for things with SC, one for things without | |||
oh, and typing this in, I guess | |||
"store the SC in the pointer before it" which is simpler than the plan I had | |||
but I'm not going to hack on this this month. Or maybe next. | 12:42 | ||
jnthn | *nod* | 12:46 | |
Yeah...it's trickier to get a win on this. | |||
nwc10 | but it feels like a hack | 12:52 | |
and there's probably cleaner low hanging fruit to be plucked | 12:53 | ||
jnthn | indeed | ||
Pro tip: before wondering why your benchmark is so slow, make sure it doesn't contain an infinite loop. | 14:14 | ||
Even the CLR doesn't do *those* fast... | |||
nwc10 | :-) | ||
14:19
jnap joined
|
|||
masak .oO( why does that joke never get old? because it never terminates! ) | 14:20 | ||
jnthn | Typically, I managed to get the infinite loop in the locking code, not the lock-free code... | 14:36 | |
moritz | lock-free xor loop-free | 14:39 | |
14:49
ggoebel1114 joined
|
|||
jnthn | walk & | 14:51 | |
timotimo | run & | 14:53 | |
tadzik | sleep & | 15:04 | |
FROGGS | work & | 15:05 | |
masak | eat, pray, love & | 15:11 | |
jnthn | .. | 15:22 | |
moritz | &&& | ||
jnthn now has beer in le fridge again :) | 15:24 | ||
nwc10 | yay! | 15:30 | |
I have me in the bed, because that feels the best place | |||
jnthn: had a thought on the way home - hiding the SC *before* the collectable doesn't work, as the code needs to march through the nursery based on size | |||
but hiding it at the end would work | |||
the complication is that you have to know at allocation time whether you need a space for a SC | 15:31 | ||
(could just arrange for everything in the nursery to have space for one, and tidy up at gen2 promotion) | |||
all feels like a hack | |||
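A sketch of why a trailing slot keeps the nursery walkable: the walk steps from header to header using each object's size, so an optional SC slot counted inside that size does not disturb it. Everything here (the `Nursery` struct, `CF_HAS_SC_SLOT`, the helper names) is hypothetical and only illustrates the idea floated above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag: the last pointer-sized slot of the object holds its SC. */
enum { CF_HAS_SC_SLOT = 1 };

typedef struct {
    uint16_t flags;
    uint16_t size;              /* total bytes, including any trailing slot */
} Header;

typedef struct {
    char *alloc;                /* next free byte */
    char *limit;                /* end of the nursery */
} Nursery;

#define ROUND_UP(n) (((n) + sizeof(void *) - 1) & ~(sizeof(void *) - 1))

/* Bump-allocate an object; the decision about the SC slot is made here. */
Header *nursery_alloc(Nursery *n, size_t payload, int with_sc) {
    size_t size = ROUND_UP(sizeof(Header) + payload)
                + (with_sc ? sizeof(void *) : 0);
    Header *h;
    if (size > UINT16_MAX || size > (size_t)(n->limit - n->alloc))
        return NULL;
    h = (Header *)n->alloc;
    h->flags = with_sc ? CF_HAS_SC_SLOT : 0;
    h->size = (uint16_t)size;
    n->alloc += size;
    return h;
}

/* Address of the trailing SC slot, or NULL if the object has none. */
void **sc_slot(Header *h) {
    if (!(h->flags & CF_HAS_SC_SLOT))
        return NULL;
    return (void **)((char *)h + h->size - sizeof(void *));
}

/* Walk the nursery purely by size, as the collector needs to. */
size_t count_objects(char *start, char *end) {
    size_t count = 0;
    while (start < end) {
        Header *h = (Header *)start;
        assert(h->size >= sizeof(Header));
        start += h->size;
        count++;
    }
    return count;
}
```

A slot placed before the header would break `count_objects`, since the walker would land on the slot rather than on a header. The cost of the trailing variant is that the allocator has to be told up front whether a slot is wanted.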
improving code gen, unblocking Panda and unblocking Star seem to be more important | |||
$ | 15:32 | ||
jnthn | Yeah. If it's speed and memory wins we're after, the Int improvements are more worthwhile. | 15:34 | |
And yes, code-gen improvements. | 15:35 | ||
timotimo | lol, i jogg'd | 15:37 | |
FROGGS | *g* | 15:57 | |
16:20
colomon joined
17:03
rurban_ joined
17:46
cognominal joined
|
|||
dalek | MoarVM: 22cfff8 | nicholas++ | src/gc/collect.c: In process_worklist(), assert() that various assignments are unneeded. Also assert that if the pointer is in to-space already, it is never marked as gen2. |
MoarVM: e05bf62 | jnthn++ | build/setup.pm: Define NDEBUG in optimized builds. Means that we will not check assert()s introduced in recent commits in optimized builds. |
18:14 | |
18:14
dalek joined
|
|||
MoarVM: 0f2077b | jnthn++ | src/ (5 files): Rename a flag for consistency. |
18:18 | ||
jnthn | nwc10: Failed to find a way your work wasn't an improvement, so it's in. Also did the NDEBUG thing and the flag rename I mentioned earlier. | 18:20 | |
nwc10 | Cool, thanks for tidying it up | 18:26 | |
jnthn | Thanks for doing the hard work :) | 18:28 | |
nwc10 | thanks for the couple of key insights and fixes that got me unstuck | ||
18:32
FROGGS joined
|
|||
PerlJam idly wonders if TPF would sponsor work on moarvm via the hague grants | 18:35 | ||
(of course, that presumes also that there is some hague money left) | 18:36 | ||
19:04
camelia joined
|
|||
nwc10 | does anyone have a good idea of what things block Panda working on Moar? And what things block Star? | 19:17 | |
FROGGS | NativeCall blocks Star | ||
nwc10 | it's a bit frustrating that Moar isn't able to take its place alongside Parrot as a first class backend for "end users" | ||
FROGGS | and I'd guess that openpipe might block panda | 19:18 | |
nwc10 | ah | ||
and sockets? | |||
FROGGS | ohh, yeah | ||
jnthn | yeah, and sockets | 19:22 | |
And our handful of failing tests | 19:23 | ||
nwc10 | so it's not massive, but 3 parts are quite hard | 19:27 | |
and likely to scare people away from attempting them | |||
FROGGS | I hope that openpipe will be done this weekend | 19:28 | |
nwc10 | and then Panda can be tested with local tarballs? | ||
FROGGS | I dunno | 19:29 | |
nwc10 | OK. I'll stop asking stupid questions :-) | ||
FROGGS | hehe | ||
:o) | |||
dalek | MoarVM/jnthn_bigint_opt: dd8c35e | jnthn++ | src/ (3 files): Start holding mp_int * in P6bigint. |
21:03 | |
timotimo | but i had code to do that, too | 21:04 | |
you didn't need to re-do that | |||
jnthn | oh... | ||
I...didn't think it was working? | 21:05 | ||
timotimo | too late now | ||
it works all right | |||
it's essentially the same thing you have now plus a bit of WIP stuff that doesn't work at all yet | |||
jnthn | oh... | ||
OK. | |||
jnthn is trying to do little steps towards working | |||
as in, working 32-bit storage | 21:06 | ||
timotimo | oh | 21:07 | |
i have that, too | |||
sorry about forgetting that | |||
er ... that is ... i have a union that does the thing on 32bit and 64bit systems and little/big endian | |||
jnthn | Yeah, I more meant actually storing stuff that way? | 21:08 | |
timotimo | yeah, that i do not have yet | 21:09 | |
jnthn | Anyway, gimme a bit more time...I'll see if I can get this working-ish. | ||
timotimo | it seemed like a huge blob of work without a good handle to pull at | ||
or a loose thread to start unraveling the whole thing at | |||
jnthn | Yeah, that's kinda why I've jumped in. I could see it was being a frustrating amount of effort, when I'd hoped it would be a relatively isolated and not horrible task. | 21:11 | |
dalek | MoarVM/wip-openpipe: a594b94 | (Tobias Leich)++ | src/io/procops.c: disable hacky :rp/:wp switch |
||
MoarVM/wip-openpipe: 03647f4 | (Tobias Leich)++ | src/io/procops.c: openpipe spawns using a shell |
|||
MoarVM/wip-openpipe: 9652087 | (Tobias Leich)++ | src/io/procops.c: disable ipc communication, this blows up on windows |
|||
MoarVM/wip-openpipe: e92e012 | (Tobias Leich)++ | src/ (2 files): use uv_process_close on windows (instead of waitpid) |
|||
FROGGS | C:\MoarVM>perl6-m -e "say +qx{dir}.lines" | 21:12 | |
39 | |||
$ perl6-m -e 'say +qqx{ls}.lines' | |||
49 | |||
jnthn | \o/ | 21:13 | |
FROGGS | jnthn: I decided to make qx{} work and care about IO::Pipe later | ||
jnthn | ok :) | ||
FROGGS | so we can only read its stdout atm | ||
I'll deal with upcoming issues when panda needs more functionality :o) | 21:14 | ||
timotimo | sounds great, though! | ||
FROGGS | would be nice if someone could review the branch in MoarVM and rakudo | 21:15 | |
github.com/MoarVM/MoarVM/compare/wip-openpipe | |||
github.com/rakudo/rakudo/compare/wip-openpipe | |||
I am too tired to spot any issues | 21:17 | ||
so everybody please be extra careful | 21:18 | ||
dalek | MoarVM/jnthn_bigint_opt: 4078a7c | jnthn++ | src/ (3 files): Introduce union for holding smallint in P6bigint. Not using the smallint part yet. |
21:37 | |
MoarVM/jnthn_bigint_opt: 8e9761e | jnthn++ | src/6model/reprs/P6bigint. (2 files): Teach a few things to handle smallint case. We never actually create that case yet, so really we can only test the checking code doesn't get false positives so far. |
|||
21:39
lizmat_ joined
|
|||
dalek | MoarVM/jnthn_bigint_opt: b930b29 | jnthn++ | src/math/bigintops.c: Make get_bigint complain if it sees a smallint. |
22:12 | |
MoarVM/jnthn_bigint_opt: 182960c | jnthn++ | src/6model/reprs/P6bigint.c: Start producing smallint for empty and box cases. At this point, everything fails in an orderly way, with "Incomplete smallint handling!". The task from here is just to fix things until we never see this error again. :-) |
|||
MoarVM/jnthn_bigint_opt: d1246e5 | jnthn++ | src/math/bigintops.c: Handle smallint in bigint to/from string. |
22:29 | ||
jnthn | That's the first 5 of 60-bigint.t passing again. :) | 22:30 | |
timotimo | i would love to continue your work now, but i've contracted quite a sizable headache today | 22:32 | |
jnthn | timotimo: I'll do a little bit more, then :) | ||
timotimo | so i'm taking the next tram home and getting a bunch of sleep | ||
jnthn | timotimo: Get well soon | ||
timotimo | i'll give it my all | 22:33 | |
actually i'll have to wait another 10 minutes | 22:34 | ||
so i'll let dalek make me happy with good news :3 | |||
dalek | MoarVM/jnthn_bigint_opt: 5e5e592 | jnthn++ | src/6model/reprs/P6bigint.h: Fix "fits in 32 bits" test. |
23:28 | |
MoarVM/jnthn_bigint_opt: 90160f3 | jnthn++ | src/math/bigintops.c: Add smallint handling to various basic ops. Gets us up to 8 passing tests in NQP's 60-bigint.t. Also adds much of the infrastructure we will need to do the rest. |
|||
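The smallint approach in these commits can be sketched along the following lines. This is not the code from the branch: `IntBody`, `SMALLINT_FLAG`, and the helper names are hypothetical, a plain `void *` stands in for `mp_int *`, and the sketch assumes the flag pattern never coincides with the bits of a real pointer, which the real union has to guarantee per word size and endianness.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical marker saying "this body holds a 32-bit value, not a pointer". */
#define SMALLINT_FLAG 0xFFFFFFFFu

typedef union {
    struct { uint32_t flag; int32_t value; } smallint;
    void *bigint;                       /* stand-in for mp_int * */
} IntBody;

/* The "fits in 32 bits" test, done in 64-bit arithmetic. */
static int fits_in_32_bits(int64_t v) {
    return v >= INT32_MIN && v <= INT32_MAX;
}

static int is_small(const IntBody *b) {
    return b->smallint.flag == SMALLINT_FLAG;
}

static void store_small(IntBody *b, int32_t v) {
    b->smallint.flag = SMALLINT_FLAG;
    b->smallint.value = v;
}

/* Fast path for addition. Returns 1 and fills result when both operands are
 * small and the sum still fits; returns 0 so the caller can fall back to the
 * big integer path otherwise. */
static int add_small(const IntBody *a, const IntBody *b, IntBody *result) {
    int64_t sum;
    if (!is_small(a) || !is_small(b))
        return 0;
    sum = (int64_t)a->smallint.value + b->smallint.value;
    if (!fits_in_32_bits(sum))
        return 0;
    store_small(result, (int32_t)sum);
    return 1;
}
```

Doing the arithmetic in 64 bits and range-checking afterwards avoids signed overflow in the 32-bit fast path.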
jnthn | timotimo: Hopefully I've done enough so far to show a way forward. | 23:30 | |
timotimo: Some things we'll likely want to take the force_bigint approach on. | 23:31 | ||
timotimo | thank you, that seems like a good foundation to work with | 23:32 | |
i'll hit the hay now | |||
jnthn | Rest well, feel better. | 23:33 | |
timotimo | that's the plan, anyway :) | ||
jnthn | :) | ||
timotimo | how do you feel about using the mp_foobar_d forms if we have a big int and a smallint? | 23:34 | |
it *might* not be safe depending on the data format used for mp_digit, though | 23:35 | ||
but we might get around allocating a bigint in a bunch of cases where we do things with one big and one small int | |||
imagine a for loop that adds 1 in every iteration and starts in the big integer range, for example | 23:36 | ||
anyway. mandatory bedrest now :P | 23:37 | ||
jnthn | timotimo: We *could* but I thought the code was already involved enough with the two cases. | 23:40 | |
timotimo | that's true. there'd have to be measurements. | ||
jnthn | timotimo: And I think that case will be quite rare too | ||
timotimo | maybe it'd also be something worth considering in the specializer | 23:41 | |
if that's possible at all in a non-tracing jit type of deal | |||
jnthn | Doing a loop up to 2147483647 is rather large... | ||
timotimo | oh well. now that you phrase it that way :) | ||
timotimo disappears in a puff of optimism | |||
jnthn | o/ | 23:42 |