dalek | arVM: a4a5e60 | jnthn++ | src/ (3 files): Mechanism for HLL handler of 'method not found'. | 00:09 | |
00:13
FROGGS_ joined
timotimo | I'm no longer so hopeful for my seccomp approach tbh | 00:35 | |
I would have to fork libuv or trap and emulate system calls | 00:36 | ||
it may actually be more sensible to try to make the white list compiler thing work instead | |||
though I think I will keep the fork and communicate operations I made so far | 00:37 | ||
diakopter | timotimo: what is the goal of the effort? | 00:40 | |
timotimo | running user supplied code without fear | 00:41 | |
and getting data out of the process easily | 00:42 | ||
the whitelisting compiler thing would be extra beneficial in that it could work with all backends | 00:57 | ||
jnthn | I think we probably want to do it at VM level... | 00:59 | |
timotimo | yeah :\ | ||
i could also combine a really strict selinux policy with a pretty lenient seccomp whitelist | 01:00 | ||
though of course capsicum would be much more ideal | |||
jnthn | Time for some sleep here | 01:06 | |
'night o/ | |||
diakopter | o/ | 01:10 | |
01:43
jnap joined
02:44
jnap joined
02:59
jnap joined
04:35
jnap joined
04:44
FROGGS joined
05:36
jnap joined
06:37
jnap joined
07:37
jnap joined
08:29
FROGGS joined
08:38
jnap joined
08:46
FROGGS joined
09:04
crab2313 joined
10:33
Mouq joined
10:40
jnap joined
11:40
jnap joined
12:08
FROGGS joined
12:26
odc joined
dalek | arVM: dd7ddb4 | jnthn++ | src/core/ (3 files): Factor out HLL symbol lookup, add to public API. | 12:29 | |
12:41
jnap joined
12:48
Mouq joined
timotimo | do we already have a proper datastructure in place to handle freeing pointers of things that were allocated using malloc? | 13:38 | |
jnthn | timotimo: Well, many things are attached to a garbage collectable object (so its gc_free or gc_cleanup does it) | 13:40 | |
timotimo | er, i missed the key point :P | ||
"in a separate thread" | |||
what percentage of the whole run time is spent inside free()? you seemed to have a profile some time recently | 13:41 | ||
13:42
jnap joined
jnthn | Depends enormously on what you're profiling | 13:42 | |
timotimo | well, that makes sense. | ||
jnthn | When I do something that creates loads of Ints, it's malloc/free of mp_int that dominates. | 13:43 | |
I've been pondering that one a bit. | |||
timotimo | dominates? | ||
as in, on the top of the leaderboards? | |||
jnthn | The dominator is usually on top... | 13:44 | |
timotimo | wow. | ||
jnthn | It's the biggest source of free/malloc, I mean. :) | ||
timotimo | oh | ||
yeah, well ... :) | |||
jnthn | I had an idea on it though | ||
We could define it as a union of mp_int and two int32s. | 13:45 | ||
And set one of those int32s to MAX_VAL to indicate "this is not a bigint" | |||
And use the other one to store the value | |||
So, small numbers are just stored | 13:46 | ||
timotimo | that could conceivably help | ||
how likely is it that MAX_VAL ends up there in a legit value? | |||
jnthn | Thing is, if we do math on them as 64-bit numbers, you can always check the result doesn't overflow. | ||
Very, very unlikely given it'd need the pointer to have those bits set | |||
timotimo | right. | ||
jnthn | If we are careful and define the struct so that the MAX_VAL flag is overlapping the LSB of the pointer, then "no chance" in so far as it'd mean we got back non-aligned memory. | 13:47 | |
timotimo | ah, yes, that'd be helpful | ||
checking for a 64 bit overflow should be relatively cheap compared to even the tiniest amount of data retrieval from ram, no? | 13:48 | ||
jnthn | Well, thing is, what you're actually looking for is "is this bigger/smaller than a 32-bit number could hold"? | ||
If so then promote to big int. | |||
Well, it's a heck of a lot cheaper than a malloc | 13:49 | ||
And yeah, those sorts of things pipeline pretty well | |||
And probably we get a decent hit rate on branch prediction. | |||
So the check may well be almost "free", like our write barriers hopefully almost are. | |||
timotimo | good point about the predictor | 13:52 | |
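A minimal sketch of the pointer-or-small-int union described above, assuming a little-endian or 32-bit target so that the flag overlaps the low bits of the pointer. The names `IntBody`, `SMALL_FLAG`, `body_set_small`, `body_is_small` and `body_set_big` are hypothetical, and the `mp_digit` typedef is only a stand-in for libtommath's own definition.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for libtommath's types; mp_digit's real width is an assumption. */
typedef uint32_t mp_digit;
typedef struct { int used, alloc, sign; mp_digit *dp; } mp_int;

/* Hypothetical name for the "MAX_VAL" marker. The value is odd, so a
 * properly aligned pointer cannot have it in its low 32 bits. */
#define SMALL_FLAG INT32_MAX

typedef union {
    mp_int *big;        /* separately allocated bigint, for large values only */
    struct {
        int32_t flag;   /* SMALL_FLAG means "value holds the number" */
        int32_t value;  /* the number itself when it fits in 32 bits */
    } small;
} IntBody;

/* Store a small number directly in the body, with no allocation. */
static void body_set_small(IntBody *b, int32_t v) {
    b->small.flag = SMALL_FLAG;
    b->small.value = v;
}

/* Check the flag to decide which union member is live. */
static int body_is_small(const IntBody *b) {
    return b->small.flag == SMALL_FLAG;
}

/* Store a pointer to a real bigint; an aligned pointer never looks small. */
static void body_set_big(IntBody *b, mp_int *p) {
    b->big = p;
    assert(!body_is_small(b));
}
```

Every reader of the body first calls `body_is_small` and only then touches `small.value` or `big`, which is the "non-ambiguous way to know what we have" that the scheme depends on.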
ttyl | 13:56 | ||
jnthn | o/ | 13:58 | |
13:58
krunen joined
14:20
FROGGS joined
14:43
jnap joined
15:06
V_S_C joined
V_S_C | just noticed hoelzeo++ fixed dir() on moarvm, so tried Panda bootstrap afresh | 15:10 | |
jnthn wonders what the next failure is :) | |||
V_S_C | after ==> Fetching File::Find | 15:12 | |
moar.exe just keeps consuming CPU & getting more memory (gets memory very slowly, much unlike other processes) | 15:13 | ||
jnthn | hm | ||
jnthn has no idea what that could be | 15:14 | ||
I suspect somebody will debug it sooner or later and get it down to a smaller test case, though. | |||
V_S_C | @jnthn, after your last commit to nqp, nmake rakudo ended with unexpected version of QRegex.nqp, so I used the Jan release of rakudo instead of bleeding edge from git | 15:17 | |
I'll try again with updated rakudo | |||
diakopter | jnthn: why not always put the whole mp_int inline the bigint body? | 15:18 | |
jnthn | diakopter: Hm, only immediate worry is how well it'd cope with moving...probably ok | ||
diakopter | [then make a new one to store mutations to] | 15:19 | |
jnthn | diakopter: That sounds very do-able though... | ||
diakopter | HOW BIG IS IT | ||
er | |||
hpone | |||
how big is it | |||
jnthn | Not sure right off... | 15:20 | |
diakopter guesses less than 100 bytes. I'm sure it mallocs on its own for its data array | |||
jnthn | typedef struct { int used, alloc, sign; mp_digit *dp; } mp_int; | ||
diakopter | probably worth making a few sized pools for those dp arrays | 15:22 | |
jnthn | Well, maybe, but I think the other plan I mentioned of not even doing bigints at all for things that fit into 32-bit may be the bigger win. | ||
We may be able to do both of these, of course... | 15:23 | ||
All we need is a non-ambiguous way to know what we have | |||
diakopter | lots of checking, but probably yeah. it's what JS JITs do anyway | ||
except they are tagged, so it only uses 4 bytes total | 15:24 | ||
jnthn | The common case of Perl 6 Int is not small | ||
uh | |||
The common case of Perl 6 Int is not big | |||
diakopter | (and 31 bits of int) | ||
timotimo | jnthn: i have an idea that may help a tiny bit | ||
do we know in advance how much memory an mp_int is going to take when we build our P6Int object? | |||
because then we could allocate the P6Int storage + the mp_int storage and only have to free 1 instead of 2 pointers | 15:25 | ||
also, the values would always be close together which may help caching stuff | |||
jnthn | Well, but there's no separate malloc for the Int itself | ||
timotimo | there is not? | 15:26 | |
well, that's fine then | |||
diakopter | well theoretically you could even know how big mp_digit *dp will be too | ||
jnthn | True | ||
I think the union of mp_int * and two int32s I mentioned before may work out best overall, though. | 15:27 | ||
Sure, we malloc an mp_int, BUT only when people have numbers that are actually big | |||
If we union an mp_int and the two int32s, we can still make it work, but we make very Int pay the size cost | |||
*every | |||
diakopter | so Int | 15:28 | |
timotimo | how big is the size cost of using the union approach? | ||
jnthn | I pasted the struct above | ||
It's 3 * 32-bit integers + 1 pointer | |||
timotimo | ah, there | 15:29 | |
jnthn | So on 64-bit that's 3 * 4 + 4 (padding) + 8 | ||
diakopter | jnthn: are you sure the bigint lib isn't already doing that optimization? | ||
timotimo | isn't using ints for used and alloc a bit excessive? | ||
jnthn | So 3 times the size if we assume "fits in int32" is the overwhelmingly common case. | ||
timotimo: Not if you want to get really big. :) | 15:30 | ||
timotimo | who has that amount of memory? | ||
jnthn | NSA? :P | ||
timotimo | right. | ||
wait, those ints are only 32 bits anyway? | |||
jnthn | .oO( well, now there's this channel being monitored... ) | 15:31 | |
timotimo: Yes | |||
diakopter | jnthn: are you sure the bigint lib isn't already doing that optimization? | ||
timotimo | and we pad to 64 bit boundaries? | ||
on a 64 bit system, that is | |||
jnthn | diakopter: yes, trivially because it can't save us the cost of allocating its mp_int struct... | ||
timotimo: Well, a pointer needs to be on an 8-byte boundary. | 15:32 | ||
diakopter | no I mean using an int32 only instead of alloc a mp_digit*dp array | ||
timotimo | ah. okay. since the two ints are followed by a pointer, we wouldn't lose anything to padding in that case | ||
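The size arithmetic quoted above (3 * 4 + 4 padding + 8 on 64-bit) can be checked with a small sketch. It assumes pointers are aligned to their own size, which holds on common platforms but is not guaranteed by C. `predicted_mp_int_size`, `check_layout` and `IntBody` are hypothetical names, and `mp_digit` is a stand-in typedef.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for libtommath's types; mp_digit's real width is an assumption. */
typedef uint32_t mp_digit;
typedef struct { int used, alloc, sign; mp_digit *dp; } mp_int;

/* The pointer-or-two-int32s body: the same size as one 64-bit pointer. */
typedef union {
    mp_int *big;
    struct { int32_t flag, value; } small;
} IntBody;

/* Three ints, rounded up to pointer alignment, followed by one pointer.
 * Assumes a pointer's alignment equals its size. */
static size_t predicted_mp_int_size(void) {
    size_t ints = 3 * sizeof(int);
    size_t align = sizeof(mp_digit *);
    size_t padded = (ints + align - 1) / align * align;
    return padded + sizeof(mp_digit *);
}

/* Compare the prediction with what the compiler actually laid out. */
static void check_layout(void) {
    assert(offsetof(mp_int, dp) + sizeof(mp_digit *) == predicted_mp_int_size());
    assert(sizeof(mp_int) == predicted_mp_int_size());
}
```

With 4-byte ints and 8-byte pointers the prediction is 24 bytes against 8 for `IntBody`, which is the "3 times the size" figure from the discussion.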
jnthn | diakopter: If it is, then the data structure here sure isn't making it convenient for them to be... | ||
diakopter | why not | 15:33 | |
timotimo | how does libtommath decide how much to allocate? is it double-the-amount-of-memory-each-time? in that case we could store the log2 of the allocated size and an int8 would suffice. the used amount wouldn't get better, though | ||
jnthn | diakopter: I'd expect to see a union somewhere in mp_int | ||
diakopter | maybe it's cheaper not to have a union and have a couple tracking fields | ||
need one for array size after all | 15:34 | ||
in fact it does need all three of those | 15:35 | ||
used alloc sign | |||
tadzik | I think I'll rewrite panda fetcher to use system's "cp" and "cp -r" where available | ||
fetcher is so much pita | |||
jnthn | diakopter: I just read through the add operation and I see no evidence there of it doing the opt | 15:36 | |
diakopter | but yeah maybe there would be a union instead of just reusing another field without another name | ||
hm | |||
jnthn | tadzik: Won't that mean maintaining two codebases where one would do? | 15:37 | |
tadzik: two codepaths, I mean... | |||
diakopter | BuildUtils | ||
er | |||
UnixUtils | |||
(haha) | |||
tadzik | jnthn: yep ;/ | 15:39 | |
V_S_C | jnthn: I'll be gr8ful to tadzik++, coz I got started with PERL from Rakudo PERL6 onward | 15:40 | |
& module installer will give me focus | |||
otherwise its more about staying out of way | |||
I mean, there's nqp, then the build employs perl5 | 15:41 | ||
& more than 1 VMs | |||
16:29
krunen joined
16:44
jnap joined
japhb | Just backlogged -- given the struct, why can't you union the entire struct with 2-3 64 bit ints, and overlay e.g. the sign field with a flag to indicate one of the 64-bit ints can be used instead of the mp_digit*? That would avoid being limited to 32-bit small ints, greatly increasing the cases when the full beast can be avoided. Or am I misunderstanding the discussion? | 16:46 | |
jnthn | japhb: The point of using 32-bit ints was you could do the math in 64-bit and have easy, portable overflow detection. | 16:49 | |
japhb | OIC | 16:50 | |
That's the point I missed. | |||
timotimo is going to tackle varint encoding now | 16:51 | ||
jnthn | varint? | 16:55 | |
timotimo | variable-sized integers | 17:01 | |
for our serialization blob | |||
jnthn | oh! | ||
yeah | |||
timotimo | :) | 17:02 | |
jnthn | curry and beer and beer & | 17:09 | |
17:29
FROGGS_ joined
timotimo | this varint type could even encode inf and -inf and NaN :P | 17:31 | |
because there's 9 different representations for the 0 | |||
and only one would ever be chosen | 17:32 | ||
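The log does not describe the actual varint format being written here, so the following is only a generic illustration of variable-sized integers: a LEB128-style encoding (7 payload bits per byte, high bit meaning "more bytes follow") with zigzag mapping for signed values. This is a swapped-in, well-known technique, not timotimo's encoding, and `zigzag`, `unzigzag`, `varint_write` and `varint_read` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Map signed to unsigned so that small magnitudes get small codes:
 * 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ... */
static uint64_t zigzag(int64_t v) {
    uint64_t shifted = (uint64_t)v << 1;
    return v < 0 ? ~shifted : shifted;
}

/* Inverse of zigzag. */
static int64_t unzigzag(uint64_t u) {
    return (int64_t)(u >> 1) ^ -(int64_t)(u & 1);
}

/* Write value into buf (at least 10 bytes); returns the bytes used. */
static size_t varint_write(uint8_t *buf, int64_t value) {
    uint64_t u = zigzag(value);
    size_t n = 0;
    assert(buf != NULL);
    do {
        uint8_t byte = (uint8_t)(u & 0x7F);
        u >>= 7;
        if (u != 0)
            byte |= 0x80;          /* continuation bit: more bytes follow */
        buf[n++] = byte;
    } while (u != 0);
    return n;
}

/* Read one value from buf; returns the bytes consumed. */
static size_t varint_read(const uint8_t *buf, int64_t *out) {
    uint64_t u = 0;
    unsigned shift = 0;
    size_t n = 0;
    uint8_t byte;
    do {
        byte = buf[n++];
        u |= (uint64_t)(byte & 0x7F) << shift;
        shift += 7;
    } while ((byte & 0x80) && shift < 64);
    *out = unzigzag(u);
    return n;
}
```

Small values take one byte and a full 64-bit value takes ten, which is the trade a serialization blob dominated by small integers benefits from.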
17:36
ggoebel1113 joined
17:45
jnap joined
timotimo | wow, that was surprisingly easy to do in the end | 17:47 | |
japhb | (surprisingly easy)++ | 17:51 | |
timotimo | i still have to hook it up properly | 17:58 | |
it seems like we have some endianness trouble on moarvm | 18:35 | ||
and the tarball is missing dyncall and libuv | |||
18:43
krunen joined
18:46
jnap joined
19:21
raiph joined
raiph | A redditor has two questions about moarvm performance: | 19:23 | |
Q1: Is the fast spectest time mainly a function of the fast startup time? | |||
Q2: How does it currently compare to other VMs once the startup cost is amortized? | |||
www.reddit.com/r/perl/comments/1w0e...rt/cey6l13 | 19:24 | ||
timotimo | raiph: there's still the worst-possible-implementation for concatenation of strings, but i have some benchmarks that show moarvm beating parrot almost all the time and jvm sometimes; these benchmarks have startup time removed completely. | 19:30 | |
let me upload them somewhere | |||
raiph | awesome, thx (the ones that are currently at an ipv6 address only, right?) | 19:31 | |
timotimo | yeah, but i'm going to remove the older moar from it | 19:33 | |
the one from before putting in inlining | |||
when you post the benchmarks, make sure to point out this is on a -O1 -g3 moar vs a -O3 parrot | |||
also, someone said that panda requires nativecall, that's not true | 19:35 | ||
.o(building a new rakudo-parrot right now to make the benchmark results) | 19:36 | ||
FROGGS | we need nativecall for rakudo*, that is all | 19:37 | |
19:43
Mouq joined
raiph | i'm wondering if I should share the link or tell folk to come to #moarvm (or #perl6) | 19:43 | |
timotimo | ho-hum. | 19:44 | |
raiph: do you have a webspace where you could upload it? | |||
raiph | feather? | ||
timotimo | that'd be excellent | ||
www.dropbox.com/s/r4lodw7eto6nker/...ds.tar.bz2 | |||
raiph | cool | 19:45 | |
timotimo | i should probably add perl5 to those to make our accomplishments look much less impressive | 19:46 | |
and also the NQPs | |||
for that, let me get off my desktop and spend the rest of my day on my laptop while my desktop crunches benchmarks | 19:48 | ||
20:37
krunen joined
20:47
jnap joined
21:00
flussence joined
[Coke] | moar is at 28709 pass today | 21:31 | |
timotimo | about 90 away from parrot, seems like | ||
okay, i got the numbers finally :) | 21:32 | ||
[Coke] | parrot is at 28807 today | ||
r: say 28807-28709 | 21:33 | ||
camelia | rakudo-parrot e51b6c, rakudo-jvm e51b6c, rakudo-moar e51b6c: OUTPUT«98␤» | ||
timotimo | raiph: are you still there? | 21:38 | |
can you upload another benchmark to the same directory and give us the links? | |||
raiph | sure | ||
timotimo | www.dropbox.com/s/py4ssvp41232beh/..._ever.html | ||
i think you can just wget that directly onto the server the way it is | 21:39 | ||
raiph: so, can has links? | 21:45 | ||
raiph | feather.perl6.nl/~raiph/25jan2014-b...kudos.html | 21:47 | |
timotimo++ | 21:48 | ||
21:48
jnap joined
timotimo | thanks for uploading these | 21:48 | |
21:50
ozmq joined
ozmq | Benchmarks good ... my Int $i = 0; while ($i++ < 1000000) {}; | 22:07 | |
A million empty blocks does 170 million opcode dispatches in moar's interp.c loop. 170 ops per loop seems like quite a lot. | 22:08 | ||
timotimo | i'm the first one to admit that the microbenchmarks are practically meaningless | 22:09 | |
22:14
ozmq joined
japhb | timotimo: Well, kinda. The microbenchmarks are like golfing a bug -- they help find how one small change (s:g/Int/int/, for instance, or using a different loop construct) can make a big performance difference. | 22:16 | |
Which often means the loser in that comparison has an implementation issue that is slowing it down more than it should be. | 22:17 | ||
timotimo | right | ||
for example: why are the rakudos so much slower at doing empty loops than the nqps? :) | |||
japhb | And in that sense magnify problems that added together make a real benchmark like rc-forest-fire mysteriously slower or faster. | ||
timotimo | well, i suppose for a regular "for" loop, they have to go via the *Iter | ||
is there a --tests-tagged for "only minibenchmarks" btw? | 22:18 | ||
if i'm to do regular benchmarks, i may not want to do all the microbenchmarks | |||
japhb | Well, there's a limited set of non-micro-benchmarks, so I would probably just specify the explicit list of ones I want. But yes, micros and non-micros should be tagged. In fact, we should add a LOT of tags. | 22:19 | |
timotimo | agreed. | ||
japhb | timotimo: Thanks for all your work on perl6-bench stuff, BTW. | 22:20 | |
timotimo | sure | 22:22 | |
it's not been that much :P | |||
japhb | Converting Richards to idiomatic Perl 6 ought to be pretty easy. For the Perl 5 version, we need to decide whether to use "native" Perl 5 OO, or something like Moose or Moops. | 22:24 | |
I haven't kept up with the relative performance there. | 22:25 | ||
22:25
ozmq joined
japhb | nwc10: As the resident perl5 expert, do you happen to have any thoughts on making that test fair and useful for Perl 5 comparison? | 22:26 | |
22:32
ozmq joined
timotimo | i must have done something terribly wrong | 22:32 | |
ah, i see | 22:33 | ||
jnthn | For those benchmarking or discussing performance, it might be worthwhile remembering that Moar doesn't do a shred of optimization of the bytecode it's fed yet, let alone any kind of JIT compilation. | 22:41 | |
timotimo | it seems like i didn't see correctly | 22:42 | |
i still get only one datapoint for nqp-parrot for parse-json | |||
japhb | jnthn: Do we even *want* it to make opt passes over the bytecode when in interpreted mode? That would kinda kill the point of mmap'ing the bytecode, and it's easier to write the optimizations in the code generators anyway, isn't it? | 22:43 | |
When going to JIT, I totally see optimizing the hell out of it. | |||
jnthn | japhb: It should certainly make opt passes on hot things; that's how runtime specialization works. | ||
japhb | jnthn: Oh, I think we were thinking in different directions. | ||
I was thinking e.g. peephole optimizations. | 22:44 | ||
jnthn | japhb: Ah, those should be done earlier | ||
japhb | Agreed. | ||
timotimo | japhb: my feeble attempt to just put in || +@all_times < 3 into the timing loop failed to produce proper results :\ | ||
japhb | timotimo: paste? | ||
Or push to a branch? | 22:45 | ||
timotimo | pushed | 22:47 | |
does parrot have a peephole optimizer? | 22:48 | ||
22:49
jnap joined
timotimo | we would be implementing the peephole optimizer in nqp, right? not in c | 22:51 | |
jnthn | timotimo: Given it's VM-specific, I could easily imagine it being done in src/mast/compiler.c or so | 22:54 | |
timotimo | but then we have to implement it in c! :P | 23:04 | |
japhb | timotimo: I think you wanted ... < 3 * $runs since each call to time_command performs $runs timings, and these get flattened into @all_times. | 23:11 | |
timotimo | ooooh | 23:13 | |
that's a good point, thanks! | |||
timotimo fixes | 23:14 | ||
jnthn: i would have to give every operation on big integers a check if it's a mp_int or stored 64bit int, right? | 23:38 | ||
and do manual overflow checking, too? | |||
i'm not sure how to do the latter in C, actually | |||
jnthn | timotimo: Stored 32-bit int. | ||
timotimo | ah | 23:39 | |
jnthn | timotimo: The point is to store 32-bit things specially, but do the math at 64-bit size. Then see if you can store the result in 32 bits or not - which is just a > and < check :) | ||
timotimo | ah | 23:41 | |
that seems easy enough | 23:42 | ||
is that guaranteed to work? | |||
i suppose for exponentiation i could still overflow it :) | |||
jnthn | Yes, you'll have to go on a case-by-case basis. | 23:44 | |
23:46
flussence joined
23:50
jnap joined