dalek MoarVM: a4a5e60 | jnthn++ | src/ (3 files):
Mechanism for HLL handler of 'method not found'.
00:09
00:13 FROGGS_ joined
timotimo I'm no longer so hopeful about my seccomp approach, tbh 00:35
I would have to fork libuv or trap and emulate system calls 00:36
it may actually be more sensible to try to make the whitelist compiler thing work instead
though I think I will keep the fork and communicate operations I made so far 00:37
diakopter timotimo: what is the goal of the effort? 00:40
timotimo running user supplied code without fear 00:41
and getting data out of the process easily 00:42
the whitelisting compiler thing would be extra beneficial in that it could work with all backends 00:57
jnthn I think we probably want to do it at VM level... 00:59
timotimo yeah :\
i could also combine a really strict selinux policy with a pretty lenient seccomp whitelist 01:00
though of course capsicum would be much more ideal
jnthn Time for some sleep here 01:06
'night o/
diakopter o/ 01:10
01:43 jnap joined 02:44 jnap joined 02:59 jnap joined 04:35 jnap joined 04:44 FROGGS joined 05:36 jnap joined 06:37 jnap joined 07:37 jnap joined 08:29 FROGGS joined 08:38 jnap joined 08:46 FROGGS joined 09:04 crab2313 joined 10:33 Mouq joined 10:40 jnap joined 11:40 jnap joined 12:08 FROGGS joined 12:26 odc joined
dalek MoarVM: dd7ddb4 | jnthn++ | src/core/ (3 files):
Factor out HLL symbol lookup, add to public API.
12:29
12:41 jnap joined 12:48 Mouq joined
timotimo do we already have a proper datastructure in place to handle freeing pointers of things that were allocated using malloc? 13:38
jnthn timotimo: Well, many things are attached to a garbage collectable object (so its gc_free or gc_cleanup does it) 13:40
timotimo er, i missed the key point :P
"in a separate thread"
what percentage of the whole run time is spent inside free()? you seemed to have a profile some time recently 13:41
13:42 jnap joined
jnthn Depends enormously on what you're profiling 13:42
timotimo well, that makes sense.
jnthn When I do something that creates loads of Ints, it's malloc/free of mp_int that dominates. 13:43
I've been pondering that one a bit.
timotimo dominates?
as in, on the top of the leaderboards?
jnthn The dominator is usually on top... 13:44
timotimo wow.
jnthn It's the biggest source of free/malloc, I mean. :)
timotimo oh
yeah, well ... :)
jnthn I had an idea on it though
We could define it as a union of mp_int * and two int32s. 13:45
And set one of those int32s to MAX_VAL to indicate "this is not a bigint"
And use the other one to store the value
So, small numbers are just stored 13:46
timotimo that could conceivably help
how likely is it that MAX_VAL ends up there in a legit value?
jnthn Thing is, if we do math on them as 64-bit numbers, you can always check the result doesn't overflow.
Very, very unlikely given it'd need the pointer to have those bits set
timotimo right.
jnthn If we are careful and define the struct so that the MAX_VAL flag is overlapping the LSB of the pointer, then "no chance" in so far as it'd mean we got back non-aligned memory. 13:47
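A minimal sketch of the union described above, assuming a 64-bit target. The names `SmallOrBig`, `SMALL_FLAG` (standing in for jnthn's MAX_VAL), `make_small`, `make_big` and `is_small` are hypothetical, and `void *` stands in for `mp_int *` so the example compiles without libtommath. The flag half is placed so that it overlaps the low 32 bits of the pointer. `SMALL_FLAG` is odd and malloc returns aligned (even) addresses, so a real pointer can never be mistaken for a small value.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 64-bit pointers, so the pointer and the two int32s
   occupy the same 8 bytes. */
_Static_assert(sizeof(void *) == 8, "sketch assumes 64-bit pointers");

/* Hypothetical marker (jnthn's MAX_VAL). It is odd, so it can never
   equal the low 32 bits of an aligned pointer. */
#define SMALL_FLAG INT32_MAX

typedef union {
    void *big;                /* stand-in for mp_int * */
    struct {
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
        int32_t value;        /* big-endian: low pointer bits come second */
        int32_t flag;
#else
        int32_t flag;         /* little-endian: low pointer bits come first */
        int32_t value;
#endif
    } small;
} SmallOrBig;

static SmallOrBig make_small(int32_t v) {
    SmallOrBig n;
    n.small.flag = SMALL_FLAG;
    n.small.value = v;
    return n;
}

static SmallOrBig make_big(void *p) {
    SmallOrBig n;
    /* malloc'd memory is aligned, so the lowest pointer bit is 0 */
    assert(((uintptr_t)p & 1u) == 0);
    n.big = p;
    return n;
}

/* Reading the other union member (type punning) is permitted in C. */
static int is_small(SmallOrBig n) {
    return n.small.flag == SMALL_FLAG;
}
```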
timotimo ah, yes, that'd be helpful
checking for a 64 bit overflow should be relatively cheap compared to even the tiniest amount of data retrieval from ram, no? 13:48
jnthn Well, thing is, what you're actually looking for is "is this bigger/smaller than a 32-bit number could hold"?
If so then promote to big int.
Well, it's a heck of a lot cheaper than a malloc 13:49
And yeah, those sorts of things pipeline pretty well
And probably we get a decent hit rate on branch prediction.
So the check may well be almost "free", like our write barriers hopefully almost are.
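The check described above can be sketched like this, assuming both operands are already known to be small. For addition, subtraction and multiplication of two int32 values the exact result always fits in an int64 (the largest magnitude is 2^62, for a product), so the 64-bit computation itself cannot overflow and one range test decides whether to promote. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* true if v can be stored as a small (32-bit) value */
static bool fits_in_int32(int64_t v) {
    return v >= INT32_MIN && v <= INT32_MAX;
}

/* Each helper computes the exact result in 64 bits. It returns true and
   stores the result if it fits in 32 bits; false means "promote to a
   bigint" (that slow path is not shown). */
static bool small_add(int32_t a, int32_t b, int32_t *out) {
    int64_t r = (int64_t)a + (int64_t)b;
    if (!fits_in_int32(r)) return false;
    *out = (int32_t)r;
    return true;
}

static bool small_sub(int32_t a, int32_t b, int32_t *out) {
    int64_t r = (int64_t)a - (int64_t)b;
    if (!fits_in_int32(r)) return false;
    *out = (int32_t)r;
    return true;
}

static bool small_mul(int32_t a, int32_t b, int32_t *out) {
    int64_t r = (int64_t)a * (int64_t)b;
    if (!fits_in_int32(r)) return false;
    *out = (int32_t)r;
    return true;
}
```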
timotimo good point about the predictor 13:52
ttyl 13:56
jnthn o/ 13:58
13:58 krunen joined 14:20 FROGGS joined 14:43 jnap joined 15:06 V_S_C joined
V_S_C just noticed hoelzeo++ fixed dir() on moarvm, so tried Panda bootstrap afresh 15:10
jnthn wonders what the next failure is :)
V_S_C after ==> Fetching File::Find 15:12
moar.exe just keeps consuming CPU & getting more memory (gets memory very slowly, much unlike other processes) 15:13
jnthn hm
jnthn has no idea what that could be 15:14
I suspect somebody will debug it sooner or later and get it down to a smaller test case, though.
V_S_C @jnthn, after your last commit to nqp, nmake rakudo ended with unexpected version of QRegex.nqp, so I used the Jan release of rakudo instead of bleeding edge from git 15:17
I'll try again with updated rakudo
diakopter jnthn: why not always put the whole mp_int inline in the bigint body? 15:18
jnthn diakopter: Hm, only immediate worry is how well it'd cope with moving...probably ok
diakopter [then make a new one to store mutations to] 15:19
jnthn diakopter: That sounds very do-able though...
diakopter HOW BIG IS IT
er
hpone
how big is it
jnthn Not sure right off... 15:20
diakopter guesses less than 100 bytes. I'm sure it mallocs on its own for its data array
jnthn typedef struct { int used, alloc, sign; mp_digit *dp;
} mp_int;
diakopter probably worth making a few sized pools for those dp arrays 15:22
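One possible reading of the "few sized pools" suggestion, as a sketch: keep a free list per power-of-two size class for digit arrays, so freed arrays are reused instead of going back to malloc/free. Everything here (`dp_pool_alloc`, `dp_pool_free`, the class limits, the `uint64_t` digit type) is a hypothetical illustration, not libtommath or MoarVM code, and it is not thread-safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t digit_t;   /* stand-in for mp_digit */

#define NUM_CLASSES 6       /* classes hold 4, 8, 16, 32, 64, 128 digits */
#define MIN_DIGITS  4

/* A freed array stores the free-list link in its own first bytes. */
typedef struct free_node { struct free_node *next; } free_node;

static free_node *free_lists[NUM_CLASSES];

/* Smallest class whose capacity holds n digits, or -1 if too large. */
static int size_class(size_t n) {
    size_t cap = MIN_DIGITS;
    for (int c = 0; c < NUM_CLASSES; c++, cap *= 2)
        if (n <= cap) return c;
    return -1;
}

static size_t class_capacity(int c) { return (size_t)MIN_DIGITS << c; }

/* Returns an array with room for at least n digits. */
static digit_t *dp_pool_alloc(size_t n) {
    int c = size_class(n);
    if (c < 0) return malloc(n * sizeof(digit_t));  /* too big: plain malloc */
    if (free_lists[c]) {                            /* reuse a freed array */
        free_node *node = free_lists[c];
        free_lists[c] = node->next;
        return (digit_t *)node;
    }
    return malloc(class_capacity(c) * sizeof(digit_t));
}

/* n must be the same digit count that was passed to dp_pool_alloc. */
static void dp_pool_free(digit_t *dp, size_t n) {
    int c = size_class(n);
    if (c < 0) { free(dp); return; }
    free_node *node = (free_node *)dp;
    node->next = free_lists[c];
    free_lists[c] = node;
}
```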
jnthn Well, maybe, but I think the other plan I mentioned of not even doing bigints at all for things that fit into 32 bits may be the bigger win.
We may be able to do both of these, of course... 15:23
All we need is a non-ambiguous way to know what we have
diakopter lots of checking, but probably yeah. it's what JS JITs do anyway
except they are tagged, so it only uses 4 bytes total 15:24
jnthn The common case of Perl 6 Int is not small
uh
The common case of Perl 6 Int is not big
diakopter (and 31 bits of int)
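For contrast, a sketch of the tagged layout mentioned above, where a small integer and a pointer share one 32-bit word and the lowest bit tells them apart. This is a generic low-bit tagging scheme assumed for illustration; JS engines differ in the details, including which tag value marks integers. Here a set low bit means "integer", leaving 31 bits for the value. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INT31_MIN (-(INT32_C(1) << 30))
#define INT31_MAX ((INT32_C(1) << 30) - 1)

static bool fits_in_int31(int32_t v) {
    return v >= INT31_MIN && v <= INT31_MAX;
}

/* Shift the value up one bit and set the tag bit. Unsigned arithmetic
   keeps the shift well-defined for negative values. */
static uint32_t tag_int(int32_t v) {
    assert(fits_in_int31(v));
    return ((uint32_t)v << 1) | 1u;
}

/* Aligned pointers (or pointer offsets) have the low bit clear. */
static bool is_tagged_int(uint32_t w) {
    return (w & 1u) != 0;
}

/* Undo tag_int: take the 31 payload bits and sign-extend from bit 30. */
static int32_t untag_int(uint32_t w) {
    uint32_t u = w >> 1;                      /* 31-bit payload */
    if (u & (UINT32_C(1) << 30))              /* sign bit of the payload */
        return (int32_t)((int64_t)u - (INT64_C(1) << 31));
    return (int32_t)u;
}
```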
timotimo jnthn: i have an idea that may help a tiny bit
do we know in advance how much memory an mp_int is going to take when we build our P6Int object?
because then we could allocate the P6Int storage + the mp_int storage and only have to free 1 instead of 2 pointers 15:25
also, the values would always be close together which may help caching stuff
jnthn Well, but there's no separate malloc for the Int itself
timotimo there is not? 15:26
well, that's fine then
diakopter well theoretically you could even know how big mp_digit *dp will be too
jnthn True
I think the union of mp_int * and two int32s I mentioned before may work out best overall, though. 15:27
Sure, we malloc an mp_int, BUT only when people have numbers that are actually big
If we union an mp_int and the two int32s, we can still make it work, but we make very Int pay the size cost
*every
diakopter so Int 15:28
timotimo how big is the size cost of using the union approach?
jnthn I pasted the struct above
It's 3 * 32-bit integers + 1 pointer
timotimo ah, there 15:29
jnthn So on 64-bit that's 3 * 4 + 4 (padding) + 8
diakopter jnthn: are you sure the bigint lib isn't already doing that optimization?
timotimo isn't using ints for used and alloc a bit excessive?
jnthn So 3 times the size if we assume "fits in int32" is the overwhelmingly common case.
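The size arithmetic above can be checked with a stand-in struct that copies the field list jnthn pasted. The real `mp_digit` type depends on how libtommath is configured, but only the pointer matters for the layout. On a typical 64-bit target this gives 24 bytes for the mp_int layout versus 8 for two int32s, the 3x factor. The struct names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t mp_digit_stub;   /* stand-in; the real type is config-dependent */

/* Same fields as the pasted mp_int definition. */
typedef struct {
    int used, alloc, sign;
    mp_digit_stub *dp;
} mp_int_layout;

/* The "small" alternative: a flag word and a value word. */
typedef struct {
    int32_t flag;
    int32_t value;
} small_layout;

static void print_sizes(void) {
    /* 3 ints (12 bytes) + 4 bytes padding so dp is 8-byte aligned + 8 */
    printf("mp_int layout: %zu bytes, dp at offset %zu\n",
           sizeof(mp_int_layout), offsetof(mp_int_layout, dp));
    printf("small layout:  %zu bytes\n", sizeof(small_layout));
}
```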
timotimo: Not if you want to get really big. :) 15:30
timotimo who has that amount of memory?
jnthn NSA? :P
timotimo right.
wait, those ints are only 32 bits anyway?
jnthn .oO( well, now there's this channel being monitored... ) 15:31
timotimo: Yes
diakopter jnthn: are you sure the bigint lib isn't already doing that optimization?
timotimo and we pad to 64 bit boundaries?
on a 64 bit system, that is
jnthn diakopter: yes, trivially because it can't save us the cost of allocating its mp_int struct...
timotimo: Well, a pointer needs to be on an 8-byte boundary. 15:32
diakopter no I mean using an int32 only instead of alloc a mp_digit*dp array
timotimo ah. okay. since the two ints are followed by a pointer, we wouldn't lose anything to padding in that case
jnthn diakopter: If it is, then the data structure here sure isn't making it convenient for them to be...
diakopter why not 15:33
timotimo how does libtommath decide how much to allocate? is it double-the-amount-of-memory-each-time? in that case we could store the log2 of the allocated size and an int8 would suffice. the used amount wouldn't get better, though
jnthn diakopter: I'd expect to see a union somewhere in mp_int
diakopter maybe it's cheaper not to have a union and have a couple tracking fields
need one for array size after all 15:34
in fact it does need all three of those 15:35
used alloc sign
tadzik I think I'll rewrite panda fetcher to use system's "cp" and "cp -r" where available
fetcher is so much pita
jnthn diakopter: I just read through the add operation and I see no evidence there of it doing the opt 15:36
diakopter but yeah maybe there would be a union instead of just reusing another field without another name
hm
jnthn tadzik: Won't that mean maintaining two codebases where one would do? 15:37
tadzik: two codepaths, I mean...
diakopter BuildUtils
er
UnixUtils
(haha)
tadzik jnthn: yep ;/ 15:39
V_S_C jnthn: I'll be gr8ful to tadzik++, coz I got started with PERL from Rakudo PERL6 onward 15:40
& module installer will give me focus
otherwise it's more about staying out of the way
I mean, there's nqp, then the build employs perl5 15:41
& more than 1 VMs
16:29 krunen joined 16:44 jnap joined
japhb Just backlogged -- given the struct, why can't you union the entire struct with 2-3 64 bit ints, and overlay e.g. the sign field with a flag to indicate one of the 64-bit ints can be used instead of the mp_digit*? That would avoid being limited to 32-bit small ints, greatly increasing the cases when the full beast can be avoided. Or am I misunderstanding the discussion? 16:46
jnthn japhb: The point of using 32-bit ints was you could do the math in 64-bit and have easy, portable overflow detection. 16:49
japhb OIC 16:50
That's the point I missed.
timotimo is going to tackle varint encoding now 16:51
jnthn varint? 16:55
timotimo variable-sized integers 17:01
for our serialization blob
jnthn oh!
yeah
timotimo :) 17:02
jnthn curry and beer and beer & 17:09
17:29 FROGGS_ joined
timotimo this varint type could even encode inf and -inf and NaN :P 17:31
because there's 9 different representations for the 0
and only one would ever be chosen 17:32
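timotimo's actual encoding is not shown in this log, so as a stand-in here is a sketch of a common variable-length integer scheme: LEB128-style, 7 bits per byte, with a zigzag mapping so small negative numbers stay short. It also shows the redundancy mentioned above: the decoder accepts several encodings of the same value, including zero, while the encoder only ever emits the shortest one. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Zigzag: 0, -1, 1, -2, 2, ... map to 0, 1, 2, 3, 4, ... */
static uint64_t zigzag(int64_t v) {
    return ((uint64_t)v << 1) ^ (v < 0 ? UINT64_MAX : 0);
}

static int64_t unzigzag(uint64_t u) {
    return (int64_t)(u >> 1) ^ -(int64_t)(u & 1);
}

/* Writes 7 bits per byte, low bits first; the high bit means "more
   bytes follow". Returns the number of bytes written (at most 10). */
static size_t varint_encode(int64_t value, uint8_t *out) {
    uint64_t u = zigzag(value);
    size_t n = 0;
    do {
        uint8_t byte = u & 0x7F;
        u >>= 7;
        if (u != 0) byte |= 0x80;
        out[n++] = byte;
    } while (u != 0);
    return n;
}

/* Reads one varint and stores the number of bytes consumed in *used.
   Returns 0 with *used == 0 on truncated or over-long input. */
static int64_t varint_decode(const uint8_t *in, size_t len, size_t *used) {
    uint64_t u = 0;
    for (size_t i = 0; i < len && i < 10; i++) {
        u |= (uint64_t)(in[i] & 0x7F) << (7 * i);
        if (!(in[i] & 0x80)) {
            *used = i + 1;
            return unzigzag(u);
        }
    }
    *used = 0;
    return 0;
}
```

Non-canonical inputs such as `80 80 00` still decode to 0, so a format like this has spare encodings that a writer never produces.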
17:36 ggoebel1113 joined 17:45 jnap joined
timotimo wow, that was surprisingly easy to do in the end 17:47
japhb (surprisingly easy)++ 17:51
timotimo i still have to hook it up properly 17:58
it seems like we have some endianness trouble on moarvm 18:35
and the tarball is missing dyncall and libuv
18:43 krunen joined 18:46 jnap joined 19:21 raiph joined
raiph A redditor has two questions about moarvm performance: 19:23
Q1: Is the fast spectest time mainly a function of the fast startup time?
Q2: How does it currently compare to other VMs once the startup cost is amortized?
www.reddit.com/r/perl/comments/1w0e...rt/cey6l13 19:24
timotimo raiph: there's still the worst-possible-implementation for concatenation of strings, but i have some benchmarks that show moarvm beating parrot almost all the time and jvm sometimes; these benchmarks have startup time removed completely. 19:30
let me upload them somewhere
raiph awesome, thx (the ones that are currently at an ipv6 address only, right?) 19:31
timotimo yeah, but i'm going to remove the older moar from it 19:33
the one from before putting in inlining
when you post the benchmarks, make sure to point out this is on a -O1 -g3 moar vs a -O3 parrot
also, someone said that panda requires nativecall, that's not true 19:35
.o(building a new rakudo-parrot right now to make the benchmark results) 19:36
FROGGS we need nativecall for rakudo*, that is all 19:37
19:43 Mouq joined
raiph i'm wondering if I should share the link or tell folk to come to #moarvm (or #perl6) 19:43
timotimo ho-hum. 19:44
raiph: do you have a webspace where you could upload it?
raiph feather?
timotimo that'd be excellent
www.dropbox.com/s/r4lodw7eto6nker/...ds.tar.bz2
raiph cool 19:45
timotimo i should probably add perl5 to those to make our accomplishments look much less impressive 19:46
and also the NQPs
for that, let me get off my desktop and spend the rest of my day on my laptop while my desktop crunches benchmarks 19:48
20:37 krunen joined 20:47 jnap joined 21:00 flussence joined
[Coke] moar is at 28709 pass today 21:31
timotimo about 90 away from parrot, seems like
okay, i got the numbers finally :) 21:32
[Coke] parrot is at 28807 today
r: say 28807-28709 21:33
camelia rakudo-parrot e51b6c, rakudo-jvm e51b6c, rakudo-moar e51b6c: OUTPUT«98␤»
timotimo raiph: are you still there? 21:38
can you upload another benchmark to the same directory and give us the links?
raiph sure
timotimo www.dropbox.com/s/py4ssvp41232beh/..._ever.html
i think you can just wget that directly onto the server the way it is 21:39
raiph: so, can has links? 21:45
raiph feather.perl6.nl/~raiph/25jan2014-b...kudos.html 21:47
timotimo++ 21:48
21:48 jnap joined
timotimo thanks for uploading these 21:48
21:50 ozmq joined
ozmq Benchmarks good ... my Int $i = 0; while ($i++ < 1000000) {}; 22:07
A million empty blocks does 170 million opcode dispatches in moar's interp.c loop. 170 ops per loop seems like quite a lot. 22:08
timotimo i'm the first one to admit that the microbenchmarks are practically meaningless 22:09
22:14 ozmq joined
japhb timotimo: Well, kinda. The microbenchmarks are like golfing a bug -- they help find how one small change (s:g/Int/int/, for instance, or using a different loop construct) can make a big performance difference. 22:16
Which often means the loser in that comparison has an implementation issue that is slowing it down more than it should be. 22:17
timotimo right
for example: why are the rakudos so much slower at doing empty loops than the nqps? :)
japhb And in that sense magnify problems that added together make a real benchmark like rc-forest-fire mysteriously slower or faster.
timotimo well, i suppose for a regular "for" loop, they have to go via the *Iter
is there a --tests-tagged for "only minibenchmarks" btw? 22:18
if i'm to do regular benchmarks, i may not want to do all the microbenchmarks
japhb Well, there's a limited set of non-micro-benchmarks, so I would probably just specify the explicit list of ones I want. But yes, micros and non-micros should be tagged. In fact, we should add a LOT of tags. 22:19
timotimo agreed.
japhb timotimo: Thanks for all your work on perl6-bench stuff, BTW. 22:20
timotimo sure 22:22
it's not been that much :P
japhb Converting Richards to idiomatic Perl 6 ought to be pretty easy. For the Perl 5 version, we need to decide whether to use "native" Perl 5 OO, or something like Moose or Moops. 22:24
I haven't kept up with the relative performance there. 22:25
22:25 ozmq joined
japhb nwc10: As the resident perl5 expert, do you happen to have any thoughts on making that test fair and useful for Perl 5 comparison? 22:26
22:32 ozmq joined
timotimo i must have done something terribly wrong 22:32
ah, i see 22:33
jnthn For those benchmarking or discussing performance, it might be worthwhile remembering that Moar doesn't do a shred of optimization of the bytecode it's fed yet, let alone any kind of JIT compilation. 22:41
timotimo it seems like i didn't see correctly 22:42
i still get only one datapoint for nqp-parrot for parse-json
japhb jnthn: Do we even *want* it to make opt passes over the bytecode when in interpreted mode? That would kinda kill the point of mmap'ing the bytecode, and it's easier to write the optimizations in the code generators anyway, isn't it? 22:43
When going to JIT, I totally see optimizing the hell out of it.
jnthn japhb: It should certainly make opt passes on hot things; that's how runtime specialization works.
japhb jnthn: Oh, I think we were thinking in different directions.
I was thinking e.g. peephole optimizations. 22:44
jnthn japhb: Ah, those should be done earlier
japhb Agreed.
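As an illustration of the kind of peephole optimization meant here, done at code-generation time rather than over mmap'd bytecode, a sketch over a made-up instruction format: it drops `set rX, rX` no-ops and jumps whose target is the very next instruction. The opcodes and struct are hypothetical, not MoarVM's. Removing instructions shifts jump targets, so the sketch renumbers them with an old-to-new index map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef enum { OP_SET, OP_ADD, OP_GOTO, OP_RETURN } Opcode;

/* OP_SET: r[a] = r[b];  OP_ADD: r[a] = r[b] + r[c];
   OP_GOTO: jump to instruction index a;  OP_RETURN: return r[a]. */
typedef struct { Opcode op; int a, b, c; } Instr;

static bool is_redundant(const Instr *ins, size_t i) {
    if (ins->op == OP_SET && ins->a == ins->b) return true;          /* set r, r */
    if (ins->op == OP_GOTO && (size_t)ins->a == i + 1) return true;  /* jump to next */
    return false;
}

/* Removes redundant instructions in place and fixes up jump targets.
   Returns the new instruction count (n unchanged if allocation fails). */
static size_t peephole(Instr *code, size_t n) {
    size_t *new_index = malloc((n + 1) * sizeof *new_index);
    if (!new_index) return n;

    /* new_index[i] = number of kept instructions before i, which is also
       where a jump to a removed instruction i should now land. */
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        new_index[i] = kept;
        if (!is_redundant(&code[i], i)) kept++;
    }
    new_index[n] = kept;

    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (is_redundant(&code[i], i)) continue;
        Instr ins = code[i];
        if (ins.op == OP_GOTO) ins.a = (int)new_index[ins.a];
        code[out++] = ins;
    }
    free(new_index);
    return out;
}
```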
timotimo japhb: my feeble attempt to just put in || +@all_times < 3 into the timing loop failed to produce proper results :\
japhb timotimo: paste?
Or push to a branch? 22:45
timotimo pushed 22:47
does parrot have a peephole optimizer? 22:48
22:49 jnap joined
timotimo we would be implementing the peephole optimizer in nqp, right? not in c 22:51
jnthn timotimo: Given it's VM-specific, I could easily imagine it being done in src/mast/compiler.c or so 22:54
timotimo but then we have to implement it in c! :P 23:04
japhb timotimo: I think you wanted ... < 3 * $runs since each call to time_command performs $runs timings, and these get flattened into @all_times. 23:11
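japhb's point, sketched in C with hypothetical names: if every call to the timing routine appends `runs` samples, then "at least three calls" means comparing the sample count against `3 * runs`, not 3. `time_batch` here is a stand-in that records placeholder timings and counts calls. The real perl6-bench loop has other exit conditions that are not shown.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SAMPLES 1024

static size_t batches_run = 0;

/* Stand-in for one time_command call: appends `runs` placeholder timings. */
static size_t time_batch(double *samples, size_t count, size_t runs) {
    for (size_t r = 0; r < runs && count < MAX_SAMPLES; r++)
        samples[count++] = 0.0;   /* a real version would store elapsed times */
    batches_run++;
    return count;
}

/* Keeps timing until at least min_batches full batches are collected. */
static size_t collect(double *samples, size_t runs, size_t min_batches) {
    size_t count = 0;
    /* compare against min_batches * runs, not min_batches */
    while (count < min_batches * runs && count < MAX_SAMPLES)
        count = time_batch(samples, count, runs);
    return count;
}
```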
timotimo ooooh 23:13
that's a good point, thanks!
timotimo fixes 23:14
jnthn: i would have to give every operation on big integers a check if it's a mp_int or stored 64bit int, right? 23:38
and do manual overflow checking, too?
i'm not sure how to do the latter in C, actually
jnthn timotimo: Stored 32-bit int.
timotimo ah 23:39
jnthn timotimo: The point is to store 32-bit things specially, but do the math at 64-bit size. Then see if you can store the result in 32 bits or not - which is just a > and < check :)
timotimo ah 23:41
that seems easy enough 23:42
is that guaranteed to work?
i suppose for exponentiation i could still overflow it :)
jnthn Yes, you'll have to go on a case-by-case basis. 23:44
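Exponentiation is one of the cases where the 64-bit trick alone is not enough, since int32 ** int32 can overflow even an int64. A sketch with hypothetical names, handling non-negative exponents only: it does square-and-multiply in 64 bits and gives up ("promote to bigint") as soon as an intermediate value leaves the 32-bit range. Because both factors of every multiplication are then within int32 range, each 64-bit product is exact.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool fits_in_int32(int64_t v) {
    return v >= INT32_MIN && v <= INT32_MAX;
}

/* Computes base ** exp if the result fits in 32 bits; false means
   "promote to a bigint" (that slow path is not shown). */
static bool small_pow(int32_t base, uint32_t exp, int32_t *out) {
    int64_t result = 1;
    int64_t b = base;
    while (exp > 0) {
        if (exp & 1u) {
            result *= b;                          /* both factors fit in int32 */
            if (!fits_in_int32(result)) return false;
        }
        exp >>= 1;
        if (exp > 0) {
            b *= b;
            /* remaining exponent bits will multiply this (growing) base
               into the result, so the result would not fit either */
            if (!fits_in_int32(b)) return false;
        }
    }
    *out = (int32_t)result;
    return true;
}
```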
23:46 flussence joined 23:50 jnap joined