|
Parrot 6.6.0 "Parrothead" | parrot.org/ | Log: irclog.perlgeek.de/parrot | #parrotsketch meeting Tuesday 19:30 UTC Set by moderator on 16 July 2014. |
|||
|
01:26
FROGGS_ joined
05:39
bighugedog joined
07:04
cooper joined
08:46
basiliscos joined
09:45
Timbus joined
10:04
bighugedog joined
13:22
rurban1 joined
13:58
bluescreen joined
16:48
Chirag joined
17:29
autark joined
17:36
rurban1 joined
17:56
Hunger joined
20:52
robertle joined
|
|||
| robertle | hi folks, I am currently trying to learn more about VMs, and have of course looked at parrot as well as lua as examples. I have written a very simple stack-based VM, and would like to try a register based one next as an exercise. | 20:54 | |
| now there is one thing I don't quite understand, and would be extremely happy if I could get some input on | |||
| a) if you e.g. compare lua with parrot, then you will notice that in lua the registers are type-tagged, so you can store any type in any register. in parrot, the basic registers are typed. I can see the case for typed registers: you do not need to go through different code paths depending on the content of a register, you know from the opcode what's in them. so less branching. at the same time, you need more space for the | 20:56 | ||
| I forgot what b) was about now :) | 20:57 | ||
| ah, obviously I am targeting a dynamically typed language as well, otherwise that would be stupid of course | 20:59 | ||
| rurban | imho lua's approach uses less space, so it's much faster on an intel cpu | 21:09 | |
| I also work on a lua-based vm: potion i.e. a lua with proper oo, traits | 21:10 | ||
| the branching is not so important if the paths are short. the cpu fetches both basic blocks | 21:11 | ||
| with int it's a simple shift, with strings a mask, with double a full qword, with other objects a mask | 21:13 | ||
| robertle | ah! I didn't know they can fetch both and then just skip some if required! | ||
| rurban | lua/potion: no type specific ops, most ops are method calls, which are cached | 21:14 | |
| robertle | I thought essentially every branch is a potential branch-miss pipeline stall unless the predictor can avoid that | ||
| rurban | no double arithmetic ops, all are method calls. arithmetic ops only for int, with a simple shift | ||
| it only stalls if it has to wait. e.g. on fpu or io or far away heap mem. | 21:15 | ||
| robertle | so type-tagged registers sound ok then! do you know what the reason for the fixed-type registers in parrot is? | 21:16 | |
| rurban | the advantages for parrot are CPS: exceptions, concurrency | ||
| robertle | hmm, I thought that a branch misprediction would stall it as well. need to read up on that | 21:17 | |
| how do these special registers help with concurrency or exceptions? | |||
| rurban | and fixed type regs are also pretty fast. you just need a bunch of ops. as in java. | ||
| the registers are stored in the continuation object | |||
| no stack switching and stack adjustments necessary | 21:18 | ||
| robertle | ok, but why does it make a difference for that whether they are of a fixed type or of a variable type? | ||
| rurban | in lua/potion you have to copy all stack values over manually on yield | ||
| not really | |||
| I prefer tagged types. but parrot went with fixed types | 21:19 | ||
| no substantial differences | |||
| fixed types allow full word ints, tagged only limited (as with a common lisp) | 21:20 | ||
| so you cannot store pointers in ints, and perl5 demands that | |||
| robertle | because you need space for the tag? | ||
| rurban | however, pointers are always aligned | ||
| yes | |||
| so lua uses the last 2 alignment bits for the type | |||
| you can only store proper ptrs then, not pointers to the middle of some buffer/string. | 21:21 | ||
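A minimal C sketch of the one-word tagging scheme described above, assuming word-sized values and at least 4-byte pointer alignment; all names are illustrative, not potion's or lua's actual API:

```c
#include <assert.h>
#include <stdint.h>

/* One-word tagged values: the low bit marks an immediate int (stored
 * shifted left by one), while properly aligned heap pointers keep
 * their low two bits clear for the tag. */
typedef uintptr_t value_t;

static value_t  tag_int(intptr_t i)  { return ((uintptr_t)i << 1) | 1; }
static intptr_t untag_int(value_t v) { return (intptr_t)v >> 1; }
static int      is_int(value_t v)    { return (int)(v & 1); }

static value_t tag_ptr(void *p) {
    /* requires proper alignment: a pointer into the middle of a
     * buffer/string would clobber the tag bits */
    assert(((uintptr_t)p & 3) == 0);
    return (value_t)(uintptr_t)p;
}
static void *untag_ptr(value_t v) { return (void *)(v & ~(uintptr_t)3); }
```

This also shows the trade-off mentioned above: a tagged int loses a bit versus the full-word ints that fixed-type registers allow.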
| robertle | ok, this is an interesting topic: assuming you want full-length contents and a tagged type, the main problem is that because of alignment etc, your tagged union gets quite big, right? | ||
| rurban | you cannot call it tagged then :) | ||
| it's a normal 2 word value struct then | |||
| lua/potion/common lisp is fast because it uses only one word | 21:22 | ||
| robertle | right, with the second being a union of possible types | ||
| rurban | perl/python/ruby are slow because their values are huge (3-10 words) | ||
| with perl having a particularly unfortunate layout | 21:23 | ||
| robertle | so one of the things I would like to try is to basically still use a full type for the register, no lower-order bits as tags, but *not* keep them together quite so much. | ||
| rurban | not keeping together? sounds bad | 21:24 | |
| robertle | so a 32-wide register file would be 32*word-sized register + 32*nibble to denote type of that word | ||
| or so | |||
| rurban | the type word pointing to a type vtable? (as in common lisp) | 21:25 | |
| robertle | right, the obvious problem would be cache effects, so you need to keep this small enough that it does not matter | ||
| rurban | you still need 1 bit or 2 for the GC | ||
| or a full word for the refcount | |||
| you can use the free bits in the type word for the gc. I guess ruby does it like this now | 21:26 | ||
| robertle | but that's part of the value, right? I mean you would only GC complex things like string, which have to be stored somewhere else and the "word" in the register would just point to it | ||
| rurban | yes, immediate values vs refs | 21:27 | |
| robertle | yeah, that sounds good. I was thinking a nibble per register for types, but I don't think I'll end up with 16 types... | ||
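The register file robertle proposes, full-word payload registers with the type tags kept separately, might look like this minimal sketch; the type list and layout are illustrative, not from any existing VM:

```c
#include <assert.h>
#include <stdint.h>

/* 32 full-word payload registers (no tag bits stolen) plus one 4-bit
 * type tag per register, packed two per byte, so all 32 tags fit in
 * 16 bytes and stay cache-friendly. */
enum { T_NIL, T_INT, T_FLOAT, T_BOOL, T_STRING /* ... */ };

typedef struct {
    uintptr_t reg[32];  /* full-word payloads */
    uint8_t   tag[16];  /* two 4-bit tags per byte */
} regfile_t;

static unsigned get_tag(const regfile_t *rf, unsigned r) {
    return (rf->tag[r >> 1] >> ((r & 1) * 4)) & 0xF;
}

static void set_tag(regfile_t *rf, unsigned r, unsigned t) {
    unsigned shift = (r & 1) * 4;
    rf->tag[r >> 1] = (uint8_t)((rf->tag[r >> 1] & ~(0xFu << shift))
                                | (t << shift));
}
```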
| rurban | 16 is pretty low. I have 25 so far | ||
| nope, 22 | |||
| robertle | so let's say in a primitive version you would have int, float, bool, nil, string, hash, vector, special | 21:28 | |
| rurban | github.com/perl11/p2/blob/p2/core/potion.h#L133 | ||
| robertle | first 4 being immediates | ||
| I'll check it out! | |||
| rurban | yes, 3-4 immediates plus a couple of basic types, the rest can fit into a generic container with known size and a vtable pointer | 21:29 | |
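A rough sketch of such a generic container: every non-immediate value starts with a vtable pointer (the type plus its methods) and a known size, so the GC and method cache can treat all objects uniformly. Field names here are guesses for illustration, not potion's actual header:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct vtable vtable_t;

/* common header shared by all heap objects */
typedef struct obj {
    const vtable_t *vt;   /* per-type method table */
    uint32_t size;        /* allocation size in bytes, for the GC */
    /* type-specific payload follows */
} obj_t;

struct vtable {
    const char *name;
    /* method slots, looked up (and cached) at call sites */
    uintptr_t (*add)(obj_t *self, uintptr_t rhs);
};
```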
| fperrad also has a luajit based vm, with perl6 support in the works. | 21:30 | ||
| github.com/fperrad/tvmjit/tree/master/src | |||
| robertle | one thing that I need to work out, and that I would be very glad to get ideas on, is how to structure the instruction set around the dynamic types | ||
| rurban | the compiler knows about immediates, the rest is done via method calls | 21:31 | |
| robertle | initially I thought that many of the ops would behave differently based on the types in the registers, with many of the options raising an exception | ||
| rurban | but some arith ops need to check the types at run-time, for undef, bigints, doubles, strings and such | 21:32 | |
| robertle | so e.g. a "multiply" would allow int and float in the input regs and store an int or float in the target, raise an exception on all other types | ||
| right, that's what I mean! so the question is do I *always* do runtime type check/branch, or do I write an instruction set which only works if the correct types are in the registers, and make the compiler guarantee that | 21:33 | ||
| rurban | yes. see e.g. github.com/perl11/p2/blob/p2/core/vm.c for the simple lua-alike version (simpler than lua, in fact) | ||
| robertle | the second sounds like less runtime overhead, but you need more opcodes | ||
| e.g. you need a separate opcode for float mult and int mult | |||
| with e.g. a bytecode format like lua's, you don't get that many... | 21:34 | ||
| rurban | you can add a type inferencer (hm style) or gradual typing to restrict types to help the compiler | ||
| generally you have slower ops for the generic case | |||
| robertle | still: do I look at the type of the input registers when executing an opcode to determine what to do, or just at the opcode itself | 21:35 | |
| right, so an option would be to generally look at the input registers, which is slow but powerful, and have a few fast ops that get executed often. | |||
| int++ or so | |||
| rurban | with arith binops I look at both types, and if both are immed. ints, do it fast (one cpu op), otherwise delegate to a method call | 21:36 | |
| github.com/perl11/p2/blob/p2/core/vm.c#L355 | 21:37 | ||
| with parrot it's similar, we just have separate ops for int, double and objects | |||
| and the objects (integer, double, big, complex, ...) do their slow logic | |||
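The binop dispatch rurban describes, one CPU op when both operands are immediate ints, otherwise delegation, could look like this hedged sketch. It reuses low-bit int tagging, and the method-call fallback is stubbed out; none of this is potion's actual code:

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t value_t;

#define TAG(i)    ((value_t)(((uintptr_t)(i) << 1) | 1))
#define UNTAG(v)  ((intptr_t)(v) >> 1)
#define IS_INT(v) ((v) & 1)

static value_t slow_add(value_t a, value_t b);  /* method-call path */

static value_t op_add(value_t a, value_t b) {
    if (IS_INT(a) && IS_INT(b))
        /* both low bits are 1, so (a - 1) + b yields the sum already
         * correctly tagged: the fast path really is a single add */
        return (a - 1) + b;
    return slow_add(a, b);  /* doubles, bigints, strings, objects... */
}

static value_t slow_add(value_t a, value_t b) {
    (void)a; (void)b;
    /* a real VM would look up "+" on a's vtable, ideally via a cache */
    return TAG(0);
}
```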
| robertle | right, when you say "I look at both types", you mean at opcode dispatch time? | 21:38 | |
| rurban | yes. the biggest trick is to be able to call methods fast, not so much the ops | ||
| opcode dispatch is either in a bytecode vm (big switch) or if jitted it's already in the instruction cache, linearly laid out | 21:39 | ||
| with parrot we look at 3-6 arg types I guess. if it's a multi method call | 21:40 | ||
| but the delegation is very slow then | |||
| it was much faster a few years back though, before multi dispatch support | 21:41 | ||
| robertle | what I was trying to do: get the next instruction (which looks like lua in structure), mask off the opcode, determine the source registers and copy the relevant nibbles over. so I end up with one word that contains both opcode and source types | 21:43 | |
| and then do a computed goto with that | |||
| obviously there would be plenty of cases that do the same, which is just another goto | 21:44 | ||
| if I see it right this is the "big computed goto" option, the other one is to just do it on the opcode, and then do something similar inside each op handler to switch on the operand types | 21:45 | ||
| not sure what's better, probably needs experimentation | |||
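For reference, a minimal per-opcode computed-goto loop (GNU C labels-as-values); a tiny stack machine is used here for brevity. The "big computed goto" variant discussed above would fold the operand-type nibbles into the table index instead, trading a much larger jump table for fewer type branches inside the handlers:

```c
#include <assert.h>
#include <stdint.h>

enum { OP_PUSH, OP_ADD, OP_HALT };

/* dispatch on the opcode alone via a label-address table */
static intptr_t run(const intptr_t *code) {
    static const void *dispatch[] = { &&op_push, &&op_add, &&op_halt };
    intptr_t stack[16], *sp = stack;

#define NEXT() goto *dispatch[*code++]
    NEXT();

op_push: *sp++ = *code++;        NEXT();   /* push inline operand */
op_add:  sp--; sp[-1] += sp[0];  NEXT();   /* pop two, push sum */
op_halt: return sp[-1];
#undef NEXT
}
```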
| I need to look at your p2 in more detail, but since it seems to execute lua bytecode, what's the core difference? | 21:48 | ||
| I mean, obviously there is going to be lots of other bits like GC and IO | |||
| or concurrency | |||
| but I mean in the actual heart of the VM, the dispatcher? | |||
| anyway, thanks a lot, I'll do some more playing and will be back. off to bed now... | 21:54 | ||
| rurban | robertle: yes, that's ertl's idea for fast vms: ops + args in one word | ||
| lua does it the same way | |||
| p2 is basically an improved lua, much simpler, pre-luajit | 21:55 | ||
| with proper oo system | |||
| www.jilp.org/vol5/v5paper12.pdf | 21:56 | ||
| robertle | eh, when you say "ops + args in one word", do you mean lua-style bytecode, or the "big computed goto" | 21:57 | |
| rurban | lua-style bytecode | 21:58 | |
| the big computed goto is the slow unjitted dispatch | |||
| the fast dispatch is the jit | |||
| See perl11.org/p2/ for the potion/p2 docs | 21:59 | ||
| robertle | right, but JIT aside the question is still whether you combine opcode and arg types into one big jump table, or do a computed goto based on opcode alone, and then something similar to sort out different behaviour based on the types | ||
| rurban | or better perl11.org/p2/html/ | ||
| robertle | ah, that explains a bit! I was confused looking at your code, expecting a 3-address machine like lua is, simply because the opcodes are so similar. but this is 2-address? | 22:00 | |
| rurban | no, it's the same 3 address machine | 22:01 | |
| robertle | dest = op src what | 22:02 | |
| what is an operand as well? | |||
| i mean, src and what are operands? | |||
| rurban | op + 2 operands, a bigger one and a smaller one | ||
| robertle | k | 22:03 | |
| argh, I *really* have to go to bed | |||
| thanks! | |||
| rurban | bye! | ||
| actually both operands use 12 bits: github.com/perl11/potion/blob/mast...odes.h#L20 | 22:04 | ||
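That layout, one 32-bit word holding an 8-bit opcode plus two 12-bit operands, can be packed and unpacked with plain shifts and masks. Field order and encoding here are an illustrative guess, not potion's exact bitfield layout:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t insn_t;

/* pack opcode + two 12-bit operands into one instruction word */
static insn_t enc(unsigned op, unsigned a, unsigned b) {
    return (insn_t)(op & 0xFF)
         | ((insn_t)(a & 0xFFF) << 8)
         | ((insn_t)(b & 0xFFF) << 20);
}

static unsigned op_of(insn_t i) { return i & 0xFF; }
static unsigned a_of(insn_t i)  { return (i >> 8) & 0xFFF; }
static unsigned b_of(insn_t i)  { return (i >> 20) & 0xFFF; }
```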
|
23:29
dalek joined
|
|||