00:33 eternaleye joined
TimToady I'd think hash functions would be a good use for polymorphism based on the storage type 00:47
if you're using conditionals for that, it's kinda smelly 00:48
but maybe the rules are different for in-lined stuff... 00:49
00:50 jnap joined
TimToady (just spouting off, haven't actually looked at what you're doing :) 00:50
01:30 lizmat_ joined 04:45 lizmat joined 05:50 woolfy joined 06:07 woolfy joined, woolfy left, colomon joined 06:58 zakharyas joined
timotimo it's kind of hard to do since the uthash is implemented with c preprocessor macros, so if the hash function to implement is conditional upon the type, some kind of conditional needs to remain in the code 07:07
we could of course put a function pointer into the struct that you have to put into the hashable entries anyway, but that is quite a lot of overhead - or at least it seems to me. 07:08
07:10 brrt joined 07:11 lizmat joined
brrt \o #moarvm 07:20
jnthn o/ brrt 07:21
nwc10 \o/
timotimo o 07:22
moritz \|o|/
brrt wow, such happiness :-)
brrt reading backlog 07:23
jnthn so joy 07:24
brrt i'm at a loss what the goal of timotimo's last commit is tbh 07:27
timotimo the one from uthash_padding? 07:28
brrt yes
doesn't seem to be used anyway?
anywhere
timotimo not yet
we are currently forced to turn all our strings into the 32bit representation so that two equal strings will hash to the same value
otherwise we wouldn't be able to use hashes any more unless we force every string that can be in 8 bits to always be in 8 bits 07:29
brrt oh….
timotimo that's also why you see the MVM_string_flatten call everywhere
that gets rid of ropes and forces the 32bit thing
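(Editor's sketch of the property described above: two equal strings have to hash to the same value whatever their storage width. This is a minimal illustration in C, not uthash's hash function and not MoarVM code. The `sketch_string` type, the `sketch_*` names, and the choice of FNV-1a are all assumptions made for the example. It hashes codepoint by codepoint, widening 8-bit storage on the fly, which keeps exactly the kind of per-type conditional discussed earlier in the log.)

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical string type with two storage widths; not MoarVM's MVMString. */
typedef enum { SKETCH_STORAGE_8, SKETCH_STORAGE_32 } sketch_storage;

typedef struct {
    sketch_storage storage;
    size_t         length;          /* number of codepoints */
    union {
        const uint8_t  *c8;         /* one byte per codepoint */
        const uint32_t *c32;        /* four bytes per codepoint */
    } data;
} sketch_string;

/* Read one codepoint, widening 8-bit storage to 32 bits. */
static uint32_t sketch_codepoint_at(const sketch_string *s, size_t i) {
    return s->storage == SKETCH_STORAGE_8 ? (uint32_t)s->data.c8[i]
                                          : s->data.c32[i];
}

/* FNV-1a over the four bytes of every widened codepoint, so the result
 * depends only on the codepoints and never on the storage width. */
static uint32_t sketch_hash(const sketch_string *s) {
    uint32_t h = 2166136261u;       /* FNV-1a offset basis */
    for (size_t i = 0; i < s->length; i++) {
        uint32_t cp = sketch_codepoint_at(s, i);
        for (int shift = 0; shift < 32; shift += 8) {
            h ^= (cp >> shift) & 0xFFu;
            h *= 16777619u;         /* FNV-1a prime */
        }
    }
    return h;
}

/* The same text in both storage widths must produce the same hash. */
static void sketch_hash_self_check(void) {
    static const uint8_t  narrow[] = { 'f', 'o', 'o' };
    static const uint32_t wide[]   = { 'f', 'o', 'o' };
    sketch_string a = { .storage = SKETCH_STORAGE_8,  .length = 3, .data = { .c8  = narrow } };
    sketch_string b = { .storage = SKETCH_STORAGE_32, .length = 3, .data = { .c32 = wide } };
    assert(sketch_hash(&a) == sketch_hash(&b));
}
```

(The alternative mentioned in the log, flattening every string to 32 bits before hashing, gives the same guarantee with no branch in the hash loop, at the cost of the wider storage.)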
brrt hmmm 07:30
jnthn also 'cus the ropes code is...uh...ropey
brrt but then you have 32 bit strings … everywhere
jnthn brrt: Right 07:31
timotimo that's true
uthash doesn't make it very easy to swap out the hash function for different things
jnthn brrt: Which in terms of "get the right answers up to codepoint level" and "constant time indexing" is a fine choice.
brrt i guess that's somewhere between acceptable and annoying
jnthn Just not for efficiency. But optimization always comes after working. :)
brrt true enough 07:32
timotimo one way to "defeat" this problem is to write a "key extractor" function that turns the string into the 32bit representation for hashing, but that's an immense amount of overhead
jnthn uthash.h is like, epic macros
timotimo maybe i can factor out the "calculate the hash bucket" part of all the hash functions and allow the user to calculate the hash bucket directly
jnthn yeah
timotimo that would make it much less hacky i think
jnthn ok, I should teach stuff
timotimo except then you can do even more terrible things :) 07:33
jnthn bbiab
brrt what today?
you want the bucket rather than the hash to be computed?
jnthn brrt: MVC web dev stuff
Nothing too thrilling ;)
brrt well, stay strong
:-p
jnthn yeah, it's going fine :) 08:39
Well, only web-y course I gotta do this month
timotimo jnthn: putting nameds into callsites ... won't that make our callsite interning strategies much less successful? 09:03
jnthn timotimo: ONLY IF WE DON'T UPDATE THE INTERN CODE
uh, ooops
timotimo was scared there for a little moment 09:04
the intern code can still only intern callsites if they have the same exact set of nameds, no?
jnthn hit caps lock instead of tab, then didn't look at what I'd typed :)
not only exact same name, but also exact same order 09:05
timotimo right
jnthn That's what allows us to do the names => positions opt
timotimo mhm
but at least the callsite doesn't contain a reference to what is being called, right?
(except a little cache for invocants) 09:06
jnthn Right. :)
timotimo what file do i look at to find the callsite writing code?
jnthn src/mast/compiler.c
I'd start by updating docs/bytecode.markdown or so
Just to get the format clear.
timotimo oh, get_callsite_id also writes the callsite to the bytecode file 09:07
brrt that would be awesome jnthn :-)
timotimo that explains why i overlooked it
what exactly?
brrt updating the docs ;_)
timotimo oh, now i get it!
brrt ;-)
jnthn bytecode.markdown is actually quite up to date, afaik.
timotimo we're actually turning nameds into something that looks exactly like a positional (because really it is) 09:08
jnthn brrt: Do you want a spesh doc?
brrt: If so, what parts do you most want a document on?
Same question to timotimo I guess :)
I'm happy to write something up, but it's good to know what the goal is :)
(Which should be "make things clear that the code doesn't make clear".) 09:09
brrt is thinking about how i'd attack it
timotimo an index to the string heap is a 16 bit integer, isn't it? 09:11
jnthn 32 now 09:17
dalek MoarVM/named_to_positional: aa55bbc | (Timo Paulssen)++ | docs/bytecode.markdown:
fix highlighting in vim ~_~
MoarVM/named_to_positional: 738a4cd | (Timo Paulssen)++ | docs/bytecode.markdown:
initial spec attempt for callsites storing named arg names.
timotimo OK
jnthn after doing r-j where JVM has 16-bit ones I...learned ;)
dalek MoarVM/named_to_positional: 5eedf49 | (Timo Paulssen)++ | docs/bytecode.markdown:
index to string heap is 32bit big. 09:18
timotimo should i store the names so that they line up with argument numbers or should i "compact" them?
(in the actual in-memory callsite, not the bytecode format) 09:19
i.e. will the MVMString **named_names; begin with one NULL per positional?
jnthn Hmmm
timotimo could potentially set the whole thing to NULL and not allocate at all if there's only positionals as a slight optimization 09:20
jnthn well, yeah, you only need it if there are names, for sure
I think it'll be a question of looking at what args.c needs to be fast/easily done, tbh.
timotimo oh, actually, how about this crazy idea: 09:21
... yeah, actually a crazy idea, not a good one.
jnthn Data structures need designing around use cases :)
timotimo forget about it :)
jnthn I *think* that args.c may be efficiently implementable with them compacted. 09:22
timotimo i also just realized that it'd always be a number of NULLs, then a number of names
rather than any kind of mixture
so just knowing the index of the first named will be sufficient to calculate every named index 09:23
jnthn and you do know that 'cus we cache num_pos
timotimo sounds great then
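(Editor's sketch of the compacted layout agreed on above: the names array holds only the named arguments, and the cached positional count is enough to map an argument index to its name. The struct, its field names, and the use of `const char *` in place of `MVMString *` are assumptions for illustration, not MoarVM's actual callsite.)

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified callsite; not MoarVM's MVMCallsite. */
typedef struct {
    size_t       num_pos;       /* cached number of positional arguments */
    size_t       num_named;     /* number of named arguments */
    const char **named_names;   /* num_named entries; may be NULL when num_named == 0 */
} sketch_callsite;

/* Name of the argument at arg_index, or NULL when it is a positional.
 * Positionals all come first, so the first named sits at index num_pos. */
static const char *sketch_name_of_arg(const sketch_callsite *cs, size_t arg_index) {
    if (arg_index < cs->num_pos)
        return NULL;
    assert(arg_index - cs->num_pos < cs->num_named);
    return cs->named_names[arg_index - cs->num_pos];
}
```

(With this layout a callsite that has only positionals needs no names array at all, which is the small optimization timotimo mentions.)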
jnthn We don't build a hash table for looking things up 'cus there's not enough names to be worth it almost all the time 09:24
brrt can we do symbolic lookups? 09:25
timotimo huh? where would that hash table live/what would it be used by? 09:26
jnthn well, potentially you could have a name => index hash on a callsite 09:27
but it's not gonna be worth it
lunch time; bbiab
timotimo ah, aye
a linear scan would probably always be better
except if you have something like "Hash.new(:foo<bar>, [ and 1000 others ])"
that could potentially wreak some havoc 09:28
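(Editor's sketch of the linear scan discussed above, as an alternative to a name => index hash on the callsite. Plain C strings stand in for `MVMString`, and the function name is hypothetical. The cost grows with the number of names, which is why a call with a thousand named arguments would be the bad case.)

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the index of `wanted` among the callsite's names, or -1 if absent.
 * Every lookup walks the array from the start, so it is O(num_named). */
static long sketch_find_named(const char *const *names, size_t num_named,
                              const char *wanted) {
    assert(names != NULL || num_named == 0);
    for (size_t i = 0; i < num_named; i++) {
        if (strcmp(names[i], wanted) == 0)
            return (long)i;
    }
    return -1;
}
```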
brrt you know what might be worth it? 09:29
timotimo do tell :)
brrt sorting the symbols by (symbol pointer value or something like that)
so that you could - if callsite name lists get very large - always resort to binary search 09:30
timotimo you'll have to give me a bit more context
symbol means name of named parameter here?
brrt yes
with the added notion that the named parameter should / could be 'normalized' - i.e. all instances refer to the same in-memory-object, so that comparison is pointer-comparison 09:31
(that is to say, equality is pointer-equality, not what i said)
binary search is really cheap and memory-efficient 09:32
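(Editor's sketch of brrt's idea: if every name is normalized to one in-memory object, equality becomes pointer equality, and a name list sorted by address can be binary searched. The function name is hypothetical. The sketch assumes the name objects never move; if a collector relocated them, the sort order would have to be re-established.)

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Binary search in an array of interned-name pointers sorted by address.
 * Returns the index of `wanted`, or -1 if it is not in the array.
 * Addresses are compared as uintptr_t because relational comparison of
 * pointers into different objects is not defined in C. */
static long sketch_find_interned(const void *const *sorted, size_t count,
                                 const void *wanted) {
    size_t lo = 0, hi = count;
    uintptr_t key = (uintptr_t)wanted;
    assert(sorted != NULL || count == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        uintptr_t cur = (uintptr_t)sorted[mid];
        if (cur == key)
            return (long)mid;
        if (cur < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return -1;
}
```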
timotimo i'm not entirely sure how exactly that plays together with all other pieces of the puzzle
brrt neither do i 09:33
timotimo has much code to read 09:34
brrt :-)
timotimo hey, if you want to do something string-related that'll probably pay off in memory usage:
there's strings that show up incredibly often, over and over again 09:35
i've tried to add a string interning step to the minor collection of the nursery before, that didn't work out well
brrt such as?
hmmm
timotimo "dotty"
(this is just from settings compilation, though)
brrt as part of collection that wouldn't be ideal i imagine 09:36
you'd ideally want the compiler / mast to fix that
timotimo not possible
"foobar".substr(1, 2) ← how to? :)
brrt … possible for spesh?
timotimo the compiler/mast already has a string heap
gist.github.com/timo/e1af6d5c10a4e34d6cb0 ← check it. 09:37
that is a random sub-sample of the gen2
jnthn I think we need to keep GC simple, fwiw. 10:28
Otherwise we make pause times worse.
10:45 lizmat joined 10:50 lizmat_ joined 11:11 brrt joined
brrt is checking 11:13
wow
these are all different strings?
tadzik so "OPER" is there 876 times? 11:14
brrt as in, different sections of memory 11:22
tadzik that's my understanding
brrt wow
(again)
ok, and name-lookups are string comparisons? 11:25
12:22 lizmat joined
jnthn I suspect the OPER thing gets fixed with the arnsholt++ work on O 12:38
mmmm...cheesecake
13:52 vendethiel joined 14:04 jnap joined 14:31 dalek joined, btyler joined 14:36 jnap1 joined 14:38 synopsebot joined 14:40 masak joined
timotimo and the lowered_param_N thing i've fixed manually by stashing these strings in the Actions (i think) where they were previously generated with string concat 15:12
jnthn: we don't inline method calls yet; is that something the specializer will do at some point? or will we leave that for the jit? 15:30
hmm. maybe the callsite used in a specialized piece of bytecode could have extra information attached to it that the specializer could then use to improve calls that come from the specialized bytecode 15:32
oh, i think i understand why it's hard: in many cases we just put a slot for a little cache into the specialized bytecode to be filled later 15:36
so at specialize time we may not even know a single likely candidate
15:41 lizmat joined 16:01 cognominal joined
jnthn timotimo: Well, the specializer and the JIT are very related 16:54
Such that if the specializer learns to inline then that's done the work for the JIT to also, I expect 16:55
Or most of the work at least
The way we de-opt in the two cases may be different.
timotimo mhh, okay 16:56
jnthn On "don't know what's likely", one of the later things we'll do is 2-stage spesh 16:57
timotimo jnthn: how do you recommend i tackle the kind of daunting task to change argument handling from "two arguments in order" to "names in the callsite"? 16:58
jnthn The first will introduce various "recording" instructions; if the spesh remains hot enough then we'll use them to emit an even specialer version with guard clauses.
timotimo ah, these recording instructions will be using the spesh slot mechanism? 16:59
jnthn timotimo: Well, my plan was to get the bytecode writer to emit the right thing first, and then update the bytecode reader to read them in.
yeah, will use spesh slots for it.
timotimo i am not quite convinced that this will work fine without a new stage0 17:00
jnthn And then rebootstrap so we always have them.
And then switch args.c over
timotimo oh, that makes sense
jnthn oh, it'll want a new stage0
Maybe twice over.
timotimo even if i have to make two, should i commit both to the repository? 17:01
jnthn Anyway, my idea was to wean us off needing to put the names in.
Sure
You'll be doing it in a branch
timotimo i already am :)
jnthn They should be the only NQP changes needed for this.
So then we just merge --squash the branch. 17:02
timotimo ah, fair enough
what does "wean us off needing to put the names in" mean? o_O
jnthn Hack until deleting the argconst_s instructions for names doesn't break anything:) 17:03
Can leave the holes
And deal with them once everything works. 17:04
timotimo so 1) write the names into the callsites and serialize that, 2) build a new stage0 with the names in the callsites, 3) remove the argconst_s bytecodes that put the names in the even slots 17:05
and then build an even newer stage0 that doesn't have the argconst_s bytecodes at all any more
jnthn uh, 1 is really "and deserialize too" 17:07
timotimo ah, yeah
jnthn And 3 is a lot of work to make it possible :)
timotimo i should have put a "just" in there for good measure :D
run-time named arguments and |%foo are handled by creating a new callsite on the fly, right? 17:10
jnthn yeaah
We ignore any flattening callsites in spesh
timotimo that'll want fixing too, somewhere between 1 and 3
jnthn And will do for the future
timotimo oh, hold on; i thought i was supposed to change named argument passing for everything ever
jnthn you still need to update it :) 17:11
Just saying that spesh isn't going to try and deal with |
Not for the time being anyway.
timotimo update what now? 17:12
sorry, i seem to be having a brainfart or something
.o( brainfort, much cooler than a blanketfort )
spesh isn't, but the regular code is
jnthn update the flattener 17:14
timotimo yea. i was going to do that
jnthn Note you'll need to do GC updates also.
timotimo oh? how so?
jnthn 'cus callsites now point to MVMString which is collectable 17:15
timotimo ah, that makes sense 17:16
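(Editor's sketch of why the GC needs updating: once a callsite holds references to collectable strings, whatever marks the callsite has to hand each of those references to the collector. The worklist below is a toy that records the address of every reference slot; all types and names are hypothetical and much simpler than MoarVM's real worklist.)

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a collectable object and a GC worklist. */
typedef struct { int marked; } sketch_collectable;

#define SKETCH_WORKLIST_MAX 16

typedef struct {
    sketch_collectable **slots[SKETCH_WORKLIST_MAX]; /* addresses of references */
    size_t               count;
} sketch_worklist;

/* Hypothetical callsite that owns an array of references to name objects. */
typedef struct {
    size_t               num_named;
    sketch_collectable **named_names;
} sketch_gc_callsite;

/* Record the address of a reference so the collector can visit it
 * (and, in a moving collector, rewrite it). */
static void sketch_worklist_add(sketch_worklist *wl, sketch_collectable **slot) {
    assert(wl->count < SKETCH_WORKLIST_MAX);
    wl->slots[wl->count++] = slot;
}

/* Marking a callsite means adding every name slot to the worklist. */
static void sketch_mark_callsite(sketch_worklist *wl, sketch_gc_callsite *cs) {
    for (size_t i = 0; i < cs->num_named; i++)
        sketch_worklist_add(wl, &cs->named_names[i]);
}
```

(If a slot is left uninitialized, a collector following it reads a garbage address, which is one way to end up with the kind of crash inside the worklist code reported later in this log.)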
17:44 lizmat joined 17:45 lizmat joined 17:55 benabik joined 18:47 woolfy joined 20:07 zakharyas joined 20:18 brrt joined
timotimo the interning mechanism doesn't count nameds? 20:29
the arity it considers seems to be only the number of positionals
jnthn we don't intern named things, right. 20:30
timotimo oh, i see that now 20:31
arg_count != num_pos
that line, i overlooked
so i won't need to teach the interner about nameds yet
jnthn no, that can come later 20:49
timotimo is the "names_used" mechanism still needed the way it is right now after the refactor? 20:50
brrt (reading backlog) yes, it's deopt that will differ rather substantially 20:54
jnthn timotimo: Well, we need it to behave the same. Doesn't have to work exactly the same. 20:55
timotimo mhm
args_proc_init would introspect the args MVMRegister assuming it's an array-like and get the strings from the args list, yes? 20:56
or am i looking at the wrong thing?
jnthn sounds like 20:58
I mean, it doesn't introspect args today
timotimo i feel kinda dumb. maybe today isn't a good day
jnthn It knows what's there from arg_flags
timotimo i thought that thing is what creates the callsite and the callsite is supposed to have the nameds set
21:00 lizmat joined 21:07 brrt joined
timotimo huh. so MASTNode *args is supposed to be some kind of array of MVMObjects 21:12
among them should be strings for the nameds, right?
when i use ATPOS_S_C, the get_str gets passed a null pointer 21:14
oh, huh.
yeah, i confused flags and args 21:15
there's one flag for two args
21:19 woolfy joined
timotimo i'm not doing very well right now >_< 21:22
jnthn Well, it's probably a little tricky...
timotimo what i'm trying to do here should be simple, though :)
jnthn timotimo: Did you give up on the NQP regex opt thingy?
timotimo for now, yes 21:23
jnthn OK, I may task steal that.
timotimo right now i want to teach the mast compiler to write out the nameds into the callsite
thus, when looping through the flags, i num_nameds++ if i see MVM_CALLSITE_ARG_NAMED | _FLAT_NAMED
then, i go from 0 to num_nameds and get args[(elems - num_nameds) + i * 2]
i tried that with ATPOS_S_C, but that didn't work, neither did ATPOS_S 21:24
and (MVMString *)ATPOS(...) didn't work either
what i get from there is a p6opaque at least, so it *could* be a P6str 21:25
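(Editor's sketch of the indexing timotimo describes: there is one flag per argument, but a named argument takes two slots in the args list, its name followed by its value. Walking the flags with a running args index yields the slot of each name; when all nameds come last this matches `(elems - num_nameds) + i * 2`. The flag value and function name are hypothetical, and treating every flag other than a plain named as a single slot is an assumption of the sketch.)

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ARG_NAMED 0x20u   /* hypothetical flag bit, not MoarVM's value */

/* Store the args index of every name in name_slots and return how many
 * named arguments were seen. */
static size_t sketch_name_slots(const uint8_t *flags, size_t num_flags,
                                size_t *name_slots, size_t max_slots) {
    size_t num_nameds = 0;       /* counters start at zero explicitly */
    size_t arg_index  = 0;
    for (size_t i = 0; i < num_flags; i++) {
        if (flags[i] & SKETCH_ARG_NAMED) {
            assert(num_nameds < max_slots);
            name_slots[num_nameds++] = arg_index;
            arg_index += 2;      /* the name, then the value */
        } else {
            arg_index += 1;      /* a single value slot */
        }
    }
    return num_nameds;
}
```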
21:26 brrt joined
timotimo huh, the object i get there has no unbox_str_slot, though 21:26
say, could we perhaps put the name of the class we've created into the MVMP6opaqueREPRdata? 21:27
jnthn Oh 21:29
It's perhaps a MAST::SVal
Which contains the string
timotimo ah, that would be helpful
jnthn For "what class is this" I think it's more "what type is this" and belongs on the STable. 21:30
timotimo hm. yeah probably
value = <error reading variable: Cannot access memory at address 0x3f> 21:31
that doesn't seem right
AFK for a bit 21:33
oh btw 21:52
i've looked and it *is* a p6opaque
interesting. a few times the function i'm hacking in actually succeeds, even with nameds 21:55
22:19 brrt left
timotimo well, i'm certainly at a loss here. 22:42
ah, ok 22:43
it's due to the fact that it's FLAT_NAMED
or rather: the first one that is FLAT_NAMED breaks my code
jnthn ah
timotimo how are those treated?
jnthn A flat named is really a positional...
That will be flattened.
timotimo so i shouldn't be looking at the object that's put there at all to extract a name from it 22:44
in what combinations do those flattened nameds appear?
always in the last slot? always at most one?
jnthn Not sure off hand... 22:47
See QASTOperationsMAST.nqp
timotimo thanks
jnthn Near call and callmethod there's some code that sorts things out
I know it makes sure nameds come later.
timotimo ah, so maybe all i'll need to do is just ignore the FLAT_NAMED flag 22:48
wow, that way i actually get one file to compile and the next error i'll have to hunt is All positional args must appear first 22:50
jnthn That's in bytecode.c iirc 22:51
timotimo the verifier, aye
jnthn Hmmm 22:53
grr
timotimo ah, that's probably because it's trying to read the callsites 22:54
and it's not reading the named args
it just treats them as if they were the next callsite :)
jnthn Oh! 22:55
Meanwhile, I got a version of the regex thingy that calls back into the main optimizer
timotimo also: i said "the verifier, aye" which was untrue
i had one, too. it was just that it horribly broke compilation :)
jnthn Well, mine does pass the NQP tests and build Rakudo.
It for some reason doesn't ever lower self. 22:56
timotimo well, that's much better than what i had already! 22:57
cool :)
jnthn ohh 23:00
tssk
timotimo ah dangit 23:02
now i'm trying to read in the stage0 and of course i can't
i'll need a bytecode version increase
jnthn Yes, you need to bump the bytecode version and maintain backcompat. 23:03
timotimo yup
very obvious in retrospect :)
jnthn .oO( A description of much of computer science... )
Figured out why it didn't lower self. 23:04
However, the cursor match variable is gonna be a stiffer challenge.
timotimo mhm :|
but it'll probably really be worth it
jnthn yeah, 'cus it's the only remaining lexical now. 23:06
In most rules
23:08 FROGGS joined
jnthn is pondering an asserttype op for conveying stuff that should always be true. 23:10
spesh can leave it in place (unless it can prove it's not needed), and then copy the type info. 23:12
Essentially a way for code-gens and optimizers to convey to the VM, "I'm really really sure this is true even if you can't prove it; blow up if it ever ain't, and optimize assuming it is" 23:13
timotimo right
jnthn Immediate use case is that !cursor_start always returns something of the same type as self. 23:14
Granted some day inlining might get it.
But !cursor_start is perhaps a little big to be an obvious inline. 23:15
Well, patch passes spectest too 23:17
timotimo now i'm getting bogus data in my arg_name array :\
i'm setting it with get_heap_string, that seems correct, aye?
(gdb) print frame->cur_args_callsite->arg_name[0] 23:18
$4 = <error reading variable: Cannot access memory at address 0x34>
just ... huh?
i must be creating a callsite somewhere and forgetting to null that out or something 23:20
jnthn that looks...odd, yeah.
0x34 is clearly not a pointer. 23:21
I just pushed my NQP patches
23:22 woolfy left
jnthn No significant effect yet. 23:23
timotimo is it aborting for some other reason? 23:24
jnthn Well, it can't lower $¢ yet 23:25
But also it could benefit from the asserttype thing I mentioned
timotimo i'm segfaulting and i have no idea where this data could possibly come from 23:27
jnthn Anyway, need sleep...still exhausted from last day's teaching/bad sleep...
'night
and good luck ;)
timotimo oh, huh 23:28
it's actually a straight-up null pointer
good night and rest well!
(gdb) print num_nameds 23:31
$8 = 32783
yeah, that's not so probable
ah, yeah, C doesn't null out variables on the stack for you 23:32
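(Editor's sketch of the bug class just diagnosed: a counter with automatic storage has an indeterminate value until it is assigned, so it must be set to zero before counting. The flag value and function name are hypothetical.)

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_FLAG_NAMED 0x20u  /* hypothetical flag bit */

/* Count the flags that have the named bit set. */
static size_t sketch_count_nameds(const uint8_t *flags, size_t num_flags) {
    /* Writing just "size_t num_nameds;" would leave whatever happened to be
     * on the stack, which is how a count like 32783 can show up. */
    size_t num_nameds = 0;
    assert(flags != NULL || num_flags == 0);
    for (size_t i = 0; i < num_flags; i++) {
        if (flags[i] & SKETCH_FLAG_NAMED)
            num_nameds++;
    }
    return num_nameds;
}
```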
what in the ... 23:34
i'm getting a segfault in MVM_gc_worklist_add 23:36
and, this happens: (gdb) print /r **(MVMCollectable **)(frame->cur_args_callsite->arg_name[0])
Cannot access memory at address 0x38000800000001
other places use MVM_ASSIGN_REF for the result of get_heap_string, but I can't do it here, because the root, in this case the callsite, isn't an MVMCollectable 23:42
does everything that has MVMCollectables inside them have to be an MVMCollectable?
i guess i'll go to sleep now 23:55