github.com/moarvm/moarvm | IRC logs at irclog.perlgeek.de/moarvm/today
Set by moderator on 18 May 2018.
00:51 shareable6 joined
01:56 ilbot3 joined
03:22 greppable6 joined, reportable6 joined, notable6 joined, quotable6 joined, committable6 joined, coverable6 joined, evalable6 joined, bloatable6 joined
03:23 bisectable6 joined, releasable6 joined, nativecallable6 joined, unicodable6 joined, benchable6 joined, statisfiable6 joined, squashable6 joined, undersightable6 joined, shareable6 joined
05:11 robertle joined
05:52 robertle joined
06:12 shareable6 joined
08:53 domidumont joined
09:00 domidumont joined
09:28 FROGGS joined
09:47 shareable6 joined
11:35 shareable6 joined
12:07 zakharyas joined
12:57 shareable6 joined
13:45 Geth joined
14:45 ilmari[m] joined
15:36 committable6 joined, reportable6 joined, coverable6 joined, bloatable6 joined, releasable6 joined, unicodable6 joined, statisfiable6 joined, squashable6 joined
16:49 Kaiepi joined
17:36 shareable6 joined
18:10 shareable6 joined
18:51 Ven`` joined
19:51 shareable6 joined
21:00 Summertime joined
21:19 Summertime left
samcv MasterDuke: mill arch is pretty interesting 21:21
timotimo jnthn: do you think adding a bitmap for "which slots in the sc have been deserialized so far?" to make the loop over the whole array to find a given object faster? 21:44
s/"?"$/ is a good idea?/
jnthn Hm, is that expensive? 21:48
I figured it'd only happen on the first lazy deserialization of something, and that we tend to then deserialize entire subtrees of things
timotimo well, you know how we sometimes have a cached sc idx inside an object? 21:50
in install_core_dists we hit that 20% of the time
we do find_obj_idx or what it's called 88k times inside that script 21:51
i'm printing out any number of >= 8 consecutive nulls and i get numbers ranging up to ~450 sometimes 21:52
if we can sometimes skip this many items outright, we'd surely have far fewer cache evictions 21:53
jnthn Hmm, but why are we hitting the linear search if the object already exists? :S 21:54
I thought if something was deserialized then we'd *always* have index cached 21:55
timotimo i didn't see through that code
jnthn I found the odd place where we didn't before
But I guess we must be missing another one
timotimo i had no idea what the requirement was for an object to have its scidx cached :)
but "an object should always have its scidx cached" is a reasonable explanation and i could go digging 21:56
jnthn afaik there's not a reason that it can't, I'm guessing we must just somewhere "forget"
timotimo rr shall rescue me 21:59
oh, does anything speak against caching the index right then and there while we're doing the linear search? 22:00
jnthn Hmm...if it can be done without thread headaches 22:07
timotimo racing to install the same value from two threads could be fine 22:09
hm, but if another thread is reading half the value, that could be not so great
OK, it looks like the change makes no difference 22:10
timing wise, that is 22:11
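[editor's sketch] The idea of caching the index "right then and there" during the linear search might look roughly like the C below. Everything here is a hypothetical stand-in: `SketchObject`, `SketchSC`, `SKETCH_NO_IDX` and `sketch_find_obj_idx` are illustration names, not MoarVM's real structures or functions. It assumes a C11 atomic field is acceptable for the cached index, which sidesteps the "reading half the value" worry; two threads racing to store the same index is then harmless.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NO_IDX UINT32_MAX /* hypothetical "index not cached" marker */

/* Hypothetical object header carrying a cached index into its SC. */
typedef struct {
    _Atomic uint32_t cached_idx;
} SketchObject;

/* Hypothetical serialization context; NULL slots are objects that
 * have not been lazily deserialized yet. */
typedef struct {
    SketchObject **root_objects;
    size_t         num_objects;
} SketchSC;

/* Returns the index of obj in sc, or -1 if it is not present. */
static int64_t sketch_find_obj_idx(const SketchSC *sc, SketchObject *obj) {
    assert(sc != NULL && obj != NULL);

    /* Fast path: trust the cached index only if it still points at obj. */
    uint32_t idx = atomic_load_explicit(&obj->cached_idx, memory_order_relaxed);
    if (idx != SKETCH_NO_IDX && idx < sc->num_objects
            && sc->root_objects[idx] == obj)
        return (int64_t)idx;

    /* Slow path: linear scan, caching the index as soon as we find it. */
    for (size_t i = 0; i < sc->num_objects; i++) {
        if (sc->root_objects[i] == obj) {
            /* Racing threads would store the same value, so a relaxed
             * atomic store is enough for this sketch. */
            atomic_store_explicit(&obj->cached_idx, (uint32_t)i,
                                  memory_order_relaxed);
            return (int64_t)i;
        }
    }
    return -1;
}
```

As noted in the log, the real change made no measurable timing difference, so this only illustrates the mechanism, not a confirmed win.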
samcv while taking a nap today i thought of an optimization for string eq. we can compare the cached hash codes (if they exist) and quickly reject non matching strings 22:18
timotimo oh, we don't do that yet? 22:22
sounds like a good idea in any case
jnthn Just make sure to exclude the 0 no-hash-yet sentinel :-) 22:25
Geth MoarVM: 4152021ff8 | (Samantha McVey)++ | tools/update-changelog.p6
[Tools] Add update-changelog.p6 tool
22:31
MoarVM: d634d24cf3 | (Samantha McVey)++ | src/strings/ops.c
Instantly return 0 with string eq if cached hash code doesn't match

If the cached hash codes exist, we are able to quickly return 0 without having to manually compare the two strings. For some work loads I could see this having a fair impact.
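[editor's sketch] A minimal C illustration of the fast reject described in the commit message, including the "0 means no hash yet" sentinel jnthn mentions. `SketchString` and `sketch_string_eq` are hypothetical names with a simplified layout; this is not the actual code in src/strings/ops.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a string with a lazily computed hash;
 * not MoarVM's actual MVMString layout. */
typedef struct {
    const uint32_t *graphs;      /* grapheme storage */
    size_t          num_graphs;  /* length in graphemes */
    uint64_t        cached_hash; /* 0 means "not computed yet" */
} SketchString;

/* Returns 1 if the strings are equal, 0 otherwise. */
static int sketch_string_eq(const SketchString *a, const SketchString *b) {
    assert(a != NULL && b != NULL);
    if (a == b)
        return 1;
    if (a->num_graphs != b->num_graphs)
        return 0;
    /* Fast reject: only valid when BOTH hashes have been computed,
     * i.e. neither is the 0 sentinel. */
    if (a->cached_hash != 0 && b->cached_hash != 0
            && a->cached_hash != b->cached_hash)
        return 0;
    if (a->num_graphs == 0)
        return 1;
    /* Otherwise fall back to comparing the contents. */
    return memcmp(a->graphs, b->graphs,
                  a->num_graphs * sizeof(uint32_t)) == 0;
}
```

Matching hash codes prove nothing on their own, so the sketch still compares contents in that case; only a mismatch allows an early return.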
samcv spectest pass. and committed
timotimo it could be beneficial to just calculate some hash codes "for fun" now 22:32
to increase the occurrence of hash codes not being 0
samcv "for fun" lol 22:34
timotimo like, when we do GC and some threads are done with their work already, but some other thread is still GCing away ... maybe grab some random strings and calculate some hash codes?
do we do anything smart with hashes calculated from strands btw?
samcv hah
like what?
also no we don't 22:35
timotimo if we have two long strings in a strand, can we re-use the first part's hash code (if the whole string is a part of it) 22:36
hm, the hash code potentially includes the length of the string, eh?
that would make it useless for that purpose i suppose
we probably don't want to have a hash function that you can just combine stuff together with, maybe 22:37
samcv timotimo: yeah that basically makes it easily attackable 22:40
timotimo right
if we can cheat, so can the attacker
samcv not sure if we want to rekey a hash if we have too many hash conflicts 22:41
i mean we probably should ideally
timotimo i don't yet know how rekeying can work if we want to keep cached hash codes
samcv and then you can also worry about timing attacks
timotimo: well you just ignore them 22:42
timotimo is the attack we expect that the attacker gets the full hash code right?
or just the part we use for buckets?
samcv just the bucket part i suppose
timotimo in that case we can perhaps just change what part of the hash code we use to decide on the bucket? 22:43
samcv hm 22:44
i guess we could like reverse it?
timotimo start at the end instead of the beginning?
samcv hm
i mean we'll have a 64 bit value so that has a lot of surface area. we could rotate it maybe 22:47
timotimo rotate sounds good
samcv surface area as in, we don't really need a full 64 bits for bucket determination
but it makes it slower to bruteforce and should be trivial to do a rotation on it and be able to rekey in case there is an attack 22:48
22:48 ZofBot joined
samcv and allow us to not have to recompute all the strings hashes again 22:48
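[editor's sketch] One way to read the rotation idea: keep each string's cached 64-bit hash code untouched, and let each hash table choose which bits feed bucket selection by rotating the code first. The C below is a sketch under that assumption; `rotl64`, `sketch_bucket`, the `rotation` parameter and the power-of-two bucket count are illustration choices, not anything the log specifies.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 64-bit value left by r bits, with r in 0..63. */
static uint64_t rotl64(uint64_t v, unsigned r) {
    assert(r < 64);
    /* Guard r == 0: shifting a 64-bit value by 64 is undefined in C. */
    return r == 0 ? v : (v << r) | (v >> (64 - r));
}

/* Pick a bucket for a table with 2^bucket_bits buckets. Changing
 * `rotation` "rekeys" the table without recomputing any string's
 * cached hash code. */
static uint64_t sketch_bucket(uint64_t cached_hash, unsigned rotation,
                              unsigned bucket_bits) {
    assert(bucket_bits > 0 && bucket_bits < 64);
    uint64_t mask = (UINT64_C(1) << bucket_bits) - 1;
    return rotl64(cached_hash, rotation) & mask;
}
```

For example, with 16 buckets a hash of 0xF000000000000000 lands in bucket 0 at rotation 0 and in bucket 15 at rotation 4, so keys crafted to collide under one rotation need not collide under another.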
timotimo hm 22:59
we're currently telling people the order of items in a hash will change between runs of the same program
are they expecting the order will not change on its own thereafter?
naah, that'd already happen when hashes increase in size
samcv well it won't change unless they add things to it
and they already reorder on bucket resizing
timotimo which it already did before anyway
right
samcv though they technically shouldn't rely on that either 23:00
timotimo there isn't as much thinking before my talking today as there sometimes is
samcv also could be interesting if each hash had its own rotation
timotimo every MVMHash gets its own Quaternion 23:01
samcv quaternion?
what
timotimo in 3d programming, they're used to make things rotate in a way you'd expect
wow, the wikipedia has a bunch of illustration and none of them seem enlightening 23:02
samcv on 3d programming? 23:03
timotimo damn, a gamasutra article entitled "rotating objects using quaternions". it starts "Last year may go down in history as The Year of the Hardware Acceleration". it is from 1998
btw, i don't know much about 3d graphics or 3d programming or whatever, i've just picked this snippet up somewhere 23:05
samcv also not sure if we need to hide the order of objects in a hash table or not
MasterDuke sounds about right, i think i got my first 3d video card around 1996 or 1997
samcv i.e. by randomizing which buckets we iterate through first
timotimo my first 3d-ish card was a matrox mystique, but no clue if the original or the 220 version 23:06
the latter was released 1997 apparently
MasterDuke mine was a canopus pure 3d, a 3dfx voodoo 1 (but with 6m ram, 2m more texture memory than the reference version)
i could finally run jedi knight at 640x480! 23:07
TimToady my first graphics processor was the blitter on an Amiga 1000 :) 23:08
MasterDuke hm, i'm not sure i've ever used an amiga. certainly heard/read much about them though 23:09
timotimo i never had any amiga, or even commodore or atari or what have you. the first computer i remember using was either a 386 or a 486, possibly the latter 23:10
jnthn Today my wife was trying to install some smartphone app that wanted over 400MB of space, which seemed huge given what it was supposed to do. I pointed out this was 4 times more space than the entire disk space of my family's first home computer (a 486) that I programmed on. The BBC micro that was the first machine I programmed on didn't even have a hard disk. :-) 23:12
TimToady neither did the Amiga 1000
unless you count floppies... 23:13
jnthn I figure floppies are by definition not hard. :P
TimToady depends on how floppy your definition is, I suppose...
timotimo in order to try to appreciate machines of the pre-timotimo-era i'm watching The 8bit Guy (formerly The iBook Guy, and additionally 8bit keys) on You the Tube 23:14
jnthn Hm, actually, I'd always thought "hard disk" was just the opposite of "floppy disk", and never considered if that was the real reason for the naming :)
TimToady it's a bit of a retrynym, I suspect
*retro
timotimo hm, is a solid state disk just the opposite of a fluid community cube? 23:15
what would you call the opposite of a disk
MasterDuke no, of a companion cube
TimToady join the Flat Disk Society today!
timotimo actually, disc and disk aren't the same thing
TimToady discs have grooves :) 23:16
timotimo amusingly, in german it's called Festplatte, which you could wrong-translate as a thing you put lots of food on to serve at some kind of fest/party/event
wrangslate?
samcv jnthn: when i was with lizmat i spitballed my ideas on implementing MVMString that has a feature to not normalize 23:17
and it seems pretty doable with mostly minor modifications to our functions
jnthn samcv: Hmm, too bad I wasn't there. :-) It had occurred to me that MVMString might want to be the thing behind Uni though
TimToady though Uni is just differently normalized... 23:18
samcv i actually did a proof of concept sorta thing
i added a nqp op that converted a normal mvmstring into a non-normalized type
jnthn Though my idea was to have multiple types based on the MVMString REPR so that we can use type specialization to strip out the switching over "what kind of string is this"
samcv and added a setting to one of the mvmstring struct
jnthn I think I'd rather shuffle that setting type-wards for the reason just mentioned :)
The thing that worries me is the binary operations 23:19
samcv binary operations?
timotimo so MVMString gets a REPR_data?
jnthn timotimo: Yes
samcv also that would mean having to write new functions for every single current function?
jnthn samcv: As in, those that have multiple strings as the input
samcv ah
well it works as long as both are of the same string type
which i demonstrated in my Proof of concept i wrote 23:20
string eq etc, i just had the second string convert its type to the first string's type
timotimo you can always concatenate into a "dirty" type, i.e. mixed normalization modes
jnthn It doesn't mean having to write new functions if they do the same thing
samcv timotimo: well no. i didn't allow that
jnthn It's possible that at MoarVM level it just blows up if there's a type mismatch
timotimo then you have to decide if some kinds of normalizations are infectious compared to others
samcv it would convert the second item in the concatenation to non-normalized type or normalized depending on the first one 23:21
jnthn And we handle that up at Rakudo level
otoh maybe that's inefficient
Since MVMString is also immutable
samcv which?
TimToady privileging the first argument is a bit non-p6-y
samcv well lizmat thought that NFG is infectious
jnthn Indeed
samcv so maybe that's what i implemented actually 23:22
jnthn Infectious NFG could work
timotimo that's (?) why we give all Int ops a type to box stuff into
jnthn There's all kinds of tricky though
samcv in any case, i have much more confidence of this being doable
timotimo also it makes me just a little uncomfortable that the slice reprop just takes self's type
jnthn Like, if we do $str.split($uni), are the results Str or Uni?
If we do $uni.substr(1, 2) are those units graphemes or codepoints? 23:23
timotimo and also, when do we consider a part to match if the split needle is explicitly Uni rather than Str
jnthn And what does it return?
samcv jnthn: it's the $str's type
jnthn Do we have a .subuni(1, 2) for the other thing?
timotimo not only "match inside a grapheme", but also how to handle different normalization forms of the same thing
samcv no
timotimo etc etc
samcv jnthn: we have substr but it uses uni semantics
well the data type is not normalized. so it just does substring identically 23:24
jnthn samcv: That doesn't go so well with the "operations have consistent semantics" design rule, though
TimToady I think if people want to split graphemes they'd better explicitly force Uni first
samcv nothing changes. it just makes a strand or a new string from point a to b
jnthn: well at least on moarvm it's that simple
jnthn We probably need to figure out how we want it to look at the Perl 6 level before deciding the MoarVM level. 23:25
samcv i was thinking more of how moarvm is concerned though than how it'd actually be implemented in rakudo
yeah
TimToady looks around for a language designer...
jnthn I'm a bit uncomfortable with the units specified to .substr(...) meaning something different on Uni, I guess.
In that we generally try to make it so that when you perform an operation, you don't have to know the exact type it's operating on to know the semantics 23:26
TimToady that's why we used to have opaque string offsets in the design :)
jnthn Thus why we have == vs eq
samcv i'm gonna go grab some food. brb
jnthn Enjoy :)
TimToady: Hm, that was when Str was envisioned as a multi-layer construct rather than the NFG thing with Uni a separate thing, though? 23:27
Or are those two ideas separable?
TimToady ayup
I suppose it wouldn't hurt to have a .subuni, and be a little consistent with the subbuf vs substr distinction 23:28
though it's not like we don't overload other method names on different types 23:29
jnthn My feeling is that a type distinction between Str and Uni is probably right; the Perl 5 I've been writing recently has made me miss the Buf/Str distinction, and the Str/Uni case feels pretty distinct too
TimToady: We do, though I like to think we mostly do it when the semantics will not be a surprise. :)
Not knowing what the units I feed in will be interpreted as feels a bit...akward. 23:30
*awkward
I guess .index is similarly problematic
TimToady it does make it a bit harder to write generic code that doesn't care whether you feed it Uni or Str, but we could already say that about Buf 23:31
maybe we also need a .submumble method :)
jnthn :)
I struggle to think of many cases where I don't care what level of abstraction I'm working at. 23:33
TimToady what happens when we subbub a Blob currently?
subbuf, er
answer, it returns a subblob 23:34
jnthn Yeah, I was thinking about making that able to use a "view" also 23:36
So that it doesn't have to copy
samcv back 23:38
TimToady well, views work better on immutables than mutables, though editors often know how to maintain pointers into mutable buffers... 23:43
so depends on whether we want to rewrite vim in Perl 6... :P
jnthn Indeed, Blob is immutable. Wasn't planning it for Buf, or at least not without some explicit way of asking for it 23:44
timotimo taking multiple subbuf-rw into the same buf and assigning length-changing things makes bufs very weird :) 23:47
not surprising, though
samcv timotimo: so i think what i'll do is have each hash have its own rotation of the hash keys 23:51
and we could also change the rotation on table expansion as well