samcv | dropped my laptop and the screen went all blocked up and making bad noises | 01:07 | |
maybe dropping it tapped the ram? idk | |||
turned on fine once i turned it off | |||
timotimo, with 1/100 the size i didn't see collapse strands btw | |||
timotimo | interesting! | 01:09 | |
01:14
vendethiel joined
01:48
ilbot3 joined
MasterDuke | and --profile shows the most expensive routine (by exclusive time) is `infix:<~> (SETTING::src/core/Str.pm:2801)`, at 30%, next most is 5.8% | 01:52 | |
that's the Str:D multi | 01:54 | ||
samcv | yeah i bet it is | 02:01 | |
that thing always sucks | |||
MasterDuke, what are you using to measure? and how big is your file? | |||
MasterDuke | perf record and --profile, of `use JSON::Fast; spurt("otherCache", from-json(to-json(slurp("cache").split("\0"))).join("\0"))`, with the first 2.5Mb of the file timotimo was using | 02:03 | |
samcv | argh complaining i don't have enough permissions | 02:05 | |
it's set to -1 | |||
at 502017, unexpected \ inside list of things in an array | 02:09 | ||
whatever i just made a coverage report of it | 02:21 | ||
will check that function | |||
ok i see | 02:22 | ||
interesting | |||
uploading now | 02:23 | ||
MasterDuke, this must be G as in giga cry.nu/coverage/json-1/libmoar/cov...c.html#L77 | 02:25 | ||
but the whole function itself runs 4.5k times | 02:26 | ||
for loop is pretty hot | |||
but looks like turn_32bit_into_8bit_unchecked is never hit. so that's not causing the issue | |||
though it does still hit that function 25.8k times | 02:27 | ||
MasterDuke | this i.imgur.com/xg5qvIk.png is heaptrack's overview | 02:29 | |
samcv | `if (g < -127 || g > 127)` probably (hopefully?) gets optimized to not check the sign and just check the bits, since that's all that's needed | ||
MasterDuke | oh, oops, all that measuring and profiling was with my modified version | 02:31 | |
samcv | modified how | ||
MasterDuke | gist.github.com/MasterDuke17/73b37...4e528a64e6 | 02:32 | |
samcv | but | 02:34 | |
why would you change it like that. won't it possibly break | |||
oh i see i guess you already collapse it | |||
then check there. any change in time? | |||
MasterDuke | i just time it once before and once after. 0.1s slower after my change (out of 31s) | 02:35 | |
samcv | ok i've made it faster | 02:41 | |
02:41
agentzh joined
samcv | let me calculate how much. but this is only measuring differences between a single change i made, and i made a couple other changes that may have a minor effect idk | 02:42 | |
average of 5 times made it go from 22.188 to 23.43 seconds | 02:44 | ||
so 1.2s shorter | |||
5.4% faster | 02:45 | ||
Geth | MoarVM: 7beffd390c | (Samantha McVey)++ | src/strings/ops.c collapse_strands 5.4% speed boost under some workloads This loop gets ran a huge number of times under collapsing strands workloads. If we're already set can_use_8bit, don't bother checking if the following graphemes are < -127 and > 127 |
02:50 | |
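The idea in this commit message can be pictured roughly as below. This is a minimal sketch, not the actual `src/strings/ops.c` code: the function `fits_in_8bit` and its parameters are hypothetical, and only the flag name `can_use_8bit` and the -127..127 comparison come from the log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if every grapheme lies in the
 * -127..127 range quoted in the log, 0 otherwise. */
int fits_in_8bit(const int32_t *graphs, size_t count) {
    int can_use_8bit = 1;
    assert(graphs != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        int32_t g = graphs[i];
        /* Short-circuit: once the flag has been cleared, the range
         * comparison is no longer evaluated for later graphemes. */
        if (can_use_8bit && (g < -127 || g > 127))
            can_use_8bit = 0;
        /* A real collapsing loop would also copy g to its output here. */
    }
    return can_use_8bit;
}
```

The saving only shows up on inputs that contain a wide grapheme early on; for strings that really are 8-bit the check still runs on every iteration.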
samcv | pushing that change MasterDuke | ||
samcv | i'm thinking of making that MVMint8 a plain old int, so it'll use the fastest type on whichever cpu it's ran on. it's bool so doesn't matter the size of it | 02:51 | |
MasterDuke | you mean 23.43 to 22.188? | 02:59 | |
samcv | yes | 03:00 | |
i'm making MVM_string_graphs_nocheck too, so that should help make string things faster cause that gets called a ton of times | 03:01 | ||
uhm and got 21.408s with some more of my changes | 03:08 | ||
03:15
vendethiel joined
samcv | i think i found a bug | 03:19 | |
github.com/MoarVM/MoarVM/blob/mast...#L157-L165 here. if it's an ascii string, we should access blob_ascii | 03:20 | ||
which is what is used in every other place | |||
we can compare them with memcmp but we can't point to the same place | 03:23 | |
i don't think we run that command anywhere on ascii strings since they're hardly used (or something) | 03:24 | ||
imo it would be easier to deal with if they both used blob_8 and we still have the string be of type ASCII, but they are both stored in blob_8's | 03:25 | ||
then we won't have to have branching when we're really just comparing two things that are subsets of each other, and can be memcmp'd fine | 03:26 | ||
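A rough picture of the situation being described, assuming both storage kinds are arrays of signed 8-bit values (which the typedefs quoted later in the log suggest). `SketchString`, `bytes_of` and `substrings_equal` are hypothetical names for illustration, not MoarVM's real structures; only `blob_ascii` and `blob_8` are taken from the log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { STORAGE_ASCII, STORAGE_8 } StorageType;

/* Hypothetical string body with one union member per storage kind. */
typedef struct {
    StorageType type;
    size_t      num_graphs;
    union {
        int8_t *blob_ascii;
        int8_t *blob_8;
    } storage;
} SketchString;

/* Read the buffer through the member that matches the storage type,
 * which is the branch samcv says is missing in one place. */
const int8_t *bytes_of(const SketchString *s) {
    return s->type == STORAGE_ASCII ? s->storage.blob_ascii
                                    : s->storage.blob_8;
}

/* Both kinds hold int8_t elements, so one memcmp covers every pairing. */
int substrings_equal(const SketchString *a, size_t a_start,
                     const SketchString *b, size_t b_start, size_t len) {
    assert(a_start + len <= a->num_graphs);
    assert(b_start + len <= b->num_graphs);
    return memcmp(bytes_of(a) + a_start, bytes_of(b) + b_start, len) == 0;
}
```

If ASCII strings kept their type tag but always stored their data in `blob_8`, as proposed above, `bytes_of` would need no branch at all.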
05:51
brrt joined
05:53
domidumont joined
05:59
domidumont joined
06:00
domidumont joined
06:36
geekosaur joined
06:44
brrt joined
brrt | merge pushed | 07:14 | |
the 'has-dynasm' thingy has been removed, because, a): we have our own dynasm fork, and it's not going to be the one that's installed, b): dynasm is a *build-time-dependency*, damnit, and it doesn't make sense to use some other guys' version | 07:15 | ||
07:36
domidumont joined
07:41
zakharyas joined
09:05
domidumont joined
10:43
vendethiel- joined
timotimo | it occurs to me that collapse_strands uses the grapheme iterator in all cases. we could probably special-case one or two different storage types there | 11:17 | |
MasterDuke | timotimo: ah, interesting idea | 11:26 | |
timotimo | i'm doing a run on a quarter (elementwise) of the whole cache | 11:27 | |
MasterDuke | timotimo: did you notice samcv's possible bug find in MVM_string_substrings_equal_nocheck? it does look odd | 11:28 | |
timotimo | typedef MVMint8 MVMGraphemeASCII; | 11:29 | |
typedef MVMint8 MVMGrapheme8; /* Future use */ | |||
i imagine if we split it the compiler will merge it again until we change one of these typedefs | 11:36 | ||
so since reducing it down to like a tenth makes it complete in mere minutes, i was hoping a quarter would give me a nice hour or so run time | 12:00 | ||
like, just a tiny bit up the hockey stick | 12:01 | ||
13:27
spebern joined
14:10
nine_ joined
14:13
SmokeMachine joined
14:33
spebern joined
timotimo | almost 4 hours now | 15:06 | |
15:08
brrt joined
brrt | good * #moarvm | 15:09 | |
15:47
brrt joined
16:39
domidumont joined
samcv | good * | 17:02 | |
where do you get future use timotimo | 17:03 | ||
<timotimo> typedef MVMint8 MVMGrapheme8; /* Future use */ | |||
it is used currently though | |||
timotimo | that's just part of the code :) | 17:04 | |
i think i was Future Man who got to Use that | |||
geekosaur | ^ | ||
samcv | heh | ||
well it's used currently :P | 17:05 | ||
geekosaur | couple weeks(?) ago timotimo implemented that optimization for strings restricted to the ISO8859 subset | ||
so yes, the future is now(tm) | |||
timotimo | the future is so last week | ||
samcv | was longer than couple weeks ago | 17:06 | |
also timotimo what are your thoughts on the ascii thing. they're both stored in int8's and that place looks like a bug, right? in MVM_string_substrings_equal_nocheck. nowhere else do i see an ascii type string have blob_8 | 17:08 | ||
but i wouldn't be against making ascii strings just use blob_8 anyway or something | |||
timotimo | we should decide that blob_8 is supposed to be signed, just like ascii is. and then we can throw it out :) | 17:09 | |
samcv | they'd still be 8 bit strings and still be distinguishable, but would make comparison of ascii and others less prone to errors | ||
they are both signed | |||
8 bit numbers | |||
timotimo | yeah, it's important to have signedness here | 17:10 | |
for our synthetics | |||
samcv | yeah | ||
timotimo | also, ascii is - of course - only defined up to 127 | ||
samcv | 12. what. | 17:11 | |
timotimo | if you try to store latin-1 in "ascii", we'll not be in agreement | ||
samcv | we store latin-1 in blob_8 | 17:12 | |
geekosaur | enh, question is whether synthetics or latin-1 make more sense there. I'm not sure there are that many useful cases for synthetics in that range | ||
timotimo | we do? :o | 17:13 | |
samcv | yes timotimo | ||
they're just numbers... | |||
timotimo | but but but but | ||
signed! | |||
samcv | so now we want another blob format for no reason XD | ||
blob_latin1 XD | |||
TimToady | well, even ASCII can have CRLF in it... | ||
timotimo | yeah, and we turn that into a negative number | 17:14 | |
samcv | yep | ||
would be easier if ascii, latin-1 and 8bit all stored in blob_8. we don't have a Latin-1 type of string btw | 17:15 | ||
just ascii, grapheme 8, grapheme 32, and strand | |||
can still have the string be of type ascii, but no need to confuse things and make comparing 8 bit strings more work than it needs to be | 17:16 | ||
programming errors etc | |||
also i kind of think CRLF should be a constant synthetic grapheme. that could save some having to check what crlf is in the trie | 17:21 | ||
Geth | MoarVM: affec75b9c | (Samantha McVey)++ | 4 files Add MVM_string_graphs_nocheck funct, use it places we prev. already check There are many places where we check arguments with MVM_string_check_arg, and then will later on call MVM_string_graphs. This is redundant because MVM_string_graphs runs the same checks every time it runs that MVM_string_check_arg has already done. Shows a minor, but measurable speed increase. |
17:22 | |
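The checked/unchecked split described in this commit follows a common pattern, sketched below. All names here (`SketchString`, `check_arg`, `graphs`, `graphs_nocheck`, `same_length`) are hypothetical stand-ins chosen for illustration; the real MoarVM functions take a thread context and differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    size_t num_graphs;
    int    is_concrete;
} SketchString;

/* Stand-in for an argument check: bail out on a null or
 * non-concrete string. */
void check_arg(const SketchString *s, const char *operation) {
    if (s == NULL || !s->is_concrete) {
        fprintf(stderr, "%s requires a concrete string\n", operation);
        exit(EXIT_FAILURE);
    }
}

/* Unchecked variant: the caller promises the string is already valid. */
size_t graphs_nocheck(const SketchString *s) {
    assert(s != NULL); /* debug-only sanity check */
    return s->num_graphs;
}

/* Checked variant for callers that have not validated anything yet. */
size_t graphs(const SketchString *s) {
    check_arg(s, "graphs");
    return graphs_nocheck(s);
}

/* A caller that validates once and then uses the unchecked variant,
 * so the same test is not repeated. */
int same_length(const SketchString *a, const SketchString *b) {
    check_arg(a, "same_length");
    check_arg(b, "same_length");
    return graphs_nocheck(a) == graphs_nocheck(b);
}
```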
samcv | is this supposed to be called MVN_unicode_normalizer_form MVN? and not MVM? | 17:29 | |
18:17
AlexDaniel joined
samcv | hmm. i. want to steal some code | 18:22 | |
Knuth-Morris-Pratt search, memmem (searching for memory within memory, so we can use it on whatever size grapheme we want) | 18:23 | ||
lists.gnu.org/archive/html/bug-gnu...00031.html | |||
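For reference, a textbook Knuth-Morris-Pratt search over 32-bit units looks roughly like this. It is a sketch written for this log, not the code from the linked mailing-list post, and `kmp_find` is a hypothetical name. Because it compares whole elements rather than bytes, it can only report matches at element boundaries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns the index of the first occurrence of needle in hay,
 * -1 if there is none, or -2 if the failure table cannot be allocated. */
ptrdiff_t kmp_find(const int32_t *hay, size_t hay_len,
                   const int32_t *needle, size_t needle_len) {
    if (needle_len == 0) return 0;
    if (needle_len > hay_len) return -1;
    assert(hay != NULL && needle != NULL);

    /* fail[i] is the length of the longest proper prefix of
     * needle[0..i] that is also a suffix of it. */
    size_t *fail = malloc(needle_len * sizeof *fail);
    if (fail == NULL) return -2;
    fail[0] = 0;
    size_t k = 0;
    for (size_t i = 1; i < needle_len; i++) {
        while (k > 0 && needle[i] != needle[k]) k = fail[k - 1];
        if (needle[i] == needle[k]) k++;
        fail[i] = k;
    }

    /* Scan the haystack once, never moving backwards in it. */
    ptrdiff_t result = -1;
    k = 0;
    for (size_t i = 0; i < hay_len; i++) {
        while (k > 0 && hay[i] != needle[k]) k = fail[k - 1];
        if (hay[i] == needle[k]) k++;
        if (k == needle_len) {
            result = (ptrdiff_t)(i + 1 - needle_len);
            break;
        }
    }
    free(fail);
    return result;
}
```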
19:02
zakharyas joined
TimToady wondered whether CRLF should always be -1 | 19:08 | ||
timotimo | we can easily make that happen inside moarvm | 19:10 | |
by just creating -1 as soon as moarvm is started | |||
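One way to picture "creating -1 as soon as moarvm is started": if synthetics are handed out as successive negative numbers, registering CRLF before anything else pins it to -1. The sketch below uses a flat array and hypothetical names (`synthetic_for`, `vm_startup`); the log mentions that the real lookup goes through a trie, which is not modelled here.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SYNTHETICS 64

typedef struct {
    int32_t first;
    int32_t second;
} SyntheticEntry;

static SyntheticEntry synthetics[MAX_SYNTHETICS];
static int32_t num_synthetics = 0;

/* Look up, or allocate, the synthetic for a two-codepoint sequence.
 * Entry i of the table is exposed as the negative grapheme -(i + 1). */
int32_t synthetic_for(int32_t first, int32_t second) {
    for (int32_t i = 0; i < num_synthetics; i++)
        if (synthetics[i].first == first && synthetics[i].second == second)
            return -(i + 1);
    assert(num_synthetics < MAX_SYNTHETICS);
    synthetics[num_synthetics].first = first;
    synthetics[num_synthetics].second = second;
    num_synthetics++;
    return -num_synthetics;
}

/* Registering CRLF before anything else makes it the constant -1. */
void vm_startup(void) {
    num_synthetics = 0;
    int32_t crlf = synthetic_for('\r', '\n');
    assert(crlf == -1);
    (void)crlf;
}
```

With that guarantee, code that needs to recognise CRLF could compare against -1 directly instead of doing a lookup.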
samcv | yeah what timotimo said | 19:29 | |
19:44
dalek joined,
SourceBaby joined
samcv | ugh i can't get `memmem` working | 19:50 | |
i thought it would return a pointer to the location in the haystack the substring starts at. which is what the manpage says | 19:51 | ||
but it returns numbers that are inconsistently different than haystack | 19:52 | ||
i thought if i did `haystack - memmemresult` i would get the size_t from the start of the haystack to the found result and it should be consistent. but it isn't | |||
and i'm very confused | |||
whatever it's pointing to seems to hold the same data every time i run it. but it isn't the right memory region... | 20:00 | ||
timotimo, help gist.github.com/60469bf4ea8db41db3...95c2f8a957 | 20:01 | ||
man page if anybody needs it... man7.org/linux/man-pages/man3/memme...op_of_page argh. | |||
if we can get it to work we can get knuth-morris-pratt optimized string search tho | 20:02 | ||
then include the source for mac/windows since it's not a standard c lib function | 20:04 | ||
timotimo | hmm | 20:22 | |
samcv: seems correct to me | 20:24 | ||
you're telling it to look for "a" and it finds it right at the beginning | |||
when you make needlelen 2 instead of 1, it'll find it a bit later | |||
haystack[4195968] needle[4195976] found[4195972] | |||
i.e. 4 bytes in | |||
samcv | ok that's not what i get | 20:25 | |
i get something totally not that | |||
haystack[94595572664408] needle[94595572664416] found[3212937304] | |||
timotimo | did you #define _GNU_SOURCE? | ||
it doesn't compile otherwise | |||
32bit machine perchance? | |||
samcv | it compiles for me. without that. it's 64bit | 20:26 | |
timotimo | the char output is b0rked on my end | ||
samcv | and adding #define _GNU_SOURCE does not help me | ||
ah | |||
that's okay though. as long as it's giving the right answer... | |||
i just get random different found values.... | 20:27 | ||
that are like much smaller than anything else | |||
and it's not even consistent when i do haystack - found | |||
timotimo | oh | ||
you're storing the result of memmem into an int | |||
int isn't defined to be able to store a pointer | |||
samcv | yes | 20:28 | |
timotimo | that's probably why you're getting such a bogus result. and i have no idea why i'm getting a correct one. perhaps it's storing these things in a memory location much closer to 0 so that it fits into 32bit? | ||
samcv | it still does not work XD that's what i tried FIRST | ||
if i set it void * or char * i get even CRAZIER ranging values | |||
haystack[94913660565544] needle[94913660565552] found[18446744072887842856] | |||
X\ | |||
T_T | 20:30 | ||
ok now it's working. | |||
idk what options my in editor C compiler uses. but it works fine compiling myself | |||
weird....... | 20:31 | ||
will have to look into that | |||
ok well at least i have a working example. will try and make it work in mvm now | 20:33 | ||
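A minimal standalone use of `memmem`, along the lines of what was being debugged above. The gist itself is not reproduced in the log, so the wrapper `find_offset` is a hypothetical reconstruction. The `found[...]` values pasted earlier look like 32-bit quantities widened to 64 bits, which would be consistent with the return value passing through an `int` (as timotimo spotted) or with the prototype not being visible; the log does not confirm which. The sketch keeps the result in a pointer and subtracts the haystack from it, not the other way round.

```c
#define _GNU_SOURCE /* the man page asks for this before <string.h> */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the byte offset of needle inside haystack, or -1 if absent. */
ptrdiff_t find_offset(const char *haystack, size_t haystack_len,
                      const char *needle, size_t needle_len) {
    assert(haystack != NULL && needle != NULL);
    /* memmem returns a pointer into the haystack, or NULL. Keep it in
     * a pointer type; an int is not guaranteed to be wide enough. */
    const char *found = memmem(haystack, haystack_len, needle, needle_len);
    if (found == NULL)
        return -1;
    return found - haystack; /* result minus start gives the offset */
}
```

For example, `find_offset("xxxxab", 6, "ab", 2)` gives 4, matching the "4 bytes in" result timotimo reports.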
timotimo | sorry, i was just out on the balcony to see an iridium flare | 20:35 | |
but isn't memmem using byte-granularity? | |||
so we'd have to continue searching until we hit a properly aligned one, right? | |||
samcv | ok i think i fixed it | 20:36 | |
in mvm | 20:37 | ||
oh you mean if by some magic the byte is found between bytes? | |||
err for grapheme32's i guess. but yeah i think it does by byte | 20:38 | ||
i don't think that will happen. but we will have to have a check to make sure that does not occur | |||
however unlikely it may be. it is possible | 20:39 | ||
timotimo | mhm | ||
hm, say ... | |||
samcv | (maybe possible) | ||
timotimo | if it does a fancy algorithm, it may discover by itself that it can skip almost all alignments, except of course the proper one | ||
samcv | but until we know for sure we need to check | ||
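The "continue searching until we hit a properly aligned one" idea can be sketched as a retry loop around a byte-wise search. To keep the example portable, a naive `byte_search` stands in for `memmem` here; both function names are hypothetical and this is not the code from samcv's commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Naive byte-wise search, a portable stand-in for memmem. */
const unsigned char *byte_search(const unsigned char *hay, size_t hay_len,
                                 const unsigned char *needle,
                                 size_t needle_len) {
    if (needle_len == 0) return hay;
    if (needle_len > hay_len) return NULL;
    for (size_t i = 0; i + needle_len <= hay_len; i++)
        if (memcmp(hay + i, needle, needle_len) == 0)
            return hay + i;
    return NULL;
}

/* Index (in graphemes) of needle in hay, or -1 if absent.
 * Hits that do not start on a grapheme boundary are skipped. */
ptrdiff_t find_graphs(const int32_t *hay, size_t hay_graphs,
                      const int32_t *needle, size_t needle_graphs) {
    const unsigned char *base = (const unsigned char *)hay;
    size_t hay_bytes = hay_graphs * sizeof(int32_t);
    size_t needle_bytes = needle_graphs * sizeof(int32_t);
    size_t start = 0;
    assert(needle_graphs > 0);
    while (start + needle_bytes <= hay_bytes) {
        const unsigned char *hit =
            byte_search(base + start, hay_bytes - start,
                        (const unsigned char *)needle, needle_bytes);
        if (hit == NULL)
            return -1;
        size_t offset = (size_t)(hit - base);
        if (offset % sizeof(int32_t) == 0)
            return (ptrdiff_t)(offset / sizeof(int32_t));
        start = offset + 1; /* false positive: resume one byte later */
    }
    return -1;
}
```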
well it does do fancy | |||
Knuth-Morris-Pratt | |||
which is why i want it | |||
timotimo | how much fancier than boyer-moore is this? | 20:40 | |
samcv | maybe less? idk | 20:41 | |
timotimo | it's not mentioned once in the wikipedia article | ||
samcv | i do know it's a lot faster than what we have now | ||
timotimo | right | ||
it's sad we hardly spend any time searching for a needle in a haystack when compiling the core setting | |||
samcv | i've heard of it before. it's different than boyer moore. not quite as complex i think | ||
gonna run spectest now | 20:43 | ||
yay spectest pass | 20:46 | ||
ok. now. so can a 32 bit number exist in an array of 32 bit numbers, not at a normal offset | 20:47 | ||
i am not sure how to prove or disprove that | 20:48 | ||
well easiest would be to try to construct one synthetically i guess | 20:49 | ||
nice timotimo. it's 2.2x faster :O | 20:51 | ||
doing 'a' x 100000 ~~ /b/; | |||
worst case | |||
that is huge | 20:52 | ||
timotimo | nice | 21:03 | |
it's easy for such a 32bit number to exist inside two 32bit numbers | 21:04 | ||
how many f is 32bit again %) | |||
two is 8bit, so 8 is 32bit | 21:05 | ||
0000ffff, ffff0000 has ffffffff in it at a non-aligned offset | |||
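The straddling case can be checked directly by looking at the bytes of two adjacent values. Note that the pair as written (`0000ffff`, `ffff0000`) lines up that way when the most significant byte is stored first; on a little-endian machine the swapped order produces the same effect. `straddle_offset` is a hypothetical helper written for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the first unaligned byte offset (1..3) at which the bytes of
 * needle appear across the adjacent values a and b, or 0 if none. */
size_t straddle_offset(uint32_t a, uint32_t b, uint32_t needle) {
    uint32_t both[2] = { a, b };
    unsigned char pair[2 * sizeof(uint32_t)];
    assert(sizeof both == sizeof pair);
    memcpy(pair, both, sizeof pair);
    for (size_t off = 1; off < sizeof(uint32_t); off++)
        if (memcmp(pair + off, &needle, sizeof needle) == 0)
            return off;
    return 0;
}
```

Calling it with `0xffffffff` as the needle reports offset 2 for one of the two orderings of the pair, depending on the byte order of the machine.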
samcv | yeah. wondering what the probability of that is | ||
timotimo | hm | 21:07 | |
imagine you have a number that has its lowest byte full | |||
m: say 0x100.uniname | |||
camelia | LATIN CAPITAL LETTER A WITH MACRON | ||
timotimo | okay, so imagine we have two nullbytes and then this, and we're looking for \1 | 21:08 | |
m: say ("A".ord +< 8).uniname | |||
camelia | <CJK Ideograph Extension A> | ||
timotimo | hmm | ||
samcv | ok i need a break. bbs | 21:35 | |
i'm getting very frustrated trying to use the memmem function standalone. | 21:39 | ||
i try and rename it and import it. and then it just says it can't find it | 21:40 | ||
timotimo, let me know if you come up with some way i could test the bug and check when it triggers. would be nice to write a test for it | |||
timotimo, en.wikipedia.org/wiki/Knuth%E2%80%..._algorithm | 21:51 | ||
timotimo | that's the page i looked at | 21:54 | |
samcv | timotimo, maybe you can at least write some code to detect if it's not on the boundary. or help me | ||
github.com/samcv/MoarVM/commit/889...1eea0af3aa here's the commit | |||
it works fine. but for the 32bit graphemes has that rare bug with overlap | 21:55 | ||
probably | |||
so the pointer subtraction works as expected since i store the result of memmem in a MVMGrapheme32* | 21:56 | ||
so uh. do i cast as char * or what and then subtract them? not 100% sure | |||
to find out if it's not cleanly divisible by 32 bits (i.e. 4 bytes) | |||
(cast both memmemrtrn32 and haystack->body.storage.blob_32 as char * i mean) | 21:57 | ||
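A sketch of the check being asked about: cast both pointers to `char *` so the subtraction is done in bytes, then test the remainder against the size of one 32-bit grapheme (4 bytes). `is_grapheme_aligned` is a hypothetical helper and `int32_t` stands in for `MVMGrapheme32`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 when found lies a whole number of 32-bit graphemes
 * past start, 0 otherwise. */
int is_grapheme_aligned(const int32_t *start, const void *found) {
    assert(start != NULL && found != NULL);
    /* Subtracting as char * yields a byte count. Subtracting the
     * int32_t pointers directly would hide a misaligned hit. */
    ptrdiff_t byte_offset = (const char *)found - (const char *)start;
    return byte_offset >= 0
        && byte_offset % (ptrdiff_t)sizeof(int32_t) == 0;
}
```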
22:26
agentzh_ joined
timotimo | okay so i quartered the workload and it's still going after 11h30m | 22:57 | |
samcv | well it compiled fine on mac on travis | ||
bsd has a non optimized memmem thing. not sure how it compares in speed to what we previously did | 22:58 | ||
timotimo | maybe it's better to cast to intptr_t, but char* should be fine | 22:59 | |
the perf.data file is 11 gigs big %) | |||
samcv | nice | 23:00 | |
amazing | |||
timotimo | just waiting for it to finish writing it out | ||
oh, i think it just finished | |||
i wonder how long it'll take to perf report that :D | 23:01 | ||
[ perf record: Woken up 43617 times to write data ] | 23:02 | ||
[ perf record: Captured and wrote 10904.887 MB perf.data (156696959 samples) ] | |||
samcv | timotimo, i think there must be a bug | 23:33 | |
at 502017, unexpected \ inside list of things in an array | 23:34 | ||
in sub parse-array | |||
in JSON::Fast | |||
can you compress perf data | 23:38 | ||
also working on making that loop even tighter. i love now i can run roast and have it finish in 3.5mins | 23:46 | ||
Geth | MoarVM: samcv++ created pull request #573: Have two part loop in collapse strands to make loop tighter when possible |
23:58 |