samcv hopefully my PR looks sane 00:01
timotimo, how is appveyor coming?
timotimo well, it runs, but i get mysterious nqp failures that i didn't investigate further yet 00:02
samcv that's okay
if i do a pr will appveyor run on it?
timotimo can has the json document that asplodes there? maybe only the part where it asplodes, though?
samcv kk
yea
timotimo i hit "new build" so we can get a fresh nqp test run 00:03
it should be clean, right?
samcv here a.uguu.se/4wdPsjslTP7s_100cache.xz 00:05
what should be clean?
timotimo "make test" in nqp 00:06
samcv yeah
show me if it's not
timotimo well, it wasn't on appveyor 3 days ago :)
ci.appveyor.com/project/timo/moarv...uom38mbshc - you can watch the progress here 00:07
it's just cloning nqp
t\nqp/019-file-ops.t (Wstat: 0 Tests: 104 Failed: 0) 00:09
Parse errors: Bad plan. You planned 107 tests but ran 104.
ok 104 - reading with nqp::readlinefh stuff written by nqp::writefh
Failed 3/107 subtests
(less 23 skipped subtests: 81 okay)
what were you doing with that cache file? because i can run the script just fine and don't get an error
um 00:10
there's totally a \ that doesn't belong in there :P
look at how the first line ends and the second line begins in that file
i don't think it should be like that
same failure mode on x86, btw 00:13
the thing is 00:14
the last test that succeeds in there ... is also the last test in the file 00:15
so the plan is wrong, right?
but i have the same code on my machine and it ends up with the right number of tests
so we're improperly skipping something somewhere?
aha 00:16
there's a skip there that says 6, but the "else" branch of that has 9 tests in it
so that's probably the culprit?
samcv uhm 00:19
use JSON::Fast; spurt("otherCache", from-json(to-json(slurp("scache"))
so
it should work cause i'm converting something to json then back
timotimo oh, it's a long-ass string, rather than split at \0
ok, i did that differently 00:20
let's see
samcv oh wait. no. wait hold on. i think i pasted wrong
use JSON::Fast; spurt("otherCache", from-json(to-json(slurp("100cache").split("\0"))).join("\0"))
timotimo that exact code works for me 00:26
did you make sure to install the latest JSON::Fast with zef, or alternatively used -I on the git checkout's lib folder?
ci.appveyor.com/project/timo/moarvm 00:27
look, it green
MasterDuke samcv: is your PR faster? 00:30
samcv a tiny bit yes
it is measurable 00:31
MasterDuke nice
timotimo oh nice trick
samcv see stats gist.github.com/samcv/035ead8a920e...c9293d324e 00:32
that's with uhm. use JSON::Fast; spurt("otherCache", from-json(to-json(slurp("100cache").split("\0"))).join("\0"))
which is 1/100 of the original one 00:33
$cache.substr(0, $cache.chars/100)
^that's how i made it
if timotimo is wondering
timotimo ah, thanks
samcv yay green
timotimo yes yes very green
i don't know if we can sign up to travis with a github organization or something 00:34
not travis
appveyor
samcv tell it to do PR
timotimo smh
i can't
samcv :\
timotimo i don't have access to the github hooks on the moarvm repo
i can't even make it build commits to master
i can, however, click "new build" every time someone asks me to
samcv i am disappointed
:)
hahaha
timotimo except when i'm asleep, or AFK, or something 00:35
i won't give anybody a pager number for me so they can wake me in the middle of my sleep to run an appveyor build :D
samcv lol.
uhm i want one ran
timotimo okay, i would, but only if they give me tons of money
samcv on my repo
i'm sure i could pay somebody in india much less money to do it
timotimo oh, on your repo? gotta figure out how that works
samcv i don't *think* msvc has memmem but it's possible. worst case we can always make our own 00:36
or just use whatever it was we did before
timotimo um ... i don't even know how to tell appveyor to build something else tbh :( 00:37
however!
it's really easy to set up your own!
samcv github.com/samcv/MoarVM/commit/889...1eea0af3aa but yeah
we could just ifdef that part and only have it build on linux or something idk
or bsd
though i haven't benched it on mac os x yet 00:38
you have mac right timotimo ?
can you check for me
or is that somebody else. i don't know
it's 2.2x faster under worst case for me though with the change on linux with glibc
timotimo no, i'm on linux, i don't own a mac at all
lizmat has a mac
samcv cool
i can't imagine it would be *slower* than what we do now 00:39
timotimo you might call 'er ... lizmac!
samcv dies
i'm hungry
timotimo i hope your grant gets accepted soon enough so you can eat!
samcv haha
timotimo i haven't written a comment on it yet ... i always find it hard to write something that doesn't sound thoroughly weird 00:40
already had that problem with jnthn's proposals, and also with pmurias' one
and i think also with brrt's proposal?
anyway, bedtime! 00:45
samcv night :) 00:56
timotimo, hopefully this check works, I think it should, so far have not encountered it when running with roast github.com/samcv/MoarVM/blob/14935...ops.c#L236 01:46
01:48 ilbot3 joined
Geth MoarVM: samcv++ created pull request #574:
Use memmem in string index. Uses Knuth-Morris-Pratt on glibc 2.2x+ faster
01:56
JimmyZ samcv: re MVN, it's typo :) 04:57
05:20 domidumont joined 05:27 domidumont joined 05:31 brrt joined 06:05 domidumont joined
samcv heh 06:13
MVN: Moar virtual normalizer. heh since it's MVN_unicode_normalizer_form 06:14
brrt ohai samcv 06:21
samcv hey brrt :)
it's your morning right brrt ? 06:22
brrt how are things?
yes
samcv good, got some work done today
brrt i just figured out i nuked objdump
samcv nice
brrt well, and now i need it :-P
samcv got regex in perl 6 2.2x faster
brrt because somehow i screwed up something in the LET to DO translation
oh, very nice
samcv github.com/MoarVM/MoarVM/pull/574 using memmem. though it's only fast on glibc platforms
brrt all regex
?
samcv well
if they are both 32bit or both 8bit needle/haystack 06:23
brrt hmm, i think there are some feature macro's you can use
samcv also need to get some pointer subtraction and division checked by somebody else though
can read the full text of my PR github.com/MoarVM/MoarVM/pull/574 explains pretty well all the concerns and things i need comments on 06:24
06:39 brrt joined
brrt sure, no problem 06:43
memmem is a gnu extension, though 06:46
samcv yeah. bsd has it too (and mac os x)
brrt and windows?
samcv windows probably doesn't. but we can always ifdef it and just have it do the same path we do now
brrt that'd be wise 06:47
samcv since if the strings aren't both the same storage type we have to do that on any platform
also please tell me what #ifdef's to use
what do we use on MoarVM for that? or do i need to ask jnthn
brrt let me think about that for a bit
i recall having a decent set of ifdefs… 06:48
samcv i guess i could grep all the ifdefs
but i think bsd memmem is probably going to be faster than what we currently do, even though it's not fancy and super great like the glibc one
brrt anyway, even-moar-jit has a definition MVM_JIT_PLATFORM_POSIX vs MVM_JIT_PLATFORM_WIN32
but that's not going to help you much
samcv and if it *is* a fair bit faster, we can just implement a basic memmem for windows or something
brrt anyway, what i wanted to say is 06:49
we have 'isolated' the platform-specific bits into src/platform
samcv we could always just have memmem on windows that is our own function
brrt exactly 06:50
samcv yeah
brrt although i'd wrap it in an MVM_memmem in that case
samcv that would be the most neat way to do it
and then ifdef it?
brrt aye
samcv k
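The wrapper idea discussed above (use the libc memmem where it exists, otherwise fall back to our own function) could look roughly like the sketch below. The names example_memmem_fallback, EXAMPLE_memmem and the EXAMPLE_HAVE_MEMMEM flag are hypothetical and are not taken from the MoarVM source; the fallback is a plain byte-by-byte scan, not the faster algorithm glibc uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Portable fallback: returns a pointer to the first occurrence of
 * needle in haystack, or NULL if there is none. */
static void *example_memmem_fallback(const void *haystack, size_t haystack_len,
                                     const void *needle, size_t needle_len) {
    const unsigned char *h = haystack;
    size_t i;
    assert(haystack != NULL && needle != NULL);
    if (needle_len == 0)
        return (void *)h;            /* empty needle matches at the start */
    if (needle_len > haystack_len)
        return NULL;
    for (i = 0; i + needle_len <= haystack_len; i++) {
        if (memcmp(h + i, needle, needle_len) == 0)
            return (void *)(h + i);
    }
    return NULL;
}

/* EXAMPLE_HAVE_MEMMEM is a hypothetical configure-time flag (assumption).
 * _MSC_VER is predefined by the Microsoft C compiler. */
#if defined(EXAMPLE_HAVE_MEMMEM) && !defined(_MSC_VER)
#define EXAMPLE_memmem memmem
#else
#define EXAMPLE_memmem example_memmem_fallback
#endif
```

Callers would only ever use EXAMPLE_memmem, so the platform decision stays in one place, which is the point of putting it under src/platform.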
brrt iirc the macro is _MSC_VER for the microsoft C compiler 06:51
samcv if we use gcc on windows do we have glibc?
brrt .. not sure
samcv idk how windows environment works... maybe only if we build in cygwin? 06:52
brrt what mingw has
or cygwin, yes
samcv i mean we could have a MVM_memmem with a 5th argument, which is in size_t, which denotes what offset the match is needed at 07:07
brrt not sure if i grok 07:08
samcv like for 32bit strings can supply it with sizeof(MVMGrapheme32) and it can know to check if it's a multiple of that thing
brrt aha
samcv ok. so memmem replies back with memory addresses
brrt uhuh
samcv and 8 bits vs 32 bits, and it's possible but rare it could find one inside another
well the likelihood goes down exponentially if the needle is longer than 1
but regardless. it should be checked at least. and would be a useful argument in case we have a magic futuristic memmem which does it for us and we don't need to continue again 07:09
brrt right, i see 07:10
that could be something, yes
samcv see my comment on the code on the PR. and hopefully somebody can double check that it does what i think. pretty sure it does but 07:11
pointer addition subtraction etc can have weird things
brrt oh, i see 07:12
just cast to (char*), and subtract
iirc that will (officially) give you a ptrdiff_t
samcv i think it got mad at me doing that 07:13
when i did the division
well. the %
brrt o.O
samcv did not like me checking remainder of a char *
and wouldn't compile 07:14
07:14 domidumont joined
brrt okay, then… let me check 07:15
samcv go ahead and try it yourself, maybe i did something weird in addition to that 07:16
Geth MoarVM: 4dce22fcee | (Samantha McVey)++ | 3 files
Fix typo MVM_unicode_normalizer_form was MVN (typo)

MVN_unicode_normalizer_form should have been MVM_unicode_normalizer_form. Fix the typo in all references to this function.
07:17
brrt gist.github.com/anonymous/fa28475f...bb8fd48aad <- this just works 07:19
that said
you can check for within-boundaryness simply with
(((char*)mm_return_32) - ((char*)haystack->body.storage.blob32) & 3) == 0 07:21
well, that's if you're on the 4 byte boundary, if you're within the boundary you're nonzero
that said
you probably shouldn't throw in that block 07:22
i mean
hmmmm
memmem wants to find the *first* occurrence, doesn't it?
so you know by definition that there isn't an earlier match 07:23
so you can restart the search at the next grapheme boundary
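A minimal sketch of the boundary check and the restart point described above, assuming 4-byte elements as in the 32-bit blob case. The helper names are hypothetical. The byte offset comes from subtracting two char pointers, which yields a ptrdiff_t that can be masked or divided, unlike a pointer itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if `hit` sits on a 4-byte element boundary relative to
 * `base`, 0 if it points inside an element. */
static int on_grapheme32_boundary(const void *base, const void *hit) {
    ptrdiff_t byte_offset = (const char *)hit - (const char *)base;
    assert(byte_offset >= 0);
    return (byte_offset & 3) == 0;
}

/* Returns the first element-aligned address strictly after `hit`.
 * Since the byte search reports the first occurrence, no earlier
 * match exists, so the search can resume from here. */
static const char *next_grapheme32_boundary(const void *base, const void *hit) {
    ptrdiff_t byte_offset = (const char *)hit - (const char *)base;
    assert(byte_offset >= 0);
    return (const char *)base + ((byte_offset / 4) + 1) * 4;
}
```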
07:36 domidumont joined
brrt samcv: left my comment on the PR 07:37
samcv thx for comments :-) 07:45
brrt yw :-) 07:46
samcv yeah exactly. see my comment i just posted in reply
brrt okay, i can live with a memmem-like with an extra argument, but as a rule, you should probably call it something else than memmem
since memmem has no such argument 07:47
memfind, maybe
mempos
samcv oh also it seems the speed gains are similar on bsd as on glibc
i had somebody bench it
2x faster
brrt \o/
samcv let me check the source for memmem on bsd. i know the glibc would do memchr for short needles 07:49
github.com/st3fan/osx-10.9/blob/ma...D/memmem.c looks like bsd does that too
which is how i was benching it. so makes sense it'd be similar for that
are bsd licenses compatible with artistic 2.0? 07:50
bsd one is much more self contained than glibc one. so could be very easy for us to use for msvc 07:51
brrt i'm… not sure, and to be on the safe side, i'd not agree with blanket inclusion 07:52
samcv libuv is MIT 07:54
brrt we compile-and-link libuv as a strictly separate library, though 07:57
anyway, it might well be ok
i can't judge
samcv i'm sure i can find a gpl one or just code our own if it's an issue 07:58
our UTF-8 decoder included in MVM source is MIT 07:59
brrt alright, alright :-) 08:02
not a GPL one, we're on the artistic license
samcv well it's important to check though
yeah artistic 2
brrt :-) 08:14
samcv brrt, looks like 25% faster under worstcase on mac
but single char searches are 2x faster
brrt awesome
samcv so 25% to 50% better and 100%better (2x faster) in easiest case for the memmem
brrt, i'm thinking of making MVM_mempos and making so many arguments 08:21
well. adding on also uh. starting position 08:22
or something
so it becomes easy to use it from other functions
brrt hmm, you could do that, but it's not really necessary, the formulation base_ptr + start_position will do what you mean just fine 08:23
samcv you can just give it the pointers to the objects, say what position to start from in haystack (will be in units of block size)
true
but then you have to subtract to get the haystack length
brrt you could also call it mempos_at_boundary or something like that
samcv as well
brrt yes 08:24
samcv at_boundary sounds nice
brrt mempos(void*base, size_t base_elems, void *needle, size_t needle_elems, size_t elem_size) 08:25
or something like that
i mean, hack away :-)
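One way to read the mempos signature suggested above, as a sketch only: example_mempos is a hypothetical name, and this version steps one element at a time with memcmp, so every candidate is already on an element boundary. A memmem-backed version would be faster but would need to reject or retry hits that land inside an element.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the element index of the first match of `needle` in `base`,
 * considering only positions that are multiples of elem_size bytes.
 * Returns -1 if there is no such match. */
static ptrdiff_t example_mempos(const void *base, size_t base_elems,
                                const void *needle, size_t needle_elems,
                                size_t elem_size) {
    const char *h = base;
    size_t i;
    assert(elem_size > 0);
    if (needle_elems > base_elems)
        return -1;
    for (i = 0; i + needle_elems <= base_elems; i++) {
        /* Compare whole elements starting at element index i. */
        if (memcmp(h + i * elem_size, needle, needle_elems * elem_size) == 0)
            return (ptrdiff_t)i;
    }
    return -1;
}
```

Returning an element index rather than an address means the caller does not have to do the pointer subtraction and division by the block size itself.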
08:47 domidumont joined
samcv ok well i sort of have it. but i have a bug on one line 09:03
09:03 domidumont joined
samcv return ((char *)mm_addr - (char *) haystack)/block_size; # works fine for MVMGrapheme32 but is the wrong number for MVMGrapheme8 09:05
btw the mm_addr i checked and it's the same on the new function vs how the old one was. so the only difference is in this subtraction/division line 09:06
before i had:
(MVMGrapheme8*)mm_return_8 - haystack->body.storage.blob_8;
(MVMGrapheme32*)mm_return_32 - haystack->body.storage.blob_32;
for each of the two types. and that works perfectly. but the division line is not working argh 09:07
brrt hmm, that's a bit odd
samcv yeah 09:08
i pushed my work if you can take a quick look. github.com/MoarVM/MoarVM/pull/574/files i'm gonna go to bed soonish 09:10
timotimo samcv: we could have a quick fail if the needle can't fit in 8bit but the haystack already is; though i don't think it'd be very common 09:11
samcv how can the needle not fit in 8bit
you mean 0 wide? 09:12
memmem already handles that
timotimo no, i mean, if the haystack is known to contain only ascii, and the needle has some esoteric unicode characters in it, we can automatically fail, no? 09:14
samcv oh
but how do we know the needle has that. i guess if it's a small needle?
so we'd have to see what's in the needle
.o(so we can pretend perl6 has super fast regex by testing searching unicode needles of an ascii string) 09:15
we can be faster than perl 5!
it will be great
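The quick-fail idea above (an ASCII-only haystack can never contain a needle with non-ASCII values) might be sketched like this. The function name is hypothetical, and treating "fits in 8-bit storage" as the range 0..127 is an assumption made for the example. As noted in the discussion, the check costs one pass over the needle, so it pays off mainly when needles are short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if every value in the 32-bit needle is in the ASCII range
 * 0..127, 0 otherwise. If this returns 0 and the haystack is known to
 * hold only ASCII, the search can report "not found" immediately. */
static int needle_fits_ascii(const int32_t *needle, size_t needle_len) {
    size_t i;
    assert(needle != NULL || needle_len == 0);
    for (i = 0; i < needle_len; i++) {
        if (needle[i] < 0 || needle[i] > 127)
            return 0;
    }
    return 1;
}
```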
timotimo i find it worrysome that you claim "most" regex get faster from that patch
samcv why? 09:16
timotimo we only use indexat when we have a scan followed by a literal
samcv well i have tested only literals
timotimo i.e. when you start your regex with a character class, or when it's ignoremark or ignorecase ...
samcv i mean most literal regex 09:17
that have the same content type
sorry i really meant... most indexing
more than 50%
timotimo "anytime it is regex between an 8bit string and 8bit string or 32bit string and 32bit string."
this tripped me up a bit
did you try tux' benchmark with your commits? 09:18
samcv nope
was going to but i forgot how 09:19
timotimo i'll do it now 09:20
samcv kk
just don't check out the most recent commit
the one before most recent on that branch
timotimo i have the one before the "nonworking" one 09:22
samcv perfect
timotimo something's not right here 09:24
all i can see it do in strace is read chunks from the hello.csv 09:25
but it's really slow 09:26
samcv ok
timotimo how does tux run it in like 5 seconds?
samcv it is really slow
timotimo it's already been 3 minutes
samcv idk...
i made the file smaller
timotimo oh, perhaps the /tmp on tux' machine is a tmpfs
samcv mine is tmpfs. is yours?
timotimo oh, it is 09:27
samcv but it is slow. it's not just you 09:29
unless everybody does it wrong. idk
i assumed the README.speed just didn't have the number of iterations it writes the csv file as he actually does
timotimo head -30 /tmp/hello.csv | perl6 $t.pl >/dev/null 2>&1
that's what test.sh does
except the output tux always gets is - i think - 5000? 09:30
samcv idk
i didn't run the .sh
or did i... 09:31
timotimo with 1000 instead of 30, i get only 1.18 seconds in total
ok now i'll drop your branch and see how it changes
samcv ok 09:32
09:32 brrt joined
timotimo not seeing a difference tbh 09:33
i'll check in gdb if it reaches your code 09:34
lizmat fwiw, Text::CSV uses as little regexen as possible
timotimo if it uses nqp::index or similar it ought to get a speedup, too 09:35
lizmat one notable case is Str.split(<a b c>) rather tagn Str.split( /a|b|c/ )
*than
samcv ah 09:36
yeah maybe it doesn't hit that code
lizmat or rather: Str.split($a,$b,$c) rather than Str.split( /$a|$b|$c/ )
I would be surprised :-)
timotimo it'd be neat if we could do memmem but with more than one needle
samcv well at least other code will be faster :)
timotimo i've theorized about that in the past, but i never got anything written, not even a POC 09:37
samcv yes
that would be great
lizmat implementing fixed string multi-needle split was one of the things that made Text::CSV almost 2x faster at one point
and yes, that uses nqp::index 09:38
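For illustration, a naive version of a multi-needle search: find the earliest position at which any of several fixed strings occurs. This is not the Text::CSV or Rakudo implementation; first_of_needles is a hypothetical name, it works on NUL-terminated C strings via strstr, and it makes one pass per needle rather than a single combined pass.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the byte offset of the earliest occurrence of any needle in
 * haystack, or -1 if none occurs. If `which` is non-NULL, it receives
 * the index of the needle that matched first. */
static ptrdiff_t first_of_needles(const char *haystack,
                                  const char *const *needles, size_t count,
                                  size_t *which) {
    ptrdiff_t best = -1;
    size_t i;
    assert(haystack != NULL);
    for (i = 0; i < count; i++) {
        const char *hit = strstr(haystack, needles[i]);
        if (hit != NULL && (best < 0 || hit - haystack < best)) {
            best = hit - haystack;
            if (which != NULL)
                *which = i;
        }
    }
    return best;
}
```

A split would call this repeatedly, advancing the haystack pointer past each match.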
samcv idk can the Knuth-Morris-Pratt be adapted for alternates
timotimo, memmem with strandsssssssss 09:39
that would be neat
timotimo hm, but when we apply a regex to a string, we use indexingoptimized on that string first 09:40
samcv yeah
to collapse it right
timotimo so no strands when we hit your code
yeah
samcv yeah i know that
09:51 brrt1 joined
samcv going to get to bed now. goodnight 10:37
timotimo goodnight! 10:44
12:05 AlexDaniel joined 14:21 brrt joined 14:38 brrt joined
Geth MoarVM/even-moar-jit: 1d16e823c7 | (Bart Wiegmans)++ | 2 files
Add DISCARD node

In order to make e.g. DO nodes whose child nodes yield values tileable
  (as in the new LET-to-DO translation) we add a discard node to enforce
void context on those nodes.
15:08
MoarVM/even-moar-jit: 01aa8f4546 | (Bart Wiegmans)++ | 2 files
Graph log: parameters in node name

This is a more compact display than the previous one. Now all node parameters are logged in the node name, rather than as separate nodes, which emphasizes the (active) node structure better, I think.
brrt i'm thinking there is probably some bug in the compilation of DO nodes 15:09
because the let->do conversion code is actually using it and well, stuff ain't going so well 15:10
16:26 cog_ joined 16:31 domidumont joined 17:08 AlexDaniel joined 17:50 synopsebot6 joined 18:58 spebern joined
Geth MoarVM: 9653dc6aa1 | (Jonathan Worthington)++ | src/6model/reprs/MVMHash.c
Implement serialize/deserialize in VMHash REPR.
19:14
19:28 Voldenet joined
timotimo i'm surprised we never needed that until now?!? 19:29
samcv got a build failure on windows gist.github.com/samcv/0b2abb9da0fa...-moar-L198 19:37
somehow it found gcc which i didn't want to use
not sure what installed it 19:38
git? strawberry perl?
ah looks like strawberry perl did
lizmat timotimo: well, I know some places where we can put BEGIN blocks in the setting now :-) 19:39
hmmm... won't the JVM need something like that as well then? 19:40
jnthn: ^^^
samcv maybe i didn't actually get the c compiler? 19:41
geekosaur looks like strawberry perl is built to use its own gcc install 19:52
samcv yeah 19:53
trying again to get the right msvc thing installed. think should work this time since i went the full thing, not just some compiler which turned out to only have visual basic and C# 19:54
nwc10 jnthn: ASAN barfage. First in a month or so: paste.scsys.co.uk/557802 19:57
jnthn lizmat: Yeah, it probably will. Though at least it's fixed in one place now. 20:19
samcv: I always used ActivePerl when doing MSVC builds, fwiw
nwc10: ooh, that's an interesting one 20:20
Too tired to fix it tonight
But it points very well at the problem
What were you running?
samcv ok using the developer prompt now. things are in path now 20:25
ok now it seems to be ok
gonna nmake now
nwc10 jnthn: make m-spectest6 20:29
and I did think/hope that it pointed straight at a problem, as the two stack traces converged on adjacent lines
timotimo we're not allowed to enter gc from spesh, right? 20:42
that's why that crashed?
jnthn Right 20:43
I thought I'd caught all those spots but apparently I missed one 20:44
timotimo how should we deal with this for a proper fix?
jnthn I'm too tired to know 20:45
It involves inlining, so it's not liable to be simple
Though it may be that we can catch that this would happen in the fixup
And refuse to inline
That's probably the conservative solution 20:46
timotimo ah, sure
another way would be to just abort the whole spesh process and try again later
jnthn That's not likely to change the fact that the wval in question wasn't deserialized yet
It's probably code paths like
timotimo because then we're probably going to have something else having kicked off gc for us 20:47
jnthn die X::Blah.new unless $condition-that's-usually-tree;
*true
So we never end up accessing the wval for X::Blah and it ain't deserialized
timotimo right
is the proper fix to never allocate when inside spesh, or just never actually gc?
jnthn Well, actually the problem is that we acquire a mutex on the SC 20:48
And that acquisition can enter the GC
Because it's a block point and thus a safe point
An alternative is to teach spesh to be GC safe
But that's probably something for spesh2 :) 20:50
timotimo mhh 20:51
samcv cool. ok. got it compiling. now got undefined external symbol memmem cool 20:55
timotimo cool cool 20:56
i can probably hack in the "refuse this if we're in spesh" thing there 20:58
20:58 Geth joined
timotimo we don't really have an "abort" mechanism in place yet, so i'll have to put a bunch of return values everywhere 21:03
ho-hum. at the point where we "fix_wval", we've already done a whole lot of stuff, not sure if it can still properly abort here or if i should add a bit to the first pass 21:06
samcv jnthn, it's cool with you if we use the memmem from libc aka freebsd memmem 21:10
nice got it to compile with memmem on windows 21:12
timotimo i can put the check in much earlier when we're actually checking whether an inline will be possible 21:14
so there's already mechanisms in place to refuse to inline
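The conservative fix discussed here, refusing an inline up front when it references an object that has not been deserialized yet, can be pictured with the sketch below. All types and names are hypothetical stand-ins and do not reflect MoarVM's actual spesh or serialization-context structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an object referenced through a wval. */
typedef struct {
    int deserialized;   /* nonzero once the object has been loaded */
} ExampleObject;

/* Hypothetical stand-in for a frame being considered for inlining. */
typedef struct {
    const ExampleObject *const *wvals;  /* objects the frame refers to */
    size_t num_wvals;
} ExampleInlineCandidate;

/* Returns 1 if the candidate is safe to inline, 0 if any referenced
 * object would still need deserializing (which could take a lock and
 * reach a GC safe point). */
static int example_can_inline(const ExampleInlineCandidate *c) {
    size_t i;
    assert(c != NULL);
    for (i = 0; i < c->num_wvals; i++) {
        if (c->wvals[i] == NULL || !c->wvals[i]->deserialized)
            return 0;
    }
    return 1;
}
```

Doing the check during validation, before any rewriting has happened, avoids needing an abort path halfway through the inline.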
jnthn timotimo: yeah, there's a validation step 21:16
samcv: Yeah, I only had chance to quickly glance what you're doing, but it sounds sensible. :)
So long as we don't bust the build on any of the common platforms :) 21:17
samcv ok cool
yeah moarvm is working fine on windows atm :) with libc's memmem 21:18
compiling nqp and printing all my debug messages. so it's def working... hehe
spew of text
as long as it compiles i guess
then will run the tests
timotimo nwc10: will you be able to test a patch when i push it? 21:19
oh jnthn 21:28
can you set up my appveyor so it gets kicked off when moarvm gets a commit or a pull request?
samcv that would be nice 21:29
timotimo i'm not sure what you'll need
jnthn timotimo: Ask me tomorrow, if you would :) 21:30
timotimo OK
samcv cool nqp spectest pass on msvc. 21:32
timotimo jnthn: i put a little fprintf into the code where we jump to not_inlinable when a wval is dangerous, and holy hell does it come up often 21:33
hm 21:34
my code might be wrong, it doesn't ever hit the else branch here
let's see ...
yes, mea culpa 21:36
ok, cool. it finds a whole lot of harmless wvals and also a few dangerous ones 21:37
Geth MoarVM/refuse_dangerous_inlines: 2aa5c36df3 | (Timo Paulssen)++ | 3 files
refuse inlines when it could cause GC inside spesh

  nwc10++ was able to trigger this.
21:41
timotimo it's kinda sad that an inline that was refused because of this won't ever be re-instated again later on 21:45
i've already wished for being able to spesh frames multiple times in the past
maybe that's along the same lines 21:46
samcv ok it's now much prettier 22:33
github.com/MoarVM/MoarVM/pull/574/...85f2b959R1 added a MVM_memmem function in platform so we can handle the platform specific stuff there 22:34
gonna test the build on windows now, should be good still hopefully 22:35
timotimo want me to walk you through setting up appveyor with a moarvm repo? 22:48
samcv maybe. i guess. 22:51
but yeah i tested it on windows and it all works now on both windows and linux
after some experimenting
timotimo my neck is tensing up and the last time i got that i felt like throwing up for more than one whole day :< 22:56
tomorrow is going to be fun, in the Dwarf Fortress sense of the word :| 22:58