MasterDuke timotimo: do you have any thoughts on ? 00:04
or anybody else for that matter. why is .contains so slow? 00:05
timotimo we might want to make the substring searcher capable of understanding strands 00:07
MasterDuke how is the regex faster than nqp::index? 00:09
timotimo no clue 00:10
MasterDuke m: use nqp; say nqp::index(("abcde" x 100_000_000) ~ "foo", "foo", 0) > -1; say now - INIT now
camelia True
timotimo how exactly is contains implemented? with nqp::index?
MasterDuke m: use nqp; say nqp::index(nqp::indexingoptimized(("abcde" x 100_000_000) ~ "foo"), "foo", 0) > -1; say now - INIT now
camelia MoarVM panic: Memory allocation failed; could not allocate 500000003 bytes
MasterDuke the numbers are 11s and 1.8s on my machine 00:11
so what is the first one doing? it must understand strands somehow, right? 00:12
timotimo i guess? 00:13
you can try a profile with perf or something to see what lines of code are involved
that should give us a clue 00:14
for very artificial cases like this the contains operator could see that abcde contains none of the characters in the needle, so it could skip it immediately 00:16
there are interesting edge cases involving the needle being longer than the individual strands (before repeating)
so that you can find "aaaaaaaaaaaaaa" in "a" x 100 00:17
MasterDuke yeah, you couldn't just check that the needle wasn't in a strand, because it could match across repetition boundaries 00:18
timotimo yup 00:19
and across strand boundaries as well of course
and across two different repeated strands
"a" x 2 ~ "b" x 2 contains "ab" and "aabb" "interestingly"
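The repetition-boundary problem can be made concrete for the simplest case: one unit repeated n times, with nothing before or after it. The following is a minimal Python sketch, and `repeat_contains` is a hypothetical name, not a MoarVM API. Any match can be slid left by whole copies of the unit until it starts inside the first copy, so ceil(len(needle) / len(unit)) + 1 copies (capped at n) are always enough. For short needles this is the "check the string x 2" idea; for long needles it becomes "x many more times". The sketch deliberately ignores matches that straddle neighbouring strands, which a real implementation would also have to check.

```python
import math


def repeat_contains(unit: str, reps: int, needle: str) -> bool:
    """Does unit * reps contain needle, without building the full string?

    Any occurrence can be shifted left by whole copies of `unit` until it
    starts inside the first copy, so a bounded number of copies suffices.
    Matches crossing into *other* strands are not considered here.
    """
    if not needle:
        return True
    if not unit or reps <= 0:
        return False
    copies = min(reps, math.ceil(len(needle) / len(unit)) + 1)
    return needle in unit * copies


print(repeat_contains("abcde", 100_000_000, "foo"))   # False, checked against "abcde" x 2
print(repeat_contains("a", 100, "a" * 14))            # True, needle longer than the unit
print(repeat_contains("abcde", 100_000_000, "deab"))  # True, spans a repetition boundary
```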
MasterDuke 00:20
timotimo is this the fast one? 00:21
MasterDuke slow
this is the fast one
the code is in the gist description 00:22
samcv: any thoughts about the above? ways to make .contains faster without using 36x the memory it does now? 00:26
samcv how far should i read in the backlog 00:27
well you are using a repeated string 00:28
as i recommended before try comparing it with a nonrepeated string...
so write some string to a file then read it and concat it together (nonrepeat) 00:29
so read it to $var. then do $var ~ $var
then try doing contains on it
then compare
MasterDuke pretty much the same time 00:32
'(("abcde" x 100_000_000) ~ "foo").contains("foo"); say now - INIT now' took 9s 00:34
timotimo wow, the time to index the string doesn't even show up at all
MasterDuke 'my $a = "string.txt".IO.slurp; my $n = now; say ($a ~ $a ~ "foo").contains("foo"); say now - $n' took 7.8s
timotimo wait, is memchr a string search?
MasterDuke where string.txt contains "abcde" x 50_000_000 00:35
timotimo can you try what happens when you have abcdeof as the string? 00:36
because then it's not as trivial to find the o it's looking for by "galloping" 00:37
though the slower case should also be galloping 00:38
MasterDuke no repetitions? or no "foo" in the haystack?
oh, i see what you mean
adding "of" took it from 9s to 14.6s. adding "gg" took it from 9s to 12.6s 00:40
adding "fo" took it from 9s to 19s 00:41
samcv memchr is a search
timotimo that's clearly that optimized algorithm that jumps as far forwards as it can from every mismatch
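One well-known algorithm that "jumps as far forwards as it can from every mismatch" is Boyer-Moore-Horspool. It is an assumption here that MVM_memmem behaves like this; the sketch only shows why a haystack containing the needle's own characters ("of", "fo") forces shorter jumps and therefore more work. `horspool_find` is a hypothetical name, and it also reports how many needle alignments it tried.

```python
from typing import Tuple


def horspool_find(haystack: bytes, needle: bytes) -> Tuple[int, int]:
    """Boyer-Moore-Horspool search.

    Returns (index of first match or -1, number of alignments tried).
    """
    n, m = len(haystack), len(needle)
    if m == 0:
        return 0, 0
    # Bad-character table: how far to slide when byte b sits under the needle's
    # last position. Bytes absent from needle[:-1] allow a full jump of m.
    shift = [m] * 256
    for i in range(m - 1):
        shift[needle[i]] = m - 1 - i
    pos = tries = 0
    while pos <= n - m:
        tries += 1
        if haystack[pos:pos + m] == needle:
            return pos, tries
        pos += shift[haystack[pos + m - 1]]
    return -1, tries


plain = b"abcde" * 1000 + b"foo"     # no needle bytes: always jumps by 3
tricky = b"abcdefo" * 1000 + b"foo"  # 'f' and 'o' force jumps of 2 and 1
print(horspool_find(plain, b"foo"))
print(horspool_find(tricky, b"foo"))
```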
MasterDuke so in the non-indexingoptimized version, is it just joining a strand at a time as it's indexing through and then discarding them? is that how the memory used doesn't increase? 00:44
timotimo i don't think it does it like that 00:46
samcv i mean it'd be cool if when indexing a repeated string we could not flatten it and just return that it's not in the repeated string :P
and destroy your example 00:47
but we call the indexed flattening argument so it'd flatten before indexing i guess
timotimo ? 00:48
samcv well if we know it's "abcde" repeated we can just say no "foo" isn't in there
by checking the string x 2 00:49
timotimo not only the string x 2
also the preceding and following string
and the string x many_more_times if the needle is long
samcv oh. i didn't see that my bad
yes of course
i'm just throwing out ideas
timotimo i know what's up 00:50
the regex engine most probably also turns the "foo" that we're searching for into a 32bit storage thingie
so it matches with the other string
samcv why would it do that 00:51
i have seen no evidence it does
timotimo i see it in callgrind
samcv oh
why does it do that. the string we're searching is not 32bit 00:52
timotimo from MVM_string_index it calls into MVM_memmem 385 times in the regex one
samcv: i seem to recall you put in code to change the storage of the needle if it benefits the search? 00:54
hey wait wtf 00:56
maybe i was looking at it wrong
huh. 00:57
samcv timotimo, i was going to
well i did in a branch of mine
timotimo ah
samcv but there were some issues. i will revisit it though
timotimo the 32bit/32bit one might have been a red herring
there's also many 8bit/8bit occurences 00:58
and between the regex match and the contains call there's no difference in the 32bit/32bit ones by count
samcv yeah i'd think it'd be 8bit and 8bit 00:59
but i could always be wrong. that's what i'd expect though
timotimo i would have expected that, too
MasterDuke i thought it was hard to get strings to be 8bit? 01:00
timotimo just can't have newlines in 'em
MasterDuke ah
timotimo actually 01:01
do we have signed char storage?
samcv timotimo, it can have newlines though
MasterDuke replacing one of the chars in the haystack or the needle doesn't change the time 01:02
*with a "\n"
samcv yeah you can have newlines
timotimo OK
newlines are synthetics, that's why i was thinking that
samcv \r\n are. and you can have those too :P
you can have negative integers 01:03
MasterDuke nor does replacing a char with ā€
samcv it's 8bit signed
timotimo that was what i was worried about :) not having signed storage there
samcv MasterDuke, well you're concatenating it
it's not going to do anything to the repeated stuff until you normalize it
err flatten it
with the optimized indexing op
MasterDuke so how does nqp:index work without doing the flattening? 01:04
samcv yes
MasterDuke that's what i want to make faster
samcv if both are 8bit or both 32bit and has no strands (i.e. it's a single string without strands like would be created from concatenation)
timotimo callgrind is taking its sweet time
samcv then it uses memmem
otherwise it iterates graphemes one by one
when you call MVM_string_indexing_optimized it will flatten the strands into only one string, and if all codepoints are not higher than can fit in 8bit signed then it converts it to 8bit as well 01:06
so it converts it to 32 bit, then if all were lower than the max for 8bit signed it converts to 8bit after making the 32bit flattened string
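The two code paths samcv describes can be modelled roughly as follows. This is a toy Python model, not MoarVM's data structures: `Flat`, `Stranded`, `index` and `indexing_optimized` are hypothetical names, the memmem fast path is stood in for by a plain list search, and already-flat strings are assumed to pass through `indexing_optimized` unchanged. The grapheme-by-grapheme path keeps only a needle-sized window, which is why it needs no extra memory, while `indexing_optimized` pays for a full 32-bit copy before narrowing to 8 bits.

```python
from collections import deque
from dataclasses import dataclass
from typing import Iterator, List, Tuple, Union

INT8_MIN, INT8_MAX = -128, 127  # 8-bit storage is signed; negatives are synthetics


@dataclass
class Flat:
    graphemes: List[int]
    width: int  # 8 or 32 bits per grapheme


@dataclass
class Stranded:
    strands: List[Flat]  # e.g. the result of a concatenation


MVMStr = Union[Flat, Stranded]


def iter_graphemes(s: MVMStr) -> Iterator[int]:
    for strand in ([s] if isinstance(s, Flat) else s.strands):
        yield from strand.graphemes


def index(hay: MVMStr, needle: Flat) -> Tuple[int, str]:
    """Return (position or -1, which path was taken)."""
    n = needle.graphemes
    m = len(n)
    if m == 0:
        return 0, "empty needle"
    if isinstance(hay, Flat) and hay.width == needle.width:
        # Fast path: both flat with the same storage width (memmem in MoarVM).
        h = hay.graphemes
        for i in range(len(h) - m + 1):
            if h[i:i + m] == n:
                return i, "memmem"
        return -1, "memmem"
    # Slow path: walk graphemes one by one, keeping only a needle-sized window.
    window = deque(maxlen=m)
    for pos, g in enumerate(iter_graphemes(hay)):
        window.append(g)
        if len(window) == m and list(window) == n:
            return pos - m + 1, "grapheme-by-grapheme"
    return -1, "grapheme-by-grapheme"


def indexing_optimized(s: MVMStr) -> Flat:
    if isinstance(s, Flat):
        return s  # assumption: already-flat strings pass through unchanged
    flat = list(iter_graphemes(s))  # 1. flatten everything into 32-bit storage
    if all(INT8_MIN <= g <= INT8_MAX for g in flat):
        return Flat(flat, 8)        # 2. then narrow to 8-bit if everything fits
    return Flat(flat, 32)


def flat8(text: str) -> Flat:
    return Flat([ord(c) for c in text], 8)


hay = Stranded([flat8("abcde" * 3), flat8("foo")])
print(index(hay, flat8("foo")))                      # (15, 'grapheme-by-grapheme')
print(index(indexing_optimized(hay), flat8("foo")))  # (15, 'memmem')
```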
MasterDuke making the need "cdefoo" takes it from 9s to 20s
*needle 01:07
samcv would it be better to convert to 8bit, and then if we see any too big codepoints we ditch the 8bit flattened thing and put it into 32bit
instead of having to convert to 8bit afterward
MasterDuke ah, so any faster way possible than iterating grapheme by grapheme?
samcv if we assume 50% 32bit and 50% 8bit resulting strings then it'd be faster 01:08
what do ya'll think about that?
timotimo all the work is happening inside MVM_string_index_from_end 01:10
samcv for which example
timotimo that's what calls get_grapheme_at_nocheck 500 million times or whatever
samcv which work?
when we do nqp::contains? or the regex one
timotimo indexingoptimized + contains
the number of 32/32 memmems is the same but the one for 8/8 differs 01:11
the slow one goes for 8/8 memmem 642 times
the fast one does 1012 times
samcv timotimo, what do you think of creating an 8bit string and then if we find a too high/low codepoint ditching it and creating a 32bit string 01:12
instead of starting out creating a 32bit string
it would save time if we're only creating an 8bit string
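samcv's proposal (start with 8-bit storage and only fall back to 32-bit when a grapheme doesn't fit) could look something like this. It's a sketch only: `collapse_optimistic` is a hypothetical name, and carrying the already-built prefix over into the 32-bit buffer (rather than re-reading the strands) is a choice made here, not necessarily what the eventual patch does. Python's `array` typecode 'b' is a signed char, like an 8-bit grapheme, and 'i' is a C int, which is 32 bits on mainstream platforms.

```python
from array import array
from typing import Iterable, Tuple


def collapse_optimistic(graphemes: Iterable[int]) -> Tuple[array, int]:
    """Build signed 8-bit storage first; switch to 32-bit at the first grapheme
    that doesn't fit. Returns (storage, number of graphemes copied twice)."""
    narrow = array("b")
    it = iter(graphemes)
    for g in it:
        if -128 <= g <= 127:
            narrow.append(g)
            continue
        # Out of range: ditch the 8-bit buffer, carrying over what was copied so far.
        wide = array("i", narrow)
        wide.append(g)
        wide.extend(it)  # the rest goes straight into 32-bit storage
        return wide, len(narrow)
    return narrow, 0


storage, recopied = collapse_optimistic([ord(c) for c in "ab»c"])
print(storage.typecode, list(storage), recopied)  # i [97, 98, 187, 99] 2
```

The second return value counts graphemes that had to be copied twice. The worst case is a long all-ASCII string with a single non-ASCII character at the very end: nearly everything is copied twice, which is roughly what the flatten-to-32-then-narrow approach always does.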
timotimo i've wanted to write code like that
samcv should i do that right now?
timotimo dunno 01:13
can you tell me why one has index_from_end and the other doesn't?
samcv which one what's the difference in code?
timotimo one is ~~ /foo/ the other is inedxingoptimized + .contains 01:14
samcv ah. well idk the regex perl6/nqp code it must be calling it 01:15
also i can de-duplicate code here 01:16
remove the code in MVM_string_indexing_optimized and just call collapse_strands
and i'd already made collapse_strands slightly faster at least one point in time. not that much but was a tiny bit 01:17
timotimo something is emitting nqp::rindexfrom
does collapse_strands also create a new object? 01:18
in what place were you hoping to build 8bit strings until we find a high codepoint? 01:22
like decoding strings or something? 01:29
samcv well that's what MVM_string_indexing_optimized does though 01:37
well except it builds 32 bit. and then once the 32 bit is made turns that to 8 bit.
timotimo oh ok 01:38
samcv to start with i'm deleting all the code in MVM_string_indexing_optimized and having it call collapse_strands since the code is mostly the same 01:39
timotimo i'll try to get some sleep. it's really hot again today :|
samcv except without my previous minor optimization i made to collapse strands 01:40
timotimo good luck!
BenGoldberg Why not attempt to make a 16 bit string? 01:55
samcv you mean if it fits in 16 bits but doesn't fit in 8? 02:03
well we don't do that. but i don't see any reason we couldn't though it would slow down operations between 16bit and 8bit or 16 bit and 32bit strings 02:04
so would reduce even more the chances of strings having matching types
timotimo, pushed the deduplication code to master now. will work on making it faster to create an 8bit string now 02:05
ok so 'use nqp; say nqp::index(("abcde" x 100_000_000) ~ "foo", "foo", 0) > -1;' i got 7.7->7.4 seconds and that's not even using the indexing optimized function 02:13
that seems faster
MasterDuke, the optimized version is down to 0.8s with my changes 02:14
down from 1.13780546 02:15
probably uses a ton less memory too
you still around MasterDuke?
yay now i'm getting corruption under certain conditions. yay. gotta love that 02:37
with the code i'm still working on that is. not the one i pushed
MasterDuke samcv: i'm not seeing much difference in speed, but it's now using 2491396maxresident instead of 2508864maxresident (a saving of 17468) 03:00
hm, but that number varies across runs 03:02
that was the lowest value i've seen, most are around the same as before 03:11
samcv what i pushed should only have like a 5% speed improvement or maybe a little less. but only on creating 32 bit strings. i am working on a version that will be faster 03:30
but i'm getting corruption in some cases
this for example: perl6 -e 'EVAL"« one #`[[[comment]]] two »"'
not sure why
oh. actually from less
seems to trigger much more. heh 03:31
m: '»'.ord.say
camelia 187
samcv yeah it seems to mess up the data somehow. if i even just type in any high codepoint characters into the REPL it'll die 03:32
ooooooooo i figured out what i did wrong 03:34
heh. i did MVMGrapheme8 g = MVM_string_gi_get_grapheme(tc, &gi); so it always was under 8 bit since i put it into an 8 bit integer 03:35
no wonder things went horribly wrong. so the can_fit_into_8bit check never triggered
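The bug is easy to reproduce outside MoarVM: if a value is stored in an 8-bit variable before the range check, the check can never fail. The following Python sketch uses hypothetical helper names. It mimics assigning to a C int8_t with two's-complement wraparound, which is what mainstream compilers do even though C leaves out-of-range signed conversions implementation-defined.

```python
def store_in_int8(value: int) -> int:
    """Mimic assigning an int to a C int8_t on a two's-complement platform."""
    return (value + 128) % 256 - 128


def can_fit_into_8bit(g: int) -> bool:
    return -128 <= g <= 127


graphemes = [ord(c) for c in "« one two »"]

# Buggy: the value is squeezed into 8 bits *before* the range check.
buggy = all(can_fit_into_8bit(store_in_int8(g)) for g in graphemes)
# Fixed: check the full-width value first.
fixed = all(can_fit_into_8bit(g) for g in graphemes)
print(buggy, fixed)        # True False
print(store_in_int8(187))  # -69
```

So '»' (187) silently becomes -69. Negative grapheme values are what MoarVM uses for synthetics, which plausibly explains the corrupted data.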
seems to work fine now 03:36
MasterDuke, let me share with you my patch onto MoarVM master
MasterDuke, apply this patch on master and let me know if it seems better to you 03:37
should use a lot less memory too i'd think 03:40
well or it'll be about the same in case you did like a string of all ascii and then added a unicode character to the very end of it. in all other cases should be faster 03:41
seems to go from 1.1488307 seconds to 1.0257282 for this test: 'use nqp; say nqp::index(nqp::indexingoptimized(("abcde" x 100_000_000) ~ "foo"), "foo", 0) > -1; say now - INIT now' 03:43
well it gives me the same number of seconds running spectest so it can't negatively impact performance too much 03:58
BenGoldberg If perl6 starts to become popular in asia, I strongly suspect we'll want 16 bit versions of strings, in addition to 8 and 32 bit. 04:00
samcv can those be fit in 16bits?
BenGoldberg Most of them.
samcv well i guess we can always just change the type of the needle when trying to index them or whatever 04:17
though doing the same thing but it's perl6 -e 'use nqp; say nqp::index(nqp::indexingoptimized(("abcde" x 100_000_000) ~ "foĀ»"), "foo", 0) > -1; say now - INIT now' 04:23
goes from 4.5s to 5.7
though 100 million is a lot of codepoints 04:24
ok was able to reduce it down to 5s from 5.7 04:31
ok cool. making a PR for this change so jnthn can take a look at it 04:48
PR: 04:49
[Tux] This is Rakudo version 2017.07-12-g8e960522e built on MoarVM version 2017.07 06:37
csv-ip5xs 2.631
test 12.421
test-t 4.110 - 4.176
csv-parser 12.057
lizmat Files=1215, Tests=65561, 221 wallclock secs (13.37 usr 4.99 sys + 1339.81 cusr 140.12 csys = 1498.29 CPU) 07:53
yoleaux 07:19Z <ab6tract> lizmat: i see :)
samcv anyone here want to add any comments to this post: 08:55 err this is the link
suggestions on what 6.d's codename should be
stmuk_ String::Koremutake of the git SHA ! 09:40
samcv hmm? 10:25
i don't understand
stmuk_ its an algo which generates pseudo-Englishish words from random chars 10:28
samcv is it deterministic? 10:29
stmuk_ yes 10:49
mr_ron rakudo: for (50_000, 100_000) -> $limit {my $x; my $y = "x" x 100; $x ~= $y for (1..$limit); say "$limit: ", now - ENTER now} 13:56
camelia 50000: 6.187919
MoarVM panic: Memory allocation failed; could not allocate 6470200 bytes
mr_ron rakudo: for (50_000,) -> $limit {my $x; my $y = "x" x 100; $x ~= $y for (1..$limit); say "$limit: ", now - ENTER now} 13:57
camelia 50000: 6.19400217
mr_ron rakudo: for (50_000, 100_000) -> $limit {my $x; my $y = "x" x 100; $x ~= $y for (1..$limit); say "$limit: ", now - ENTER now}
camelia 50000: 6.2410434
MoarVM panic: Memory allocation failed; could not allocate 6470200 bytes
mr_ron rakudo: for (50_000, 75_000) -> $limit {my $x; my $y = "x" x 100; $x ~= $y for (1..$limit); say "$limit: ", now - ENTER now}
camelia 50000: 6.22447996
MoarVM panic: Memory allocation failed; could not allocate 6476500 bytes
mr_ron rakudo: for (25_000, 50_000) -> $limit {my $x; my $y = "x" x 100; $x ~= $y for (1..$limit); say "$limit: ", now - ENTER now}
camelia 25000: 1.6497031
50000: 5.4814596
mr_ron rakudo: for (75_000) -> $limit {my $x; my $y = "x" x 100; $x ~= $y for (1..$limit); say "$limit: ", now - ENTER now} 14:01
camelia MoarVM panic: Memory allocation failed; could not allocate 5840200 bytes
mr_ron rakudo: my $x = "x" x (100 * 75_000); say $x.chars 14:02
camelia 7500000
mr_ron rakudo: my $x = "x" x (100 * 750_000); say $x.chars 14:06
camelia 75000000
mr_ron rakudo: for (50_000) -> $limit {my $x; my $y = "x" x 100; $x ~= $y for (1..$limit); say $x.chars; say "$limit: ", now - ENTER now} 14:13
camelia 5000000
50000: 6.2035536
mr_ron On my system 100_000 can run 10 * as slow as 50_000 and 150_000 gets killed by OS. Unless objection new RT ticket will likely have subject similar to 'likely memory management failure with string concat' 14:17
Geth nqp/master: 6 commits pushed by pmurias++ 14:34
travis-ci NQP build failed. pmurias 'Add test for nqp::spawnprocasync' 14:50
Geth nqp/async-await-continuations: 9a378ee133 | pmurias++ | src/vm/js/Compiler.nqp
[js] Refactor out a bit of shared code into a sub
nqp/async-await-continuations: 682c859263 | pmurias++ | 4 files
Merge branch 'master' into async-await-continuations
nqp/async-await-continuations: 282e72fd3f | pmurias++ | 6 files
[js] Start sprinkling await/async everywhere

Passes some tests with cross-compiling nqp.
pmurias looks into travis fail...
ugexe we never unfudged the jvm stuff for proc::async 15:42
Geth nqp: 2535304767 | pmurias++ | src/vm/jvm/runtime/org/perl6/nqp/runtime/
[jvm] Make stub a noop rather than have it do a wrong cast
pmurias ugexe: you mean in the roast tests?
ugexe pmurias: yeah... so i dont know that it actually works for jvm. just that it exists for it (as of 2017.06) 15:44
jnthn It's implicitly covered in that precompilation uses it, and I made sure it worked that far on JVM 15:46
travis-ci NQP build passed. pmurias '[jvm] Make stub a noop rather than have it do a wrong cast' 15:58
Geth nqp/async-await-continuations: 2f7276110d | pmurias++ | 3 files
[js] Fixup some conversions to primitive types in async/await mode

Some grammars now work
rakudo/nom: 710fa80004 | (Elizabeth Mattijsen)++ | src/core/Rakudo/
Create iterator early, let it serve as check for emptiness
rakudo/nom: 2dd5963cb3 | (Elizabeth Mattijsen)++ | src/core/
Make sure Setty at least have a R:I:IterationSet type object
Zoffix hack seems ded? My bot's not responding and I can't ssh to give 'em the boot 19:02
Undercover: help
Undercover Zoffix, Use cover: trigger with args to give to sourcery sub. e.g. cover: Int, 'base'. See
Zoffix huh. Well, this one is, but sourcebaby's ded 19:03
can ping but not ssh 19:05
lizmat perhaps moritz can check ? 19:07
Geth rakudo/nom: ab08bd04a4 | (Elizabeth Mattijsen)++ | src/core/Rakudo/
Make R:I:Mappy roles also take IterationSets

As a preparation until all Hashes take IterationSets, or some other HLL construct on a low level data structure, so that we can pass them along as parameters without being automagically upgraded to HLL Hashes.
rakudo/nom: 250ae1026c | (Elizabeth Mattijsen)++ | src/core/
Make Setty.values use R:I.Mappy-values directly

Because the R:I:Mappy role now takes IterationSets
timotimo can't run virt-manager o_O 19:18
huh, so "console" is not the command that'd let me see what's up 19:23
nine timotimo: dmesg is usually worth a look 19:31
if the host is affected, too
timotimo on the master?
it's usually the virtual disk going poof
nine well you didn't say what's actually keeping from running virt-manager ;) 19:32
timotimo oh just python-requests being a piece of total shit on fedora
nine what about virsh? 19:33
timotimo i used that on the remote host
dogbert17 lizmat: I added some SetHash stuff to
timotimo then i did "console 11" (for hack) and got a dead terminal
Geth roast/master: 4 commits pushed by (Jan-Olof Hendig)++, lizmat++ 19:34
timotimo anyway, done. 19:35
dogbert17 lizmat: should I close the original issue in RT ?
lizmat which one was that again? 19:36
dogbert17 RT #130366
lizmat dogbert17: yes please 19:37
dogbert17 done 19:38
lizmat: are you done with your Set/Bag/Mix work now? 19:41
lizmat I hope to be by the end of the week
most of the performance bits have been dealt with, though
dogbert17 lizmat++ 19:42
lizmat maintainability bits still to do
Zoffix Thanks. 19:43
dogbert17 I'm wondering if the merged tests cover RT #131241 as well or do we need more? 19:44
lizmat I don't think RT #131241 is covered yet :-( 19:46
dogbert17 where has synopsebot6 gone ...
guess I'll add them as well then :) 19:47
timotimo, you around?
lizmat dogbert17++
Zoffix there 19:48
RT #131241
synopsebot6 Link:
dogbert17 Zoffix++ 19:49
timotimo dogbert17: o/ 19:54
Geth rakudo/nom: b7953d0dd1 | (Elizabeth Mattijsen)++ | 4 files
Make R:I:Mappy* roles use a more abstract name for lowlevel hash

To prevent cognitive dissonance between Map's $!storage and the iterator's more generic copy of that, which is now also used by QuantHashes.
ugexe m: say"0.1") ~~"0.1.1+"); say"0.1.0") ~~"0.1.1+"); # Is this expected? I'm adding .0 parts to even the length to get the result I want, but wondered if this should already work like that 20:06
camelia True
dogbert17 m: my $b = <a b b c c c>.MixHash; $_ = -1 for $b.values; dd $b # lizmat 20:07
camelia MixHash $b = ("b"=>-1,"a"=>-1,"c"=>-1).MixHash
lizmat dogbert17: what about it ?
looks ok to me ? 20:08
m: my $b = <a b b c c c>.BagHash; $_ = -1 for $b.values; dd $b
dogbert17 so negative weights for mixes are ok?
camelia BagHash $b = ().BagHash
lizmat yup, any non-zero Real value
dogbert17 but not for bags?
lizmat nope, bags only take natural numbers (as in Ints > 0 ) 20:09
dogbert17 what about SetHashes?
lizmat SetHashes take only Bools
m: say "foo" if -1
camelia foo
lizmat any non-zero values is considered true 20:10
dogbert17 ok, I'll continue then, was scared for a sec :)
lizmat hehe,... good that you check :-)
Geth rakudo/nom: d9055e80fe | (Elizabeth Mattijsen)++ | 4 files
Retire R:Q:Quanty in favour of R:I:Mappy
timotimo now actually around 20:17
dogbert17 timotimo: Zoffix beat you to it, synopsebot6 was down 20:22
timotimo oh, it doesn't start up on boot? 20:28
Geth roast: dogbert17++ created pull request #286:
Add tests for RT #131241
synopsebot6 Link:
Geth roast: f7a57d4bd5 | (Jan-Olof Hendig)++ | 3 files
Add tests for RT #131241
roast: c2718dc4ec | lizmat++ (committed using GitHub Web editor) | 3 files
Merge pull request #286 from dogbert17/test-rt-131241

Add tests for RT #131241
synopsebot6 Link:
synopsebot6 Link:
travis-ci Rakudo build failed. Elizabeth Mattijsen 'Make sure Setty at least have a R:I:IterationSet type object' 20:50
buggable [travis build above] ✓ All failures are due to timeout (0), missing build log (0), GitHub connectivity (1), or failed make test (0).
Geth rakudo/nom: 923c32e688 | (Elizabeth Mattijsen)++ | src/core/Rakudo/
Introducing R:Q.RAW-VALUES-MAP
rakudo/nom: 5b6cd4062c | (Elizabeth Mattijsen)++ | src/core/
Various Setty stringification improvements

  - based on a ^100 .Set
  - .Str about 2x faster
  - .perl about 2x faster
  - .gist about 1.4x faster *and* sorted: a long wish of TimToady
lizmat grrr --ll-exception is borked ? 21:14
ugexe yeah 21:15
AlexDaniel all *ables are down and nobody told me :) 21:19
AlexDaniel totally forgot to start them again after stuff… 21:20
Zoffix I thought another nqp commit fixed it? :/ 21:47
Geth nqp/dump_nqpmatch_seen_hash: 1d72de1b7a | (Timo Paulssen)++ | src/QRegex/Cursor.nqp
annotate values with an id and omit full value next time
Zoffix wonders if `perl --gen-moar --gen-nqp --backends=moar; make; make test; make install` is the quickest way to rebuild for nqp changes 22:15
Actually that don't even rebuild it :/ 22:18
timotimo it only rebuilds if the nqp you have isn't new enough 22:20
i.e. you'd have to manually bump the tools/build/*REVISION
Zoffix I'm just trying to do some debugging. What's the fastest process to see what my changes have done? 22:21
timotimo cd nqp; make install; cd ..; make clean; make install
Zoffix Thanks
m: use nqp; dd nqp::isnull(nqp::null_s) 22:25
camelia 0
Zoffix I see. Zoffix-- for breaking --ll-exception in the release
timotimo ah 22:27
Zoffix hm. seems that's not the only problem :/ 22:39
this build time is a killer :/ 22:41
timotimo it's pretty bad, yeah 22:43
used to be much worse
if we had the build speed of three years ago with the size of core setting of today, you'd be looking at like 10 minutes :P
Zoffix :o 22:44
timotimo just a guess, though 22:46
Zoffix *phew* I'm vindicated :) The problem is still there even with my modifications removed :) 22:50
Zoffix digs deeper
Right. A puzzle for another day. 23:22
.tell MasterDuke This commit isn't working because $error is BOOTException, not a Perl 6 exception and it don't got .message. We need some sort of a different solution.
yoleaux Zoffix: I'll pass your message to MasterDuke.
Geth rakudo/improved-version-accepts: 865b2aef43 | (Nick Logan)++ (committed using GitHub Web editor) | src/core/
Improve Version smart match with different lengths

Passes spectest, but changes the behavior of the following to pass:
use Test; nok v6 ~~ v6c; nok v6 ~~ v6.d; nok v6 ~~ v6.c+; nok v6 ~~ v6.d+; done-testing;
rakudo: ugexe++ created pull request #1118:
Improve Version smart match with different lengths
Zoffix calls dibs on RT#131767 23:41
Why do we have a billion tickets reporting which tests are todoed? 23:43
"196 matches". Well, that makes it trying to find a ticket for Test.pm6's todo() rather hopeless :/