timotimo ops.c is only 1.9k lines, you have to look in string/ops.c 00:18
and it seems like there's no function or anything that does the thing
just a bunch of functions that have a "can_fit_into_*" variables that it updates as it goes through string data 00:19
MasterDuke: ^
or perhaps what you're looking for is actually the storage type that's specified in the string body? 00:21
timotimo goes to bed 00:22
00:33 pyrimidine joined
MasterDuke timotimo: ah, string/ops.c makes a lot more sense. but yeah, string->body.storage_type might be good enough, trying now 00:48
timotimo depending on what generated the string, you may end up with a storage type that's too wide to store the actual data 00:49
which isn't bad per se, of course
MasterDuke too wide? 00:50
timotimo using 32bit per grapheme when we could be using 8bit per grapheme instead 00:53
MasterDuke right 00:55
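The "can_fit_into_*" bookkeeping described above can be sketched as a single pass that tracks the narrowest storage still possible. This is a minimal illustration, not MoarVM's code: the function name, the return labels, and both ranges (in particular the signed 8-bit range) are assumptions made for the example.

```python
# Hypothetical ranges for illustration; the real checks live in MoarVM's
# string ops and may differ.
ASCII_RANGE = range(0, 128)          # high bit clear
EIGHT_BIT_RANGE = range(-128, 128)   # assumed signed 8-bit storage


def narrowest_storage(graphemes):
    """Return 'ascii', '8bit' or '32bit' for a sequence of integer graphemes."""
    can_fit_into_ascii = True
    can_fit_into_8bit = True
    for g in graphemes:
        if g not in ASCII_RANGE:
            can_fit_into_ascii = False
        if g not in EIGHT_BIT_RANGE:
            can_fit_into_8bit = False
            break  # nothing narrower than 32-bit is possible any more
    if can_fit_into_ascii:
        return "ascii"
    if can_fit_into_8bit:
        return "8bit"
    return "32bit"
```

A string built by an operation that skips this scan could report "32bit" even when every grapheme would fit in 8 bits, which is the "too wide" case mentioned above.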
timotimo good luck! 00:57
*disappears*
MasterDuke what's the difference between MVM_STRING_GRAPHEME_ASCII and MVM_STRING_GRAPHEME_8? 01:15
japhb ISTR ASCII in Moar really means *ASCII*, which is to say, high bit 0. 01:16
But I may be misremembering, of course.
MasterDuke so a string with storage_type MVM_STRING_GRAPHEME_8 could have chars that aren't actually ASCII? 01:17
01:26 pyrimidine joined
japhb MasterDuke: Yes, like Latin-1 01:40
diakopter ANSI 01:41
japhb: you're a codepage
japhb diakopter: You're a grapheme combiner 01:42
diakopter japhb: you're a non-breaking space 01:44
get it?? because you're a googler who never takes breaks
MasterDuke hmm, that's disappointing 01:45
diakopter japhb: I mean, I'm kidding, that's just the stereotype
01:48 pyrimidine joined 01:50 ggoebel joined
MasterDuke how do i catch exceptions/errors in moar code? 02:09
02:11 pyrimidine joined 02:48 ilbot3 joined 03:44 pyrimidine joined 04:16 pyrimidine joined
japhb diakopter: I take breaks! See, I just took a long one. And I'm not working now, either. ;-) 04:28
diakopter aha 04:34
04:42 pyrimidine joined
samcv this function should clean up the code a lot for UCD generation github.com/samcv/UCD/commit/48588c...e7efebcR28 05:22
well put all the C code in a snippets folder and make it trivial to select which ones to return back
the second two arguments are not required, and if just supplying one word will just concat all the files in that folder 05:26
05:46 pyrimidine joined 05:58 pyrimidi_ joined 06:45 brrt joined
brrt good * #moarvm 06:46
samcv o/ brrt 06:47
brrt \o samcv 06:48
what's up
samcv nm just UCD stuff
06:54 domidumont joined
Geth MoarVM/even-moar-jit: bc50d0fc53 | (Bart Wiegmans)++ | docs/jit/plan.org
Update plan for CALL and ARGLIST compilation

Needed to implement function calls, but actually rather complex.
07:00
MoarVM/even-moar-jit: bf3765f580 | (Bart Wiegmans)++ | 3 files
Count number of ARGLIST refs

Because an ARGLIST node can have more than 3 refs, we need to count them during tiling so they can be allocated in one step in determine_live_ranges.
07:00 domidumont joined
brrt alright. unicode seems hard tbh 07:01
samcv not as hard as JIT compilation I would think 07:06
brrt nah, that's just something that you have to keep chipping at 07:16
the core process - writing a bunch of bytes - is really simple
the tricky bit is that these bytes should be correct, and should be reasonably fast
i should say: should translate to reasonably fast bytecode 07:17
but even then, the process is mostly straightforward as long :-)
i meant to say, as long as you have a clear idea on what to do 07:18
07:36 geekosaur joined 08:08 brrt joined 08:36 zakharyas joined
samcv what is the MVM op to uppercase a MVM string? 09:56
i need to uppercase the string we get for MVM_unicode_lookup_by_name 09:58
brrt, ? 09:59
brrt ehm, i'll have a look for you 10:00
samcv thanks :)
brrt MVM_string_uc(MVMThreadContext *tc, MVMString *s); src/strings/ops.c line 827 10:01
OP(uc): src/core/interp.c line 1528 10:02
samcv nice 10:05
and it uppercases the MVM string itself right 10:07
no need to use the return value
hmm seems it returns a MVMString 10:08
kk.
timotimo o/ 10:10
MVMString is immutable
samcv kk
timotimo, will MVM_string_ascii_encode lose any non ascii chars? 10:12
there's one emoji sequence the flag for some country has a diacritic
maybe would be best to strip the diacritic eventually 10:13
in the hash table
timotimo it'll probably complain about out-of-range codepoints?
.u dice
samcv hm
yoleaux2 No characters found
samcv well let me try
timotimo .u die
yoleaux2 U+2680 DIE FACE-1 [So] (⚀)
U+2681 DIE FACE-2 [So] (⚁)
U+2682 DIE FACE-3 [So] (⚂)
timotimo m: say "⚀".encode('ascii')
camelia rakudo-moar e5ca5c: OUTPUT«Error encoding ASCII string: could not encode codepoint 9856␤ in block <unit> at <tmp> line 1␤␤»
samcv m: "\c[Åland Islands]".say 10:14
camelia rakudo-moar e5ca5c: OUTPUT«===SORRY!===␤Error encoding ASCII string: could not encode codepoint 197␤»
samcv heh
m: "Å".ords.say
camelia rakudo-moar e5ca5c: OUTPUT«(197)␤»
samcv m: "Å".uniname.say
camelia rakudo-moar e5ca5c: OUTPUT«LATIN CAPITAL LETTER A WITH RING ABOVE␤»
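The same failure mode can be reproduced outside Rakudo. The sketch below is a Python analogue only, not MoarVM's `MVM_string_ascii_encode`; the helper name `try_ascii` is made up for the example. It shows that a strict ASCII encoder rejects the input and reports the offending codepoint rather than silently dropping it.

```python
def try_ascii(text):
    """Return ASCII bytes, or the first offending codepoint if encoding fails."""
    try:
        return text.encode("ascii")
    except UnicodeEncodeError as err:
        # err.start is the index of the first character that could not be encoded
        return ord(text[err.start])
```

For the two inputs tried above, `try_ascii("\u2680")` reports 9856 and `try_ascii("\u00c5")` reports 197, matching the codepoints in the error messages.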
samcv the emoji sequences are not fully official which is why they're not all ascii and uppercase like the other ones 10:15
well. they're official, but don't fit the requirements of the UCD
they're not guaranteed never to change, but i doubt they would 10:16
ok sweet it's working now :) 10:21
timotimo cool
samcv well maybe not the non ascii chars. but the uppercasing. so now it's all case insensitive
and now this works: "\c[family: man woman girl boy]" 10:22
timotimo hm, did we reach a conclusion as to whether we'll use base40 for our shift-tables?
samcv i think we should
it's 1/3 the space 10:23
timotimo isn't it 2/3?
samcv less
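The 2/3 figure follows from plain base-40 packing: 40^3 = 64000 fits in 16 bits, so three symbols take two bytes instead of three. A minimal sketch, assuming a made-up 40-symbol alphabet (the alphabet, the padding symbol, and the function names are hypothetical, not the ones used in the UCD generator):

```python
# Hypothetical 40-symbol alphabet: NUL padding, A-Z, 0-9, space, hyphen, '#'.
ALPHABET = "\0ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 -#"
assert len(ALPHABET) == 40


def pack_base40(name):
    """Pack a name into a list of 16-bit words, three symbols per word."""
    padded = name + "\0" * (-len(name) % 3)
    words = []
    for i in range(0, len(padded), 3):
        a, b, c = (ALPHABET.index(ch) for ch in padded[i:i + 3])
        words.append(a * 1600 + b * 40 + c)  # at most 63999, fits in 16 bits
    return words


def unpack_base40(words):
    """Invert pack_base40, dropping the NUL padding."""
    chars = []
    for w in words:
        chars.extend(ALPHABET[d] for d in (w // 1600, (w // 40) % 40, w % 40))
    return "".join(chars).rstrip("\0")
```

Any saving beyond 2/3 would have to come from elsewhere, such as the shift table and dropping per-string pointers.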
timotimo you want to pack them together, too?
samcv because of the shift level 1 + not storing pointers to every string in the char * whatever[] thing
what do you mean pack? in one data structure? yes 10:24
timotimo will we linear-scan through the shift-one table then?
samcv what do you mean by that
but i think no 10:25
i mean. it's just an array of strings
timotimo well, right now we can just strcpy shift_one_table[second_codeme] and be done with it
samcv yeah
timotimo oh, i'm talking about compressing the table itself
we're definitely going to use the table to compress the name data
samcv oh shift level 1? 10:26
timotimo yup
samcv well 40 names, so we could do that
not sure how much we'd save
timotimo that's what i was wondering about :) 10:27
how do these sequences work, btw? do we have a big table of strings to lists of codepoints for them? like family: * * * *?
samcv uh like 480bytes
well less
if we have pointers to each one 10:28
timotimo oh, that's not worth much
samcv but we'd save about 160
if we didn't have to have pointers we could save 480bytes
vs 160
pointers take a lot of space :P
timotimo they do
samcv it would be nice if we could compress them 10:29
since they're static data, let's say it all gets loaded into some range of memory, just compress them down to be an offset
savings would depend on how closely the data is able to be packed together based on available memory contiguity 10:30
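The "compress them down to be an offset" idea can be sketched as one contiguous blob of NUL-terminated strings plus a table of small offsets. This is an illustration under assumptions: 16-bit offsets, ASCII names, and the function names are all choices made for the example, not anything taken from MoarVM.

```python
import struct


def build_offset_table(names):
    """Pack NUL-terminated names into one blob plus 16-bit offsets into it."""
    blob = bytearray()
    offsets = []
    for name in names:
        offsets.append(len(blob))
        blob += name.encode("ascii") + b"\0"
    if len(blob) > 0xFFFF:
        raise ValueError("blob too large for 16-bit offsets")
    table = struct.pack(f"<{len(offsets)}H", *offsets)
    return bytes(blob), table


def lookup(blob, table, index):
    """Fetch the index-th name by following its offset to the next NUL."""
    (start,) = struct.unpack_from("<H", table, index * 2)
    end = blob.index(b"\0", start)
    return blob[start:end].decode("ascii")
```

Each entry then costs two bytes rather than a full pointer, at the price of one extra addition per lookup.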
jnthn morning, #moarvm
samcv morning jnthn
timotimo jnthning, #jnthn
samcv PR coming your way shortly
jnthn :)
samcv: about Prepend, I guess I was saying I'd expect us to tweak how we generate NFG_QC so that a Prepend is marked as False 10:33
samcv no it is marked as false 10:34
jnthn Oh
samcv but it comes _before_ the other character
jnthn Sure
samcv and we have to save state to make it work
with Extend we backtrack
though i guess prepend we could.. forward track?
still seems less than idea
ideal i think.
i had thought about that before. and thought saving state was much cleaner 10:35
timotimo yeah, saving state is probably the right way. otherwise you'd have to have some code to make sure everything's fine when you don't actually have the next character available yet 10:36
samcv and for emoji and stuff we really need it too 10:37
once we see an Emoji_Base we need to check the following and make sure it's a sequence, or same with regional indicators that come in twos or threes
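Why saved state helps can be shown with regional indicators alone. The sketch below handles only the pairs case and ignores Extend, Prepend, Emoji_Base, and every other rule; the function names and the list-of-lists output are assumptions for the example, not the NFG normalizer's design.

```python
def is_regional_indicator(cp):
    return 0x1F1E6 <= cp <= 0x1F1FF


def split_flags(codepoints):
    """Group codepoints so regional indicators pair up two at a time."""
    clusters = []
    ri_run = 0  # saved state: regional indicators seen in the current run
    for cp in codepoints:
        if is_regional_indicator(cp):
            if ri_run % 2 == 1:
                clusters[-1].append(cp)  # completes the pair started earlier
            else:
                clusters.append([cp])
            ri_run += 1
        else:
            clusters.append([cp])
            ri_run = 0
    return clusters
```

Whether to break before the third indicator in a run depends on how many came before it, which the two neighbouring characters alone cannot tell you.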
jnthn Heh, so the sentence in TR 29 about being able to determine should_break based on just the immediately surrounding 2 chars really is a leftover from a simpler time... 10:40
jnthn feared so from reading it 10:41
timotimo >_<
samcv heh
does it still say that jnthn ? 10:42
or was that back in unicode 8.0
jnthn No, that sentence survived into unicode 9.0's TR 29 too
samcv where is it?
jnthn But seemed to me to be contradicted by the rules above it
samcv oh. uhm 10:43
that is for basic grapheme clusters
not extended grapheme clusters
which are fancier
but we don't want to be underachievers :P
jnthn heh, still there
unicode.org/reports/tr29/#Grapheme_...Boundaries
samcv i'm on that page
jnthn "Grapheme cluster boundaries can be easily tested by looking at immediately adjacent characters. They can also be transformed into simple regular expressions." 10:44
samcv immediately adjacent
extended
;)
jnthn :P
timotimo weird. NQP_NFA_DEB on this RT by bdfoy causes it to crash inside dentin saying "repeat count (-4) cannot be negative" 10:45
samcv yeah "default Unicode grapheme clusters"
timotimo how does that even happen %)
samcv that means the old school ones
dentin?
jnthn In Unicode 8 you really could just do it correctly on the surrounding ones even for extended.
samcv like the part of your teeth?
jnthn But yeah, fair point, you can read it that way :)
timotimo dentin is the opposite of indent
samcv hahaha
jnthn Still, grumble. :) 10:46
timotimo larry wrote that code, he likes puns a whole lot
samcv XD
timotimo i mean, collectively puns are very beloved by the perl6 community, but larry might just be The King Of The Puns
samcv but uh. dentin means dent + in 10:47
so that part doesn't work. but i see how it's reversed..
well the origin of the word, in teeth literal translation
or inside of
link to RT? 10:48
timotimo indent and dentin are functions the debugging code in the NFA builder and optimizer uses to make pretty shapes in the output
samcv pretty shapes?
timotimo rt.perl.org/Ticket/Display.html?id=130637
samcv also explain what NFA is
timotimo charrange 8 -> -1
addedge 8 -> -1 CHARRANGE
addstate 9
addedge 8 -> -1 CHARRANGE
addstate 10
samcv i have heard about it but i still don't know what it does
timotimo ...regex_nfa returns 10
we use nondeterministic finite automata to decide LTM
samcv kk 10:49
timotimo deterministic finite automata are really simple state machines that get fed one character at a time and at the end say "yea" or "nay"
nondeterministic finite automata are a bit more complex, because they are in a superposition of states, and at the end they say if any states are yea, or if all states are nay
samcv nondeterministic finite automata is pretty opaque 10:50
of a word
timotimo it's what you have to feed into wikipedia to find out what it is ;)
the reason why it also says "finite" in there is, of course, because there's also infinite versions of both of these
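The "superposition of states" description maps directly onto simulating an NFA with a set of current states. A minimal sketch, with a hypothetical transition-table format (a dict keyed by state and character) that is not the representation NQP uses:

```python
def nfa_accepts(transitions, start, accepting, text):
    """Simulate an NFA; transitions maps (state, char) to a set of next states."""
    current = {start}
    for ch in text:
        current = {nxt for st in current for nxt in transitions.get((st, ch), ())}
        if not current:
            return False  # every branch died: all states are "nay"
    return bool(current & accepting)  # "yea" if any surviving state accepts


# Example automaton: accepts strings over {a, b} that end in "ab".
ENDS_IN_AB = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
}
```

With this table, `nfa_accepts(ENDS_IN_AB, 0, {2}, "aab")` is true and `nfa_accepts(ENDS_IN_AB, 0, {2}, "aba")` is false.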
jnthn Oh wow, this is a fun/evil bug: www.mono-project.com/news/2016/09/1...64-icache/ 10:57
brrt: ^^ maybe interest you :)
brrt oh, i'll tcheck it out
thanks
timotimo oooh, nasty 10:59
‘So THAT's what it is, thank you! Same issue in PPSSPP, we "solved" it with terrifying amounts of padding.’ 11:01
samcv that's interesting 11:03
11:08 zakharyas joined 11:13 pyrimidine joined 11:25 pyrimidine joined
Geth MoarVM: samcv++ created pull request #511:
Make getting Uni seq/cp's case insensitive and add seq from NamedSequences.txt
11:41
samcv there we go :)
full spectest pass
✔️️ 11:42
jnthn samcv: We won't ever get collisions between codepoint and seq by making them case-insensitive? 11:56
samcv no we shouldn't
and if we do. then that is fine because we check codepoint first 11:57
which are more canonical
than the emoji ones. but i don't expect the emoji to ever change
oh i don't know if i put in the comment of the thing
that i strip the commas 11:58
samcv checks
nope. i forget which standard has the commas which is what we follow. 11:59
it was like ISO something probably
timotimo an ISO standard about unicode sequences? o_O 12:00
samcv no 12:01
not sequences
m: say "\c[BOY, GIRL]"
camelia rakudo-moar 483e4f: OUTPUT«👦👧␤»
samcv that just is those two codepoints after each other
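The comma notation can be mimicked with Python's standard `unicodedata.lookup`. This is an analogue of the `\c[...]` behaviour, not its implementation; the helper name is made up, and it assumes no individual name contains a comma.

```python
import unicodedata


def names_to_string(spec):
    """Resolve a comma-separated list of character names to a string."""
    return "".join(unicodedata.lookup(part.strip()) for part in spec.split(","))
```

`names_to_string("BOY, GIRL")` yields the two codepoints U+1F466 and U+1F467 one after the other, with no joiner between them.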
timotimo oh!
samcv neat huh
samcv needs to finally write a unicode page for the docs...
jnthn Reviewed it. 12:03
timotimo docs are rolling-release, our unicode stuff gets pushed to users with every release, so you can do docs stuff whenever you want :)
samcv jnthn, yeah i had noticed that too, will make that change 12:04
timotimo, :) 12:05
but if the users don't know what they can use, is that not the same thing :P
timotimo that's a good point
you can write the docs whenever you want and merge it as a pull-request when the release happens
jnthn lunch & 12:06
samcv hah
timotimo maybe we should at some point start building multiple versions of the docs page. one for the latest star, one for the latest release, one for blead?
of course we'd need some good tooling to make that possible
samcv but yeah i wanted to wait until i had case insensitive working before adding to docs ( at least for emoji seq.) also wanted to add the NamedSequences.txt
timotimo i'm not volunteering, just wondering if it'd be useful
samcv so it was uniform
hmm
also jnthn some of the unicode emoji names have diacritics 12:10
timotimo AFKBBIAB 12:11
samcv so how does that work for uhm. are those hash keys accessible?
or do i need to do weird stuff.
lizmat jnthn: new fail more in HARNESS_TYPE=6: moar(78463,0x70000dff4000) malloc: *** error for object 0x16: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
12:39 pyrimidine joined
samcv jnthn, updated the PR 12:41
i am only changing case inside the MVM_unicode_string_from_name function then just pass that string onto MVM_unicode_lookup_by_name. 12:42
lizmat jnthn: good news: looks like that new fail is the only type of fail left in HARNESS_TYPE=6 12:43
jnthn: bad news, it appears to happen about 1 in 3 times :-(
samcv lizmat, how do i use harness6 by itself? 12:44
lizmat what do you mean: by itself?
samcv uhm
prove has its own thing
lizmat HARNESS_TYPE=6 make spectest 12:45
that's the thing
HARNESS_TYPE=5 is the default
samcv how do i use the harness6 for things other than doing spectest?
lizmat not sure... haven't looked into what "make spectest" does exactly :-) 12:46
samcv XD
lizmat not for a while anyway :-)
brrt that's the hallmark of good tools :-) 12:47
samcv m-testable t/spectest.data
err 12:48
this is just like 12:49
a web of things they refer to each other in the Makefile
jnthn It does bottom out eventually ;)
samcv one can never be truly sure
jnthn And once I figured out what it was doing it basically made sense :)
samcv what does it do? "something that makes sense once you figure it out"
jnthn lizmat: Hm, you get that one in three times? That's...a far better hit rate than I can get on Linux
lizmat: nwc10++ got me some ASAN reports; I need to analyze them further 12:50
lizmat: I may request a shell account at some point to see if osx's malloc_history thingy can tell us something more valuable. 12:51
lizmat ok, will stop testing / reporting :-)
jnthn Since it works a bit differently to ASAN etc.
lizmat 🛑 # the joys of suggestions on the touch bar
jnthn But yeah, I can't get it to explode as much as 1 in 3 times so....
Anyway, will focus on my await refactors for the moment 12:52
Well, after reviewing samcv++'s PR updates 12:53
lizmat await refactors++ :-)
samcv i'm awaiting those
spectest also pass for the latest changes as well 13:06
jnthn Nice 13:07
Geth MoarVM: 2544eb5302 | (Samantha McVey)++ | 4 files
Make getting Uni seq/cp's case insensitive and add seq from NamedSequences.txt

This makes looking up codepoints or sequences by name case insensitive. We store them all in uppercase when generating the Unicode database and then uppercase any requests we receive before trying to look them up.
This also adds the sequences from NamedSequences.txt, previously only the Emoji Sequences ... (5 more lines)
13:15
MoarVM: 50c87ce257 | (Samantha McVey)++ | src/strings/unicode_ops.c
Avoid changing the case twice for codepoint lookup

This keeps MVM_unicode_lookup_by_name case insensitive, passing the uppercase string to MVM_unicode_lookup_by_name. Also fixes a GC issue with the last commit.
MoarVM: 4aab50646e | (Jonathan Worthington)++ | 4 files
Merge pull request #511 from samcv/NamedSequences

Make getting Uni seq/cp's case insensitive and add seq from NamedSequences.txt
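The approach in these commits (store names in uppercase, uppercase each request, check codepoints before sequences) can be sketched with two small dictionaries. The tables below are tiny illustrative stand-ins, and `string_from_name` is a hypothetical Python counterpart, not the C function in `src/strings/unicode_ops.c`.

```python
# Tiny illustrative tables; keys are stored in uppercase at build time.
CODEPOINTS = {"DIE FACE-1": 0x2680, "BOY": 0x1F466}
SEQUENCES = {
    "FAMILY: MAN WOMAN GIRL BOY": [
        0x1F468, 0x200D, 0x1F469, 0x200D, 0x1F467, 0x200D, 0x1F466,
    ],
}


def string_from_name(name):
    """Case-insensitive lookup; returns '' when the name is unknown."""
    key = name.upper()  # change case once, then reuse the key for both tables
    if key in CODEPOINTS:  # codepoints are checked first, so they win a clash
        return chr(CODEPOINTS[key])
    if key in SEQUENCES:
        return "".join(chr(cp) for cp in SEQUENCES[key])
    return ""
```

Uppercasing once and passing the result on avoids changing the case twice, which is the point of the second commit.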
samcv \o/
samcv just added a new alias to terminal so she can see which commits were actually merged with MoarVM/inactivate-async-tasks 13:18
gitmlog: aliased to git log $(git merge-base --octopus $(git log --merges --pretty=format:%P)).. --boundary --graph --pretty=oneline --abbrev-commit 13:19
heh
13:24 pyrimidine joined
[Coke] (making \c[] case insensitive) unicode won't ever give us a conflict there? 13:32
[Coke] hopes that's not a dupe question. 13:33
jnthn hehe, I asked that earlier today ;)
samcv nope
[Coke] thanks. :)
samcv all the canonical names, which won't ever change, are uppercase and ascii only. the emoji are much less restrictive
since they're like a separate thing (parts of it)
so we don't have to support those if we want to say we support all of unicode or whatever 13:34
[Coke] # since they seperate seperate named items ?
(that is a plausible sentence but felt worth checking on it)
samcv what? 13:40
oh uhm.
is that referring to: since they're like a separate thing (parts of it) 13:41
?
[Coke] it's a comment in the commit. 13:42
github.com/MoarVM/MoarVM/blob/4aab...c.pl#L1110 13:43
samcv oh
should be 'separate named codepoints which are immediately adjacent to each other' or something like that 13:45
m: "#,".uninames
camelia ( no output )
samcv m: "#,".uninames.say
camelia rakudo-moar 483e4f: OUTPUT«(NUMBER SIGN COMMA)␤»
samcv so that would be NUMBER SIGN, COMMA
for example in this notation
i wasn't sure how to word it because we allow doing that for sequences or codepoints 13:46
i mean the commas have nothing to do with sequences, just 13:47
they separate the names, for use in denoting codepoints one after each other
i need to go to bed. night all o/ 13:48
jnthn 'night, samcv 13:50
14:46 pyrimidine joined 15:03 pyrimidine joined 15:12 lizmat joined 15:22 pyrimidine joined 16:04 pyrimidine joined 16:26 brrt joined 16:27 pyrimidine joined 16:34 pyrimidine joined 17:13 pyrimidine joined 17:34 pyrimidine joined 18:15 domidumont joined 18:34 pyrimidine joined 18:40 cale2 joined
cale2 Hey all 18:41
I was looking up what makes MoarVM unique from other VMs 18:42
I see references to 6model object model, but no explanation of what that is or how it is different
brokenchicken It's Perl 6's object model 18:44
cale2 brokenchicken: And the difference between P6's object model and Java's object model is the meta-objects, sort of like prototypal object systems like JS 18:47
brokenchicken cale2: there are slides: jnthn.net/papers/2013-yapceu-moarvm.pdf 18:49
jnthn It does strings at grapheme level, which is unique so far as I'm aware 19:01
And yeah, the 6model approach to things, including representation poly, is notable. 19:02
Besides that, there aren't really that many things that MoarVM is doing that nobody else is, but the combination of the things it does (that is, precisely the set that Perl 6 needs) is. 19:04
cale2 It says that MoarVM passes more tests than any other VM but some VMs pass tests that Moar does not
Why would that be?
jnthn I'm curious how true that actually is by now :) 19:05
geekosaur jvm has been around longer, you'd figure it has things fully implemented that are still work in progress
(in moarvm)
that said, the things it does are often not a good fit for perl 6, so the cases when jvm can leverage being more mature are rare 19:06
jnthn I guess there probably somewhere is a Perl 6 spectest fudged on MoarVM and not on JVM but it must be a rarity. 19:07
MoarVM hasn't really been a "make some futuristic VM doing things nobody else is", though. It was more "do the kind of things that modern VMs do that Perl 6 can benefit from" 19:10
timotimo we just have to put in more unicode tests
jnthn And along the way "do things that Perl 6 needs really well" 19:11
(Thus why we have 6model as the VMs native object system, and strings at grapheme level)
geekosaur true, but it can't be done in "click these two pieces together and they just work" either
jnthn Yes, things have a habit of intersecting in interesting, and sometimes surprising, ways. :) 19:13
19:14 pyrimidine joined
jnthn Thus why loose coupling is a battle constantly worth fighting in VM architecture. :) 19:16
jnthn was quite relieved today when taking a continuation on one thread and invoking it on another Just Worked. 19:17
Had always presumed it should work out, but didn't ever try it until today.
19:24 domidumont joined 19:34 pyrimidine joined 20:14 FROGGS joined 20:15 pyrimidine joined 20:52 pyrimidine joined 21:54 pyrimidine joined 22:08 sivoais joined
timotimo samcv: unicode.pod6 doesn't parse :( 22:11
ah, i see. 22:12
fixed it 22:13
c:\projects\rakudo\nqp\moarvm\src\core/frame.h(196) : error C2375: 'MVM_frame_destroy' : redefinition; different linkage 22:15
c:\projects\rakudo\nqp\moarvm\src\core/frame.h(181) : see declaration of 'MVM_frame_destroy'
o_O
yeah, there's two of them and one is marked MVM_PUBLIC 22:16
may have something to do with destory vs destroy?
Geth MoarVM: 357438a99c | (Timo Paulssen)++ | src/core/frame.h
remove second declaration of MVM_frame_destroy

seems like msvc (at least the one used on appveyor) didn't like this. Perhaps it was a left-over from typo-ing destroy as destory a few lines below.
22:19
jnthn samcv: I've given you a MoarVM commit bit, for your convenience. :) Feel free to commit small/uncontroversial things (like typos, warnings removal, build fixes) directly to master; also to merge pull requests from others that you're comfortable reviewing and comfortable are correct/good. For non-trivial changes, please do PRs (as I'm encouraging everyone here to do, me included). 22:38
(This I guess allows you to work in branches in the MoarVM repo too, which I guess will simplify workflow a bit.)
22:49 zakharyas joined 23:10 geekosaur joined
diakopter samcv++ 23:19
yoleaux2 13:38Z <brokenchicken> diakopter: 💩 💩 💩
diakopter brokenchicken: lolz
samcv thanks jnthn :) 23:21
will make sure not to rm *; git init; git touch 'moarvm'; git add .; git commit; git set remote origin [email@hidden.address] git push --force 23:23
heh tho github prolly has settings to stop people from doing that right? 23:24
i'm not sure
i know at least gitlab won't let you do that to a protected branch, which is master by default
timotimo, thanks for fixing unicode.pod6 23:25
jnthn :) 23:26
I've no idea if it's protected 23:27
Given a bunch of people have checkouts with full history, though... :)
samcv yeah hah 23:28
like loads
timotimo sure thing 23:30
MasterDuke what is this line doing, `zbase = zbase * radix;`? github.com/MoarVM/MoarVM/blob/mast...rce.c#L395
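The question goes unanswered in this log. One plausible reading, offered as an assumption rather than a description of the linked code, is that the variable accumulates radix raised to the number of digits consumed, so a caller can scale the parsed value afterwards. The function below is a hypothetical sketch of that pattern; only `zbase = zbase * radix` comes from the question.

```python
def parse_radix(radix, text, pos=0):
    """Return (value, base, end_pos); base ends up as radix ** digits_read."""
    value = 0
    base = 1
    while pos < len(text):
        ch = text[pos]
        if not ch.isalnum() or not ch.isascii():
            break
        digit = int(ch, 36)  # '0'-'9' map to 0-9, letters to 10-35
        if digit >= radix:
            break
        value = value * radix + digit
        base = base * radix  # mirrors the `zbase = zbase * radix` step
        pos += 1
    return value, base, pos
```

Under this reading, the digits after a decimal point in "0.25" parse to value 25 and base 100, and 25 / 100 gives the fractional part.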
jnthn 'night, '#moarvm 23:48