samcv | jnthn, what would be the best way to store multiple codepoints for emoji sequences? | 00:33 | |
for decomposition we just store a string and then parse it. but there has to be a better way to store multiple things? | |||
bye bye dalek | 00:37 | ||
00:37
dalek joined
00:41
pyrimidine joined
02:04
lizmat_ joined,
pyrimidine joined
02:30
pyrimidine joined
02:48
ilbot3 joined
03:29
pyrimidine joined
04:01
geekosaur joined
04:25
geekosaur joined
04:40
pyrimidine joined
05:14
pyrimidine joined
05:36
pyrimidine joined
|
|||
samcv | jnthn, so i got it fully working \o/ | 06:59 | |
07:05
domidumont joined
07:10
domidumont joined
|
|||
nwc10 | good *, #moarvm | 07:11 | |
samcv | hi nwc10 :) | 07:12 | |
github.com/MoarVM/MoarVM/pull/492 \o/ | 07:18 | ||
07:28
Ven joined
07:37
brrt joined
|
|||
brrt | good * #moarvm | 07:41 | |
samcv waves | 07:42 | ||
brrt | i'm checking your PR | 07:45 | |
14 files changed, oh my | |||
samcv | need to fix a few things but | ||
most of those are just adding the op | |||
brrt | oh, wait, indeed | ||
brrt would prefer that the oplist were generated at build time.... | |||
samcv | what's the best way to construct a string from codepoints btw? i don't think i did it the best way | ||
brrt | honestly, i don't know | 07:46 | |
let's check first before i give you an answer to that | |||
samcv | trying to debug a problem when MVM errors for some of them | ||
i must not be constructing the grapheme properly | |||
but seems to happen when there's more than 2 codepoints in the sequence | 07:47 | ||
brrt | just for my info, why place getstrbyname before indexat in src/core/interp.c | ||
samcv | dunno | ||
brrt | i'd be surprised if that had any ill effect on the compiled code, but i had expected them to be in bytecode order | ||
hmmmm | |||
samcv | interp.c doesn't have to be in order | ||
brrt | no | ||
but i would still expect it to be | 07:48 | ||
samcv | hmm it seems to return the grapheme fine but. | ||
it errors later on | |||
MoarVM panic: MVM_nfg_get_synthetic_info called with out-of-range synthetic | |||
so it's not erroring in the code i made | |||
well maybe it is. maybe gdb is being weird | |||
may have not break'd at the right spot | 07:49 | ||
yeah it seems to panic after adding the 2nd codepoint to the buffer | |||
brrt | i assume you know why the enums have changed, so i'm not going to comment on that, either | ||
samcv | well i think it's erroring when adding the 3rd | ||
what about the enums? | |||
brrt | have you compiled with --debug | ||
samcv | yep | ||
brrt | hmmm | 07:50 | |
have you compiled with --optimize=0 | |||
samcv | i need to go into unicode.c which is generated from a compilation of multiple | ||
plus there's macros | |||
oh it looks like it's not generating the right number of array items | 07:51 | ||
ah i see. because my original testing code didn't use * things | 07:52 | ||
brrt | hmmpf | 07:53 | |
i honestly have no comments on that PR | |||
:-) | |||
well, some things | |||
samcv | uhm how do i get the size of this structure properly: static const MVMint32 uni_seq_16[] = {0x1F487,0x1F3FF} | 07:56 | |
from const MVMint32 * uni_seq = uni_seq_enum[result->structitem]; | 07:57 | ||
maybe there is not a way. or maybe there is. i could always store the number of codepoints as the first item in the structure if i cannot | |||
uni_seq_enum[] just stores pointers to the uni_seq_xx | 07:58 | ||
brrt | what do you mean by 'size of this structure' | ||
i see no reason why sizeof() wouldn't give you the right thing | 07:59 | ||
namely, 8 | |||
samcv | ah yeah. i see what i was doing wrong | 08:00 | |
brrt, well i think i got it | 08:23 | ||
well not the size part yet. but the other crashy | |||
08:26
zakharyas joined
|
|||
samcv | brrt, sizeof(uni_seq)/sizeof(MVMint32); gives me half the size i want | 08:27 | |
that's what i thought would give me the number of items in the array, but it returns a number half that | 08:28 | ||
why is this? | |||
because uni_seq is a 64 bit pointer? | |||
lizmat | that would be my first guess ? | 08:29 | |
samcv | ok yeah. i don't want to divide by anything just want to dereference it | 08:30 | |
getting the right number now | 08:31 | ||
brrt | well, you just defined uni_seq_16, not uni_seq | ||
samcv | brrt, i want to get the number of elements in the array. still not working argh | 08:35 | |
sizeof(*uni_seq) gives me size of the MVMint32 type because uni_seq is MVMint32 * | |||
arnsholt | If uni_seq is declared as a pointer, there's no way to figure out the length of the array | 08:37 | |
samcv | ok that's what i thought originally | ||
well actually i declare static const MVMint32 uni_seq_449[] = {0x1F3C4,0x1F3FB,0x200D,0x2640,0xFE0F} | 08:38 | ||
and then i have a struct which contains uni_seq_449 uni_seq_450 etc | |||
so i access the uni_seq_xx from the struct | |||
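A minimal sketch of why the division goes wrong, assuming `int32_t` as a stand-in for `MVMint32` and reusing the five-codepoint sequence quoted above. The function and macro names are illustrative only and are not part of MoarVM.

```c
#include <stddef.h>
#include <stdint.h>

/* int32_t stands in for MVMint32; the data mirrors uni_seq_449 from the log. */
static const int32_t uni_seq_449[] = { 0x1F3C4, 0x1F3FB, 0x200D, 0x2640, 0xFE0F };

/* Valid only where the array declaration itself is in scope. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Counts elements through the array name: total bytes / bytes per element. */
size_t count_via_array(void) {
    return ARRAY_LEN(uni_seq_449);
}

/* The same division through a pointer measures the pointer itself. */
size_t count_via_pointer(void) {
    const int32_t *uni_seq = uni_seq_449;   /* the array decays to a pointer */
    size_t pointer_bytes = sizeof uni_seq;  /* size of the pointer, not the data */
    return pointer_bytes / sizeof(int32_t);
}
```

Once the table only stores pointers, the array type is gone, so the element count has to travel with the data in some other way.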
brrt | uhuh | ||
hmm | |||
i see | 08:39 | ||
no, you can't do that | |||
samcv | uni_seq = uni_seq_enum[blah]; | ||
ok i will just have the 1st item be the length | |||
brrt was just about to suggest that | |||
samcv | yea | ||
brrt | alternatively, have a sentinel value at the end | ||
samcv | yeah | ||
arnsholt | Yeah, those are the standard solutions | ||
Pascal arrays (length first), or NULL-terminated | |||
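For comparison, a sketch of the sentinel approach. It assumes -1 as the end marker, on the grounds that Unicode codepoints are never negative; `SEQ_END` and `seq_count` are hypothetical names, not MoarVM identifiers.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed end marker: codepoints are non-negative, so -1 cannot collide. */
#define SEQ_END (-1)

static const int32_t seq_with_sentinel[] = { 0x1F487, 0x1F3FF, SEQ_END };

/* Walks the sequence until the sentinel and returns the codepoint count. */
size_t seq_count(const int32_t *seq) {
    size_t n = 0;
    while (seq[n] != SEQ_END)
        n++;
    return n;
}
```

The cost is a scan every time the length is needed, which is one reason to prefer storing the length up front.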
samcv | i'm gonna go with length first | 08:40 | |
arnsholt | Yeah, I like length first too, TBH | ||
brrt | as long as you don't forget to bias your indexes | ||
samcv | yea | 08:42 | |
arnsholt | brrt: There's always "real_array = &data[1]" =) | 08:49 | |
brrt | true | 08:50 | |
although i've started to prefer: "real_array = data+1" | |||
arnsholt | Yeah, that'll work too | 08:51 | |
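A sketch of the length-first layout with the `data + 1` bias, again with `int32_t` standing in for `MVMint32`. The table and accessor names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Length-first layout: element 0 is the number of codepoints that follow. */
static const int32_t seq_a[] = { 2, 0x1F487, 0x1F3FF };
static const int32_t seq_b[] = { 5, 0x1F3C4, 0x1F3FB, 0x200D, 0x2640, 0xFE0F };

/* The table stores only pointers, as uni_seq_enum does in the log. */
static const int32_t *const seq_table[] = { seq_a, seq_b };

/* Number of codepoints in an entry (the length slot is not counted). */
size_t seq_length(size_t index) {
    return (size_t)seq_table[index][0];
}

/* Pointer to the first real codepoint: skip the length slot once, here. */
const int32_t *seq_codepoints(size_t index) {
    return seq_table[index] + 1;
}
```

Biasing the pointer in one accessor means callers can index from 0 and never see the length slot.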
brrt | register arithmetic is surprisingly elegant if you get the hang of it | 08:52 | |
08:52
domidumont joined
|
|||
brrt imagines a thousand rustaceans fainting reading that | 08:52 | ||
arnsholt | Register arithmetic? | 08:53 | |
brrt | eh, pointer arithmetic | ||
hehe | |||
arnsholt | Oh, right =) | ||
Yeah, it's not too bad once you get used to it | 08:54 | ||
brrt | yeah, my bad, i'm working on a blog post | ||
arnsholt | But I do think a language like Rust has the potential to kill off entire classes of problems | ||
brrt | well, it goes hand in hand with certain patterns (of memory management / layout), and if you're not into those patterns, then it's going to suck | ||
hmmm. no doubt | |||
on the other hand | 08:55 | ||
arnsholt | And a problem with pointer arithmetic is that if you fuck it up, all kinds of weird shit can happen | ||
brrt | i've had to fix many, many errors in the register allocator before it worked | ||
i think just 2 of these were actual honest memory corruption / overflow errors | |||
and they were swiftly caught by ASAN | 08:56 | ||
moritz | dishonest memory corruption errors are the worst :-) | ||
brrt | one of these was actually a data-structure-and-algorithm-choice error, at the root of it | ||
the other was a noninitialized value | |||
all other issues were logic issues | |||
so..... | |||
it's undoubtedly true that rust relieves programmers of whole classes of errors | 08:57 | ||
samcv | ok it actually really works now \o/ | ||
brrt | what is not so self-evident is that those classes of errors are the most frequent or most important errors | ||
samcv++ | |||
(although I guess you could point to a number of CVE's which prove me wrong) | |||
on the other, other hand | 08:58 | ||
remember shell-shock | |||
nothing buffer overrunny about that | |||
was a logic error | |||
arnsholt | Yeah, Rust won't save you from those | ||
brrt | rust won't save you from phishing, either | ||
arnsholt | Nope | ||
brrt | so i'm *a bit* annoyed about the hype surrounding 'rust = safety' | 08:59 | |
that doesn't mean i don't want to try it out sometime :-) | |||
arnsholt | I think it's not too far off the mark | ||
brrt | it's a correct statement. it is the hype which is unreasonable | ||
arnsholt | Especially when you get things like memory shenanigans in file(1) and friends | ||
Yeah, hype is hype, I guess | |||
brrt | (this too shall pass :-)) | 09:00 | |
moritz | it's really that Rust offers compile-time abstractions without (much) of a runtime cost | 09:03 | |
without the crazy subtle semantics that C++ has, too :-) | |||
brrt | that's pretty cool, yes | ||
09:09
pyrimidine joined
09:24
brrt joined
|
|||
samcv | now just time to make spectest :) | 09:28 | |
brrt | make spectest, not segv | 09:32 | |
samcv | heh | ||
09:37
jnthn joined,
Util joined,
mst joined,
nine_ joined
09:39
nwc10_ joined,
moritz_ joined,
ggoebel joined
09:40
camelia joined
09:44
japhb joined
|
|||
jnthn | moarning o/ | 09:53 | |
samcv | morning jnthn | ||
brrt | moarning jnthn | 10:02 | |
jnthn catches up with backlog here and on #perl6-dev to see what happened during the night :) | 10:03 | |
samcv: So, any leftover questions, or is it now at the point of "review my PR"? :) | |||
samcv | yeah. review my PR :-D | ||
it works fully and is gud | |||
let me know if there's something i did you don't like though | 10:04 | ||
spectest just finished and pass | 10:05 | ||
jnthn | Alrighty | ||
samcv | oh there's one thing MVM_string_from_grapheme i just copied it into that file | ||
other than that | 10:06 | ||
jnthn | Working example: nqp-m -e "say(nqp::getstrbyname(''person golfing: medium-light skin tone'))" | ||
samcv | err maybe it was already there | ||
jnthn | I...uh...doubt this works, due to the extra quote at the start? :) | ||
samcv | err wait where is it from | ||
lies! | |||
jnthn | From the PR description ;) | 10:07 | |
samcv | heh yeah whatever the double quotes | ||
oh i know where i stole it from | 10:08 | ||
it's MVM_string_chr except without checking to make sure there are no negative graphemes :) | |||
maybe should have MVM_string_chr call that one? anyway check out the PR and let me know | 10:09 | ||
(so we don't duplicate code) | |||
and move it to ops.c or something | |||
10:13
pyrimidine joined
|
|||
jnthn | Yeah, currently reviewing | 10:26 | |
OK, review done | 10:38 | ||
samcv | This is called string_from_grapheme, but actually is taking a codepoint, which is not always a grapheme. | 10:39 | |
but uhm. it takes both? | |||
synthetic and non synthetic's | |||
idk what it should be called then | |||
jnthn | That's not what grapheme means. | ||
Grapheme means "in NFG form" | |||
The positive integers of the NFG representation all just happen to align with NFC codepoints. | 10:40 | ||
samcv | so what are the negative ones? | ||
jnthn | Also graphemes | ||
samcv | those are graphemes yes? | ||
jnthn | Yes | ||
We use "synthetics" to talk about the negatives. | 10:41 | ||
But I think the routine being called string_from_grapheme is fine | |||
samcv | ok | ||
jnthn | It should just take MVMGrapheme32 and it doesn't need to run it through the normalizer at all | ||
Because it's already NFG | |||
Note that while having such a function in MoarVM is fine, we shouldn't expose that one directly to the outside world | 10:42 | ||
samcv | yeah | ||
jnthn | (We never expose synthetics, because we don't want people to rely on their integer values.) | ||
samcv | yep | ||
uhm so how do i do it without MVM_unicode_normalizer_process_codepoint | |||
i tried without it but i kept running into issues | 10:43 | ||
jnthn | Which "it"? :) | ||
How to implement string_from_grapheme? | 10:44 | ||
samcv | uh adding to buffer. | ||
also yes that | |||
err. no. | |||
but also i'm using MVM_unicode_normalizer_process_codepoint just because i don't want any issues if we run into the cases where we don't correctly break in emoji sequences | 10:45 | ||
there are still a few that don't work properly | |||
jnthn | I'd just change the signature to take `MVMGrapheme32 g` and then get rid of the use of the normalizer | ||
And then it's already correct | |||
samcv | ok how do i do it without the normalizer? | 10:46 | |
jnthn | Because s->body.storage.blob_32[0] = g; does the right thing | ||
(What I just said is about inside of string_from_grapheme) | |||
It's totally reasonable to use the normalizer in MVM_unicode_string_from_name | 10:47 | ||
samcv | oh just don't use it in MVM_string_from_grapheme | ||
jnthn | Right | ||
Because by the time you call that you already have a grapheme :) | |||
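A rough sketch of the idea, using simplified stand-in types rather than MoarVM's real `MVMString` and `MVMGrapheme32`. It assumes only what is said above: non-negative values line up with codepoints, negative values are synthetics, and a value that is already a grapheme can be stored directly.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for MVMGrapheme32 and a 32-bit string body. */
typedef int32_t Grapheme32;

typedef struct {
    Grapheme32 *blob_32;   /* grapheme storage */
    size_t      num_graphs;
} TinyString;

/* Negative grapheme values are synthetics; the rest align with codepoints. */
int grapheme_is_synthetic(Grapheme32 g) {
    return g < 0;
}

/* Stores one grapheme as-is: no normalizer, because the input is already NFG.
 * Returns 0 on success, -1 if allocation fails. */
int tiny_string_from_grapheme(TinyString *s, Grapheme32 g) {
    s->blob_32 = malloc(sizeof(Grapheme32));
    if (s->blob_32 == NULL)
        return -1;
    s->blob_32[0] = g;
    s->num_graphs = 1;
    return 0;
}
```

The normalizer belongs one level up, where raw codepoints from a name lookup are turned into graphemes in the first place.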
samcv | and this will also work if i have multiple graphemes right? | ||
well er probably not | 10:48 | ||
jnthn | Well, not at the moment, because the signature is MVMGrapheme32 | ||
samcv | but can cross that road when we come to it | ||
jnthn | Yeah | ||
Though fixing it now isn't so hard | |||
Lemme find a good example | 10:49 | ||
samcv | ok | ||
jnthn | github.com/MoarVM/MoarVM/blob/mast...lize.c#L88 | ||
This function actually already nearly does what you ned | |||
*need | |||
It's just that it takes an MVMObject * as its input and pulls data out of that | 10:50 | ||
samcv | yes i saw that | ||
jnthn | But we could split it into two parts | ||
One that works on a C-level array | |||
samcv | that would be cool | ||
jnthn | And takes a length | ||
And then you can just feed the codepoint array you've got into it | |||
samcv | yeah i had seen that function but it didn't do exactly what i wanted | ||
well my array's 1st item is the number of items in it, but i can always move the pointer by 1 | 10:51 | ||
and already have the length | |||
jnthn | Sure, just move the pointer by 1 and pass in that and the length | ||
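The shape of that split, sketched with hypothetical types and names. The core here only copies codepoints into a buffer; the real function would run them through normalization to produce NFG, which is deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a VM-level array object holding codepoints. */
typedef struct {
    const int32_t *slots;
    size_t         elems;
} CodeArray;

/* Core: works on a plain C array plus a length.
 * Placeholder body: copies at most out_cap codepoints; no normalization. */
size_t codes_to_buffer(const int32_t *codes, size_t num_codes,
                       int32_t *out, size_t out_cap) {
    size_t n = num_codes < out_cap ? num_codes : out_cap;
    for (size_t i = 0; i < n; i++)
        out[i] = codes[i];
    return n;
}

/* Wrapper 1: unpacks the object and delegates to the core. */
size_t object_to_buffer(const CodeArray *arr, int32_t *out, size_t out_cap) {
    return codes_to_buffer(arr->slots, arr->elems, out, out_cap);
}

/* Wrapper 2: a length-first table entry; skip slot 0 and pass the length. */
size_t table_entry_to_buffer(const int32_t *entry, int32_t *out, size_t out_cap) {
    return codes_to_buffer(entry + 1, (size_t)entry[0], out, out_cap);
}
```

Both callers share one code path, so a fix in the core reaches the op and the name lookup alike.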
Though I was a tad confused about the length | |||
samcv | hm? | 10:52 | |
jnthn | Whether it includes the element specifying the length or not | ||
samcv | no it does not | ||
it's the number of codepoints | |||
jnthn | for (int i = 1; i < array_size; i++) { | ||
So isn't this an off-by-one, or do I need another coffee? :) | 10:53 | ||
(If we start at 1 to skip the length, then it'd need to be <= ? ) | |||
samcv | nope | ||
almost certain not | 10:54 | ||
i thought a similar thing at first but, that is off by one if i do <= | |||
but i will 2x check | |||
jnthn | m: my @a = 2, 100, 101; my $array_size = @a[0]; loop (my $i = 1; $i < $array_size; $i++) { say @a[$i] } | 10:56 | |
camelia | rakudo-moar ed5c86: OUTPUT«100␤» | ||
dalek | MoarVM: 8bfbb0e | jnthn++ | src/gc/orchestrate.c: Tweak full collection criteria in heap profiling. The recording of heap snapshots will of course use memory, which will throw off the RSS heuristic and make us a *lot* less likely to ever do a full collection, distorting the profiles. This is also a bit of a distortion (to more regular heap profiles being taken), but it's an improvement. (To do better, we could try tracking RSS before/after snapshots and excluding that memory from the calculation. Patches welcome if anyone tries it and finds that a viable approach.) |
10:59 | |
MoarVM: 68b5e35 | jnthn++ | src/profiler/heapsnapshot.c: Null-check the *correct* thread's ->cur_frame. 539346d | jnthn++ | src/io/ (3 files): Take into account actual allocated of I/O buffers. It seems libuv suggest we allocate 64KB sometimes, even when the input we get is tiny. While I'm not sure second-guessing it is wise, we should at least be honest internally about what's allocated. By storing the actual allocated size, the GC can track it as part of the gen2 promotion statistics, and be smarter about triggering full collections. This reduces memory overhead. |
|||
samcv | ok it is off by one now jnthn i must have changed something else | ||
MoarVM: 80c8044 | jnthn++ | src/io/ (3 files): | |||
MoarVM: Merge pull request #488 from MoarVM/more-pressure | |||
MoarVM: | |||
MoarVM: Take into account actual allocated of I/O buffers. | |||
jnthn | samcv: Phew, I didn't need stronger coffee after all :-) | ||
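The same off-by-one in C, as a small sketch over a length-first array like the one in the Raku snippet (`2, 100, 101`). The function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* data[0] holds the number of codepoints that follow it.
 * Starting at 1 with '<' stops one element early. */
size_t visit_with_less_than(const int32_t *data, int32_t *last_seen) {
    size_t visited = 0;
    for (int32_t i = 1; i < data[0]; i++) {
        *last_seen = data[i];
        visited++;
    }
    return visited;
}

/* Starting at 1 with '<=' covers every codepoint after the length slot. */
size_t visit_with_less_equal(const int32_t *data, int32_t *last_seen) {
    size_t visited = 0;
    for (int32_t i = 1; i <= data[0]; i++) {
        *last_seen = data[i];
        visited++;
    }
    return visited;
}
```

With `{2, 100, 101}` the first loop only reaches 100, matching the camelia output; the second reaches both values.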
11:00
zakharyas joined
|
|||
samcv | you still need stronger coffee though | 11:00 | |
just because why not | |||
jnthn | The stuff I'm drinking now is quite a bit weaker than my regular... | 11:01 | |
I was given a box set of coffees at Christmas. | |||
I'm used to drinking a 5. If I found the 3 a bit weak, I dunno what I'll make of the 1s. :P | 11:02 | ||
Hm, let's switch to using Geth instead of dalek here...seems to be working fine for other projects | |||
Geth | MoarVM: 4d87b1cc70 | (Jonathan Worthington)++ | src/spesh/candidate.c Free up spesh log slots after specialization. Spesh logging keeps values alive, preventing the GC from collecting them. It logs values to sample what types show up, which is fine, but we should not hang on to them beyond the point the specializer has used them in its analysis. This reduces memory overhead, perhaps quite notably in some applications that have large objects (for example, RT #130494 leaked many objects in this way). On CORE.setting compilation it saves ~3MB - not much in the scheme of things, but nice to win. |
11:04 | |
MoarVM: c670eadf6b | (Jonathan Worthington)++ | src/spesh/candidate.c Merge pull request #490 from MoarVM/free-spesh-log-slots Free up spesh log slots after specialization. |
|||
jnthn | Nice...now our commits are reported by a bot running on MoarVM :) | 11:05 | |
samcv | jnthn, That's reasonable, but we should at the very least stick in an assert that we really get 0 back from this. | ||
what do we want to do in case it's not 0? | |||
return empty string? | |||
jnthn | Well, if the plan is that we'll re-use the code inside of MVM_unicode_codepoints_to_nfg_string it'll be fine | 11:06 | |
Since it handles cases where the sequence produces multiple graphemes. | |||
If it's non-zero it'd mean we were about to silently lose a grapheme. | 11:07 | ||
samcv | yeah | 11:08 | |
jnthn | But really, I'd break MVM_unicode_codepoints_to_nfg_string into two pieces | ||
Everything below input_codes = ((MVMArray *)codes)->body.elems; can be factored out | |||
And then called as with input and input_codes | |||
samcv | can this be done later? | 11:09 | |
jnthn | I guess, but it'd avoid the need to introduce MVM_string_from_grapheme and resolve all the issues I had in MVM_unicode_string_from_name except the off-by-one :) | 11:12 | |
And result in less code overall | 11:13 | ||
samcv | i will look into it tomorrow most likely since it's 3am here now | 11:14 | |
we won't need MVM_string_from_grapheme then anymore right? | 11:15 | ||
also aside from splitting unicode_codepoints_to_nfg_string, i think i've made all the changes you requested now | 11:16 | ||
jnthn | Right | 11:19 | |
OK, sounds good. | |||
Rest well :) | |||
timotimo | o/ | 11:21 | |
samcv | not asleep yet :P | ||
but i'm mostlyish done coding for the day | |||
timotimo | i do wonder what causes all our bots to consume more and more memory | 11:24 | |
i'd need to run them myself to figure that out | |||
jnthn | Well, lemme merge work-lifetime first :) | 11:25 | |
My first attempt to rebase stuff to clean up resulted in SEGV... | |||
arnsholt | Whee! =) | 11:26 | |
samcv | hehehe | 11:27 | |
timotimo | wow,oops | ||
jnthn | I probably did something silly :) | 11:28 | |
Works on second attempt | |||
All I intended to do was strip out a commit that shouldn't have been in and part of another. | 11:29 | |
timotimo | i think i already asked for it a long time ago ... someone could implement abs_i for our jit and it'd positively impact something inside commitable | ||
i mean, i already mentioned abs_i could be done | |||
but i don't know how to do that properly | |||
jnthn | Aww, where went Geth? | 11:32 | |
arnsholt | Ping timeout, apparently | ||
jnthn | Anyway, just pushed the rebase of work-lifetime fixing the thing timotimo++ mentioned ;) | ||
timotimo | wait, i mentioned what? ;) | 11:33 | |
oh the typo? | |||
i mean ... switcho? switcheroo? | |||
samcv | work-lifetime sounds sort of ominous. as if that concludes all work jnthn will do on mvm lol | ||
jnthn | Yeah, that. | ||
:D | |||
samcv | work-lifetime pushed. nothing more to do! | ||
jnthn | Yup. All done. Now I can go to the Alps and spend my days sipping beer and enjoying the view. :) | 11:36 | |
Well, NQP and Rakudo builds seem happy post-rebase | 11:37 | ||
timotimo | MoarViem | ||
jnthn | At first I was like "huh, got a few seconds slower again??", then realized I've got IntelliJ running outside of the VM which is probably hogging an amount of resources... | 11:38 | |
11:38
pyrimidine joined
|
|||
brrt feels for jnthn's computer | 11:38 | ||
jnthn | It leads a busy life :) | 11:40 | |
11:41
Geth joined
|
|||
notviki | aww | 11:41 | |
Ping timeout... unsure why | 11:42 | ||
samcv | jnthn, how to name MVM_unicode_codepoints_to_nfg_string that takes in a unicode string | 11:44 | |
err that takes a c array | |||
can i just uhm. make a new one and change the op mapping | |||
timotimo | just put a _v at the end, just like OpenGL uses :P | ||
samcv | v? | 11:45 | |
timotimo | "vector" | ||
samcv | no i get that but | ||
but why vector | |||
timotimo | another word for contiguous array | ||
samcv | it's two dimensional i guess… but | ||
timotimo | wow. i was hoping to find an example by searching for "glgetv", but it seems like there's shoes that are called that | ||
jnthn | samcv: I'd leave the original one as is and call the factored out bit MVM_unicode_codepoints_c_array_to_nfg_string or so :) | 11:47 | |
Seems work-lifetime is good for merge :) | 11:48 | ||
brrt | \o/ | ||
lizmat is looking forward :-) | 11:49 | ||
brrt apparently can't write short blog posts…. | 11:51 | ||
timotimo | it is difficult | 11:52 | |
samcv | i will have to make a blog post once this is all done on all this unicode things | ||
brrt | i'd be interested in that. are you syndicated on pl6anet.org? | ||
timotimo | ... "The Syndicate" title theme song plays in the distance ... | 11:53 | |
samcv | nope brrt how do i get that | ||
timotimo | i think stmuk can add your .rss to the list | 11:54 | |
brrt | you should ask moritz, I think | ||
moritz can't do anything on pl6anet | 11:55 | ||
yes, stmuk is the one to talk to | |||
brrt | (pointer following :-)) | 11:56 | |
notviki | samcv: just add yours to this file: github.com/stmuk/pl6anet.org/blob/...perlanetrc | 11:57 | |
samcv | sweet | ||
moritz | ooh, nice | 11:58 | |
maybe add a link to the github repo to the website, while you're at it? :-) | |||
Geth | MoarVM: samcv++ created pull request #493: Refactor MVM_unicode_codepoints_to_nfg_string |
12:03 | |
samcv | woah. fancy | ||
anyway jnthn here you go | |||
spectest almost done completing, so should be ready to merge if you have no problems with it | 12:04 | ||
ok spectest pass. that one is ready for Merge | 12:07 | ||
timotimo | nice | ||
jnthn | Travis is having a go slow... | 12:11 | |
jnthn spectests a fix for github.com/MoarVM/MoarVM/issues/482 | 12:21 | ||
lunch, bbi30 | |||
samcv | jnthn, fixed now. also i've rewritten the new get_string_from_name or whatever it's called to use the new function | 12:24 | |
will rebase the string from name one once the newest PR is accepted | 12:26 | ||
12:42
pyrimidine joined
|
|||
lizmat needs some help with a codegen issue in Actions | 12:50 | ||
samcv | night all o/ | ||
Geth | MoarVM/master: 17 commits pushed by jnthn++ review: github.com/MoarVM/MoarVM/compare/e...f712c6a777 |
||
lizmat | good night, samcv | 12:51 | |
jnthn | samcv: Newest PR looks good, I will merge it once Travis checks OK | ||
Thanks; 'night o/ | |||
lizmat | basically, I need to get Zop to call infix:<Z>(...,:with(op)) | 12:52 | |
instead of somehow working something with METAOP_ZIP | 12:53 | ||
line 6893 in Actions | |||
oops, 6983 | |||
feels like I'm trying to work this at the wrong place | 12:54 | ||
oddly enough, bare Z does codegen to a direct call to &infix:<Z> | |||
would appreciate any help there :-) | 12:55 | ||
Geth | MoarVM/utf8-c8-boundary-fix: 9475d8db4c | (Jonathan Worthington)++ | src/strings/utf8_c8.c Decode (hopefully) all NFC UTF8 to NFG in UTF8-C8 In the last round of tweaks to UTF8-C8, we fixed some sequences that would not round-trip properly due to being mis-represented in UTF8. The fix dealt with those cases, but was a bit too sweeping. UTF8-C8 aims to decode everything that's both valid UTF8 and in NFC as the UTF8 decoder would, and express everything else as synthetics that will ensure round-tripping. This fix deals with the issue raised in MoarVM Issue #482, while not regressing any of the UTF8-C8 roundtrip tests. |
13:00 | |
MoarVM: jnthn++ created pull request #494: Decode (hopefully) all NFC UTF8 to NFG in UTF8-C8 |
13:01 | ||
jnthn | arnsholt: I think github.com/MoarVM/MoarVM/pull/494 does what you were suggesting; please take a glance if you've a moment :) | ||
lizmat: What does your diff to do it look like? | |||
I'd expect it to be mostly setting .named('with') | 13:02 | ||
lizmat | well, yes and no: | ||
I think the thinko I made is that METAOP_ZIP returns a block that takes a lol | |||
jnthn | Yes | ||
lizmat | whereas &infix:<Z>:with returns a Seq | ||
jnthn | Oh | ||
Yes | |||
So it won't work to do that simple rewrite | 13:03 | ||
lizmat | indeed | ||
jnthn | We need the extra level of thunk for nested meta-ops | ||
lizmat | why? it apparently isn't needed for a bare Z? | ||
or do you mean a ZZ ? | 13:04 | ||
or a ZX ? | |||
jnthn | Yes, any of those | 13:05 | |
Or ZZ :) | |||
lizmat | ah, so maybe I should codegen a call to Rakudo::Internals.ZipIterator... directly | ||
jnthn | Will that return a block? | 13:07 | |
lizmat | atm that returns a Seq | ||
no, Iterator | |||
arnsholt | jnthn: Yeah, that looks right to me! | 13:08 | |
jnthn | arnsholt: OK, thanks. :) | ||
lizmat: I think whatever we code-gen meta-ops to, we'd need to have it be a block except at the top level | 13:09 | ||
lizmat: We may be able to do smarter at the top level | |||
(But would also need a mechanism to detect it) | |||
lizmat | ok, lemme digest that for a bit :-) | 13:10 | |
jnthn | OK | ||
Going to switch to $other-job for a bit :) | |||
lizmat | thanks so far! | 13:11 | |
jnthn | But will be about :) | 13:12 | |
Will merge stuff when Travis is happy | |||
And bump MOAR/NQP revisions, so hopefully everyone can enjoy the fixes :) | |||
timotimo | for cases like my &bleh = &[Z,] and such | 13:14 | |
jnthn | Oh, that also :) | ||
Geth | MoarVM: dd7d4d086d | (Samantha McVey)++ | 2 files Refactor MVM_unicode_codepoints_to_nfg_string Separate out the section which involves MVMObject so we can re-use this function in other places with native c data structures. |
13:47 | |
MoarVM: 37bb9737bd | (Jonathan Worthington)++ | 2 files Merge pull request #493 from samcv/MVM_unicode_codepoints_to_nfg_string Refactor MVM_unicode_codepoints_to_nfg_string |
|||
brrt | something extra to ponder | 13:49 | |
how am i going to extend the linear scan allocator (and the expr jit in general) to work with SSE registers | 13:50 | ||
im not at all sure that the rex byte will work for those | |||
matter of fact | 13:51 | ||
i know nothing about SSE registers and their encoding | |||
jnthn | I figure this is something we can worry about once we've got stuff working at all | 13:52 | |
(as in, post-merge) | |||
afaik we don't have any code that uses those today? | |||
So we won't miss out on anything? | |||
(anything we're already getting, that is) | |||
brrt | yes, definitely | 13:57 | |
but i make plans long ahead | |||
:-) | 13:58 | ||
i've more or less figured out how to implement ARGLIST | |||
as I said, that's the last essential bit before we can really consider merging | 13:59 | ||
by the way, the current JIT *does* work with SSE registers | |||
jnthn | Oh? What for ooc? | ||
brrt | for floating point calculations :-) | 14:00 | |
the alternative would be x87 coprocessor calculations. don't use those | |||
jnthn | heh, I didn't realize we weren't using those :P | ||
brrt | well, it's only for a few things | 14:01 | |
jnthn | Ah, just some floating point ops? | ||
So the basic things like + doesn't use them? | |||
Compilation completed successfully with 3,719 warnings in 20m 25s 687ms (moments ago) | |||
oops | 14:02 | ||
ww | |||
brrt | :-) | 14:03 | |
no, regular integer addition doesn't | 14:04 | ||
jnthn | But floating point addition? | ||
brrt | floating point addition does | ||
brrt looks for example | |||
github.com/MoarVM/MoarVM/blob/even....dasc#L987 | 14:07 | ||
jnthn | Hm | 14:08 | |
OK, so we probably do need to think about that at some point sooner rather than later if we want nice JIT of floating point code :) | 14:09 | ||
brrt | well, yeah | ||
i don't expect a terror, though | |||
i may need to extend dasm a bit again | |||
but the allocator shouldn't have to change (much) | 14:10 | ||
an extra stack for the additional registers, a few extra definitions, and some more care in accessors... | 14:11 | ||
14:13
pyrimidine joined
|
|||
brrt | oh, and passing floating point args, of course | 14:17 | |
14:22
pyrimidine joined
|
|||
lizmat | so, why is there no METAOP_REDUCE_NON, and why doesn't the lack of that break Z.. ? | 14:25 | |
m: find-reducer-for-op(&[..]) | 14:26 | ||
camelia | rakudo-moar ed5c86: OUTPUT«No such symbol '&METAOP_REDUCE_NON'␤ in block <unit> at <tmp> line 1␤␤Actually thrown at:␤ in block <unit> at <tmp> line 1␤␤» | ||
moritz | m: say 1 Z.. 2 | 14:29 | |
camelia | rakudo-moar ed5c86: OUTPUT«(1..2)␤» | ||
moritz | wow | ||
lizmat | looks to me these operators only can take 2 iterators | 14:30 | |
ever | |||
m: say 1 Z.. 2 Z.. 3 | |||
camelia | rakudo-moar ed5c86: OUTPUT«Range objects are not valid endpoints for Ranges␤ in block <unit> at <tmp> line 1␤␤» | ||
lizmat | m: dd 1 Zcmp 2 Zcmp 3 | 14:31 | |
camelia | rakudo-moar ed5c86: OUTPUT«(Order::Less,).Seq␤» | ||
lizmat | m: dd 1 Zcmp 2 Zcmp 1 | ||
camelia | rakudo-moar ed5c86: OUTPUT«(Order::Less,).Seq␤» | ||
lizmat | m: dd 1 Zcmp -1 Zcmp 1 | ||
camelia | rakudo-moar ed5c86: OUTPUT«(Order::Same,).Seq␤» | ||
lizmat | m: dd 1 Zcmp 1 Zcmp -1 | ||
camelia | rakudo-moar ed5c86: OUTPUT«(Order::More,).Seq␤» | ||
lizmat | m: dd 1 Zcmp 1 Zcmp 0 | 14:32 | |
camelia | rakudo-moar ed5c86: OUTPUT«(Order::Same,).Seq␤» | ||
lizmat | m: dd 1 Zcmp 2 Zcmp 0 | ||
camelia | rakudo-moar ed5c86: OUTPUT«(Order::Less,).Seq␤» | ||
lizmat | m: dd 1 Zcmp 2 Zcmp -1 | ||
camelia | rakudo-moar ed5c86: OUTPUT«(Order::Same,).Seq␤» | ||
lizmat | yeah, that feels faulty | 14:33 | |
14:51
brrt left,
brrt joined
16:02
domidumont joined
16:04
zakharyas joined
|
|||
TimToady | yeah, should disallow more than 2 for non-assocs | 16:58 | |
16:59
zakharyas joined
17:25
zakharyas joined
17:59
pyrimidine joined
|
|||
Geth | MoarVM: 9475d8db4c | (Jonathan Worthington)++ | src/strings/utf8_c8.c Decode (hopefully) all NFC UTF8 to NFG in UTF8-C8 In the last round of tweaks to UTF8-C8, we fixed some sequences that would not round-trip properly due to being mis-represented in UTF8. The fix dealt with those cases, but was a bit too sweeping. UTF8-C8 aims to decode everything that's both valid UTF8 and in NFC as the UTF8 decoder would, and express everything else as synthetics that will ensure round-tripping. This fix deals with the issue raised in MoarVM Issue #482, while not regressing any of the UTF8-C8 roundtrip tests. |
18:27 | |
MoarVM: f9e14e9ca8 | (Jonathan Worthington)++ | src/strings/utf8_c8.c Merge pull request #494 from MoarVM/utf8-c8-boundary-fix Decode (hopefully) all NFC UTF8 to NFG in UTF8-C8 |
|||
18:29
pyrimidine joined
18:45
camelia joined
18:48
camelia joined
18:54
pyrimidine joined
19:01
Geth joined
19:13
brrt joined
|
|||
brrt | ohai #moarvm | 19:13 | |
i'm writing a longish blog post on the new register allocator and i've figured out a bug | |||
it is an extremely annoying bug, which is why i want to tell you about it | 19:14 | ||
if you read the literature about linear scan, the received wisdom is: expire registers prior to allocating a new one, so that if one of the input registers has its last use in creating this live range, you can reuse its register | 19:15 | ||
especially for two-operand instruction sets like x86-64, that's great, because that matches well with how the architecture works | |||
however, to get that effect, you need to arrange registers in a stack, not a register buffer | 19:16 | ||
s/register buffer/ring buffer/ | |||
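A toy comparison of the two free-register pools, assuming four registers numbered 0 to 3. This is not the MoarVM JIT's data structure, only a sketch of why a stack (LIFO) hands back the register that was just freed while a ring buffer (FIFO) does not.

```c
#define NUM_REGS 4

/* LIFO pool: the most recently released register is handed out first. */
typedef struct { int regs[NUM_REGS]; int top; } RegStack;

void stack_init(RegStack *s) {
    s->top = 0;
    for (int r = NUM_REGS - 1; r >= 0; r--)
        s->regs[s->top++] = r;           /* register 0 ends up on top */
}
int  stack_take(RegStack *s)        { return s->regs[--s->top]; }
void stack_give(RegStack *s, int r) { s->regs[s->top++] = r; }

/* FIFO pool: a released register goes to the back of the queue. */
typedef struct { int regs[NUM_REGS]; int head; int count; } RegRing;

void ring_init(RegRing *q) {
    for (int r = 0; r < NUM_REGS; r++)
        q->regs[r] = r;
    q->head = 0;
    q->count = NUM_REGS;
}
int ring_take(RegRing *q) {
    int r = q->regs[q->head];
    q->head = (q->head + 1) % NUM_REGS;
    q->count--;
    return r;
}
void ring_give(RegRing *q, int r) {
    q->regs[(q->head + q->count) % NUM_REGS] = r;
    q->count++;
}

/* Take a register for an input, expire it, then allocate the result.
 * Returns 1 if the result lands in the register the input just vacated. */
int stack_reuses_freed_register(void) {
    RegStack s;
    stack_init(&s);
    int input = stack_take(&s);
    stack_give(&s, input);
    return stack_take(&s) == input;
}
int ring_reuses_freed_register(void) {
    RegRing q;
    ring_init(&q);
    int input = ring_take(&q);
    ring_give(&q, input);
    return ring_take(&q) == input;
}
```

On a two-operand target, landing the result in the input's register is what lets the destination double as a source operand.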
so, that's one thing, but things get worse | 19:17 | ||
suppose you have no registers left and need to spill a value | |||
19:18
pyrimidine joined
|
|||
brrt | suppose you pick to spill a value which is used for the next instruction (i.e. where the new live range starts) | 19:18 | |
so then we split the live range into 'atomic' ranges | 19:19 | ||
since the new 'atomic' range is not in the past, it can't be retired, and must be put on the worklist | |||
however, once that's done, it can be immediately expired | 19:20 | ||
so suppose i have *two* such 'atomic' live ranges | 19:22 | ||
then one is allocated, e.g. to register rcx; before I allocate the second, this is expired, rcx is returned to the stack; the second value is also loaded into rcx, and my program is wrong | 19:24 | ||
... and yes, as usual, i know how to fix this | 19:26 | ||
but i'm *annoyed* | |||
also because the literature is just wrong about this | 19:27 | ||
TimToady | .oO("We don't know why your Fortran program crashes, but if you just throw in a few extra 'continue' statements, it should start working again.") |
19:28 | |
notviki | :o | 19:29 | |
This all sounds fancy pants. | |||
19:29
zakharyas joined
|
|||
notviki | brrt: is that stuff hard to learn? :) | 19:30 | |
TimToady | and yes, I heard that when I was (quite a bit) younger | ||
brrt | JIT compilers have many moving parts. that makes them kind of hard to explain | ||
each of the individual things is bog-standard. binary heap, disjoint set, linked lists | 19:31 | ||
TimToady: how.. even | |||
TimToady | probably buffer boundary issues | 19:32 | |
brrt | how does 'continue' make it work then :-o | 19:33 | |
did that actually help | |||
notviki: i'm kind of hoping my blog has some practical hints on how you can do things. and i try to keep the LoC of the jit low | 19:34 | ||
that's also because it's just me writing it now | |||
notviki | brrt: do you have a CS degree? | 19:35 | |
brrt | (the proper fix, by the way, is to do two things; expire values *after* rather than *at* their last use; and add a special 'reuse' mechanism that checks if a register can be reused *at* its last use; alternatively, expire registers only once at a given code point) | 19:39 | |
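The bug and the first fix brrt describes can be reproduced in a few lines. What follows is a minimal sketch, not the MoarVM JIT code: `linear_scan`, its `(name, start, end)` range tuples, and the `expire_at_last_use` flag are all hypothetical names made up for illustration, and spilling is left out entirely. Free registers are kept on a stack, as in the discussion above.

```python
def linear_scan(ranges, registers, expire_at_last_use):
    """Toy linear scan over (name, start, end) live ranges sorted by start.

    Hypothetical illustration only; not the MoarVM implementation.
    Returns a dict mapping each range name to its assigned register.
    """
    # Free registers form a stack: pop() hands out registers[0] first.
    free = list(reversed(registers))
    active = []        # (end, register) pairs for currently live ranges
    assignment = {}

    for name, start, end in ranges:
        # Expire step, run before every allocation.
        still_live = []
        for a_end, a_reg in active:
            if expire_at_last_use:
                expired = a_end <= start   # textbook rule: expire *at* last use
            else:
                expired = a_end < start    # fix: expire only *after* last use
            if expired:
                free.append(a_reg)         # register goes back on top of the stack
            else:
                still_live.append((a_end, a_reg))
        active = still_live

        if not free:
            raise RuntimeError("out of registers; spilling is not modelled here")
        reg = free.pop()
        assignment[name] = reg
        active.append((end, reg))

    return assignment


# Two 'atomic' ranges that both live only at instruction 5.
atomic = [("a", 5, 5), ("b", 5, 5)]
buggy = linear_scan(atomic, ["rcx", "rdx"], expire_at_last_use=True)
fixed = linear_scan(atomic, ["rcx", "rdx"], expire_at_last_use=False)
```

With the textbook rule both operands of instruction 5 land in `rcx`. With the stricter rule they get distinct registers, but a range that starts exactly where another ends (say `("x", 0, 3)` followed by `("y", 3, 6)`) no longer reuses the freed register, which is why a separate 'reuse' check is needed alongside this fix.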
notviki: not really, i've a degree in environmental science :-) | 19:41 | ||
i've kind of learned by brute force. perhaps not the most efficient way of doing it | 19:42 | ||
TimToady | brrt: it doesn't have to be continue, it can be anything that shifts the positions of the rest of the program, but continue is a no-op | 19:43 | |
but it's also possible they used it to mark basic block boundaries, or some such | 19:44 | ||
brrt | hmm, that makes some sense | 19:45 | |
TimToady | ancient Fortran optimizers were scary good, except when they weren't | ||
brrt | :-) | 19:46 | |
these days i think we have a bit more theory behind it | |||
i kind of like the 'expire once per codepoint' solution best | |||
simplest to implement | |||
notviki | :) | ||
brrt | and sufficient. but brittle | 19:47 | |
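The 'expire once per code point' variant might look roughly like this. Again a hedged sketch under assumptions: `linear_scan_expire_once` and its tuple layout are invented for illustration, spilling is not modelled, and nothing here is taken from the actual JIT source.

```python
def linear_scan_expire_once(ranges, registers):
    """Toy linear scan that runs the expire step at most once per position.

    Hypothetical illustration only. `ranges` is a list of (name, start, end)
    tuples sorted by start; returns a dict of name -> register.
    """
    free = list(reversed(registers))   # stack: pop() yields registers[0] first
    active = []                        # (end, register) pairs
    assignment = {}
    last_expired_at = None             # position where expiry last ran

    for name, start, end in ranges:
        # Only expire on the first visit to a given position, so a range
        # allocated at this position cannot be expired before its siblings
        # at the same position have been allocated.
        if start != last_expired_at:
            still_live = []
            for a_end, a_reg in active:
                if a_end <= start:
                    free.append(a_reg)
                else:
                    still_live.append((a_end, a_reg))
            active = still_live
            last_expired_at = start

        if not free:
            raise RuntimeError("out of registers; spilling is not modelled here")
        reg = free.pop()
        assignment[name] = reg
        active.append((end, reg))

    return assignment


atomic = linear_scan_expire_once([("a", 5, 5), ("b", 5, 5)], ["rcx", "rdx"])
chained = linear_scan_expire_once([("x", 0, 3), ("y", 3, 6)], ["rcx", "rdx"])
```

Here the two atomic ranges at instruction 5 get different registers, while `y` still reuses the register `x` gives up at position 3. The brittleness is visible in the sketch: correctness depends on ranges being visited in start order and on nothing re-triggering expiry at a position that has already been handled.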
19:47
mtj_ joined
|
|||
brrt | otoh, everything about compilers is brittle | 19:47 | |
TimToady | .oO(that's why compiler writers make peanuts...) |
19:49 | |
brrt | i like peanut butter, so that's something | 19:51 | |
also, i think that actual compiler writers (that know what they are doing) actually have reasonable salaries | |||
samcv | ok i'm back | 20:17 | |
well uhm. i mean. good morning | 20:18 | ||
o/ | |||
20:23
pyrimidine joined
|
|||
brrt | good * samcv | 20:39 | |
evening for me | |||
samcv | i'll brb in an hour or so | 20:43 | |
brrt in an hour or 10 or so :-) | 20:44 | ||
sleep & | |||
20:50
pyrimidine joined
|
|||
jnthn | o/ samcv | 20:50 | |
20:57
pyrimidine joined
21:28
pyrimidine joined
21:33
pyrimidine joined
22:02
pyrimidine joined
22:47
pyrimidine joined
23:26
tbrowder joined
|
|||
samcv | jnthn, i have rewritten it to use the new reworked string creation function. waiting for travis builds to complete now | 23:30 | |
renamed it getstrfromname instead of getstrbyname, because of the existing getcpfromname | |||
jnthn | samcv++ | 23:36 | |
Will look in the morning :) | |||
samcv | aww ok | ||
jnthn | Super sleepy :) | ||
samcv | k | ||
jnthn | Found a moment to look anyway | 23:40 | |
Spotted one more thing | |||
But overall looks close | |||
Anything else before I attempt sleep? ) | |||
samcv | uhm i think that's it | 23:42 | |
jnthn | OK | ||
Be back in the morning then :) | 23:43 | ||
samcv | sleep well :) | ||
jnthn | Thanks...here's hoping :) | 23:44 | |
o/ | |||
samcv | o/ | 23:45 | |
timotimo | sleep wellthn | 23:53 |