github.com/moarvm/moarvm | IRC logs at colabti.org/irclogger/irclogger_logs/moarvm
Set by AlexDaniel on 12 June 2018.
01:54 vrurg_ is now known as vrurg 01:57 Kaiepi left 02:39 Kaiepi joined, Kaiepi left, Kaiepi joined 04:23 MasterDuke left 07:20 sena_kun joined 07:31 sena_kun left
nwc10 good *, #moarvm 07:40
07:54 sena_kun joined 08:12 zakharyas joined 08:20 zakharyas1 joined 08:23 zakharyas left 10:25 sena_kun left 10:30 sena_kun joined
Geth MoarVM/include-not-cat: 349d462b8c | (Nicholas Clark)++ | 3 files
Create a unicode.c with #include directives instead of generating it with cat.

This way we avoid generating a large temporary file, avoid needing to delete it at cleanup, avoid needing an entry in .gitignore, and get decent line numbers in warnings and errors from the C compiler. Where we replace "generate" src/strings/unicode.c with the C pre-processor instead of cat
11:20
nwc10 oops, that needed a bit more amending 11:21
Geth MoarVM/include-not-cat: 6f50a1b944 | (Nicholas Clark)++ | 3 files
Create a unicode.c with #include directives instead of generating it with cat.

This way we avoid generating a large temporary file, avoid needing to delete it at cleanup, avoid needing an entry in .gitignore, and get decent line numbers in warnings and errors from the C compiler.
nwc10 Geth++ # excellent proofreader
Geth MoarVM: nwc10++ created pull request #1362:
Create a unicode.c with #include directives instead of generating it …
11:22
nwc10 I await the Win32 builds.
nine Oh I love this PR! 11:27
tellable6 2020-10-20T21:18:56Z #raku-dev <tbrowder> nine i changed msg too
2020-10-20T21:23:17Z #raku-dev <tbrowder> nine sorry, i'm mistaken again. i should have cancelled the PR and resubmitted with a better log as i've done in the past
11:50 zakharyas1 left 12:19 domidumont joined
Geth MoarVM: 4f5787d3ca | (Nicholas Clark)++ (committed by nwc10) | 3 files
Create a unicode.c with #include directives instead of generating it with cat.

This way we avoid generating a large temporary file, avoid needing to delete it at cleanup, avoid needing an entry in .gitignore, and get decent line numbers in warnings and errors from the C compiler.
12:45
nwc10 hmm, I see that it might have been better to just use regular git on the command line to do that. 12:46
12:51 zakharyas joined 13:05 zakharyas1 joined 13:07 zakharyas left 13:36 zakharyas joined 13:38 zakharyas1 left 14:53 domidumont left 15:25 zakharyas left 15:28 zakharyas joined 15:58 zakharyas left 16:25 mtj_ joined 17:13 domidumont joined 17:14 MasterDuke joined 17:16 raku-bridge joined, raku-bridge left, raku-bridge joined 17:21 zakharyas joined 17:26 Kaeipi joined, Kaiepi left 17:29 Kaeipi left, Kaeipi joined 17:31 domidumont left 18:02 zakharyas left
MasterDuke i think i asked this the last time he wrote about this, but would lemire.me/blog/2020/10/20/ridiculo...alidation/ be useful for us? i assume we'd have to translate to plain C, not C++? 18:45
samcv, timotimo: ^^^ 18:46
18:46 sena_kun left
timotimo unfortunately, we need a little more than just verification, though i suppose having a first step for verification followed by what we have to do without needing to worry about the data could be good? 18:46
like, we need to translate \r\n into one codepoint, and do all the other NFG fun 18:47
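A minimal sketch of the `\r\n` point, assuming the input has already been decoded to code points. The function name and the placeholder value `SKETCH_CRLF_GRAPHEME` are hypothetical and are not taken from MoarVM; real NFG normalization does far more than this one rule.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical placeholder for "the single grapheme that stands for CR LF".
 * This is an assumption for illustration, not MoarVM's representation. */
#define SKETCH_CRLF_GRAPHEME (-1)

/* Copies `n` code points from `in` to `out`, replacing every CR LF pair
 * with one SKETCH_CRLF_GRAPHEME entry. `out` needs room for `n` entries.
 * Returns the number of entries written. */
static size_t sketch_collapse_crlf(const int32_t *in, size_t n, int32_t *out) {
    size_t i = 0, w = 0;
    while (i < n) {
        if (in[i] == 0x0D && i + 1 < n && in[i + 1] == 0x0A) {
            out[w++] = SKETCH_CRLF_GRAPHEME;  /* two code points, one unit */
            i += 2;
        } else {
            out[w++] = in[i++];               /* everything else passes through */
        }
    }
    return w;
}
```

A lone `\r` that is not followed by `\n` is passed through unchanged, which is why a validator alone cannot replace this step.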
18:57 MasterDuke left
samcv timotimo, yeah, that is what I got when I looked at our code and compared it. That it wouldn't fit in well with how we do things 19:42
timotimo one day™ we'll have a type of string that is in source form 19:44
samcv timotimo, like utf8? 19:52
19:57 MasterDuke joined
timotimo for example, yeah 20:05
the user would pay in terms of algorithmic complexity, but it's a trade-off anyway and maybe the user wants to make their own trade-off rather than having it made for them
MasterDuke samcv, timotimo: so if i'm looking at the right code, in MVM_string_utf8_decode we get a `char *`, then in a loop `decode_utf8_byte` (which i can't find the definition of) chunks of it, and if each chunk is ok we normalize it into a grapheme, stick it in an array of graphemes, and at the end turn that into an MVMString if there were no errors? 20:06
timotimo i didn't look at the code, but that looks about right 20:07
MasterDuke but you don't think it would be faster to do a fast validation of the `char*` first, and then go into a tighter loop if it validates and do the error case if not? 20:08
timotimo maybe. we'll still have to go through and do the variable-length-encoding thing 20:11
and normalization
MasterDuke right, we just wouldn't have the switch on the result of decode_utf8_byte 20:12
we do MVM_EXPECT the UTF8_ACCEPT case, so maybe it wouldn't be much faster 20:13
i don't think i'm up for converting his code to something we could use just to try it out, so i'll happily accept your judgement 20:15
timotimo i would be happy to be proven wrong ;) 20:18
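The "validate first, then run a tighter loop" idea discussed above could look roughly like the following. This is a plain scalar sketch, not the vectorized validator from the linked blog post and not MoarVM's `decode_utf8_byte`; both function names are hypothetical. The second function assumes its input already passed the first, so it carries no error branches.

```c
#include <stddef.h>
#include <stdint.h>

/* Pass 1: returns 1 if s[0..n) is well-formed UTF-8, 0 otherwise.
 * Rejects overlong forms, surrogates, and values above U+10FFFF. */
static int sketch_utf8_valid(const unsigned char *s, size_t n) {
    size_t i = 0;
    while (i < n) {
        unsigned char b = s[i];
        size_t need;
        uint32_t min, cp;
        if (b < 0x80) { i++; continue; }
        else if ((b & 0xE0) == 0xC0) { need = 1; min = 0x80;    cp = b & 0x1F; }
        else if ((b & 0xF0) == 0xE0) { need = 2; min = 0x800;   cp = b & 0x0F; }
        else if ((b & 0xF8) == 0xF0) { need = 3; min = 0x10000; cp = b & 0x07; }
        else return 0;                      /* stray continuation or bad lead */
        if (n - i - 1 < need) return 0;     /* truncated sequence */
        for (size_t k = 1; k <= need; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return 0;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;
        i += need + 1;
    }
    return 1;
}

/* Pass 2: decodes input that is ALREADY KNOWN to be valid, so no checks.
 * `out` needs room for `n` entries. Returns the number of code points. */
static size_t sketch_utf8_decode_valid(const unsigned char *s, size_t n,
                                       uint32_t *out) {
    size_t i = 0, w = 0;
    while (i < n) {
        unsigned char b = s[i++];
        uint32_t cp;
        size_t extra;
        if (b < 0x80)      { cp = b;        extra = 0; }
        else if (b < 0xE0) { cp = b & 0x1F; extra = 1; }
        else if (b < 0xF0) { cp = b & 0x0F; extra = 2; }
        else               { cp = b & 0x07; extra = 3; }
        while (extra--) cp = (cp << 6) | (s[i++] & 0x3F);
        out[w++] = cp;
    }
    return w;
}
```

Even in this form the second pass still has to handle the variable-length encoding, and normalization would still follow it, which is the cost timotimo points out.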
MasterDuke github.com/MoarVM/MoarVM/projects/...d-47838858 20:24
i still have the gmp branch to finish up, the MVMSpeshCandidate as REPR branch to finish up, the coverage parser script to enhance, a wedding to plan, a wife to murder, a country to frame... 20:26
20:29 zakharyas joined
samcv MasterDuke, i would think it'd be faster to do it all at once. well. at least i know when i vectorized the string copy operations, doing a run over the whole thing was faster. may be a tie if no vectorization happens 20:29
MasterDuke ah, well volunteered! ;) 20:30
timotimo i guess if the string fits in l2 cache, it'll be worth doing whatever does more stuff per instruction? 20:42
l3 and further outwards, the memory fetch overhead might very well eat up any improvements from vectorization and friends
MasterDuke how many strings are we validating that don't fit in l2? 20:43
20:54 zakharyas left
MasterDuke when compiling CORE.c setting, most common value of the bytes argument to MVM_string_utf8_decode is 12, then 13, then 14, then 2, then 16 21:06
timotimo do we want to hardcode some bitmasks for short strings ... 21:50
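One possible reading of the bitmask question, given that the most common lengths reported above are all 16 bytes or fewer: test the high bit of every byte in one or two 64-bit words to detect pure-ASCII input, which is always valid UTF-8. This is an illustrative sketch only; the function name is hypothetical and nothing here is taken from MoarVM.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if all `n` bytes are ASCII (no high bit set), 0 if any byte
 * is non-ASCII, and -1 if n > 16, which this sketch does not handle. */
static int sketch_short_is_ascii(const unsigned char *s, size_t n) {
    uint64_t lo = 0, hi = 0;
    if (n > 16) return -1;
    if (n > 8) {
        memcpy(&lo, s, 8);          /* first eight bytes */
        memcpy(&hi, s + 8, n - 8);  /* remainder, zero-padded */
    } else {
        memcpy(&lo, s, n);          /* zero-padded when n < 8 */
    }
    /* The mask is the same in every byte lane, so byte order is irrelevant. */
    return ((lo | hi) & UINT64_C(0x8080808080808080)) == 0;
}
```

A caller could use a result of 1 to skip multi-byte handling entirely and fall back to the general decoder otherwise.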
22:48 MasterDuke left 23:49 bingos_ joined 23:50 bingos left