github.com/moarvm/moarvm | IRC logs at colabti.org/irclogger/irclogger_logs/moarvm Set by AlexDaniel on 12 June 2018. |
|||
01:54
vrurg_ is now known as vrurg
01:57
Kaiepi left
02:39
Kaiepi joined,
Kaiepi left,
Kaiepi joined
04:23
MasterDuke left
07:20
sena_kun joined
07:31
sena_kun left
|
|||
nwc10 | good *, #moarvm | 07:40 | |
07:54
sena_kun joined
08:12
zakharyas joined
08:20
zakharyas1 joined
08:23
zakharyas left
10:25
sena_kun left
10:30
sena_kun joined
|
|||
Geth | MoarVM/include-not-cat: 349d462b8c | (Nicholas Clark)++ | 3 files Create a unicode.c with #include directives instead of generating it with cat. This way we avoid generating a large temporary file, avoid needing to delete it at cleanup, avoid needing an entry in .gitignore, and get decent line numbers in warnings and errors from the C compiler. Where we replace "generate" src/strings/unicode.c with the C pre-processor instead of cat |
11:20 | |
nwc10 | oops, that needed a bit more amending | 11:21 | |
Geth | MoarVM/include-not-cat: 6f50a1b944 | (Nicholas Clark)++ | 3 files Create a unicode.c with #include directives instead of generating it with cat. This way we avoid generating a large temporary file, avoid needing to delete it at cleanup, avoid needing an entry in .gitignore, and get decent line numbers in warnings and errors from the C compiler. |
||
nwc10 | Geth++ # excellent proofreader | ||
Geth | MoarVM: nwc10++ created pull request #1362: Create a unicode.c with #include directives instead of generating it … |
11:22 | |
nwc10 | I await the Win32 builds. | ||
nine | Oh I love this PR! | 11:27 | |
tellable6 | 2020-10-20T21:18:56Z #raku-dev <tbrowder> nine i changed msg too | ||
2020-10-20T21:23:17Z #raku-dev <tbrowder> nine sorry, i'm mistaken again. i should have cancelled the PR and resubmitted with a better log as i've done in the past | |||
11:50
zakharyas1 left
12:19
domidumont joined
|
|||
Geth | MoarVM: 4f5787d3ca | (Nicholas Clark)++ (committed by nwc10) | 3 files Create a unicode.c with #include directives instead of generating it with cat. This way we avoid generating a large temporary file, avoid needing to delete it at cleanup, avoid needing an entry in .gitignore, and get decent line numbers in warnings and errors from the C compiler. |
12:45 | |
nwc10 | hmm, I see that it might have been better to just use regular git on the command line to do that. | 12:46 | |
12:51
zakharyas joined
13:05
zakharyas1 joined
13:07
zakharyas left
13:36
zakharyas joined
13:38
zakharyas1 left
14:53
domidumont left
15:25
zakharyas left
15:28
zakharyas joined
15:58
zakharyas left
16:25
mtj_ joined
17:13
domidumont joined
17:14
MasterDuke joined
17:16
raku-bridge joined,
raku-bridge left,
raku-bridge joined
17:21
zakharyas joined
17:26
Kaeipi joined,
Kaiepi left
17:29
Kaeipi left,
Kaeipi joined
17:31
domidumont left
18:02
zakharyas left
|
|||
MasterDuke | i think i asked this the last time he wrote about this, but would lemire.me/blog/2020/10/20/ridiculo...alidation/ be useful for us? i assume we'd have to translate to plain C, not C++? | 18:45 | |
samcv, timotimo: ^^^ | 18:46 | ||
18:46
sena_kun left
|
|||
timotimo | unfortunately, we need a little more than just verification, though i suppose having a first step for verification followed by what we have to do without needing to worry about the data could be good? | 18:46 | |
like, we need to translate \r\n into one codepoint, and do all the other NFG fun | 18:47 | ||
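The `\r\n` point can be illustrated with a minimal sketch. It assumes code points are already decoded into an array and uses a hypothetical negative placeholder, `CRLF_GRAPHEME`, for the single grapheme that stands for the pair. The name `collapse_crlf` and the placeholder value are illustrative, not MoarVM's API, and real NFG also handles combining sequences and normalization, which this leaves out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical placeholder for the one grapheme that represents CR LF. */
#define CRLF_GRAPHEME ((int32_t)-1)

/* Copies n code points from cps to out, replacing every CR LF pair with
 * CRLF_GRAPHEME. out must have room for n entries.
 * Returns the number of entries written. */
size_t collapse_crlf(const int32_t *cps, size_t n, int32_t *out) {
    size_t i = 0, written = 0;
    while (i < n) {
        if (cps[i] == '\r' && i + 1 < n && cps[i + 1] == '\n') {
            out[written++] = CRLF_GRAPHEME;  /* two code points, one grapheme */
            i += 2;
        } else {
            out[written++] = cps[i++];       /* everything else passes through */
        }
    }
    return written;
}
```

A lone `\r` with no following `\n` is copied unchanged, so only the exact pair is merged.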
18:57
MasterDuke left
|
|||
samcv | timotimo, yeah, that is what I got when I looked at our code and compared it. That it wouldn't fit in well with how we do things | 19:42 | |
timotimo | one day™ we'll have a type of string that is in source form | 19:44 | |
samcv | timotimo, like utf8? | 19:52 | |
19:57
MasterDuke joined
|
|||
timotimo | for example, yeah | 20:05 | |
the user would pay in terms of algorithmic complexity, but it's a trade-off anyway and maybe the user wants to make their own trade-off rather than having it made for them | |||
MasterDuke | samcv, timotimo: so if i'm looking at the right code, in MVM_string_utf8_decode we get a `char *`, then in a loop `decode_utf8_byte` (which i can't find the definition of) chunks of it, and if each chunk is ok we normalize it into a grapheme, stick it in an array of graphemes, and at the end turn that into an MVMString if there were no errors? | 20:06 | |
timotimo | i didn't look at the code, but that looks about right | 20:07 | |
MasterDuke | but you don't think it would be faster to do a fast validation of the `char*` first, and then go into a tighter loop if it validates and do the error case if not? | 20:08 | |
timotimo | maybe. we'll still have to go through and do the variable-length-encoding thing | 20:11 | |
and normalization | |||
MasterDuke | right, we just wouldn't have the switch on the result of decode_utf8_byte | 20:12 | |
we do MVM_EXPECT the UTF8_ACCEPT case, so maybe it wouldn't be much faster | 20:13 | ||
i don't think i'm up for converting his code to something we could use just to try it out, so i'll happily accept your judgement | 20:15 | ||
timotimo | i would be happy to be proven wrong ;) | 20:18 | |
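The validate-first idea discussed here can be sketched in plain scalar C. This is neither the SIMD code from the linked blog post nor MoarVM's decoder. `utf8_is_valid` and `utf8_decode_trusted` are hypothetical names, and normalization into graphemes is left out entirely. The first pass checks well-formedness, and the second pass then decodes without any error branches.

```c
#include <stddef.h>
#include <stdint.h>

/* Pass 1: returns 1 if s[0..len) is well-formed UTF-8, 0 otherwise. */
int utf8_is_valid(const unsigned char *s, size_t len) {
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t extra;      /* continuation bytes expected after the lead byte */
        uint32_t cp, min;  /* decoded value, smallest value legal at this length */
        if (c < 0x80) { i++; continue; }
        if ((c & 0xE0) == 0xC0)      { extra = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; min = 0x10000; }
        else return 0;                   /* stray continuation or invalid lead */
        if (len - i <= extra) return 0;  /* sequence runs past the end */
        for (size_t k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return 0;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        /* reject overlong forms, surrogates, and values above U+10FFFF */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
        i += extra + 1;
    }
    return 1;
}

/* Pass 2: decodes input that pass 1 accepted, so it has no error branches.
 * out must have room for len entries. Returns the number of code points. */
size_t utf8_decode_trusted(const unsigned char *s, size_t len, uint32_t *out) {
    size_t i = 0, n = 0;
    while (i < len) {
        uint32_t c = s[i];
        if (c < 0x80) {
            out[n++] = c;
            i += 1;
        } else if (c < 0xE0) {
            out[n++] = ((c & 0x1F) << 6) | (s[i + 1] & 0x3Fu);
            i += 2;
        } else if (c < 0xF0) {
            out[n++] = ((c & 0x0F) << 12) | ((s[i + 1] & 0x3Fu) << 6)
                     | (s[i + 2] & 0x3Fu);
            i += 3;
        } else {
            out[n++] = ((c & 0x07) << 18) | ((s[i + 1] & 0x3Fu) << 12)
                     | ((s[i + 2] & 0x3Fu) << 6) | (s[i + 3] & 0x3Fu);
            i += 4;
        }
    }
    return n;
}
```

Whether the two passes beat a single loop with a well-predicted accept branch is exactly the open question in the discussion. The sketch only shows the shape of the split and would need measuring.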
MasterDuke | github.com/MoarVM/MoarVM/projects/...d-47838858 | 20:24 | |
i still have the gmp branch to finish up, the MVMSpeshCandidate as REPR branch to finish up, the coverage parser script to enhance, a wedding to plan, a wife to murder, a country to frame... | 20:26 | ||
20:29
zakharyas joined
|
|||
samcv | MasterDuke, i would think it'd be faster to do it all at once. well. at least i know when i vectorized the string copy operations, doing a run over the whole thing was faster. may be a tie if no vectorization happens | 20:29 | |
MasterDuke | ah, well volunteered! ;) | 20:30 | |
timotimo | i guess if the string fits in l2 cache, it'll be worth doing whatever does more stuff per instruction? | 20:42 | |
l3 and further outwards, the memory fetch overhead might very well eat up any improvements from vectorization and friends | |||
MasterDuke | how many strings are we validating that don't fit in l2? | 20:43 | |
20:54
zakharyas left
|
|||
MasterDuke | when compiling CORE.c setting, most common value of the bytes argument to MVM_string_utf8_decode is 12, then 13, then 14, then 2, then 16 | 21:06 | |
timotimo | do we want to hardcode some bitmasks for short strings ... | 21:50 | |
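One possible reading of "bitmasks for short strings", offered as an assumption rather than what was meant: test eight bytes at a time for a set high bit, so that inputs of the 12 to 16 byte sizes reported above can be recognized as pure ASCII in one or two word-sized checks. `is_all_ascii` is a hypothetical helper name, not a MoarVM function.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if every byte in s[0..len) is below 0x80, 0 otherwise. */
int is_all_ascii(const char *s, size_t len) {
    const uint64_t high_bits = UINT64_C(0x8080808080808080);
    size_t i = 0;
    /* Whole 8-byte words: memcpy avoids unaligned-access problems. */
    for (; i + 8 <= len; i += 8) {
        uint64_t word;
        memcpy(&word, s + i, 8);
        if (word & high_bits) return 0;
    }
    /* Remaining 0 to 7 bytes, one at a time. */
    for (; i < len; i++) {
        if ((unsigned char)s[i] & 0x80) return 0;
    }
    return 1;
}
```

An all-ASCII input is always valid UTF-8 with one code point per byte, so a check like this could gate a fast path before the general decoder runs.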
22:48
MasterDuke left
23:49
bingos_ joined
23:50
bingos left
|