#moarvm on 21 October 2020 - Raku Programming Language Log

github.com/moarvm/moarvm | IRC logs at colabti.org/irclogger/irclogger_logs/moarvm Set by AlexDaniel on 12 June 2018.
01:54 vrurg_ is now known as vrurg 01:57 Kaiepi left 02:39 Kaiepi joined, Kaiepi left, Kaiepi joined 04:23 MasterDuke left 07:20 sena_kun joined 07:31 sena_kun left
nwc10	good *, #moarvm	07:40
07:54 sena_kun joined 08:12 zakharyas joined 08:20 zakharyas1 joined 08:23 zakharyas left 10:25 sena_kun left 10:30 sena_kun joined
Geth	MoarVM/include-not-cat: 349d462b8c | (Nicholas Clark)++ | 3 files Create a unicode.c with #include directives instead of generating it with cat. This way we avoid generating a large temporary file, avoid needing to delete it at cleanup, avoid needing an entry in .gitignore, and get decent line numbers in warnings and errors from the C compiler. Where we replace "generate" src/strings/unicode.c with the C pre-processor instead of cat	11:20
nwc10	oops, that needed a bit more amending	11:21
Geth	MoarVM/include-not-cat: 6f50a1b944 | (Nicholas Clark)++ | 3 files Create a unicode.c with #include directives instead of generating it with cat. This way we avoid generating a large temporary file, avoid needing to delete it at cleanup, avoid needing an entry in .gitignore, and get decent line numbers in warnings and errors from the C compiler.
nwc10	Geth++ # excellent proofreader
Geth	MoarVM: nwc10++ created pull request #1362: Create a unicode.c with #include directives instead of generating it …	11:22
nwc10	I await the Win32 builds.
nine	Oh I love this PR!	11:27
tellable6	2020-10-20T21:18:56Z #raku-dev <tbrowder> nine i changed msg too
	2020-10-20T21:23:17Z #raku-dev <tbrowder> nine sorry, i'm mistaken again. i should have cancelled the PR and resubmitted with a better log as i've done in the past
11:50 zakharyas1 left 12:19 domidumont joined
Geth	MoarVM: 4f5787d3ca | (Nicholas Clark)++ (committed by nwc10) | 3 files Create a unicode.c with #include directives instead of generating it with cat. This way we avoid generating a large temporary file, avoid needing to delete it at cleanup, avoid needing an entry in .gitignore, and get decent line numbers in warnings and errors from the C compiler.	12:45
nwc10	hmm, I see that it might have been better to just use regular git on the command line to do that.	12:46
12:51 zakharyas joined 13:05 zakharyas1 joined 13:07 zakharyas left 13:36 zakharyas joined 13:38 zakharyas1 left 14:53 domidumont left 15:25 zakharyas left 15:28 zakharyas joined 15:58 zakharyas left 16:25 mtj_ joined 17:13 domidumont joined 17:14 MasterDuke joined 17:16 raku-bridge joined, raku-bridge left, raku-bridge joined 17:21 zakharyas joined 17:26 Kaeipi joined, Kaiepi left 17:29 Kaeipi left, Kaeipi joined 17:31 domidumont left 18:02 zakharyas left
MasterDuke	i think i asked this the last time he wrote about this, but would lemire.me/blog/2020/10/20/ridiculo...alidation/ be useful for us? i assume we'd have to translate to plain C, not C++?	18:45
	samcv, timotimo: ^^^	18:46
18:46 sena_kun left
timotimo	unfortunately, we need a little more than just verification, though i suppose having a first step for verification followed by what we have to do without needing to worry about the data could be good?	18:46
	like, we need to translate \r\n into one codepoint, and do all the other NFG fun	18:47
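Editor's sketch of the `\r\n` point above: a minimal pass that folds each CR LF pair in a codepoint array into a single slot. The function name and the `SKETCH_CRLF_GRAPHEME` value are hypothetical placeholders; how MoarVM actually represents that grapheme is not shown in this log, and real NFG handling covers far more than this one case.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical placeholder for "the single grapheme that stands for CR LF".
 * This is an assumption for illustration, not MoarVM's real value. */
#define SKETCH_CRLF_GRAPHEME (-1)

/* Copy n codepoints from in to out, replacing every CR LF pair with one
 * placeholder entry. out must have room for n entries.
 * Returns the number of entries written. */
size_t sketch_fold_crlf(const int32_t *in, size_t n, int32_t *out) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i] == '\r' && i + 1 < n && in[i + 1] == '\n') {
            out[count++] = SKETCH_CRLF_GRAPHEME;
            i++; /* skip the LF that was folded into the pair */
        } else {
            out[count++] = in[i]; /* a lone CR or anything else passes through */
        }
    }
    return count;
}
```

This is why validation alone is not enough for the decoder: the output length can differ from the number of codepoints in the input.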
18:57 MasterDuke left
samcv	timotimo, yeah, that is what I got when I looked at our code and compared it. That it wouldn't fit in well with how we do things	19:42
timotimo	one day™ we'll have a type of string that is in source form	19:44
samcv	timotimo, like utf8?	19:52
19:57 MasterDuke joined
timotimo	for example, yeah	20:05
	the user would pay in terms of algorithmic complexity, but it's a trade-off anyway and maybe the user wants to make their own trade-off rather than having it made for them
MasterDuke	samcv, timotimo: so if i'm looking at the right code, in MVM_string_utf8_decode we get a `char *`, then in a loop `decode_utf8_byte` (which i can't find the definition of) chunks of it, and if each chunk is ok we normalize it into a grapheme, stick it in an array of graphemes, and at the end turn that into an MVMString if there were no errors?	20:06
timotimo	i didn't look at the code, but that looks about right	20:07
MasterDuke	but that you don't think it would be faster to do a fast validation of the `char*` first, and then go into a tighter loop if it validates and do the error case if not?	20:08
timotimo	maybe. we'll still have to go through and do the variable-length-encoding thing	20:11
	and normalization
MasterDuke	right, we just wouldn't have the switch on the result of decode_utf8_byte	20:12
	we do MVM_EXPECT the UTF8_ACCEPT case, so maybe it wouldn't be much faster	20:13
	i don't think i'm up for converting his code to something we could use just to try it out, so i'll happily accept your judgement	20:15
timotimo	i would be happy to be proven wrong ;)	20:18
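Editor's sketch of the "validate first, then decode in a tighter loop" idea discussed above. It is a plain scalar stand-in, not the SIMD validator from the linked blog post and not MoarVM's `MVM_string_utf8_decode`; both function names are hypothetical, and normalization into graphemes is left out entirely.

```c
#include <stddef.h>
#include <stdint.h>

/* Pass 1: return 1 if s[0..n) is well-formed UTF-8, else 0.
 * Rejects bad lead bytes, truncated sequences, overlong forms,
 * surrogates and values above U+10FFFF. */
int sketch_utf8_is_valid(const unsigned char *s, size_t n) {
    size_t i = 0;
    while (i < n) {
        unsigned char b = s[i];
        size_t len;
        uint32_t cp, min;
        if (b < 0x80) { i++; continue; }
        else if ((b & 0xE0) == 0xC0) { len = 2; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; min = 0x10000; }
        else return 0;
        if (n - i < len) return 0;
        for (size_t k = 1; k < len; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return 0;
            cp = (cp << 6) | (uint32_t)(s[i + k] & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;
        i += len;
    }
    return 1;
}

/* Pass 2: decode input that pass 1 already accepted. No error checks,
 * so the loop only branches on sequence length. out needs room for n
 * entries. Returns the number of codepoints written. */
size_t sketch_utf8_decode_trusted(const unsigned char *s, size_t n, uint32_t *out) {
    size_t i = 0, count = 0;
    while (i < n) {
        uint32_t b = s[i];
        if (b < 0x80) {
            out[count++] = b;
            i += 1;
        } else if (b < 0xE0) {
            out[count++] = ((b & 0x1F) << 6) | (uint32_t)(s[i + 1] & 0x3F);
            i += 2;
        } else if (b < 0xF0) {
            out[count++] = ((b & 0x0F) << 12)
                         | ((uint32_t)(s[i + 1] & 0x3F) << 6)
                         | (uint32_t)(s[i + 2] & 0x3F);
            i += 3;
        } else {
            out[count++] = ((b & 0x07) << 18)
                         | ((uint32_t)(s[i + 1] & 0x3F) << 12)
                         | ((uint32_t)(s[i + 2] & 0x3F) << 6)
                         | (uint32_t)(s[i + 3] & 0x3F);
            i += 4;
        }
    }
    return count;
}
```

As timotimo notes, the second pass still has to handle the variable-length encoding, so whether the split wins over a single checked loop would need measuring.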
MasterDuke	github.com/MoarVM/MoarVM/projects/...d-47838858	20:24
	i still have the gmp branch to finish up, the MVMSpeshCandidate as REPR branch to finish up, the coverage parser script to enhance, a wedding to plan, a wife to murder, a country to frame...	20:26
20:29 zakharyas joined
samcv	MasterDuke, i would think it'd be faster to do it all at once. well. at least i know when i vectorized the string copy operations, doing a run over the whole thing was faster. may be a tie if no vectorization happens	20:29
MasterDuke	ah, well volunteered! ;)	20:30
timotimo	i guess if the string fits in l2 cache, it'll be worth doing whatever does more stuff per instruction?	20:42
	l3 and further outwards, the memory fetch overhead might very well eat up any improvements from vectorization and friends
MasterDuke	how many strings are we validating that don't fit in l2?	20:43
20:54 zakharyas left
MasterDuke	when compiling CORE.c setting, most common value of the bytes argument to MVM_string_utf8_decode is 12, then 13, then 14, then 2, then 16	21:06
timotimo	do we want to hardcode some bitmasks for short strings ...	21:50
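Editor's sketch of one possible reading of the bitmask idea above: test eight bytes at a time against a mask of high bits to see whether a short input is pure ASCII, in which case it is already valid UTF-8. The function name is hypothetical and this is only an assumption about what such a fast path could look like, not something the log specifies.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return 1 if none of the n bytes at s has its high bit set. */
int sketch_all_ascii(const unsigned char *s, size_t n) {
    const uint64_t high_bits = UINT64_C(0x8080808080808080);
    while (n >= 8) {
        uint64_t w;
        memcpy(&w, s, 8); /* avoids unaligned-access problems */
        if (w & high_bits) return 0;
        s += 8;
        n -= 8;
    }
    if (n > 0) {
        uint64_t tail = 0; /* unused bytes stay zero, which counts as ASCII */
        memcpy(&tail, s, n);
        if (tail & high_bits) return 0;
    }
    return 1;
}
```

For the 12 to 16 byte inputs MasterDuke reports as most common, this amounts to two word-sized checks. The mask is the same in every byte position, so the result does not depend on endianness.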
22:48 MasterDuke left 23:49 bingos_ joined 23:50 bingos left
