#moarvm on 10 September 2017 - Raku Programming Language Log

01:16 geekosaur joined 01:55 ilbot3 joined
Geth	MoarVM: samcv++ created pull request #685: Merge collation-arrays branch	05:02	Copy link Message link Add to gist Remove
	MoarVM: 088aa0a022 \| Skarsnik++ (committed using GitHub Web editor) \| src/6model/reprs/CArray.c Fix a leak in CArray repr. When expand is called (via at-pos) it make room for more perl6 object in the body, but this does not get free if the array is not managed by MoarVM.	07:30	Copy link Message link Add to gist Remove
	MoarVM: 1059eed1cc \| niner++ (committed using GitHub Web editor) \| src/6model/reprs/CArray.c Merge pull request #684 from Skarsnik/patch-1 Fix a leak in CArray repr.		Copy link Message link Add to gist Remove
07:58 domidumont joined 08:03 domidumont joined
nine	ugexe: it's taken me years to get this into my head: MVM_ROOT is less about the value than it is about the variable. It's for having the GC update your local pointer to the object after it moved the object to the second generation as the object's address will change.	08:32	Copy link Message link Add to gist Remove
yoleaux	8 Sep 2017 15:53Z <tbrowder> nine: I just filed issue #101 with Inline::Perl5; failure using Expect::Simple		Copy link Message link Add to gist Remove
	8 Sep 2017 19:19Z <tbrowder> nine: my version was too old, closing issue 101		Copy link Message link Add to gist Remove
08:52 leont joined
Geth	MoarVM: 866623d933 \| (Samantha McVey)++ (committed using GitHub Web editor) \| 8 files Merge Full Unicode Collation Algorithm Implementation This is a full implementation of the Unicode Collation Algorithm. We iterate by codepoint and put this into a ring buffer. The ring buffers hold the exact number of codepoints which comprise the longest sequence of codepoints which map to its own collation keys in the Unicode Collation Algorithm. As of Unicode 9.0 this number was 3. When Generate-Collation-Data.p6 is run, this number will ... (42 more lines)	10:18	Copy link Message link Add to gist Remove
	MoarVM: 0b81969db2 \| (Samantha McVey)++ \| 2 files Remove unneeded file from UCA implementation This file isn't needed for the generation script.	10:21	Copy link Message link Add to gist Remove
timotimo	nine: not only will the address change when the object is moved to the second gen, but we also move it from one half of the nursery to the other half at least once	10:45	Copy link Message link Add to gist Remove
nine	Oh, I didn't know that!	10:57	Copy link Message link Add to gist Remove
timotimo	yeah, we have "semispace copying nurseries" or what it's called	10:58	Copy link Message link Add to gist Remove
jnthn	Yeah, the nursery is a semispace; we allocate in half of it, then evacuate the objects still alive, but never spotted by GC before, into the other half, and move the ones we did see once before into gen2	11:00	Copy link Message link Add to gist Remove
	So each thread switches between the spaces per GC		Copy link Message link Add to gist Remove
	This is a really nice scheme in that 1) you get cheap bump-the-pointer allocation, 2) for objects that die right away you do close to zero work	11:02	Copy link Message link Add to gist Remove
	And 1 implies good cache locality also		Copy link Message link Add to gist Remove
	Though you can't do it for the entire heap, otherwise your memory overhead becomes immediately at least 2 * what the program actually needs	11:03	Copy link Message link Add to gist Remove
11:09 Skarsnik joined 11:44 vendethiel- joined
MasterDuke	samcv: your latest commit or two broke the moarvm build for me	11:46	Copy link Message link Add to gist Remove
	dir	11:48	Copy link Message link Add to gist Remove
	wrong window		Copy link Message link Add to gist Remove
	src/strings/unicode.c:78279:43: error: unknown type name ‘sub_node’	11:49	Copy link Message link Add to gist Remove
	src/strings/unicode.c:78324:24: error: ‘codepoint_sequence_no_max’ undeclared here (not in a function); did you mean ‘codepoints_by_name’?		Copy link Message link Add to gist Remove
	anyone else getting those?	11:59	Copy link Message link Add to gist Remove
12:13 brrt joined
timotimo	let me look	12:14	Copy link Message link Add to gist Remove
MasterDuke	i did make realclean, but still seeing it	12:15	Copy link Message link Add to gist Remove
timotimo	yeah i get these errors too		Copy link Message link Add to gist Remove
	this starter_main_elems is a #define in unicode_uca.c	12:16	Copy link Message link Add to gist Remove
MasterDuke	huh, it builds fine on my laptop		Copy link Message link Add to gist Remove
timotimo	easy fix	12:17	Copy link Message link Add to gist Remove
	rm unicode.c; make		Copy link Message link Add to gist Remove
Skarsnik	what MVMROOT does?	12:39	Copy link Message link Add to gist Remove
timotimo	what about it?	12:40	Copy link Message link Add to gist Remove
Skarsnik	Dunno I am reading CArray.c at_pos function, since I get 2 crash pointing at this	12:41	Copy link Message link Add to gist Remove
	well pointing at the NC.pm operator [] for array	12:42	Copy link Message link Add to gist Remove
timotimo	oh you mean "what does it do"		Copy link Message link Add to gist Remove
Skarsnik	Yeah sorry x)		Copy link Message link Add to gist Remove
timotimo	it tells the GC "this local variable points at a GC-managed object. i want this object to stay alive and i want you to update the pointer when the object gets moved"		Copy link Message link Add to gist Remove
Skarsnik	hm I am not sure understand. This mark the whole child_objs array and not the new object created in it ? github.com/MoarVM/MoarVM/blob/mast...ray.c#L296	12:45	Copy link Message link Add to gist Remove
timotimo	the storage array is likely handled by gc_mark	12:47	Copy link Message link Add to gist Remove
	the object we're calling at_pos on is "root", that gets pointed out to the GC and the GC will then do the rest		Copy link Message link Add to gist Remove
	but it doesn't look like the storage array is gc-relevant	12:48	Copy link Message link Add to gist Remove
Skarsnik	storage is handled by C in this part. What I get is the chlild_objs array is what Moar add to track the perl6 objects it return	12:51	Copy link Message link Add to gist Remove
timotimo	mhm		Copy link Message link Add to gist Remove
	oh	12:55	Copy link Message link Add to gist Remove
	the reason why the crash points at the MVMROOT		Copy link Message link Add to gist Remove
	is because gdb is stupid		Copy link Message link Add to gist Remove
	it treats the whole macro as a single line		Copy link Message link Add to gist Remove
	so where the crash happens exactly is hidden	12:56	Copy link Message link Add to gist Remove
Skarsnik	I was not clear, it does not crash on MVMROOT but the backtrace pointed me at this function x)	12:57	Copy link Message link Add to gist Remove
	So I try to understand it x)à		Copy link Message link Add to gist Remove
timotimo	oh ok	13:00	Copy link Message link Add to gist Remove
13:02 dogbert2 joined
dogbert2	c: MVM_SPESH_NODELAY=1 HEAD use Test; sub try_eval($str) { try EVAL $str }; is(try_eval('my @x = <a b c>; sub y (@z) { @z[0] }; y(@x)'), "a b c", "NO-BREAK SPACE") for ^10;	14:08	Copy link Message link Add to gist Remove
committable6	dogbert2, ¦HEAD(9b42484): «ok 1 - NO-BREAK SPACE␤ok 2 - NO-BREAK SPACE␤ok 3 - NO-BREAK SPACE␤ok 4 - NO-BREAK SPACE␤ok 5 - NO-BREAK SPACE␤ok 6 - NO-BREAK SPACE␤ok 7 - NO-BREAK SPACE␤ok 8 - NO-BREAK SPACE␤ok 9 - NO-BREAK SPACE␤ok 10 - NO-BREAK SPACE»		Copy link Message link Add to gist Remove
dogbert2	hmm		Copy link Message link Add to gist Remove
	c: MVM_SPESH_NODELAY=1 HEAD use Test; sub try_eval($str) { try EVAL $str }; is(try_eval('my @x = <a b c>; sub y (@z) { @z[0] }; y(@x)'), "a b c", "NO-BREAK SPACE") for ^10;	14:09	Copy link Message link Add to gist Remove
committable6	dogbert2, ¦HEAD(9b42484): «ok 1 - NO-BREAK SPACE␤ok 2 - NO-BREAK SPACE␤ok 3 - NO-BREAK SPACE␤ok 4 - NO-BREAK SPACE␤ok 5 - NO-BREAK SPACE␤ok 6 - NO-BREAK SPACE␤ok 7 - NO-BREAK SPACE␤ok 8 - NO-BREAK SPACE␤ok 9 - NO-BREAK SPACE␤ok 10 - NO-BREAK SPACE»		Copy link Message link Add to gist Remove
dogbert2	this code is part of t/spec/S02-lexical-conventions/unicode-whitespace.t and on my system it behaves very badly when MVM_SPESH_NODELAY=1	14:10	Copy link Message link Add to gist Remove
	lots of test failures, although the first few works. Interestingly, if I add MVM_SPESH_BLOCKING=1 the problem vanishes	14:12	Copy link Message link Add to gist Remove
	for me the output from running the above mentioned file looks like this: gist.github.com/dogbert17/cb708cb3...0ff0a668e5	14:21	Copy link Message link Add to gist Remove
14:21 robertle joined 15:06 zakharyas joined 15:56 zakharyas joined
dogbert2	should it be possible to decode any Buf, containing bytes, to a Str with utf8-c8 ?	16:28	Copy link Message link Add to gist Remove
timotimo	should be, yeah, but currently we'll bail if there's "almost valid" utf8	16:31	Copy link Message link Add to gist Remove
dogbert2	ah, I was looking at one of [Tux]'s test files and it generates random buffers and try to decode them to Str with utf8-c8	16:33	Copy link Message link Add to gist Remove
	so if the buffer contains "almost valid" utf-8 then the utf8-c8 decoder will fail?	16:34	Copy link Message link Add to gist Remove
timotimo	right	16:36	Copy link Message link Add to gist Remove
	for you see		Copy link Message link Add to gist Remove
	there's encoding-wise valid utf8 that's still invalid because the codepoints encoded aren't proper		Copy link Message link Add to gist Remove
	we'd still want to encode these with synthetics, though		Copy link Message link Add to gist Remove
	nobody implemented that yet, though		Copy link Message link Add to gist Remove
dogbert2	aha, that explains it, many thanks	16:37	Copy link Message link Add to gist Remove
16:39 domidumont joined
timotimo	sure	16:39	Copy link Message link Add to gist Remove
	m: say chr(0x10ffffff + 1)	16:43	Copy link Message link Add to gist Remove Run code
camelia	Error encoding UTF-8 string: could not encode codepoint 285212672 (0x11000000), codepoint out of bounds. Cannot encode higher than 1114111 (0x10FFFF) in block <unit> at <tmp> line 1		Copy link Message link Add to gist Remove
timotimo	m: say chr(0x10ffff + 1)		Copy link Message link Add to gist Remove Run code
camelia	Error encoding UTF-8 string: could not encode codepoint 1114112 (0x110000), codepoint out of bounds. Cannot encode higher than 1114111 (0x10FFFF) in block <unit> at <tmp> line 1		Copy link Message link Add to gist Remove
timotimo	if i'm not mistaken, utf8 can no-problem represent codepoints higher than this, but they are "forbidden"	16:47	Copy link Message link Add to gist Remove
dogbert2	that's mean :)		Copy link Message link Add to gist Remove
timotimo	so if you encounter a properly encoded value above 0x10ffff we won't create a synthetic because the encoding is correct, but it's still explosive in a "later stage"		Copy link Message link Add to gist Remove
	with the cold i'm having i don't have the necessary brain grease to step in and fix it		Copy link Message link Add to gist Remove
dogbert2	do you use any house remedies to get well, e.g. c-vitamins and such	16:48	Copy link Message link Add to gist Remove
timotimo	i got aspirin complex which is painkiller + cough suppressant + something else	16:49	Copy link Message link Add to gist Remove
	something that prevents the nose from running as much	16:50	Copy link Message link Add to gist Remove
dogbert2	cool, I believe that these combo meds are forbidden where I live, don't know why though		Copy link Message link Add to gist Remove
timotimo	huh? weird.	16:51	Copy link Message link Add to gist Remove
dogbert2	indeed, so you have to get several meds instead of one		Copy link Message link Add to gist Remove
timotimo	i wonder if the combo meds are noticably more expensive than getting the individual parts and mixing them by yourself	16:52	Copy link Message link Add to gist Remove
dogbert2	good question	16:53	Copy link Message link Add to gist Remove
leont	utf-8 isn't really well-defined beyond that point	17:04	Copy link Message link Add to gist Remove
timotimo	oh	17:15	Copy link Message link Add to gist Remove
17:18 pmurias joined
pmurias	the MoarVM wants to have all not static function MVM_ prefixed?	17:19	Copy link Message link Add to gist Remove
leont	Start bytes F0-4 are defined as encoding for 4 byte codepoints, and logically F5-F8 would do the same (IMHO), but what should F9 do? 5 bytes?	17:27	Copy link Message link Add to gist Remove
	Erm, that's a one off, F5-F7, and F8		Copy link Message link Add to gist Remove
	(in the C0-F4 range, the number of initial 1s is the number of bytes for the character)	17:29	Copy link Message link Add to gist Remove
	The reason for the 0x10ffff limit is that UTF-16 can't express any higher codepoints, because it's a idiotic encoding	17:32	Copy link Message link Add to gist Remove
20:19 robertle joined 20:27 robertle_ joined 20:31 zakharyas joined 21:58 dogbert2 joined 23:02 bisectable6 joined, benchable6 joined 23:17 MasterDuke joined

Please report any issues / comments / feature requests as an issue on App::Raku::Log.

Thank you!