| lizmat | timo: #masakism logs now also online | 00:20 | |
| sleep& | |||
| jubilatious1_98524 | Hi @shimmerfairy , I nudged an ICU conversation on the old perl6-users mailing list, and got 13 replies (latest 2024). Here take a look: www.nntp.perl.org/group/perl.perl6...g9241.html | 04:55 | |
| lizmat | m: .say with try 56792.chr # sorta expected to have that say Nil | 11:41 | |
| camelia | Error encoding UTF-8 string: could not encode Unicode Surrogate codepoint 56792 (0xDDD8) in block <unit> at <tmp> line 1 |
||
| lizmat | actually, I expected it to not say anything at all :-) | ||
| timo: for reading large (2GB+) files, would it make sense to presize the Buf (and then resize it to what's used) and then have nqp::readfh start adding bytes *after* what is already in there? | 12:57 | ||
| timo | it makes sense that the 56792.chr works but the say doesn't; after all, our strings are capable of more than just™ utf8, so it can represent a lone surrogate codepoint ... or maybe we don't want that to be possible at all is what you're suggesting? | 13:33 | |
| patrickb | I'm puzzled again. Looking at github.com/MoarVM/MoarVM/blob/main...ode.c#L764 The frame deserializer only reads the debug_locals data if the debug server is active, but it does not advance `pos` when the debug server isn't active. | 13:36 | |
| Won't this mess up all following deserialization? (Or is it fine simply because it's the last part of the frame data?) | 13:37 | ||
| timo | it's the last part; the next frame's position will be read from the table of all frames' positions again | 13:38 | |
| patrickb | Ah. Understood. Then it's fine. Thanks for confirming! | 13:39 | |
| timo | no problem! | ||
| actually, a few months ago I hacked together an imhex pattern definition file for moarvm files, which you may find interesting to look at. also there's lizmat's module that can read these files and give statistics and such | 13:40 | ||
| lizmat | MoarVM::Profile you mean ? | ||
| timo | no i think it's MoarVM::Bytecode? | 13:42 | |
| lizmat | ah, ok, yes | ||
| Q: given enough RAM, should we be able to slurp a 2G+ file ? | 13:48 | ||
| timo | slurp to string, right? | 13:53 | |
| lizmat | nope... binary | 13:54 | |
| sorry | |||
| timo | yeah that should surely work. do we still grow buffers only linearly after a specific size? | ||
| lizmat | well, I'm working on optimizing that | 13:55 | |
| timo | ISTR a pull request or maybe just a branch with experiments for that | ||
| lizmat | yeah, that got merged, but it also needs some rework | 13:56 | |
| timo | what would be really nice is if we could mmap a file into a Buf, but that's actually a different operation from slurping, since then changes to the file will cause changes in memory as well | ||
| lizmat | well, I'll leave that for a later exercise | ||
| timo | you can already get that with NativeCall and a CArray or CPointer | 13:58 | |
| lizmat | % rl '"m5.sql".IO.slurp' | ||
| initial read of 1048576: 1048576 | |||
| taking slow path for 2417660750 | |||
| sub slurp-PIO-uncertain(Mu \PIO, int $size) | |||
| trying to read 1879048192 bytes | |||
| appending 538612558 to 1879048192 | |||
| trying to read 1879048192 bytes | |||
| MoarVM panic: Memory allocation failed; could not allocate 18446744066204519736 bytes | |||
| timo | well, that's not great! :) | 13:59 | |
| lizmat | feels like that allocation number is... way too big ? | 14:00 | |
| timo | yes, negative number | ||
| I recall looking at some code paths where we made sure to use the biggest number type for file size information the OS / C library can give us | |||
| m: say 18446744066204519736.base(16) | |||
| camelia | FFFFFFFE40AA4D38 | ||
| lizmat | m: my int $ = 18446744066204519736 | 14:01 | |
| camelia | Cannot unbox 64 bit wide bigint into native integer. Did you mix int and Int or literals? in block <unit> at <tmp> line 1 |
||
| lizmat | anyways, will contemplate while doing some gardening& | 14:03 | |
| timo | maybe check with strace if you can spot where the number may come from | 14:05 | |
| patrickb | When a raku process (my debugger UI) stops and doesn't perform any cleanup on shutdown, a Proc::Async child (the moar under debug) might first receive a SIGHUP (causing moar to try to shut down) and then a SIGINT (causing it to die immediately)? I'm asking because when shutting the UI down I receive a single last byte, 83, from moar. I'd guess it's the start of a ThreadEnded notification. Does that sound plausible? | 15:39 | |
| Byte 83 is the introducer of a fixed-width map with 3 elements. | 15:40 | ||
| timo | I can only recommend rr to record the whole process tree so you can figure things like that out reliably :) | 15:58 | |
| sigint shouldn't kill moar immediately, but we also don't catch it by ourselves so we really don't do very much between receiving sigint and exiting | |||
| lizmat | an update on the MoarVM panic: Memory allocation failed; could not allocate 18446744066204519736 bytes error | 16:16 | |
| if I *don't* do the $blob.append($part) I don't get an error | 16:17 | ||
| so it feels like the nqp::splice of large buffers is doing something wonky | |||
| aha: looks like it fails when such a large blob is being returned from a sub | 16:46 | ||
| | [Coke]_ joined | 16:48 | |
| lizmat | frobnicating& | 16:48 | |
| | [Coke] left | 16:51 | |
| timo | if we are adding newly read blobs to an existing blob one by one, and we don't otherwise have GC pressure (did not verify), we might keep a lot of smaller blobs around before GC can toss them out? | 17:34 | |
| we don't really have a Blob.join or something right? but ... we probably could ... | |||
| actually, i don't think appending a native array with the splice op is bad? | |||
| rakudo -e '"bla/bla/bigfile.bin".IO.slurp(:bin).elems.say' → 2188252818 | 17:36 | ||
| 2.1G big | |||
| lizmat | hmmmm that's a good point | 18:02 | |
| right... I forgot the :bin in my slurp test | 18:03 | ||
| so it dies trying to decode the 2.4G blob | 18:04 | ||
| timo | > MoarVM panic: Memory allocation failed; could not allocate 18446744065282693704 bytes | 18:11 | |
| ah yes | |||
| lizmat | gist.github.com/lizmat/905751e5c76...c22a8c9d9c some decoding memory usage | ||
| timo | #3 MVM_string_utf8_decode (tc=0x5ec4c020080, result_type=<optimized out>, utf8=0x7ffe6a010000 "Rar!\032\a", | 18:13 | |
| lizmat | note that in the end, even the ticker gets interrupted (for about 24 ticks) | ||
| timo | bytes=2188252818) at src/strings/utf8.c:241 | ||
| 241 MVMGrapheme32 *buffer = MVM_malloc(sizeof(MVMGrapheme32) * bufsize); | |||
| "ticker gets interrupted" could be a surprisingly long GC pause? | 18:14 | ||
| MVMint32 bufsize = bytes ... yeah that can too easily overflow haha | |||
| lizmat | yeah, about 2.4 seconds worth ? | ||
| so appending bufs larger than a 32-bit int can hold is wonky | 18:15 | ||
| ? | 18:17 | ||
| timo | we should note that a MVMString has a num_graphs attribute that is a 32bit integer too | 18:18 | |
| lizmat | aaahhh I guess that's what the decoding of larger blobs dies on | 18:19 | |
| would it work if it would make separate strands ? | |||
| timo | there is however no upper limit to how many bytes you can decode to still fit into that because we might be creating some enormous composed graphemes | ||
| no the string that has the strands in it still has to have the total length in that attribute | |||
| lizmat | ok, so slurping my sample file will just not work | 18:20 | |
| so I guess we will need Blob.decode to check for max length and give a less LTA error message | |||
| timo | the first thing we do is generate a buffer that's big enough to have 4 bytes for every 1 byte of the input | 18:21 | |
| that's the line where the panic happens for us now | |||
| lizmat | yeah, but the buf is not 4611686016551129934 bytes | 18:22 | |
| timo | correct | ||
| lizmat | so the value it shows is bogus | ||
| timo | but we go through a 32bit integer on the way to the malloc call | ||
| that makes it go negative and then back up to unsigned 64bit which makes it huge | 18:24 | ||
| lizmat | ok, I guess that makes sense... but still LTA :-) | ||
| timo | yes, for sure | ||
| rakudo -e 'say "x" x (2**32 + 1)' | 18:25 | ||
| Repeat count (4294967297) cannot be greater than max allowed number of graphemes 4294967295 | |||
| | [Coke]_ is now known as [Coke] | 18:38 | |
| lizmat | m: say "x" x (2**32) | 18:48 | |
| camelia | Repeat count (4294967296) cannot be greater than max allowed number of graphemes 4294967295 in block <unit> at <tmp> line 1 |
||
| lizmat | that's uint32 | 18:49 | |
| some handling appears to conk out at half of that because of signed int32 ? | |||
| timo | could be | ||
| lizmat | hehe, looks like .slurp(:bin) on a 2G file currently doesn't even work | 18:50 | |
| timo | i think it might just be really slow because the buffer grows linearly | ||
| lizmat | Reading from filehandle failed: Invalid argument | 18:51 | |
| timo | you have local changes? | ||
| lizmat | this is on 2026.01 | ||
| timo | could be a bug on macos only? | 18:52 | |
| lizmat | it's because nqp::readfh gets too large a value, exceeding the max positive value of an int32 | 18:53 | |
| timo | as i showed above I can slurp(:bin) a file that's 2188252818 bytes big just fine | ||
| you do "path".IO.slurp(:bin)? | 18:54 | ||
| lizmat | ah, maybe af30c7bed30b725a124876addeb1303da97ce7cf is to blame | ||
| timo | right, i'm on 2025.12-8-ga42e10a59 still i think | ||
| oh, i see | 18:55 | ||
| i see 64bits all over the place in the implementation of read_fhb that implements nqp::readfh | 18:58 | ||
| lizmat | 0.47 to slurp a 2.4G file | ||
| try an nqp::readfh with 0x080000000 as size | 18:59 | ||
| timo | the read function we use for the actual reading on the other hand returns signed int64 for the number of bytes read, but takes an unsigned int for the number of bytes to read | ||
| According to POSIX.1, if count is greater than SSIZE_MAX, the result is implementation-defined; see NOTES | 19:00 | ||
| for the upper limit on Linux. | |||
| lizmat | % r 'use nqp; nqp::readfh(nqp::open("m5.sql","r"),Buf.new,0x080000000)' | ||
| Reading from filehandle failed: Invalid argument | |||
| m: say my int32 $ = 0x080000000 | 19:01 | ||
| camelia | -2147483648 | ||
| lizmat | m: say my int32 $ = 0x070000000 | ||
| camelia | 1879048192 | ||
| timo | that file has to actually be big enough? | ||
| lizmat | nope | 19:02 | |
| timo | no error on my machine | ||
| lizmat | hmmm... ok lemme try on a linux box | 19:03 | |
| timo | [pid 138197] <... read resumed>, "QFI\373\0\0\0\3\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\20\0\0\0\24\200\0\0\0"..., 2147483648) = 2147479552 | ||
| m: say 2147483648.base(16) | |||
| camelia | 80000000 | ||
| timo | On Linux, read() (and similar system calls) will transfer at most 0x7ffff000 (2,147,479,552) bytes, returning the number of bytes actually transferred. (This is true on both 32-bit and 64-bit systems.) | 19:04 | |
| can you check the man page on your system `man 2 read` what it says? | |||
| lizmat | The read() and pread() call may also return the following error: | 19:07 | |
| [EINVAL] The value provided for nbyte exceeds INT_MAX. | |||
| confirmed it's not an issue on Linux | |||
| timo | ok, so on mac the maximum is INT_MAX, which i think is 32bit integers? and on linux the maximum is SSIZE_MAX which is 64bit on 64bit systems? | ||
| lizmat | I guess | ||
| no idea what INT_MAX is on MacOS | 19:08 | ||
| timo | i wonder if limits.h has something | ||
| in lldb you should be able to print(INT_MAX) | |||
| lizmat | google says: | ||
| INT_MAX is a macro that represents the maximum value of the upper limit of the integer data type in C/C++. The value of INT_MAX is: INT_MAX = 2147483647 (for 32-bit Integers) | |||
| timo | though of course a sufficiently wild platform could have different definitions :P | 19:09 | |
| (we do not target platforms like that) | |||
| lizmat | yeah... so the underlying issue is read on MacOS being 32bit bound | ||
| timo | if there's no standardised #define available we can either probe for that in the Configure.pl or we just always cap to INT_MAX for all systems, or we try once with the value passed and if we get EINVAL we reduce the size we attempt to read | 19:12 | |
| lizmat | I think capping for INT_MAX for now is ok :-) | 19:13 | |
| timo | we don't do anything inside the read_bytes function of syncfile to handle having read less than was requested, do all users of nqp::readfh deal with smaller actual results? | 19:14 | |
| i guess they already have to | |||
| lizmat | I guess I'll check them after I commit this | 19:15 | |
| anyways, one probably shouldn't be slurping files of that size anyway | 19:16 | ||
| timo | it's quite possibly not what you actually want to do, yeah | 19:17 | |
| | patrickb left | 20:27 | |
| | patrickb joined | 20:41 | |