Welcome to the main channel on the development of MoarVM, a virtual machine for NQP and Rakudo (moarvm.org). This channel is being logged for historical purposes.
Set by lizmat on 24 May 2021.
04:09 lizmat_ joined 04:12 kjp left 04:13 lizmat left, kjp joined 04:14 kjp left, kjp joined 05:25 Techcable left, harrow left, Nicholas left, gfldex left, ilogger2_ left, tbrowder left, committable6 left, quotable6 left 05:27 committable6 joined, quotable6 joined, harrow joined, Nicholas joined, gfldex joined, ilogger2_ joined, tbrowder joined 05:28 Techcable joined 05:32 Techcable left 05:33 Techcable joined 07:39 lizmat_ left, lizmat joined
patrickb I have a malloced C string that I'd like to throw in an adhoc exception. But I see no way to do so without leaking the memory in that string. Is there a way? 08:55
lizmat timo or nine might know a way 08:58
timo yeah there is
patrickb Oh I think I see. There is a waste arg. 08:59
timo MVM_exception_throw_adhoc_free i think is what the function is called, you pass it an array of pointers to c strings that ends with a NULL to signal the end, and it takes care of freeing the values for you after making the string from the format
patrickb Whoop!
timo btw i've created a new command for the debug server
patrickb Do tell? 09:01
!
timo it 1) lists all filenames it knows about (for breakpoints) and 2) sends you notifications whenever new files are introduced to the files table, and it allows you to optionally ask the thread to stop after encountering any new file 09:02
plus, for convenience, send a stack trace immediately if you want
patrickb Oh! That's kind of related to what I want to do, i.e. make moar send the source of such files. 09:04
timo right
lizmat thread name PR merged 09:05
timo for the next step i want to also make a big ol' list of line numbers that have a breakpoint instruction corresponding to it somewhere
make that available via the protocol
lizmat ok, now that we're all here: I'd like to make a module today that would be callable with -Mfoo from the command line 09:06
and which would force a full stack trace when Ctrl-C is pressed 09:07
timo since the breakpoints are based on annotations being read and turned into breakpoint instructions when the instrumentation barrier is first hit by a frame, that makes it kind of "lazy" when lines from the file show up
lizmat basically, it would invoke itself with the debug-server enabled, and add a signal handler
patrickb timo: What's the use of such a list? Is it just so that the client doesn't have to keep track of active breakpoints? 09:08
timo plus i would really like there to be more specificity to breakpoint locations, since one line can have instructions from multiple frames in it (think of a line with multiple map instructions with blocks and what-not)
it's needed so that the user can get feedback on whether a line they put a breakpoint on will even ever trigger or not 09:09
i may also provide a way to force all frames from a given compunit, or all frames everywhere, to be instrumentation-barriered immediately, or perhaps a way to filter all frames by what file names are in their annotations 09:10
i'm a bit annoyed by the commands in the moar remote app. they are kind of wordy. it'd really be great to have something more TUI like that also has some keybinds maybe. though it's not required to make it TUI-like to support some keys on the input side, of course 09:14
patrickb: do you already have something in mind for how to register what source belongs to a given compunit? i'm thinking the only really reliable way to work with this is to have rakudo and nqp call a moar syscall (for example) that takes the compunit and either the source text as a string, or maybe a path to a local file, plus some extra information. for example, we do a lot of small compilations 09:19
for BEGIN time things (rakuast improves upon this by not always having to compile to run code) and there it'd be interesting to have a start and end position of the relevant code as well
and it'd be good to be able to grab the bytecode of a frame over the connection as well, but for that we would have to come up with a format :D 09:25
09:49 sena_kun joined
patrickb I have still very little idea of all the moving parts in this. But my naive idea was to have rakudo annotate the source to the comp units. (And provide an opt out.) 10:12
So we'd have the source embedded in the bytecode. I'd ideally zstd compress it (and put zstd into moar by default). 10:13
patrickb is back to $work 10:15
timo: I'm pretty sure you have a lot more context in that area. Before I start working on this we should definitely first consult and make a plan together. 10:36
13:27 sena_kun left 13:33 sena_kun joined 13:34 sena_kun left, sena_kun joined
timo you know, we don't have to change anything about the moarvm file format, we can just have a buffer with the zstd compressed source serialized into the serialized blob, and some way to find which object is the one we want when looking for the sources 14:17
lizmat could even be a HLL level thing in RakuAST 14:41
for each frame, a substr into the total source file ?
timo it'd be nice if the source string were not deserialized and decompressed until it is actually accessed, that should definitely be possible in terms of SMOP in the rakuast nodes 14:42
otherwise we're going to get a bad memory usage increase for the core setting for example 14:43
lizmat right
timo but also i'm not entirely sure if putting source text, even compressed, into the precomp files is a great idea 14:44
we already install sources alongside distributions, right? with a hash of the contents?
lizmat I mean, the core C setting source gzipped is 500K or so 14:45
the core c setting bytecode file is about 29M 14:46
feels like an additional 500K would only make it about 2% larger ?
timo even with zstd -19 it's 431 KiB big 14:47
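The "about 2%" estimate can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the "K" and "M" figures quoted above are binary units (KiB and MiB); the function name is made up for illustration.

```python
KIB = 1024
MIB = 1024 * KIB

def relative_increase(extra_bytes, base_bytes):
    """Return how much larger the file gets, as a percentage of its current size."""
    return 100.0 * extra_bytes / base_bytes

# Figures quoted in the chat: ~500K gzipped, 431 KiB with zstd -19, ~29M bytecode.
gzip_pct = relative_increase(500 * KIB, 29 * MIB)
zstd_pct = relative_increase(431 * KIB, 29 * MIB)
print(f"gzip: {gzip_pct:.2f}%  zstd -19: {zstd_pct:.2f}%")
```

Both figures come out a little under 2%, which matches lizmat's rough estimate.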
lizmat wonder whether the .Uni of the source would even compress better 14:50
timo i can give that a try
aww, you can only spurt a buf8 or buf16 14:54
we don't have anything fast built-in to turn a buf32 into a buf16 or buf8 with big or little endian byte ordering? 14:55
patrickb The reason why I'd like to have it in the byte code is availability and 14:57
extensibility. 14:58
timo and the issue of source on disk different from source in storage probably also?
patrickb If you put it into the bytecode, then moar will by design not run into issues like not finding the sources.
timo well, maybe not for installed dists, but for .precomp files when you use -I .
patrickb Also it's possible to provide sources for things that don't exist on the disk. E.g. EVALed stuff. 14:59
timo indeed, but for EVALed stuff we don't have to store it in the compunit since the debugserver is also in the process
no need to compress then either, and can leave it out if the debugserver is not turned on 15:00
patrickb That would shift a lot of the source handling logic into moar, right?
timo can you elaborate a bit? i'm getting a headache (unrelated to this discussion) and thinking is a bit hard right now 15:01
patrickb If it'd be an annotation on the CompUnit, then moar would not have to deal with how the source is generated at all.
The Raku compiler would add the annotation during compilation. Moar would simply reach for that annotation and return it if found. 15:02
All the fancy stuff of how to find the source (on the disk, some EVAL string, some deparsed RakuAST) would then live in the compiler, not in moar. 15:03
timo ok, so the behaviour of read-uint8 on a buf32 is that the offset you pass in selects the array slot, then you get the lowest 8 bits of the value in it as your result 15:04
lizmat there's also an Endian argument 15:05
hmmm 15:06
timo right, i imagine that will let you get either the highest or the lowest byte, but not the two in the middle :D
lizmat actually, for read-uint8 the endian setting has no meaning
for buf32 you'd need read-uint32 ?
timo yeah but i want to get the individual bytes so i can write them out :D 15:07
well, we can write_fhb a int16 buf so i can get the lower and upper two-bytes with read-uint16 and stash them in my result buf16
actually, if i get the BigEndian first, then the LittleEndian, for a 16bit read, i get the incredibleness of middle-endian in the result 15:09
oh no
actually what i get with LittleEndian and BigEndian is the lowest 16 bits both times, but once "backwards" and once "forwards" 15:10
the upper 16 bits of the 32bit buf entry remains inaccessible to /.read-u?int[8|16]/ 15:11
could be easiest to just create a CArray to hold all the data and just nativecast it
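What timo is after here (every 32-bit entry written out as four bytes in a chosen byte order) can be sketched in Python as a stand-in, since Python's struct module does this directly. The second function only models the 16-bit read behaviour as described in the chat above; it is an assumption drawn from that description, not a statement about how Rakudo implements it. Both function names are hypothetical.

```python
import struct

def pack_codepoints(codepoints, little_endian=True):
    """Serialise unsigned 32-bit values to bytes, four bytes per value."""
    prefix = "<" if little_endian else ">"
    return struct.pack(f"{prefix}{len(codepoints)}I", *codepoints)

def low16_views(value):
    """Model of the 16-bit reads described in the chat (assumption):
    both byte orders expose only the lowest 16 bits of the 32-bit slot,
    one of them with its two bytes swapped."""
    low = value & 0xFFFF
    swapped = int.from_bytes(low.to_bytes(2, "little"), "big")
    return low, swapped

le_bytes = pack_codepoints([0x41, 0x1F600], little_endian=True)
be_bytes = pack_codepoints([0x41, 0x1F600], little_endian=False)
```

In this model the upper half of 0x12345678 (the 0x1234 part) never shows up in either 16-bit view, which is the problem being described, whereas the four-byte packing keeps all of it.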
lizmat well, the semantics of the .readxxx methods are based on blob8 15:12
timo that's fair
it's tricky to make it make sense intuitively + consistently 15:13
lizmat yup 15:15
afk&
timo 508 KiB core_c_setting_uni_nfc_to_32bit_binary.bin.zst; 432 K gen/moar/CORE.c.setting.zst 15:20
there are 3053 codepoints outside of 0x00..0xff in the core setting, compared to 3262397 inside 0x00..0xff 15:40
looks like the variable size of utf8 is working very well here, and zstd -19 doesn't gain anything from using a fixed width encoding
15:47 sena_kun left, sena_kun joined
timo zstd --ultra --long -22 is not noticeably better either 15:53
like 7KiB better for 32bit integers and 1KiB better for utf8 15:54
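The comparison above (variable-width UTF-8 versus a fixed 32 bits per codepoint, both compressed) can be reproduced roughly as follows. This is a sketch with two loud assumptions: zlib stands in for zstd because it ships with Python, so the absolute sizes will differ from the numbers in the chat, and the function names are invented for illustration.

```python
import unicodedata
import zlib

def count_by_width(text):
    """Return (codepoints in 0x00..0xff, codepoints above 0xff)."""
    narrow = sum(1 for ch in text if ord(ch) <= 0xFF)
    return narrow, len(text) - narrow

def compressed_sizes(text, level=9):
    """Compress the NFC-normalised text as UTF-8 and as 32-bit little-endian
    codepoints; return both compressed sizes in bytes.
    zlib is a stand-in for zstd here (assumption)."""
    nfc = unicodedata.normalize("NFC", text)
    utf8 = nfc.encode("utf-8")
    fixed = nfc.encode("utf-32-le")  # four bytes per codepoint, no BOM
    return len(zlib.compress(utf8, level)), len(zlib.compress(fixed, level))

sample = "say 'hello'; # caf\u00e9 \u20ac\n" * 200
narrow, wide = count_by_width(sample)
utf8_size, fixed_size = compressed_sizes(sample)
```

With text that is almost entirely within 0x00..0xff, as the core setting is, three of every four bytes in the fixed-width form are zero padding that the compressor has to encode away again, which is consistent with the fixed-width file not coming out smaller.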
patrickb: do you have a good idea for how we would handle source files that use #?line directives? 15:55
patrickb Unsure, but isn't that a non issue? 15:58
timo not exactly sure
i'm going to go lie down for my head :| 15:59
patrickb The #?line thing is there to bridge the gap between what's in the source that's compiled and the original source files, right?
Recover well,
timo right, we mainly use it for anything we gen-cat while building rakudo, but i think users can also use it if they like 16:00
patrickb If so, then the file that's actually compiled is the one that we would end up putting in the byte code. So there is no line number mismatch anymore. 16:01
We would simply ignore those #?line statements.
timo the annotations we put into the moarvm file reference lines "after" remapping to individual files 16:02
patrickb Oh. 16:03
So a single comp unit can reference lines in multiple files? 16:04
timo that's right. but we don't want to just get the contents of these files as they are, since gen-cat also handles #?if moar and #?if jvm and whatnot
they make the line numbers line up by emitting empty lines where something didn't match though 16:05
patrickb Would it be feasible to have "raw" line number annotations in addition to the "mapped" ones we already have?
timo i wouldn't like them to be on every annotation, but it'd be fine to have a mapping from filename to line (maybe character) offset into the source so you can fix them up very cheaply 16:06
although 16:07
don't try what happens when you have something like #?line foo.txt 1 bla bla #?line bar.txt 1 bla bla #?line foo.txt 10 lol tricked you #?line bar.txt 99 what
i.e. the same file name multiple times in the file
or even the same filename a few times in a row but with overlapping starting line numbers? 16:08
ok the syntax is different
i thought the #line comment had to be a little bit more special than that. i wonder if anybody put a comment like that by accident in their own code? maybe it's only active during rakudo build somehow?? 16:09
patrickb Would it make more sense to have annotations for raw line numbers only and a mapping that takes the #?line magic into account? Feels a bit backwards to do these transformations and then add logic to revert it again.
timo the logic to revert it would only be for the debug server / client, whereas the correctly mapped line numbers are used in stack traces and everything 16:12
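The fix-up table idea from 16:06, and the ambiguity raised right after it, can be sketched like this. Everything here is hypothetical: the segment layout, the field order and the function names are made up for illustration, and no particular #?line syntax is assumed. Each segment says "raw lines starting at raw_start, for length lines, are reported as filename starting at mapped_start".

```python
# (raw_start, length, filename, mapped_start); a hypothetical table layout.
SEGMENTS = [
    (2, 3, "foo.txt", 1),   # raw lines 2..4   -> foo.txt lines 1..3
    (6, 3, "bar.txt", 1),   # raw lines 6..8   -> bar.txt lines 1..3
    (10, 3, "foo.txt", 2),  # raw lines 10..12 -> foo.txt lines 2..4 (overlaps)
]

def raw_to_mapped(segments, raw_line):
    """Translate a line of the compiled source to (filename, line), or None."""
    for raw_start, length, filename, mapped_start in segments:
        if raw_start <= raw_line < raw_start + length:
            return filename, mapped_start + (raw_line - raw_start)
    return None  # e.g. a line that only holds a directive

def mapped_to_raw(segments, filename, mapped_line):
    """Return every raw line that is reported as (filename, mapped_line)."""
    hits = []
    for raw_start, length, name, mapped_start in segments:
        if name == filename and mapped_start <= mapped_line < mapped_start + length:
            hits.append(raw_start + (mapped_line - mapped_start))
    return hits
```

Going from raw to mapped is always unambiguous, but the reverse direction returns two raw lines for foo.txt line 2 in this table, which is the "same file name multiple times" case: a debug client that only knows the mapped position cannot tell which one is meant.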
patrickb right
timo OK i'll go rest for real now
patrickb o/
timo feel free to investigate, i don't know very much about the machinery behind line numbers and annotations and such 16:13
it randomly occurs to me that there's more "interesting" stuff to account for in terms of filename / line number stuff in the context of the debug server. for example, we allow multiple versions of the same module to be loaded simultaneously. if we don't give sources/AB/CDEFABCDEF as the filename, the user of the debug protocol can't really express which of the files with the same name they mean 16:15
when they create a new breakpoint
having the actual source in the compunit is one step towards making this better, but the problem of identifying what the user is referring to is almost orthogonal to that 16:16
maybe we just want to change the API to allow either a filename as string, or a filename as an int that refers to a table. same for when the debug server sends over stack traces and other bits that have a filename in them 16:17
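A minimal sketch of what "a filename as string, or a filename as an int that refers to a table" could look like. This is not the debug server's actual protocol or data structure; the class, its methods and the example paths are all hypothetical.

```python
class FileTable:
    """Hypothetical table in which every loaded file gets its own integer id,
    even when two loaded files share the same name."""

    def __init__(self):
        self._names = []

    def add(self, name):
        """Register a file and return its id (its index in the table)."""
        self._names.append(name)
        return len(self._names) - 1

    def resolve(self, spec):
        """Return the ids a breakpoint request refers to.
        An int selects exactly one entry; a string selects every entry
        with that name, so more than one id means the request is ambiguous."""
        if isinstance(spec, int):
            return [spec] if 0 <= spec < len(self._names) else []
        return [i for i, name in enumerate(self._names) if name == spec]

table = FileTable()
first = table.add("lib/Foo.rakumod")   # one version of a module
second = table.add("lib/Foo.rakumod")  # another version, same file name
other = table.add("lib/Bar.rakumod")
```

A request by name for lib/Foo.rakumod resolves to two ids here, while a request by id always picks exactly one, which is the distinction the string form alone cannot express.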
plus, there's maybe even a use case for differentiating a breakpoint spot by more dimensions. like what if we have an additional number for every breakpoint that refers to the same filename and line number that we increment every time we generate code that has that in it, then you could break in just one spesh candidate for example, or only when the frame is non-speshed, or only when it's jitted, 16:20
etc etc
per-spesh-candidate breakpoints make it simple to "break here but only if this function has been called with three object arguments" 16:21
patrickb I guess from a usability point of view we'd want to be able to see a list of all open files (with identifiers that allow distinguishing them) and a way to add breakpoints to files that haven't been loaded yet. When defining such breakpoints one should ideally be able to be fuzzy, as being forced to get the identifier perfectly right is annoying.
timo 100% agree on the fuzzy thing. integrating fzf would probably not be a bad choice 16:22
patrickb You already added something that notifies whenever a file is loaded, right?
timo well, i call it "loaded" but really it's when the instrumentation barrier is first hit on any frame that mentions the filename. which i guess already happens with a compunit's mainline frame being run, and perhaps even in the dependencies+deserialize frame? 16:23
patrickb Then that logic wouldn't need to live in the server but could reside in the client! (Given the server gives the client a chance to register new breakpoints before proceeding with executing code of the newly loaded file.)
timo yes, there is an option to suspend the thread after it finishes the instrumentation barrier, though making all threads suspend at that point is a little bit more annoying to write 16:24
so actually, after the first thread loads the first frame from the file and goes to sleep, another thread could run past and run any other frame from that file, since it's now no longer "never seen before" 16:25
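The race described above comes from "never seen before" being a one-shot condition. A small model of it, assuming the files table is guarded by a lock (the class and method names are hypothetical, and this is not MoarVM code):

```python
import threading

class SeenFiles:
    """Hypothetical model of the 'new file' check done at the instrumentation barrier."""

    def __init__(self):
        self._lock = threading.Lock()
        self._seen = set()

    def note(self, filename):
        """Return True only for the first caller that reports this filename."""
        with self._lock:
            if filename in self._seen:
                return False
            self._seen.add(filename)
            return True

seen = SeenFiles()
first_thread_stops = seen.note("lib/Foo.rakumod")   # this thread would be suspended
second_thread_stops = seen.note("lib/Foo.rakumod")  # this one would not
```

Only the first caller gets True and would be asked to suspend. A second thread entering another frame from the same file gets False and keeps running, so any breakpoints the client wants to set in response to the notification may arrive too late for that thread.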
patrickb Hm. That's a problem we'll need to solve... 16:26
timo i can think of a way to prevent other threads from racing past to a different frame after the first one has been seen. it'd be semantically different from "stop all threads at that moment" since threads that aren't touching that file at all would continue running 16:28
nah, the most sensible thing is to implement the "suspend all threads" thing at that exact point 16:29
and there should be a similar thing to "file loaded" (name change pending, maybe) that refers to compunits rather than seeing annotations in frames. and probably also a request to immediately read all annotations and get all filenames and line numbers as soon as a compunit is loaded? or maybe that should always happen by default if the debug server is running 16:31
ok, AFK for real now
patrickb o/ again :-) 16:33
Geth MoarVM/more_helpful_oops_messages: 4 commits pushed by (Timo Paulssen)++ 22:30
23:21 sena_kun left