codesections | There are a lot of ways to do something like that, depending on the exact usecase. From what you've said so far, I'd probably be inclined to zip the two lazy lists together and then diff each line one at a time. See docs.raku.org/language/operators#i...p_operator But I'd be happy to give thought to a more specific solution if | 00:54 | |
you'd care to share more details (either here or on StackOverflow – everyone who answers/moderates the [raku] tag is very welcoming) | |||
avuserow | I thought about suggesting zip with lines, but then you don't have a way to re-synchronize in the case where a line is added. | 01:56 | |
<@297037173541175296> did you post the python version somewhere? Curious if I'm understanding the problem correctly | 01:57 | ||
03:22
zacts left
05:35
codesections left
05:36
codesections joined
|
|||
Nemokosch | I think we have the same thought in mind | 07:05 | |
The exact thing I had to do with Python was a bit more convoluted so I'm gonna turn it a bit more theoretical so please treat it as a pseudocode. The lines have a key-value structure conceptually, that's the basis of saying whether one line is extra or they are differing. | 07:24 | ||
```py | |||
def diff_lines(left_iter, right_iter): | |||
try: | |||
left_line = next(left_iter) | |||
right_line = next(right_iter) | |||
while True: | |||
cmp_result = compare_keys(left_line, right_line) | |||
if cmp_result < 0: | |||
print('Announce extra line on the left side') | |||
left_line = next(left_iter) | |||
elif cmp_result > 0: | |||
print('Announce extra line on the right side') | |||
right_line = next(right_iter) | |||
I think this is enough to grasp the supposed behavior. This solution isn't precise because the left_line can take a step when the right_iter is exhausted so it's possible that we lose one line but that doesn't add to the point | 07:26 | ||
gfldex | `.lines` on a filehandle is a `Seq` so there is your bottled up iterator. `Z` will also return a `Seq` | 07:34 | |
m:``` | 07:35 | ||
for $*PROGRAM.open.lines Z $*PROGRAM.open.lines -> [$left, $right] { | |||
say $left, $right; | |||
} | |||
``` | |||
:-/ | 07:36 | ||
Nemokosch | I know that but how do you tell Z that certain subsequent lines shouldn't be matched against each other but should be considered as extra? | ||
gfldex | m: Seq.^methods.grep(*.name eq 'pop').say; | 07:38 | |
Nemokosch | this is on point | ||
gfldex | `Seq` got `.skip` | 07:39 | |
Nemokosch | without zipping them then, right? | 07:40 | |
gfldex | Alas, `.skip` is not a mutator. So there is some wrapping required. | 07:44 | |
Nemokosch | Well this is a bit too cryptic for me 😅 | 07:46 | |
Actually I think I succeeded with iterators but it was rather painful for exactly one reason: IterationEnd is Mu | 07:53 | ||
if it was any normal value (literally haha), it would have been no big deal | |||
gfldex | ``` | 08:02 | |
my @left = $*PROGRAM.open.lines; | |||
while @left { | |||
@left.shift if (True, False).roll; | |||
try put @left.shift; | |||
} | |||
``` | |||
<@!297037173541175296> You assign the `Seq` to an `Array` and then use the mutators on the `Array` to manipulate the iterator of the `Seq` indirectly. | 08:03 | ||
Nemokosch | is this lazy? | 08:04 | |
gfldex | To answer that question would require to understand github.com/rakudo/rakudo/blob/mast...y.pm6#L184 | 08:07 | |
08:07
dakkar joined
|
|||
It does use `IterationBuffer` so it looks to me as if it would be lazy in chunks. lizmat can likely shed more light on this then me. | 08:09 | ||
Nemokosch | because yes, given the files with like 20k lines, slurping them and doing something afterwards, it would still be okay and dead simple | 08:10 | |
lizmat | that is lazy | ||
Nemokosch | but it's more than just a hypothetical that the files can get big and we have more time than memory in this case | 08:11 | |
lizmat | also, since $*PROGRAM is an IO::Path, you don't need the .open | ||
Nemokosch | interesting that something with an Array interface can be lazy 😮 | 08:12 | |
gfldex | Sadly, I don't have a text file bigger then 64GB to test if it does fill all the RAMz. :) | 08:13 | |
Nemokosch | 64 GB, heh | ||
I managed to get the Python script that acted upon ~ 100 MB files dead on a 4 GB VM | 08:14 | ||
probably trying to sort the records real-time wasn't a marvelous idea 🙂 | 08:15 | ||
Morfent | m:``` | ||
use v6; | |||
.say for Array(gather for 1..3 { .say; .take }) | |||
``` | |||
er | |||
that's not how i test what i wanna test | |||
m:``` | 08:16 | ||
use v6; | |||
.say for gather for 1..3 { .say; .take } | |||
``` | |||
m:``` | 08:17 | ||
use v6; | |||
.say for my @ = gather for 1..3 { .say; .take } | |||
``` | |||
m:``` | 08:18 | ||
use v6; | |||
.say for my @ = lazy gather for 1..3 { .say; .take } | |||
``` | |||
there we go | |||
gfldex | We spend great afford to be able to be lazy. :) | 08:20 | |
Nemokosch | Not gonna lie, it's annoying that all languages I know have iterators some way and they somehow always get to mess it up with a useless interface to it lol | 08:21 | |
I'm starting to think the Java interface actually isn't bad | |||
no matter how ugly this hasNext next combo sounds | 08:22 | ||
with this nice `try` statement prefix, even the Python interface could be decent in Raku | 08:23 | ||
gfldex | The irc-bridge bot basically lives of `try`-statements. :) | 08:31 | |
08:31
discord-raku-bot left
08:32
discord-raku-bot joined
11:56
Guest28 joined
11:58
Guest28 left
|
|||
avuserow | So from a higher order perspective, you kind of want a zip function that takes a comparator function to determine whether to consume from the left side, right side, or both. Basically you'd pass in your compare_keys function | 13:53 | |
Nemokosch | yes, kinda | 14:19 | |
16:36
dakkar left
|