| oshaboy | Y'know, I like Raku but I wish there was a way to say "actually I don't want the string to be Unicode-normalized" | 00:07 | |
| nemokosch | out of curiosity, what is your use case? | 00:19 | |
| Voldenet | re > stackoverflow.com/questions/516441...9#79892319 | 02:42 | |
| I don't overly hate the idea, but it makes the code really verbose | 02:43 | ||
| I've tried that a few times and overall Result<T, Error> needs good syntax support | 02:45 | ||
| Voldenet | but I suppose something like this would be pretty neat: | 02:47 | |
| Hm, nevermind, I thought about using .resume with value but it ended up being really convoluted | 02:48 | ||
| overall probably there's no need to complicate this | 02:55 | ||
| m: sub some-fail { die "random" }; sub x { CATCH { default { return 42; } }; return some-fail() }; say x() | |||
| camelia | 42 | ||
| nemokosch | well I don't like this more but for sure it does less | 02:56 | |
| the truth is, I can entertain the task, but in general I think it's a massive code smell if you are ever going to need to do something like this | 02:57 | ||
| providing separate defaults for separate unusual code paths | |||
| Voldenet | it totally isn't – having a default value when something is not responding right is sane | 02:58 | |
| nemokosch | that is sane - having different defaults on different kind of problems is insane | ||
| Voldenet | actually I've used that and it's not horrible | 02:59 | |
| nemokosch | I'm not sold for sure | ||
| Voldenet | # sub do-request { CATCH { when DeserializationException { return http-error(400) }; when ConnectionError { return http-error(500); }}; return http-json(200, do-request()); } | 03:01 | |
| that might have some typos but you have the idea | |||
| ofc it's also possible to do that…: | |||
| # sub do-request { CATCH { when handle-error($_) { $_.http-format }; }; return http-json(200, do-request()); } | 03:02 | ||
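The pattern sketched above — per-exception-type defaults at the HTTP boundary — translates to most languages with typed exceptions. A minimal Python sketch, where `DeserializationError`, `http_error`, and `http_json` are made-up illustrative names, not part of any real framework:

```python
# Hedged Python sketch of the idea above: map specific exception
# types to HTTP-style defaults, let anything else propagate with
# its stack trace intact.

class DeserializationError(Exception):
    pass

def http_error(code):
    return {"status": code}

def http_json(code, body):
    return {"status": code, "body": body}

def handle(request_fn):
    try:
        return http_json(200, request_fn())
    except DeserializationError:
        return http_error(400)   # client sent something unparseable
    except ConnectionError:
        return http_error(500)   # upstream trouble
    # any other exception keeps propagating to the caller

def bad_request():
    raise DeserializationError("unparseable body")

assert handle(lambda: "ok") == {"status": 200, "body": "ok"}
assert handle(bad_request) == {"status": 400}
```

The point of the shape is the same as in the Raku pseudocode: the golden path stays flat, and only the boundary knows about HTTP codes.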
| nemokosch | it's almost like this is a good reason to not use exceptions but failures at the very least | ||
| Voldenet | Failures don't have stack trace though | 03:03 | |
| it's hardly ever a good idea to not have stack | |||
| or rather - error has to be very local | |||
| nemokosch | well, you either want to use something as a value, or troubleshoot it, but not both at once | ||
| here, it seems that you want to use it as a value and you don't need a stack trace - you don't even need custom control flow | 03:04 | ||
| just a special value | |||
| Voldenet | but then do-request doesn't have to bother with understanding all the errors | ||
| it just does regular golden path with "die" when necessary | 03:05 | ||
| nemokosch | I don't understand | ||
| Voldenet | > my $db = db-connect; LEAVE $db.try-disconnect; return $db.get-item($request.int-from-post<id>) | 03:06 | |
| that'd be code of some method | 03:07 | ||
| and all of these methods can throw 10 exceptions | |||
| socket connection, dns resolution, timeouts, wrong request | |||
| or rather, deserialization error | 03:08 | ||
| when writing that you don't want to handle errors on-site because request would be failed anyway… but you don't want to hardwire the code to return HTTP codes | |||
| nemokosch | none of those seem to be recoverable | ||
| I still don't see this middle ground where by default you'd still want to propagate but cherry-pick some to return values | 03:09 | ||
| Voldenet | Well, connection error and friends should be 500 - server error | 03:10 | |
| nemokosch | also, at this point I can't see the big win | ||
| Voldenet | but deserialization error would be 400 | ||
| failure to find item would be 404 | |||
| nemokosch | in my solution, you still have the exceptions, it's just the "catching" is the default behavior | ||
| Voldenet | however it's only responsibility of http, not the internal method that gets the item | ||
| nemokosch | and the "rethrowing" is the manual part | ||
| also, you avoid the sub wrapping and the phaser | 03:11 | ||
| to my understanding, my version also doesn't need manual propagation on all levels of the call stack, only one rethrow - specifically needed because some exceptions were handled on that level | 03:13 | ||
| not just handled but "recast" as values | |||
| Voldenet | well, you can't handle one specific error and not others | 03:16 | |
| nemokosch | how not? | ||
| Voldenet | m: class X is Exception { }; my $result = do given try { die X.new }, $! { when *, X::AdHoc { "Some other exception thrown: $_[1]" }; default { "Proper value: $_[0]"; } }; say $result | ||
| camelia | Use of Nil in string context Proper value: in block at <tmp> line 1 | ||
| nemokosch | okay, this is hard to read | 03:17 | |
| but if I get the idea: this can be addressed by one line of code | |||
| and it's a constant one line | |||
| Voldenet | + you get to choose which errors are handled and which should be thrown further | 03:19 | |
| for completeness | 03:20 | ||
| m: class X is Exception { }; my $result = sub n() { CATCH { when X::AdHoc { return "bar" }; }; die "blah"; }(); say $result | |||
| camelia | bar | ||
| Voldenet | m: class X is Exception { }; my $result = sub n() { CATCH { when X::AdHoc { return "bar" }; }; X.new.throw }(); say $result | ||
| camelia | Died with X in sub n at <tmp> line 1 in block <unit> at <tmp> line 1 | ||
| nemokosch | well, you get to choose that either way, don't you | 03:21 | |
| this is not a difference | |||
| Voldenet | No, because try will gobble up the exception | ||
| so you can't say "ignore this error" because you've already chosen to handle everything :) | 03:22 | ||
| you can do `*, Exception { $_[1].rethrow }` but it's what you get for free when using regular CATCH | 03:23 | ||
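For contrast, a hedged sketch of the (result, error)-as-value style being debated, in Python. All names are illustrative; the catch-all helper gobbles every exception, so the caller must spend the "constant one line" to re-raise anything it doesn't want to handle:

```python
# Illustrative (value, error)-tuple style: try_call captures any
# exception as a value; the caller pattern-matches on the error and
# explicitly re-raises whatever it chooses not to handle.

def try_call(fn):
    try:
        return fn(), None
    except Exception as e:
        return None, e

def risky():
    raise KeyError("missing")

value, err = try_call(risky)
if isinstance(err, KeyError):
    result = "default"   # the one error kind we choose to handle
elif err is not None:
    raise err            # the constant one-line rethrow for the rest
else:
    result = value

assert result == "default"
assert try_call(lambda: 1) == (1, None)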
| nemokosch | I mean, to recite a classic: "this is a sacrifice I'm willing to make" | 03:24 | |
| this is the constant one line "fix" | |||
| Voldenet | well, probably depends on use case | 03:25 | |
| but it suspiciously looks like golang error handling | |||
| which is scary verbose | |||
| 80% of any code is not actual code but the ritual of error handling | 03:26 | ||
| nemokosch | 1. I don't agree it's scary verbose 2. it does auto-propagate 3. I think the whole idea is quite mad to begin with | ||
| on a different note: I wonder if last statements in CATCH count as sink context, because they de facto are and could report a worry | 03:27 | ||
| Voldenet | note, I've always used `return ""` | 03:28 | |
| It probably may not work well otherwise | |||
| nemokosch | yes, it wouldn't work well (and to be able to use return, you needed a sub) | 03:29 | |
| Voldenet | or rather, not work at all | ||
| nemokosch | but I wonder if there is a worry for just dropping something like CATCH { 42 } | 03:30 | |
| Voldenet | Ah, yes, that could be a worry | 03:31 | |
| nemokosch | anyway, the trade-offs are vastly different | 03:32 | |
| what I appreciate in your version is that indeed, it does the right/safe thing by default, it's harder to mess up | |||
| however, I really don't like the extraneous sub needed for the return(s), and the implication was "I might want to handle undefined values differently" so some sort of dispatch over the outcome of the call would be needed either way | 03:34 | ||
| considering that, I'd say it was actually rather compact | |||
| Voldenet | well yes, in the version with (result, error) tuple you can handle any sort of invalid result | 03:36 | |
| so I guess I was just looking at very specific use case | |||
| nemokosch | this was meant to be my main complaint, by the way: separating into 3 (or more) categories, special-case for both undefined values and exceptions, and then Raku even has failures - trivial to add into this given-when tuple pattern | 03:38 | |
| "weird flex but okay" | 03:39 | ||
| I like my flexes to be weird | |||
| Voldenet | liking your weird flexes is a weird flex | ||
| ;) | |||
| nemokosch | is there a catch-23? | ||
| Voldenet | I didn't get it, is there a catch? | 03:41 | |
| nemokosch | no, much more like a loophole | 03:42 | |
| Voldenet | btw, regarding that `sub + return` there was supposed to be something that would leave the block | |||
| nemokosch | really? | 03:43 | |
| this makes me wanna recall how Self is structured | 03:45 | ||
| Voldenet | it was in apocalypse 6: www.perl.com/pub/2003/03/07/apocalypse6.html/ | 03:47 | |
| > There will be a leave function that can return from other scopes. By default it exits from the innermost block | |||
| nemokosch | Self has this with exit | 03:49 | |
| but... if I understand it correctly, it requires a magic callback for the block | 03:55 | ||
| so it's kinda like the Promise constructor in JS | |||
| to be fair, you can emulate early termination in Raku, with control exceptions - that's how last and next work | 03:57 | ||
| well, this is not such a smart observation as I hoped xD that's how return itself works, and the sub wrapping is probably the cleanest way to catch it | 04:00 | ||
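The control-exception mechanism mentioned here (how Raku implements `last`, `next`, and `return` itself) can be emulated in any language with exceptions. A Python sketch, using an illustrative `EarlyReturn` class to leave a deeply nested block early:

```python
# Emulating non-local early termination with a private control
# exception: raise it deep inside, catch it at the boundary you
# want to leave. EarlyReturn and find_first are illustrative names.

class EarlyReturn(Exception):
    def __init__(self, value):
        self.value = value

def find_first(items, predicate):
    try:
        for item in items:
            if predicate(item):
                raise EarlyReturn(item)   # leave from deep inside
    except EarlyReturn as e:
        return e.value
    return None

assert find_first(range(10), lambda n: n > 4) == 5
assert find_first([], lambda n: True) is None
```

This is essentially what "the sub wrapping is probably the cleanest way to catch it" amounts to: the `try` block plays the role of the wrapping sub that catches the return control exception.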
| Voldenet | yes, otoh you can return from block directly and it's a feature… sort of | 05:41 | |
| m: sub x { { return 42 } }; say x | 05:42 | ||
| camelia | 42 | ||
| Voldenet | the only thing I don't get is this | 05:47 | |
| m: sub x { Any.map({ return 42 }) }; say x() # Yes I know what I'm doing! | 05:48 | ||
| camelia | Attempt to return outside of immediately-enclosing Routine (i.e. `return` execution is outside the dynamic scope of the Routine where `return` was used) in block <unit> at <tmp> line 1 | ||
| lizmat | weekly: dev.to/lizmat/store-proxy-fetch-a07 | 11:41 | |
| notable6 | lizmat, Noted! (weekly) | ||
| lizmat | m: sub x { for 42 { return 42 } }; say x # oddly enough, this is not an issue Voldenet | 11:43 | |
| camelia | 42 | ||
| Voldenet | lizmat: it would be scary if returning from `for` didn't work, it would make writing a simple find difficult | 11:47 | |
| lizmat | yeah, but the .map is functionally equivalent | 11:48 | |
| Voldenet | ah, right | 11:49 | |
| m: sub x { do for ^5 { return 42 } }; x().say | |||
| camelia | 42 | ||
| Voldenet | m: sub x { (^5).map({ return 42 }) }; x().say | 11:50 | |
| camelia | Attempt to return outside of immediately-enclosing Routine (i.e. `return` execution is outside the dynamic scope of the Routine where `return` was used) in block <unit> at <tmp> line 1 | ||
| lizmat | so I wonder what's going on there | ||
| Voldenet | m: sub my-map(@x, &y) { do for @x { y($_) } }; (^5).&my-map({ $_ + 1 }).say | 11:52 | |
| camelia | (1 2 3 4 5) | ||
| Voldenet | m: sub my-map(@x, &y) { do for @x { y($_) } }; (^5).&my-map({ return 42 }).say | ||
| camelia | Attempt to return outside of any Routine in block <unit> at <tmp> line 1 | ||
| Voldenet | however… | 11:53 | |
| sub my-map(@x, &y) { do for @x { y($_); return 42 } }; (^5).&my-map({ $_ + 1 }).say | |||
| m: sub my-map(@x, &y) { do for @x { y($_); return 42 } }; (^5).&my-map({ $_ + 1 }).say | |||
| camelia | 42 | ||
| Voldenet | so it's not about for, but about how block is executed | ||
| m: sub foo(&y) { y() }; foo({ return 42 }).say | 11:54 | ||
| camelia | Attempt to return outside of any Routine in block <unit> at <tmp> line 1 | ||
| lizmat | yeah, the argument handling logic is not inside the scope of the sub yet, I don't think | 12:08 | |
| lizmat | hmmmm... | 13:08 | |
| the problem is that the { return 42 } is indeed outside the scope of any routine | 13:13 | ||
| timo might have an idea of what is going on here | 13:14 | ||
| timo | i think return is specified to be lexotic, i.e. it has to be not only lexically inside the routine it's meant to return from, but also inside its dynamic scope (which is required for return to even work at all) | 13:27 | |
| m: return 99 | |||
| camelia | Attempt to return outside of any Routine in block <unit> at <tmp> line 1 | ||
| timo | m: sub foo(&y) { y() }; sub bar { foo({ return 42 }).say }; bar(); | ||
| camelia | ( no output ) | ||
| timo | m: sub foo(&y) { y() }; sub bar { foo({ return 42 }).say }; say bar(); | ||
| camelia | 42 | ||
| timo | m: sub foo(&y) { LEAVE { say "returning from foo" }; y() }; sub bar { LEAVE { say "returning from bar"; }; foo({ return 42 }).say }; say bar(); | 13:28 | |
| camelia | returning from foo returning from bar 42 | ||
| timo | well, that's not accurate to say "returning from", it's just "leaving" | 13:29 | |
| but that shows that the .say inside of bar isn't the one outputting the 42 | 13:31 | ||
| m: sub foo(&y) { LEAVE { say "leaving foo" }; y() }; sub bar { LEAVE { say "leaving bar"; }; say "result of foo: " ~ foo({ return 42 }) }; say "result of bar: " ~ bar(); | 13:32 | ||
| camelia | leaving foo leaving bar result of bar: 42 | ||
| timo | that's what i meant | ||
| lizmat | timo++ | 13:33 | |
| Voldenet | ah, so in fact, it's a feature of map that prevents that | 13:38 | |
| m: sub x { (^5).-map({ return 42 }) }; x.say | |||
| camelia | ===SORRY!=== Error while compiling <tmp> Malformed postfix call at <tmp>:1 ------> sub x { (^5).<HERE>-map({ return 42 }) }; x.say | ||
| Voldenet | m: sub x { (^5).map({ return 42 }) }; x.say | ||
| camelia | Attempt to return outside of immediately-enclosing Routine (i.e. `return` execution is outside the dynamic scope of the Routine where `return` was used) in block <unit> at <tmp> line 1 | ||
| Voldenet | m: m: sub my-map(@x, &y) { do for @x { y($_) } }; sub x { (^5).&my-map({ return 42 }) }; x.say # custom map implementation, same return | ||
| camelia | 42 | ||
| timo | very possible that map catches CONTROL exceptions and reacts to return | 13:47 | |
| lizmat | pretty sure it doesn't do the return control message | ||
| redo / next / last yes | 13:48 | ||
| timo | then it could be that the "do for" doesn't do laziness as expected | 13:49 | |
| wait i think i got it in backwards | |||
| lizmat | m: dd do for ^10 { } | 13:50 | |
| camelia | (Nil, Nil, Nil, Nil, Nil, Nil, Nil, Nil, Nil, Nil) | ||
| lizmat | hmmm... I thought that generated a Seq? | ||
| m: my $a := do for ^10 { }; dd $a | |||
| camelia | (Nil, Nil, Nil, Nil, Nil, Nil, Nil, Nil, Nil, Nil) | ||
| lizmat | m: my $a := do for ^10 { }; dd $a.^name | ||
| camelia | "List" | ||
| lizmat | interesting | 13:51 | |
| I guess it's a lazy list | |||
| timo | m: my $a := do lazy for ^10 { }; dd $a.^name | ||
| camelia | "Seq" | ||
| timo | m: my $a := do lazy for ^10 { }; dd $a | ||
| camelia | (Any, Any, Any, Any, Any, Any, Any, Any, Any, Any).lazy.Seq | ||
| timo | m: sub my-map(@x, &y) { do lazy for @x { y($_) } }; sub x { (^5).&my-map({ return 42 }) }; | ||
| camelia | ( no output ) | ||
| lizmat | m: my $a := do for ^10 { }; dd $a.iterator.^name | ||
| camelia | "Rakudo::Iterator::ReifiedListIterator" | ||
| timo | m: sub my-map(@x, &y) { do lazy for @x { y($_) } }; sub x { (^5).&my-map({ return 42 }) }; x.say | ||
| camelia | (...) | ||
| timo | still not the same thing, the builtin-map one tries to reify when stringifying, so it doesn't just turn into "(...)" without running the code? | 13:52 | |
| lizmat | that'd be .gist for you | ||
| timo | m: sub my-map(@x, &y) { do lazy for @x { y($_) } }; sub x { (^5).&my-map({ say "mapped block run"; return 42 }) }; say "first"; x.say; say "second"; .say for x | 13:53 | |
| camelia | first Attempt to return outside of immediately-enclosing Routine (i.e. `return` execution is outside the dynamic scope of the Routine where `return` was used) in block <unit> at <tmp> line 1 (...) second mapped block run | ||
| timo | ah, and this is stdout and stderr interleaving on camelia | ||
| m: sub my-map(@x, &y) { do lazy for @x { y($_) } }; sub x { (^5).&my-map({ note "mapped block run"; return 42 }) }; note "first"; x.say; note "second"; .&note for x | |||
| camelia | first (...) second mapped block run Attempt to return outside of immediately-enclosing Routine (i.e. `return` execution is outside the dynamic scope of the Routine where `return` was used) in block <unit> at <tmp> line 1 | ||
| nemokosch | this is interesting indeed | 14:04 | |
| but why/how is it "outside the dynamic scope of the Routine"? | 14:05 | ||
| timo | the x routine is already long gone when the block is run | ||
| nemokosch | ahh, so this is an effect of the laziness? | 14:06 | |
| timo | i wonder if we can cheaply get a reference to the routine where "return" was actually put | ||
| that's right | 14:07 | ||
| nemokosch | it's kind of a closure | ||
| timo | the routine where return lexically lives is x, but x has returned the lazy sequence, so attempting to return something else when grabbing an element from the lazy sequence would have to jump back in time to cause the lazy sequence to not have been returned in the first place, causing a grandfather paradox | 14:08 | |
| nemokosch | xddd | ||
| timo | under the interpretation that the return has to return from the lexically enclosing routine, rather than just "sending a return control exception upwards to return from the first routine that can be found on the stack" | 14:09 | |
| nemokosch | comparing it with Self: | 14:10 | |
| > It is an error to evaluate a block method after the activation record for its lexically enclosing scope has returned. Such a block is called a non-lifo block because returning from it would violate the last-in, first-out semantics of activation object invocation. | |||
| the block outright cannot run like this | 14:11 | ||
| I don't know, it could be that I opened an issue back then for next and last propagating through function boundaries | 14:14 | ||
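The "frame is already gone" situation has a close Python analogue: a generator body runs lazily, after the function that created it has returned, so there is no enclosing activation left to return from. A small sketch of the timing:

```python
# The enclosing function has already returned by the time the lazy
# block runs -- a `return` inside it would have nowhere to go.
# Python generators show the same timing: gen()'s body executes only
# when the sequence is consumed, long after make_lazy's frame ended.

ran = []

def make_lazy():
    def gen():
        ran.append("block ran")   # happens after make_lazy returned
        yield 42
    return gen()                  # hand back an unconsumed generator

seq = make_lazy()                 # make_lazy's activation ends here
assert ran == []                  # the block has not run yet
assert list(seq) == [42]          # consuming it finally runs the block
assert ran == ["block ran"]
```

Python sidesteps the paradox by simply not having non-local return from closures; Raku instead reports it as "outside the dynamic scope of the Routine".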
| aruniecrisps | I use .NFD.list.chrs to get the code points as a list | 15:36 | |
| nemokosch | I'd really be curious about the use case because aggressive normalisation of strings (performance aside) is imo the best thing one can do | 15:47 | |
| one of the things that Raku got "the most right" | |||
| timo | I would have liked a built-in that turns incoming bytes into "untreated" unicode codepoints so that parsing of JSON can be done fully round-trippable | 16:09 | |
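The round-tripping problem is easy to demonstrate with plain NFC: once normalized, the original codepoint sequence (and hence the original bytes) cannot be recovered from the string alone. A Python sketch using the stdlib `unicodedata` module:

```python
import unicodedata

# Why normalization breaks byte-level round-tripping: once
# "e" + COMBINING ACUTE ACCENT is normalized to precomposed "é",
# the original codepoint sequence is unrecoverable from the string.

decomposed = "e\u0301"                 # 2 codepoints
composed = unicodedata.normalize("NFC", decomposed)

assert composed == "\u00e9"            # single precomposed codepoint
assert len(decomposed) == 2 and len(composed) == 1
# The UTF-8 bytes differ too, so decode -> normalize -> encode
# is not the identity transformation:
assert decomposed.encode() != composed.encode()
```

This is exactly why round-trippable JSON parsing would need access to the "untreated" incoming codepoints rather than the normalized string.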
| ab5tract | timo: I've always been confused why this isn't just built in | 16:20 | |
| ab5tract | I got into a debate about this with a Swift fan 10 years ago and wrote up a class that does this via bufs | 16:24 | |
| antononcube | I was wondering whether I should go back to Swift or not... Did a fair amount of programming with it 4 years ago. (The potential app "audience" is huge!) | 16:28 | |
| ab5tract | timo: I have to imagine that we could do some sort of pragma that costs more memory but allows round-tripping | 16:31 | |
| Voldenet | Ah, so it's actually possible to use return from map inside a sub… | ||
| m: sub x { (^5).map({ return 42 }).eager }; x.say | |||
| camelia | 42 | ||
| Voldenet | It all makes sense now | 16:32 | |
| ab5tract | Voldenet: shouldn't that be 42 xx 5 | ||
| Voldenet | no, it's a very weird special case :P | ||
| ab5tract | ok :) | ||
| Voldenet | return 42 actually returns value from x, not from map | ||
| ab5tract | right, that's what I expect, but I wasn't sure what you expect :) | 16:33 | |
| Voldenet | it was a continuation of overly long topic that started with how error handling interacts with with return | 16:34 | |
| s/with// | |||
| erm, not that with | |||
| s/with with/with/ | |||
| ab5tract | ah I didn't scroll back far enough to discover that it was about error handling | 16:36 | |
| timo | ab5tract: using utf8-c8 as encoding allows for round tripping | 16:39 | |
| aruniecrisps | @nemokosch one thing I did run into with the Unicode normalization that made it slightly harder was working with Indic alphasyllabaries: it was harder to delete or modify diacritics when it's completely normalized | |||
| timo | i'm not sure if you can actually get the unicode codepoints out of a utf8-c8 encoded string easily | ||
| ab5tract | timo: ah good to know. But I am thinking it would be nice to have the NFG for all the comparison operations but then be able to put the original bytes back as I found them | 16:40 | |
| timo | if you really just want the original bytes without changing stuff, it's probably best to store the original Blob in addition to the Str | 16:42 | |
| nemokosch | Swift seemed like a rather clever language to me | 16:55 | |
| a good tradeoff between a large, rather unopinionated high-level language and performance and reliability concerns | 16:56 | ||
| why is it needed to delete or modify the diacritics in that domain? | 16:57 | ||
| timo | could have been for building something literally for working with these kinds of characters | 16:58 | |
| nemokosch | what I'm trying to figure out is whether strings are the logical level of abstraction | 17:03 | |
| korvo | Swift's got quite a lot of bad design decisions under the hood. I agree that its Unicode handling is inspiring, but be wary of copying it directly. | 17:06 | |
| nemokosch | ultimately I think this string topic depends a lot on the principle | 17:09 | |
| one has to ask the question: "what is a string?" | 17:10 | ||
| antononcube | @korvo "Swift's got quite a lot of bad design decisions under the hood." I am glad I started programming with Swift 5.0 and not Swift 4.0. As for bad design decisions, I recall some frustration, but that was 4 years ago. Maybe with LLMs those design elements do not matter much. | ||
| nemokosch | in general, "this feature can be abused" would be funny criticism on a Raku server | 17:11 | |
| korvo | danielchasehooper.com/posts/why-swift-is-slow/ is an example that is still outstanding. | 17:16 | |
| librasteve | maybe we need a concept where you can store the Str and the original Blob in the same object (and it ~~ Str when passed to a sub)? | 17:21 | |
| nemokosch | you posted this before, I remember | 17:22 | |
| lizmat | that sounds suspiciously like a BlobStr allomorph | ||
| nemokosch | and this is rather a case of "can be abused deliberately" | ||
| I didn't want to proactively say it but please no more allomorphs... | 17:23 | ||
| the problem with allomorphs is that they simply try to guess when a diamond problem arises | 17:23 | ||
| and given how fat Any is, the diamond problem will arise | 17:24 | ||
| lizmat | alternately, introduce a $*ENCODING dynvar (default to utf-8) and make Blob.Str default to Blob.decode($*ENCODING) | 17:25 | |
| librasteve | probably this one could be done as a module, then you can decide if you like this approach or not | ||
| nemokosch | a question like "what is a string" could still have different answers but I tend to prefer Raku's answer | 17:26 | |
| which is: a string is text | |||
| it's not bytes, it's not codepoints | |||
| bytes and codepoints have a place in the universe regardless | 17:27 | ||
| ShimmerFairy | If you need to work with text on multiple levels, you're gonna have to keep copies in different forms no matter what you do (e.g. a Str and a Uni, or an NFC and a Blob). Whatever it is you're doing, you're probably wanting to keep operations in sync across the different forms, and that's gonna be impossible with a bundled-up premade allomorph class. | 17:39 | |
| nemokosch | I'd naively think it's almost always the right thing to make the lower-level representation the "master" in these cases | 17:42 | |
| aruniecrisps | @nemokosch because I'm building a conjugations engine for South Asian languages, and much of that involves removing/replacing the diacritics which are used to represent vowels in these languages | 18:03 | |
| nemokosch | how to ask this in a way that makes sense, hm... | 18:04 | |
| is it just coincidentally useful to be able to remove the diacritics or is there some sort of invariant systemic rule about it? | 18:05 | ||
| aruniecrisps | No you have to replace vowels, in languages like Tamil in order to combine words and endings, if the final vowel of the first word is u and the second word begins in a vowel, the first letter of the second word gets turned into a diacritic of the final consonant | 18:06 | |
| lizmat | m: use v6.e.PREVIEW; say "élève".nomark' # aruniecrisps | 18:07 | |
| camelia | ===SORRY!=== Error while compiling <tmp> Two terms in a row at <tmp>:1 ------> use v6.e.PREVIEW; say "élève".nomark<HERE>' # aruniecrisps expecting any of: infix infix stopper postfix statement … | ||
| nemokosch | so this happens always then? | ||
| aruniecrisps | Yes | ||
| lizmat | m: use v6.e.PREVIEW; say "élève".nomark # aruniecrisps | ||
| camelia | eleve | ||
| nemokosch | well, fair enough | ||
| but is this blocked by normalization? | 18:08 | ||
| aruniecrisps | @lizmat it's a bit more complicated than that, because in a lot of South Asian scripts you can actually combine consonants together graphically and the additional consonants themselves become either diacritics or they change the shape of the base consonant, the most appropriate function in a lot of these cases isn't nomark, it would be a function that only removes the final mark at the end of a cluster | 18:09 | |
| Which usually is a vowel mark | |||
| lizmat | understood.... I just wanted to mention nomark as a quick way of finding out if something would need to be done | 18:10 | |
| aruniecrisps | @nemokosch you can't just use S/// to replace the vowel because that grapheme is normalized | ||
| lizmat | if .nomark returns the same as the invocant, there are no diacritics | ||
| which could allow you to shortcut maybe | 18:11 | ||
| aruniecrisps | The character that is counted as the last character is the base consonant which we don't want at all | ||
| nemokosch | if the word ends with a vowel, how can the last character be a consonant? | 18:12 | |
| aruniecrisps | Because the vowel isn't represented by a separate vowel character like it is in Latin or Cyrillic | 18:13 | |
| It's a diacritic | |||
| One second I'm on phone I'll make this easier to understand | |||
| nemokosch | ooo 🤯 | 18:16 | |
| ShimmerFairy | Out of curiosity, have you tried working on this problem with the latest Rakudo versions? Now that it's been updated to the latest Unicode, it has Unicode's new rules for Indic grapheme clusters and such, so I wonder if that's changed how you use NFG strings for the task. | ||
| aruniecrisps | Okay so let's take the letter க (the tamil letter ka) as an example; it's not a consonant or a vowel, it's a syllable. it works kind of like how it does in katakana in that this letter is a consonant and a vowel | 18:18 | |
| this in particular is the sound 'cu' in cut | |||
| but if i wanted to make this syllable the ki in kit, i would have to modify it so that it looks like this கி | |||
| and if i just wanted the k consonant, i add a dot on top of it like this: க் | 18:19 | ||
| nemokosch | yes, this is quite incompatible with a grapheme-based approach | 18:20 | |
| aruniecrisps | but according to Unicode these are all just variants of the letter க so i can't just replace the last vowel like i could in other languages | ||
| nemokosch | now I get it | 18:21 | |
| it's not necessarily useful to just say that கி is one letter | 18:22 | ||
| aruniecrisps | @shimmerfairy I have used the latest Raku versions, and they do solve a bunch of problems for me; in particular, I can just ask uniprops whether the last codepoint is a vowel killer ('Virama' according to Unicode) if I flatten the string to a bunch of codepoints. This solves the problem of manually checking the code point against a set of all the South Asian languages' Viramas | 18:23 | |
| @nemokosch well the thing is கி is one letter according to Tamil; we count things by grapheme as well | 18:24 | ||
| we count characters as graphemes | |||
| so Raku is correct for doing this | |||
| lizmat | m: dd "கி".chars | 18:25 | |
| camelia | 1 | ||
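Since Python strings are plain codepoint sequences, they make the layering visible: what Raku's `.chars` counts as one grapheme is two codepoints underneath. A quick check:

```python
import unicodedata

# கி ("ki") is one grapheme to Raku (.chars == 1) and to Tamil
# readers, but at the codepoint level it is KA plus a vowel-sign
# diacritic -- which is why "replace the last vowel" is awkward.

ki = "\u0b95\u0bbf"    # க + ி
assert len(ki) == 2    # two codepoints
assert unicodedata.name(ki[0]) == "TAMIL LETTER KA"
assert unicodedata.name(ki[1]) == "TAMIL VOWEL SIGN I"
```

Both views are "correct"; the friction comes from wanting to edit at the codepoint level while the string type works at the grapheme level.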
| nemokosch | sounds like a rather conflicted reasoning | ||
| Raku is less wrong than Tamil | |||
| this is what I gather | |||
| aruniecrisps | they're both correct | 18:26 | |
| the thing is Raku has really nice defaults for Unicode, but the problem is that actually manipulating those graphemes without anything more than nomark is a headache | 18:27 | ||
| partially due to unicode's handling of these languages | |||
| lizmat | so would making .Uni more like strings help ? | 18:28 | |
| especially wrt .subst ? | |||
| aruniecrisps | honestly it probably would help | 18:30 | |
| lizmat | so what would a typical needle be ? | ||
| aruniecrisps | like this is my current code for checking if a word using an South Asian abugida ends in a consonant: | ||
| lizmat | (as opposed to the haystack) | ||
| aruniecrisps | sub ends-in-consonant(Str $s) is export { $s.comb».uniprops('InSC').flat.tail eq 'Virama' } | 18:31 | |
| multi lastmark(Str $s where .NFD.codes > 1) is export { $s.NFKD.list[ - 1].chrs } multi lastmark(Str $s) { '' } | |||
| oops | |||
| multi lastmark(Str $s where *.NFD.codes > 1) is export { $s.NFKD.list[* - 1].chrs } multi lastmark(Str $s) { '' } | |||
| nemokosch | markdown for the one | ||
| win, even | 18:32 | ||
| lizmat | $s.substr(*-1) would give you the last char | ||
| wouldn't that help ? | |||
| nemokosch | the vowel would at the very least be a part of the last character, no? | 18:33 | |
| lizmat | $s.substr(*-1).uniprops("InSC") eq 'Virama' ? | ||
| aruniecrisps | for ends-in-consonant or lastmark? | ||
| lizmat | ends-in-consonant | ||
| aruniecrisps | $s.substr(* - 1) would get us something like க், and uniprops on that wouldn't equal Virama | 18:34 | |
| it would equal (Consonant Virama) | 18:35 | ||
| lizmat | $s.substr(*-1).uniprops("InSC").tail eq 'Virama' ? | ||
| nemokosch | that should be good news, no? 😅 | ||
| aruniecrisps | that indeed does work | 18:36 | |
| thanks for the help liz | |||
| lizmat | yw | 18:37 | |
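A rough Python rendition of the check that was just worked out. Python's `unicodedata` does not expose the Indic_Syllabic_Category ('InSC') property, so as an illustrative stand-in this sketch tests whether the last codepoint's name ends in VIRAMA — adequate for the scripts discussed, not a general replacement for the Raku `.uniprops('InSC')` version:

```python
import unicodedata

# Stand-in for $s.substr(*-1).uniprops("InSC").tail eq 'Virama':
# check the Unicode name of the final codepoint. Virama codepoints
# such as U+0BCD (TAMIL SIGN VIRAMA) all have names ending "VIRAMA".

def ends_in_consonant(s: str) -> bool:
    return bool(s) and unicodedata.name(s[-1], "").endswith("VIRAMA")

assert ends_in_consonant("\u0b95\u0bcd")        # க் = KA + virama
assert not ends_in_consonant("\u0b95\u0bbf")    # கி = KA + vowel sign
assert not ends_in_consonant("")
```

Name-matching is a heuristic; a proper port would consult the UCD's IndicSyllabicCategory.txt the way Raku's uniprops does.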
| ShimmerFairy | lizmat: fwiw I do think we need first-class support for working with strings of text at levels other than graphemes (e.g. wanting to write a Grammar for a file format where you want users to specify unnormalized strings), along with better support for Unicode properties and whatever else in the standard is useful to string processing. | 18:46 | |
| lizmat | would that need to live in core initially ? | ||
| ShimmerFairy | That being said, I'm under no illusions that going over Raku's Unicode support like that would be a huge undertaking, so I'm not expecting it to be solved anytime soon. (Though I should at least get back to that review of the predefined Grammar rules.) | 18:47 | |
| nemokosch | the other thing is (probably also a huge undertaking) that Raku's string processing is just too slow for anything you wouldn't want to read in one go | 18:48 | |
| ShimmerFairy | lizmat: I think it would be possible, for the most part, to write a module implementing 'StrV2' and whatnot. The only hard parts would be Grammar support (e.g. `$my-Uni-str ~~ /<:L>+ <:M>/`) and possibly IO stuff. | 18:51 | |
| MoarVM/NQP's handling of Unicode properties isn't the best (which would impact any proposed features needing properties), but a module could easily parse the UCD and set up its own kind of uniprop for that. | 18:53 | ||
| lizmat | I guess... but perhaps having .subst("foo".NFC, "bar".NFC) work would already be something really useful for | ||
| aruniecrisps | |||
| ShimmerFairy | Yeah, non-regexy .subst variants ought to be just as doable without underlying Grammar support | 18:54 | |
| nemokosch | the first parameter of .subst is (usually) a regex, though, right? | ||
| lizmat | well.. if you *can* specify it as a string, it's *much* faster | ||
| nemokosch | sounds like a worthy special case | 18:55 | |
| lizmat | that's why it is implemented | ||
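For reference, the string fast path lizmat mentions is just .subst with a Str first argument, which skips the regex machinery entirely (a minimal illustration, separate from the NFC proposal):

```raku
my $s = "foo bar foo";
say $s.subst("foo", "baz");      # Str needle: replaces the first occurrence
say $s.subst("foo", "baz", :g);  # :g replaces every occurrence
```

This prints `baz bar foo` and then `baz bar baz`.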
| ShimmerFairy | Oh yeah, I suppose string literals would be another issue a module can't easily solve. Even if you could make up your own Q adverbs for custom string type literals, you'd still need the underlying Raku compiler to hand over the codepoints untouched. | 18:56 | |
| nemokosch | it's tempting to say: it could be a different quotation | ||
| ShimmerFairy | But overall, at first thought a module exploring ideas should be able to illustrate/work out a hefty chunk of the potential redesign, if not most of it. | 18:57 | |
| nemokosch | I think as a different quotation, a slang could already do that | ||
| although I have no idea how low it would have to dig to replicate the parsing logic | 18:58 | ||
| ShimmerFairy | Like I said though, you'd still need to make sure Raku/NQP doesn't normalize the source code before it gets to you, so that you can manipulate it yourself. (One thing I've realized lately is that Raku ironically should probably not be parsing source as NFG text, for issues like this.) | 18:59 | |
| nemokosch | yeah... I for one don't know the order of things during parsing well enough to know whether a slang comes in too late or just in time to avoid normalisation | 19:00 | |
| ShimmerFairy | But perhaps "unnormalized string literals" are a low-priority feature in the grand scheme. Have to check what, say, C++ has to say about this sort of thing. | ||
| lizmat | as long as it's valid NFG, the closing quote handling could probably re-encode it to whatever we want | 19:01 | |
| nemokosch | the problem is, it won't round trip | ||
| so I guess you get the NFC version or something, at best | 19:02 | ||
| lizmat | why? It would be just a case of adapting NFC.raku | ||
| m: dd "foo".NFC | |||
| camelia | uint32 = Uni.new(0x0066, 0x006f, 0x006f).NFC | ||
| lizmat | so it wouldn't return that, but something like §foo§ where § as quoting character would indicate NFC | 19:03 | |
| nemokosch | my point is that this isn't "unnormalized string" | ||
| it's just "less normalized" | |||
| lizmat | they're valid codepoints, so not just any ints | 19:05 | |
| ShimmerFairy | For a reimplementation of the standard normalization forms ('Module::NFCv2' etc.), the fact that it gets normalized beforehand doesn't matter. But it would be an issue if you wanted a Uni string literal specifically, since you will lose details when normalizing. (Anything with an NFC_QC property of "No", specifically.) | 19:09 | |
| nemokosch | so my understanding is: NFG <-> NFC is bijective for our purposes | 19:10 | |
| lizmat | ok, so you're saying it is impossible to create all possible .NFC from a string (in NFG) ? | ||
| nemokosch | any valid NFC sequence can be turned into one unambiguous sequence of NFG and vice versa | 19:11 | |
| lizmat | so that some NFC's would need to be created "manually" ? | ||
| nemokosch | however, NFC itself is "lossy" | ||
| lizmat | but I think ShimmerFairy is stating that NFG is lossy ? | ||
| or am I misunderstanding ? | 19:12 | ||
| ShimmerFairy | m: say "\c[GREEK QUESTION MARK]".uniname # An example of a codepoint that gets lost in NFC | ||
| camelia | SEMICOLON | ||
| nemokosch | so it gets lost in NFC - I think that's understood | ||
| lizmat | right, so technically it's possible to create a quoting slang that would handle NFC and friends | 19:13 | |
| nemokosch | yes, NFC-normalized codepoint-strings should be possible imo | ||
| it's just... they aren't so interesting because you could retrieve NFC codepoints of a string at any time | 19:15 | ||
| so it's not a big win | |||
| ShimmerFairy | Refreshing myself on the NFs, it seems that all the normalization forms are fine if your text was NFC'd first. The table under Goal 1 here shows equivalent transformations; anything with an inner "toNFC(x)" is relevant here: www.unicode.org/reports/tr15/#Design_Goals | 19:16 | |
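A small demonstration of the lossiness under discussion, reusing ShimmerFairy's U+037E example (a sketch; it relies on Uni carrying raw codepoints until .Str normalizes them):

```raku
my $u = Uni.new(0x037E);    # GREEK QUESTION MARK, kept verbatim in a Uni
say $u.list.map(*.base(16)); # the raw codepoint, 37E, is still there
say $u.Str.ord.base(16);     # converting to Str applies NFG/NFC: 3B
say $u.Str.uniname;          # SEMICOLON — the original codepoint is gone
```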
| nemokosch | it's time to teach people to always write in NFC /jk | 19:30 | |
|
19:39
hvxgr_ left
19:40
ds7832 joined
19:42
Aedil left
|
|||
| ds7832 | Just stumbled upon this: As expected, comparing two Lists works by element-wise comparison. However, comparing two different Seqs works by comparing their stringifications. Is this behavior of Seq intended? | 20:05 | |
| m: say (10, 5) cmp (7,6) | 20:06 | ||
| camelia | More | ||
| ds7832 | m: say (10, 5).Seq cmp (7,6).Seq | ||
| camelia | Less | ||
| ds7832 | or even if only one of them is a Seq: | ||
| m: say (10, 5) cmp (7,6).Seq | |||
| camelia | Less | ||
| ds7832 | m: say (1, "5 6") cmp (1, 5, 6).Seq | 20:07 | |
| camelia | Same | ||
| lizmat | I'd say that's a good catch | 20:09 | |
| timo | yeah that doesn't seem right indeed | 20:10 | |
| lizmat | perhaps multi sub infix:<cmp>(List:D \a, List:D \b) { | 20:11 | |
| should really be | |||
| aruniecrisps | @shimmerfairy @lizmat i would generally agree that having extra utility functions for handling strings not at the grapheme level might help a lot more, and making Regexes work better with codepoints would help a lot with string substitution algos, but ultimately it's up to you guys | ||
| lizmat | multi sub infix:<cmp>(Iterable:D \a, Iterable:D \b) { | 20:12 | |
| ds7832: could you make a Rakudo issue for that ? | 20:13 | ||
| ds7832 | yes I'll open an issue :) | 20:14 | |
| lizmat | thanks! | ||
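Until a widened candidate like the one lizmat sketches lands, an explicit .List coercion on both sides restores element-wise comparison (a workaround based on the behaviour shown above):

```raku
say (10, 5).Seq cmp (7, 6).Seq;            # Less — stringified comparison (the bug)
say (10, 5).Seq.List cmp (7, 6).Seq.List;  # More — element-wise, matching bare Lists
```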
|
20:15
ds7832 left
20:17
ds7832 joined
20:24
DarthGandalf left
20:25
DarthGandalf joined
20:48
ds7832 left
|
|||
| oshaboy | I wanted to make a string decomposition tool that shows the individual codepoints. But I agree that most of the time you'd rather have normalization | 21:32 |
| That would still normalize it | 21:33 | ||
| Just with NFD | |||
|
21:44
smls joined
|
|||
| smls | m: say "XßX".match(/ :i ss /); | 21:54 | |
| camelia | 「ßX」 | ||
| smls | Is this a bug? | ||
| [Coke] | in that it includes the second X? seems like. | 21:55 | |
| bisectable6: say "XßX".match(/ :i ss /); | |||
| bisectable6 | [Coke], Will bisect the whole range automagically because no endpoints were provided, hang tight | ||
| [Coke], Output on all releases: gist.github.com/46d87472f00b6598e8...ac097e0464 | 21:56 | ||
| [Coke], Bisecting by output (old=2017.03 new=2017.04.3) because on both starting points the exit code is 0 | |||
| [Coke], bisect log: gist.github.com/57edbe1f0325792529...e6ae2bde00 | 21:57 | ||
| [Coke], (2017-04-14) github.com/rakudo/rakudo/commit/82...75f0bf0f7e | |||
| [Coke], Output on all releases and bisected commits: gist.github.com/390856f756b92c673a...6324f581a7 | |||
| [Coke] | Looks like it's always been that way. | ||
| (since it worked at all) | |||
| smls | m: say "ss".match(/ :i ß /); | 21:59 | |
| camelia | 「s」 | ||
| smls | Looks like the actual matching considers multi-codepoint expansions for casefolding, but the `Match` result object doesn't know about this | 22:02 | |
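The multi-codepoint expansion smls describes is visible via .fc, Raku's full Unicode case fold (illustration only; the regex engine's :i handling need not literally call .fc):

```raku
say "ß".fc;              # ss — one grapheme case-folds to two characters
say "ß".fc eq "ss".fc;   # True — why /:i ss/ can match against "ß"
```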
|
22:28
stanrifkin left
22:45
swaggboi left
22:47
swaggboi joined
23:10
Geth joined
23:16
librasteve_ left
23:40
smls left
|
|||