[Coke] I have an agent running here, can see a single agent under "recent builds" in the UI, but it doesn't appear to be doing anything. 00:00
melezhik . 08:07
tellable6 2025-11-23T23:59:13Z #raku-dev <[Coke]> melezhik - agent instructions say to use 127.0.0.1 in the browser, but here that fails and I need literal 'localhost'
melezhik [Coke]: o10r seems to have stopped yesterday, I am restarting it now. I guess you need to restart your agent and it will start receiving payloads. Localhost or 127.0.0.1, I don't know why, but on some OSes one needs to choose one or the other. 127.0.0.1 usually works for me. Technically the container forwards port 4000 to port 4000 on the default host network interface 08:10
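For context, the port forwarding melezhik describes would normally be set up when the agent container is started; a minimal sketch, where the container name and AGENT_IMAGE are placeholders rather than the documented invocation:

    # publish the agent UI port 4000 on the host's default network interface
    docker run -d --name brownie-agent -p 4000:4000 AGENT_IMAGE
    # the UI should then be reachable from the host at http://localhost:4000/
    # (or http://127.0.0.1:4000/, depending on how the OS resolves localhost)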
. 08:14
disbot5 <melezhik.> ^^ ab5tract: 09:42
[Tux] Rakudo v2025.11-29-g3e9f25272 (v6.d) on MoarVM 2025.11-3-g17b39b0c9
csv-ip5xs 0.265 - 0.270
csv-ip5xs-20 1.121 - 1.139
csv-parser 1.100 - 1.107
csv-test-xs-20 0.115 - 0.116
test 1.880 - 1.885
test-t 0.451 - 0.453
test-t --race 0.281 - 0.283
test-t-20 5.669 - 5.795
test-t-20 --race 1.415 - 1.430
csv-test-xs 0.014 - 0.014 10:19
tux.nl/Talks/CSV6/speed4-20.html / tux.nl/Talks/CSV6/speed4.html tux.nl/Talks/CSV6/speed.log
[Coke] restarted, getting "jobs-run-cnt: 0, max-threads: 4" 13:26
every 5 s
BUILD SUMMARY STATE: FAILED 13:27
melezhik [Coke]: ab5tract o10r is ready to distribute jobs 14:54
[Coke] running agent. again, looks like getting 0 jobs every 5s 15:15
BRW_AGENT_NAME_PREFIX=cokebot
ah, got "installing zstd"
ok, doing stuff. 15:16
melezhik++
15:16:48 :: mykrcijfhszdgubntloa.130 15:17
?
is that basically a custom GUID?
melezhik_ ping from agent: cokebot-42925061, version: 0.0.20, jobs-run-cnt: 4, max-threads: 4 15:43
[Coke]: your agent has 4 threads, and they are all busy right now
depending on your CPU availability you may want to increase the max threads number
on the recent builds page of your agent UI you should see running agent.job jobs 15:44
agent.job_number to be accurate 15:45
"ah, got "installing zstd"" - yep agent first install rakudo from whatever able and then it starts spawning agent.job_number jobs 15:46
melezhik_ did you stop your agent? I don't see any pings from it 16:08
[Coke] nope, it's still running 16:15
seeing a lot of "skip job: agent is busy"
I don't think it's telling me *what* it's running, but I see 4 copies of sparky-runner.raku
I assume that while I'm busy, I don't reach back with a heartbeat or anything? 16:16
(might want a heartbeat that is much less frequent so you know I'm not dead?)
melezhik_ ok, is that agent name printed in your main agent job? 16:29
[Coke] Looks like localhost:4000/ is now dead 16:35
the docker container is still running.
2025-11-24T16:29:44.670395Z --- sparkyd: parse sparky job yaml config from: /root/.sparky/projects/agent/sparky.yaml
2025-11-24T16:29:44.675814Z --- [agent] neither crontab nor scm setup found, consider manual start, SKIP ...
lizmat timo: almost got the "call original token if mixed in token doesn't match" logic working: gist.github.com/lizmat/eb3786140bc...d804dbcedc 16:38
problem is that a multi method is not installed properly in the grammar, and the non-multi *is* installed, but then nextcallee logic fails 16:39
melezhik_ if docker container is running, then localhost:4000/ should be accessible 16:45
if you bash into the container, what do you get from `tail -f ~/.sparky/sparky-web.log`? 16:46
and `ps uax|grep sparky-web` also, does it give anything ? 16:47
it could be the case that the container does not have enough RAM, and then the kernel kills some process via OOM 16:48
but it's hard to say anything detailed
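A concrete way to run those two checks from the host, assuming the running container is named brownie-agent (a placeholder name):

    # open a shell inside the agent container
    docker exec -it brownie-agent bash
    # inside the container: follow the sparky web server log ...
    tail -f ~/.sparky/sparky-web.log
    # ... and check whether the web server process is still alive
    ps aux | grep sparky-web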
librasteve_ notable6: weekly
tellable6 librasteve_, I'll pass your message to notable6
librasteve_ hmmm - maybe I am holding it wrong
melezhik_ last heartbeat from cokebot-42925061 was 1 hour and 17 minutes ago ... 16:49
"but I see 4 copies of sparky-runner.raku" - this just means you have ( had ) 4 agent.job jobs executed in parallel which is correct 16:50
[Coke] sparky-web isn't there. 17:06
lizmat ugexe tonyo I've uploaded a Sigil::Nogil 1.2 to fez, but it still hasn't shown up on zef
[Coke] there is one that starts: "/bin/sh -c cd /opt/sparky && sparman"
lizmat after 4+ hours 17:06
[Coke] put-job-file: create job file - /root/.sparky/projects/agent.job_1763998057/.files/hscxdmetkrfgaubpnwio.130/Concurrent::File::Find.log.txt ... 17:07
ws: send data to client: 20 lines
ws: send data to client: 28 lines
^^ last 3 lines from that log file, which was last updated about 90 minutes ago
so my zef issue on windows 11 started with the 2025.08 release. Worked fine in 2025.05 17:08
er, it's specifically when using it with rakubrew, it's not a zef issue per se. 17:10
so my guess is that's when we pushed the changes to the wrapper scripts. 17:11
lizmat patrickb ^^ 17:19
[Coke]: I seem to recall needing to remove any .bat extension handling around that time, could that be it ? 17:20
melezhik Ok. Can I get data from the agent main job file? What does it say? 17:21
[Coke] fwiw, there is no zef.bat in that folder.
what is the "main agent job file"? 17:22
lizmat ugexe tonyo looks like the last module uploaded is 22 hours ago
melezhik And am I correct that ps aux | grep sparky-web does not produce anything ? 17:23
I mean agent job , not agent.job job
Oh, my irc is mixing up the message order 17:24
Ahh ok, finally got it, these were logs from sparky web 17:27
librasteve_ rakudoweekly.blog/2025/11/24/2025-...t-calling/ 17:45
lizmat librasteve++ 17:52
ab5tract m: role R[::T] { my class A is Array[T] {}; has A $.a = A.new }; dd R[Int].new.a.HOW 18:04
camelia Perl6::Metamodel::ClassHOW+{<anon>}+{<anon>}.new
ab5tract with RakuAST, we get the following error: # Type check failed in assignment to $!a; expected R::A[Int] but got R::A (Array[T].new())
So it gets the right type for the container but gets the wrong type for the initializer 18:05
🙃
:clown_shoes:
clown shoes is how I feel, not a statement on the code or the problem 18:06
nine: I wonder if you have any thoughts given the above? 18:07
melezhik_ . 19:00
[Coke] melezhik: just restarted the agent 19:04
melezhik Ok. It’d be interesting to see logs
[Coke] which? 19:06
melezhik_ ah, ok, sorry, thought this was from ab5tract ) 19:12
ping from agent: cokebot-26318746, version: 0.0.20, jobs-run-cnt: 4, max-threads: 4
do see pings now
melezhik_ reports arriving as well - brw.sparrowhub.io/report/cokebot-26...report/341 19:19
melezhik [Coke]: are there any red builds now on the recent builds page? 19:32
Oh, reports started coming again 19:33
Anyway it'd be interesting to know if there are any red ones now, or if they are all green …
It's still suspicious that there are not many reports from coke-agent, but it's hard to say what's going on without the recent builds info 19:37
Ok. Now I see that coke-agent does not send any reports, only pings, similar to the issue we had before with ab5tract's agent. Again it'd be extremely valuable to 1) get a screenshot of the recent builds page ( I expect many red ones there ) 2) go to any agent.job_number job and get the full report via the UI 19:46
And btw I do see that agent now has some jobs usercontent.irccloud-cdn.com/file/...013689.JPG 19:50
But they just fail to send results back to o10r 19:51
Technically the code either stopped before this line - github.com/melezhik/brownie/blob/1...wfile#L232 - or the data failed to transfer over the network 19:52
[Coke] melezhik: no, there is now, as there was before, a single "last build" still in running state 19:55
ab5tract melezhik: I think reporting on this should be somewhat automatic 19:56
at least a script that can upload the logs to the o10r or something
[Coke] here's the full dump of the log from the UI: gist.github.com/coke/12285e3c11644...1dc648f0bf 19:57
melezhik Ok, this is the main agent job, thanks; can you please provide a full report from any failed agent.job job from the recent builds page? 19:58
[Coke] the docker run output ended with "BUILD SUMMARY" "STATE: FAILED" "PROJECT: agent.job_1764013995" "CONFIG: { }"
I have never seen one of those jobs.
melezhik You should see them on recent builds page
If you go to UI 19:59
[Coke] at localhost:4000, under projects, single project "agent" last build 1 state "running"
melezhik And click on “recent builds” link on the top
“Recent builds”
[Coke] ahhh
that's confusing.
4 running, many failed, 8 succeeded, 1 running 20:00
(where the one last running is the agent itself again)
docker output now includes a bunch of 2025-11-24T20:00:19.787148Z --- [agent] neither crontab nor scm setup found, consider manual start, SKIP ... 20:01
melezhik 127.0.0.1:4000/builds
Please don't look at the docker output, use the UI
[Coke] again, 127.0.0.1 does not work here, and I think that's the problem.
because in the "running" job, I see: >>> send request: PUT job file to 127.0.0.1:4000/file/project/agent.j..._ಠ.log.txt 20:02
melezhik Do you use a laptop or a phone? On a phone browser the top menu is not shown
[Coke] >>> (599 recieved) http retry: #01
>>> (599 recieved) http retry: #02
>>> (599 recieved) http retry: #03
also 19:59:32 :: Cannot create a Zef::Distribution from non-existent path: /tmp/.zef.1764014365.76134/Acme%3A%3Aಠ_ಠ%3Aver%3C0.0.1%3E%3Aauth%3Ccpan%3AELIZABETH%3E.tar.gz/Acme-\340\262\240_\340\262\240-0.0.1/META6.json
gist.github.com/coke/5f7bc95bf121f...246fb03c53
melezhik Ok, this is what I needed 20:03
[Coke] If it helps: I'm running this on my mac.
melezhik >>> send request: PUT job file to 127.0.0.1:4000 20:04
[Coke] I assumed that if I'm in docker, I'm good. (I did see some instances of "rosetta" running somewhere)
melezhik This is not going to work on your env ))
[Coke] that's from inside the container, right? let me try a curl from inside. 20:05
melezhik Looks like 127.0.0.1 is not resolved from within your docker container
And I have no idea why ))
This explains why reports are not sent back to o10r, as the test job fails in the end and never reaches the mentioned code line 20:06
Line of code
Hold on , it’s not that simple 20:08
[Coke] curl to 127.0.0.1:4000/ and localhost:4000/ both fail from inside the container. 20:10
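For reference, that check can be repeated non-interactively from the host; the container name brownie-agent is a placeholder:

    # ask the agent UI for a status code from inside the container
    docker exec brownie-agent curl -sS -o /dev/null -w '%{http_code}\n' http://127.0.0.1:4000/
    # a connection error (or 000) here means nothing is serving port 4000 inside
    # the container, which matches the 599 retries in the job log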
melezhik I guess it's just that at some point http requests from within docker to the agent UI itself (127.0.0.1:4000) stop working, although they succeed for a while; it's probably some docker related weirdness, I don't know what to say … 20:10
[Coke] ps auxww | grep -i sparky | wc -l # 29 lines
melezhik Yes it fails NOW
[Coke] what's the process serving out port 4000 inside the container? 20:11
melezhik But I guess it succeeded at least several time ms before
The agent UI/web server serves on 127.0.0.1:4000 within the agent
Technically this is the sparky web server 20:12
You can reach it from the host machine via localhost:4000
But the sparky job itself can't reach it from within the container via 127.0.0.1:4000, which is strange 20:13
[Coke] ... if I do a ps, what's the name of the process? 20:14
it's now not responding from my laptop.
so I'm assuming it's dead.
so I'm guessing it's not a networking issue, but something else.
melezhik Because, for example, in the beginning of the log you sent me, some requests from within the container to 127.0.0.1:4000 succeed, for example this line "send request: PUT job file to 127.0.0.1:4000/file/project/agent.j...kuenv.txt" 20:15
To find the sparky web server process: `ps aux|grep sparky-web` 20:16
599 errors mean the port is not being served
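If the image ships the usual networking tools (an assumption about the container), the same thing can be confirmed directly inside it:

    # inside the container: is anything listening on port 4000?
    ss -ltn | grep ':4000' || netstat -ltn | grep ':4000'
    # no output means there is no listener, i.e. sparky-web has died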
[Coke] ok, yup, not running
melezhik That may mean that at some point something ( kernel ? ) kills the process , maybe by OOM 20:17
[Coke] what folder is this stuff running out of in the container?
melezhik You may run dmesg or grep syslog messages from within container
Folder ? 20:18
[Coke] dmesg: read kernel buffer failed: Operation not permitted
where does "sparky-web" live in the container?
melezhik Ah ok
"/opt/sparky/bin/sparky-web" I guess 20:20
[Coke] (just found that with /proc also, thanks) 20:22
is this normal:
2025-11-24T20:20:58.014496Z --- sparkyd: parse sparky job yaml config from: /root/.sparky/projects/agent/sparky.yaml
melezhik Good
[Coke] 2025-11-24T20:20:58.032649Z --- [agent] neither crontab nor scm setup found, consider manual start, SKIP ...
getting a bunch of those while sparky-web is still running
melezhik It's ok. Those are messages from sparkyd, which is the sparky job runner, not to be confused with sparky-web 20:23
[Coke] AFK
melezhik And those messages are completely valid
So the task is to find out what kills sparky-web, and why, at some point inside the docker container 20:24
I guess maybe ab5tract had/has a similar issue with his podman agent 20:25
Also there's a log for sparky-web itself, however I doubt it will provide anything essential; it's in /root/.sparky/sparky-web.log or something 20:26
I would try to find something in dmesg or the syslog messages relevant to the killing of the Rakudo or sparky-web process, which is just a Cro application 20:27
I am not sure if this helps, but I run the same docker container on a MacBook and don't have such an issue. However I use the OrbStack container runtime. HTH 20:28
[Coke] again, dmesg doesn't work. 20:30
and the fix requires sysctl which is permission denied.
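When dmesg is blocked inside the container, the docker daemon on the host still records some of this; a sketch, with brownie-agent as a placeholder container name (note OOMKilled is only set when the container's own memory limit was hit):

    # was the container itself OOM-killed, and is it still running?
    docker inspect -f '{{.State.OOMKilled}} {{.State.Status}}' brownie-agent
    # current memory use vs. limit for the running container
    docker stats --no-stream brownie-agent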
wow, all my iterm windows just froze. 20:34
ab5tract Yeah I've had something similar happen too. I've tried every which way I could find to set resource constraints on the containers 20:48
Alas, to no avail
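For reference, explicit memory limits on the agent container would look roughly like this; the image name is a placeholder and podman accepts the same flags:

    # cap the container at 4 GiB of RAM and disallow extra swap (untested sketch)
    docker run -d --name brownie-agent -p 4000:4000 --memory=4g --memory-swap=4g AGENT_IMAGE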
[Coke] -1 on debian, I guess. 20:49