Problems running CDio 0.45.7 headless on Alpine 3.19_alpha20230901 (edge) VM #1970
-
Moved to 'discussions'; there's way too much in your post to unpack.
-
Heya,

1. You don't mention how you installed it at all: Docker? pip? unzip? tape drive?
2. If we don't know anything about how you tried to run it, then I don't know how to help you here at all.
-
In #1783 the Selenium driver options were tweaked; maybe you have an old Selenium library version?
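One quick way to see which Selenium release is actually installed in the environment CDio runs from is to ask Python's package metadata. A minimal sketch using only the standard library; the helper names are hypothetical, and the minimum version is left as a parameter because the exact release the #1783 changes need isn't stated here:

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def parse_version(raw: str) -> tuple[int, ...]:
    """Turn '4.15.2' into (4, 15, 2), stopping at the first non-numeric part."""
    parts: list[int] = []
    for piece in raw.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):  # e.g. '0rc1': keep the 0, ignore the suffix
            break
    return tuple(parts)


def installed_version(package: str) -> tuple[int, ...] | None:
    """Return the installed distribution's version, or None if it isn't installed."""
    try:
        return parse_version(version(package))
    except PackageNotFoundError:
        return None


def meets_minimum(package: str, minimum: tuple[int, ...]) -> bool:
    """True if the package is installed at `minimum` or newer (hypothetical helper)."""
    found = installed_version(package)
    return found is not None and found >= minimum


if __name__ == "__main__":
    print("selenium:", installed_version("selenium"))
```

Run it with the same interpreter the service uses; on a machine with several Python versions (3.7 and 3.11 here), checking the wrong one is an easy mistake.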
-
I'm running version 0.45.7 headless on Alpine 3.19 alpha, and modifications to `content_fetcher.py` seemed to be required to get everything running.

I first had to update Python from 3.7 to 3.11, as various things weren't able to compile (Pillow, cffi wheel errors and so on); these were resolved by unpicking older dependencies of other software. Once everything compiled, I ran `apk add chromium chromium-webdriver`, which installed Chromium and its driver at version 118.0.5993.117. Selenium is 4.15.2.

On starting CDio, I found that any attempt to use the WebDriver Chrome/JavaScript method failed, erroring immediately at the Chrome stage. There were various errors and possible red herrings, including `session not created: DevToolsActivePort file doesn't exist`, and other immediate bailouts.

CDio was also not detecting the `$WEBDRIVER_URL` environment variable, so I resorted to adding `export WEBDRIVER_URL="http://localhost:3456"` to `/etc/profile`, then ran `source /etc/profile` before restarting the service, which appeared to work. (I chose a non-default port for testing.)

I then found that Chrome was not being launched with the correct parameters, causing it to crash immediately. Example from the chromedriver log:
This went on for a while as I gradually picked my way through the errors. This server is a minimal headless Alpine VM, so it has none of the usual GUI components. In an effort to resolve this, I installed the following:

- `dbus`, `dbus-dev` and `dbus-x11`
- `xfce4` and `xfce4-terminal`
- `xvfb` and `xvfb-run`
- `jpeg-dev` and `zlib1g-dev` (for Chrome)

I also tried installing Firefox and geckodriver, but the CDio code seems to expect Chromium and its WebDriver, so I didn't pursue that further.
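If CDio ran informal prerequisite checks, as suggested later in this post, one minimal sketch is to look each expected binary up on `PATH`. The binary names below (`chromium-browser`, `chromedriver`, `dbus-daemon`, `Xvfb`) are assumptions about what the packages install on this kind of system, and `missing_prerequisites` is a hypothetical helper, not part of CDio:

```python
import os
import shutil

# Assumed binary names for the Chromium, chromedriver, dbus and Xvfb packages;
# adjust them to whatever your system actually installs.
DEFAULT_BINARIES = ("chromium-browser", "chromedriver", "dbus-daemon", "Xvfb")


def missing_prerequisites(binaries=DEFAULT_BINARIES, need_display=False):
    """Return the prerequisites that could not be found (empty list means all present).

    need_display: set True only if Chrome will run non-headless and needs an
    X server (for example one started via xvfb-run); --headless=new does not.
    """
    missing = [name for name in binaries if shutil.which(name) is None]
    if need_display and not os.environ.get("DISPLAY"):
        missing.append("$DISPLAY (no X server configured)")
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    print("missing:", ", ".join(problems) if problems else "nothing")
```

A check like this can't prove Chrome will start, but it turns a cryptic `DevToolsActivePort` failure into a readable "chromedriver not found" before any browser is spawned.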
These packages seemed to improve things, but Chrome still wasn't running properly. I then looked at how I could get Selenium to launch Chrome 'properly', as various things have changed or been permanently deprecated (disabled), as I found out while hacking about in the code.
I focused on `content_fetcher.py`, as it appeared to be doing the legwork of calling the browser. After line 613 (`options = ChromeOptions()`), I added:

Specifying `ChromeOptions.add_argument("headless")` makes no difference, as it appears to be for either a capabilities-based launch or the local webdriver (CDio uses the remote webdriver, which has a few differences; this caught me out for a while due to unfamiliarity). I tried the `--remote-debugging-pipe` method, as I'd noted some references to it fixing BiDi control, but it didn't solve my scenario. The `--disable-gpu` switch is an older switch which some say is Windows-specific and was required on older versions of Selenium, but it has been deprecated as it's no longer needed. I left it in because it wasn't breaking anything, and I might run this on Windows in future.

The `--headless` switch has recently changed to `--headless=new` (Chrome's new headless mode, which Selenium now recommends). This is discussed on Stack Overflow in comments on a reply (including comments by someone on the Selenium steering group, who offered some useful insight), but it's been poorly communicated, as older answers to similar browser problems abound online.

After line 622, because my site had a mandatory cookie banner, I added the following XPath clicker while testing (its syntax has also changed very recently, thanks Selenium); I've included the comments to myself as a reference:
I realised that I could specify this XPath filter in the Filters & Triggers section, but wasn't sure whether it was also possible to append a `.click()` command. Due to the extreme resource constraints of my VM, running any test takes upwards of a minute, so I was running out of patience debugging my issues. Could this feature be added, perhaps in an "actionable" section, to dismiss banners before screengrabs, unless it's already possible some other way?

For the sake of completeness, I also amended after line 651:
I modified
to become
as otherwise these arguments didn't seem to pass to chromedriver, causing an error when testing the WebDriver connection. (It also seems to run a crawl/capture after any modification to a check; is this entirely necessary, or could it be offered as a per-check option?)
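For reference, here is a sketch of collecting the flags discussed above and handing them to a remote Chrome session in the Selenium 4 style, where browser options go through `options=` rather than a capabilities dictionary. This is not CDio's code: `BASE_CHROME_FLAGS`, `build_chrome_flags` and `make_remote_driver` are hypothetical names, and the flag list is just the set mentioned in this post plus `--window-size`, Chrome's command-line counterpart to `set_window_size`. Selenium is imported inside the function so the flag helper works without it installed:

```python
from __future__ import annotations

# Flags mentioned in this thread, plus --window-size; a starting point, not a known-good set.
BASE_CHROME_FLAGS = [
    "--headless=new",           # Chrome's newer headless mode
    "--disable-gpu",            # legacy switch, kept because it does no harm
    "--window-size=1920,1080",  # roughly what driver.set_window_size(1920, 1080) does
]


def build_chrome_flags(extra: list[str] | None = None) -> list[str]:
    """Return the base flags followed by any user-supplied extras, without duplicates."""
    flags = list(BASE_CHROME_FLAGS)
    for flag in extra or []:
        if flag not in flags:
            flags.append(flag)
    return flags


def make_remote_driver(webdriver_url: str, extra: list[str] | None = None):
    """Open a Chrome session on a remote chromedriver/Selenium endpoint (hypothetical helper)."""
    # Imported here so build_chrome_flags() is usable without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for flag in build_chrome_flags(extra):
        options.add_argument(flag)
    # Selenium 4 takes browser options via options=; the old desired_capabilities
    # argument was removed in later 4.x releases.
    return webdriver.Remote(command_executor=webdriver_url, options=options)
```

For example, `make_remote_driver(os.environ["WEBDRIVER_URL"], extra=["--remote-debugging-pipe"])` would add the pipe flag on top of the defaults. A user-editable list like `extra` is roughly what a CDio setting for custom Chromium flags could feed into.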
I also modified line 628's `self.driver.set_window_size(1280, 720)` to `self.driver.set_window_size(1920, 1080)` without any apparent ill effects.

After these modifications, testing did appear to work cleanly. However, I noticed that any small parsing error caused by a coding error in `content_fetcher.py` (my fault, from the amateur hacking) would often leave chromedriver unable to open Chromium or pass control signals to it properly, so the spawned Chromium processes were never killed. This left several abandoned Chromium processes running and using a lot of resources, and I had to do some `pkill` tidy-up and restart chromedriver. This seemed to be due to Selenium not being told to quit the sessions properly following an exception or error; perhaps that could be done as a failsafe if the CDio/content_fetcher script errors out mid-loop?

I ended up hacking the file directly, as I could see no way of specifying arguments to pass to the webdriver. If I were more skilled with GitHub I could submit a PR, but I don't have any kind of managed code environment set up on this machine. It might be more useful if users could configure custom Chromium flags to pass to the webdriver in the CDio settings, and if CDio also did some informal checks for an X environment and other apparent prerequisites (dbus etc.) needed for Chrome to run properly on a headless machine.
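The quit-on-error failsafe can be sketched with a context manager: whatever happens inside the check, `quit()` is called on the session so chromedriver can shut down the Chromium it spawned. Here `webdriver_session` and the `factory` callable are hypothetical; `quit()` is the standard Selenium WebDriver method:

```python
from contextlib import contextmanager


@contextmanager
def webdriver_session(factory):
    """Create a driver with factory() and always quit it, even if the check raises."""
    driver = factory()
    try:
        yield driver
    finally:
        try:
            driver.quit()  # ends the session so chromedriver can close its Chromium
        except Exception:
            # A half-dead session may fail to quit; don't let that hide the original error.
            pass
```

In use, `with webdriver_session(make_driver) as driver: driver.get(url)`, with some hypothetical `make_driver()`, keeps the cleanup in one place. Swallowing errors from `quit()` itself means a broken session can't mask the exception that actually failed the check.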
Finally, there's the apparent 'quirk' of changedetection.io not exiting cleanly on the first Ctrl-C, whether running interactively or daemonised. Interactively, this forces you to press Ctrl-C again, but start-stop-daemon invoked by OpenRC is easily confused and bails with an error. Early on I had to manually kill the process and tidy up the pidfile, and a couple of times I had to issue a `zap` command (`rc-service changedetection zap`) to make OpenRC 'forget' the service was in a crashed state. There's probably a neater way of dealing with this than what I ended up using in my init file.

On the whole, things are now running well. My VM runs on a Synology NAS, so it's not very powerful, but it can manage one Chrome check alongside other plaintext checks with no issue, which is all I want it to do at the moment. Chrome takes around 40-60 seconds to spawn. It can now perform a successful check fairly reliably, but it occasionally bails, fails or times out, so I wonder whether being able to dial in higher timeout thresholds, or adjust Selenium's implicit/explicit waits, in the check options through the web GUI might help.
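Per-check timeouts could map onto two existing Selenium WebDriver calls, `set_page_load_timeout()` and `implicitly_wait()`. A minimal sketch, assuming hypothetical per-check settings: `CheckTimeouts` and `apply_timeouts` are not existing CDio options, and the default values are placeholders. Neither setting covers the time Chrome spends starting up, which happens when the session is created:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckTimeouts:
    """Hypothetical per-check settings; these are not existing CDio options."""
    page_load_seconds: float = 90.0     # placeholder value
    implicit_wait_seconds: float = 0.0  # 0 matches Selenium's default (no implicit wait)

    def __post_init__(self):
        if self.page_load_seconds <= 0:
            raise ValueError("page_load_seconds must be positive")
        if self.implicit_wait_seconds < 0:
            raise ValueError("implicit_wait_seconds must not be negative")


def apply_timeouts(driver, timeouts: CheckTimeouts) -> None:
    """Apply the settings to a Selenium WebDriver (local or Remote) session."""
    driver.set_page_load_timeout(timeouts.page_load_seconds)  # limit for driver.get()
    driver.implicitly_wait(timeouts.implicit_wait_seconds)    # polling for element lookups
```

Validating in the dataclass means a bad value typed into a GUI field is rejected before any browser is spawned, which matters when each attempt costs a minute on a small VM.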
Fantastic application though, really useful. I tried it out after failing with other similar apps/scripts that had steeper learning curves once you really got into them, and the Telegram integration was surprisingly simple once I'd followed some instructions online. I'm not sure to what extent I can include excerpts of monitored text or screengrabs in Telegram notifications; I haven't played with it much yet due to the effort required just to get even one test run out of my VM. However, it is working. I call that a win for now. 😄