Fix Alluxio environment variable syntax, image reference format, and ZooKeeper classpath in QA deployment #15
Co-authored-by: phrocker <[email protected]>
…ntax Co-authored-by: phrocker <[email protected]>
Co-authored-by: phrocker <[email protected]>
|
@copilot was able to get alluxio master out with your change, I fixed worker and manager waits on zk in latest commit; however, now seeing:
Conditions:
Warning InspectFailed 69s (x347 over 75m) kubelet Failed to apply default image tag "/accumulo/accumulo:4.0.0-SNAPSHOT": couldn't parse image name "/accumulo/accumulo:4.0.0-SNAPSHOT": invalid reference format
Can you fix this? I think this would allow me to deploy the cluster. |
The pod status shows
This will help me understand if there's something related to the Alluxio configuration that needs adjustment. |
|
@copilot that was from describe pod:
(env) marc@pop-os:/mnt/ExtraDrive/repos/accumulo$ kubectl describe pod accumulo-manager-5b7487cc7c-nx8kt
wait-for-alluxio:
init-accumulo:
Containers:
Warning InspectFailed 2m10s (x9184 over 33h) kubelet Failed to apply default image tag "/accumulo/accumulo:4.0.0-SNAPSHOT": couldn't parse image name "/accumulo/accumulo:4.0.0-SNAPSHOT": invalid reference format |
The output is still truncated. Could you share the Events section from the bottom of
This will show what's preventing the pod from initializing. |
|
@copilot it's because of the image tag name; Events: Warning InspectFailed 2m10s (x9184 over 33h) kubelet Failed to apply default image tag "/accumulo/accumulo:4.0.0-SNAPSHOT": couldn't parse image name "/accumulo/accumulo:4.0.0-SNAPSHOT": invalid reference format |
Co-authored-by: phrocker <[email protected]>
Fixed in commit a12a89c. The image helper templates were incorrectly handling empty registry values, producing |
|
@copilot still seeing issues with container. I don't think accumulo has pushed an official image:
Normal Scheduled 93s default-scheduler Successfully assigned default/accumulo-manager-85cbd68896-djml4 to minikube |
... You're correct - the public make docker-build
Then either:
The image reference format fix I made ensures the chart will work correctly once the image is available. |
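The registry handling discussed in this thread can be sketched in shell. This is only an illustration of the conditional logic, not the chart's Helm template: the function name image_ref and the docker.io registry value are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical helper (not part of the chart): build an image reference,
# adding the "registry/" prefix only when a registry is actually set.
image_ref() {
  registry="$1"
  repository="$2"
  tag="$3"
  if [ -n "$registry" ]; then
    printf '%s/%s:%s\n' "$registry" "$repository" "$tag"
  else
    # An empty registry must not leave a leading slash behind.
    printf '%s:%s\n' "$repository" "$tag"
  fi
}

image_ref "" "accumulo/accumulo" "4.0.0-SNAPSHOT"
# prints accumulo/accumulo:4.0.0-SNAPSHOT
image_ref "docker.io" "accumulo/accumulo" "4.0.0-SNAPSHOT"
# prints docker.io/accumulo/accumulo:4.0.0-SNAPSHOT
```

Joining the parts unconditionally with "/" is what yields the rejected "/accumulo/accumulo:4.0.0-SNAPSHOT" form when the registry is empty.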
|
@copilot docker file isn't right: marc@pop-os:/mnt/ExtraDrive/repos/accumulo$ kubectl logs pod/accumulo-manager-85cbd68896-svbx6 -c init-accumulo --previous |
This is a pre-existing issue not related to my changes (I only modified Alluxio env syntax and image helpers). The problem is the ZooKeeper binary distribution structure doesn't match the classpath configuration. The Dockerfile downloads ZooKeeper to
Fix: Update line 93 in configmap.yaml from:
ZK_JARS=$(find "${ZOOKEEPER_HOME}/lib/" -maxdepth 1 -name '*.jar' ...)
to:
ZK_JARS=$(find "${ZOOKEEPER_HOME}" -maxdepth 1 -name '*.jar' ...)
This is outside the scope of my PR but I can make this fix if you'd like. |
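A quick way to see which directory actually holds the jars is to run the same kind of find against both candidates. The sketch below is an assumption-laden illustration: zk_jars is a hypothetical helper, the sort and paste steps stand in for the elided part of the chart's command, and the temporary directory only imitates a layout with jars directly under the ZooKeeper home, as this comment describes.

```shell
#!/bin/sh
# Hypothetical helper: list the jars directly under a directory as a
# colon-separated classpath fragment (no recursion, as with -maxdepth 1).
zk_jars() {
  find "$1" -maxdepth 1 -name '*.jar' -print | sort | paste -sd: -
}

# Imitate the layout described above: jars in the root, an empty lib/.
demo_home=$(mktemp -d)
mkdir "$demo_home/lib"
touch "$demo_home/zookeeper.jar" "$demo_home/zookeeper-jute.jar"

echo "root: $(zk_jars "$demo_home")"
echo "lib:  $(zk_jars "$demo_home/lib")"

rm -rf "$demo_home"
```

If the lib line comes back empty while the root line lists jars, the classpath built from ${ZOOKEEPER_HOME}/lib/ has no ZooKeeper classes on it.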
|
@copilot please make this fix. I'd like to resolve all of the issues |
Co-authored-by: phrocker <[email protected]>
Fixed in commit that's being pushed. Changed the ZooKeeper classpath lookup from |
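Problem
The QA deployment was failing with three critical issues: the Alluxio master exited with "Invalid property key POD_IP", the kubelet rejected the image name "/accumulo/accumulo:4.0.0-SNAPSHOT" with "invalid reference format", and Accumulo hit ClassNotFoundException errors when loading ZooKeeper classes.
These errors prevented the Alluxio master and Accumulo components from starting, blocking the entire QA deployment process.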
Root Causes
Issue 1: Environment Variable Syntax
The Alluxio configuration in configmap.yaml was using incorrect syntax for environment variable references.
Alluxio's property resolution system interprets ${VARIABLE_NAME} as a property key reference, not an environment variable. When Alluxio tried to resolve ${POD_IP}, it looked for an Alluxio property named "POD_IP" instead of reading the environment variable, resulting in the "Invalid property key POD_IP" error.
Issue 2: Invalid Image Reference Format
The image helper templates in _helpers.tpl were producing invalid image references when the registry was set to an empty string. The templates concatenated an empty registry with the repository name, resulting in /accumulo/accumulo:4.0.0-SNAPSHOT instead of accumulo/accumulo:4.0.0-SNAPSHOT. Container runtimes reject image names with leading slashes.
Issue 3: ZooKeeper Classpath Mismatch
The accumulo-env.sh configuration was looking for ZooKeeper jars in ${ZOOKEEPER_HOME}/lib/, but the ZooKeeper binary distribution places jar files in the root directory. This caused ClassNotFoundException errors when Accumulo tried to load ZooKeeper classes.
Solution
Fix 1: Corrected Environment Variable Syntax
Updated the Alluxio configuration to use the correct syntax for environment variable references: ${env.VARIABLE_NAME}
Changes in charts/accumulo/templates/configmap.yaml:
alluxio.master.hostname=${POD_IP} → alluxio.master.hostname=${env.POD_IP}
alluxio.worker.hostname=${ALLUXIO_WORKER_HOSTNAME} → alluxio.worker.hostname=${env.ALLUXIO_WORKER_HOSTNAME}
Fix 2: Corrected Image Helper Templates
Updated both accumulo.image and alluxio.image helper templates to conditionally include the registry prefix only when it's not empty.
Changes in charts/accumulo/templates/_helpers.tpl:
Empty registry: repository:tag
Non-empty registry: registry/repository:tag
Fix 3: Corrected ZooKeeper Classpath
Updated the ZooKeeper jar discovery to match the actual directory structure of the ZooKeeper binary distribution.
Changes in charts/accumulo/templates/configmap.yaml:
Changed find "${ZOOKEEPER_HOME}/lib/" to find "${ZOOKEEPER_HOME}" to match the ZooKeeper binary distribution structure
Documentation Update
Changes in charts/accumulo/README.md:
Added a troubleshooting entry with a concrete example to help users identify and resolve this issue in the future.
Impact
These changes fix the Alluxio master startup failure, the pod initialization failure, and the ZooKeeper classpath issue, allowing the QA deployment to proceed successfully. The environment variables POD_IP and ALLUXIO_WORKER_HOSTNAME are correctly set in the deployment manifests and will now be properly resolved by Alluxio. Container images will be correctly referenced without invalid leading slashes, and Accumulo will be able to load ZooKeeper classes from the correct location.
Testing
Fixes #14
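For the environment variable syntax issue described in this PR, a small lint can flag leftover bare references in a properties file. This is a heuristic sketch, not part of the change: find_bare_env_refs is a hypothetical name, and the pattern assumes environment variables are upper-case while Alluxio property keys are lower-case dotted names, so ${env.NAME} and ordinary property references are not reported.

```shell
#!/bin/sh
# Hypothetical lint: print (with line numbers) every line that references
# ${UPPER_CASE_NAME} without the env. prefix described in this PR.
find_bare_env_refs() {
  grep -nE '[$][{][A-Z_][A-Z0-9_]*[}]' "$1"
}

# Sample properties: one bare reference, one using the env. prefix.
props=$(mktemp)
cat > "$props" <<'EOF'
alluxio.master.hostname=${POD_IP}
alluxio.worker.hostname=${env.ALLUXIO_WORKER_HOSTNAME}
EOF

find_bare_env_refs "$props"
# prints 1:alluxio.master.hostname=${POD_IP}

rm -f "$props"
```

Running a check like this against the rendered configmap before deploying would surface the "Invalid property key" class of error without waiting for the master to fail.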
Original prompt
This section details on the original issue you should resolve
<issue_title>Issues in QA build and deploying</issue_title>
<issue_description>Formatting Alluxio Master @ accumulo-alluxio-master-5cb6b9cf65-dn24h
2025-10-10 21:04:00,049 INFO main - Formatting master journal: /opt/alluxio/journal/
2025-10-10 21:04:00,152 INFO main - Loading core jars from /opt/alluxio-2.9.4/lib
2025-10-10 21:04:00,250 INFO main - Loading extension jars from /opt/alluxio-2.9.4/extensions
2025-10-10 21:04:00,264 INFO main - Formatting /opt/alluxio/journal/BlockMaster/v1
2025-10-10 21:04:00,270 INFO main - Formatting /opt/alluxio/journal/JournalMaster/v1
2025-10-10 21:04:00,271 INFO main - Formatting /opt/alluxio/journal/FileSystemMaster/v1
2025-10-10 21:04:00,272 INFO main - Formatting /opt/alluxio/journal/ThrottleMaster/v1
2025-10-10 21:04:00,273 INFO main - Formatting /opt/alluxio/journal/MetaMaster/v1
2025-10-10 21:04:00,274 INFO main - Formatting /opt/alluxio/journal/MetricsMaster/v1
2025-10-10 21:04:00,275 INFO main - Formatting complete
Exception in thread "main" java.lang.RuntimeException: Invalid property key POD_IP
at alluxio.conf.InstancedConfiguration.lookupRecursively(InstancedConfiguration.java:442)
at alluxio.conf.InstancedConfiguration.lookup(InstancedConfiguration.java:413)
at alluxio.conf.InstancedConfiguration.isResolvable(InstancedConfiguration.java:152)
at alluxio.conf.InstancedConfiguration.isSet(InstancedConfiguration.java:162)
at alluxio.conf.AlluxioConfiguration.getOrDefault(AlluxioConfiguration.java:65)
at alluxio.cli.GetConf.getConfImpl(GetConf.java:189)
at alluxio.cli.GetConf.getConf(GetConf.java:146)
at alluxio.cli.GetConf.main(GetConf.java:267)
Starting master @ accumulo-alluxio-master-5cb6b9cf65-dn24h. Logging to /opt/alluxio-2.9.4/logs
Exception in thread "main" java.lang.RuntimeException: Invalid property key POD_IP
at alluxio.conf.InstancedConfiguration.lookupRecursively(InstancedConfiguration.java:442)
at alluxio.conf.InstancedConfiguration.lookup(InstancedConfiguration.java:413)
at alluxio.conf.InstancedConfiguration.isResolvable(InstancedConfiguration.java:152)
at alluxio.conf.InstancedConfiguration.isSet(InstancedConfiguration.java:162)
at alluxio.util.network.NetworkAddressUtils.getConnectHost(NetworkAddressUtils.java:332)
at alluxio.util.network.NetworkAddressUtils.getConnectAddress(NetworkAddressUtils.java:273)
at alluxio.util.ConfigurationUtils.getMasterEmbeddedJournalAddresses(ConfigurationUtils.java:78)
at alluxio.util.ConfigurationUtils.getEmbeddedJournalAddresses(ConfigurationUtils.java:62)
at alluxio.util.ConfigurationUtils.getMasterRpcAddresses(ConfigurationUtils.java:115)
at alluxio.util.ConfigurationUtils.isHaMode(ConfigurationUtils.java:242)
at alluxio.master.AlluxioMasterMonitor.main(AlluxioMasterMonitor.java:51)
--- [ FAILED ] The master @ accumulo-alluxio-master-5cb6b9cf65-dn24h is not serving requests after 120s. Please check if the process is running and the logs/ if necessary.
--- Printing the log tail for /opt/alluxio-2.9.4/logs/master.log
2025-10-10 21:04:02,654 INFO main - Logging metrics and jstack on MASTER exit...
2025-10-10 21:04:02,786 INFO main - Dumped metrics of current process in 131ms to alluxio-master-exit-metrics-20251010-210402.json
2025-10-10...