-
Notifications
You must be signed in to change notification settings - Fork 2
Consolidate legacy services #719
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
📝 WalkthroughWalkthroughThe PR removes the entire Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @docker-compose-legacy.yml:
- Line 48: Replace the hardcoded traefik middleware credential value in the
docker-compose entry (traefik.http.middlewares.auth.basicauth.users) with an
environment variable reference and load that variable from .env (e.g.,
TRAEFIK_BASIC_AUTH_USERS); update the docker-compose service to use
"${TRAEFIK_BASIC_AUTH_USERS}" for traefik.http.middlewares.auth.basicauth.users
and add instructions to set TRAEFIK_BASIC_AUTH_USERS in the gitignored .env (or
secret store) instead of committing the bcrypt hash.
🧹 Nitpick comments (2)
docker-compose-legacy.yml (2)
19-24: Consider pinning image versions for reproducibility.The
swmath2swh:latestanchor and other services (ipfs/kubo at line 197, mscexplainer at line 226, mscbackend at line 235, msceval at line 244) uselatestor unpinned tags. This can lead to unexpected behavior when images are updated. The date-tagged flask image (line 24) is a better approach.
139-141: Remove deprecatedlinksdirective.The
linksdirective is a legacy feature. Services on the same Docker network can already resolve each other by hostname. Thedepends_ondirective handles startup ordering, makinglinksredundant here.Suggested fix
depends_on: - cassandra-oai - elasticsearch-oai - links: - - "cassandra-oai" - - "elasticsearch-oai" volumes:Apply the same change to
oai-provider(lines 160-161).
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docker-compose-legacy.yml
🔇 Additional comments (5)
docker-compose-legacy.yml (5)
121-121: Elasticsearch security is disabled.
xpack.security.enabled=falsedisables authentication/authorization. This appears acceptable since there are no Traefik router labels exposing this service externally. Verify this service is only accessed internally byoai-backend.
174-178: Verify intentional loss of state on restart.
OAI_STATE_PATH=/tmp/state.jsoncombined withrestart: alwaysmeans the importer loses its state and restarts from scratch on every container restart. Compare withswmath-swh(line 185, 193) which persists state to a volume. Is this the intended behavior, or should state be persisted?
186-188: Ensure required environment variables are defined.These environment variables are passed through without default values, which is a good pattern for secrets. Ensure your
.envfile or deployment environment definesSWH_API_TOKEN,SWMATH_USER_DEPOSIT_PRODUCTION, andSWMATH_PWD_DEPOSIT_PRODUCTIONto avoid runtime failures.
195-223: LGTM!Good separation between the public IPFS gateway and the authenticated admin RPC interface. The reference to upstream documentation is helpful for maintainability.
275-279: LGTM!The MTU setting of 1450 is appropriate for OpenStack/VXLAN environments where encapsulation adds overhead. This aligns with the PR objective of maintaining services on OpenStack.
docker-compose-legacy.yml
Outdated
| - traefik.http.routers.dashboard.tls.certResolver=le | ||
| - traefik.http.routers.dashboard.service=api@internal | ||
| - traefik.http.routers.dashboard.middlewares=auth | ||
| - traefik.http.middlewares.auth.basicauth.users=mardi:$$2y$$05$$Ubl1B.74bDJkpGHXZ6Y4Xuq8lSx88bi51bmE85/VYf1nQizfKKuH. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hardcoded basic auth credentials in version control.
The bcrypt-hashed password is committed to the repository. While the hash is not reversible, it's better practice to reference credentials via environment variables or external secrets management, especially for production configurations.
Suggested approach
- - traefik.http.middlewares.auth.basicauth.users=mardi:$$2y$$05$$Ubl1B.74bDJkpGHXZ6Y4Xuq8lSx88bi51bmE85/VYf1nQizfKKuH.
+ - traefik.http.middlewares.auth.basicauth.users=${TRAEFIK_BASIC_AUTH_USERS}Then define TRAEFIK_BASIC_AUTH_USERS in your .env file (which should be gitignored).
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - traefik.http.middlewares.auth.basicauth.users=mardi:$$2y$$05$$Ubl1B.74bDJkpGHXZ6Y4Xuq8lSx88bi51bmE85/VYf1nQizfKKuH. | |
| - traefik.http.middlewares.auth.basicauth.users=${TRAEFIK_BASIC_AUTH_USERS} |
🤖 Prompt for AI Agents
In @docker-compose-legacy.yml at line 48, Replace the hardcoded traefik
middleware credential value in the docker-compose entry
(traefik.http.middlewares.auth.basicauth.users) with an environment variable
reference and load that variable from .env (e.g., TRAEFIK_BASIC_AUTH_USERS);
update the docker-compose service to use "${TRAEFIK_BASIC_AUTH_USERS}" for
traefik.http.middlewares.auth.basicauth.users and add instructions to set
TRAEFIK_BASIC_AUTH_USERS in the gitignored .env (or secret store) instead of
committing the bcrypt hash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @docker-compose-legacy.yml:
- Around line 83-84: The jmxremote.password file mounted in the docker-compose
bind mount (./oaipmh/jmxremote.password) currently has overly permissive mode
644; change its permissions to 600 to restrict access by running chmod 600
./oaipmh/jmxremote.password on the host so the JMX credential file used by the
jmxremote.password bind mount is readable/writable only by the owner.
🧹 Nitpick comments (9)
docker-compose-legacy.yml (9)
1-1: Consider removing the deprecatedversionfield.The
versionkey is obsolete in modern Docker Compose. The Compose Specification no longer requires it, and Docker Compose v2+ ignores it.Suggested change
-version: '3.4' -
19-20: Avoid using:latesttags for production deployments.The
swmath2swh:latesttag (and similar ones at lines 195, 224, 233, 242) can lead to unpredictable deployments. Consider pinning to a specific version or SHA digest for reproducibility.
46-46: Hardcoded credentials in version control.The bcrypt hash for basic auth is committed to the repository. While the hash is not directly reversible, consider moving sensitive credentials to environment variables or external secrets management.
Suggested approach
- - traefik.http.middlewares.auth.basicauth.users=mardi:$$2y$$05$$Ubl1B.74bDJkpGHXZ6Y4Xuq8lSx88bi51bmE85/VYf1nQizfKKuH. + - traefik.http.middlewares.auth.basicauth.users=${TRAEFIK_BASIC_AUTH_USERS}
50-61: Portainer exposed without additional middleware protection.Portainer has its own authentication, but consider adding basic auth middleware (like Traefik dashboard) as an additional layer of defense, given it provides full Docker management capabilities.
86-94: Setup container lacks restart policy and has no volume for state.The
cassandra-oai-setupcontainer runs initialization but has no restart policy. If it fails transiently, it won't retry. Consider addingrestart: on-failureif the setup should be retried on failure.
137-139: Thelinksdirective is deprecated.The
linksoption is legacy and unnecessary. Services on the same Docker network can communicate using their service names directly. Consider removing thelinkssections.Suggested change
depends_on: - cassandra-oai - elasticsearch-oai - links: - - "cassandra-oai" - - "elasticsearch-oai" volumes:
147-148: Another hardcoded credential hash.Same concern as line 46 - consider externalizing to environment variables for easier rotation and to avoid committing secrets.
158-159: Remove deprecatedlinksdirective.Same issue as oai-backend - the
linksdirective is unnecessary.Suggested change
depends_on: - oai-backend - links: - - "oai-backend" volumes:
194-195: IPFS image lacks version tag.
ipfs/kubowithout a tag defaults to:latest, which can cause unexpected behavior during deployments. Pin to a specific version.Suggested change
ipfs: - image: ipfs/kubo + image: ipfs/kubo:v0.27.0 # or appropriate version restart: unless-stopped
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docker-compose-legacy.yml
🔇 Additional comments (6)
docker-compose-legacy.yml (6)
32-33: Docker socket mount grants container root-equivalent access.Mounting
/var/run/docker.sockgives Traefik (and any attacker who compromises it) full control over the Docker daemon. This is a known trade-off for Traefik's dynamic configuration. Consider using a Docker socket proxy liketecnativa/docker-socket-proxyto limit API exposure.
111-126: Elasticsearch security is disabled.
xpack.security.enabled=falsemeans no authentication for Elasticsearch. This is acceptable since the service has no Traefik labels and is only accessible within the Docker network. Ensure it remains unexposed externally.
171-176: State stored in/tmpwill be lost on container restart.
OAI_STATE_PATH=/tmp/state.jsonuses ephemeral storage. Withrestart: always, the container will lose its state on every restart, which the comment says is intentional ("start over on each run"). Verify this is the desired behavior for production.
223-248: MSC services use unversioned images from Docker Hub.These services (
mscexplainer,mscbackend,msceval) use:latesttags and have Watchtower auto-updates enabled. While this provides automatic updates, it can introduce breaking changes without warning. The Docker Hub images (aggipp/automscbackend,aggipp/automsceval) should be verified as trusted sources.
250-262: Volume declarations look appropriate.Named volumes are correctly defined for persistent data. The
cassandra-datavolume is properly shared betweencassandra-oaiandcassandra-backupservices for backup operations.
264-268: MTU setting appropriate for OpenStack environment.The MTU of 1450 correctly accounts for VXLAN encapsulation overhead in OpenStack environments, preventing packet fragmentation issues.
docker-compose-legacy.yml
Outdated
| - ./oaipmh/jmxremote.access:/opt/java/openjdk/lib/management/jmxremote.access | ||
| - ./oaipmh/jmxremote.password:/etc/cassandra/jmxremote.password |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
# First, let's examine the docker-compose-legacy.yml file around lines 83-84
head -n 90 docker-compose-legacy.yml | tail -n 20Repository: MaRDI4NFDI/portal-compose
Length of output: 773
🏁 Script executed:
# Search for the jmxremote.password file in the repository
fd -t f 'jmxremote.password' -x stat -c '%a %n' {} \; 2>/dev/null || echo "File not found in repository"Repository: MaRDI4NFDI/portal-compose
Length of output: 97
🏁 Script executed:
# Also check if the oaipmh directory and related files exist
ls -la oaipmh/ 2>/dev/null || echo "oaipmh directory not found"Repository: MaRDI4NFDI/portal-compose
Length of output: 1039
Fix JMX credential file permissions: change from 644 to 600.
The jmxremote.password file at ./oaipmh/jmxremote.password currently has permissions 644 (readable by group and others), but should be restricted to 600 (readable/writable by owner only) to protect the sensitive JMX credentials. Run chmod 600 ./oaipmh/jmxremote.password to remediate.
🤖 Prompt for AI Agents
In @docker-compose-legacy.yml around lines 83 - 84, The jmxremote.password file
mounted in the docker-compose bind mount (./oaipmh/jmxremote.password) currently
has overly permissive mode 644; change its permissions to 600 to restrict access
by running chmod 600 ./oaipmh/jmxremote.password on the host so the JMX
credential file used by the jmxremote.password bind mount is readable/writable
only by the owner.
|
I think this makes sense, but I would wait a bit until we have finished the experiment of renaming all pages. |
|
Technically, this commit adds a new file and does not change any of the existing services. |
|
I can put everything in a single docker-compose.yml file. I thought about keeping the old ones just for reference but maybe it is not necessary. Are there any experiments planned or running though? I thought you were already running the changes in production. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @docker-compose.yml:
- Around line 111-126: Elasticsearch in the elasticsearch-oai service disables
X-Pack security (xpack.security.enabled=false); update the docker-compose
service definition named elasticsearch-oai to re-enable security (set
xpack.security.enabled=true) and wire up authentication (e.g., set
ELASTIC_USERNAME/ELASTIC_PASSWORD or configure the built-in users/keystore and
provide TLS/cert volumes) or else mark the service explicitly as legacy/archived
with a clear README comment; ensure related clients (oai-backend) are updated to
use the configured credentials and any certificate paths referenced in the
elasticsearch-oai volumes.
🧹 Nitpick comments (7)
docker-compose.yml (7)
17-22: Consider pinning image versions for reproducibility.Using
latest,:main, and major-version-only tags (:2) can lead to unexpected behavior when images are updated upstream. For production stability, consider pinning to specific versions:
- Line 18:
:main→ pin to a specific commit SHA or release tag- Line 20:
:latest→ pin to a specific version- Line 22:
:2→ pin to a specific minor/patch version (e.g.,2.x.y)
46-46: Credentials hash committed to source control.While this is a bcrypt hash (not plaintext), committing authentication credentials to version control is generally discouraged. Consider using an environment variable or external secret management.
- - traefik.http.middlewares.auth.basicauth.users=mardi:$$2y$$05$$Ubl1B.74bDJkpGHXZ6Y4Xuq8lSx88bi51bmE85/VYf1nQizfKKuH. + - traefik.http.middlewares.auth.basicauth.users=${TRAEFIK_BASIC_AUTH_USERS}
74-85: Consider adding a health check for Cassandra.The
cassandra-oai-setupservice useswait-for-it.shto check port availability, but a proper health check oncassandra-oaiwould improve orchestration reliability for all dependent services.♻️ Suggested health check
cassandra-oai: hostname: cassandra-oai image: *cassandra-oai-image environment: LOG4J_FORMAT_MSG_NO_LOOKUPS: "true" + healthcheck: + test: ["CMD-SHELL", "cqlsh -e 'describe cluster'"] + interval: 30s + timeout: 10s + retries: 5 + start_period: 60s volumes: - cassandra-data:/var/lib/cassandra/
137-139: Remove deprecatedlinksdirective.The
linksdirective is deprecated. Sincedepends_onis already specified and services on the same Docker network can communicate by service name, these links are unnecessary.♻️ Remove deprecated links
depends_on: - cassandra-oai - elasticsearch-oai - links: - - "cassandra-oai" - - "elasticsearch-oai" volumes:
158-159: Remove deprecatedlinksdirective.Same as above—the
linksdirective is deprecated and redundant withdepends_on.♻️ Remove deprecated links
depends_on: - oai-backend - links: - - "oai-backend" volumes:
193-221: Minor inconsistencies in IPFS service configuration.
Unpinned image tag (line 195):
ipfs/kubowithout a version tag will pulllatest, which can cause unexpected behavior.Inconsistent restart policy (line 196): Uses
unless-stoppedwhile other services usealways. This may be intentional, but consider aligning for consistency.♻️ Suggested changes
ipfs: - image: ipfs/kubo - restart: unless-stopped + image: ipfs/kubo:v0.24.0 # or appropriate version + restart: always ports:
104-107: Consider read-only mount for backup source data.If
cassandra-backuponly reads fromcassandra-datafor backups, mounting it read-only would follow the principle of least privilege.volumes: - backup-logs:/logs - - cassandra-data:/source_data + - cassandra-data:/source_data:ro - ./cassandra-backup:/backup
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
docker-compose-extra.ymldocker-compose.yml
💤 Files with no reviewable changes (1)
- docker-compose-extra.yml
🔇 Additional comments (6)
docker-compose.yml (6)
50-72: LGTM!The portainer and uptime-kuma services are properly configured with appropriate restart policies, volume mounts, and Traefik routing labels.
168-176: Verify ephemeral state is intentional.The
OAI_STATE_PATH=/tmp/state.jsonmeans state is lost on every container restart. Combined withrestart: always, the importer will restart from scratch after any failure or host reboot.The comment suggests this is intentional, but please confirm this won't cause duplicate imports or missed data if the container restarts mid-operation.
178-191: LGTM!The service properly persists state to a volume and loads secrets from the environment. The configuration is consistent and appropriate.
223-248: LGTM!The msc* services use Watchtower for automatic updates, which explains the use of
latesttags. This is a valid pattern for keeping these services current. The Traefik routing configuration is correct.
250-262: LGTM!Volume declarations are properly defined with helpful comments for IPFS volumes.
264-268: LGTM!The MTU 1450 setting is appropriate for OpenStack environments where overlay network overhead requires reduced MTU.
| elasticsearch-oai: | ||
| hostname: elasticsearch-oai | ||
| image: *elasticsearch-oai-image | ||
| environment: | ||
| # - bootstrap.memory_lock=true | ||
| - "ES_JAVA_OPTS=-Dlog4j2.formatMsgNoLookups=true -Xms2g -Xmx2g" | ||
| - "LOG4J_FORMAT_MSG_NO_LOOKUPS=true" | ||
| - discovery.type=single-node | ||
| - xpack.security.enabled=false | ||
| ulimits: | ||
| memlock: | ||
| soft: -1 | ||
| hard: -1 | ||
| volumes: | ||
| - es-logs:/usr/share/elasticsearch/logs | ||
| - es-data:/usr/share/elasticsearch/data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
# Read the docker-compose.yml file around the flagged lines
wc -l docker-compose.yml && echo "---" && sed -n '100,140p' docker-compose.ymlRepository: MaRDI4NFDI/portal-compose
Length of output: 1251
🏁 Script executed:
# Check if there are any other elasticsearch configurations or references
rg "elasticsearch" docker-compose.yml -i --context=3Repository: MaRDI4NFDI/portal-compose
Length of output: 1247
🏁 Script executed:
# Look for any security-related configurations or documentation
find . -type f \( -name "*.md" -o -name "*.env*" -o -name "*.yml" -o -name "*.yaml" \) -not -path "./node_modules/*" -not -path "./.git/*" | head -30 | xargs grep -l "elasticsearch\|xpack.security" 2>/dev/null | head -10Repository: MaRDI4NFDI/portal-compose
Length of output: 206
🏁 Script executed:
# Check the README and any security documentation
head -100 README.mdRepository: MaRDI4NFDI/portal-compose
Length of output: 4207
🏁 Script executed:
# Check docker-compose-dev.yml and docker-compose-ci.yml to see if they also disable security
cat docker-compose-dev.yml | grep -A 15 "elasticsearch-oai"Repository: MaRDI4NFDI/portal-compose
Length of output: 51
🏁 Script executed:
# Check the main docker-compose.yml for any Traefik routes or port exposure
rg "ports:|expose:" docker-compose.yml -A 2 | head -40Repository: MaRDI4NFDI/portal-compose
Length of output: 210
🏁 Script executed:
# Look for any environment or deployment documentation
find . -type f \( -name "*.md" -o -name "SECURITY*" -o -name "DEPLOYMENT*" \) -not -path "./.git/*" | head -20Repository: MaRDI4NFDI/portal-compose
Length of output: 107
🏁 Script executed:
# Check docker-compose-dev.yml more thoroughly for elasticsearch configuration
cat docker-compose-dev.yml | grep -A 20 "elasticsearch-oai:" || echo "Not found in dev"Repository: MaRDI4NFDI/portal-compose
Length of output: 85
🏁 Script executed:
# Check if there are any network definitions or security controls
rg "networks:|driver_opts:" docker-compose.yml -B 2 -A 5Repository: MaRDI4NFDI/portal-compose
Length of output: 755
🏁 Script executed:
# Look for any oai-elasticsearch.yml mentioned in search results to understand elasticsearch setup
cat oaipmh/oai-elasticsearch.yml | head -50Repository: MaRDI4NFDI/portal-compose
Length of output: 1730
Elasticsearch security is disabled, but this is archived/legacy code.
Line 119 disables xpack security (xpack.security.enabled=false). While this technically means no authentication, the elasticsearch-oai service is internal-only (not exposed via Traefik) and only accessed by oai-backend on the Docker bridge network. More importantly, this repository is archived—the README states the primary MaRDI deployment moved to Kubernetes in October 2025, and this docker-compose configuration is no longer actively used.
If reusing this configuration for new deployments, enable xpack security or migrate to the k8s deployment method.
🤖 Prompt for AI Agents
In @docker-compose.yml around lines 111 - 126, Elasticsearch in the
elasticsearch-oai service disables X-Pack security
(xpack.security.enabled=false); update the docker-compose service definition
named elasticsearch-oai to re-enable security (set xpack.security.enabled=true)
and wire up authentication (e.g., set ELASTIC_USERNAME/ELASTIC_PASSWORD or
configure the built-in users/keystore and provide TLS/cert volumes) or else mark
the service explicitly as legacy/archived with a clear README comment; ensure
related clients (oai-backend) are updated to use the configured credentials and
any certificate paths referenced in the elasticsearch-oai volumes.
MaRDI Pull Request
Changes:
Summary by CodeRabbit
Release Notes
✏️ Tip: You can customize this high-level summary in your review settings.