Fix building locally and run the services#2

Merged
iperezx merged 21 commits into main from fix-local-builds
Feb 6, 2026

@iperezx iperezx commented Jan 29, 2026

  • Adds docker compose file
  • Adds env.example file
  • Updates README
  • Adds PR kustomize overlay
  • Updates kubernetes/README
  • Changes the models so that they run on a CPU by default
  • Changes weavloader from UNALLOWED_NODES to ALLOWED_NODES
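
The switch from UNALLOWED_NODES to ALLOWED_NODES means moving from a blocklist to an allowlist. A minimal sketch of allowlist filtering follows. It assumes the variable holds a comma-separated list of node IDs and that an empty value means "allow everything"; the helper names and the sample node IDs are hypothetical and not taken from weavloader.

```python
import os


def parse_node_list(raw: str) -> set[str]:
    """Split a comma-separated string into a set of lowercase node IDs."""
    return {item.strip().lower() for item in raw.split(",") if item.strip()}


def is_node_allowed(node_id: str, allowed: set[str]) -> bool:
    """Allowlist semantics: a node is processed only if it is listed.

    Assumption: an empty allowlist allows every node. The real service
    may treat an empty value differently.
    """
    if not allowed:
        return True
    return node_id.strip().lower() in allowed


if __name__ == "__main__":
    # ALLOWED_NODES is the name from the PR description; the format is assumed.
    allowed = parse_node_list(os.environ.get("ALLOWED_NODES", ""))
    for node in ("W001", "W002"):  # hypothetical node IDs
        print(node, is_node_allowed(node, allowed))
```

With an allowlist, a newly deployed node is ignored until someone adds it explicitly, which tends to be the safer default for local testing.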
@iperezx iperezx self-assigned this Jan 29, 2026

iperezx commented Feb 5, 2026

Some last things I will add before I am done with this PR:

  • PR overlay so that we can easily test an instance of the PR on k8s
  • add another entry to the service monitors so that it scrapes weavloader

@FranciscoLozCoding

> Some last things I will add before I am done with this PR:
>
>   • PR overlay so that we can easily test an instance of the PR on k8s
>   • add another entry to the service monitors so that it scrapes weavloader

Awesome! You can merge it when you are done.


iperezx commented Feb 6, 2026

Seems like it is running as expected. I will merge it once the pipelines finish.


iperezx commented Feb 6, 2026

Seeing this:

[2026-02-06 20:08:22,551: ERROR/ForkPoolWorker-3] Task job_system.tasks.process_image_task[90c020fb-a15c-42b3-9454-944bad75d648] raised unexpected: UnpickleableExceptionWrapper('tritonclient.utils', 'InferenceServerException', (), 'InferenceServerException()')
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/celery/app/trace.py", line 477, in trace_task
    R = retval = fun(*args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/celery/app/trace.py", line 760, in __protected_call__
    return self.run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/celery/app/autoretry.py", line 60, in run
    ret = task.retry(exc=exc, **retry_kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/celery/app/task.py", line 736, in retry
    raise_with_context(exc)
  File "/usr/local/lib/python3.11/site-packages/celery/app/autoretry.py", line 38, in run
    return task._orig_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/job_system/tasks.py", line 203, in process_image_task
    raise self.retry(countdown=retry_delay, exc=exc)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/celery/app/task.py", line 736, in retry
    raise_with_context(exc)
  File "/app/job_system/tasks.py", line 168, in process_image_task
    triton_client = get_triton_client()
                    ^^^^^^^^^^^^^^^^^^^
  File "/app/job_system/tasks.py", line 104, in get_triton_client
    if _triton_client.is_server_ready():
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/tritonclient/grpc/_client.py", line 344, in is_server_ready
    raise_error_grpc(rpc_error)
  File "/usr/local/lib/python3.11/site-packages/tritonclient/grpc/_utils.py", line 77, in raise_error_grpc
    raise get_error_grpc(rpc_error) from None
celery.utils.serialization.UnpickleableExceptionWrapper: InferenceServerException()

Any clues?
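
One possible reading of the traceback: the underlying failure is `is_server_ready()` raising because the Triton server was not reachable, and the `UnpickleableExceptionWrapper` is a secondary effect. Celery wraps exceptions it cannot rebuild from `exc.args`, and the empty `()` in the log suggests the exception was constructed with keyword arguments only. The sketch below reproduces that behaviour with a stand-in class. `FakeInferenceError`, `TritonUnavailable`, and both helpers are hypothetical and not part of tritonclient, Celery, or this repository.

```python
class FakeInferenceError(Exception):
    """Hypothetical stand-in for an exception with a required `msg` argument.

    When it is built with keyword arguments only, `args` stays empty, so
    rebuilding it from `args` (which is what unpickling does) fails.
    """

    def __init__(self, msg, status=None):
        self._msg = msg
        self._status = status

    def message(self):
        return self._msg


class TritonUnavailable(RuntimeError):
    """Hypothetical replacement that carries its message in `args`."""


def survives_reconstruction(exc: BaseException) -> bool:
    """Mimic the unpickling step: call the class again with `exc.args`."""
    cls, args = exc.__reduce__()[:2]
    try:
        cls(*args)
        return True
    except TypeError:
        return False


def to_picklable(exc: FakeInferenceError) -> TritonUnavailable:
    """Convert to an exception that can be rebuilt from its `args`."""
    return TritonUnavailable(f"{type(exc).__name__}: {exc.message()}")


if __name__ == "__main__":
    err = FakeInferenceError(msg="server not ready", status="UNAVAILABLE")
    print(err.args, survives_reconstruction(err))
    print(survives_reconstruction(to_picklable(err)))
```

Raising a converted exception from the task would keep the original message visible in the worker log instead of the bare `InferenceServerException()`.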


iperezx commented Feb 6, 2026

I'll just merge it and open an issue to follow up later. It seems to go back to a healthy state after some time.
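
If the error is transient, as the recovery described above suggests, one option for the follow-up issue is to wait for readiness with a bounded backoff before creating the client. This is a sketch only: `wait_until_ready` is a hypothetical helper, and `check` stands in for a call such as the `is_server_ready()` seen in the traceback.

```python
import time
from typing import Callable


def wait_until_ready(
    check: Callable[[], bool],
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `check` until it returns True or the attempts run out.

    Any exception raised by `check` is treated as "not ready yet".
    The delay doubles after every failed attempt.
    """
    for attempt in range(attempts):
        try:
            if check():
                return True
        except Exception:
            pass  # connection errors count as "not ready"
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return False


if __name__ == "__main__":
    # Dummy check that is always ready; no real server is contacted here.
    print(wait_until_ready(lambda: True))
```

Passing `sleep` as a parameter keeps the helper testable without real delays.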

@iperezx iperezx merged commit db79a72 into main Feb 6, 2026
2 checks passed
@FranciscoLozCoding FranciscoLozCoding deleted the fix-local-builds branch February 6, 2026 23:20