Make K8s tasks persist task report + Fix MSQ INSERT/REPLACE querying problem. #18206
Conversation
After making the appropriate changes, I find that tasks started by pod templates are able to execute without reporting the 404 error to the frontend, but K8s tasks that do not use Pod Templates still fail. Going into the job container shows a mismatch between the expected task report directory and the actual task report directory:

Findings:

Current Solution in Mind:
FrankChen021 left a comment:
LGTM
Converting to Draft - looking into the case where baseTaskDir is not set for Pod Template deployments...
For forward-compatibility purposes: added the ability to replace the baseTaskDir environment variable, since we will be heading in the direction that allows Druid to automatically generate the task directory for us.

How it's done

In Kubernetes, if an environment variable is defined multiple times, the last occurrence takes precedence. Doing a
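Below is a minimal sketch of that idea, assuming the fabric8 Kubernetes model classes and the `TASK_DIR` variable name discussed in this PR; the exact code in this PR may differ. Appending a duplicate entry leaves the user's pod template untouched while still letting the later value win.

```java
// Minimal sketch (not the exact PR code): append a duplicate TASK_DIR env var so the
// later occurrence overrides whatever the user's pod template already defines.
import io.fabric8.kubernetes.api.model.Container;
import io.fabric8.kubernetes.api.model.EnvVar;
import io.fabric8.kubernetes.api.model.EnvVarBuilder;

import java.util.ArrayList;

final class EnvOverrideSketch
{
  static void overrideTaskDir(Container mainContainer, String taskDirPath)
  {
    EnvVar taskDir = new EnvVarBuilder()
        .withName("TASK_DIR")
        .withValue(taskDirPath)
        .build();

    if (mainContainer.getEnv() == null) {
      mainContainer.setEnv(new ArrayList<>());
    }
    // Append rather than replace: Kubernetes resolves duplicate names to the last entry,
    // so this value wins without mutating the template-provided entries.
    mainContainer.getEnv().add(taskDir);
  }
}
```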
Made sure CICD passes again before merge. PTAL @FrankChen021
@cryptoe why is this removed from the Druid 34 release?
@FrankChen021 This PR did not make it in time for the Druid 34 code freeze.
@GWphua, I have been observing some failures in Druid clusters after this PR was merged. The cause is the same as the error you described. From what I can understand, the fix changes the configuration of the base directory, which might be causing this issue (failure to find reports to push). I have raised a PR with an alternate fix that does resolve the issues I am seeing (#18379). Could you please check if it also resolves the issue you were initially seeing?
Hello @adarshsanjeev, thanks for letting me know. Will try to get some clusters running with your changes next week. Just want to understand the environment you are deploying on which caused the failures: are your clusters deployed using the Kubernetes extension (with or without PodTemplates?), and are there any notable configurations made to your baseTaskDir (e.g. made specifically for Peon tasks)?
Hello @adarshsanjeev, update on the issue: I created a cluster with your changes, and am able to run MSQ tasks + see the task reports. Thanks!
Documentation text under review (following the config example that ends with `druid.indexer.task.encapsulatedTask=true`):
**Note**: Prior to Druid 35.0.0, you will need the `druid.indexer.task.baseTaskDir` runtime property, along with the `TASK_DIR` and `attemptId` arguments to `/peon.sh`, to run your jobs. There is no need for that now, as Druid will automatically configure the task directory. You can still choose to customize the target task directory by adjusting `druid.indexer.task.baseTaskDir` on the Overlord service.
Review comment: `druid.indexer.task.baseTaskDir` -> `druid.indexer.task.baseDir`, the old document is wrong already, 🤣
Fixes #18197
This problem was introduced through this PR in Druid v26.
Problem Statement
Using the `druid-multi-stage-query` extension with K8s, running an `INSERT` or `REPLACE` query will result in a `failed with status code 404` message. However, the query succeeds uneventfully on an MM architecture. (See the related issue for screenshots.)

Logs
Broker Logs
Overlord Logs
Investigation
Root Cause Analysis
The error is returned from `druid/v2/sql/statement/{queryId}`: `{"error":"druidException","errorCode":"notFound","persona":"USER","category":"NOT_FOUND","errorMessage":"Query [query-a72b5075-0dae-48bf-b1b8-7f66c2e070e6] was not found. The query details are no longer present or might not be of the type [query_controller]. Verify that the id is correct.","context":{}}`
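As an illustration of where this error surfaces, here is a plain-HTTP sketch that polls the endpoint quoted above with `java.net.http`; the Router address `http://localhost:8888` and the reuse of the query ID from the error message are assumptions for illustration only, not part of this PR.

```java
// Poll the statement status endpoint quoted in this writeup and print whatever comes back.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public final class StatementStatusPoll
{
  public static void main(String[] args) throws Exception
  {
    String queryId = "query-a72b5075-0dae-48bf-b1b8-7f66c2e070e6";
    HttpRequest request = HttpRequest.newBuilder(
            URI.create("http://localhost:8888/druid/v2/sql/statement/" + queryId + "?detail=true"))
        .GET()
        .build();

    HttpResponse<String> response =
        HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());

    // On an affected cluster this prints 404 together with the "notFound" druidException body.
    System.out.println(response.statusCode());
    System.out.println(response.body());
  }
}
```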
Code Dive

Looking into Dashboard API Call

The dashboard calls `druid/v2/sql/statement/{queryId}?detail=true`. The request is handled by `SqlStatementResource#doGetStatus`, which proceeds through the following steps, returning `SqlStatementResource#getStatementStatusResponse` if the query exists, else building and returning a non-OK `Response`.

Get Statement Status
Contact the overlord for the task status with `taskId=queryId` at `/druid/indexer/v1/tasks/{taskId}/status`. This calls `OverlordResource#getTaskStatus`.

Contact the overlord for the task report with `taskId=queryId` at `/druid/indexer/v1/tasks/{taskId}/reports`. This calls `OverlordResource#doGetReports`, which calls `SwitchingTaskLogStreamer#streamTaskReports`. `SwitchingTaskLogStreamer` will first query the task report from the task runner (Kubernetes pods in the K8s context), before trying to find task reports in deep storage.

Looking into the above Overlord logs, there is a stack trace that surfaces after the task completes. This shows us that the Overlord is unable to contact the exited Peon pod to find the task report, and is then unable to find the task report from deep storage.
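To make the lookup order concrete, here is an illustrative sketch of the fallback described above. It is not Druid's actual `SwitchingTaskLogStreamer` code; `TaskReportSource` is a hypothetical interface standing in for the task-runner and deep-storage streamers.

```java
// Illustrative sketch of the report lookup order: consult the live task runner first,
// then deep storage; if neither has the report, the caller sees "not found".
import java.io.InputStream;
import java.util.List;
import java.util.Optional;

final class ReportLookupSketch
{
  interface TaskReportSource
  {
    Optional<InputStream> streamTaskReports(String taskId);
  }

  private final List<TaskReportSource> sources; // e.g. [taskRunner, deepStorage]

  ReportLookupSketch(List<TaskReportSource> sources)
  {
    this.sources = sources;
  }

  Optional<InputStream> findReport(String taskId)
  {
    for (TaskReportSource source : sources) {
      Optional<InputStream> report = source.streamTaskReports(taskId);
      if (report.isPresent()) {
        return report; // the live pod is consulted first, then deep storage
      }
    }
    return Optional.empty(); // neither source has the report -> the 404 surfaces to the Broker
  }
}
```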
Hypothesis
Given the trace above, there is reason to believe that the task report cannot be found in deep storage. We now suspect that the code is unable to push the task report to HDFS (deep storage).
Kubernetes Task Reports Not Uploaded Into Deep Storage During Task Cleanup
In the Kubernetes context, the task tries to push the task report, followed by the task status, via `AbstractTask` when cleaning up the task.

Checking the logs, I found that only the task status is pushed, but not the task report.
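The sketch below illustrates that cleanup order under the stated assumption that the report push does nothing when the local report file is not at the expected path; `DeepStoragePusher` and the method names are hypothetical stand-ins, not Druid's actual API.

```java
// Hypothetical sketch of the cleanup order described above: push the report, then the status.
// A report written to an unexpected path is never pushed, while the status file is freshly
// written right before its push and is therefore found.
import java.io.File;
import java.io.IOException;

final class CleanupSketch
{
  interface DeepStoragePusher
  {
    void push(File file) throws IOException;
  }

  static void cleanUp(File reportFile, File statusFile, DeepStoragePusher pusher) throws IOException
  {
    if (reportFile.exists()) {
      pusher.push(reportFile); // skipped when the report sits at a different path than expected
    }
    pusher.push(statusFile);   // succeeds: the status was just written to this exact path
  }
}
```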
Entering the bash terminal of the Peon pod, I found that the report file exists at `./var/tmp/attempt/1/report.json`.

Through examination, we note that `AbstractTask` is trying to read from `./var/tmp/{queryId}/attempt/{attemptId}/report.json` instead.

The missing queryId in the path is probably the cause of the error. This problem does not happen on the MM architecture.
Why Task Report is Written to the Wrong Path
Under `CliPeon`, we have bound `TaskReportFileWriter` to `SingleFileTaskReportFileWriter`. This `SingleFileTaskReportFileWriter` is configured to write to the input `File`.

`CliPeon` is called with the arguments: `Main internal peon taskDirPath attemptId`.

Under the K8s model, we do not have the queryId in `taskDirPath`:
`org.apache.druid.cli.Main internal peon /opt/druid/var/tmp/ 1 --taskId query-5f496cd7-9c23-43c8-a666-76fa81825a2f`

Under the MM model, we have the queryId in `taskDirPath`:
`org.apache.druid.cli.Main internal peon var/druid/task/slot0/query-5f496cd7-9c23-43c8-a666-76fa81825a2f 1 --loadBroadcastDatasourceMode NONE`

However, the paths of `reportFile` and `statusFile` in `AbstractTask` are constructed with the queryId. Hence, the path the report is written to does not match the path it is later looked up from.

The reason why Druid is still able to find the `statusFile` is a last-minute write into the status file in `AbstractTask#cleanUp`.
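A minimal, self-contained sketch of the mismatch, assuming the writer uses the `taskDirPath` passed on the K8s peon command line while the reader rebuilds the path from the base directory plus the task ID (paths taken from the observations above):

```java
// Writer vs. reader path construction for the task report in the K8s case.
import java.io.File;

public final class ReportPathSketch
{
  public static void main(String[] args)
  {
    String taskId = "query-5f496cd7-9c23-43c8-a666-76fa81825a2f";
    int attemptId = 1;

    // Writer side (K8s peon): taskDirPath has no taskId segment.
    File writtenTo = new File("/opt/druid/var/tmp", "attempt/" + attemptId + "/report.json");

    // Reader side: path reconstructed with the taskId included.
    File lookedUpAt = new File("/opt/druid/var/tmp", taskId + "/attempt/" + attemptId + "/report.json");

    System.out.println(writtenTo);  // /opt/druid/var/tmp/attempt/1/report.json
    System.out.println(lookedUpAt); // /opt/druid/var/tmp/query-5f49.../attempt/1/report.json
  }
}
```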
Comparison Between MM and MM-less

Task Creation
The MM architecture constructs the peon startup command from `ForkingTaskRunner#run`.

The MM-less architecture constructs the peon startup command from `K8sTaskAdapter#generateCommand`.

The MM-less architecture with PodTemplates constructs the peon startup command via the Helm chart, with environment variables constructed in `PodTemplateTaskAdapter#getEnv`.

Solution
Pass the `TASK_DIR` into `peon.sh`. This `TASK_DIR` will be of the form `baseTaskDir/taskId`.
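A small sketch of how the directory could be composed under that scheme; `buildTaskDir` is a hypothetical helper, not the exact code added in this PR.

```java
// Compose the per-task directory as baseTaskDir/taskId, matching the path that
// AbstractTask later uses to look up the report and status files.
import java.io.File;

final class TaskDirSketch
{
  static String buildTaskDir(String baseTaskDir, String taskId)
  {
    // e.g. baseTaskDir = "/opt/druid/var/tmp", taskId = "query-5f49..." ->
    //      "/opt/druid/var/tmp/query-5f49..."
    return new File(baseTaskDir, taskId).getPath();
  }
}
```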
Impact

Release note
You no longer need to pass `TASK_DIR` and `attemptId` into the `/peon.sh` script arguments, or configure `baseTaskDir`; the Overlord will construct this for you automatically.

Key changed/added classes in this PR
`CliPeon`
`distribution/docker/peon.sh`

This PR has: