Commit 03c1903

rajatarya and Copilot authored
Updated diagnostics scripts to collect logs (#542)
- also updated README
- added analysis script to load latest dump collected

Co-authored-by: Copilot <[email protected]>
1 parent 85b5ba5 commit 03c1903

5 files changed: +351 -31 lines changed

README.md

Lines changed: 47 additions & 30 deletions
Original file line number | Diff line number | Diff line change
@@ -43,7 +43,7 @@ Please join us in making xet-core better. We value everyone's contributions. Cod
4343

4444
## Issues, Diagnostics & Debugging
4545

46-
If you encounter an issue when using `hf-xet` please help us fix the issue by collecting diagnostic information and attaching that when creating a [new Issue](https://github.com/huggingface/xet-core/issues/new/choose). Download the [hf-xet-diag-linux.sh](hf-xet-diag-linux.sh) or [hf-xet-diag-windows.sh](hf-xet-diag-windows.sh) script based on your operating system and then re-run the python command that resulted in the issue. The diagnostic scripts will download and install debug symbols, setup up logging, and take periodic stack traces throughout process execution in a diagnostics directory that is easy to analyze, package, and upload.
46+
If you encounter an issue when using `hf-xet` please help us fix the issue by collecting diagnostic information and attaching that when creating a [new Issue](https://github.com/huggingface/xet-core/issues/new/choose). Download the [hf-xet-diag-linux.sh](hf-xet-diag-linux.sh), [hf-xet-diag-macos.sh](hf-xet-diag-macos.sh), or [hf-xet-diag-windows.sh](hf-xet-diag-windows.sh) script based on your operating system and then re-run the python command that resulted in the issue. The diagnostic scripts will download and install debug symbols, set up logging, and take periodic stack traces throughout process execution in a diagnostics directory that is easy to analyze, package, and upload.
4747

4848
### Diagnostics - Linux (`hf-xet-diag-linux.sh`)
4949

@@ -103,7 +103,7 @@ sudo xcode-select --install
103103

104104
### Output Layout
105105

106-
Both scripts produce a diagnostics directory named:
106+
The diagnostic scripts produce a diagnostics directory named:
107107

108108
```
109109
diag_<command>_<timestamp>/
@@ -120,53 +120,70 @@ This unified layout makes it easier to compare diagnostics across platforms.
120120

121121
### Analyzing Dumps
122122

123-
### Usage
123+
Use the [hf-xet-diag-analyze-latest.sh](hf-xet-diag-analyze-latest.sh) script to automatically find and open the most recent dump in the appropriate debugger for your platform.
124124

125-
From your repo root:
125+
**Usage:**
126126

127127
```bash
128-
./analyze-latest.sh
128+
./hf-xet-diag-analyze-latest.sh
129129
```
130130

131-
* Finds the most recent `diag_*` directory.
132-
* Opens the latest dump inside:
131+
* Auto-detects your OS (Linux, macOS, or Windows)
132+
* Finds the most recent `diag_*` directory
133+
* Opens the latest dump in the platform-appropriate debugger:
134+
* **Linux:** `gdb` with core dumps from `dumps/`
135+
* **macOS:** `lldb` with `.core` files from `dumps/`
136+
* **Windows (Git-Bash):** `windbg` with `.dmp` files from `stacks/`
133137

134-
* **Linux:** opens `dumps/core_*` in `gdb`.
135-
* **Windows (Git-Bash):** opens `stacks/*.dmp` in **WinDbg** (`windbg` must be on PATH).
136-
* You can also pass a base directory if your diagnostics are stored elsewhere:
138+
You can also specify a diagnostics directory:
137139

138-
```bash
139-
./analyze-latest.sh /path/to/diagnostics
140-
```
140+
```bash
141+
./hf-xet-diag-analyze-latest.sh diag_python_hfxet_test_20250127120000
142+
```
141143

142-
**Linux**
144+
**Manual Analysis**
145+
146+
If you prefer to analyze dumps manually:
143147

144-
* Stack traces are saved under `stacks/` as plain text.
145-
* Core dumps (`dumps/core_*`) can be analyzed with gdb:
148+
**Linux**
149+
* Stack traces: `stacks/*.txt` (plain text, captured periodically)
150+
* Core dumps: `dumps/core_*`
151+
* Analysis:
152+
```bash
153+
gdb python dumps/core_<timestamp>.<pid>
154+
(gdb) bt # backtrace of current thread
155+
(gdb) thread apply all bt # backtrace of all threads
156+
(gdb) info threads # list all threads
157+
```
158+
* Ensure debug symbols (`hf_xet-*.so.dbg`) are in the `hf_xet` package directory
146159

160+
**macOS**
161+
* Stack traces: `stacks/*.txt` (from `sample` command)
162+
* Core dumps: `dumps/dump_<pid>_<timestamp>.core`
163+
* Analysis:
147164
```bash
148-
gdb python dumps/core_<pid>
149-
(gdb) bt # backtrace
150-
(gdb) thread apply all bt
165+
lldb -c dumps/dump_<pid>_<timestamp>.core python3
166+
(lldb) bt # backtrace of current thread
167+
(lldb) thread backtrace all # backtrace of all threads
168+
(lldb) thread list # list all threads
151169
```
152-
* Ensure the matching debug symbols (`hf_xet-*.dbg`) are in the `hf_xet` package directory.
170+
* Ensure debug symbols (`hf_xet-*.dylib.dSYM`) are in the `hf_xet` package directory
153171

154172
**Windows**
155-
156-
* Dumps are saved under `stacks/` as `.dmp` files.
157-
* Open `.dmp` files in **WinDbg** (install via [Windows SDK](https://developer.microsoft.com/en-us/windows/downloads/windows-10-sdk/)):
158-
173+
* Dumps: `stacks/dump_<timestamp>.dmp`
174+
* Install [WinDbg via Windows SDK](https://developer.microsoft.com/en-us/windows/downloads/windows-sdk/)
175+
* Analysis:
159176
```cmd
160-
windbg -z dump_20250101_120000.dmp
177+
windbg -z stacks\dump_<timestamp>.dmp
161178
```
162179
* Common WinDbg commands:
163-
164180
```
165-
!analyze -v # Automatic analysis
166-
~* kb # Show stack traces for all threads
167-
lm # List loaded modules (verify hf_xet.pdb loaded)
181+
!analyze -v # automatic analysis
182+
~* kb # backtrace of all threads
183+
~ # list all threads
184+
lm # list loaded modules (verify hf_xet.pdb loaded)
168185
```
169-
* Ensure `hf_xet.pdb` is installed in the `hf_xet` package directory so symbols load correctly.
186+
* Ensure debug symbols (`hf_xet.pdb`) are in the `hf_xet` package directory
170187

171188
---
172189

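When the goal is to attach backtraces to an issue rather than explore a dump interactively, the manual gdb and lldb commands in the README can also be run in batch mode. The following is a minimal sketch, assuming the standard `gdb -batch -ex` and `lldb --batch -o` options; the helper name `batch_bt_cmd` and the example paths are hypothetical and not part of this commit. It only prints the command line so it can be reviewed before running.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this commit): print the debugger command
# that would dump all-thread backtraces from a core file non-interactively.
batch_bt_cmd() {
  local os=$1 python_exe=$2 core=$3
  case "$os" in
    linux)
      # gdb: -batch exits after the -ex commands have run
      printf 'gdb -batch -ex "thread apply all bt" %q %q\n' "$python_exe" "$core"
      ;;
    macos)
      # lldb: --batch exits after the -o commands have run
      printf 'lldb --batch -o "thread backtrace all" -c %q %q\n' "$core" "$python_exe"
      ;;
    *)
      echo "unsupported OS: $os" >&2
      return 1
      ;;
  esac
}

# Example: show the command for a Linux core dump (path is illustrative).
batch_bt_cmd linux python3 dumps/core_example
```

Redirecting the output of the printed command to a text file would give something that can be attached next to the periodic traces in `stacks/`.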
hf-xet-diag-analyze-latest.sh

Lines changed: 246 additions & 0 deletions
@@ -0,0 +1,246 @@
1+
#!/usr/bin/env bash
2+
# hf-xet-diag-analyze-latest.sh — Cross-platform dump analyzer
3+
# Finds the latest diagnostics directory and opens the most recent dump
4+
# in the appropriate debugger for your platform (gdb, lldb, or WinDbg).
5+
6+
set -Eeuo pipefail
7+
8+
print_usage() {
9+
cat <<'USAGE'
10+
Usage: hf-xet-diag-analyze-latest.sh [diagnostics-directory]
11+
12+
Finds and analyzes the latest dump from a diagnostics collection.
13+
14+
Arguments:
15+
diagnostics-directory Path to a specific diag_* directory
16+
(default: latest diag_* in current directory)
17+
18+
Examples:
19+
./hf-xet-diag-analyze-latest.sh
20+
./hf-xet-diag-analyze-latest.sh diag_python_hfxet_test_20250127120000
21+
22+
This script will:
23+
- Auto-detect your OS (Linux, macOS, or Windows)
24+
- Find the most recent dump file
25+
- Launch the appropriate debugger:
26+
* Linux: gdb with core dumps from dumps/
27+
* macOS: lldb with .core files from dumps/
28+
* Windows: WinDbg with .dmp files from stacks/
29+
USAGE
30+
}
31+
32+
# --- option parsing ---
33+
if [[ "${1:-}" == "-h" || "${1:-}" == "--help" ]]; then
34+
print_usage
35+
exit 0
36+
fi
37+
38+
DIAG_DIR="${1:-}"
39+
40+
# --- find diagnostics directory ---
41+
if [[ -z "$DIAG_DIR" ]]; then
42+
# Find the latest diag_* directory in current directory
43+
DIAG_DIR=$(find . -maxdepth 1 -type d -name "diag_*" -print0 2>/dev/null | \
44+
xargs -0 -r ls -dt 2>/dev/null | head -1 || true)
45+
46+
if [[ -z "$DIAG_DIR" ]]; then
47+
echo "ERROR: No diag_* directories found in current directory."
48+
echo "Please specify a diagnostics directory or run from a directory containing diag_* folders."
49+
exit 1
50+
fi
51+
52+
echo "Found latest diagnostics directory: $DIAG_DIR"
53+
elif [[ ! -d "$DIAG_DIR" ]]; then
54+
echo "ERROR: Directory not found: $DIAG_DIR"
55+
exit 1
56+
fi
57+
58+
# --- detect OS ---
59+
OS_TYPE=""
60+
case "${OSTYPE:-}" in
61+
linux*) OS_TYPE="linux" ;;
62+
darwin*) OS_TYPE="macos" ;;
63+
msys*|mingw*|cygwin*) OS_TYPE="windows" ;;
64+
*)
65+
# Fallback: check uname
66+
UNAME=$(uname -s 2>/dev/null || echo "")
67+
case "$UNAME" in
68+
Linux*) OS_TYPE="linux" ;;
69+
Darwin*) OS_TYPE="macos" ;;
70+
MINGW*|MSYS*|CYGWIN*) OS_TYPE="windows" ;;
71+
*)
72+
echo "ERROR: Unsupported OS: ${OSTYPE:-unknown} / ${UNAME:-unknown}"
73+
exit 1
74+
;;
75+
esac
76+
;;
77+
esac
78+
79+
echo "Detected OS: $OS_TYPE"
80+
81+
# --- find latest dump file ---
82+
DUMP_FILE=""
83+
84+
case "$OS_TYPE" in
85+
linux)
86+
# Linux: look for core dumps in dumps/ directory
87+
if [[ -d "$DIAG_DIR/dumps" ]]; then
88+
DUMP_FILE=$(find "$DIAG_DIR/dumps" -type f -name "core_*" -print0 2>/dev/null | \
89+
xargs -0 -r ls -t 2>/dev/null | head -1 || true)
90+
fi
91+
92+
if [[ -z "$DUMP_FILE" ]]; then
93+
echo "ERROR: No core dumps found in $DIAG_DIR/dumps/"
94+
echo "Core dumps should be named: core_<timestamp>.<pid>"
95+
exit 1
96+
fi
97+
;;
98+
99+
macos)
100+
# macOS: look for .core files in dumps/ directory
101+
if [[ -d "$DIAG_DIR/dumps" ]]; then
102+
DUMP_FILE=$(find "$DIAG_DIR/dumps" -type f -name "*.core" -print0 2>/dev/null | \
103+
xargs -0 -r ls -t 2>/dev/null | head -1 || true)
104+
fi
105+
106+
if [[ -z "$DUMP_FILE" ]]; then
107+
echo "ERROR: No core dumps found in $DIAG_DIR/dumps/"
108+
echo "Core dumps should be named: dump_<pid>_<timestamp>.core"
109+
exit 1
110+
fi
111+
;;
112+
113+
windows)
114+
# Windows: look for .dmp files in stacks/ directory
115+
if [[ -d "$DIAG_DIR/stacks" ]]; then
116+
DUMP_FILE=$(find "$DIAG_DIR/stacks" -type f -name "*.dmp" -print0 2>/dev/null | \
117+
xargs -0 -r ls -t 2>/dev/null | head -1 || true)
118+
fi
119+
120+
if [[ -z "$DUMP_FILE" ]]; then
121+
echo "ERROR: No dump files found in $DIAG_DIR/stacks/"
122+
echo "Dump files should be named: dump_<timestamp>.dmp"
123+
exit 1
124+
fi
125+
;;
126+
esac
127+
128+
echo "Found dump file: $DUMP_FILE"
129+
130+
# --- determine python executable ---
131+
PYTHON_EXE=""
132+
for py_candidate in python3 python; do
133+
if command -v "$py_candidate" >/dev/null 2>&1; then
134+
PYTHON_EXE=$(command -v "$py_candidate")
135+
break
136+
fi
137+
done
138+
139+
if [[ -z "$PYTHON_EXE" ]]; then
140+
echo "WARNING: Could not find python executable. Using 'python' as fallback."
141+
PYTHON_EXE="python"
142+
else
143+
echo "Using python executable: $PYTHON_EXE"
144+
fi
145+
146+
# --- launch debugger ---
147+
case "$OS_TYPE" in
148+
linux)
149+
if ! command -v gdb >/dev/null 2>&1; then
150+
echo "ERROR: gdb not found. Install with: sudo apt-get install gdb"
151+
exit 1
152+
fi
153+
154+
echo ""
155+
echo "======================================"
156+
echo "Opening dump in GDB..."
157+
echo "======================================"
158+
echo "Useful commands:"
159+
echo " (gdb) bt # backtrace of current thread"
160+
echo " (gdb) thread apply all bt # backtrace of all threads"
161+
echo " (gdb) info threads # list all threads"
162+
echo " (gdb) quit # exit gdb"
163+
echo "======================================"
164+
echo ""
165+
166+
exec gdb "$PYTHON_EXE" "$DUMP_FILE"
167+
;;
168+
169+
macos)
170+
if ! command -v lldb >/dev/null 2>&1; then
171+
echo "ERROR: lldb not found. Install with: xcode-select --install"
172+
exit 1
173+
fi
174+
175+
echo ""
176+
echo "======================================"
177+
echo "Opening dump in LLDB..."
178+
echo "======================================"
179+
echo "Useful commands:"
180+
echo " (lldb) bt # backtrace of current thread"
181+
echo " (lldb) thread backtrace all # backtrace of all threads"
182+
echo " (lldb) thread list # list all threads"
183+
echo " (lldb) quit # exit lldb"
184+
echo "======================================"
185+
echo ""
186+
187+
exec lldb -c "$DUMP_FILE" "$PYTHON_EXE"
188+
;;
189+
190+
windows)
191+
# Check for various WinDbg installations
192+
WINDBG_EXE=""
193+
194+
# Check if windbg is on PATH
195+
if command -v windbg.exe >/dev/null 2>&1; then
196+
WINDBG_EXE="windbg.exe"
197+
elif command -v windbgx.exe >/dev/null 2>&1; then
198+
WINDBG_EXE="windbgx.exe"
199+
else
200+
# Common installation paths
201+
for dbg_path in \
202+
"/c/Program Files (x86)/Windows Kits/10/Debuggers/x64/windbg.exe" \
203+
"/c/Program Files (x86)/Windows Kits/10/Debuggers/x86/windbg.exe" \
204+
"${PROGRAMFILES:-}/Windows Kits/10/Debuggers/x64/windbg.exe" \
205+
"${PROGRAMFILES_X86:-}/Windows Kits/10/Debuggers/x64/windbg.exe"
206+
do
207+
if [[ -f "$dbg_path" ]]; then
208+
WINDBG_EXE="$dbg_path"
209+
break
210+
fi
211+
done
212+
fi
213+
214+
if [[ -z "$WINDBG_EXE" ]]; then
215+
echo "ERROR: WinDbg not found."
216+
echo ""
217+
echo "Please install WinDbg from:"
218+
echo " https://developer.microsoft.com/en-us/windows/downloads/windows-sdk/"
219+
echo ""
220+
echo "Or add WinDbg to your PATH."
221+
echo ""
222+
echo "You can manually open the dump file:"
223+
echo " windbg -z \"$DUMP_FILE\""
224+
exit 1
225+
fi
226+
227+
echo ""
228+
echo "======================================"
229+
echo "Opening dump in WinDbg..."
230+
echo "======================================"
231+
echo "Useful commands:"
232+
echo " !analyze -v # automatic analysis"
233+
echo " ~* kb # backtrace of all threads"
234+
echo " ~ # list all threads"
235+
echo " lm # list loaded modules"
236+
echo " q # quit"
237+
echo "======================================"
238+
echo ""
239+
240+
# Convert to Windows path format
241+
DUMP_FILE_WIN=$(cygpath -w "$DUMP_FILE" 2>/dev/null || echo "$DUMP_FILE")
242+
243+
exec "$WINDBG_EXE" -z "$DUMP_FILE_WIN"
244+
;;
245+
esac
246+

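The dump lookup in this script is built on a `find | xargs ls -t | head -1` pipeline. That pattern depends on `xargs` not running `ls` when `find` returns nothing; GNU `xargs` needs `-r` for this, otherwise `ls -t` lists the current directory instead. A pure-bash alternative that avoids `xargs` altogether could look like the sketch below. The helper name `newest_match` is hypothetical and is not something this commit defines.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this commit): print the most recently
# modified regular file in DIR whose name matches the glob PATTERN.
# Returns 1 and prints nothing when there is no match.
newest_match() {
  local dir=$1 pattern=$2 newest="" f
  # $pattern is left unquoted on purpose so the glob expands;
  # an unmatched glob stays literal and is skipped by the -f test.
  for f in "$dir"/$pattern; do
    [[ -f "$f" ]] || continue
    if [[ -z "$newest" || "$f" -nt "$newest" ]]; then
      newest=$f
    fi
  done
  [[ -n "$newest" ]] || return 1
  printf '%s\n' "$newest"
}

# Demo in a throwaway directory with two fake core files of known age.
demo_dir=$(mktemp -d)
touch -t 202501010000 "$demo_dir/core_old.1"
touch -t 202501020000 "$demo_dir/core_new.2"
newest_match "$demo_dir" 'core_*'   # prints the path ending in core_new.2
rm -rf "$demo_dir"
```

Because the helper signals "nothing found" through its exit status, a caller running under `set -e` would use it as `DUMP_FILE=$(newest_match "$DIAG_DIR/dumps" 'core_*' || true)` and keep the existing empty-string check.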
hf-xet-diag-linux.sh

Lines changed: 18 additions & 0 deletions
@@ -167,6 +167,7 @@ else
167167
fi
168168

169169
# --- launch target ---
170+
SCRIPT_START_TIME=$(date +%s)
170171
echo "Launching target at $(date -Is) ..." | tee -a "$CONSOLE_LOG"
171172

172173
LAUNCH_ENV=()
@@ -275,5 +276,22 @@ while kill -0 "$TARGET_PID" 2>/dev/null; do
275276
done
276277

277278
echo "Process $TARGET_PID has exited at $(date -Is)." | tee -a "$CONSOLE_LOG"
279+
280+
# --- collect xet log files from this execution ---
281+
HF_HOME="${HF_HOME:-$HOME/.cache/huggingface}"
282+
XET_LOG_DIR="$HF_HOME/xet/logs"
283+
if [[ -d "$XET_LOG_DIR" ]]; then
284+
echo "Collecting xet logs from $XET_LOG_DIR ..." | tee -a "$CONSOLE_LOG"
285+
mkdir -p "$OUTDIR/xet_logs"
286+
287+
# Find log files created during or after script start time using GNU find
288+
find "$XET_LOG_DIR" -name "xet_*.log" -type f -newermt "@$SCRIPT_START_TIME" 2>/dev/null | while read -r logfile; do
289+
cp "$logfile" "$OUTDIR/xet_logs/" 2>/dev/null && \
290+
echo " Copied: $(basename "$logfile")" | tee -a "$CONSOLE_LOG"
291+
done
292+
else
293+
echo "No xet log directory found at $XET_LOG_DIR" | tee -a "$CONSOLE_LOG"
294+
fi
295+
278296
echo "Logs and stacks are in: $OUTDIR"
279297
disown "$LOGGER_BG" 2>/dev/null || true
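
The log collection step filters on `-newermt "@$SCRIPT_START_TIME"`, which the inline comment identifies as a GNU `find` feature. If the same step were ever needed where GNU `find` is not available, one option is to create a marker file at launch and use the POSIX `-newer` test against it. This is a sketch under that assumption; `collect_new_logs` and the marker-file approach are illustrative and are not what this commit does.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this commit): copy xet_*.log files from
# LOG_DIR that were modified after MARKER into DEST, printing each name.
collect_new_logs() {
  local log_dir=$1 marker=$2 dest=$3 logfile
  mkdir -p "$dest"
  # -newer is POSIX: true when the file's mtime is later than MARKER's.
  find "$log_dir" -type f -name 'xet_*.log' -newer "$marker" 2>/dev/null |
    while IFS= read -r logfile; do
      cp "$logfile" "$dest/" && echo "  Copied: $(basename "$logfile")"
    done
}

# Demo: the marker stands in for "script start"; one log predates it.
work=$(mktemp -d)
mkdir "$work/logs"
touch -t 202501010000 "$work/logs/xet_old.log"
touch -t 202501020000 "$work/marker"
touch -t 202501030000 "$work/logs/xet_new.log"
collect_new_logs "$work/logs" "$work/marker" "$work/out"   # copies only xet_new.log
rm -rf "$work"
```

In a real script the marker would be created with a plain `touch` right before the target is launched, in the same place where `SCRIPT_START_TIME` is recorded.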
