Refactor log data processing in dataLoader.ts for improved memory efficiency and error handling #73


Open · wants to merge 1 commit into base: main

Conversation

@FindHao (Member) commented Aug 16, 2025

Summary:

  • Introduced a new function, `parseLogDataFromStream`, to handle line-by-line parsing of NDJSON from a stream, enhancing memory efficiency for large files.
  • Updated `processArrayBuffer` to use the new streaming function for gzip files, with improved error handling.
  • Adjusted `loadLogDataFromFile` to use streaming for files exceeding 100 MB, ensuring better performance with large datasets.

These changes enhance the robustness and efficiency of log data processing in the application.
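The PR names `parseLogDataFromStream` but its body is not shown here. A minimal sketch of what such a streaming NDJSON parser might look like (the `LogEntry` shape and exact signature are assumptions, not taken from the PR):

```typescript
// Illustrative type only; the real LogEntry in dataLoader.ts likely has
// concrete fields.
interface LogEntry {
  [key: string]: unknown;
}

// Parse NDJSON line by line from a byte stream. Only the current chunk
// and the trailing partial line are buffered, rather than the whole
// decompressed file.
async function parseLogDataFromStream(
  stream: ReadableStream<Uint8Array>
): Promise<LogEntry[]> {
  const decoder = new TextDecoder();
  const reader = stream.getReader();
  const entries: LogEntry[] = [];
  let buffer = ''; // carries an incomplete line across chunk boundaries

  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop() ?? ''; // last element may be a partial line
    for (const line of lines) {
      if (!line.trim()) continue;
      try {
        const parsed: LogEntry = JSON.parse(line);
        if (parsed && typeof parsed === 'object') entries.push(parsed);
      } catch {
        console.warn(`Failed to parse line as JSON: ${line.substring(0, 100)}...`);
      }
    }
  }
  buffer += decoder.decode(); // flush any buffered multi-byte sequence
  // Handle a final line with no trailing newline.
  if (buffer.trim()) {
    try {
      const parsed: LogEntry = JSON.parse(buffer);
      if (parsed && typeof parsed === 'object') entries.push(parsed);
    } catch {
      console.warn(`Failed to parse line as JSON: ${buffer.substring(0, 100)}...`);
    }
  }
  return entries;
}
```

Splitting on `\n` and keeping the last array element as the carry-over buffer is what makes chunk boundaries safe: a JSON object cut in half by a chunk boundary is reassembled before parsing.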

…iciency and error handling

Summary:
- Removed console logs during NDJSON parsing to clean up output.
@meta-cla bot added the CLA Signed label (managed by the Meta Open Source bot) on Aug 16, 2025.
@facebook-github-bot (Contributor) commented:

@FindHao has imported this pull request. If you are a Meta employee, you can view this in D80458467.

Comment on lines +328 to +335
try {
  const parsedLine: LogEntry = JSON.parse(line);
  if (parsedLine && typeof parsedLine === 'object') {
    entries.push(parsedLine);
  }
} catch (e) {
  console.warn(`Failed to parse line as JSON: ${line.substring(0, 100)}...`);
}


very minor nit - you can de-duplicate this with the other place (above) where you handle this logic
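One way to factor out the duplicated parse-and-push logic the nit refers to (the helper name here is hypothetical, not from the PR):

```typescript
// Illustrative type only; the real LogEntry in dataLoader.ts likely has
// concrete fields.
interface LogEntry {
  [key: string]: unknown;
}

// Shared helper: parse one NDJSON line and append it when it is a valid
// object; warn (with a truncated preview) on malformed lines.
function pushParsedLine(line: string, entries: LogEntry[]): void {
  if (!line.trim()) return;
  try {
    const parsed: LogEntry = JSON.parse(line);
    if (parsed && typeof parsed === 'object') {
      entries.push(parsed);
    }
  } catch {
    console.warn(`Failed to parse line as JSON: ${line.substring(0, 100)}...`);
  }
}
```

Both call sites would then reduce to `pushParsedLine(line, entries)`, keeping the warning format and the object-type guard in one place.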

@@ -367,6 +400,18 @@ export async function loadLogData(url: string): Promise<LogEntry[]> {
* @returns Promise resolving to an array of LogEntry objects
*/
export function loadLogDataFromFile(file: File): Promise<LogEntry[]> {
  // For large files, we should use streaming to avoid memory issues
  const LARGE_FILE_THRESHOLD = 100 * 1024 * 1024; // 100 MB
@davidberard98 commented Aug 19, 2025


Can you share the constant with CodeViewer.tsx?

Edit: nvm, I realize it's a different limit
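The diff shows only the threshold constant, not the dispatch itself. A sketch of how the size check might route between the two paths (the routing below is an assumption; only `LARGE_FILE_THRESHOLD`, `parseLogDataFromStream`, and `processArrayBuffer` appear in the PR):

```typescript
const LARGE_FILE_THRESHOLD = 100 * 1024 * 1024; // 100 MB

// Files strictly larger than the threshold take the streaming path.
function shouldStream(fileSize: number, threshold = LARGE_FILE_THRESHOLD): boolean {
  return fileSize > threshold;
}

// Inside loadLogDataFromFile this could look roughly like:
//
//   if (shouldStream(file.size)) {
//     return parseLogDataFromStream(file.stream());
//   }
//   return file.arrayBuffer().then(processArrayBuffer);
```

`File.size` is available before any bytes are read, so the decision costs nothing; small files keep the simpler buffer-based path.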

@davidberard98 left a comment:


LGTM!

A few nits inline - otherwise looks good!
