
Conversation

@hjxilinx (Contributor)

Description

feat: support batch analysis to generate the envelope for all the time series data.

Issue(s)

https://project.feishu.cn/taosdata_td/feature/detail/6792189109

Checklist

Please check the items in the checklist if applicable.

  • Is the user manual updated?
  • Are the test cases passed and automated?
  • Is there no significant decrease in test coverage?

@gemini-code-assist (Contributor)

Summary of Changes

Hello @hjxilinx, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant new feature to the analytics platform: batch analysis for time series data. It enables the generation of envelopes for multiple time series segments, providing robust data preprocessing and golden batch creation capabilities. This enhancement allows for more sophisticated and automated analysis workflows, improving the platform's ability to handle and interpret complex time-series datasets.

Highlights

  • New Batch Processing Module: Introduced a comprehensive batch processing module (batch.py) for time series data, including functionalities like Hampel filtering, derivative checking, progress normalization, Savitzky-Golay smoothing, and golden batch envelope generation (a hedged sketch of the Hampel step follows this list).
  • API Endpoint for Batch Analysis: Added a new RESTful API endpoint /tool/batch to the Flask application, allowing users to perform batch analysis on time series data with configurable parameters.
  • Enhanced Error Handling and Utility Functions: Improved error handling in anomaly detection by importing a specific exception type and refactored a utility function (do_check_before_exec) into a more modular do_initial_check.
  • Extensive Unit Tests: Included a dedicated test file (batch_process_test.py) with a wide range of unit tests for the new batch processing module, covering boundary conditions, exception handling, special scenarios, data quality, and configuration parameters.
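
For readers new to these techniques, here is a minimal sketch of the Hampel-filter idea named in the first highlight. It is not the PR's hampel_filter from tools/tdgpt/taosanalytics/algo/tool/batch.py; the window size, threshold, and return shape are illustrative assumptions.

```python
import numpy as np

def hampel_filter_sketch(values, window_size=5, n_sigmas=3.0):
    """Illustrative Hampel filter (not the PR's implementation): replace any
    point that deviates from its rolling median by more than n_sigmas times
    the scaled MAD with that median."""
    values = np.asarray(values, dtype=float)
    filtered = values.copy()
    outliers = np.zeros(len(values), dtype=bool)
    k = 1.4826  # scales MAD to estimate the standard deviation for Gaussian data

    for i in range(len(values)):
        lo = max(0, i - window_size)
        hi = min(len(values), i + window_size + 1)
        window = values[lo:hi]
        median = np.median(window)
        mad = k * np.median(np.abs(window - median))
        if mad > 0 and abs(values[i] - median) > n_sigmas * mad:
            filtered[i] = median      # mark and replace the outlier
            outliers[i] = True
    return filtered, outliers
```

The other preprocessing steps listed above (derivative checking, Savitzky-Golay smoothing) play a similar role: clean and align each series before the golden envelope is built.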


Changelog
  • tools/tdgpt/taosanalytics/algo/anomaly.py
    • Imported failed_load_model_except for more specific error handling.
    • Modified error handling logic when a specified analysis service is not found, logging the error and raising a specific exception instead of falling back to 'ksigma'.
  • tools/tdgpt/taosanalytics/algo/tool/batch.py
    • Added a new file containing functions for time series batch processing.
    • Implemented hampel_filter for outlier detection.
    • Developed derivative_check to filter data based on rate of change.
    • Created normalize_progress for time series normalization to a target length.
    • Included plotting helper functions (plot_compare, plot_hempel_filter, plot_sg_smoothing) for visualization.
    • Implemented split_by_windows to segment time series data.
    • Designed build_golden to create golden batch envelopes using mean/std or median/percentile methods.
    • Provided do_batch_process as the main pipeline for batch analysis (a hedged sketch of the normalize-then-envelope idea follows this changelog).
    • Defined get_default_config and update_config for managing batch processing parameters.
  • tools/tdgpt/taosanalytics/app.py
    • Imported numpy for numerical operations.
    • Imported do_batch_process and update_config from the new batch module.
    • Imported do_initial_check from taosanalytics.util.
    • Added a new /tool/batch POST endpoint to handle batch processing requests, parsing input data, windows, and configuration, and returning the calculated golden batch envelope.
  • tools/tdgpt/taosanalytics/misc/__init__.py
    • Added a placeholder batch function.
  • tools/tdgpt/taosanalytics/misc/input_data.csv
    • Renamed file from tools/tdgpt/taosanalytics/misc/input_data.csv to tools/tdgpt/taosanalytics/data/input_data.csv.
  • tools/tdgpt/taosanalytics/test/batch_process_test.py
    • Added a new file containing extensive unit tests for the batch module.
    • Included tests for boundary conditions (empty input, single point, threshold points, empty/overlapping/adjacent windows).
    • Implemented tests for exception handling (mismatched lengths, NaN/Inf values, non-monotonic time, invalid window/config formats, invalid filter parameters).
    • Covered special scenarios (all windows filtered, extreme outliers, constant values, high-frequency noise, step functions, large datasets, cubic interpolation with few points, zero derivative rate limit).
    • Added tests for data quality (sparse data, irregular sampling, duplicate timestamps, missing data segments).
    • Tested configuration parameters, including different golden batch methods and plot disabling.
  • tools/tdgpt/taosanalytics/test/restful_api_test.py
    • Removed an unused import of numpy.
    • Added a new test case test_batch_process to verify the functionality of the /tool/batch API endpoint.
  • tools/tdgpt/taosanalytics/util.py
    • Refactored do_check_before_exec into two functions: do_initial_check (which performs initial request validation) and a modified do_check_before_exec (which now calls do_initial_check and then proceeds with payload validation). A second sketch below illustrates this call structure.
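
To make the batch.py entries above more concrete, below is a minimal, hedged sketch of the normalize-then-envelope idea: resample each window onto a common length, then build an envelope from the aligned segments. The function names, config keys, and percentile choices here are assumptions for illustration and do not reproduce the PR's actual signatures.

```python
import numpy as np

def normalize_to_length(segment, target_len=100):
    """Illustrative progress normalization: interpolate a segment onto a
    fixed number of points so segments of different lengths align."""
    segment = np.asarray(segment, dtype=float)
    src = np.linspace(0.0, 1.0, num=len(segment))
    dst = np.linspace(0.0, 1.0, num=target_len)
    return np.interp(dst, src, segment)

def build_envelope(segments, method="mean_std", k=2.0, pct=(5, 95)):
    """Illustrative golden-batch envelope over aligned, equal-length segments.
    method="mean_std": center = mean, band = center +/- k * std.
    method="median_pct": center = median, band = the given percentiles."""
    stacked = np.vstack(segments)          # shape: (n_segments, target_len)
    if method == "mean_std":
        center = stacked.mean(axis=0)
        spread = stacked.std(axis=0)
        lower, upper = center - k * spread, center + k * spread
    else:
        center = np.median(stacked, axis=0)
        lower = np.percentile(stacked, pct[0], axis=0)
        upper = np.percentile(stacked, pct[1], axis=0)
    return {"lower": lower, "center": center, "upper": upper}

# Usage on synthetic data: three noisy sine segments of different lengths.
rng = np.random.default_rng(0)
raw = [np.sin(np.linspace(0, 3, n)) + rng.normal(0, 0.05, n) for n in (80, 120, 95)]
aligned = [normalize_to_length(s) for s in raw]
envelope = build_envelope(aligned, method="median_pct")
```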
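The util.py refactor is the usual "extract shared validation" pattern. The sketch below only illustrates the call structure described above; the real argument lists and checks in taosanalytics/util.py are not reproduced here, so treat the details as assumptions.

```python
def do_initial_check(req):
    """Illustrative initial request validation (assumed checks)."""
    if not isinstance(req, dict):
        raise ValueError("request body must be a JSON object")
    if "data" not in req:
        raise ValueError("missing required field: data")
    return req

def do_check_before_exec(req):
    """Illustrative: reuse the initial check, then validate the payload."""
    req = do_initial_check(req)
    if len(req["data"]) == 0:
        raise ValueError("data must not be empty")
    return req
```
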
Activity
  • The author hjxilinx created this pull request to introduce batch analysis functionality.
  • A new Python module for batch processing time series data was added.
  • The Flask application was updated to expose this new functionality via a dedicated /tool/batch API endpoint (a hedged example request follows this list).
  • Comprehensive unit tests were developed for the new batch processing logic and its API integration.
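
As a usage illustration of the new endpoint described in the changelog, the hedged request below mirrors the "data, windows, and configuration" fields mentioned there. The exact JSON schema, field layout, port, and response shape are assumptions, not the documented API.

```python
import requests

# Hypothetical call to the new /tool/batch endpoint; payload keys, value
# layout, and the service address are illustrative assumptions only.
payload = {
    "data": [[0, 1.0], [1, 1.2], [2, 0.9], [3, 1.1]],        # assumed (timestamp, value) pairs
    "windows": [[0, 2], [2, 3]],                               # assumed segment boundaries
    "config": {"method": "median_pct", "plot": False},         # assumed config keys
}
resp = requests.post("http://127.0.0.1:6090/tool/batch", json=payload, timeout=10)
print(resp.status_code, resp.json())
```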

@hjxilinx changed the title from "feat: generate the envelop for all the time series data." to "feat: generate the envelop for the given time series data." on Feb 10, 2026

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a new batch processing feature for time series data, including a new API endpoint and its underlying processing pipeline, along with comprehensive unit tests. However, a security analysis identified two high-severity path traversal vulnerabilities, which could lead to arbitrary file writes and potentially denial of service or remote code execution, as well as a medium-severity log injection vulnerability that allows log entry forgery. The review also suggests correctness improvements, such as handling edge cases like division by zero and fixing bugs in logging and error messages, and maintainability improvements through consistent comments and more robust configuration update logic.

@zitsen merged commit de9a302 into main on Feb 11, 2026 (10 of 11 checks passed)
@zitsen deleted the feat/batch branch on February 11, 2026 at 03:45