
@ahnopologetic
Contributor

@ahnopologetic commented Jun 1, 2025

Adds support for parsing React-based codebases as well.

  • Adds React and its type definitions so the analyzer can understand React types.
  • Adds detection of call signatures other than plain identifiers (e.g., custom function calls such as mixpanelAnalytics.track, where track is a method of a custom class).

@ahnopologetic changed the title from "feat: add additional ts signatures to detect" to "feat: add additional ts signatures and react-based types to detect" Jun 1, 2025
@ahnopologetic marked this pull request as ready for review June 1, 2025 16:44
@ahnopologetic
Contributor Author

Also, I noticed CI is not running the new test code; I think it is because of caching.
Here is a passing local test run:

> @flisk/[email protected] test
> node tests

Running tests...

Found test files: analyzeGo.test.js, analyzeJavaScript.test.js, analyzePython.test.js, analyzeRuby.test.js, analyzeTypeScript.test.js, cli.test.js, generateDescriptions.test.js, schema.test.js, utils.test.js 

▶ analyzeGoFile
  ✔ should correctly analyze Go file with multiple tracking providers (25.356417ms)
  ✔ should handle files without tracking events (1.135708ms)
  ✔ should handle missing custom function (2.703083ms)
  ✔ should handle nested property types correctly (3.526458ms)
  ✔ should match expected tracking-schema.yaml output (3.37375ms)
✔ analyzeGoFile (37.720084ms)
▶ analyzeJsFile
  ✔ should correctly analyze JavaScript file with multiple tracking providers (19.9865ms)
  ✔ should handle files without tracking events (1.631458ms)
  ✔ should handle missing custom function (1.786292ms)
  ✔ should handle nested property types correctly (2.335125ms)
  ✔ should detect array types correctly (5.26725ms)
  ✔ should handle different function contexts correctly (3.017625ms)
  ✔ should handle case variations in provider names (2.627ms)
  ✔ should exclude action field from Snowplow properties (2.409833ms)
  ✔ should handle mParticle three-parameter format (2.20675ms)
✔ analyzeJsFile (43.532791ms)
▶ analyzePythonFile
  ✔ should correctly analyze Python file with multiple tracking providers (1481.982125ms)
  ✔ should handle files without tracking events (33.582292ms)
  ✔ should handle missing custom function (28.709833ms)
  ✔ should handle nested property types correctly (28.210542ms)
  ✔ should match expected tracking-schema.yaml output (27.557208ms)
  ✔ should handle type annotations correctly (31.211083ms)
✔ analyzePythonFile (1632.190333ms)
▶ analyzeRubyFile
  ✔ should correctly analyze Ruby file with multiple tracking providers (84.225792ms)
  ✔ should handle files without tracking events (0.64825ms)
  ✔ should handle missing custom function (4.999375ms)
  ✔ should handle nested property types correctly (4.324792ms)
  ✔ should detect tracking in modules (4.76525ms)
  ✔ should handle all property types correctly (5.519083ms)
  ✔ should correctly identify function names in different contexts (4.358708ms)
  ✔ should correctly differentiate between Segment and Rudderstack (4.453166ms)
✔ analyzeRubyFile (114.887583ms)
▶ analyzeTsFile
  ✔ should correctly analyze TypeScript file with multiple tracking providers (466.154875ms)
  ✔ should handle files without tracking events (248.734084ms)
  ✔ should handle missing custom function (218.490958ms)
  ✔ should handle nested property types correctly (245.959959ms)
  ✔ should detect and expand interface types correctly (214.954041ms)
  ✔ should handle shorthand property assignments correctly (169.113542ms)
  ✔ should handle variable references correctly (174.2885ms)
  ✔ should exclude action field from Snowplow properties (193.445125ms)
  ✔ should handle mParticle three-parameter format (154.477667ms)
  ✔ should handle readonly array types correctly (158.516834ms)
  ✔ should handle exported vs non-exported interfaces (164.075833ms)
  ✔ should correctly analyze React TypeScript file with multiple tracking providers (222.80025ms)
✔ analyzeTsFile (2632.103375ms)
▶ CLI End-to-End Tests
  ✔ should analyze Go files and generate a tracking schema (673.16975ms)
  ✔ should analyze JavaScript files and generate a tracking schema (483.665666ms)
  ✔ should analyze TypeScript files and generate a tracking schema (969.801792ms)
  ✔ should analyze Python files and generate a tracking schema (1572.444084ms)
  ✔ should analyze Ruby files and generate a tracking schema (552.292958ms)
  ✔ should handle empty files and generate an empty tracking schema (3613.406125ms)
  ✔ should analyze all languages together and generate a combined tracking schema (2092.29275ms)
✔ CLI End-to-End Tests (9961.453667ms)
Running 1 prompts in parallel...
Running 1 prompts in parallel...
Running 2 prompts in parallel...
Running 1 prompts in parallel...
Running 1 prompts in parallel...
Running 1 prompts in parallel...
Error during LLM response parsing: LLM API error
Failed to get description for event: test_event
Running 1 prompts in parallel...
▶ generateDescriptions Tests
  ✔ should generate descriptions for simple event (2.523625ms)
  ✔ should handle nested properties (0.678625ms)
  ✔ should handle multiple events (0.542ms)
  ✔ should handle events with multiple implementations (0.351334ms)
  ✔ should handle events with no properties (0.269167ms)
  ✔ should handle LLM errors gracefully (0.467542ms)
  ✔ should maintain original event structure (0.628583ms)
✔ generateDescriptions Tests (6.871459ms)
▶ Schema Validation Tests
  ✔ should generate YAML that conforms to the JSON schema specification (2540.1225ms)
  ✔ should validate schema compliance for each language individually (3800.193125ms)
  ✔ should validate required fields are present (2117.339625ms)
  ✔ should validate enum constraints (2062.107792ms)
✔ Schema Validation Tests (10521.642125ms)
▶ fileProcessor
  ✔ should get all files in directory recursively (1.24725ms)
  ✔ should exclude hidden files and directories (0.564209ms)
  ✔ should exclude common directories (1.756208ms)
  ✔ should handle ENOENT errors gracefully (0.613708ms)
  ✔ should return empty array for empty directory (0.546958ms)
✔ fileProcessor (10.446959ms)
▶ repoDetails
  ✔ should get repository details (6.757333ms)
  ✔ should override with custom source details (2.414917ms)
  ✔ should handle missing git repository (56.842875ms)
Tracking schema YAML file generated: /Users/taeahn/devs/personal/analyze-tracking/tests/temp-yamlgen/test-schema.yaml
Tracking schema YAML file generated: /Users/taeahn/devs/personal/analyze-tracking/tests/temp-yamlgen/formatted-schema.yaml
Tracking schema YAML file generated: /Users/taeahn/devs/personal/analyze-tracking/tests/temp-yamlgen/empty-schema.yaml
Tracking schema YAML file generated: /Users/taeahn/devs/personal/analyze-tracking/tests/temp-yamlgen/special-chars-schema.yaml
  ✔ should fall back to execSync when isomorphic-git fails (176.108084ms)
✔ repoDetails (326.221667ms)
▶ yamlGenerator
  ✔ should generate YAML schema file (8.518458ms)
  ✔ should generate YAML with proper formatting (1.016042ms)
  ✔ should handle empty events (0.860208ms)
  ✔ should handle special characters in YAML (0.9615ms)
✔ yamlGenerator (12.50375ms)
ℹ tests 71
ℹ suites 11
ℹ pass 71
ℹ fail 0
ℹ cancelled 0
ℹ skipped 0
ℹ todo 0
ℹ duration_ms 10662.805875

@ahnopologetic
Contributor Author

@skarim lmk what you think!

@skarim
Contributor

skarim commented Jun 2, 2025

@ahnopologetic thanks for contributing! need to do a deeper read, but looks great at first glance!

curious, did the existing version not work with react for you? i did some spot tests on react codebases (although not in the test suite) and did see events get extracted. is there a particular format you are looking to parse?

@skarim added the enhancement (New feature or request) and good first issue (Good for newcomers) labels Jun 2, 2025
@ahnopologetic
Contributor Author

@skarim Hi. Yes, I saw some of them work, but it does not extract some simple use cases like this:

      mixpanelAnalytics.trackUserScenario('sign_up: completed', {

And mixpanelAnalytics is an instance of MixpanelAnalytics, declared like this:

export class MixpanelAnalytics {
  app: MixpanelModule;
  debug: boolean;

  constructor(app: MixpanelModule, debug: boolean = false) {
    this.app = app;
    this.debug = debug;
  }
  trackUserScenario(eventName: string, properties: Record<string, any> = {}) {
    if (properties?.is_user_scenario === false) {
      console.warn(
        'MixpanelAnalytics.trackUserScenario(): is_user_scenario is false, use track instead',
      );
    }

    this.app.track(eventName, {
      ...properties,
      is_user_scenario: true,
    });
  }
...
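A minimal, runnable sketch of what this wrapper does at runtime may help show why the event is easy to miss. It assumes a MixpanelModule with a single track(eventName, properties) method (the thread does not show its real shape); RecordingMixpanel and the plan property are hypothetical. Inside the class, the provider's track call only receives the eventName parameter, so the literal 'sign_up: completed' exists only at the trackUserScenario call site.

```typescript
// Assumed shape of MixpanelModule: the thread does not show it, so only
// the track() method used by MixpanelAnalytics is modeled here.
interface MixpanelModule {
  track(eventName: string, properties?: Record<string, unknown>): void;
}

// Hypothetical test double that records every call it receives.
class RecordingMixpanel implements MixpanelModule {
  calls: Array<{ eventName: string; properties?: Record<string, unknown> }> = [];

  track(eventName: string, properties?: Record<string, unknown>): void {
    this.calls.push({ eventName, properties });
  }
}

// Trimmed copy of the wrapper quoted above (debug flag and warning omitted).
class MixpanelAnalytics {
  private app: MixpanelModule;

  constructor(app: MixpanelModule) {
    this.app = app;
  }

  trackUserScenario(eventName: string, properties: Record<string, any> = {}): void {
    // Inside the wrapper the provider call only sees the eventName parameter,
    // so scanning provider calls alone cannot recover the event name.
    this.app.track(eventName, {
      ...properties,
      is_user_scenario: true, // spread comes first, so this always wins
    });
  }
}

const recorder = new RecordingMixpanel();
const mixpanelAnalytics = new MixpanelAnalytics(recorder);

// The string literal exists only here, at the wrapper's call site.
// The { plan: 'free' } property is a made-up example.
mixpanelAnalytics.trackUserScenario('sign_up: completed', { plan: 'free' });

console.log(JSON.stringify(recorder.calls));
// [{"eventName":"sign_up: completed","properties":{"plan":"free","is_user_scenario":true}}]
```

This is why the analyzer has to treat mixpanelAnalytics.trackUserScenario itself as the tracking call rather than relying on the provider's track call.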

I tried to pick these calls up using the customFunction option, but it turns out customFunction only matches calls whose callee is an identifier (in the TypeScript AST), not other cases such as class properties/methods. So I added this check to the isCustomFunction function:

const canBeCustomFunction = ts.isIdentifier(node.expression) ||
    ts.isPropertyAccessExpression(node.expression) ||
    ts.isCallExpression(node.expression) || // For chained calls like getTracker().track()
    ts.isElementAccessExpression(node.expression) || // For array/object access like trackers['analytics'].track()
    (ts.isPropertyAccessExpression(node.expression?.expression) && ts.isThisExpression(node.expression.expression.expression)); // For class methods like this.analytics.track()
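For reference, in the TypeScript AST the callee of this.analytics.track(), getTracker().track() and trackers['analytics'].track() is in every case a property access expression, so all three already pass the ts.isPropertyAccessExpression clause; the isCallExpression and isElementAccessExpression clauses additionally admit callees such as getTracker()() or trackers['analytics'](). The dependency-free sketch below shows one way such callees could be rendered back to a dotted name and compared with a configured custom function like mixpanelAnalytics.trackUserScenario. The Expr type, calleeName, and matchesCustomFunction are hypothetical: a simplified stand-in for the TypeScript compiler API, not the project's actual implementation.

```typescript
// Hypothetical, simplified model of a call expression's callee. The real
// analyzer walks TypeScript compiler API nodes instead of this type.
type Expr =
  | { kind: 'Identifier'; text: string }
  | { kind: 'This' }
  | { kind: 'PropertyAccess'; object: Expr; name: string }
  | { kind: 'ElementAccess'; object: Expr; key: string } // literal keys only
  | { kind: 'Call'; callee: Expr };

// Render a callee as a dotted name, e.g. "this.analytics.track".
function calleeName(expr: Expr): string {
  switch (expr.kind) {
    case 'Identifier':
      return expr.text;
    case 'This':
      return 'this';
    case 'PropertyAccess':
      return `${calleeName(expr.object)}.${expr.name}`;
    case 'ElementAccess':
      // trackers['analytics'] is normalized to trackers.analytics
      return `${calleeName(expr.object)}.${expr.key}`;
    case 'Call':
      // getTracker() keeps its parentheses, giving getTracker().track
      return `${calleeName(expr.callee)}()`;
    default:
      return '';
  }
}

// Hypothetical helper: does this callee match a configured custom function?
function matchesCustomFunction(callee: Expr, configured: string): boolean {
  return calleeName(callee) === configured;
}

// Small builders for example callees.
const id = (text: string): Expr => ({ kind: 'Identifier', text });
const prop = (object: Expr, name: string): Expr => ({ kind: 'PropertyAccess', object, name });

// mixpanelAnalytics.trackUserScenario(...)
const wrapperCall = prop(id('mixpanelAnalytics'), 'trackUserScenario');
// this.analytics.track(...)
const classMethodCall = prop(prop({ kind: 'This' }, 'analytics'), 'track');
// getTracker().track(...)
const chainedCall = prop({ kind: 'Call', callee: id('getTracker') }, 'track');
// trackers['analytics'].track(...)
const elementCall = prop({ kind: 'ElementAccess', object: id('trackers'), key: 'analytics' }, 'track');

console.log(matchesCustomFunction(wrapperCall, 'mixpanelAnalytics.trackUserScenario')); // true
console.log(calleeName(classMethodCall), calleeName(chainedCall), calleeName(elementCall));
// this.analytics.track getTracker().track trackers.analytics.track
```

Computed keys such as trackers[key] have no static name, which is why this sketch only models literal keys.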

I think this issue is specific to React use cases because tracking calls there are usually wrapped in custom arrow functions, useCallback hooks, and so on. Hope this helps you better understand this PR.

Contributor

@skarim left a comment


@ahnopologetic left some comments for a few areas that could be improved. open to your thoughts either way. lemme know when you have a chance to review/update and i'll get this into the next release!

@skarim added this to the 0.7.4 milestone Jun 4, 2025
@skarim removed the good first issue (Good for newcomers) label Jun 4, 2025
@ahnopologetic requested a review from skarim June 5, 2025 17:49
@ahnopologetic
Contributor Author

@skarim Just to confirm: I resolved all of your review comments, and the tests pass locally. 👍

Contributor

@skarim left a comment


lgtm ✅
thanks @ahnopologetic!

@skarim merged commit 9cb48f9 into fliskdata:main Jun 5, 2025
1 of 2 checks passed

Labels

enhancement New feature or request

2 participants