
Invalid JSON generated during the categorization process #9

@nasuka

Description


I encountered an issue where the JSON output gets truncated partway when running sensemaking on a file that contains long Japanese text. Here is a snippet of the truncated output:

[
  {
    "id": "1870821933711040512",
    "topics": [
      {
        "name": "M1 Grand Prix",
        "subtopics": [
          {
            "name": "General Appreciation"
          }
        ]
      }
    ]
  },
  {
    "id": "1870821933648163328",
    "topics": [
      {
        "name": "M1 Grand Prix",
        "subtopics": [
          {
            "name": "Performance Analysis"
          }
        ]
      }
    ]
  },
...(many rows)...
  {
    "id": "1870821922046710016",
    "topics": [
      {
        "name":

It appears that categorizationBatchSize is currently fixed at 100, which may cause the model to exceed its output token limit, especially for languages like Japanese that consume more tokens per character, or for datasets with very long comments.
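As a rough way to confirm this failure mode, the clipped output is no longer valid JSON, so a parse check catches it (a minimal sketch; the function name is my own, not part of the library):

```typescript
// Returns true when the model output parses as a complete JSON array,
// false when generation was cut off mid-array, as in the snippet above.
function isCompleteJsonArray(output: string): boolean {
  try {
    return Array.isArray(JSON.parse(output));
  } catch {
    return false;
  }
}

// A truncated response like the one above fails to parse:
isCompleteJsonArray('[{"id": "187", "topics": [{"name":'); // false
isCompleteJsonArray('[{"id": "187", "topics": []}]');      // true
```

A check like this could also let the caller retry a batch with a smaller size instead of silently returning broken output.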

Proposed Solution
It would be helpful if categorizationBatchSize could be passed as a parameter upon invocation, so users can adjust it according to their language or the size of their dataset. This way, we can avoid hitting the model’s output token limit and prevent truncated JSON outputs.

Would it be possible to make categorizationBatchSize configurable? If you have any suggestions or alternative approaches, I'd be happy to hear them. Thank you in advance!
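For illustration, the change could be as small as threading the batch size through as an optional parameter that defaults to the current value of 100 (the names below are assumptions for the sketch, not the library's actual API):

```typescript
const DEFAULT_CATEGORIZATION_BATCH_SIZE = 100;

// Split items into batches of a caller-chosen size, so users working with
// token-dense languages (e.g. Japanese) or very long comments can pass a
// smaller value and stay under the model's output token limit.
function makeBatches<T>(
  items: T[],
  batchSize: number = DEFAULT_CATEGORIZATION_BATCH_SIZE
): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}
```

A caller could then invoke categorization with, say, `makeBatches(comments, 25)` for Japanese datasets while the default behavior stays unchanged for everyone else.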
