docs: clarify nullability for builtin function argument types (#843)

georgeh0 · web-flow · commit 0bae4393f5bc · 2025-08-02T23:06:22.000-07:00
diff --git a/docs/docs/core/data_types.mdx b/docs/docs/core/data_types.mdx
@@ -204,9 +204,10 @@ Currently, the following types are key types
 
 CocoIndex supports *Null* values. A *Null* value represents the absence of data or an unknown value, distinct from empty strings, zero numbers, or false boolean values.
 
-### Nullable Data
+### Nullable Type
 
-For any data (e.g. a field of a *Struct*, an argument or return value of a CocoIndex function), if it is nullable, it means its value can be *Null*. 
+For any data (e.g. a field of a *Struct*, an argument or return value of a CocoIndex function), if it is nullable, it means its value can be *Null*.
+We use a `?` suffix to indicate a nullable type, e.g. *Str?*, *Person?*.
 
 In Python, *Null* is represented as `None`, so a nullable type can be represented by `T | None` or `typing.Optional[T]`.
 
diff --git a/docs/docs/ops/functions.md b/docs/docs/ops/functions.md
@@ -9,12 +9,12 @@ description: CocoIndex Built-in Functions
 
 `ParseJson` parses a given text to JSON.
 
-The spec takes the following fields:
+Input data:
 
-*   `text` (`str`): The source text to parse.
-*   `language` (`str`, optional): The language of the source text.  Only `json` is supported now.  Default to `json`.
+*   `text` (*Str*): The source text to parse.
+*   `language` (*Str?*, default: `"json"`): The language of the source text.  Only `json` is supported now.
 
-Return: *Json*
+Return: *Json*, the parsed JSON object.
 
 ## SplitRecursively
 
@@ -37,7 +37,7 @@ Input data:
 
 *   `text` (*Str*): The text to split.
 *   `chunk_size` (*Int64*): The maximum size of each chunk, in bytes.
-*   `min_chunk_size` (*Int64*, optional): The minimum size of each chunk, in bytes. If not provided, default to `chunk_size / 2`.
+*   `min_chunk_size` (*Int64*, default: `chunk_size / 2`): The minimum size of each chunk, in bytes.
 
     :::note
 
@@ -48,8 +48,8 @@ Input data:
 
     :::
 
-*   `chunk_overlap` (*Int64*, optional): The maximum overlap size between adjacent chunks, in bytes.
-*   `language` (*Str*, optional): The language of the document.
+*   `chunk_overlap` (*Int64?*, default: *Null*): The maximum overlap size between adjacent chunks, in bytes.
+*   `language` (*Str*, default: `""`): The language of the document.
     Can be a language name (e.g. `Python`, `Javascript`, `Markdown`) or a file extension (e.g. `.py`, `.js`, `.md`).
 
 
@@ -61,7 +61,7 @@ Input data:
         *   `custom_languages` in the spec, against the `language_name` or `aliases` field of each entry.
         *   Builtin languages (see [Supported Languages](#supported-languages) section below), against the language, aliases or file extensions of each entry.
 
-        All matches are in a case-insensitive manner. If the value of `language` is null, it'll be treated as empty string.
+        All matches are in a case-insensitive manner.
 
     *   If no match is found, the input will be treated as plain text.
 
@@ -185,7 +185,7 @@ Not all LLM APIs support text embedding. See the [LLM API Types table](/docs/ai/
 
 Input data:
 
-*   `text` (*Str*, required): The text to embed.
+*   `text` (*Str*): The text to embed.
 
 Return: *Vector[Float32, N]*, where *N* is the dimension of the embedding vector determined by the model.