docs: add schema inference note block with type compatibility warning (FIR-50448) (#1104)

devin-ai-integration[bot] · asyafire · web-flow · commit 55bdc8376dea · 2025-11-26T14:32:41.000-08:00
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: asya@firebolt.io <asya@firebolt.io>
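
Illustration for reviewers (not part of the diff): a minimal sketch of the "`COPY FROM` into existing tables" workaround that the new note recommends. The table name, column names, and bucket path are hypothetical, and the `TYPE` and `PATTERN` options are assumed from the `COPY FROM` reference rather than taken from this change. Check them against that page before reusing the snippet.

```sql
-- Hypothetical target table: the column types are declared explicitly,
-- so nothing is inferred from the most recently modified file.
CREATE TABLE IF NOT EXISTS events (
    event_id BIGINT,
    amount   DOUBLE PRECISION,  -- wide enough for files that hold integers or doubles
    label    TEXT
);

-- Load every matching CSV file into the existing table.
-- The bucket path is a placeholder; a private bucket would also need credentials.
COPY INTO events
FROM 's3://example-bucket/events/'
WITH TYPE = CSV HEADER = TRUE PATTERN = '*.csv';
```

Because the table already exists, the load is checked against the declared column types instead of a schema inferred from a single file.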
diff --git a/docs-mdx/reference-sql/commands/data-management/copy-from.mdx b/docs-mdx/reference-sql/commands/data-management/copy-from.mdx
@@ -255,7 +255,9 @@ You can use the automatic schema discovery feature in `COPY FROM` to handle even
 * **Parquet files** - Firebolt automatically reads metadata in Parquet files to create corresponding target tables.
 * **CSV files** - Firebolt infers column types based on the data content itself, which can streamline the initial data loading process significantly. Use `WITH HEADER=TRUE` if your CSV file contains column names in the first line.
 
-When loading multiple files, the schema is inferred from the most recently modified file.
+<Note>
+**When loading multiple files, Firebolt infers the schema from the most recently modified file.** The remaining files must have compatible data types. If types vary between files (e.g., a column contains integers in one file but doubles in another, or is numeric in one file but text in another), the inferred schema may not match every file, which can cause data type errors or query failures. In such cases, we recommend defining an explicit schema using either [external tables](/reference-sql/commands/data-definition/create-external-table) or [`COPY FROM`](/reference-sql/commands/data-management/copy-from) into existing tables.
+</Note>
 
 Automatic schema discovery operates on a "best effort" basis, and attempts to balance accuracy with practical usability, but it may not always be error-free.
 
diff --git a/docs-mdx/reference-sql/functions-reference/table-valued/read_avro.mdx b/docs-mdx/reference-sql/functions-reference/table-valued/read_avro.mdx
@@ -53,7 +53,9 @@ READ_AVRO (
 
 The result is a table with data from the Avro files. Columns are read and parsed using their inferred data types based on the Avro schema. All data types are inferred as nullable.
 
-When reading multiple files, the schema is inferred from the most recently modified file.
+<Note>
+**When reading multiple files, Firebolt infers the schema from the most recently modified file.** The remaining files must have compatible data types. If types vary between files (e.g., a column contains integers in one file but doubles in another, or is numeric in one file but text in another), the inferred schema may not match every file, which can cause data type errors or query failures. In such cases, we recommend defining an explicit schema using either [external tables](/reference-sql/commands/data-definition/create-external-table) or [`COPY FROM`](/reference-sql/commands/data-management/copy-from) into existing tables.
+</Note>
 
 ## Avro Data Type Mapping
 
@@ -226,4 +228,4 @@ The pattern will recursively match files in all subdirectories. For example:
 ```sql
 SELECT * FROM READ_AVRO('s3://firebolt-publishing-public/help_center_assets/firebolt_sample_avro/*.avro')
 ```
-will read all Avro files in the `firebolt_sample_avro` directory and all of its subdirectories. 
+will read all Avro files in the `firebolt_sample_avro` directory and all of its subdirectories.      
diff --git a/docs-mdx/reference-sql/functions-reference/table-valued/read_csv.mdx b/docs-mdx/reference-sql/functions-reference/table-valued/read_csv.mdx
@@ -72,7 +72,9 @@ READ_CSV (
 
 The result is a table with the data from the CSV file. Each cell is read as a `TEXT`.
 
-When reading multiple files, the schema is inferred from the most recently modified file.
+<Note>
+**When reading multiple files, Firebolt infers the schema from the most recently modified file.** The remaining files must have compatible data types. If types vary between files (e.g., a column contains integers in one file but doubles in another, or is numeric in one file but text in another), the inferred schema may not match every file, which can cause data type errors or query failures. In such cases, we recommend defining an explicit schema using either [external tables](/reference-sql/commands/data-definition/create-external-table) or [`COPY FROM`](/reference-sql/commands/data-management/copy-from) into existing tables.
+</Note>
 
 ## Examples
 
diff --git a/docs-mdx/reference-sql/functions-reference/table-valued/read_parquet.mdx b/docs-mdx/reference-sql/functions-reference/table-valued/read_parquet.mdx
@@ -58,7 +58,9 @@ READ_PARQUET (
 
 The result is a table with data from the Parquet files. Columns are read and parsed using their inferred data types.
 
-When reading multiple files, the schema is inferred from the most recently modified file.
+<Note>
+**When reading multiple files, Firebolt infers the schema from the most recently modified file.** The remaining files must have compatible data types. If types vary between files (e.g., a column contains integers in one file but doubles in another, or is numeric in one file but text in another), the inferred schema may not match every file, which can cause data type errors or query failures. In such cases, we recommend defining an explicit schema using either [external tables](/reference-sql/commands/data-definition/create-external-table) or [`COPY FROM`](/reference-sql/commands/data-management/copy-from) into existing tables.
+</Note>
 
 ## Best practice