diff --git a/.gitignore b/.gitignore index 1495fc3948..9667fb25fc 100644 --- a/.gitignore +++ b/.gitignore @@ -11,6 +11,7 @@ python/pyfory/__pycache__/ python/dist python/build python/pyfory.egg-info +**/*.egg-info cython_debug **/*.prof **/*.pyc @@ -34,6 +35,7 @@ scala/.idea bazel-*/ bazel-fory/ bazel-fory/** +**/generated java/**/generated javascript/**/dist/ javascript/**/node_modules/ @@ -93,4 +95,4 @@ examples/cpp/cmake_example/build **/benchmark_results.json **/benchmark_report.md **/benchmark_*.png -**/results/ \ No newline at end of file +**/results/ diff --git a/compiler/README.md b/compiler/README.md new file mode 100644 index 0000000000..e1bd06cc4b --- /dev/null +++ b/compiler/README.md @@ -0,0 +1,430 @@ +# Fory Definition Language (FDL) Compiler + +The FDL compiler generates cross-language serialization code from schema definitions. It enables type-safe cross-language data exchange by generating native data structures with Fory serialization support for multiple programming languages. + +## Features + +- **Multi-language code generation**: Java, Python, Go, Rust, C++ +- **Rich type system**: Primitives, enums, messages, lists, maps +- **Cross-language serialization**: Generated code works seamlessly with Apache Fory +- **Type ID and namespace support**: Both numeric IDs and name-based type registration +- **Field modifiers**: Optional fields, reference tracking, repeated fields +- **File imports**: Modular schemas with import support + +## Documentation + +For comprehensive documentation, see the [FDL Schema Guide](../docs/schema/index.md): + +- [FDL Syntax Reference](../docs/schema/fdl-syntax.md) - Complete language syntax and grammar +- [Type System](../docs/schema/type-system.md) - Primitive types, collections, and language mappings +- [Compiler Guide](../docs/schema/compiler-guide.md) - CLI options and build integration +- [Generated Code](../docs/schema/generated-code.md) - Output format for each target language +- [Protocol Buffers vs FDL](../docs/schema/proto-vs-fdl.md) - Feature comparison and migration guide + +## Installation + +```bash +cd compiler +pip install -e . +``` + +## Quick Start + +### 1. Define Your Schema + +Create a `.fdl` file: + +```fdl +package demo; + +enum Color [id=101] { + GREEN = 0; + RED = 1; + BLUE = 2; +} + +message Dog [id=102] { + optional string name = 1; + int32 age = 2; +} + +message Cat [id=103] { + ref Dog friend = 1; + optional string name = 2; + repeated string tags = 3; + map scores = 4; + int32 lives = 5; +} +``` + +### 2. Compile + +```bash +# Generate for all languages +fory compile schema.fdl --output ./generated + +# Generate for specific languages +fory compile schema.fdl --lang java,python --output ./generated + +# Override package name +fory compile schema.fdl --package myapp.models --output ./generated + +# Language-specific output directories (protoc-style) +fory compile schema.fdl --java_out=./src/main/java --python_out=./python/src + +# Combine with other options +fory compile schema.fdl --java_out=./gen --go_out=./gen/go -I ./proto +``` + +### 3. 
Use Generated Code + +**Java:** + +```java +import demo.*; +import org.apache.fory.Fory; + +Fory fory = Fory.builder().build(); +DemoForyRegistration.register(fory); + +Cat cat = new Cat(); +cat.setName("Whiskers"); +cat.setLives(9); +byte[] bytes = fory.serialize(cat); +``` + +**Python:** + +```python +import pyfory +from demo import Cat, register_demo_types + +fory = pyfory.Fory() +register_demo_types(fory) + +cat = Cat(name="Whiskers", lives=9) +data = fory.serialize(cat) +``` + +## FDL Syntax + +### Package Declaration + +```fdl +package com.example.models; +``` + +### Imports + +Import types from other FDL files: + +```fdl +import "common/types.fdl"; +import "models/address.fdl"; +``` + +Imports are resolved relative to the importing file. All types from imported files become available for use in the current file. + +**Example:** + +```fdl +// common.fdl +package common; + +message Address [id=100] { + string street = 1; + string city = 2; +} +``` + +```fdl +// user.fdl +package user; +import "common.fdl"; + +message User [id=101] { + string name = 1; + Address address = 2; // Uses imported type +} +``` + +### Enum Definition + +```fdl +enum Status [id=100] { + PENDING = 0; + ACTIVE = 1; + INACTIVE = 2; +} +``` + +### Message Definition + +```fdl +message User [id=101] { + string name = 1; + int32 age = 2; + optional string email = 3; +} +``` + +### Type Options + +Types can have options specified in brackets after the name: + +```fdl +message User [id=101] { ... } // Registered with type ID 101 +message User [id=101, deprecated=true] { ... } // Multiple options +``` + +Types without `[id=...]` use namespace-based registration: + +```fdl +message Config { ... } // Registered as "package.Config" +``` + +### Primitive Types + +| FDL Type | Java | Python | Go | Rust | C++ | +| ----------- | ----------- | -------------------- | ----------- | ----------------------- | ---------------------- | +| `bool` | `boolean` | `bool` | `bool` | `bool` | `bool` | +| `int8` | `byte` | `pyfory.Int8Type` | `int8` | `i8` | `int8_t` | +| `int16` | `short` | `pyfory.Int16Type` | `int16` | `i16` | `int16_t` | +| `int32` | `int` | `pyfory.Int32Type` | `int32` | `i32` | `int32_t` | +| `int64` | `long` | `int` | `int64` | `i64` | `int64_t` | +| `float32` | `float` | `pyfory.Float32Type` | `float32` | `f32` | `float` | +| `float64` | `double` | `float` | `float64` | `f64` | `double` | +| `string` | `String` | `str` | `string` | `String` | `std::string` | +| `bytes` | `byte[]` | `bytes` | `[]byte` | `Vec` | `std::vector` | +| `date` | `LocalDate` | `datetime.date` | `time.Time` | `chrono::NaiveDate` | `fory::LocalDate` | +| `timestamp` | `Instant` | `datetime.datetime` | `time.Time` | `chrono::NaiveDateTime` | `fory::Timestamp` | + +### Collection Types + +```fdl +repeated string tags = 1; // List +map scores = 2; // Map +``` + +### Field Modifiers + +- **`optional`**: Field can be null/None +- **`ref`**: Enable reference tracking for shared/circular references +- **`repeated`**: Field is a list/array + +```fdl +message Example { + optional string nullable_field = 1; + ref OtherMessage shared_ref = 2; + repeated int32 numbers = 3; +} +``` + +### Fory Extension Options + +FDL supports protobuf-style extension options using the `(fory)` prefix: + +**File-level options:** + +```fdl +option (fory).use_record_for_java_message = true; +option (fory).polymorphism = true; +``` + +**Message/Enum options:** + +```fdl +message MyMessage { + option (fory).id = 100; + option (fory).evolving = false; + option 
(fory).use_record_for_java = true;
+  string name = 1;
+}
+
+enum Status {
+  option (fory).id = 101;
+  UNKNOWN = 0;
+  ACTIVE = 1;
+}
+```
+
+**Field options:**
+
+```fdl
+message Example {
+  MyType friend = 1 [(fory).ref = true];
+  string nickname = 2 [(fory).nullable = true];
+  MyType data = 3 [(fory).ref = true, (fory).nullable = true];
+}
+```
+
+See `extension/fory_options.proto` for the complete list of available options.
+
+## Architecture
+
+```
+fory_compiler/
+├── __init__.py          # Package exports
+├── __main__.py          # Module entry point
+├── cli.py               # Command-line interface
+├── parser/
+│   ├── ast.py           # AST node definitions
+│   ├── lexer.py         # Hand-written tokenizer
+│   └── parser.py        # Recursive descent parser
+└── generators/
+    ├── base.py          # Base generator class
+    ├── java.py          # Java POJO generator
+    ├── python.py        # Python dataclass generator
+    ├── go.py            # Go struct generator
+    ├── rust.py          # Rust struct generator
+    └── cpp.py           # C++ struct generator
+```
+
+### Parser
+
+The parser is a hand-written recursive descent parser that produces an AST:
+
+- **Lexer** (`lexer.py`): Tokenizes FDL source into tokens (keywords, identifiers, punctuation)
+- **AST** (`ast.py`): Defines node types - `Schema`, `Message`, `Enum`, `Field`, `FieldType`
+- **Parser** (`parser.py`): Builds AST from token stream with validation
+
+### Generators
+
+Each generator extends `BaseGenerator` and implements:
+
+- `generate()`: Returns list of `GeneratedFile` objects
+- `generate_type()`: Converts FDL types to target language types
+- Language-specific registration helpers
+
+## Generated Output
+
+### Java
+
+Generates POJOs with:
+
+- Private fields with getters/setters
+- `@ForyField` annotations for nullable/ref fields
+- Registration helper class
+
+```java
+public class Cat {
+  @ForyField(trackingRef = true)
+  private Dog friend;
+
+  @ForyField(nullable = true)
+  private String name;
+
+  private List<String> tags;
+  // ...
+}
+```
+
+### Python
+
+Generates dataclasses with:
+
+- Type hints
+- Default values
+- Registration function
+
+```python
+@dataclass
+class Cat:
+    friend: Optional[Dog] = None
+    name: Optional[str] = None
+    tags: List[str] = None
+```
+
+### Go
+
+Generates structs with:
+
+- Fory struct tags
+- Pointer types for nullable fields
+- Registration function with error handling
+
+```go
+type Cat struct {
+    Friend *Dog    `fory:"trackRef"`
+    Name   *string `fory:"nullable"`
+    Tags   []string
+}
+```
+
+### Rust
+
+Generates structs with:
+
+- `#[derive(ForyObject)]` macro
+- `#[fory(...)]` field attributes
+- `#[tag(...)]` for namespace registration
+
+```rust
+#[derive(ForyObject, Debug, Clone, PartialEq, Default)]
+pub struct Cat {
+    pub friend: Rc<Dog>,
+    #[fory(nullable = true)]
+    pub name: Option<String>,
+    pub tags: Vec<String>,
+}
+```
+
+### C++
+
+Generates structs with:
+
+- `FORY_STRUCT` macro for serialization
+- `std::optional` for nullable fields
+- `std::shared_ptr` for ref fields
+
+```cpp
+struct Cat {
+  std::shared_ptr<Dog> friend;
+  std::optional<std::string> name;
+  std::vector<std::string> tags;
+  // ...
+};
+FORY_STRUCT(Cat, friend, name, tags, scores, lives);
+```
+
+## CLI Reference
+
+```
+fory compile [OPTIONS] FILES...
+
+Arguments:
+  FILES              FDL files to compile
+
+Options:
+  --lang TEXT        Target languages (java,python,cpp,rust,go or "all")
+                     Default: all
+  --output, -o PATH  Output directory
+                     Default: ./generated
+  --package TEXT     Override package name from FDL file
+  --help             Show help message
+```
+
+## Examples
+
+See the `examples/` directory for sample FDL files and generated output.
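+As a rough sketch of what to expect (assuming the default `<output>/<lang>` directory layout used by `cmd_compile` and the file naming visible in the C++ and Go generators; the other generators may name their files differently), compiling the demo schema with the command below produces output along these lines:
+
+```
+examples/generated/
+├── cpp/demo.h      # structs, enum classes, FORY_STRUCT/FORY_ENUM macros, RegisterTypes()
+├── go/demo.go      # structs with fory tags and RegisterTypes(f *fory.Fory) error
+└── java/, python/, rust/ ...
+```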
+ +```bash +# Compile the demo schema +fory compile examples/demo.fdl --output examples/generated +``` + +## Development + +```bash +# Install in development mode +pip install -e . + +# Run the compiler +python -m fory_compiler compile examples/demo.fdl + +# Or use the installed command +fory compile examples/demo.fdl +``` + +## License + +Apache License 2.0 diff --git a/compiler/examples/common.fdl b/compiler/examples/common.fdl new file mode 100644 index 0000000000..4f8951d715 --- /dev/null +++ b/compiler/examples/common.fdl @@ -0,0 +1,42 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +// Common types that can be imported by other FDL files +package common; + +// Status enum used across multiple domains +enum Status [id=100] { + PENDING = 0; + ACTIVE = 1; + COMPLETED = 2; + CANCELLED = 3; +} + +// Address type for shared use +message Address [id=101] { + string street = 1; + string city = 2; + string state = 3; + string country = 4; + optional string postal_code = 5; +} + +// Contact information +message Contact [id=102] { + optional string email = 1; + optional string phone = 2; +} diff --git a/compiler/examples/demo.fdl b/compiler/examples/demo.fdl new file mode 100644 index 0000000000..98e9b4b6c2 --- /dev/null +++ b/compiler/examples/demo.fdl @@ -0,0 +1,42 @@ +// Demo FDL file for testing the compiler + +package demo; + +enum Color [id=101] { + GREEN = 0; + RED = 1; + BLUE = 2; + WHITE = 3; +} + +message Item [id=102] { + string name = 1; +} + +message Dog [id=103] { + optional string name = 1; + int32 age = 2; +} + +message Cat [id=104] { + ref Dog friend = 1; + optional string name = 2; + repeated string tags = 3; + map scores = 4; + int32 lives = 5; +} + +// Demonstrates primitive arrays for numeric types +message SensorData [id=105] { + string sensor_id = 1; + repeated int32 readings = 2; + repeated float64 temperatures = 3; + repeated int64 timestamps = 4; + repeated bool flags = 5; +} + +// No type ID - uses name-based registration +message Config { + string key = 1; + string value = 2; +} diff --git a/compiler/examples/user.fdl b/compiler/examples/user.fdl new file mode 100644 index 0000000000..8e5ffa9b68 --- /dev/null +++ b/compiler/examples/user.fdl @@ -0,0 +1,40 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. 
You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +// User domain types - demonstrates import functionality +package user; + +// Import common types from common.fdl +import "common.fdl"; + +// User profile using imported types +message User [id=200] { + string id = 1; + string name = 2; + optional Address home_address = 3; + optional Address work_address = 4; + optional Contact contact = 5; + Status status = 6; +} + +// User preferences +message UserPreferences [id=201] { + ref User user = 1; + string language = 2; + string timezone = 3; + bool notifications_enabled = 4; +} diff --git a/compiler/extension/fory_options.proto b/compiler/extension/fory_options.proto new file mode 100644 index 0000000000..92d6478fa6 --- /dev/null +++ b/compiler/extension/fory_options.proto @@ -0,0 +1,152 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +// Fory Options for FDL (Fory Definition Language) +// @author: chaokunyang +// +// This file defines custom options that can be used in FDL schemas +// to control Fory serialization behavior. Copy this file into your +// project to use these options. +// +// Usage Examples: +// +// File-level options: +// option (fory).use_record_for_java_message = true; +// +// Message options: +// message MyMessage { +// option (fory).id = 100; +// option (fory).evolving = false; +// option (fory).use_record_for_java = true; +// } +// +// Enum options: +// enum Status { +// option (fory).id = 101; +// } +// +// Field options: +// message Example { +// MyType field = 1 [(fory).ref = true, (fory).nullable = true]; +// } + +syntax = "proto3"; + +package fory; + +import "google/protobuf/descriptor.proto"; + +// =========================================================================== +// File-Level Options +// =========================================================================== +// These options apply to the entire FDL file and affect all generated code. + +message ForyFileOptions { + // Generate Java records instead of classes for all messages in this file. + // Only applies to Java code generation. + // Default: false (generates traditional POJOs) + optional bool use_record_for_java_message = 1; + + // Enable polymorphism support for all types in this file. + // When true, type metadata is included in serialization for polymorphic dispatch. 
+ // Default: false + optional bool polymorphism = 2; +} + +extend google.protobuf.FileOptions { + optional ForyFileOptions fory = 50001; +} + +// =========================================================================== +// Message-Level Options +// =========================================================================== +// These options apply to individual message definitions. + +message ForyMessageOptions { + // Unique type ID for cross-language registration. + // Used for efficient type lookup during deserialization. + // Must be a positive integer unique within the schema. + // If not specified, namespace-based registration is used. + optional int32 id = 1; + + // Enable schema evolution for this message. + // When true (default), fields can be added/removed with forward/backward compatibility. + // When false, schema is fixed like a struct - no changes allowed, better performance. + optional bool evolving = 2; + + // Generate a Java record instead of a class for this specific message. + // Overrides file-level use_record_for_java_message setting. + // Only applies to Java code generation. + optional bool use_record_for_java = 3; + + // Mark this message as deprecated. + // Generates appropriate deprecation annotations in target languages. + optional bool deprecated = 4; + + // Custom namespace for type registration. + // Overrides the default package-based namespace. + optional string namespace = 5; +} + +extend google.protobuf.MessageOptions { + optional ForyMessageOptions fory = 50001; +} + +// =========================================================================== +// Enum-Level Options +// =========================================================================== +// These options apply to individual enum definitions. + +message ForyEnumOptions { + // Unique type ID for cross-language registration. + // Used for efficient type lookup during deserialization. + // Must be a positive integer unique within the schema. + optional int32 id = 1; + + // Mark this enum as deprecated. + optional bool deprecated = 2; +} + +extend google.protobuf.EnumOptions { + optional ForyEnumOptions fory = 50001; +} + +// =========================================================================== +// Field-Level Options +// =========================================================================== +// These options apply to individual field definitions. + +message ForyFieldOptions { + // Enable reference tracking for this field. + // When true, Fory tracks object references to handle: + // - Circular references (e.g., tree structures with parent pointers) + // - Shared references (same object referenced multiple times) + // Default: false + optional bool ref = 1; + + // Mark this field as nullable. + // When true, the field can be null/None/nil. + // Default: false (field must have a value) + optional bool nullable = 2; + + // Mark this field as deprecated. + optional bool deprecated = 3; +} + +extend google.protobuf.FieldOptions { + optional ForyFieldOptions fory = 50001; +} diff --git a/compiler/fory_compiler/__init__.py b/compiler/fory_compiler/__init__.py new file mode 100644 index 0000000000..3df24fc44d --- /dev/null +++ b/compiler/fory_compiler/__init__.py @@ -0,0 +1,35 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. 
The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +"""FDL (Fory Definition Language) compiler for Apache Fory.""" + +__version__ = "0.1.0" + +from fory_compiler.parser.ast import Schema, Message, Enum, Field, EnumValue, Import +from fory_compiler.parser.parser import Parser +from fory_compiler.parser.lexer import Lexer + +__all__ = [ + "Schema", + "Message", + "Enum", + "Field", + "EnumValue", + "Import", + "Parser", + "Lexer", +] diff --git a/compiler/fory_compiler/__main__.py b/compiler/fory_compiler/__main__.py new file mode 100644 index 0000000000..c4ae1a4836 --- /dev/null +++ b/compiler/fory_compiler/__main__.py @@ -0,0 +1,24 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +"""Entry point for running the package as a module.""" + +import sys +from fory_compiler.cli import main + +if __name__ == "__main__": + sys.exit(main()) diff --git a/compiler/fory_compiler/cli.py b/compiler/fory_compiler/cli.py new file mode 100644 index 0000000000..d91ebf8206 --- /dev/null +++ b/compiler/fory_compiler/cli.py @@ -0,0 +1,422 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. 
+ +"""CLI entry point for the FDL compiler.""" + +import argparse +import sys +from pathlib import Path +from typing import Dict, List, Optional, Set + +from fory_compiler.parser.lexer import Lexer, LexerError +from fory_compiler.parser.parser import Parser, ParseError +from fory_compiler.parser.ast import Schema +from fory_compiler.generators.base import GeneratorOptions +from fory_compiler.generators import GENERATORS + + +class ImportError(Exception): + """Error during import resolution.""" + pass + + +def parse_fdl_file(file_path: Path) -> Schema: + """Parse a single FDL file and return its schema.""" + source = file_path.read_text() + lexer = Lexer(source, str(file_path)) + tokens = lexer.tokenize() + parser = Parser(tokens) + return parser.parse() + + +def resolve_import_path( + import_stmt: str, + importing_file: Path, + import_paths: List[Path], +) -> Optional[Path]: + """ + Resolve an import path by searching in multiple directories. + + Search order: + 1. Relative to the importing file's directory + 2. Each import path in order (from -I / --proto_path / --import_path) + + Args: + import_stmt: The import path string from the import statement + importing_file: The file containing the import statement + import_paths: List of additional search directories + + Returns: + Resolved Path if found, None otherwise + """ + # First, try relative to the importing file + relative_path = (importing_file.parent / import_stmt).resolve() + if relative_path.exists(): + return relative_path + + # Then try each import path + for search_path in import_paths: + candidate = (search_path / import_stmt).resolve() + if candidate.exists(): + return candidate + + return None + + +def resolve_imports( + file_path: Path, + import_paths: Optional[List[Path]] = None, + visited: Optional[Set[Path]] = None, + cache: Optional[Dict[Path, Schema]] = None, +) -> Schema: + """ + Recursively resolve imports and merge all types into a single schema. 
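+
+    Imported enums and messages are merged ahead of the importing file's own
+    types in the returned Schema, so the importing file can reference them by name.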
+ + Args: + file_path: Path to the FDL file to parse + import_paths: List of directories to search for imports + visited: Set of already visited files (for cycle detection) + cache: Cache of already parsed schemas + + Returns: + Schema with all imported types merged in + """ + if import_paths is None: + import_paths = [] + if visited is None: + visited = set() + if cache is None: + cache = {} + + # Normalize path + file_path = file_path.resolve() + + # Check for circular imports + if file_path in visited: + raise ImportError(f"Circular import detected: {file_path}") + + # Return cached schema if available + if file_path in cache: + return cache[file_path] + + visited.add(file_path) + + # Parse the file + schema = parse_fdl_file(file_path) + + # Process imports + imported_enums = [] + imported_messages = [] + + for imp in schema.imports: + # Resolve import path using search paths + import_path = resolve_import_path(imp.path, file_path, import_paths) + + if import_path is None: + # Build helpful error message with search locations + searched = [str(file_path.parent)] + searched.extend(str(p) for p in import_paths) + raise ImportError( + f"Import not found: {imp.path}\n" + f" at line {imp.line}, column {imp.column}\n" + f" Searched in: {', '.join(searched)}" + ) + + # Recursively resolve the imported file + imported_schema = resolve_imports(import_path, import_paths, visited.copy(), cache) + + # Collect types from imported schema + imported_enums.extend(imported_schema.enums) + imported_messages.extend(imported_schema.messages) + + # Create merged schema with imported types first (so they can be referenced) + merged_schema = Schema( + package=schema.package, + imports=schema.imports, + enums=imported_enums + schema.enums, + messages=imported_messages + schema.messages, + ) + + cache[file_path] = merged_schema + return merged_schema + + +def parse_args(args: Optional[List[str]] = None) -> argparse.Namespace: + """Parse command line arguments.""" + parser = argparse.ArgumentParser( + prog="fory", + description="FDL (Fory Definition Language) compiler", + ) + + subparsers = parser.add_subparsers(dest="command", help="Available commands") + + # compile command + compile_parser = subparsers.add_parser( + "compile", + help="Compile FDL files to language-specific code", + ) + + compile_parser.add_argument( + "files", + nargs="+", + type=Path, + metavar="FILE", + help="FDL files to compile", + ) + + compile_parser.add_argument( + "--lang", + type=str, + default="all", + help="Comma-separated list of target languages (java,python,cpp,rust,go). Default: all", + ) + + compile_parser.add_argument( + "--output", + "-o", + type=Path, + default=Path("./generated"), + help="Output directory. Default: ./generated", + ) + + compile_parser.add_argument( + "--package", + type=str, + default=None, + help="Override package name from FDL file", + ) + + compile_parser.add_argument( + "-I", + "--proto_path", + "--import_path", + dest="import_paths", + action="append", + type=Path, + default=[], + metavar="PATH", + help="Add a directory to the import search path. 
Can be specified multiple times.", + ) + + # Language-specific output directories (protoc-style) + compile_parser.add_argument( + "--java_out", + type=Path, + default=None, + metavar="DST_DIR", + help="Generate Java code in DST_DIR", + ) + + compile_parser.add_argument( + "--python_out", + type=Path, + default=None, + metavar="DST_DIR", + help="Generate Python code in DST_DIR", + ) + + compile_parser.add_argument( + "--cpp_out", + type=Path, + default=None, + metavar="DST_DIR", + help="Generate C++ code in DST_DIR", + ) + + compile_parser.add_argument( + "--go_out", + type=Path, + default=None, + metavar="DST_DIR", + help="Generate Go code in DST_DIR", + ) + + compile_parser.add_argument( + "--rust_out", + type=Path, + default=None, + metavar="DST_DIR", + help="Generate Rust code in DST_DIR", + ) + + return parser.parse_args(args) + + +def get_languages(lang_arg: str) -> List[str]: + """Parse the language argument into a list of languages.""" + if lang_arg == "all": + return list(GENERATORS.keys()) + + languages = [l.strip().lower() for l in lang_arg.split(",")] + + # Validate languages + invalid = [l for l in languages if l not in GENERATORS] + if invalid: + print(f"Error: Unknown language(s): {', '.join(invalid)}", file=sys.stderr) + print(f"Available: {', '.join(GENERATORS.keys())}", file=sys.stderr) + sys.exit(1) + + return languages + + +def compile_file( + file_path: Path, + lang_output_dirs: Dict[str, Path], + package_override: Optional[str] = None, + import_paths: Optional[List[Path]] = None, +) -> bool: + """Compile a single FDL file with import resolution. + + Args: + file_path: Path to the FDL file + lang_output_dirs: Dictionary mapping language name to output directory + package_override: Optional package name override + import_paths: List of import search paths + """ + print(f"Compiling {file_path}...") + + # Parse and resolve imports + try: + schema = resolve_imports(file_path, import_paths) + except OSError as e: + print(f"Error reading {file_path}: {e}", file=sys.stderr) + return False + except (LexerError, ParseError) as e: + print(f"Error: {e}", file=sys.stderr) + return False + except ImportError as e: + print(f"Import error: {e}", file=sys.stderr) + return False + + # Print import info + if schema.imports: + print(f" Resolved {len(schema.imports)} import(s)") + + # Validate merged schema + errors = schema.validate() + if errors: + for error in errors: + print(f"Error: {error}", file=sys.stderr) + return False + + # Generate code for each language + for lang, lang_output in lang_output_dirs.items(): + options = GeneratorOptions( + output_dir=lang_output, + package_override=package_override, + ) + + generator_class = GENERATORS[lang] + generator = generator_class(schema, options) + files = generator.generate() + generator.write_files(files) + + for f in files: + print(f" Generated: {lang_output / f.path}") + + return True + + +def cmd_compile(args: argparse.Namespace) -> int: + """Handle the compile command.""" + # Build language -> output directory mapping + # Language-specific --{lang}_out options take precedence + lang_specific_outputs = { + "java": args.java_out, + "python": args.python_out, + "cpp": args.cpp_out, + "go": args.go_out, + "rust": args.rust_out, + } + + # Determine which languages to generate + lang_output_dirs: Dict[str, Path] = {} + + # First, add languages specified via --{lang}_out (these use direct paths) + for lang, out_dir in lang_specific_outputs.items(): + if out_dir is not None: + lang_output_dirs[lang] = out_dir + + # Then, add languages from 
--lang that don't have specific output dirs + # These use output_dir/lang pattern + if args.lang != "all" or not lang_output_dirs: + # Only use --lang if no language-specific outputs are set, or if --lang is explicit + languages_from_arg = get_languages(args.lang) + for lang in languages_from_arg: + if lang not in lang_output_dirs: + lang_output_dirs[lang] = args.output / lang + + if not lang_output_dirs: + print("Error: No target languages specified.", file=sys.stderr) + print("Use --lang or --{lang}_out options.", file=sys.stderr) + return 1 + + # Validate that all languages are supported + invalid = [l for l in lang_output_dirs.keys() if l not in GENERATORS] + if invalid: + print(f"Error: Unknown language(s): {', '.join(invalid)}", file=sys.stderr) + print(f"Available: {', '.join(GENERATORS.keys())}", file=sys.stderr) + return 1 + + # Resolve and validate import paths (support comma-separated paths) + import_paths = [] + for p in args.import_paths: + # Split by comma to support multiple paths in one option + for part in str(p).split(","): + part = part.strip() + if not part: + continue + resolved = Path(part).resolve() + if not resolved.is_dir(): + print(f"Warning: Import path is not a directory: {part}", file=sys.stderr) + import_paths.append(resolved) + + # Create output directories + for out_dir in lang_output_dirs.values(): + out_dir.mkdir(parents=True, exist_ok=True) + + success = True + for file_path in args.files: + if not file_path.exists(): + print(f"Error: File not found: {file_path}", file=sys.stderr) + success = False + continue + + if not compile_file(file_path, lang_output_dirs, args.package, import_paths): + success = False + + return 0 if success else 1 + + +def main(args: Optional[List[str]] = None) -> int: + """Main entry point.""" + parsed = parse_args(args) + + if parsed.command is None: + print("Usage: fory [options]", file=sys.stderr) + print("Commands: compile", file=sys.stderr) + print("Use 'fory --help' for more information", file=sys.stderr) + return 1 + + if parsed.command == "compile": + return cmd_compile(parsed) + + return 0 + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/compiler/fory_compiler/generators/__init__.py b/compiler/fory_compiler/generators/__init__.py new file mode 100644 index 0000000000..5c8ebd5be6 --- /dev/null +++ b/compiler/fory_compiler/generators/__init__.py @@ -0,0 +1,43 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. 
+ +"""Code generators for different target languages.""" + +from fory_compiler.generators.base import BaseGenerator +from fory_compiler.generators.java import JavaGenerator +from fory_compiler.generators.python import PythonGenerator +from fory_compiler.generators.cpp import CppGenerator +from fory_compiler.generators.rust import RustGenerator +from fory_compiler.generators.go import GoGenerator + +GENERATORS = { + "java": JavaGenerator, + "python": PythonGenerator, + "cpp": CppGenerator, + "rust": RustGenerator, + "go": GoGenerator, +} + +__all__ = [ + "BaseGenerator", + "JavaGenerator", + "PythonGenerator", + "CppGenerator", + "RustGenerator", + "GoGenerator", + "GENERATORS", +] diff --git a/compiler/fory_compiler/generators/base.py b/compiler/fory_compiler/generators/base.py new file mode 100644 index 0000000000..4895a956a8 --- /dev/null +++ b/compiler/fory_compiler/generators/base.py @@ -0,0 +1,203 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +"""Base class for code generators.""" + +from abc import ABC, abstractmethod +from dataclasses import dataclass +from pathlib import Path +from typing import Dict, List, Optional + +from fory_compiler.parser.ast import Schema, Message, Enum, Field, FieldType + + +@dataclass +class GeneratedFile: + """A generated source file.""" + + path: str + content: str + + +@dataclass +class GeneratorOptions: + """Options for code generation.""" + + output_dir: Path + package_override: Optional[str] = None + + +class BaseGenerator(ABC): + """Base class for language-specific code generators.""" + + # Override in subclasses + language_name: str = "base" + file_extension: str = ".txt" + + def __init__(self, schema: Schema, options: GeneratorOptions): + self.schema = schema + self.options = options + self.indent_str = " " # 4 spaces by default + + @property + def package(self) -> Optional[str]: + """Get the package name.""" + return self.options.package_override or self.schema.package + + @abstractmethod + def generate(self) -> List[GeneratedFile]: + """Generate code and return a list of generated files.""" + pass + + @abstractmethod + def generate_type(self, field_type: FieldType, nullable: bool = False) -> str: + """Generate the type string for a field type.""" + pass + + def indent(self, text: str, level: int = 1) -> str: + """Indent text by the given number of levels.""" + prefix = self.indent_str * level + lines = text.split("\n") + return "\n".join(prefix + line if line else line for line in lines) + + def to_pascal_case(self, name: str) -> str: + """Convert name to PascalCase. 
+ + Handles various input formats: + - snake_case -> PascalCase (device_tier -> DeviceTier) + - UPPER_SNAKE_CASE -> PascalCase (DEVICE_TIER -> DeviceTier) + - camelCase -> PascalCase (deviceTier -> DeviceTier) + - ALLCAPS -> Allcaps (UNKNOWN -> Unknown) + """ + if not name: + return name + + # Handle snake_case and UPPER_SNAKE_CASE + if "_" in name: + return "".join(word.capitalize() for word in name.lower().split("_")) + + # Handle all uppercase single word (e.g., UNKNOWN -> Unknown) + if name.isupper(): + return name.capitalize() + + # Handle already PascalCase or camelCase + return name[0].upper() + name[1:] + + def to_camel_case(self, name: str) -> str: + """Convert name to camelCase.""" + pascal = self.to_pascal_case(name) + if not pascal: + return pascal + return pascal[0].lower() + pascal[1:] + + def to_snake_case(self, name: str) -> str: + """Convert name to snake_case. + + Handles acronyms properly: + - DeviceTier -> device_tier + - HTTPStatus -> http_status + - XMLParser -> xml_parser + - HTMLToText -> html_to_text + """ + if not name: + return name + result = [] + for i, char in enumerate(name): + if char.isupper(): + # Add underscore before uppercase if: + # 1. Not at the start + # 2. Previous char is lowercase, OR + # 3. Next char exists and is lowercase (handles acronyms like HTTP->Status) + if i > 0: + prev_lower = name[i - 1].islower() + next_lower = (i + 1 < len(name)) and name[i + 1].islower() + if prev_lower or next_lower: + result.append("_") + result.append(char.lower()) + else: + result.append(char) + return "".join(result) + + def to_upper_snake_case(self, name: str) -> str: + """Convert name to UPPER_SNAKE_CASE.""" + return self.to_snake_case(name).upper() + + def write_files(self, files: List[GeneratedFile]): + """Write generated files to disk.""" + for file in files: + path = self.options.output_dir / file.path + path.parent.mkdir(parents=True, exist_ok=True) + path.write_text(file.content) + + def strip_enum_prefix(self, enum_name: str, value_name: str) -> str: + """Strip the enum name prefix from an enum value name. + + For protobuf-style enums where values are prefixed with the enum name + in UPPER_SNAKE_CASE, strip the prefix to get cleaner scoped enum values. + + Example: + enum_name="DeviceTier", value_name="DEVICE_TIER_UNKNOWN" -> "UNKNOWN" + enum_name="DeviceTier", value_name="DEVICE_TIER_TIER1" -> "TIER1" + enum_name="DeviceTier", value_name="DEVICE_TIER_1" -> "DEVICE_TIER_1" (keeps original, "1" is invalid) + + The prefix is only stripped if the remainder is a valid identifier + (starts with a letter). + + Args: + enum_name: The enum type name (e.g., "DeviceTier") + value_name: The enum value name (e.g., "DEVICE_TIER_UNKNOWN") + + Returns: + The stripped value name, or original if stripping would yield an invalid name + """ + # Convert enum name to UPPER_SNAKE_CASE prefix + prefix = self.to_upper_snake_case(enum_name) + "_" + + # Check if value_name starts with the prefix + if not value_name.startswith(prefix): + return value_name + + # Get the remainder after stripping prefix + remainder = value_name[len(prefix):] + + # Check if remainder is a valid identifier (starts with letter) + if not remainder or not remainder[0].isalpha(): + return value_name + + return remainder + + def get_license_header(self, comment_prefix: str = "//") -> str: + """Get the Apache license header.""" + lines = [ + "Licensed to the Apache Software Foundation (ASF) under one", + "or more contributor license agreements. 
See the NOTICE file", + "distributed with this work for additional information", + "regarding copyright ownership. The ASF licenses this file", + "to you under the Apache License, Version 2.0 (the", + '"License"); you may not use this file except in compliance', + "with the License. You may obtain a copy of the License at", + "", + " http://www.apache.org/licenses/LICENSE-2.0", + "", + "Unless required by applicable law or agreed to in writing,", + "software distributed under the License is distributed on an", + '"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY', + "KIND, either express or implied. See the License for the", + "specific language governing permissions and limitations", + "under the License.", + ] + return "\n".join(f"{comment_prefix} {line}" if line else comment_prefix for line in lines) diff --git a/compiler/fory_compiler/generators/cpp.py b/compiler/fory_compiler/generators/cpp.py new file mode 100644 index 0000000000..5dafeb31cf --- /dev/null +++ b/compiler/fory_compiler/generators/cpp.py @@ -0,0 +1,348 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. 
+ +"""C++ code generator.""" + +from typing import List, Set + +from fory_compiler.generators.base import BaseGenerator, GeneratedFile +from fory_compiler.parser.ast import ( + Schema, + Message, + Enum, + Field, + FieldType, + PrimitiveType, + PrimitiveKind, + NamedType, + ListType, + MapType, +) + + +class CppGenerator(BaseGenerator): + """Generates C++ structs with FORY_STRUCT macros.""" + + language_name = "cpp" + file_extension = ".h" + + # Mapping from FDL primitive types to C++ types + PRIMITIVE_MAP = { + PrimitiveKind.BOOL: "bool", + PrimitiveKind.INT8: "int8_t", + PrimitiveKind.INT16: "int16_t", + PrimitiveKind.INT32: "int32_t", + PrimitiveKind.INT64: "int64_t", + PrimitiveKind.FLOAT32: "float", + PrimitiveKind.FLOAT64: "double", + PrimitiveKind.STRING: "std::string", + PrimitiveKind.BYTES: "std::vector", + PrimitiveKind.DATE: "fory::serialization::LocalDate", + PrimitiveKind.TIMESTAMP: "fory::serialization::Timestamp", + } + + def generate(self) -> List[GeneratedFile]: + """Generate C++ files for the schema.""" + files = [] + + # Generate a single header file with all types + files.append(self.generate_header()) + + return files + + def get_header_name(self) -> str: + """Get the header file name.""" + if self.package: + return self.package.replace(".", "_") + return "generated" + + def get_namespace(self) -> str: + """Get the C++ namespace.""" + if self.package: + return self.package.replace(".", "::") + return "" + + def generate_header(self) -> GeneratedFile: + """Generate a C++ header file with all types.""" + lines = [] + includes: Set[str] = set() + + # Collect includes (including from nested types) + includes.add("") + includes.add("") + includes.add('"fory/serialization/fory.h"') + + for message in self.schema.messages: + self.collect_message_includes(message, includes) + + # License header + lines.append("/*") + for line in self.get_license_header(" *").split("\n"): + lines.append(line) + lines.append(" */") + lines.append("") + + # Header guard + guard_name = f"{self.get_header_name().upper()}_H_" + lines.append(f"#ifndef {guard_name}") + lines.append(f"#define {guard_name}") + lines.append("") + + # Includes + for inc in sorted(includes): + lines.append(f"#include {inc}") + lines.append("") + + # Namespace + namespace = self.get_namespace() + if namespace: + lines.append(f"namespace {namespace} {{") + lines.append("") + + # Forward declarations (including nested types as flat names) + self.generate_forward_declarations(lines, "") + if self.schema.messages: + lines.append("") + + # Generate enums (top-level) + for enum in self.schema.enums: + lines.extend(self.generate_enum(enum, "")) + lines.append("") + + # Generate messages (including nested as flat types with qualified names) + for message in self.schema.messages: + lines.extend(self.generate_message_with_nested(message, "")) + + # Generate registration function + lines.extend(self.generate_registration()) + lines.append("") + + # Close namespace + if namespace: + lines.append(f"}} // namespace {namespace}") + lines.append("") + + # End header guard + lines.append(f"#endif // {guard_name}") + lines.append("") + + return GeneratedFile( + path=f"{self.get_header_name()}.h", + content="\n".join(lines), + ) + + def collect_message_includes(self, message: Message, includes: Set[str]): + """Collect includes for a message and its nested types recursively.""" + for field in message.fields: + self.collect_includes(field.field_type, field.optional, field.ref, includes) + for nested_msg in message.nested_messages: + 
self.collect_message_includes(nested_msg, includes) + + def generate_forward_declarations(self, lines: List[str], parent_name: str): + """Generate forward declarations for all messages (flattened).""" + for message in self.schema.messages: + self._generate_forward_decl_recursive(lines, message, "") + + def _generate_forward_decl_recursive(self, lines: List[str], message: Message, parent_name: str): + """Recursively generate forward declarations.""" + type_name = f"{parent_name}_{message.name}" if parent_name else message.name + lines.append(f"struct {type_name};") + for nested_msg in message.nested_messages: + self._generate_forward_decl_recursive(lines, nested_msg, type_name) + + def generate_enum(self, enum: Enum, parent_name: str = "") -> List[str]: + """Generate a C++ enum class.""" + lines = [] + + # For nested enums, use Parent_Child naming + type_name = f"{parent_name}_{enum.name}" if parent_name else enum.name + + lines.append(f"enum class {type_name} : int32_t {{") + # Enum values (strip prefix for scoped enums) + stripped_names = [] + for value in enum.values: + stripped_name = self.strip_enum_prefix(enum.name, value.name) + stripped_names.append(stripped_name) + lines.append(f" {stripped_name} = {value.value},") + lines.append("};") + + # FORY_ENUM macro + value_names = ", ".join(stripped_names) + lines.append(f"FORY_ENUM({type_name}, {value_names});") + + return lines + + def generate_message(self, message: Message, parent_name: str = "") -> List[str]: + """Generate a C++ struct.""" + lines = [] + + # For nested messages, use Parent_Child naming + type_name = f"{parent_name}_{message.name}" if parent_name else message.name + + lines.append(f"struct {type_name} {{") + + # Fields + for field in message.fields: + cpp_type = self.generate_type(field.field_type, field.optional, field.ref, parent_name) + field_name = self.to_snake_case(field.name) + lines.append(f" {cpp_type} {field_name};") + + lines.append("") + + # Equality operator + lines.append(f" bool operator==(const {type_name}& other) const {{") + if message.fields: + conditions = [] + for field in message.fields: + field_name = self.to_snake_case(field.name) + conditions.append(f"{field_name} == other.{field_name}") + lines.append(f" return {' && '.join(conditions)};") + else: + lines.append(" return true;") + lines.append(" }") + + lines.append("};") + + # FORY_STRUCT macro + field_names = ", ".join(self.to_snake_case(f.name) for f in message.fields) + lines.append(f"FORY_STRUCT({type_name}, {field_names});") + + return lines + + def generate_message_with_nested(self, message: Message, parent_name: str = "") -> List[str]: + """Generate a C++ struct and all its nested types (flattened).""" + lines = [] + + # Current message's type name + type_name = f"{parent_name}_{message.name}" if parent_name else message.name + + # First, generate all nested enums + for nested_enum in message.nested_enums: + lines.extend(self.generate_enum(nested_enum, type_name)) + lines.append("") + + # Then, generate all nested messages (recursively) + for nested_msg in message.nested_messages: + lines.extend(self.generate_message_with_nested(nested_msg, type_name)) + + # Finally, generate this message + lines.extend(self.generate_message(message, parent_name)) + lines.append("") + + return lines + + def generate_type(self, field_type: FieldType, nullable: bool = False, ref: bool = False, parent_name: str = "") -> str: + """Generate C++ type string.""" + if isinstance(field_type, PrimitiveType): + base_type = self.PRIMITIVE_MAP[field_type.kind] + if 
nullable: + return f"std::optional<{base_type}>" + return base_type + + elif isinstance(field_type, NamedType): + # Convert qualified names (Parent.Child) to C++-style (Parent_Child) + type_name = field_type.name.replace(".", "_") + # If it's a simple name and we have a parent context, it might be a nested type + if "." not in field_type.name and parent_name: + type_name = f"{parent_name}_{type_name}" + if ref: + return f"std::shared_ptr<{type_name}>" + if nullable: + return f"std::optional<{type_name}>" + return type_name + + elif isinstance(field_type, ListType): + element_type = self.generate_type(field_type.element_type, False, False, parent_name) + return f"std::vector<{element_type}>" + + elif isinstance(field_type, MapType): + key_type = self.generate_type(field_type.key_type, False, False, parent_name) + value_type = self.generate_type(field_type.value_type, False, False, parent_name) + return f"std::map<{key_type}, {value_type}>" + + return "void*" + + def collect_includes(self, field_type: FieldType, nullable: bool, ref: bool, includes: Set[str]): + """Collect required includes for a field type.""" + if nullable: + includes.add("") + if ref: + includes.add("") + + if isinstance(field_type, PrimitiveType): + if field_type.kind == PrimitiveKind.STRING: + includes.add("") + elif field_type.kind == PrimitiveKind.BYTES: + includes.add("") + elif field_type.kind in (PrimitiveKind.DATE, PrimitiveKind.TIMESTAMP): + includes.add('"fory/serialization/temporal_serializers.h"') + + elif isinstance(field_type, ListType): + includes.add("") + self.collect_includes(field_type.element_type, False, False, includes) + + elif isinstance(field_type, MapType): + includes.add("") + self.collect_includes(field_type.key_type, False, False, includes) + self.collect_includes(field_type.value_type, False, False, includes) + + def generate_registration(self) -> List[str]: + """Generate the Fory registration function.""" + lines = [] + + lines.append("inline void RegisterTypes(fory::serialization::Fory& fory) {") + + # Register enums (top-level) + for enum in self.schema.enums: + self.generate_enum_registration(lines, enum, "") + + # Register messages (including nested types) + for message in self.schema.messages: + self.generate_message_registration(lines, message, "") + + lines.append("}") + + return lines + + def generate_enum_registration(self, lines: List[str], enum: Enum, parent_name: str): + """Generate registration code for an enum.""" + type_name = f"{parent_name}_{enum.name}" if parent_name else enum.name + + if enum.type_id is not None: + lines.append(f" fory.register_enum<{type_name}>({enum.type_id});") + else: + ns = self.package or "default" + lines.append(f' fory.register_enum<{type_name}>("{ns}", "{type_name}");') + + def generate_message_registration(self, lines: List[str], message: Message, parent_name: str): + """Generate registration code for a message and its nested types.""" + type_name = f"{parent_name}_{message.name}" if parent_name else message.name + + # Register nested enums first + for nested_enum in message.nested_enums: + self.generate_enum_registration(lines, nested_enum, type_name) + + # Register nested messages recursively + for nested_msg in message.nested_messages: + self.generate_message_registration(lines, nested_msg, type_name) + + # Register this message + if message.type_id is not None: + lines.append(f" fory.register_struct<{type_name}>({message.type_id});") + else: + ns = self.package or "default" + lines.append(f' fory.register_struct<{type_name}>("{ns}", 
"{type_name}");') diff --git a/compiler/fory_compiler/generators/go.py b/compiler/fory_compiler/generators/go.py new file mode 100644 index 0000000000..03d3693c18 --- /dev/null +++ b/compiler/fory_compiler/generators/go.py @@ -0,0 +1,346 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +"""Go code generator.""" + +from typing import List, Optional, Set, Tuple + +from fory_compiler.generators.base import BaseGenerator, GeneratedFile +from fory_compiler.parser.ast import ( + Schema, + Message, + Enum, + Field, + FieldType, + PrimitiveType, + PrimitiveKind, + NamedType, + ListType, + MapType, +) + + +class GoGenerator(BaseGenerator): + """Generates Go structs with fory tags.""" + + language_name = "go" + file_extension = ".go" + indent_str = "\t" # Go uses tabs + + def get_go_package_info(self) -> Tuple[Optional[str], str]: + """Parse go_package option and return (import_path, package_name). + + Supports format: "github.com/mycorp/apis/gen/payment/v1;paymentv1" + - Part before ';' is the import path + - Part after ';' is the package name + - If no ';', the last element of the import path is used as package name + - If no go_package option, falls back to FDL package + + Returns: + Tuple of (import_path, package_name). import_path may be None. 
+ """ + go_package = self.schema.get_option("go_package") + if go_package: + if ";" in go_package: + import_path, package_name = go_package.split(";", 1) + return (import_path, package_name) + else: + # Use last element of path as package name + parts = go_package.rstrip("/").split("/") + return (go_package, parts[-1]) + + # Fall back to FDL package + if self.schema.package: + parts = self.schema.package.split(".") + return (None, parts[-1]) + + return (None, "generated") + + # Mapping from FDL primitive types to Go types + PRIMITIVE_MAP = { + PrimitiveKind.BOOL: "bool", + PrimitiveKind.INT8: "int8", + PrimitiveKind.INT16: "int16", + PrimitiveKind.INT32: "int32", + PrimitiveKind.INT64: "int64", + PrimitiveKind.FLOAT32: "float32", + PrimitiveKind.FLOAT64: "float64", + PrimitiveKind.STRING: "string", + PrimitiveKind.BYTES: "[]byte", + PrimitiveKind.DATE: "time.Time", + PrimitiveKind.TIMESTAMP: "time.Time", + } + + def generate(self) -> List[GeneratedFile]: + """Generate Go files for the schema.""" + files = [] + + # Generate a single Go file with all types + files.append(self.generate_file()) + + return files + + def get_package_name(self) -> str: + """Get the Go package name.""" + _, package_name = self.get_go_package_info() + return package_name + + def get_file_name(self) -> str: + """Get the Go file name.""" + if self.package: + return self.package.replace(".", "_") + return "generated" + + def generate_file(self) -> GeneratedFile: + """Generate a Go file with all types.""" + lines = [] + imports: Set[str] = set() + + # Collect imports (including from nested types) + imports.add('fory "github.com/apache/fory/go/fory"') + + for message in self.schema.messages: + self.collect_message_imports(message, imports) + + # License header + lines.append(self.get_license_header("//")) + lines.append("") + + # Package declaration + lines.append(f"package {self.get_package_name()}") + lines.append("") + + # Imports + if imports: + lines.append("import (") + for imp in sorted(imports): + lines.append(f'\t{imp}') + lines.append(")") + lines.append("") + + # Generate enums (top-level) + for enum in self.schema.enums: + lines.extend(self.generate_enum(enum, "")) + lines.append("") + + # Generate messages (including nested as flat types with qualified names) + for message in self.schema.messages: + lines.extend(self.generate_message_with_nested(message, "")) + + # Generate registration function + lines.extend(self.generate_registration()) + lines.append("") + + return GeneratedFile( + path=f"{self.get_file_name()}.go", + content="\n".join(lines), + ) + + def collect_message_imports(self, message: Message, imports: Set[str]): + """Collect imports for a message and its nested types recursively.""" + for field in message.fields: + self.collect_imports(field.field_type, imports) + for nested_msg in message.nested_messages: + self.collect_message_imports(nested_msg, imports) + + def generate_enum(self, enum: Enum, parent_name: str = "") -> List[str]: + """Generate a Go enum (using type alias and constants).""" + lines = [] + + # For nested enums, use Parent_Child naming + type_name = f"{parent_name}_{enum.name}" if parent_name else enum.name + + # Type definition + lines.append(f"type {type_name} int32") + lines.append("") + + # Constants (strip prefix first, then add enum name back for Go's unscoped style) + lines.append("const (") + for value in enum.values: + # Strip the proto-style prefix (e.g., DEVICE_TIER_UNKNOWN -> UNKNOWN) + stripped_name = self.strip_enum_prefix(enum.name, value.name) + # Add enum name 
prefix for Go (e.g., DeviceTierUnknown) + const_name = f"{type_name}{self.to_pascal_case(stripped_name)}" + lines.append(f"\t{const_name} {type_name} = {value.value}") + lines.append(")") + + return lines + + def generate_message(self, message: Message, parent_name: str = "") -> List[str]: + """Generate a Go struct.""" + lines = [] + + # For nested messages, use Parent_Child naming + type_name = f"{parent_name}_{message.name}" if parent_name else message.name + + lines.append(f"type {type_name} struct {{") + + # Fields + for field in message.fields: + field_lines = self.generate_field(field, parent_name) + for line in field_lines: + lines.append(f"\t{line}") + + lines.append("}") + + return lines + + def generate_message_with_nested(self, message: Message, parent_name: str = "") -> List[str]: + """Generate a Go struct and all its nested types (flattened).""" + lines = [] + + # Current message's type name + type_name = f"{parent_name}_{message.name}" if parent_name else message.name + + # First, generate all nested enums + for nested_enum in message.nested_enums: + lines.extend(self.generate_enum(nested_enum, type_name)) + lines.append("") + + # Then, generate all nested messages (recursively) + for nested_msg in message.nested_messages: + lines.extend(self.generate_message_with_nested(nested_msg, type_name)) + + # Finally, generate this message + lines.extend(self.generate_message(message, parent_name)) + lines.append("") + + return lines + + def generate_field(self, field: Field, parent_name: str = "") -> List[str]: + """Generate a struct field.""" + lines = [] + + go_type = self.generate_type(field.field_type, field.optional, field.ref, parent_name) + field_name = self.to_pascal_case(field.name) # Go uses PascalCase for exported fields + + # Build fory tag + tags = [] + if field.optional: + tags.append("nullable") + if field.ref: + tags.append("trackRef") + + if tags: + tag_str = ",".join(tags) + lines.append(f'{field_name} {go_type} `fory:"{tag_str}"`') + else: + lines.append(f"{field_name} {go_type}") + + return lines + + def generate_type(self, field_type: FieldType, nullable: bool = False, ref: bool = False, parent_name: str = "") -> str: + """Generate Go type string.""" + if isinstance(field_type, PrimitiveType): + base_type = self.PRIMITIVE_MAP[field_type.kind] + if nullable and base_type not in ("[]byte",): + return f"*{base_type}" + return base_type + + elif isinstance(field_type, NamedType): + # Convert qualified names (Parent.Child) to Go-style (Parent_Child) + type_name = field_type.name.replace(".", "_") + # If it's a simple name and we have a parent context, it might be a nested type + # that needs the parent prefix + if "." 
not in field_type.name and parent_name: + # Check if this could be a sibling nested type + type_name = f"{parent_name}_{type_name}" + if nullable or ref: + return f"*{type_name}" + return type_name + + elif isinstance(field_type, ListType): + element_type = self.generate_type(field_type.element_type, False, False, parent_name) + return f"[]{element_type}" + + elif isinstance(field_type, MapType): + key_type = self.generate_type(field_type.key_type, False, False, parent_name) + value_type = self.generate_type(field_type.value_type, False, False, parent_name) + return f"map[{key_type}]{value_type}" + + return "interface{}" + + def collect_imports(self, field_type: FieldType, imports: Set[str]): + """Collect required imports for a field type.""" + if isinstance(field_type, PrimitiveType): + if field_type.kind in (PrimitiveKind.DATE, PrimitiveKind.TIMESTAMP): + imports.add('"time"') + + elif isinstance(field_type, ListType): + self.collect_imports(field_type.element_type, imports) + + elif isinstance(field_type, MapType): + self.collect_imports(field_type.key_type, imports) + self.collect_imports(field_type.value_type, imports) + + def generate_registration(self) -> List[str]: + """Generate the Fory registration function.""" + lines = [] + + lines.append("func RegisterTypes(f *fory.Fory) error {") + + # Register enums (top-level) + for enum in self.schema.enums: + self.generate_enum_registration(lines, enum, "") + + # Register messages (including nested types) + for message in self.schema.messages: + self.generate_message_registration(lines, message, "") + + lines.append("\treturn nil") + lines.append("}") + + return lines + + def generate_enum_registration(self, lines: List[str], enum: Enum, parent_name: str): + """Generate registration code for an enum.""" + type_name = f"{parent_name}_{enum.name}" if parent_name else enum.name + + if enum.type_id is not None: + lines.append(f"\tif err := f.RegisterEnum({type_name}(0), {enum.type_id}); err != nil {{") + lines.append("\t\treturn err") + lines.append("\t}") + else: + # Use FDL package for namespace (consistent across languages) + ns = self.schema.package or "default" + lines.append(f'\tif err := f.RegisterTagType("{ns}.{type_name}", {type_name}(0)); err != nil {{') + lines.append("\t\treturn err") + lines.append("\t}") + + def generate_message_registration(self, lines: List[str], message: Message, parent_name: str): + """Generate registration code for a message and its nested types.""" + type_name = f"{parent_name}_{message.name}" if parent_name else message.name + + # Register nested enums first + for nested_enum in message.nested_enums: + self.generate_enum_registration(lines, nested_enum, type_name) + + # Register nested messages recursively + for nested_msg in message.nested_messages: + self.generate_message_registration(lines, nested_msg, type_name) + + # Register this message + if message.type_id is not None: + lines.append(f"\tif err := f.Register({type_name}{{}}, {message.type_id}); err != nil {{") + lines.append("\t\treturn err") + lines.append("\t}") + else: + # Use FDL package for namespace (consistent across languages) + ns = self.schema.package or "default" + lines.append(f'\tif err := f.RegisterTagType("{ns}.{type_name}", {type_name}{{}}); err != nil {{') + lines.append("\t\treturn err") + lines.append("\t}") diff --git a/compiler/fory_compiler/generators/java.py b/compiler/fory_compiler/generators/java.py new file mode 100644 index 0000000000..3dd5b4c83f --- /dev/null +++ b/compiler/fory_compiler/generators/java.py @@ -0,0 
+1,695 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +"""Java code generator.""" + +from typing import List, Optional, Set + +from fory_compiler.generators.base import BaseGenerator, GeneratedFile +from fory_compiler.parser.ast import ( + Schema, + Message, + Enum, + Field, + FieldType, + PrimitiveType, + PrimitiveKind, + NamedType, + ListType, + MapType, +) + + +class JavaGenerator(BaseGenerator): + """Generates Java POJOs with Fory annotations.""" + + language_name = "java" + file_extension = ".java" + + def get_java_package(self) -> Optional[str]: + """Get the Java package name. + + Priority: + 1. Command-line override (options.package_override) + 2. java_package option from FDL file + 3. FDL package declaration + """ + if self.options.package_override: + return self.options.package_override + java_package = self.schema.get_option("java_package") + if java_package: + return java_package + return self.schema.package + + def get_java_outer_classname(self) -> Optional[str]: + """Get the Java outer classname if specified. + + When set, all types are generated as inner classes of this outer class + in a single file (unless java_multiple_files is true). + """ + return self.schema.get_option("java_outer_classname") + + def get_java_multiple_files(self) -> bool: + """Check if java_multiple_files option is set to true. + + When true, each top-level type gets its own file, even if + java_outer_classname is set. + """ + value = self.schema.get_option("java_multiple_files") + return value is True + + # Mapping from FDL primitive types to Java types + PRIMITIVE_MAP = { + PrimitiveKind.BOOL: "boolean", + PrimitiveKind.INT8: "byte", + PrimitiveKind.INT16: "short", + PrimitiveKind.INT32: "int", + PrimitiveKind.INT64: "long", + PrimitiveKind.FLOAT32: "float", + PrimitiveKind.FLOAT64: "double", + PrimitiveKind.STRING: "String", + PrimitiveKind.BYTES: "byte[]", + PrimitiveKind.DATE: "java.time.LocalDate", + PrimitiveKind.TIMESTAMP: "java.time.Instant", + } + + # Boxed versions for nullable primitives + BOXED_MAP = { + PrimitiveKind.BOOL: "Boolean", + PrimitiveKind.INT8: "Byte", + PrimitiveKind.INT16: "Short", + PrimitiveKind.INT32: "Integer", + PrimitiveKind.INT64: "Long", + PrimitiveKind.FLOAT32: "Float", + PrimitiveKind.FLOAT64: "Double", + } + + # Primitive array types for repeated numeric fields + PRIMITIVE_ARRAY_MAP = { + PrimitiveKind.BOOL: "boolean[]", + PrimitiveKind.INT8: "byte[]", + PrimitiveKind.INT16: "short[]", + PrimitiveKind.INT32: "int[]", + PrimitiveKind.INT64: "long[]", + PrimitiveKind.FLOAT32: "float[]", + PrimitiveKind.FLOAT64: "double[]", + } + + def generate(self) -> List[GeneratedFile]: + """Generate Java files for the schema. 
+ + Generation mode depends on options: + - java_multiple_files = true: Separate file per type (default behavior) + - java_outer_classname set + java_multiple_files = false: Single file with outer class + - Neither set: Separate file per type + """ + files = [] + + outer_classname = self.get_java_outer_classname() + multiple_files = self.get_java_multiple_files() + + if outer_classname and not multiple_files: + # Generate all types in a single outer class file + files.append(self.generate_outer_class_file(outer_classname)) + # Generate registration helper (with outer class prefix) + files.append(self.generate_registration_file(outer_classname)) + else: + # Generate separate files for each type + # Generate enum files (top-level only, nested enums go inside message files) + for enum in self.schema.enums: + files.append(self.generate_enum_file(enum)) + + # Generate message files (includes nested types as inner classes) + for message in self.schema.messages: + files.append(self.generate_message_file(message)) + + # Generate registration helper + files.append(self.generate_registration_file()) + + return files + + def get_java_package_path(self) -> str: + """Get the Java package as a path.""" + java_package = self.get_java_package() + if java_package: + return java_package.replace(".", "/") + return "" + + def generate_enum_file(self, enum: Enum) -> GeneratedFile: + """Generate a Java enum file.""" + lines = [] + java_package = self.get_java_package() + + # License header + lines.append(self.get_license_header()) + lines.append("") + + # Package + if java_package: + lines.append(f"package {java_package};") + lines.append("") + + # Enum declaration + lines.append(f"public enum {enum.name} {{") + + # Enum values (strip prefix for scoped enums) + for i, value in enumerate(enum.values): + comma = "," if i < len(enum.values) - 1 else ";" + stripped_name = self.strip_enum_prefix(enum.name, value.name) + lines.append(f" {stripped_name}{comma}") + + lines.append("}") + lines.append("") + + # Build file path + path = self.get_java_package_path() + if path: + path = f"{path}/{enum.name}.java" + else: + path = f"{enum.name}.java" + + return GeneratedFile(path=path, content="\n".join(lines)) + + def generate_message_file(self, message: Message) -> GeneratedFile: + """Generate a Java class file for a message.""" + lines = [] + imports: Set[str] = set() + java_package = self.get_java_package() + + # Collect imports (including from nested types) + self.collect_message_imports(message, imports) + + # License header + lines.append(self.get_license_header()) + lines.append("") + + # Package + if java_package: + lines.append(f"package {java_package};") + lines.append("") + + # Imports + if imports: + for imp in sorted(imports): + lines.append(f"import {imp};") + lines.append("") + + # Class declaration + lines.append(f"public class {message.name} {{") + + # Generate nested enums as static inner classes + for nested_enum in message.nested_enums: + for line in self.generate_nested_enum(nested_enum): + lines.append(f" {line}") + + # Generate nested messages as static inner classes + for nested_msg in message.nested_messages: + for line in self.generate_nested_message(nested_msg, indent=1): + lines.append(f" {line}") + + # Fields + for field in message.fields: + field_lines = self.generate_field(field) + for line in field_lines: + lines.append(f" {line}") + + lines.append("") + + # Default constructor + lines.append(f" public {message.name}() {{") + lines.append(" }") + lines.append("") + + # Getters and setters + 
for field in message.fields: + getter_setter = self.generate_getter_setter(field) + for line in getter_setter: + lines.append(f" {line}") + + # equals method + for line in self.generate_equals_method(message): + lines.append(f" {line}") + + # hashCode method + for line in self.generate_hashcode_method(message): + lines.append(f" {line}") + + lines.append("}") + lines.append("") + + # Build file path + path = self.get_java_package_path() + if path: + path = f"{path}/{message.name}.java" + else: + path = f"{message.name}.java" + + return GeneratedFile(path=path, content="\n".join(lines)) + + def generate_outer_class_file(self, outer_classname: str) -> GeneratedFile: + """Generate a single Java file with all types as inner classes of an outer class. + + This is used when java_outer_classname option is set. + """ + lines = [] + imports: Set[str] = set() + java_package = self.get_java_package() + + # Collect imports from all types + for message in self.schema.messages: + self.collect_message_imports(message, imports) + for enum in self.schema.enums: + pass # Enums don't need special imports + + # License header + lines.append(self.get_license_header()) + lines.append("") + + # Package + if java_package: + lines.append(f"package {java_package};") + lines.append("") + + # Imports + if imports: + for imp in sorted(imports): + lines.append(f"import {imp};") + lines.append("") + + # Outer class declaration + lines.append(f"public final class {outer_classname} {{") + lines.append("") + lines.append(f" private {outer_classname}() {{") + lines.append(" // Prevent instantiation") + lines.append(" }") + lines.append("") + + # Generate all top-level enums as static inner classes + for enum in self.schema.enums: + for line in self.generate_nested_enum(enum): + lines.append(f" {line}") + + # Generate all top-level messages as static inner classes + for message in self.schema.messages: + for line in self.generate_nested_message(message, indent=1): + lines.append(f" {line}") + + lines.append("}") + lines.append("") + + # Build file path + path = self.get_java_package_path() + if path: + path = f"{path}/{outer_classname}.java" + else: + path = f"{outer_classname}.java" + + return GeneratedFile(path=path, content="\n".join(lines)) + + def collect_message_imports(self, message: Message, imports: Set[str]): + """Collect imports for a message and all its nested types recursively.""" + for field in message.fields: + self.collect_imports(field.field_type, imports) + if field.optional or field.ref: + imports.add("org.apache.fory.annotation.ForyField") + + # Add imports for equals/hashCode + imports.add("java.util.Objects") + if self.has_array_field_recursive(message): + imports.add("java.util.Arrays") + + # Collect imports from nested messages + for nested_msg in message.nested_messages: + self.collect_message_imports(nested_msg, imports) + + def has_array_field_recursive(self, message: Message) -> bool: + """Check if message or any nested message has array fields.""" + if self.has_array_field(message): + return True + for nested_msg in message.nested_messages: + if self.has_array_field_recursive(nested_msg): + return True + return False + + def generate_nested_enum(self, enum: Enum) -> List[str]: + """Generate a nested enum as a static inner class.""" + lines = [] + lines.append(f"public static enum {enum.name} {{") + + # Enum values (strip prefix for scoped enums) + for i, value in enumerate(enum.values): + comma = "," if i < len(enum.values) - 1 else ";" + stripped_name = self.strip_enum_prefix(enum.name, 
value.name) + lines.append(f" {stripped_name}{comma}") + + lines.append("}") + lines.append("") + return lines + + def generate_nested_message(self, message: Message, indent: int = 1) -> List[str]: + """Generate a nested message as a static inner class.""" + lines = [] + ind = " " * indent + + # Class declaration + lines.append(f"public static class {message.name} {{") + + # Generate nested enums + for nested_enum in message.nested_enums: + for line in self.generate_nested_enum(nested_enum): + lines.append(f" {line}") + + # Generate nested messages (recursively) + for nested_msg in message.nested_messages: + for line in self.generate_nested_message(nested_msg, indent=1): + lines.append(f" {line}") + + # Fields + for field in message.fields: + field_lines = self.generate_field(field) + for line in field_lines: + lines.append(f" {line}") + + lines.append("") + + # Default constructor + lines.append(f" public {message.name}() {{") + lines.append(" }") + lines.append("") + + # Getters and setters + for field in message.fields: + getter_setter = self.generate_getter_setter(field) + for line in getter_setter: + lines.append(f" {line}") + + # equals method + for line in self.generate_equals_method(message): + lines.append(f" {line}") + + # hashCode method + for line in self.generate_hashcode_method(message): + lines.append(f" {line}") + + lines.append("}") + lines.append("") + return lines + + def generate_field(self, field: Field) -> List[str]: + """Generate field declaration with annotations.""" + lines = [] + + # Generate @ForyField annotation if needed + annotations = [] + if field.optional: + annotations.append("nullable = true") + if field.ref: + annotations.append("trackingRef = true") + + if annotations: + lines.append(f"@ForyField({', '.join(annotations)})") + + # Field type + java_type = self.generate_type(field.field_type, field.optional) + + lines.append(f"private {java_type} {self.to_camel_case(field.name)};") + lines.append("") + + return lines + + def generate_getter_setter(self, field: Field) -> List[str]: + """Generate getter and setter for a field.""" + lines = [] + java_type = self.generate_type(field.field_type, field.optional) + field_name = self.to_camel_case(field.name) + pascal_name = self.to_pascal_case(field.name) + + # Getter + lines.append(f"public {java_type} get{pascal_name}() {{") + lines.append(f" return {field_name};") + lines.append("}") + lines.append("") + + # Setter + lines.append(f"public void set{pascal_name}({java_type} {field_name}) {{") + lines.append(f" this.{field_name} = {field_name};") + lines.append("}") + lines.append("") + + return lines + + def generate_type(self, field_type: FieldType, nullable: bool = False) -> str: + """Generate Java type string.""" + if isinstance(field_type, PrimitiveType): + if nullable and field_type.kind in self.BOXED_MAP: + return self.BOXED_MAP[field_type.kind] + return self.PRIMITIVE_MAP[field_type.kind] + + elif isinstance(field_type, NamedType): + return field_type.name + + elif isinstance(field_type, ListType): + # Use primitive arrays for numeric types + if isinstance(field_type.element_type, PrimitiveType): + if field_type.element_type.kind in self.PRIMITIVE_ARRAY_MAP: + return self.PRIMITIVE_ARRAY_MAP[field_type.element_type.kind] + element_type = self.generate_type(field_type.element_type, True) + return f"List<{element_type}>" + + elif isinstance(field_type, MapType): + key_type = self.generate_type(field_type.key_type, True) + value_type = self.generate_type(field_type.value_type, True) + return 
f"Map<{key_type}, {value_type}>" + + return "Object" + + def collect_imports(self, field_type: FieldType, imports: Set[str]): + """Collect required imports for a field type.""" + if isinstance(field_type, PrimitiveType): + if field_type.kind == PrimitiveKind.DATE: + imports.add("java.time.LocalDate") + elif field_type.kind == PrimitiveKind.TIMESTAMP: + imports.add("java.time.Instant") + + elif isinstance(field_type, ListType): + # Primitive arrays don't need List import + if isinstance(field_type.element_type, PrimitiveType): + if field_type.element_type.kind in self.PRIMITIVE_ARRAY_MAP: + return # No import needed for primitive arrays + imports.add("java.util.List") + self.collect_imports(field_type.element_type, imports) + + elif isinstance(field_type, MapType): + imports.add("java.util.Map") + self.collect_imports(field_type.key_type, imports) + self.collect_imports(field_type.value_type, imports) + + def has_array_field(self, message: Message) -> bool: + """Check if message has any array fields (byte[] or primitive arrays).""" + for field in message.fields: + if isinstance(field.field_type, PrimitiveType): + if field.field_type.kind == PrimitiveKind.BYTES: + return True + elif isinstance(field.field_type, ListType): + if isinstance(field.field_type.element_type, PrimitiveType): + if field.field_type.element_type.kind in self.PRIMITIVE_ARRAY_MAP: + return True + return False + + def is_primitive_array_field(self, field: Field) -> bool: + """Check if field is a primitive array type.""" + if isinstance(field.field_type, PrimitiveType): + return field.field_type.kind == PrimitiveKind.BYTES + if isinstance(field.field_type, ListType): + if isinstance(field.field_type.element_type, PrimitiveType): + return field.field_type.element_type.kind in self.PRIMITIVE_ARRAY_MAP + return False + + def generate_equals_method(self, message: Message) -> List[str]: + """Generate equals() method for a message.""" + lines = [] + lines.append("@Override") + lines.append("public boolean equals(Object o) {") + lines.append(" if (this == o) return true;") + lines.append(f" if (o == null || getClass() != o.getClass()) return false;") + lines.append(f" {message.name} that = ({message.name}) o;") + + if not message.fields: + lines.append(" return true;") + else: + comparisons = [] + for field in message.fields: + field_name = self.to_camel_case(field.name) + if self.is_primitive_array_field(field): + comparisons.append(f"Arrays.equals({field_name}, that.{field_name})") + elif isinstance(field.field_type, PrimitiveType): + kind = field.field_type.kind + if kind in (PrimitiveKind.FLOAT32,): + comparisons.append(f"Float.compare({field_name}, that.{field_name}) == 0") + elif kind in (PrimitiveKind.FLOAT64,): + comparisons.append(f"Double.compare({field_name}, that.{field_name}) == 0") + elif kind in (PrimitiveKind.BOOL, PrimitiveKind.INT8, PrimitiveKind.INT16, + PrimitiveKind.INT32, PrimitiveKind.INT64) and not field.optional: + comparisons.append(f"{field_name} == that.{field_name}") + else: + comparisons.append(f"Objects.equals({field_name}, that.{field_name})") + else: + comparisons.append(f"Objects.equals({field_name}, that.{field_name})") + + if len(comparisons) == 1: + lines.append(f" return {comparisons[0]};") + else: + lines.append(f" return {comparisons[0]}") + for i, comp in enumerate(comparisons[1:], 1): + if i == len(comparisons) - 1: + lines.append(f" && {comp};") + else: + lines.append(f" && {comp}") + + lines.append("}") + lines.append("") + return lines + + def generate_hashcode_method(self, message: 
Message) -> List[str]: + """Generate hashCode() method for a message.""" + lines = [] + lines.append("@Override") + lines.append("public int hashCode() {") + + if not message.fields: + lines.append(" return 0;") + else: + hash_args = [] + array_fields = [] + for field in message.fields: + field_name = self.to_camel_case(field.name) + if self.is_primitive_array_field(field): + array_fields.append(field_name) + else: + hash_args.append(field_name) + + if array_fields and hash_args: + lines.append(f" int result = Objects.hash({', '.join(hash_args)});") + for arr in array_fields: + lines.append(f" result = 31 * result + Arrays.hashCode({arr});") + lines.append(" return result;") + elif array_fields: + if len(array_fields) == 1: + lines.append(f" return Arrays.hashCode({array_fields[0]});") + else: + lines.append(f" int result = Arrays.hashCode({array_fields[0]});") + for arr in array_fields[1:]: + lines.append(f" result = 31 * result + Arrays.hashCode({arr});") + lines.append(" return result;") + else: + lines.append(f" return Objects.hash({', '.join(hash_args)});") + + lines.append("}") + lines.append("") + return lines + + def generate_registration_file(self, outer_classname: Optional[str] = None) -> GeneratedFile: + """Generate the Fory registration helper class. + + Args: + outer_classname: If set, all type references will be prefixed with this outer class. + """ + lines = [] + java_package = self.get_java_package() + + # Determine class name + if java_package: + parts = java_package.split(".") + class_name = self.to_pascal_case(parts[-1]) + "ForyRegistration" + else: + class_name = "ForyRegistration" + + # License header + lines.append(self.get_license_header()) + lines.append("") + + # Package + if java_package: + lines.append(f"package {java_package};") + lines.append("") + + # Imports + lines.append("import org.apache.fory.Fory;") + lines.append("") + + # Class + lines.append(f"public class {class_name} {{") + lines.append("") + lines.append(" public static void register(Fory fory) {") + + # When outer_classname is set, all top-level types become inner classes + type_prefix = outer_classname if outer_classname else "" + + # Register enums (top-level) + for enum in self.schema.enums: + self.generate_enum_registration(lines, enum, type_prefix) + + # Register messages (top-level and nested) + for message in self.schema.messages: + self.generate_message_registration(lines, message, type_prefix) + + lines.append(" }") + lines.append("}") + lines.append("") + + # Build file path + path = self.get_java_package_path() + if path: + path = f"{path}/{class_name}.java" + else: + path = f"{class_name}.java" + + return GeneratedFile(path=path, content="\n".join(lines)) + + def generate_enum_registration(self, lines: List[str], enum: Enum, parent_path: str): + """Generate registration code for an enum.""" + # In Java, nested class references use OuterClass.InnerClass + class_ref = f"{parent_path}.{enum.name}" if parent_path else enum.name + type_name = class_ref.replace(".", "_") if parent_path else enum.name + + if enum.type_id is not None: + lines.append(f" fory.register({class_ref}.class, {enum.type_id});") + else: + # Use FDL package for namespace (consistent across languages) + ns = self.schema.package or "default" + lines.append(f' fory.register({class_ref}.class, "{ns}", "{type_name}");') + + def generate_message_registration(self, lines: List[str], message: Message, parent_path: str): + """Generate registration code for a message and its nested types.""" + # In Java, nested class references 
use OuterClass.InnerClass + class_ref = f"{parent_path}.{message.name}" if parent_path else message.name + type_name = class_ref.replace(".", "_") if parent_path else message.name + + if message.type_id is not None: + lines.append(f" fory.register({class_ref}.class, {message.type_id});") + else: + # Use FDL package for namespace (consistent across languages) + ns = self.schema.package or "default" + lines.append(f' fory.register({class_ref}.class, "{ns}", "{type_name}");') + + # Register nested enums + for nested_enum in message.nested_enums: + self.generate_enum_registration(lines, nested_enum, class_ref) + + # Register nested messages + for nested_msg in message.nested_messages: + self.generate_message_registration(lines, nested_msg, class_ref) diff --git a/compiler/fory_compiler/generators/python.py b/compiler/fory_compiler/generators/python.py new file mode 100644 index 0000000000..b68a5bb1b0 --- /dev/null +++ b/compiler/fory_compiler/generators/python.py @@ -0,0 +1,333 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. 
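+# Illustrative sketch (a comment only, not used by the generator): for a
+# hypothetical schema such as
+#
+#     package sensors;
+#     message Sensor [id=201] { optional string label = 1; int32 reading = 2; }
+#
+# the generator below is expected to emit roughly the following module. The
+# names, the type id, and the exact formatting are assumptions for
+# illustration, not verbatim output:
+#
+#     @dataclass
+#     class Sensor:
+#         label: Optional[str] = None
+#         reading: pyfory.Int32Type = 0
+#
+#     def register_sensors_types(fory: pyfory.Fory):
+#         fory.register_type(Sensor, type_id=201)
+#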
+ +"""Python code generator.""" + +from typing import List, Set + +from fory_compiler.generators.base import BaseGenerator, GeneratedFile +from fory_compiler.parser.ast import ( + Schema, + Message, + Enum, + Field, + FieldType, + PrimitiveType, + PrimitiveKind, + NamedType, + ListType, + MapType, +) + + +class PythonGenerator(BaseGenerator): + """Generates Python dataclasses with pyfory type hints.""" + + language_name = "python" + file_extension = ".py" + + # Mapping from FDL primitive types to Python types + PRIMITIVE_MAP = { + PrimitiveKind.BOOL: "bool", + PrimitiveKind.INT8: "pyfory.Int8Type", + PrimitiveKind.INT16: "pyfory.Int16Type", + PrimitiveKind.INT32: "pyfory.Int32Type", + PrimitiveKind.INT64: "int", + PrimitiveKind.FLOAT32: "pyfory.Float32Type", + PrimitiveKind.FLOAT64: "float", + PrimitiveKind.STRING: "str", + PrimitiveKind.BYTES: "bytes", + PrimitiveKind.DATE: "datetime.date", + PrimitiveKind.TIMESTAMP: "datetime.datetime", + } + + # Numpy dtype strings for primitive arrays + NUMPY_DTYPE_MAP = { + PrimitiveKind.BOOL: "np.bool_", + PrimitiveKind.INT8: "np.int8", + PrimitiveKind.INT16: "np.int16", + PrimitiveKind.INT32: "np.int32", + PrimitiveKind.INT64: "np.int64", + PrimitiveKind.FLOAT32: "np.float32", + PrimitiveKind.FLOAT64: "np.float64", + } + + # Default values for primitive types + DEFAULT_VALUES = { + PrimitiveKind.BOOL: "False", + PrimitiveKind.INT8: "0", + PrimitiveKind.INT16: "0", + PrimitiveKind.INT32: "0", + PrimitiveKind.INT64: "0", + PrimitiveKind.FLOAT32: "0.0", + PrimitiveKind.FLOAT64: "0.0", + PrimitiveKind.STRING: '""', + PrimitiveKind.BYTES: 'b""', + PrimitiveKind.DATE: "None", + PrimitiveKind.TIMESTAMP: "None", + } + + def generate(self) -> List[GeneratedFile]: + """Generate Python files for the schema.""" + files = [] + + # Generate a single module with all types + files.append(self.generate_module()) + + return files + + def get_module_name(self) -> str: + """Get the Python module name.""" + if self.package: + return self.package.replace(".", "_") + return "generated" + + def generate_module(self) -> GeneratedFile: + """Generate a Python module with all types.""" + lines = [] + imports: Set[str] = set() + + # Collect all imports + imports.add("from dataclasses import dataclass") + imports.add("from enum import IntEnum") + imports.add("from typing import Dict, List, Optional") + imports.add("import pyfory") + + for message in self.schema.messages: + self.collect_message_imports(message, imports) + + # License header + lines.append(self.get_license_header("#")) + lines.append("") + + # Imports + for imp in sorted(imports): + lines.append(imp) + lines.append("") + lines.append("") + + # Generate enums (top-level only) + for enum in self.schema.enums: + lines.extend(self.generate_enum(enum)) + lines.append("") + lines.append("") + + # Generate messages (including nested types) + for message in self.schema.messages: + lines.extend(self.generate_message(message, indent=0)) + lines.append("") + lines.append("") + + # Generate registration function + lines.extend(self.generate_registration()) + lines.append("") + + return GeneratedFile( + path=f"{self.get_module_name()}.py", + content="\n".join(lines), + ) + + def collect_message_imports(self, message: Message, imports: Set[str]): + """Collect imports for a message and its nested types recursively.""" + for field in message.fields: + self.collect_imports(field.field_type, imports) + for nested_msg in message.nested_messages: + self.collect_message_imports(nested_msg, imports) + + def generate_enum(self, enum: 
Enum, indent: int = 0) -> List[str]: + """Generate a Python IntEnum.""" + lines = [] + ind = " " * indent + lines.append(f"{ind}class {enum.name}(IntEnum):") + + # Enum values (strip prefix for scoped enums) + for value in enum.values: + stripped_name = self.strip_enum_prefix(enum.name, value.name) + lines.append(f"{ind} {stripped_name} = {value.value}") + + return lines + + def generate_message(self, message: Message, indent: int = 0) -> List[str]: + """Generate a Python dataclass with nested types.""" + lines = [] + ind = " " * indent + + lines.append(f"{ind}@dataclass") + lines.append(f"{ind}class {message.name}:") + + # Generate nested enums first (they need to be defined before fields reference them) + for nested_enum in message.nested_enums: + for line in self.generate_enum(nested_enum, indent=indent + 1): + lines.append(line) + lines.append("") + + # Generate nested messages + for nested_msg in message.nested_messages: + for line in self.generate_message(nested_msg, indent=indent + 1): + lines.append(line) + lines.append("") + + # Generate fields + if not message.fields and not message.nested_enums and not message.nested_messages: + lines.append(f"{ind} pass") + return lines + + for field in message.fields: + field_lines = self.generate_field(field) + for line in field_lines: + lines.append(f"{ind} {line}") + + # If there are nested types but no fields, add pass to avoid empty class body issues + if not message.fields and (message.nested_enums or message.nested_messages): + lines.append(f"{ind} pass") + + return lines + + def generate_field(self, field: Field) -> List[str]: + """Generate a dataclass field.""" + lines = [] + + python_type = self.generate_type(field.field_type, field.optional) + field_name = self.to_snake_case(field.name) + default = self.get_default_value(field.field_type, field.optional) + + lines.append(f"{field_name}: {python_type} = {default}") + + return lines + + def generate_type(self, field_type: FieldType, nullable: bool = False) -> str: + """Generate Python type hint.""" + if isinstance(field_type, PrimitiveType): + base_type = self.PRIMITIVE_MAP[field_type.kind] + if nullable: + return f"Optional[{base_type}]" + return base_type + + elif isinstance(field_type, NamedType): + if nullable: + return f"Optional[{field_type.name}]" + return field_type.name + + elif isinstance(field_type, ListType): + # Use numpy array for numeric primitive types + if isinstance(field_type.element_type, PrimitiveType): + if field_type.element_type.kind in self.NUMPY_DTYPE_MAP: + return "np.ndarray" + element_type = self.generate_type(field_type.element_type, False) + return f"List[{element_type}]" + + elif isinstance(field_type, MapType): + key_type = self.generate_type(field_type.key_type, False) + value_type = self.generate_type(field_type.value_type, False) + return f"Dict[{key_type}, {value_type}]" + + return "object" + + def get_default_value(self, field_type: FieldType, nullable: bool = False) -> str: + """Get default value for a field.""" + if nullable: + return "None" + + if isinstance(field_type, PrimitiveType): + return self.DEFAULT_VALUES.get(field_type.kind, "None") + + elif isinstance(field_type, NamedType): + return "None" + + elif isinstance(field_type, ListType): + # Use numpy empty array for numeric types + if isinstance(field_type.element_type, PrimitiveType): + if field_type.element_type.kind in self.NUMPY_DTYPE_MAP: + dtype = self.NUMPY_DTYPE_MAP[field_type.element_type.kind] + return f"None # Use np.array([], dtype={dtype}) to initialize" + return "None" + + 
elif isinstance(field_type, MapType): + return "None" + + return "None" + + def collect_imports(self, field_type: FieldType, imports: Set[str]): + """Collect required imports for a field type.""" + if isinstance(field_type, PrimitiveType): + if field_type.kind in (PrimitiveKind.DATE, PrimitiveKind.TIMESTAMP): + imports.add("import datetime") + + elif isinstance(field_type, ListType): + # Add numpy import for numeric primitive arrays + if isinstance(field_type.element_type, PrimitiveType): + if field_type.element_type.kind in self.NUMPY_DTYPE_MAP: + imports.add("import numpy as np") + return + self.collect_imports(field_type.element_type, imports) + + elif isinstance(field_type, MapType): + self.collect_imports(field_type.key_type, imports) + self.collect_imports(field_type.value_type, imports) + + def generate_registration(self) -> List[str]: + """Generate the Fory registration function.""" + lines = [] + + func_name = f"register_{self.get_module_name()}_types" + lines.append(f"def {func_name}(fory: pyfory.Fory):") + + if not self.schema.enums and not self.schema.messages: + lines.append(" pass") + return lines + + # Register enums (top-level) + for enum in self.schema.enums: + self.generate_enum_registration(lines, enum, "") + + # Register messages (including nested types) + for message in self.schema.messages: + self.generate_message_registration(lines, message, "") + + return lines + + def generate_enum_registration(self, lines: List[str], enum: Enum, parent_path: str): + """Generate registration code for an enum.""" + # In Python, nested class references use Outer.Inner syntax + class_ref = f"{parent_path}.{enum.name}" if parent_path else enum.name + type_name = class_ref.replace(".", "_") if parent_path else enum.name + + if enum.type_id is not None: + lines.append(f" fory.register_type({class_ref}, type_id={enum.type_id})") + else: + ns = self.package or "default" + lines.append(f' fory.register_type({class_ref}, namespace="{ns}", typename="{type_name}")') + + def generate_message_registration(self, lines: List[str], message: Message, parent_path: str): + """Generate registration code for a message and its nested types.""" + # In Python, nested class references use Outer.Inner syntax + class_ref = f"{parent_path}.{message.name}" if parent_path else message.name + type_name = class_ref.replace(".", "_") if parent_path else message.name + + if message.type_id is not None: + lines.append(f" fory.register_type({class_ref}, type_id={message.type_id})") + else: + ns = self.package or "default" + lines.append(f' fory.register_type({class_ref}, namespace="{ns}", typename="{type_name}")') + + # Register nested enums + for nested_enum in message.nested_enums: + self.generate_enum_registration(lines, nested_enum, class_ref) + + # Register nested messages + for nested_msg in message.nested_messages: + self.generate_message_registration(lines, nested_msg, class_ref) diff --git a/compiler/fory_compiler/generators/rust.py b/compiler/fory_compiler/generators/rust.py new file mode 100644 index 0000000000..7ddaf53014 --- /dev/null +++ b/compiler/fory_compiler/generators/rust.py @@ -0,0 +1,309 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. 
You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +"""Rust code generator.""" + +from typing import List, Set + +from fory_compiler.generators.base import BaseGenerator, GeneratedFile +from fory_compiler.parser.ast import ( + Schema, + Message, + Enum, + Field, + FieldType, + PrimitiveType, + PrimitiveKind, + NamedType, + ListType, + MapType, +) + + +class RustGenerator(BaseGenerator): + """Generates Rust structs with ForyObject derive macro.""" + + language_name = "rust" + file_extension = ".rs" + + # Mapping from FDL primitive types to Rust types + PRIMITIVE_MAP = { + PrimitiveKind.BOOL: "bool", + PrimitiveKind.INT8: "i8", + PrimitiveKind.INT16: "i16", + PrimitiveKind.INT32: "i32", + PrimitiveKind.INT64: "i64", + PrimitiveKind.FLOAT32: "f32", + PrimitiveKind.FLOAT64: "f64", + PrimitiveKind.STRING: "String", + PrimitiveKind.BYTES: "Vec", + PrimitiveKind.DATE: "chrono::NaiveDate", + PrimitiveKind.TIMESTAMP: "chrono::NaiveDateTime", + } + + def generate(self) -> List[GeneratedFile]: + """Generate Rust files for the schema.""" + files = [] + + # Generate a single module file with all types + files.append(self.generate_module()) + + return files + + def get_module_name(self) -> str: + """Get the Rust module name.""" + if self.package: + return self.package.replace(".", "_") + return "generated" + + def generate_module(self) -> GeneratedFile: + """Generate a Rust module with all types.""" + lines = [] + uses: Set[str] = set() + + # Collect uses (including from nested types) + uses.add("use fory::{Fory, ForyObject}") + + for message in self.schema.messages: + self.collect_message_uses(message, uses) + + # License header + lines.append(self.get_license_header("//")) + lines.append("") + + # Uses + for use in sorted(uses): + lines.append(f"{use};") + lines.append("") + + # Generate enums (top-level) + for enum in self.schema.enums: + lines.extend(self.generate_enum(enum, "")) + lines.append("") + + # Generate messages (including nested as flat types with qualified names) + for message in self.schema.messages: + lines.extend(self.generate_message_with_nested(message, "")) + + # Generate registration function + lines.extend(self.generate_registration()) + lines.append("") + + return GeneratedFile( + path=f"{self.get_module_name()}.rs", + content="\n".join(lines), + ) + + def collect_message_uses(self, message: Message, uses: Set[str]): + """Collect uses for a message and its nested types recursively.""" + for field in message.fields: + self.collect_uses_for_field(field, uses) + for nested_msg in message.nested_messages: + self.collect_message_uses(nested_msg, uses) + + def generate_enum(self, enum: Enum, parent_name: str = "") -> List[str]: + """Generate a Rust enum.""" + lines = [] + + # For nested enums, use Parent_Child naming + type_name = f"{parent_name}_{enum.name}" if parent_name else enum.name + + # Derive macros + lines.append("#[derive(ForyObject, Debug, Clone, PartialEq, Default)]") + lines.append("#[repr(i32)]") + + # Tag for name-based registration + if enum.type_id is None and self.package: + lines.append(f'#[tag("{self.package}.{type_name}")]') + + lines.append(f"pub enum {type_name} {{") + + # Enum values (strip prefix 
for scoped enums) + for i, value in enumerate(enum.values): + if i == 0: + lines.append(" #[default]") + stripped_name = self.strip_enum_prefix(enum.name, value.name) + lines.append(f" {self.to_pascal_case(stripped_name)} = {value.value},") + + lines.append("}") + + return lines + + def generate_message(self, message: Message, parent_name: str = "") -> List[str]: + """Generate a Rust struct.""" + lines = [] + + # For nested messages, use Parent_Child naming + type_name = f"{parent_name}_{message.name}" if parent_name else message.name + + # Derive macros + lines.append("#[derive(ForyObject, Debug, Clone, PartialEq, Default)]") + + # Tag for name-based registration + if message.type_id is None and self.package: + lines.append(f'#[tag("{self.package}.{type_name}")]') + + lines.append(f"pub struct {type_name} {{") + + # Fields + for field in message.fields: + field_lines = self.generate_field(field, parent_name) + for line in field_lines: + lines.append(f" {line}") + + lines.append("}") + + return lines + + def generate_message_with_nested(self, message: Message, parent_name: str = "") -> List[str]: + """Generate a Rust struct and all its nested types (flattened).""" + lines = [] + + # Current message's type name + type_name = f"{parent_name}_{message.name}" if parent_name else message.name + + # First, generate all nested enums + for nested_enum in message.nested_enums: + lines.extend(self.generate_enum(nested_enum, type_name)) + lines.append("") + + # Then, generate all nested messages (recursively) + for nested_msg in message.nested_messages: + lines.extend(self.generate_message_with_nested(nested_msg, type_name)) + + # Finally, generate this message + lines.extend(self.generate_message(message, parent_name)) + lines.append("") + + return lines + + def generate_field(self, field: Field, parent_name: str = "") -> List[str]: + """Generate a struct field.""" + lines = [] + + # Field attributes + if field.optional: + lines.append("#[fory(nullable = true)]") + + rust_type = self.generate_type(field.field_type, field.optional, field.ref, parent_name) + field_name = self.to_snake_case(field.name) + + lines.append(f"pub {field_name}: {rust_type},") + + return lines + + def generate_type(self, field_type: FieldType, nullable: bool = False, ref: bool = False, parent_name: str = "") -> str: + """Generate Rust type string.""" + if isinstance(field_type, PrimitiveType): + base_type = self.PRIMITIVE_MAP[field_type.kind] + if nullable: + return f"Option<{base_type}>" + return base_type + + elif isinstance(field_type, NamedType): + # Convert qualified names (Parent.Child) to Rust-style (Parent_Child) + type_name = field_type.name.replace(".", "_") + # If it's a simple name and we have a parent context, it might be a nested type + if "." 
not in field_type.name and parent_name: + type_name = f"{parent_name}_{type_name}" + if ref: + return f"Rc<{type_name}>" + if nullable: + return f"Option<{type_name}>" + return type_name + + elif isinstance(field_type, ListType): + element_type = self.generate_type(field_type.element_type, False, False, parent_name) + return f"Vec<{element_type}>" + + elif isinstance(field_type, MapType): + key_type = self.generate_type(field_type.key_type, False, False, parent_name) + value_type = self.generate_type(field_type.value_type, False, False, parent_name) + return f"HashMap<{key_type}, {value_type}>" + + return "()" + + def collect_uses(self, field_type: FieldType, uses: Set[str]): + """Collect required use statements for a field type.""" + if isinstance(field_type, PrimitiveType): + if field_type.kind in (PrimitiveKind.DATE, PrimitiveKind.TIMESTAMP): + uses.add("use chrono") + + elif isinstance(field_type, NamedType): + pass # No additional uses needed + + elif isinstance(field_type, ListType): + self.collect_uses(field_type.element_type, uses) + + elif isinstance(field_type, MapType): + uses.add("use std::collections::HashMap") + self.collect_uses(field_type.key_type, uses) + self.collect_uses(field_type.value_type, uses) + + def collect_uses_for_field(self, field: Field, uses: Set[str]): + """Collect uses for a field, including ref tracking.""" + if field.ref: + uses.add("use std::rc::Rc") + self.collect_uses(field.field_type, uses) + + def generate_registration(self) -> List[str]: + """Generate the Fory registration function.""" + lines = [] + + lines.append("pub fn register_types(fory: &mut Fory) -> Result<(), fory::Error> {") + + # Register enums (top-level) + for enum in self.schema.enums: + self.generate_enum_registration(lines, enum, "") + + # Register messages (including nested types) + for message in self.schema.messages: + self.generate_message_registration(lines, message, "") + + lines.append(" Ok(())") + lines.append("}") + + return lines + + def generate_enum_registration(self, lines: List[str], enum: Enum, parent_name: str): + """Generate registration code for an enum.""" + type_name = f"{parent_name}_{enum.name}" if parent_name else enum.name + + if enum.type_id is not None: + lines.append(f" fory.register::<{type_name}>({enum.type_id})?;") + else: + ns = self.package or "default" + lines.append(f' fory.register_by_namespace::<{type_name}>("{ns}", "{type_name}")?;') + + def generate_message_registration(self, lines: List[str], message: Message, parent_name: str): + """Generate registration code for a message and its nested types.""" + type_name = f"{parent_name}_{message.name}" if parent_name else message.name + + # Register nested enums first + for nested_enum in message.nested_enums: + self.generate_enum_registration(lines, nested_enum, type_name) + + # Register nested messages recursively + for nested_msg in message.nested_messages: + self.generate_message_registration(lines, nested_msg, type_name) + + # Register this message + if message.type_id is not None: + lines.append(f" fory.register::<{type_name}>({message.type_id})?;") + else: + ns = self.package or "default" + lines.append(f' fory.register_by_namespace::<{type_name}>("{ns}", "{type_name}")?;') diff --git a/compiler/fory_compiler/parser/__init__.py b/compiler/fory_compiler/parser/__init__.py new file mode 100644 index 0000000000..ca8463ebcc --- /dev/null +++ b/compiler/fory_compiler/parser/__init__.py @@ -0,0 +1,52 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license 
agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +"""FDL parser components.""" + +from fory_compiler.parser.lexer import Lexer, Token, TokenType +from fory_compiler.parser.parser import Parser +from fory_compiler.parser.ast import ( + Schema, + Message, + Enum, + Field, + EnumValue, + Import, + FieldType, + PrimitiveType, + NamedType, + ListType, + MapType, +) + +__all__ = [ + "Lexer", + "Token", + "TokenType", + "Parser", + "Schema", + "Message", + "Enum", + "Field", + "EnumValue", + "Import", + "FieldType", + "PrimitiveType", + "NamedType", + "ListType", + "MapType", +] diff --git a/compiler/fory_compiler/parser/ast.py b/compiler/fory_compiler/parser/ast.py new file mode 100644 index 0000000000..57fa2b6add --- /dev/null +++ b/compiler/fory_compiler/parser/ast.py @@ -0,0 +1,390 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. 
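+# Illustrative sketch (a comment only, not executed): a minimal FDL input such as
+#
+#     package example;
+#     message Point [id=300] { int32 x = 1; int32 y = 2; }
+#
+# is expected to be represented with the dataclasses defined below roughly as
+# follows (the package, message, field names, and the id are hypothetical):
+#
+#     Schema(
+#         package="example",
+#         messages=[
+#             Message(
+#                 name="Point",
+#                 type_id=300,
+#                 fields=[
+#                     Field(name="x", field_type=PrimitiveType(PrimitiveKind.INT32), number=1),
+#                     Field(name="y", field_type=PrimitiveType(PrimitiveKind.INT32), number=2),
+#                 ],
+#             )
+#         ],
+#     )
+#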
+ +"""AST node definitions for FDL.""" + +from dataclasses import dataclass, field +from typing import List, Optional, Union +from enum import Enum as PyEnum + + +class PrimitiveKind(PyEnum): + """Primitive type kinds.""" + + BOOL = "bool" + INT8 = "int8" + INT16 = "int16" + INT32 = "int32" + INT64 = "int64" + FLOAT32 = "float32" + FLOAT64 = "float64" + STRING = "string" + BYTES = "bytes" + DATE = "date" + TIMESTAMP = "timestamp" + + +# Type aliases for primitive type names +PRIMITIVE_TYPES = { + "bool": PrimitiveKind.BOOL, + "int8": PrimitiveKind.INT8, + "int16": PrimitiveKind.INT16, + "int32": PrimitiveKind.INT32, + "int64": PrimitiveKind.INT64, + "float32": PrimitiveKind.FLOAT32, + "float64": PrimitiveKind.FLOAT64, + "string": PrimitiveKind.STRING, + "bytes": PrimitiveKind.BYTES, + "date": PrimitiveKind.DATE, + "timestamp": PrimitiveKind.TIMESTAMP, +} + + +@dataclass +class PrimitiveType: + """A primitive type like int32, string, etc.""" + + kind: PrimitiveKind + + def __repr__(self) -> str: + return f"PrimitiveType({self.kind.value})" + + +@dataclass +class NamedType: + """A reference to a user-defined type (message or enum).""" + + name: str + + def __repr__(self) -> str: + return f"NamedType({self.name})" + + +@dataclass +class ListType: + """A list/repeated type.""" + + element_type: "FieldType" + + def __repr__(self) -> str: + return f"ListType({self.element_type})" + + +@dataclass +class MapType: + """A map type with key and value types.""" + + key_type: "FieldType" + value_type: "FieldType" + + def __repr__(self) -> str: + return f"MapType({self.key_type}, {self.value_type})" + + +# Union of all field types +FieldType = Union[PrimitiveType, NamedType, ListType, MapType] + + +@dataclass +class Field: + """A field in a message.""" + + name: str + field_type: FieldType + number: int + optional: bool = False + ref: bool = False + options: dict = field(default_factory=dict) + line: int = 0 + column: int = 0 + + def __repr__(self) -> str: + modifiers = [] + if self.optional: + modifiers.append("optional") + if self.ref: + modifiers.append("ref") + mod_str = " ".join(modifiers) + " " if modifiers else "" + opts_str = f" [{self.options}]" if self.options else "" + return f"Field({mod_str}{self.field_type} {self.name} = {self.number}{opts_str})" + + +@dataclass +class Import: + """An import statement.""" + + path: str + line: int = 0 + column: int = 0 + + def __repr__(self) -> str: + return f'Import("{self.path}")' + + +@dataclass +class EnumValue: + """A value in an enum.""" + + name: str + value: int + line: int = 0 + column: int = 0 + + def __repr__(self) -> str: + return f"EnumValue({self.name} = {self.value})" + + +@dataclass +class Message: + """A message definition.""" + + name: str + type_id: Optional[int] + fields: List[Field] = field(default_factory=list) + nested_messages: List["Message"] = field(default_factory=list) + nested_enums: List["Enum"] = field(default_factory=list) + options: dict = field(default_factory=dict) + line: int = 0 + column: int = 0 + + def __repr__(self) -> str: + id_str = f" [id={self.type_id}]" if self.type_id is not None else "" + nested_str = "" + if self.nested_messages or self.nested_enums: + nested_str = f", nested={len(self.nested_messages)}msg+{len(self.nested_enums)}enum" + opts_str = f", options={len(self.options)}" if self.options else "" + return f"Message({self.name}{id_str}, fields={self.fields}{nested_str}{opts_str})" + + def get_nested_type(self, name: str) -> Optional[Union["Message", "Enum"]]: + """Look up a nested type by name.""" + 
for msg in self.nested_messages: + if msg.name == name: + return msg + for enum in self.nested_enums: + if enum.name == name: + return enum + return None + + +@dataclass +class Enum: + """An enum definition.""" + + name: str + type_id: Optional[int] + values: List[EnumValue] = field(default_factory=list) + options: dict = field(default_factory=dict) + line: int = 0 + column: int = 0 + + def __repr__(self) -> str: + id_str = f" [id={self.type_id}]" if self.type_id is not None else "" + opts_str = f", options={len(self.options)}" if self.options else "" + return f"Enum({self.name}{id_str}, values={self.values}{opts_str})" + + +@dataclass +class Schema: + """The root AST node representing a complete FDL file.""" + + package: Optional[str] + imports: List[Import] = field(default_factory=list) + enums: List[Enum] = field(default_factory=list) + messages: List[Message] = field(default_factory=list) + options: dict = field(default_factory=dict) # File-level options (java_package, go_package, etc.) + + def __repr__(self) -> str: + opts = f", options={len(self.options)}" if self.options else "" + return f"Schema(package={self.package}, imports={len(self.imports)}, enums={len(self.enums)}, messages={len(self.messages)}{opts})" + + def get_option(self, name: str, default: Optional[str] = None) -> Optional[str]: + """Get a file-level option value.""" + return self.options.get(name, default) + + def get_type(self, name: str) -> Optional[Union[Message, Enum]]: + """Look up a type by name, supporting qualified names like Parent.Child.""" + # Handle qualified names (e.g., SearchResponse.Result) + if "." in name: + parts = name.split(".") + # Find the top-level type + current = self._get_top_level_type(parts[0]) + if current is None: + return None + # Navigate through nested types + for part in parts[1:]: + if isinstance(current, Message): + current = current.get_nested_type(part) + if current is None: + return None + else: + # Enums don't have nested types + return None + return current + else: + return self._get_top_level_type(name) + + def _get_top_level_type(self, name: str) -> Optional[Union[Message, Enum]]: + """Look up a top-level type by simple name.""" + for enum in self.enums: + if enum.name == name: + return enum + for message in self.messages: + if message.name == name: + return message + return None + + def get_all_types(self) -> List[Union[Message, Enum]]: + """Get all types including nested types (flattened).""" + result: List[Union[Message, Enum]] = [] + result.extend(self.enums) + for message in self.messages: + self._collect_types(message, result) + return result + + def _collect_types(self, message: Message, result: List[Union[Message, Enum]]): + """Recursively collect all types from a message and its nested types.""" + result.append(message) + for nested_enum in message.nested_enums: + result.append(nested_enum) + for nested_msg in message.nested_messages: + self._collect_types(nested_msg, result) + + def validate(self) -> List[str]: + """Validate the schema and return a list of errors.""" + errors = [] + + # Check for duplicate type names at top level + names = set() + for enum in self.enums: + if enum.name in names: + errors.append(f"Duplicate type name: {enum.name}") + names.add(enum.name) + for message in self.messages: + if message.name in names: + errors.append(f"Duplicate type name: {message.name}") + names.add(message.name) + + # Check for duplicate type IDs (including nested types) + type_ids = {} + all_types = self.get_all_types() + for t in all_types: + if t.type_id is not 
None: + if t.type_id in type_ids: + errors.append( + f"Duplicate type ID @{t.type_id}: " + f"{t.name} and {type_ids[t.type_id]}" + ) + type_ids[t.type_id] = t.name + + # Validate messages recursively (including nested) + def validate_message(message: Message, parent_path: str = ""): + full_name = f"{parent_path}.{message.name}" if parent_path else message.name + + # Check for duplicate nested type names + nested_names = set() + for nested_enum in message.nested_enums: + if nested_enum.name in nested_names: + errors.append(f"Duplicate nested type name in {full_name}: {nested_enum.name}") + nested_names.add(nested_enum.name) + for nested_msg in message.nested_messages: + if nested_msg.name in nested_names: + errors.append(f"Duplicate nested type name in {full_name}: {nested_msg.name}") + nested_names.add(nested_msg.name) + + # Check for duplicate field numbers and names + field_numbers = {} + field_names = set() + for f in message.fields: + if f.number in field_numbers: + errors.append( + f"Duplicate field number {f.number} in {full_name}: " + f"{f.name} and {field_numbers[f.number]}" + ) + field_numbers[f.number] = f.name + if f.name in field_names: + errors.append( + f"Duplicate field name in {full_name}: {f.name}" + ) + field_names.add(f.name) + + # Validate nested enums + for nested_enum in message.nested_enums: + validate_enum(nested_enum, full_name) + + # Recursively validate nested messages + for nested_msg in message.nested_messages: + validate_message(nested_msg, full_name) + + def validate_enum(enum: Enum, parent_path: str = ""): + full_name = f"{parent_path}.{enum.name}" if parent_path else enum.name + value_numbers = {} + value_names = set() + for v in enum.values: + if v.value in value_numbers: + errors.append( + f"Duplicate enum value {v.value} in {full_name}: " + f"{v.name} and {value_numbers[v.value]}" + ) + value_numbers[v.value] = v.name + if v.name in value_names: + errors.append( + f"Duplicate enum value name in {full_name}: {v.name}" + ) + value_names.add(v.name) + + # Validate all top-level enums + for enum in self.enums: + validate_enum(enum) + + # Validate all top-level messages (and their nested types) + for message in self.messages: + validate_message(message) + + # Check that referenced types exist (supports qualified names and nested type lookup) + def check_type_ref(field_type: FieldType, context: str, enclosing_message: Optional[Message] = None): + if isinstance(field_type, NamedType): + type_name = field_type.name + found = False + + # First, try to find as a nested type in the enclosing message + if enclosing_message is not None and "." 
not in type_name: + if enclosing_message.get_nested_type(type_name) is not None: + found = True + + # Then, try to find as a top-level or qualified type + if not found and self.get_type(type_name) is not None: + found = True + + if not found: + errors.append(f"Unknown type '{type_name}' in {context}") + elif isinstance(field_type, ListType): + check_type_ref(field_type.element_type, context, enclosing_message) + elif isinstance(field_type, MapType): + check_type_ref(field_type.key_type, context, enclosing_message) + check_type_ref(field_type.value_type, context, enclosing_message) + + def check_message_refs(message: Message, parent_path: str = ""): + full_name = f"{parent_path}.{message.name}" if parent_path else message.name + for f in message.fields: + check_type_ref(f.field_type, f"{full_name}.{f.name}", message) + for nested_msg in message.nested_messages: + check_message_refs(nested_msg, full_name) + + for message in self.messages: + check_message_refs(message) + + return errors diff --git a/compiler/fory_compiler/parser/lexer.py b/compiler/fory_compiler/parser/lexer.py new file mode 100644 index 0000000000..44b47ea6f1 --- /dev/null +++ b/compiler/fory_compiler/parser/lexer.py @@ -0,0 +1,297 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +"""Hand-written lexer for FDL.""" + +from dataclasses import dataclass +from enum import Enum, auto +from typing import List + + +class TokenType(Enum): + """Token types for FDL.""" + + # Keywords + PACKAGE = auto() + IMPORT = auto() + PUBLIC = auto() + WEAK = auto() + MESSAGE = auto() + ENUM = auto() + OPTIONAL = auto() + REF = auto() + REPEATED = auto() + MAP = auto() + OPTION = auto() + TRUE = auto() + FALSE = auto() + RESERVED = auto() + TO = auto() + MAX = auto() + + # Literals + IDENT = auto() + INT = auto() + STRING = auto() # "quoted string" + + # Punctuation + LBRACE = auto() # { + RBRACE = auto() # } + LBRACKET = auto() # [ + RBRACKET = auto() # ] + LPAREN = auto() # ( + RPAREN = auto() # ) + LANGLE = auto() # < + RANGLE = auto() # > + SEMI = auto() # ; + COMMA = auto() # , + EQUALS = auto() # = + DOT = auto() # . 
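+
+    # Illustrative (not from this patch): the input `message Dog [id=102] {}`
+    # lexes to: MESSAGE IDENT LBRACKET IDENT EQUALS INT RBRACKET LBRACE RBRACE EOF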
+ + EOF = auto() + + +@dataclass +class Token: + """A token produced by the lexer.""" + + type: TokenType + value: str + line: int + column: int + + def __repr__(self) -> str: + return f"Token({self.type.name}, {self.value!r}, {self.line}:{self.column})" + + +class LexerError(Exception): + """Error during lexing.""" + + def __init__(self, message: str, line: int, column: int): + self.message = message + self.line = line + self.column = column + super().__init__(f"Line {line}, Column {column}: {message}") + + +class Lexer: + """Hand-written tokenizer for FDL.""" + + KEYWORDS = { + "package": TokenType.PACKAGE, + "import": TokenType.IMPORT, + "public": TokenType.PUBLIC, + "weak": TokenType.WEAK, + "message": TokenType.MESSAGE, + "enum": TokenType.ENUM, + "optional": TokenType.OPTIONAL, + "ref": TokenType.REF, + "repeated": TokenType.REPEATED, + "map": TokenType.MAP, + "option": TokenType.OPTION, + "true": TokenType.TRUE, + "false": TokenType.FALSE, + "reserved": TokenType.RESERVED, + "to": TokenType.TO, + "max": TokenType.MAX, + } + + PUNCTUATION = { + "{": TokenType.LBRACE, + "}": TokenType.RBRACE, + "[": TokenType.LBRACKET, + "]": TokenType.RBRACKET, + "(": TokenType.LPAREN, + ")": TokenType.RPAREN, + "<": TokenType.LANGLE, + ">": TokenType.RANGLE, + ";": TokenType.SEMI, + ",": TokenType.COMMA, + "=": TokenType.EQUALS, + ".": TokenType.DOT, + } + + def __init__(self, source: str, filename: str = ""): + self.source = source + self.filename = filename + self.pos = 0 + self.line = 1 + self.column = 1 + self.line_start = 0 + + def at_end(self) -> bool: + """Check if we've reached the end of input.""" + return self.pos >= len(self.source) + + def peek(self, offset: int = 0) -> str: + """Peek at a character without consuming it.""" + pos = self.pos + offset + if pos >= len(self.source): + return "\0" + return self.source[pos] + + def advance(self) -> str: + """Consume and return the current character.""" + if self.at_end(): + return "\0" + ch = self.source[self.pos] + self.pos += 1 + if ch == "\n": + self.line += 1 + self.line_start = self.pos + self.column = 1 + else: + self.column += 1 + return ch + + def skip_whitespace(self): + """Skip whitespace characters.""" + while not self.at_end() and self.peek() in " \t\r\n": + self.advance() + + def skip_line_comment(self): + """Skip a // comment.""" + while not self.at_end() and self.peek() != "\n": + self.advance() + + def skip_block_comment(self): + """Skip a /* */ comment.""" + start_line = self.line + start_col = self.column + self.advance() # consume * + while not self.at_end(): + if self.peek() == "*" and self.peek(1) == "/": + self.advance() # consume * + self.advance() # consume / + return + self.advance() + raise LexerError("Unterminated block comment", start_line, start_col) + + def skip_whitespace_and_comments(self): + """Skip whitespace and comments.""" + while not self.at_end(): + ch = self.peek() + if ch in " \t\r\n": + self.skip_whitespace() + elif ch == "/" and self.peek(1) == "/": + self.advance() # consume first / + self.advance() # consume second / + self.skip_line_comment() + elif ch == "/" and self.peek(1) == "*": + self.advance() # consume / + self.skip_block_comment() + else: + break + + def read_identifier(self) -> str: + """Read an identifier.""" + start = self.pos + while not self.at_end(): + ch = self.peek() + if ch.isalnum() or ch == "_": + self.advance() + else: + break + return self.source[start : self.pos] + + def read_number(self) -> str: + """Read an integer literal.""" + start = self.pos + # Handle negative numbers 
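+        # (Illustrative) "-42" is consumed as the single lexeme "-42"; next_token only
+        # dispatches here for "-" when a digit follows, so a bare "-" never reaches this path.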
+ if self.peek() == "-": + self.advance() + while not self.at_end() and self.peek().isdigit(): + self.advance() + return self.source[start : self.pos] + + def read_string(self) -> str: + """Read a quoted string literal.""" + quote_char = self.advance() # consume opening quote + start_line = self.line + start_col = self.column - 1 + result = [] + + while not self.at_end(): + ch = self.peek() + if ch == quote_char: + self.advance() # consume closing quote + return "".join(result) + elif ch == "\\": + self.advance() # consume backslash + if self.at_end(): + raise LexerError("Unterminated string", start_line, start_col) + escape_ch = self.advance() + if escape_ch == "n": + result.append("\n") + elif escape_ch == "t": + result.append("\t") + elif escape_ch == "\\": + result.append("\\") + elif escape_ch == quote_char: + result.append(quote_char) + else: + result.append(escape_ch) + elif ch == "\n": + raise LexerError("Unterminated string (newline in string)", start_line, start_col) + else: + result.append(self.advance()) + + raise LexerError("Unterminated string", start_line, start_col) + + def next_token(self) -> Token: + """Read the next token.""" + self.skip_whitespace_and_comments() + + if self.at_end(): + return Token(TokenType.EOF, "", self.line, self.column) + + line = self.line + column = self.column + ch = self.peek() + + # String literal + if ch == '"' or ch == "'": + value = self.read_string() + return Token(TokenType.STRING, value, line, column) + + # Punctuation + if ch in self.PUNCTUATION: + self.advance() + return Token(self.PUNCTUATION[ch], ch, line, column) + + # Identifier or keyword + if ch.isalpha() or ch == "_": + ident = self.read_identifier() + token_type = self.KEYWORDS.get(ident, TokenType.IDENT) + return Token(token_type, ident, line, column) + + # Integer literal (including negative) + if ch.isdigit() or (ch == "-" and self.peek(1).isdigit()): + value = self.read_number() + return Token(TokenType.INT, value, line, column) + + raise LexerError(f"Unexpected character: '{ch}'", line, column) + + def tokenize(self) -> List[Token]: + """Tokenize the entire input and return a list of tokens.""" + tokens = [] + while True: + token = self.next_token() + tokens.append(token) + if token.type == TokenType.EOF: + break + return tokens diff --git a/compiler/fory_compiler/parser/parser.py b/compiler/fory_compiler/parser/parser.py new file mode 100644 index 0000000000..a761516ec9 --- /dev/null +++ b/compiler/fory_compiler/parser/parser.py @@ -0,0 +1,887 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. 
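+
+# Informal grammar sketch (added for orientation; derived from the parse_* methods
+# below, not an authoritative specification):
+#   schema  := (package | import | file_option | enum | message)*
+#   enum    := "enum" IDENT [ "[" type_options "]" ] "{" (option | reserved | enum_value)* "}"
+#   message := "message" IDENT [ "[" type_options "]" ] "{" (field | message | enum | option | reserved)* "}"
+#   field   := ["optional"] ["ref"] ["repeated"] type IDENT "=" INT [ "[" field_options "]" ] ";"
+#   type    := "map" "<" type "," type ">" | primitive | IDENT ("." IDENT)*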
+ +"""Recursive descent parser for FDL.""" + +import warnings +from typing import List, Optional, Set + +from fory_compiler.parser.lexer import Lexer, Token, TokenType + +# Known file-level options (standard protobuf options) +KNOWN_FILE_OPTIONS: Set[str] = { + "java_package", + "java_outer_classname", + "java_multiple_files", + "go_package", + "deprecated", +} + +# Known Fory file-level options (extension options) +KNOWN_FORY_FILE_OPTIONS: Set[str] = { + "use_record_for_java_message", + "polymorphism", +} + +# Known field-level options (standard protobuf options) +KNOWN_FIELD_OPTIONS: Set[str] = { + "deprecated", + "json_name", + "packed", + "lazy", + "unverified_lazy", + "weak", + "debug_redact", + "retention", + "targets", + "edition_defaults", + "features", +} + +# Known Fory field-level options (extension options) +KNOWN_FORY_FIELD_OPTIONS: Set[str] = { + "ref", + "nullable", + "deprecated", +} + +# Known type-level options for inline syntax: [id=100, deprecated=true] +KNOWN_TYPE_OPTIONS: Set[str] = { + "id", + "deprecated", +} + +# Known Fory message-level options (option statements inside message body) +KNOWN_FORY_MESSAGE_OPTIONS: Set[str] = { + "id", + "evolving", + "use_record_for_java", + "deprecated", + "namespace", +} + +# Known Fory enum-level options (option statements inside enum body) +KNOWN_FORY_ENUM_OPTIONS: Set[str] = { + "id", + "deprecated", +} +from fory_compiler.parser.ast import ( + Schema, + Message, + Enum, + Field, + EnumValue, + Import, + FieldType, + PrimitiveType, + NamedType, + ListType, + MapType, + PRIMITIVE_TYPES, +) + + +class ParseError(Exception): + """Error during parsing.""" + + def __init__(self, message: str, line: int, column: int): + self.message = message + self.line = line + self.column = column + super().__init__(f"Line {line}, Column {column}: {message}") + + +class Parser: + """Recursive descent parser for FDL.""" + + def __init__(self, tokens: List[Token]): + self.tokens = tokens + self.pos = 0 + + @classmethod + def from_source(cls, source: str, filename: str = "") -> "Parser": + """Create a parser from source code.""" + lexer = Lexer(source, filename) + tokens = lexer.tokenize() + return cls(tokens) + + def at_end(self) -> bool: + """Check if we've reached the end of tokens.""" + return self.current().type == TokenType.EOF + + def current(self) -> Token: + """Get the current token.""" + if self.pos >= len(self.tokens): + return self.tokens[-1] # Return EOF + return self.tokens[self.pos] + + def previous(self) -> Token: + """Get the previous token.""" + return self.tokens[self.pos - 1] + + def peek(self, offset: int = 0) -> Token: + """Peek at a token without consuming it.""" + pos = self.pos + offset + if pos >= len(self.tokens): + return self.tokens[-1] # Return EOF + return self.tokens[pos] + + def check(self, token_type: TokenType) -> bool: + """Check if the current token has the given type.""" + return self.current().type == token_type + + def match(self, *types: TokenType) -> bool: + """If current token matches any of the types, consume and return True.""" + for token_type in types: + if self.check(token_type): + self.advance() + return True + return False + + def advance(self) -> Token: + """Consume and return the current token.""" + token = self.current() + if not self.at_end(): + self.pos += 1 + return token + + def consume(self, token_type: TokenType, message: str = None) -> Token: + """Consume a token of the expected type, or raise an error.""" + if self.check(token_type): + return self.advance() + token = self.current() + if 
message is None: + message = f"Expected {token_type.name}, got {token.type.name}" + raise ParseError(message, token.line, token.column) + + def error(self, message: str) -> ParseError: + """Create a parse error at the current position.""" + token = self.current() + return ParseError(message, token.line, token.column) + + def parse(self) -> Schema: + """Parse the entire input and return a Schema.""" + package = None + imports = [] + enums = [] + messages = [] + options = {} + + while not self.at_end(): + if self.check(TokenType.PACKAGE): + if package is not None: + raise self.error("Duplicate package declaration") + package = self.parse_package() + elif self.check(TokenType.IMPORT): + imports.append(self.parse_import()) + elif self.check(TokenType.OPTION): + # File-level option + name, value = self.parse_file_option() + options[name] = value + elif self.check(TokenType.ENUM): + enums.append(self.parse_enum()) + elif self.check(TokenType.MESSAGE): + messages.append(self.parse_message()) + else: + raise self.error(f"Unexpected token: {self.current().value}") + + return Schema(package, imports, enums, messages, options) + + def parse_package(self) -> str: + """Parse a package declaration: package foo.bar;""" + self.consume(TokenType.PACKAGE) + + # Package name can be dotted: foo.bar.baz + parts = [self.consume(TokenType.IDENT).value] + while self.check(TokenType.DOT): + self.advance() # consume the dot + parts.append(self.consume(TokenType.IDENT, "Expected identifier after '.'").value) + + self.consume(TokenType.SEMI, "Expected ';' after package name") + return ".".join(parts) + + def parse_option_value(self): + """Parse an option value (string, bool, int, or identifier).""" + if self.check(TokenType.STRING): + return self.advance().value + elif self.check(TokenType.TRUE): + self.advance() + return True + elif self.check(TokenType.FALSE): + self.advance() + return False + elif self.check(TokenType.INT): + return int(self.advance().value) + elif self.check(TokenType.IDENT): + return self.advance().value + else: + raise self.error(f"Expected option value, got {self.current().type.name}") + + def parse_file_option(self) -> tuple: + """Parse a file-level option. + + Supports two syntaxes: + 1. Standard: option java_package = "com.example"; + 2. Extension: option (fory).use_record_for_java_message = true; + + Returns a tuple of (option_name, option_value). + For extension options, the name is prefixed with the extension name: "fory.use_record_for_java_message" + """ + option_token = self.consume(TokenType.OPTION) + + # Check for extension syntax: (extension_name).option_name + extension_name = None + if self.check(TokenType.LPAREN): + self.advance() # consume ( + extension_name = self.consume(TokenType.IDENT, "Expected extension name").value + self.consume(TokenType.RPAREN, "Expected ')' after extension name") + self.consume(TokenType.DOT, "Expected '.' 
after extension name") + + name_token = self.consume(TokenType.IDENT, "Expected option name") + option_name = name_token.value + + # Build full option name for extension options + if extension_name: + full_option_name = f"{extension_name}.{option_name}" + else: + full_option_name = option_name + + self.consume(TokenType.EQUALS, "Expected '=' after option name") + + option_value = self.parse_option_value() + + self.consume(TokenType.SEMI, "Expected ';' after option statement") + + # Warn about unknown options + if extension_name: + if extension_name == "fory": + if option_name not in KNOWN_FORY_FILE_OPTIONS: + warnings.warn( + f"Line {name_token.line}: ignoring unknown fory option '{option_name}'", + stacklevel=2 + ) + else: + warnings.warn( + f"Line {name_token.line}: ignoring unknown extension '{extension_name}'", + stacklevel=2 + ) + else: + if option_name not in KNOWN_FILE_OPTIONS: + warnings.warn( + f"Line {name_token.line}: ignoring unknown option '{option_name}'", + stacklevel=2 + ) + + return (full_option_name, option_value) + + def parse_import(self) -> Import: + """Parse an import statement: import "path/to/file.fdl";""" + start = self.current() + self.consume(TokenType.IMPORT) + + # Check for forbidden import modifiers (protobuf syntax) + if self.check(TokenType.PUBLIC): + raise ParseError( + "'import public' is not supported in FDL.\n" + " Reason: FDL uses a simpler import model where all imported types\n" + " are available to the importing file. Re-exporting imports is not\n" + " supported. Simply use 'import \"path/to/file.fdl\";' instead.\n" + " If you need types from multiple files, import each file directly.", + start.line, + start.column, + ) + + if self.check(TokenType.WEAK): + raise ParseError( + "'import weak' is not supported in FDL.\n" + " Reason: Weak imports are a protobuf-specific feature for optional\n" + " dependencies. FDL requires all imports to be present at compile time.\n" + " Use 'import \"path/to/file.fdl\";' instead.", + start.line, + start.column, + ) + + path_token = self.consume(TokenType.STRING, "Expected import path string") + + self.consume(TokenType.SEMI, "Expected ';' after import statement") + + return Import( + path=path_token.value, + line=start.line, + column=start.column, + ) + + def parse_enum(self) -> Enum: + """Parse an enum: enum Color [id=101] { ... } + + Supports: + - Inline type options: enum Color [id=101] { ... 
} + - Body option statements: option (fory).id = 100; + """ + start = self.current() + self.consume(TokenType.ENUM) + name = self.consume(TokenType.IDENT, "Expected enum name").value + + # Optional inline type options: [id=101, deprecated=true] + type_id = None + inline_options = {} + if self.check(TokenType.LBRACKET): + inline_options = self.parse_type_options(name) + if "id" in inline_options: + type_id = inline_options["id"] + + self.consume(TokenType.LBRACE, "Expected '{' after enum name") + + values = [] + body_options = {} + while not self.check(TokenType.RBRACE): + # Check for option statements + if self.check(TokenType.OPTION): + opt_name, opt_value = self.parse_enum_option(name) + body_options[opt_name] = opt_value + # Handle fory.id option to set type_id + if opt_name == "fory.id" and type_id is None: + if isinstance(opt_value, int) and opt_value > 0: + type_id = opt_value + # Check for reserved statements + elif self.check(TokenType.RESERVED): + self.parse_reserved() + else: + values.append(self.parse_enum_value()) + + self.consume(TokenType.RBRACE, "Expected '}' after enum values") + + # Merge inline options and body options (body options take precedence) + all_options = {**inline_options, **body_options} + + return Enum( + name=name, + type_id=type_id, + values=values, + options=all_options, + line=start.line, + column=start.column, + ) + + def parse_enum_option(self, enum_name: str) -> tuple: + """Parse and validate an enum option statement. + + Supports two syntaxes: + 1. Standard: option deprecated = true; + 2. Extension: option (fory).id = 100; + + Forbidden options: + - allow_alias = true: Enum aliases are not supported + + Returns a tuple of (option_name, option_value). + """ + option_token = self.consume(TokenType.OPTION) + + # Check for extension syntax: (extension_name).option_name + extension_name = None + if self.check(TokenType.LPAREN): + self.advance() # consume ( + extension_name = self.consume(TokenType.IDENT, "Expected extension name").value + self.consume(TokenType.RPAREN, "Expected ')' after extension name") + self.consume(TokenType.DOT, "Expected '.' after extension name") + + name_token = self.consume(TokenType.IDENT, "Expected option name") + option_name = name_token.value + + # Build full option name for extension options + if extension_name: + full_option_name = f"{extension_name}.{option_name}" + else: + full_option_name = option_name + + self.consume(TokenType.EQUALS, "Expected '=' after option name") + + option_value = self.parse_option_value() + + self.consume(TokenType.SEMI, "Expected ';' after option statement") + + # Validate forbidden options + if option_name == "allow_alias" and option_value is True: + raise ParseError( + f"'option allow_alias = true' is forbidden in enum '{enum_name}'. 
" + "Enum aliases (multiple names for the same value) are not supported.", + option_token.line, + option_token.column, + ) + + # Warn about unknown options + if extension_name: + if extension_name == "fory": + if option_name not in KNOWN_FORY_ENUM_OPTIONS: + warnings.warn( + f"Line {name_token.line}: ignoring unknown fory enum option '{option_name}' in '{enum_name}'", + stacklevel=2 + ) + else: + warnings.warn( + f"Line {name_token.line}: ignoring unknown extension '{extension_name}'", + stacklevel=2 + ) + else: + # Standard options - currently we only recognize deprecated and allow_alias + if option_name not in {"deprecated", "allow_alias"}: + warnings.warn( + f"Line {name_token.line}: ignoring unknown enum option '{option_name}' in '{enum_name}'", + stacklevel=2 + ) + + return (full_option_name, option_value) + + def parse_message_option(self, message_name: str) -> tuple: + """Parse a message-level option statement. + + Supports two syntaxes: + 1. Standard: option deprecated = true; + 2. Extension: option (fory).id = 100; + + Returns a tuple of (option_name, option_value). + For extension options, the name is prefixed with the extension name. + """ + option_token = self.consume(TokenType.OPTION) + + # Check for extension syntax: (extension_name).option_name + extension_name = None + if self.check(TokenType.LPAREN): + self.advance() # consume ( + extension_name = self.consume(TokenType.IDENT, "Expected extension name").value + self.consume(TokenType.RPAREN, "Expected ')' after extension name") + self.consume(TokenType.DOT, "Expected '.' after extension name") + + name_token = self.consume(TokenType.IDENT, "Expected option name") + option_name = name_token.value + + # Build full option name for extension options + if extension_name: + full_option_name = f"{extension_name}.{option_name}" + else: + full_option_name = option_name + + self.consume(TokenType.EQUALS, "Expected '=' after option name") + + option_value = self.parse_option_value() + + self.consume(TokenType.SEMI, "Expected ';' after option statement") + + # Warn about unknown options + if extension_name: + if extension_name == "fory": + if option_name not in KNOWN_FORY_MESSAGE_OPTIONS: + warnings.warn( + f"Line {name_token.line}: ignoring unknown fory message option '{option_name}' in '{message_name}'", + stacklevel=2 + ) + else: + warnings.warn( + f"Line {name_token.line}: ignoring unknown extension '{extension_name}'", + stacklevel=2 + ) + else: + # Standard options - currently we only recognize deprecated + if option_name not in {"deprecated"}: + warnings.warn( + f"Line {name_token.line}: ignoring unknown message option '{option_name}' in '{message_name}'", + stacklevel=2 + ) + + return (full_option_name, option_value) + + def parse_reserved(self): + """Parse a reserved statement. 
+ + Supports: + - reserved 2, 15, 9 to 11, 40 to max; (numbers and ranges) + - reserved "FOO", "BAR"; (field/value names) + """ + self.consume(TokenType.RESERVED) + + # Parse comma-separated list of reserved items + while True: + if self.check(TokenType.STRING): + # Reserved name: "FOO" + self.advance() + elif self.check(TokenType.INT): + # Reserved number or range start: 2 or 9 to 11 + self.advance() + # Check for range: N to M or N to max + if self.check(TokenType.TO): + self.advance() + if self.check(TokenType.MAX): + self.advance() + elif self.check(TokenType.INT): + self.advance() + else: + raise self.error("Expected integer or 'max' after 'to'") + else: + raise self.error( + f"Expected reserved number or string, got {self.current().type.name}" + ) + + # Check for comma (more items) or semicolon (end) + if self.check(TokenType.COMMA): + self.advance() + elif self.check(TokenType.SEMI): + break + else: + raise self.error("Expected ',' or ';' in reserved statement") + + self.consume(TokenType.SEMI, "Expected ';' after reserved statement") + + def parse_enum_value(self) -> EnumValue: + """Parse an enum value: NAME = 0;""" + start = self.current() + name = self.consume(TokenType.IDENT, "Expected enum value name").value + self.consume(TokenType.EQUALS, "Expected '=' after enum value name") + value_token = self.consume(TokenType.INT, "Expected integer value") + value = int(value_token.value) + self.consume(TokenType.SEMI, "Expected ';' after enum value") + + return EnumValue( + name=name, + value=value, + line=start.line, + column=start.column, + ) + + def parse_message(self) -> Message: + """Parse a message: message Dog [id=102] { ... } + + Supports: + - Inline type options: message Dog [id=102] { ... } + - Body option statements: option (fory).id = 100; + - Nested messages and enums: + message Outer { + message Inner { ... } + enum Status { ... 
} + Inner inner = 1; + } + """ + start = self.current() + self.consume(TokenType.MESSAGE) + name = self.consume(TokenType.IDENT, "Expected message name").value + + # Optional inline type options: [id=102, deprecated=true] + type_id = None + inline_options = {} + if self.check(TokenType.LBRACKET): + inline_options = self.parse_type_options(name) + if "id" in inline_options: + type_id = inline_options["id"] + + self.consume(TokenType.LBRACE, "Expected '{' after message name") + + fields = [] + nested_messages = [] + nested_enums = [] + body_options = {} + + while not self.check(TokenType.RBRACE): + # Check for reserved statements + if self.check(TokenType.RESERVED): + self.parse_reserved() + # Check for option statements (message-level options) + elif self.check(TokenType.OPTION): + opt_name, opt_value = self.parse_message_option(name) + body_options[opt_name] = opt_value + # Handle fory.id option to set type_id + if opt_name == "fory.id" and type_id is None: + if isinstance(opt_value, int) and opt_value > 0: + type_id = opt_value + # Check for nested message + elif self.check(TokenType.MESSAGE): + nested_messages.append(self.parse_message()) + # Check for nested enum + elif self.check(TokenType.ENUM): + nested_enums.append(self.parse_enum()) + else: + fields.append(self.parse_field()) + + self.consume(TokenType.RBRACE, "Expected '}' after message fields") + + # Merge inline options and body options (body options take precedence) + all_options = {**inline_options, **body_options} + + return Message( + name=name, + type_id=type_id, + fields=fields, + nested_messages=nested_messages, + nested_enums=nested_enums, + options=all_options, + line=start.line, + column=start.column, + ) + + def parse_field(self) -> Field: + """Parse a field: optional ref repeated Type name = 1 [options]; + + Supports: + - Keyword modifiers: optional ref repeated + - Bracket options: [deprecated=true, (fory).ref=true] + """ + start = self.current() + + # Parse modifiers + optional = self.match(TokenType.OPTIONAL) + ref = self.match(TokenType.REF) + repeated = self.match(TokenType.REPEATED) + + # Parse type + field_type = self.parse_type() + + # Wrap in ListType if repeated + if repeated: + field_type = ListType(field_type) + + # Parse field name + name = self.consume(TokenType.IDENT, "Expected field name").value + + # Parse field number + self.consume(TokenType.EQUALS, "Expected '=' after field name") + number_token = self.consume(TokenType.INT, "Expected field number") + number = int(number_token.value) + + # Parse optional field options: [deprecated=true, (fory).ref=true] + field_options = {} + if self.check(TokenType.LBRACKET): + field_options = self.parse_field_options(name) + # Handle fory.ref or ref option to set ref flag + if field_options.get("fory.ref") is True or field_options.get("fory.tracking_ref") is True: + ref = True + if field_options.get("ref") is True or field_options.get("tracking_ref") is True: + ref = True + # Handle fory.nullable or nullable option to set optional flag + if field_options.get("fory.nullable") is True or field_options.get("nullable") is True: + optional = True + + self.consume(TokenType.SEMI, "Expected ';' after field declaration") + + return Field( + name=name, + field_type=field_type, + number=number, + optional=optional, + ref=ref, + options=field_options, + line=start.line, + column=start.column, + ) + + def parse_field_options(self, field_name: str) -> dict: + """Parse field options: [deprecated=true, (fory).ref=true] + + Supports two syntaxes: + 1. 
Standard: [deprecated=true, json_name="foo"] + 2. Extension: [(fory).ref=true, (fory).nullable=true] + + Returns a dict of option names to values. + For extension options, the name is prefixed: "fory.ref" + """ + self.consume(TokenType.LBRACKET) + options = {} + + while True: + # Check for extension syntax: (extension_name).option_name + extension_name = None + if self.check(TokenType.LPAREN): + self.advance() # consume ( + extension_name = self.consume(TokenType.IDENT, "Expected extension name").value + self.consume(TokenType.RPAREN, "Expected ')' after extension name") + self.consume(TokenType.DOT, "Expected '.' after extension name") + + # Parse option name (can be IDENT or keyword like 'ref', 'optional', etc.) + name_token = self.current() + if self.check(TokenType.IDENT): + self.advance() + option_name = name_token.value + elif self.check(TokenType.REF): + # 'ref' is a keyword but valid as option name + self.advance() + option_name = "ref" + elif self.check(TokenType.OPTIONAL): + # 'optional' is a keyword but valid as option name + self.advance() + option_name = "optional" + elif self.check(TokenType.REPEATED): + # 'repeated' is a keyword but valid as option name + self.advance() + option_name = "repeated" + elif self.check(TokenType.WEAK): + # 'weak' is a keyword but valid as option name + self.advance() + option_name = "weak" + elif self.check(TokenType.TRUE): + # 'true' can be used as option name in some contexts + self.advance() + option_name = "true" + elif self.check(TokenType.FALSE): + # 'false' can be used as option name in some contexts + self.advance() + option_name = "false" + else: + raise self.error(f"Expected option name, got {self.current().type.name}") + + # Build full option name for extension options + if extension_name: + full_option_name = f"{extension_name}.{option_name}" + else: + full_option_name = option_name + + self.consume(TokenType.EQUALS, "Expected '=' after option name") + + # Parse option value + option_value = self.parse_option_value() + options[full_option_name] = option_value + + # Warn about unknown field options + if extension_name: + if extension_name == "fory": + if option_name not in KNOWN_FORY_FIELD_OPTIONS: + warnings.warn( + f"Line {name_token.line}: ignoring unknown fory field option '{option_name}' on field '{field_name}'", + stacklevel=2 + ) + else: + warnings.warn( + f"Line {name_token.line}: ignoring unknown extension '{extension_name}'", + stacklevel=2 + ) + else: + if option_name not in KNOWN_FIELD_OPTIONS: + warnings.warn( + f"Line {name_token.line}: ignoring unknown field option '{option_name}' on field '{field_name}'", + stacklevel=2 + ) + + # Check for comma (more options) or closing bracket (end) + if self.check(TokenType.COMMA): + self.advance() + elif self.check(TokenType.RBRACKET): + break + else: + raise self.error("Expected ',' or ']' in field options") + + self.consume(TokenType.RBRACKET, "Expected ']' after field options") + return options + + def parse_type_options(self, type_name: str) -> dict: + """Parse type options: [id=100, deprecated=true] + + Returns a dict of option names to values. + Warns about unknown options. 
+ """ + self.consume(TokenType.LBRACKET) + options = {} + + while True: + # Parse option name + name_token = self.consume(TokenType.IDENT, "Expected option name") + option_name = name_token.value + + self.consume(TokenType.EQUALS, "Expected '=' after option name") + + # Parse option value (can be string, bool, int, or identifier) + if self.check(TokenType.STRING): + option_value = self.advance().value + elif self.check(TokenType.TRUE): + self.advance() + option_value = True + elif self.check(TokenType.FALSE): + self.advance() + option_value = False + elif self.check(TokenType.INT): + option_value = int(self.advance().value) + elif self.check(TokenType.IDENT): + option_value = self.advance().value + else: + raise self.error(f"Expected option value, got {self.current().type.name}") + + # Validate 'id' option must be a positive integer + if option_name == "id": + if not isinstance(option_value, int): + raise self.error(f"Type option 'id' must be an integer, got {type(option_value).__name__}") + if option_value <= 0: + raise self.error(f"Type option 'id' must be a positive integer, got {option_value}") + + # Warn about unknown type options + if option_name not in KNOWN_TYPE_OPTIONS: + warnings.warn( + f"Line {name_token.line}: ignoring unknown type option '{option_name}' on type '{type_name}'", + stacklevel=2 + ) + + options[option_name] = option_value + + # Check for comma (more options) or closing bracket (end) + if self.check(TokenType.COMMA): + self.advance() + elif self.check(TokenType.RBRACKET): + break + else: + raise self.error("Expected ',' or ']' in type options") + + self.consume(TokenType.RBRACKET, "Expected ']' after type options") + return options + + def parse_type(self) -> FieldType: + """Parse a type: int32, string, map, Parent.Child, or a named type.""" + if self.check(TokenType.MAP): + return self.parse_map_type() + + if not self.check(TokenType.IDENT): + raise self.error(f"Expected type name, got {self.current().type.name}") + + type_name = self.consume(TokenType.IDENT).value + + # Check if it's a primitive type + if type_name in PRIMITIVE_TYPES: + return PrimitiveType(PRIMITIVE_TYPES[type_name]) + + # Check for qualified name (e.g., Parent.Child or Outer.Middle.Inner) + while self.check(TokenType.DOT): + self.advance() # consume the dot + if not self.check(TokenType.IDENT): + raise self.error("Expected identifier after '.'") + type_name += "." 
+ self.consume(TokenType.IDENT).value + + # It's a named type (reference to message or enum) + return NamedType(type_name) + + def parse_map_type(self) -> MapType: + """Parse a map type: map""" + self.consume(TokenType.MAP) + self.consume(TokenType.LANGLE, "Expected '<' after 'map'") + + key_type = self.parse_type() + + self.consume(TokenType.COMMA, "Expected ',' between map key and value types") + + value_type = self.parse_type() + + self.consume(TokenType.RANGLE, "Expected '>' after map value type") + + return MapType(key_type, value_type) + + +def parse(source: str, filename: str = "") -> Schema: + """Parse FDL source code and return a Schema.""" + parser = Parser.from_source(source, filename) + return parser.parse() diff --git a/compiler/pyproject.toml b/compiler/pyproject.toml new file mode 100644 index 0000000000..05f93625a4 --- /dev/null +++ b/compiler/pyproject.toml @@ -0,0 +1,37 @@ +[build-system] +requires = ["setuptools>=61.0"] +build-backend = "setuptools.build_meta" + +[project] +name = "fory-compiler" +version = "0.1.0" +description = "FDL (Fory Definition Language) compiler for Apache Fory cross-language serialization" +readme = "README.md" +license = {text = "Apache-2.0"} +requires-python = ">=3.8" +classifiers = [ + "Development Status :: 3 - Alpha", + "Intended Audience :: Developers", + "License :: OSI Approved :: Apache Software License", + "Programming Language :: Python :: 3", + "Programming Language :: Python :: 3.8", + "Programming Language :: Python :: 3.9", + "Programming Language :: Python :: 3.10", + "Programming Language :: Python :: 3.11", + "Programming Language :: Python :: 3.12", + "Topic :: Software Development :: Code Generators", + "Topic :: Software Development :: Compilers", +] +keywords = ["fory", "serialization", "codegen", "idl", "compiler"] + +[project.scripts] +fory = "fory_compiler.cli:main" + +[project.urls] +Homepage = "https://github.com/apache/fory" +Documentation = "https://fory.apache.org" +Repository = "https://github.com/apache/fory" + +[tool.setuptools.packages.find] +where = ["."] +include = ["fory_compiler*"] diff --git a/compiler/tests/__init__.py b/compiler/tests/__init__.py new file mode 100644 index 0000000000..d28f0644bd --- /dev/null +++ b/compiler/tests/__init__.py @@ -0,0 +1,18 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +"""Tests for FDL compiler.""" diff --git a/compiler/tests/test_imports.py b/compiler/tests/test_imports.py new file mode 100644 index 0000000000..9894925b17 --- /dev/null +++ b/compiler/tests/test_imports.py @@ -0,0 +1,471 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. 
The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +"""Tests for FDL import functionality.""" + +import pytest +import tempfile +import os +from pathlib import Path + +from fory_compiler.parser.lexer import Lexer, TokenType +from fory_compiler.parser.parser import Parser +from fory_compiler.cli import resolve_imports, ImportError + + +class TestLexerImport: + """Tests for lexer import token support.""" + + def test_import_keyword(self): + """Test that 'import' is recognized as a keyword.""" + lexer = Lexer('import "foo.fdl";') + tokens = lexer.tokenize() + + assert tokens[0].type == TokenType.IMPORT + assert tokens[0].value == "import" + + def test_string_literal_double_quotes(self): + """Test parsing double-quoted string literals.""" + lexer = Lexer('"hello/world.fdl"') + tokens = lexer.tokenize() + + assert tokens[0].type == TokenType.STRING + assert tokens[0].value == "hello/world.fdl" + + def test_string_literal_single_quotes(self): + """Test parsing single-quoted string literals.""" + lexer = Lexer("'hello/world.fdl'") + tokens = lexer.tokenize() + + assert tokens[0].type == TokenType.STRING + assert tokens[0].value == "hello/world.fdl" + + def test_string_literal_escape_sequences(self): + """Test escape sequences in string literals.""" + lexer = Lexer(r'"path\\to\\file.fdl"') + tokens = lexer.tokenize() + + assert tokens[0].type == TokenType.STRING + assert tokens[0].value == "path\\to\\file.fdl" + + def test_full_import_statement(self): + """Test tokenizing a complete import statement.""" + lexer = Lexer('import "common/types.fdl";') + tokens = lexer.tokenize() + + assert tokens[0].type == TokenType.IMPORT + assert tokens[1].type == TokenType.STRING + assert tokens[1].value == "common/types.fdl" + assert tokens[2].type == TokenType.SEMI + + +class TestParserImport: + """Tests for parser import statement support.""" + + def test_parse_single_import(self): + """Test parsing a single import statement.""" + source = ''' + import "common.fdl"; + + message User { + string name = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + assert len(schema.imports) == 1 + assert schema.imports[0].path == "common.fdl" + + def test_parse_multiple_imports(self): + """Test parsing multiple import statements.""" + source = ''' + import "common.fdl"; + import "types/address.fdl"; + import "types/contact.fdl"; + + message User { + string name = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + assert len(schema.imports) == 3 + assert schema.imports[0].path == "common.fdl" + assert schema.imports[1].path == "types/address.fdl" + assert schema.imports[2].path == "types/contact.fdl" + + def test_imports_after_package(self): + """Test imports can appear after package declaration.""" + source = ''' + package myapp; + import "common.fdl"; + + message User { + string name = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + assert 
schema.package == "myapp" + assert len(schema.imports) == 1 + assert schema.imports[0].path == "common.fdl" + + +class TestImportResolution: + """Tests for import resolution in CLI.""" + + def test_simple_import(self): + """Test resolving a simple import.""" + with tempfile.TemporaryDirectory() as tmpdir: + tmpdir = Path(tmpdir) + + # Create common.fdl + common_fdl = tmpdir / "common.fdl" + common_fdl.write_text(''' + package common; + + message Address [id=100] { + string street = 1; + string city = 2; + } + ''') + + # Create main.fdl that imports common.fdl + main_fdl = tmpdir / "main.fdl" + main_fdl.write_text(''' + package main; + import "common.fdl"; + + message User [id=101] { + string name = 1; + Address address = 2; + } + ''') + + # Resolve imports + schema = resolve_imports(main_fdl) + + # Should have both Address and User + assert len(schema.messages) == 2 + type_names = {m.name for m in schema.messages} + assert "Address" in type_names + assert "User" in type_names + + def test_nested_imports(self): + """Test resolving nested imports (A imports B, B imports C).""" + with tempfile.TemporaryDirectory() as tmpdir: + tmpdir = Path(tmpdir) + + # Create base.fdl + base_fdl = tmpdir / "base.fdl" + base_fdl.write_text(''' + package base; + + enum Status [id=100] { + ACTIVE = 0; + INACTIVE = 1; + } + ''') + + # Create common.fdl that imports base.fdl + common_fdl = tmpdir / "common.fdl" + common_fdl.write_text(''' + package common; + import "base.fdl"; + + message BaseEntity [id=101] { + Status status = 1; + } + ''') + + # Create main.fdl that imports common.fdl + main_fdl = tmpdir / "main.fdl" + main_fdl.write_text(''' + package main; + import "common.fdl"; + + message User [id=102] { + string name = 1; + Status status = 2; + } + ''') + + # Resolve imports + schema = resolve_imports(main_fdl) + + # Should have Status enum and both messages + assert len(schema.enums) == 1 + assert schema.enums[0].name == "Status" + assert len(schema.messages) == 2 + + def test_subdirectory_import(self): + """Test importing from a subdirectory.""" + with tempfile.TemporaryDirectory() as tmpdir: + tmpdir = Path(tmpdir) + + # Create types subdirectory + types_dir = tmpdir / "types" + types_dir.mkdir() + + # Create types/address.fdl + address_fdl = types_dir / "address.fdl" + address_fdl.write_text(''' + package types; + + message Address [id=100] { + string street = 1; + } + ''') + + # Create main.fdl + main_fdl = tmpdir / "main.fdl" + main_fdl.write_text(''' + package main; + import "types/address.fdl"; + + message User [id=101] { + Address home = 1; + } + ''') + + # Resolve imports + schema = resolve_imports(main_fdl) + + assert len(schema.messages) == 2 + type_names = {m.name for m in schema.messages} + assert "Address" in type_names + assert "User" in type_names + + def test_circular_import_detection(self): + """Test that circular imports are detected.""" + with tempfile.TemporaryDirectory() as tmpdir: + tmpdir = Path(tmpdir) + + # Create a.fdl that imports b.fdl + a_fdl = tmpdir / "a.fdl" + a_fdl.write_text(''' + package a; + import "b.fdl"; + + message A [id=100] { + string name = 1; + } + ''') + + # Create b.fdl that imports a.fdl (circular!) 
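+            # (The cycle a.fdl -> b.fdl -> a.fdl is what resolve_imports is expected to reject.)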
+ b_fdl = tmpdir / "b.fdl" + b_fdl.write_text(''' + package b; + import "a.fdl"; + + message B [id=101] { + string name = 1; + } + ''') + + # Should raise ImportError + with pytest.raises(ImportError) as exc_info: + resolve_imports(a_fdl) + + assert "Circular import" in str(exc_info.value) + + def test_missing_import(self): + """Test error when imported file doesn't exist.""" + with tempfile.TemporaryDirectory() as tmpdir: + tmpdir = Path(tmpdir) + + # Create main.fdl that imports non-existent file + main_fdl = tmpdir / "main.fdl" + main_fdl.write_text(''' + package main; + import "nonexistent.fdl"; + + message User [id=100] { + string name = 1; + } + ''') + + # Should raise ImportError + with pytest.raises(ImportError) as exc_info: + resolve_imports(main_fdl) + + assert "Import not found" in str(exc_info.value) + + def test_diamond_import(self): + """Test diamond dependency pattern (A imports B and C, both import D).""" + with tempfile.TemporaryDirectory() as tmpdir: + tmpdir = Path(tmpdir) + + # Create d.fdl (base) + d_fdl = tmpdir / "d.fdl" + d_fdl.write_text(''' + package d; + + message Base [id=100] { + string id = 1; + } + ''') + + # Create b.fdl (imports d) + b_fdl = tmpdir / "b.fdl" + b_fdl.write_text(''' + package b; + import "d.fdl"; + + message B [id=101] { + Base base = 1; + } + ''') + + # Create c.fdl (imports d) + c_fdl = tmpdir / "c.fdl" + c_fdl.write_text(''' + package c; + import "d.fdl"; + + message C [id=102] { + Base base = 1; + } + ''') + + # Create a.fdl (imports b and c) + a_fdl = tmpdir / "a.fdl" + a_fdl.write_text(''' + package a; + import "b.fdl"; + import "c.fdl"; + + message A [id=103] { + B b = 1; + C c = 2; + } + ''') + + # Resolve imports - should handle diamond without error + schema = resolve_imports(a_fdl) + + # Note: Base will appear twice due to diamond, but validation will catch duplicates + type_names = [m.name for m in schema.messages] + assert "A" in type_names + assert "B" in type_names + assert "C" in type_names + + def test_relative_path_resolution(self): + """Test that relative paths are resolved correctly.""" + with tempfile.TemporaryDirectory() as tmpdir: + tmpdir = Path(tmpdir) + + # Create directory structure + src_dir = tmpdir / "src" + src_dir.mkdir() + common_dir = tmpdir / "common" + common_dir.mkdir() + + # Create common/types.fdl + types_fdl = common_dir / "types.fdl" + types_fdl.write_text(''' + package common; + + message CommonType [id=100] { + string value = 1; + } + ''') + + # Create src/main.fdl with relative path + main_fdl = src_dir / "main.fdl" + main_fdl.write_text(''' + package src; + import "../common/types.fdl"; + + message User [id=101] { + CommonType data = 1; + } + ''') + + # Resolve imports + schema = resolve_imports(main_fdl) + + assert len(schema.messages) == 2 + type_names = {m.name for m in schema.messages} + assert "CommonType" in type_names + assert "User" in type_names + + +class TestValidationWithImports: + """Tests for schema validation with imports.""" + + def test_valid_type_reference_from_import(self): + """Test that types from imports can be referenced.""" + with tempfile.TemporaryDirectory() as tmpdir: + tmpdir = Path(tmpdir) + + # Create common.fdl + common_fdl = tmpdir / "common.fdl" + common_fdl.write_text(''' + package common; + + message Address [id=100] { + string street = 1; + } + ''') + + # Create main.fdl that uses Address + main_fdl = tmpdir / "main.fdl" + main_fdl.write_text(''' + package main; + import "common.fdl"; + + message User [id=101] { + string name = 1; + Address address = 2; + } + 
''') + + schema = resolve_imports(main_fdl) + errors = schema.validate() + + # Should have no errors - Address is imported + assert len(errors) == 0 + + def test_invalid_type_reference_without_import(self): + """Test that missing types are detected.""" + source = ''' + package main; + + message User [id=100] { + string name = 1; + Address address = 2; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + errors = schema.validate() + + # Should have error - Address is not defined + assert len(errors) == 1 + assert "Unknown type 'Address'" in errors[0] + + +if __name__ == "__main__": + pytest.main([__file__, "-v"]) diff --git a/compiler/tests/test_nested_types.py b/compiler/tests/test_nested_types.py new file mode 100644 index 0000000000..667848439b --- /dev/null +++ b/compiler/tests/test_nested_types.py @@ -0,0 +1,341 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +"""Tests for FDL nested type support.""" + +import pytest + +from fory_compiler.parser.lexer import Lexer +from fory_compiler.parser.parser import Parser +from fory_compiler.parser.ast import NamedType, ListType + + +class TestNestedMessageParsing: + """Tests for parsing nested messages.""" + + def test_simple_nested_message(self): + """Test parsing a simple nested message.""" + source = ''' + message SearchResponse { + message Result { + string url = 1; + string title = 2; + } + repeated Result results = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + assert len(schema.messages) == 1 + outer = schema.messages[0] + assert outer.name == "SearchResponse" + assert len(outer.nested_messages) == 1 + inner = outer.nested_messages[0] + assert inner.name == "Result" + assert len(inner.fields) == 2 + + def test_nested_enum(self): + """Test parsing a nested enum.""" + source = ''' + message Outer { + enum Status { + UNKNOWN = 0; + ACTIVE = 1; + INACTIVE = 2; + } + Status status = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + assert len(schema.messages) == 1 + outer = schema.messages[0] + assert len(outer.nested_enums) == 1 + nested_enum = outer.nested_enums[0] + assert nested_enum.name == "Status" + assert len(nested_enum.values) == 3 + + def test_deeply_nested_message(self): + """Test parsing deeply nested messages.""" + source = ''' + message Outer { + message Middle { + message Inner { + string value = 1; + } + Inner inner = 1; + } + Middle middle = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + outer = schema.messages[0] + assert outer.name == "Outer" + middle = outer.nested_messages[0] + assert middle.name == "Middle" + inner = middle.nested_messages[0] + assert inner.name == "Inner" + + 
def test_mixed_nested_types(self): + """Test parsing messages with both nested messages and enums.""" + source = ''' + message Container { + enum Type { + TYPE_UNKNOWN = 0; + TYPE_A = 1; + TYPE_B = 2; + } + message Item { + string name = 1; + Type type = 2; + } + repeated Item items = 1; + Type default_type = 2; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + container = schema.messages[0] + assert len(container.nested_enums) == 1 + assert len(container.nested_messages) == 1 + assert container.nested_enums[0].name == "Type" + assert container.nested_messages[0].name == "Item" + + +class TestQualifiedTypeNames: + """Tests for qualified type names (Parent.Child).""" + + def test_qualified_type_in_field(self): + """Test using qualified type names in field definitions.""" + source = ''' + message SearchResponse { + message Result { + string url = 1; + } + } + message SearchRequest { + SearchResponse.Result cached_result = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + request = schema.messages[1] + assert request.name == "SearchRequest" + field = request.fields[0] + assert isinstance(field.field_type, NamedType) + assert field.field_type.name == "SearchResponse.Result" + + def test_qualified_type_in_list(self): + """Test using qualified type names in list fields.""" + source = ''' + message Outer { + message Inner { + string value = 1; + } + } + message Container { + repeated Outer.Inner items = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + container = schema.messages[1] + field = container.fields[0] + assert isinstance(field.field_type, ListType) + assert isinstance(field.field_type.element_type, NamedType) + assert field.field_type.element_type.name == "Outer.Inner" + + +class TestNestedTypeValidation: + """Tests for validation of nested types.""" + + def test_valid_nested_type_reference(self): + """Test that references to nested types are valid.""" + source = ''' + message SearchResponse { + message Result { + string url = 1; + } + repeated Result results = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + errors = schema.validate() + + assert len(errors) == 0 + + def test_valid_qualified_type_reference(self): + """Test that qualified type references are valid.""" + source = ''' + message SearchResponse { + message Result { + string url = 1; + } + } + message Collector { + SearchResponse.Result best_result = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + errors = schema.validate() + + assert len(errors) == 0 + + def test_duplicate_nested_type_names(self): + """Test that duplicate nested type names are detected.""" + source = ''' + message Container { + message Inner { + string a = 1; + } + message Inner { + string b = 1; + } + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + errors = schema.validate() + + assert len(errors) == 1 + assert "Duplicate nested type name" in errors[0] + + def test_duplicate_type_ids_in_nested(self): + """Test that duplicate type IDs in nested types are detected.""" + source = ''' + message Outer [id=100] { + message Inner [id=100] { + string value = 1; + } + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + errors = schema.validate() + + assert len(errors) == 1 + assert "Duplicate type ID @100" in errors[0] 
+ + def test_unknown_nested_type(self): + """Test that references to unknown nested types are detected.""" + source = ''' + message Container { + NonExistent.Type field = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + errors = schema.validate() + + assert len(errors) == 1 + assert "Unknown type" in errors[0] + + +class TestSchemaTypeLookup: + """Tests for Schema.get_type with nested types.""" + + def test_get_nested_type_by_qualified_name(self): + """Test looking up nested types by qualified name.""" + source = ''' + message SearchResponse { + message Result { + string url = 1; + } + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + # Should find by qualified name + result_type = schema.get_type("SearchResponse.Result") + assert result_type is not None + assert result_type.name == "Result" + + # Should find top-level type by simple name + response_type = schema.get_type("SearchResponse") + assert response_type is not None + assert response_type.name == "SearchResponse" + + def test_get_deeply_nested_type(self): + """Test looking up deeply nested types.""" + source = ''' + message A { + message B { + message C { + string value = 1; + } + } + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + c_type = schema.get_type("A.B.C") + assert c_type is not None + assert c_type.name == "C" + + def test_get_all_types_includes_nested(self): + """Test that get_all_types includes nested types.""" + source = ''' + message Outer { + enum Status { + UNKNOWN = 0; + } + message Inner { + message Deep { + string value = 1; + } + } + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + all_types = schema.get_all_types() + type_names = [t.name for t in all_types] + + assert "Outer" in type_names + assert "Status" in type_names + assert "Inner" in type_names + assert "Deep" in type_names + + +if __name__ == "__main__": + pytest.main([__file__, "-v"]) diff --git a/compiler/tests/test_package_options.py b/compiler/tests/test_package_options.py new file mode 100644 index 0000000000..fe51ab0566 --- /dev/null +++ b/compiler/tests/test_package_options.py @@ -0,0 +1,1374 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. 
+ +"""Tests for FDL package options and qualified type names.""" + +import pytest +from pathlib import Path + +from fory_compiler.parser.lexer import Lexer +from fory_compiler.parser.parser import Parser +from fory_compiler.generators.java import JavaGenerator +from fory_compiler.generators.go import GoGenerator +from fory_compiler.generators.base import GeneratorOptions + + +class TestDottedPackageName: + """Tests for dotted package name parsing.""" + + def test_simple_package(self): + """Test parsing a simple package name.""" + source = ''' + package foo; + message Bar { + string name = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + assert schema.package == "foo" + + def test_dotted_package(self): + """Test parsing a dotted package name.""" + source = ''' + package foo.bar; + message Baz { + string name = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + assert schema.package == "foo.bar" + + def test_deeply_dotted_package(self): + """Test parsing a deeply nested package name.""" + source = ''' + package com.example.payment.v1; + message Payment { + string id = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + assert schema.package == "com.example.payment.v1" + + +class TestUnknownOptionWarning: + """Tests for unknown option warnings.""" + + def test_unknown_option_warns(self): + """Test that unknown options produce a warning.""" + source = ''' + package myapp; + option unknown_option = "value"; + message User { + string name = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + + import warnings + with warnings.catch_warnings(record=True) as w: + warnings.simplefilter("always") + schema = parser.parse() + + # Should have one warning + assert len(w) == 1 + assert "ignoring unknown option 'unknown_option'" in str(w[0].message) + + # Option should still be stored + assert schema.get_option("unknown_option") == "value" + + def test_known_option_no_warning(self): + """Test that known options don't produce warnings.""" + source = ''' + package myapp; + option java_package = "com.example"; + option go_package = "github.com/example"; + message User { + string name = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + + import warnings + with warnings.catch_warnings(record=True) as w: + warnings.simplefilter("always") + schema = parser.parse() + + # Should have no warnings + assert len(w) == 0 + + def test_multiple_unknown_options_warn(self): + """Test that multiple unknown options each produce a warning.""" + source = ''' + package myapp; + option foo = "bar"; + option baz = 123; + message User { + string name = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + + import warnings + with warnings.catch_warnings(record=True) as w: + warnings.simplefilter("always") + schema = parser.parse() + + # Should have two warnings + assert len(w) == 2 + assert "foo" in str(w[0].message) + assert "baz" in str(w[1].message) + + +class TestFileOptions: + """Tests for file-level option parsing.""" + + def test_java_package_option(self): + """Test parsing java_package option.""" + source = ''' + package payment; + option java_package = "com.mycorp.payment.v1"; + message Payment { + string id = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + assert schema.package == "payment" + assert schema.get_option("java_package") == 
"com.mycorp.payment.v1" + + def test_go_package_option(self): + """Test parsing go_package option.""" + source = ''' + package payment; + option go_package = "github.com/mycorp/apis/gen/payment/v1;paymentv1"; + message Payment { + string id = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + assert schema.package == "payment" + assert schema.get_option("go_package") == "github.com/mycorp/apis/gen/payment/v1;paymentv1" + + def test_multiple_options(self): + """Test parsing multiple file-level options.""" + source = ''' + package payment; + option java_package = "com.mycorp.payment.v1"; + option go_package = "github.com/mycorp/payment/v1;paymentv1"; + message Payment { + string id = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + assert schema.get_option("java_package") == "com.mycorp.payment.v1" + assert schema.get_option("go_package") == "github.com/mycorp/payment/v1;paymentv1" + + def test_option_with_boolean_value(self): + """Test parsing option with boolean value.""" + source = ''' + package test; + option deprecated = true; + message Foo { + string name = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + assert schema.get_option("deprecated") is True + + def test_option_with_integer_value(self): + """Test parsing option with integer value.""" + source = ''' + package test; + option version = 1; + message Foo { + string name = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + assert schema.get_option("version") == 1 + + +class TestQualifiedTypeNames: + """Tests for package-qualified type references.""" + + def test_qualified_type_in_field(self): + """Test using qualified type names in field definitions.""" + source = ''' + package myapp; + message SearchResponse { + message Result { + string url = 1; + } + } + message SearchRequest { + SearchResponse.Result cached_result = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + errors = schema.validate() + + assert len(errors) == 0 + request = schema.messages[1] + assert request.fields[0].field_type.name == "SearchResponse.Result" + + +class TestJavaPackageGeneration: + """Tests for Java package generation with java_package option.""" + + def test_java_package_option_used(self): + """Test that java_package option is used in generated Java code.""" + source = ''' + package payment; + option java_package = "com.mycorp.payment.v1"; + message Payment { + string id = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + options = GeneratorOptions(output_dir=Path("/tmp")) + generator = JavaGenerator(schema, options) + + files = generator.generate() + + # Check that files are in the java_package path + payment_file = next(f for f in files if "Payment.java" in f.path) + assert "com/mycorp/payment/v1/Payment.java" == payment_file.path + assert "package com.mycorp.payment.v1;" in payment_file.content + + def test_java_package_fallback_to_fdl_package(self): + """Test fallback to FDL package when java_package is not specified.""" + source = ''' + package com.example.models; + message User { + string name = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + options = GeneratorOptions(output_dir=Path("/tmp")) + generator = JavaGenerator(schema, options) + + files = generator.generate() + + 
user_file = next(f for f in files if "User.java" in f.path) + assert "com/example/models/User.java" == user_file.path + assert "package com.example.models;" in user_file.content + + +class TestGoPackageGeneration: + """Tests for Go package generation with go_package option.""" + + def test_go_package_with_semicolon(self): + """Test go_package option with explicit package name.""" + source = ''' + package payment; + option go_package = "github.com/mycorp/apis/gen/payment/v1;paymentv1"; + message Payment { + string id = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + options = GeneratorOptions(output_dir=Path("/tmp")) + generator = GoGenerator(schema, options) + + import_path, package_name = generator.get_go_package_info() + assert import_path == "github.com/mycorp/apis/gen/payment/v1" + assert package_name == "paymentv1" + + files = generator.generate() + go_file = files[0] + assert "package paymentv1" in go_file.content + + def test_go_package_without_semicolon(self): + """Test go_package option without explicit package name.""" + source = ''' + package payment; + option go_package = "github.com/mycorp/apis/payment/v1"; + message Payment { + string id = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + options = GeneratorOptions(output_dir=Path("/tmp")) + generator = GoGenerator(schema, options) + + import_path, package_name = generator.get_go_package_info() + assert import_path == "github.com/mycorp/apis/payment/v1" + assert package_name == "v1" + + def test_go_package_fallback_to_fdl_package(self): + """Test fallback to FDL package when go_package is not specified.""" + source = ''' + package com.example.models; + message User { + string name = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + options = GeneratorOptions(output_dir=Path("/tmp")) + generator = GoGenerator(schema, options) + + import_path, package_name = generator.get_go_package_info() + assert import_path is None + assert package_name == "models" + + +class TestNamespaceConsistency: + """Tests for namespace consistency across languages.""" + + def test_java_uses_fdl_package_for_namespace(self): + """Test that Java uses FDL package for type namespace, not java_package.""" + source = ''' + package myapp.models; + option java_package = "com.mycorp.generated.models"; + message User { + string name = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + options = GeneratorOptions(output_dir=Path("/tmp")) + generator = JavaGenerator(schema, options) + + files = generator.generate() + registration_file = next(f for f in files if "Registration" in f.path) + + # Should use myapp.models (FDL package) for namespace, not com.mycorp.generated.models + assert '"myapp.models"' in registration_file.content + + def test_go_uses_fdl_package_for_namespace(self): + """Test that Go uses FDL package for type namespace, not go_package.""" + source = ''' + package myapp.models; + option go_package = "github.com/mycorp/generated;genmodels"; + message User { + string name = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + options = GeneratorOptions(output_dir=Path("/tmp")) + generator = GoGenerator(schema, options) + + files = generator.generate() + go_file = files[0] + + # Should use myapp.models (FDL package) for namespace + assert '"myapp.models.User"' in go_file.content + + +class 
TestJavaOuterClassname: + """Tests for java_outer_classname option.""" + + def test_outer_classname_generates_single_file(self): + """Test that java_outer_classname generates all types in a single file.""" + source = ''' + package myapp; + option java_outer_classname = "DescriptorProtos"; + + enum Status { + UNKNOWN = 0; + ACTIVE = 1; + } + + message User { + string name = 1; + Status status = 2; + } + + message Order { + string id = 1; + User customer = 2; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + options = GeneratorOptions(output_dir=Path("/tmp")) + generator = JavaGenerator(schema, options) + + files = generator.generate() + + # Should generate only 2 files: outer class and registration + assert len(files) == 2 + + # Find the outer class file + outer_file = next(f for f in files if "DescriptorProtos.java" in f.path) + assert outer_file is not None + assert "myapp/DescriptorProtos.java" == outer_file.path + + # Check content + assert "public final class DescriptorProtos" in outer_file.content + assert "private DescriptorProtos()" in outer_file.content + assert "public static enum Status" in outer_file.content + assert "public static class User" in outer_file.content + assert "public static class Order" in outer_file.content + + def test_outer_classname_registration_uses_prefix(self): + """Test that registration uses outer class as prefix.""" + source = ''' + package myapp; + option java_outer_classname = "DescriptorProtos"; + + message User { + string name = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + options = GeneratorOptions(output_dir=Path("/tmp")) + generator = JavaGenerator(schema, options) + + files = generator.generate() + + registration_file = next(f for f in files if "Registration" in f.path) + + # Should reference types with outer class prefix + assert "DescriptorProtos.User.class" in registration_file.content + + def test_outer_classname_with_nested_types(self): + """Test java_outer_classname with nested types.""" + source = ''' + package myapp; + option java_outer_classname = "Protos"; + + message Container { + message Inner { + string value = 1; + } + Inner item = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + options = GeneratorOptions(output_dir=Path("/tmp")) + generator = JavaGenerator(schema, options) + + files = generator.generate() + + outer_file = next(f for f in files if "Protos.java" in f.path) + + # Should have nested types inside the outer class + assert "public static class Container" in outer_file.content + assert "public static class Inner" in outer_file.content + + def test_outer_classname_with_java_package(self): + """Test java_outer_classname combined with java_package.""" + source = ''' + package myapp; + option java_package = "com.example.proto"; + option java_outer_classname = "MyProtos"; + + message User { + string name = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + options = GeneratorOptions(output_dir=Path("/tmp")) + generator = JavaGenerator(schema, options) + + files = generator.generate() + + outer_file = next(f for f in files if "MyProtos.java" in f.path) + + # Should use java_package for file path and package declaration + assert "com/example/proto/MyProtos.java" == outer_file.path + assert "package com.example.proto;" in outer_file.content + assert "public final class MyProtos" in outer_file.content + + def 
test_without_outer_classname_generates_separate_files(self): + """Test that without java_outer_classname, separate files are generated.""" + source = ''' + package myapp; + + enum Status { + UNKNOWN = 0; + } + + message User { + string name = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + options = GeneratorOptions(output_dir=Path("/tmp")) + generator = JavaGenerator(schema, options) + + files = generator.generate() + + # Should generate separate files: Status.java, User.java, Registration.java + file_names = [f.path.split("/")[-1] for f in files] + assert "Status.java" in file_names + assert "User.java" in file_names + assert "MyappForyRegistration.java" in file_names + + +class TestJavaMultipleFiles: + """Tests for java_multiple_files option.""" + + def test_multiple_files_true_generates_separate_files(self): + """Test that java_multiple_files = true generates separate files.""" + source = ''' + package myapp; + option java_multiple_files = true; + + enum Status { + UNKNOWN = 0; + ACTIVE = 1; + } + + message User { + string name = 1; + } + + message Order { + string id = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + options = GeneratorOptions(output_dir=Path("/tmp")) + generator = JavaGenerator(schema, options) + + files = generator.generate() + + # Should generate separate files for each type + file_names = [f.path.split("/")[-1] for f in files] + assert "Status.java" in file_names + assert "User.java" in file_names + assert "Order.java" in file_names + assert "MyappForyRegistration.java" in file_names + + def test_multiple_files_false_with_outer_class_generates_single_file(self): + """Test that java_multiple_files = false with outer class generates single file.""" + source = ''' + package myapp; + option java_outer_classname = "MyProtos"; + option java_multiple_files = false; + + enum Status { + UNKNOWN = 0; + } + + message User { + string name = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + options = GeneratorOptions(output_dir=Path("/tmp")) + generator = JavaGenerator(schema, options) + + files = generator.generate() + + # Should generate only 2 files: outer class and registration + assert len(files) == 2 + + file_names = [f.path.split("/")[-1] for f in files] + assert "MyProtos.java" in file_names + assert "MyappForyRegistration.java" in file_names + + def test_multiple_files_true_overrides_outer_classname(self): + """Test that java_multiple_files = true overrides java_outer_classname.""" + source = ''' + package myapp; + option java_outer_classname = "MyProtos"; + option java_multiple_files = true; + + enum Status { + UNKNOWN = 0; + } + + message User { + string name = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + options = GeneratorOptions(output_dir=Path("/tmp")) + generator = JavaGenerator(schema, options) + + files = generator.generate() + + # Should generate separate files even though outer_classname is set + file_names = [f.path.split("/")[-1] for f in files] + assert "Status.java" in file_names + assert "User.java" in file_names + # MyProtos.java should NOT be generated + assert "MyProtos.java" not in file_names + + def test_multiple_files_default_is_false(self): + """Test that java_multiple_files defaults to false when outer_classname is set.""" + source = ''' + package myapp; + option java_outer_classname = "MyProtos"; + + message User { + string 
name = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + options = GeneratorOptions(output_dir=Path("/tmp")) + generator = JavaGenerator(schema, options) + + files = generator.generate() + + # Should generate single outer class file (default behavior) + file_names = [f.path.split("/")[-1] for f in files] + assert "MyProtos.java" in file_names + assert "User.java" not in file_names + + def test_multiple_files_with_java_package(self): + """Test java_multiple_files combined with java_package.""" + source = ''' + package myapp; + option java_package = "com.example.generated"; + option java_multiple_files = true; + + message User { + string name = 1; + } + + message Order { + string id = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + options = GeneratorOptions(output_dir=Path("/tmp")) + generator = JavaGenerator(schema, options) + + files = generator.generate() + + # Should use java_package for paths + user_file = next(f for f in files if "User.java" in f.path) + assert "com/example/generated/User.java" == user_file.path + assert "package com.example.generated;" in user_file.content + + +class TestTypeOptions: + """Tests for type-level option parsing [id=100, deprecated=true].""" + + def test_message_with_id_option(self): + """Test parsing a message with id option.""" + source = ''' + package myapp; + message User [id=100] { + string name = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + assert len(schema.messages) == 1 + user = schema.messages[0] + assert user.name == "User" + assert user.type_id == 100 + + def test_enum_with_id_option(self): + """Test parsing an enum with id option.""" + source = ''' + package myapp; + enum Status [id=200] { + UNKNOWN = 0; + ACTIVE = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + assert len(schema.enums) == 1 + status = schema.enums[0] + assert status.name == "Status" + assert status.type_id == 200 + + def test_type_with_multiple_options(self): + """Test parsing a type with multiple options.""" + source = ''' + package myapp; + message User [id=100, deprecated=true] { + string name = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + assert len(schema.messages) == 1 + user = schema.messages[0] + assert user.type_id == 100 + + def test_type_without_options(self): + """Test parsing a type without options (namespace-based).""" + source = ''' + package myapp; + message Config { + string key = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + assert len(schema.messages) == 1 + config = schema.messages[0] + assert config.type_id is None + + def test_unknown_type_option_warns(self): + """Test that unknown type options produce a warning.""" + source = ''' + package myapp; + message User [id=100, unknown_opt=true] { + string name = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + + import warnings + with warnings.catch_warnings(record=True) as w: + warnings.simplefilter("always") + schema = parser.parse() + + # Should have one warning + assert len(w) == 1 + assert "ignoring unknown type option 'unknown_opt'" in str(w[0].message) + assert "type 'User'" in str(w[0].message) + + # id should still be parsed + assert schema.messages[0].type_id == 100 + + def test_known_type_options_no_warning(self): + """Test that known type 
options don't produce warnings.""" + source = ''' + package myapp; + message User [id=100, deprecated=true] { + string name = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + + import warnings + with warnings.catch_warnings(record=True) as w: + warnings.simplefilter("always") + schema = parser.parse() + + # Should have no warnings + assert len(w) == 0 + + def test_nested_type_with_id(self): + """Test parsing nested types with id options.""" + source = ''' + package myapp; + message Outer [id=100] { + message Inner [id=101] { + string value = 1; + } + Inner item = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + outer = schema.messages[0] + assert outer.type_id == 100 + inner = outer.nested_messages[0] + assert inner.type_id == 101 + + +class TestFieldOptions: + """Tests for field-level option parsing.""" + + def test_field_with_deprecated_option(self): + """Test parsing a field with deprecated option.""" + source = ''' + package myapp; + message User { + string name = 1; + int32 old_field = 2 [deprecated = true]; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + assert len(schema.messages) == 1 + user = schema.messages[0] + assert len(user.fields) == 2 + assert user.fields[0].name == "name" + assert user.fields[1].name == "old_field" + + def test_field_with_json_name_option(self): + """Test parsing a field with json_name option.""" + source = ''' + package myapp; + message User { + string first_name = 1 [json_name = "firstName"]; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + assert len(schema.messages) == 1 + user = schema.messages[0] + assert len(user.fields) == 1 + assert user.fields[0].name == "first_name" + + def test_field_with_multiple_options(self): + """Test parsing a field with multiple options.""" + source = ''' + package myapp; + message User { + string old_name = 1 [deprecated = true, json_name = "oldName"]; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + assert len(schema.messages) == 1 + user = schema.messages[0] + assert len(user.fields) == 1 + assert user.fields[0].name == "old_name" + + def test_field_with_integer_option_value(self): + """Test parsing a field with integer option value.""" + source = ''' + package myapp; + message User { + int32 version = 1 [packed = 1]; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + assert len(schema.messages) == 1 + user = schema.messages[0] + assert len(user.fields) == 1 + + def test_field_with_false_option_value(self): + """Test parsing a field with false option value.""" + source = ''' + package myapp; + message User { + string name = 1 [deprecated = false]; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + assert len(schema.messages) == 1 + user = schema.messages[0] + assert len(user.fields) == 1 + + def test_unknown_field_option_warns(self): + """Test that unknown field options produce a warning.""" + source = ''' + package myapp; + message User { + string name = 1 [unknown_option = true]; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + + import warnings + with warnings.catch_warnings(record=True) as w: + warnings.simplefilter("always") + schema = parser.parse() + + # Should have one warning + assert len(w) == 1 + assert "ignoring unknown field option 'unknown_option'" in 
str(w[0].message) + assert "field 'name'" in str(w[0].message) + + def test_known_field_option_no_warning(self): + """Test that known field options don't produce warnings.""" + source = ''' + package myapp; + message User { + string name = 1 [deprecated = true]; + string email = 2 [json_name = "emailAddress"]; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + + import warnings + with warnings.catch_warnings(record=True) as w: + warnings.simplefilter("always") + schema = parser.parse() + + # Should have no warnings + assert len(w) == 0 + + def test_multiple_unknown_field_options_warn(self): + """Test that multiple unknown field options each produce a warning.""" + source = ''' + package myapp; + message User { + string name = 1 [foo = "bar", baz = 123]; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + + import warnings + with warnings.catch_warnings(record=True) as w: + warnings.simplefilter("always") + schema = parser.parse() + + # Should have two warnings + assert len(w) == 2 + assert "foo" in str(w[0].message) + assert "baz" in str(w[1].message) + + def test_field_options_on_repeated_field(self): + """Test parsing field options on a repeated field.""" + source = ''' + package myapp; + message User { + repeated string tags = 1 [packed = true]; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + assert len(schema.messages) == 1 + user = schema.messages[0] + assert len(user.fields) == 1 + assert user.fields[0].name == "tags" + + def test_field_options_on_optional_field(self): + """Test parsing field options on an optional field.""" + source = ''' + package myapp; + message User { + optional string nickname = 1 [deprecated = true]; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + assert len(schema.messages) == 1 + user = schema.messages[0] + assert len(user.fields) == 1 + assert user.fields[0].name == "nickname" + assert user.fields[0].optional is True + + def test_field_options_on_map_field(self): + """Test parsing field options on a map field.""" + source = ''' + package myapp; + message User { + map scores = 1 [deprecated = true]; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + assert len(schema.messages) == 1 + user = schema.messages[0] + assert len(user.fields) == 1 + assert user.fields[0].name == "scores" + + +class TestForyExtensionOptions: + """Tests for Fory extension option syntax: option (fory).key = value.""" + + def test_file_level_fory_option(self): + """Test parsing file-level Fory extension option.""" + source = ''' + package myapp; + option (fory).use_record_for_java_message = true; + message User { + string name = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + assert schema.get_option("fory.use_record_for_java_message") is True + + def test_multiple_file_level_fory_options(self): + """Test parsing multiple file-level Fory extension options.""" + source = ''' + package myapp; + option (fory).use_record_for_java_message = true; + option (fory).polymorphism = false; + message User { + string name = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + assert schema.get_option("fory.use_record_for_java_message") is True + assert schema.get_option("fory.polymorphism") is False + + def test_message_level_fory_option(self): + """Test parsing message-level Fory extension option.""" + 
source = ''' + package myapp; + message User { + option (fory).id = 100; + option (fory).evolving = false; + string name = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + user = schema.messages[0] + assert user.type_id == 100 # fory.id should set type_id + assert user.options.get("fory.id") == 100 + assert user.options.get("fory.evolving") is False + + def test_message_fory_id_sets_type_id(self): + """Test that option (fory).id sets message type_id.""" + source = ''' + package myapp; + message User { + option (fory).id = 200; + string name = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + user = schema.messages[0] + assert user.type_id == 200 + + def test_enum_level_fory_option(self): + """Test parsing enum-level Fory extension option.""" + source = ''' + package myapp; + enum Status { + option (fory).id = 300; + UNKNOWN = 0; + ACTIVE = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + status = schema.enums[0] + assert status.type_id == 300 + assert status.options.get("fory.id") == 300 + + def test_field_level_fory_option(self): + """Test parsing field-level Fory extension option.""" + source = ''' + package myapp; + message User { + MyType friend = 1 [(fory).ref = true]; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + user = schema.messages[0] + field = user.fields[0] + assert field.ref is True # fory.ref should set ref flag + assert field.options.get("fory.ref") is True + + def test_field_fory_nullable_sets_optional(self): + """Test that (fory).nullable sets optional flag.""" + source = ''' + package myapp; + message User { + string nickname = 1 [(fory).nullable = true]; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + user = schema.messages[0] + field = user.fields[0] + assert field.optional is True + assert field.options.get("fory.nullable") is True + + def test_field_multiple_fory_options(self): + """Test parsing multiple Fory extension options on a field.""" + source = ''' + package myapp; + message User { + MyType friend = 1 [(fory).ref = true, (fory).nullable = true]; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + user = schema.messages[0] + field = user.fields[0] + assert field.ref is True + assert field.optional is True + assert field.options.get("fory.ref") is True + assert field.options.get("fory.nullable") is True + + def test_mixed_standard_and_fory_options(self): + """Test mixing standard and Fory extension options.""" + source = ''' + package myapp; + option java_package = "com.example"; + option (fory).use_record_for_java_message = true; + message User { + string name = 1 [deprecated = true, (fory).nullable = true]; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + assert schema.get_option("java_package") == "com.example" + assert schema.get_option("fory.use_record_for_java_message") is True + + user = schema.messages[0] + field = user.fields[0] + assert field.optional is True + assert field.options.get("deprecated") is True + assert field.options.get("fory.nullable") is True + + def test_unknown_fory_file_option_warns(self): + """Test that unknown Fory file options produce a warning.""" + source = ''' + package myapp; + option (fory).unknown_option = true; + message User { + string name = 1; + } + ''' + lexer 
= Lexer(source) + parser = Parser(lexer.tokenize()) + + import warnings + with warnings.catch_warnings(record=True) as w: + warnings.simplefilter("always") + schema = parser.parse() + + # Should have one warning + assert len(w) == 1 + assert "ignoring unknown fory option 'unknown_option'" in str(w[0].message) + + # Option should still be stored + assert schema.get_option("fory.unknown_option") is True + + def test_unknown_fory_message_option_warns(self): + """Test that unknown Fory message options produce a warning.""" + source = ''' + package myapp; + message User { + option (fory).unknown_opt = true; + string name = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + + import warnings + with warnings.catch_warnings(record=True) as w: + warnings.simplefilter("always") + schema = parser.parse() + + # Should have one warning + assert len(w) == 1 + assert "ignoring unknown fory message option 'unknown_opt'" in str(w[0].message) + + def test_unknown_fory_field_option_warns(self): + """Test that unknown Fory field options produce a warning.""" + source = ''' + package myapp; + message User { + string name = 1 [(fory).unknown_opt = true]; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + + import warnings + with warnings.catch_warnings(record=True) as w: + warnings.simplefilter("always") + schema = parser.parse() + + # Should have one warning + assert len(w) == 1 + assert "ignoring unknown fory field option 'unknown_opt'" in str(w[0].message) + + def test_unknown_extension_warns(self): + """Test that unknown extension names produce a warning.""" + source = ''' + package myapp; + option (custom).my_option = true; + message User { + string name = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + + import warnings + with warnings.catch_warnings(record=True) as w: + warnings.simplefilter("always") + schema = parser.parse() + + # Should have one warning + assert len(w) == 1 + assert "ignoring unknown extension 'custom'" in str(w[0].message) + + def test_inline_and_body_options_merge(self): + """Test that inline [id=100] and body option (fory).evolving merge.""" + source = ''' + package myapp; + message User [id=100] { + option (fory).evolving = false; + string name = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + user = schema.messages[0] + assert user.type_id == 100 # From inline option + assert user.options.get("id") == 100 # Stored in options + assert user.options.get("fory.evolving") is False # From body option + + def test_body_option_overrides_inline_id(self): + """Test that body option (fory).id overrides inline [id=...].""" + source = ''' + package myapp; + message User [id=100] { + option (fory).id = 200; + string name = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + user = schema.messages[0] + # Body option should take precedence, but since inline sets type_id first, + # and we only set type_id from body if it was None, inline wins + assert user.type_id == 100 + # Both should be in options + assert user.options.get("id") == 100 + assert user.options.get("fory.id") == 200 + + def test_message_use_record_for_java_option(self): + """Test message-level (fory).use_record_for_java option.""" + source = ''' + package myapp; + message User { + option (fory).use_record_for_java = true; + string name = 1; + } + ''' + lexer = Lexer(source) + parser = Parser(lexer.tokenize()) + schema = parser.parse() + + user = 
schema.messages[0] + assert user.options.get("fory.use_record_for_java") is True + + +if __name__ == "__main__": + pytest.main([__file__, "-v"]) diff --git a/docs/README.md b/docs/README.md index 4f80300eb9..6227c4916a 100644 --- a/docs/README.md +++ b/docs/README.md @@ -6,6 +6,17 @@ - For Scala Guide, see [scala guide](guide/scala_guide.md) doc. - For using Apache Fory™ with GraalVM native image, see [graalvm native image guide](guide/graalvm_guide.md) doc. +## FDL Schema (Fory Definition Language) + +Define cross-language data structures with FDL and generate native code for multiple languages. + +- [FDL Overview](schema/index.md) - Introduction and quick start +- [FDL Syntax Reference](schema/fdl-syntax.md) - Complete language syntax +- [Type System](schema/type-system.md) - Primitive types, collections, and mappings +- [Compiler Guide](schema/compiler-guide.md) - CLI usage and build integration +- [Generated Code](schema/generated-code.md) - Output format for each language +- [Protocol Buffers vs FDL](schema/proto-vs-fdl.md) - Feature comparison and migration + ## Serialization Format - For Cross Language Serialization Format, see [xlang serialization spec](specification/xlang_serialization_spec.md) doc. diff --git a/docs/schema/compiler-guide.md b/docs/schema/compiler-guide.md new file mode 100644 index 0000000000..ea1b4154f5 --- /dev/null +++ b/docs/schema/compiler-guide.md @@ -0,0 +1,612 @@ +--- +title: FDL Compiler Guide +sidebar_position: 4 +id: fdl_compiler_guide +license: | + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--- + +# FDL Compiler Guide + +This guide covers installation, usage, and integration of the FDL compiler. + +## Installation + +### From Source + +```bash +cd compiler +pip install -e . +``` + +### Verify Installation + +```bash +fory compile --help +``` + +## Command Line Interface + +### Basic Usage + +```bash +fory compile [OPTIONS] FILES... 
+``` + +### Options + +| Option | Description | Default | +| ------------------------------------- | ----------------------------------------------------- | ------------- | +| `--lang` | Comma-separated target languages | `all` | +| `--output`, `-o` | Output directory | `./generated` | +| `--package` | Override package name from FDL file | (from file) | +| `-I`, `--proto_path`, `--import_path` | Add directory to import search path (can be repeated) | (none) | +| `--java_out=DST_DIR` | Generate Java code in DST_DIR | (none) | +| `--python_out=DST_DIR` | Generate Python code in DST_DIR | (none) | +| `--cpp_out=DST_DIR` | Generate C++ code in DST_DIR | (none) | +| `--go_out=DST_DIR` | Generate Go code in DST_DIR | (none) | +| `--rust_out=DST_DIR` | Generate Rust code in DST_DIR | (none) | + +### Examples + +**Compile for all languages:** + +```bash +fory compile schema.fdl +``` + +**Compile for specific languages:** + +```bash +fory compile schema.fdl --lang java,python +``` + +**Specify output directory:** + +```bash +fory compile schema.fdl --output ./src/generated +``` + +**Override package name:** + +```bash +fory compile schema.fdl --package com.myapp.models +``` + +**Compile multiple files:** + +```bash +fory compile user.fdl order.fdl product.fdl --output ./generated +``` + +**Use import search paths:** + +```bash +# Add a single import path +fory compile src/main.fdl -I libs/common + +# Add multiple import paths (repeated option) +fory compile src/main.fdl -I libs/common -I libs/types + +# Add multiple import paths (comma-separated) +fory compile src/main.fdl -I libs/common,libs/types,third_party/ + +# Using --proto_path (protoc-compatible alias) +fory compile src/main.fdl --proto_path=libs/common + +# Mix all styles +fory compile src/main.fdl -I libs/common,libs/types --proto_path third_party/ +``` + +**Language-specific output directories (protoc-style):** + +```bash +# Generate only Java code to a specific directory +fory compile schema.fdl --java_out=./src/main/java + +# Generate multiple languages to different directories +fory compile schema.fdl --java_out=./java/gen --python_out=./python/src --go_out=./go/gen + +# Combine with import paths +fory compile schema.fdl --java_out=./gen/java -I proto/ -I common/ +``` + +When using `--{lang}_out` options: +- Only the specified languages are generated (not all languages) +- Files are placed directly in the specified directory (not in a `{lang}/` subdirectory) +- This is compatible with protoc-style workflows + +## Import Path Resolution + +When compiling FDL files with imports, the compiler searches for imported files in this order: + +1. **Relative to the importing file (default)** - The directory containing the file with the import statement is always searched first, automatically. No `-I` flag needed for same-directory imports. +2. **Each `-I` path in order** - Additional search paths specified on the command line + +**Same-directory imports work automatically:** + +```fdl +// main.fdl +import "common.fdl"; // Found if common.fdl is in the same directory +``` + +```bash +# No -I needed for same-directory imports +fory compile main.fdl +``` + +**Example project structure:** + +``` +project/ +├── src/ +│ └── main.fdl # import "common.fdl"; +└── libs/ + └── common.fdl +``` + +**Without `-I` (fails):** + +```bash +$ fory compile src/main.fdl +Import error: Import not found: common.fdl + Searched in: /project/src +``` + +**With `-I` (succeeds):** + +```bash +$ fory compile src/main.fdl -I libs/ +Compiling src/main.fdl... 
+ + Resolved 1 import(s) +``` + +## Supported Languages + +| Language | Flag | Output Extension | Description | +| -------- | -------- | ---------------- | --------------------------- | +| Java | `java` | `.java` | POJOs with Fory annotations | +| Python | `python` | `.py` | Dataclasses with type hints | +| Go | `go` | `.go` | Structs with struct tags | +| Rust | `rust` | `.rs` | Structs with derive macros | +| C++ | `cpp` | `.h` | Structs with FORY macros | + +## Output Structure + +### Java + +``` +generated/ +└── java/ + └── com/ + └── example/ + ├── User.java + ├── Order.java + ├── Status.java + └── ExampleForyRegistration.java +``` + +- One file per type (enum or message) +- Package structure matches FDL package +- Registration helper class generated + +### Python + +``` +generated/ +└── python/ + └── example.py +``` + +- Single module with all types +- Module name derived from package +- Registration function included + +### Go + +``` +generated/ +└── go/ + └── example.go +``` + +- Single file with all types +- Package name from last component of FDL package +- Registration function included + +### Rust + +``` +generated/ +└── rust/ + └── example.rs +``` + +- Single module with all types +- Module name derived from package +- Registration function included + +### C++ + +``` +generated/ +└── cpp/ + └── example.h +``` + +- Single header file +- Namespace matches package (dots to `::`) +- Header guards and forward declarations + +## Build Integration + +### Maven (Java) + +Add to your `pom.xml`: + +```xml +<build> + <plugins> + <plugin> + <groupId>org.codehaus.mojo</groupId> + <artifactId>exec-maven-plugin</artifactId> + <version>3.1.0</version> + <executions> + <execution> + <id>generate-fory-types</id> + <phase>generate-sources</phase> + <goals> + <goal>exec</goal> + </goals> + <configuration> + <executable>fory</executable> + <arguments> + <argument>compile</argument> + <argument>${project.basedir}/src/main/fdl/schema.fdl</argument> + <argument>--lang</argument> + <argument>java</argument> + <argument>--output</argument> + <argument>${project.build.directory}/generated-sources/fdl</argument> + </arguments> + </configuration> + </execution> + </executions> + </plugin> + </plugins> +</build> +``` + +Add generated sources: + +```xml +<build> + <plugins> + <plugin> + <groupId>org.codehaus.mojo</groupId> + <artifactId>build-helper-maven-plugin</artifactId> + <version>3.4.0</version> + <executions> + <execution> + <phase>generate-sources</phase> + <goals> + <goal>add-source</goal> + </goals> + <configuration> + <sources> + <source>${project.build.directory}/generated-sources/fdl</source> + </sources> + </configuration> + </execution> + </executions> + </plugin> + </plugins> +</build> +``` + +### Gradle (Java/Kotlin) + +Add to `build.gradle`: + +```groovy +task generateForyTypes(type: Exec) { + commandLine 'fory', 'compile', + "${projectDir}/src/main/fdl/schema.fdl", + '--lang', 'java', + '--output', "${buildDir}/generated/sources/fdl" +} + +compileJava.dependsOn generateForyTypes + +sourceSets { + main { + java { + srcDir "${buildDir}/generated/sources/fdl/java" + } + } +} +``` + +### Python (setuptools) + +Add to `setup.py` or `pyproject.toml`: + +```python +# setup.py +from setuptools import setup +from setuptools.command.build_py import build_py +import subprocess + +class BuildWithFdl(build_py): + def run(self): + subprocess.run([ + 'fory', 'compile', + 'schema.fdl', + '--lang', 'python', + '--output', 'src/generated' + ], check=True) + super().run() + +setup( + cmdclass={'build_py': BuildWithFdl}, + # ... +) +``` + +### Go (go generate) + +Add to your Go file: + +```go +//go:generate fory compile ../schema.fdl --lang go --output . +package models +``` + +Run: + +```bash +go generate ./... 
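+# go generate scans the package for //go:generate directives and runs each listed command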
+``` + +### Rust (build.rs) + +Add to `build.rs`: + +```rust +use std::process::Command; + +fn main() { + println!("cargo:rerun-if-changed=schema.fdl"); + + let status = Command::new("fory") + .args(&["compile", "schema.fdl", "--lang", "rust", "--output", "src/generated"]) + .status() + .expect("Failed to run fory compiler"); + + if !status.success() { + panic!("FDL compilation failed"); + } +} +``` + +### CMake (C++) + +Add to `CMakeLists.txt`: + +```cmake +find_program(FORY_COMPILER fory) + +add_custom_command( + OUTPUT ${CMAKE_CURRENT_SOURCE_DIR}/generated/example.h + COMMAND ${FORY_COMPILER} compile + ${CMAKE_CURRENT_SOURCE_DIR}/schema.fdl + --lang cpp + --output ${CMAKE_CURRENT_SOURCE_DIR}/generated + DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/schema.fdl + COMMENT "Generating FDL types" +) + +add_custom_target(generate_fdl DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/generated/example.h) + +add_library(mylib ...) +add_dependencies(mylib generate_fdl) +target_include_directories(mylib PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}/generated) +``` + +### Bazel + +Create a rule in `BUILD`: + +```python +genrule( + name = "generate_fdl", + srcs = ["schema.fdl"], + outs = ["generated/example.h"], + cmd = "$(location //:fory_compiler) compile $(SRCS) --lang cpp --output $(RULEDIR)/generated", + tools = ["//:fory_compiler"], +) + +cc_library( + name = "models", + hdrs = [":generate_fdl"], + # ... +) +``` + +## Error Handling + +### Syntax Errors + +``` +Error: Line 5, Column 12: Expected ';' after field declaration +``` + +Fix: Check the indicated line for missing semicolons or syntax issues. + +### Duplicate Type Names + +``` +Error: Duplicate type name: User +``` + +Fix: Ensure each enum and message has a unique name within the file. + +### Duplicate Type IDs + +``` +Error: Duplicate type ID 100: User and Order +``` + +Fix: Assign unique type IDs to each type. + +### Unknown Type References + +``` +Error: Unknown type 'Address' in Customer.address +``` + +Fix: Define the referenced type before using it, or check for typos. + +### Duplicate Field Numbers + +``` +Error: Duplicate field number 1 in User: name and id +``` + +Fix: Assign unique field numbers within each message. + +## Best Practices + +### Project Structure + +``` +project/ +├── fdl/ +│ ├── common.fdl # Shared types +│ ├── user.fdl # User domain +│ └── order.fdl # Order domain +├── src/ +│ └── generated/ # Generated code (git-ignored) +└── build.gradle +``` + +### Version Control + +- **Track**: FDL schema files +- **Ignore**: Generated code (can be regenerated) + +Add to `.gitignore`: + +``` +# Generated FDL code +src/generated/ +generated/ +``` + +### CI/CD Integration + +Always regenerate during builds: + +```yaml +# GitHub Actions example +steps: + - name: Install FDL Compiler + run: pip install ./compiler + + - name: Generate Types + run: fory compile fdl/*.fdl --output src/generated + + - name: Build + run: ./gradlew build +``` + +### Schema Evolution + +When modifying schemas: + +1. **Never reuse field numbers** - Mark as reserved instead +2. **Never change type IDs** - They're part of the binary format +3. **Add new fields** - Use new field numbers +4. 
**Use `optional`** - For backward compatibility + +```fdl +message User [id=100] { + string id = 1; + string name = 2; + // Field 3 was removed, don't reuse + optional string email = 4; // New field +} +``` + +## Troubleshooting + +### Command Not Found + +``` +fory: command not found +``` + +**Solution:** Ensure the compiler is installed and in your PATH: + +```bash +pip install -e ./compiler +# Or add to PATH +export PATH=$PATH:~/.local/bin +``` + +### Permission Denied + +``` +Permission denied: ./generated +``` + +**Solution:** Ensure write permissions on the output directory: + +```bash +chmod -R u+w ./generated +``` + +### Import Errors in Generated Code + +**Java:** Ensure Fory dependency is in your project: + +```xml +<dependency> + <groupId>org.apache.fory</groupId> + <artifactId>fory-core</artifactId> + <version>0.14.1</version> +</dependency> +``` + +**Python:** Ensure pyfory is installed: + +```bash +pip install pyfory +``` + +**Go:** Ensure fory module is available: + +```bash +go get github.com/apache/fory/go/fory +``` + +**Rust:** Ensure fory crate is in `Cargo.toml`: + +```toml +[dependencies] +fory = "0.13" +``` + +**C++:** Ensure Fory headers are in include path. diff --git a/docs/schema/fdl-syntax.md b/docs/schema/fdl-syntax.md new file mode 100644 index 0000000000..d3aced08bc --- /dev/null +++ b/docs/schema/fdl-syntax.md @@ -0,0 +1,1169 @@ +--- +title: FDL Syntax Reference +sidebar_position: 2 +id: fdl_syntax +license: | + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--- + +# FDL Syntax Reference + +This document provides a complete reference for the Fory Definition Language (FDL) syntax. + +## File Structure + +An FDL file consists of: + +1. Optional package declaration +2. Optional import statements +3. Type definitions (enums and messages) + +```fdl +// Optional package declaration +package com.example.models; + +// Import statements +import "common/types.fdl"; + +// Type definitions +enum Color [id=100] { ... } +message User [id=101] { ... } +message Order [id=102] { ... } +``` + +## Comments + +FDL supports both single-line and block comments: + +```fdl +// This is a single-line comment + +/* + * This is a block comment + * that spans multiple lines + */ + +message Example { + string name = 1; // Inline comment +} +``` + +## Package Declaration + +The package declaration defines the namespace for all types in the file. 
+ +```fdl +package com.example.models; +``` + +**Rules:** + +- Optional but recommended +- Must appear before any type definitions +- Only one package declaration per file +- Used for namespace-based type registration + +**Language Mapping:** + +| Language | Package Usage | +| -------- | --------------------------------- | +| Java | Java package | +| Python | Module name (dots to underscores) | +| Go | Package name (last component) | +| Rust | Module name (dots to underscores) | +| C++ | Namespace (dots to `::`) | + +## File-Level Options + +Options can be specified at file level to control language-specific code generation. + +### Syntax + +```fdl +option option_name = value; +``` + +### Java Package Option + +Override the Java package for generated code: + +```fdl +package payment; +option java_package = "com.mycorp.payment.v1"; + +message Payment { + string id = 1; +} +``` + +**Effect:** + +- Generated Java files will be in `com/mycorp/payment/v1/` directory +- Java package declaration will be `package com.mycorp.payment.v1;` +- Type registration still uses the FDL package (`payment`) for cross-language compatibility + +### Go Package Option + +Specify the Go import path and package name: + +```fdl +package payment; +option go_package = "github.com/mycorp/apis/gen/payment/v1;paymentv1"; + +message Payment { + string id = 1; +} +``` + +**Format:** `"import/path;package_name"` or just `"import/path"` (last segment used as package name) + +**Effect:** + +- Generated Go files will have `package paymentv1` +- The import path can be used in other Go code +- Type registration still uses the FDL package (`payment`) for cross-language compatibility + +### Java Outer Classname Option + +Generate all types as inner classes of a single outer wrapper class: + +```fdl +package payment; +option java_outer_classname = "DescriptorProtos"; + +enum Status { + UNKNOWN = 0; + ACTIVE = 1; +} + +message Payment { + string id = 1; + Status status = 2; +} +``` + +**Effect:** + +- Generates a single file `DescriptorProtos.java` instead of separate files +- All enums and messages become `public static` inner classes +- The outer class is `public final` with a private constructor +- Useful for grouping related types together + +**Generated structure:** + +```java +public final class DescriptorProtos { + private DescriptorProtos() {} + + public static enum Status { + UNKNOWN, + ACTIVE; + } + + public static class Payment { + private String id; + private Status status; + // ... + } +} +``` + +**Combined with java_package:** + +```fdl +package payment; +option java_package = "com.example.proto"; +option java_outer_classname = "PaymentProtos"; + +message Payment { + string id = 1; +} +``` + +This generates `com/example/proto/PaymentProtos.java` with all types as inner classes. 
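+ +A rough sketch of that combined output (illustrative only; the actual generated code follows the structure shown above and may include additional accessors and serialization helpers): + +```java +// com/example/proto/PaymentProtos.java +package com.example.proto; + +public final class PaymentProtos { + private PaymentProtos() {} + + public static class Payment { + private String id; + + public String getId() { return id; } + + public void setId(String id) { this.id = id; } + } +} +```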
+ +### Java Multiple Files Option + +Control whether types are generated in separate files or as inner classes: + +```fdl +package payment; +option java_outer_classname = "PaymentProtos"; +option java_multiple_files = true; + +message Payment { + string id = 1; +} + +message Receipt { + string id = 1; +} +``` + +**Behavior:** + +| `java_outer_classname` | `java_multiple_files` | Result | +| ---------------------- | --------------------- | ------------------------------------------- | +| Not set | Any | Separate files (one per type) | +| Set | `false` (default) | Single file with all types as inner classes | +| Set | `true` | Separate files (overrides outer class) | + +**Effect of `java_multiple_files = true`:** + +- Each top-level enum and message gets its own `.java` file +- Overrides `java_outer_classname` behavior +- Useful when you want separate files but still specify an outer class name for other purposes + +**Example without java_multiple_files (default):** + +```fdl +option java_outer_classname = "PaymentProtos"; +// Generates: PaymentProtos.java containing Payment and Receipt as inner classes +``` + +**Example with java_multiple_files = true:** + +```fdl +option java_outer_classname = "PaymentProtos"; +option java_multiple_files = true; +// Generates: Payment.java, Receipt.java (separate files) +``` + +### Multiple Options + +Multiple options can be specified: + +```fdl +package payment; +option java_package = "com.mycorp.payment.v1"; +option go_package = "github.com/mycorp/apis/gen/payment/v1;paymentv1"; +option deprecated = true; + +message Payment { + string id = 1; +} +``` + +### Fory Extension Options + +FDL supports protobuf-style extension options for Fory-specific configuration: + +```fdl +option (fory).use_record_for_java_message = true; +option (fory).polymorphism = true; +``` + +**Available File Options:** + +| Option | Type | Description | +| ----------------------------- | ---- | ---------------------------------------- | +| `use_record_for_java_message` | bool | Generate Java records instead of classes | +| `polymorphism` | bool | Enable polymorphism for all types | + +See the [Fory Extension Options](#fory-extension-options) section for complete documentation of message, enum, and field options. + +### Option Priority + +For language-specific packages: + +1. Command-line package override (highest priority) +2. Language-specific option (`java_package`, `go_package`) +3. FDL package declaration (fallback) + +**Example:** + +```fdl +package myapp.models; +option java_package = "com.example.generated"; +``` + +| Scenario | Java Package Used | +| ------------------------- | ------------------------- | +| No override | `com.example.generated` | +| CLI: `--package=override` | `override` | +| No java_package option | `myapp.models` (fallback) | + +### Cross-Language Type Registration + +Language-specific options only affect where code is generated, not the type namespace used for serialization. This ensures cross-language compatibility: + +```fdl +package myapp.models; +option java_package = "com.mycorp.generated"; +option go_package = "github.com/mycorp/gen;genmodels"; + +message User { + string name = 1; +} +``` + +All languages will register `User` with namespace `myapp.models`, enabling: + +- Java serialized data → Go deserialization +- Go serialized data → Java deserialization +- Any language combination works seamlessly + +## Import Statement + +Import statements allow you to use types defined in other FDL files. 
+ +### Basic Syntax + +```fdl +import "path/to/file.fdl"; +``` + +### Multiple Imports + +```fdl +import "common/types.fdl"; +import "common/enums.fdl"; +import "models/address.fdl"; +``` + +### Path Resolution + +Import paths are resolved relative to the importing file: + +``` +project/ +├── common/ +│ └── types.fdl +├── models/ +│ ├── user.fdl # import "../common/types.fdl" +│ └── order.fdl # import "../common/types.fdl" +└── main.fdl # import "common/types.fdl" +``` + +**Rules:** + +- Import paths are quoted strings (double or single quotes) +- Paths are resolved relative to the importing file's directory +- Imported types become available as if defined in the current file +- Circular imports are detected and reported as errors +- Transitive imports work (if A imports B and B imports C, A has access to C's types) + +### Complete Example + +**common/types.fdl:** + +```fdl +package common; + +enum Status [id=100] { + PENDING = 0; + ACTIVE = 1; + COMPLETED = 2; +} + +message Address [id=101] { + string street = 1; + string city = 2; + string country = 3; +} +``` + +**models/user.fdl:** + +```fdl +package models; +import "../common/types.fdl"; + +message User [id=200] { + string id = 1; + string name = 2; + Address home_address = 3; // Uses imported type + Status status = 4; // Uses imported enum +} +``` + +### Unsupported Import Syntax + +The following protobuf import modifiers are **not supported**: + +```fdl +// NOT SUPPORTED - will produce an error +import public "other.fdl"; +import weak "other.fdl"; +``` + +**`import public`**: FDL uses a simpler import model. All imported types are available to the importing file only. Re-exporting is not supported. Import each file directly where needed. + +**`import weak`**: FDL requires all imports to be present at compile time. Optional dependencies are not supported. + +### Import Errors + +The compiler reports errors for: + +- **File not found**: The imported file doesn't exist +- **Circular import**: A imports B which imports A (directly or indirectly) +- **Parse errors**: Syntax errors in imported files +- **Unsupported syntax**: `import public` or `import weak` + +## Enum Definition + +Enums define a set of named integer constants. + +### Basic Syntax + +```fdl +enum Status { + PENDING = 0; + ACTIVE = 1; + COMPLETED = 2; +} +``` + +### With Type ID + +```fdl +enum Status [id=100] { + PENDING = 0; + ACTIVE = 1; + COMPLETED = 2; +} +``` + +### Reserved Values + +Reserve field numbers or names to prevent reuse: + +```fdl +enum Status { + reserved 2, 15, 9 to 11, 40 to max; // Reserved numbers + reserved "OLD_STATUS", "DEPRECATED"; // Reserved names + PENDING = 0; + ACTIVE = 1; + COMPLETED = 3; +} +``` + +### Enum Options + +Options can be specified within enums: + +```fdl +enum Status { + option deprecated = true; // Allowed + PENDING = 0; + ACTIVE = 1; +} +``` + +**Forbidden Options:** + +- `option allow_alias = true` is **not supported**. Each enum value must have a unique integer. 
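
For orientation, the Java output for the `Status` enum above is a plain scoped enum. This sketch mirrors the output shown in the [Generated Code](generated-code.md) reference; the `[id=100]` option is applied at registration time rather than in the enum source:

```java
package demo;

// Values keep their FDL declaration order; the explicit integer assignments
// are not repeated here, and the type ID is supplied during registration,
// e.g. fory.register(Status.class, 100).
public enum Status {
  PENDING,
  ACTIVE,
  COMPLETED;
}
```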
+ +### Enum Prefix Stripping + +When enum values use a protobuf-style prefix (enum name in UPPER_SNAKE_CASE), the compiler automatically strips the prefix for languages with scoped enums: + +```fdl +// Input with prefix +enum DeviceTier { + DEVICE_TIER_UNKNOWN = 0; + DEVICE_TIER_TIER1 = 1; + DEVICE_TIER_TIER2 = 2; +} +``` + +**Generated code:** + +| Language | Output | Style | +| -------- | ----------------------------------------- | -------------- | +| Java | `UNKNOWN, TIER1, TIER2` | Scoped enum | +| Rust | `Unknown, Tier1, Tier2` | Scoped enum | +| C++ | `UNKNOWN, TIER1, TIER2` | Scoped enum | +| Python | `UNKNOWN, TIER1, TIER2` | Scoped IntEnum | +| Go | `DeviceTierUnknown, DeviceTierTier1, ...` | Unscoped const | + +**Note:** The prefix is only stripped if the remainder is a valid identifier. For example, `DEVICE_TIER_1` is kept unchanged because `1` is not a valid identifier name. + +**Grammar:** + +``` +enum_def := 'enum' IDENTIFIER [type_options] '{' enum_body '}' +type_options := '[' type_option (',' type_option)* ']' +type_option := IDENTIFIER '=' option_value +enum_body := (option_stmt | reserved_stmt | enum_value)* +option_stmt := 'option' IDENTIFIER '=' option_value ';' +reserved_stmt := 'reserved' reserved_items ';' +enum_value := IDENTIFIER '=' INTEGER ';' +``` + +**Rules:** + +- Enum names must be unique within the file +- Enum values must have explicit integer assignments +- Value integers must be unique within the enum (no aliases) +- Type ID (`[id=100]`) is optional but recommended for cross-language use + +**Example with All Features:** + +```fdl +// HTTP status code categories +enum HttpCategory [id=200] { + reserved 10 to 20; // Reserved for future use + reserved "UNKNOWN"; // Reserved name + INFORMATIONAL = 1; + SUCCESS = 2; + REDIRECTION = 3; + CLIENT_ERROR = 4; + SERVER_ERROR = 5; +} +``` + +## Message Definition + +Messages define structured data types with typed fields. + +### Basic Syntax + +```fdl +message Person { + string name = 1; + int32 age = 2; +} +``` + +### With Type ID + +```fdl +message Person [id=101] { + string name = 1; + int32 age = 2; +} +``` + +### Reserved Fields + +Reserve field numbers or names to prevent reuse after removing fields: + +```fdl +message User { + reserved 2, 15, 9 to 11; // Reserved field numbers + reserved "old_field", "temp"; // Reserved field names + string id = 1; + string name = 3; +} +``` + +### Message Options + +Options can be specified within messages: + +```fdl +message User { + option deprecated = true; + string id = 1; + string name = 2; +} +``` + +**Grammar:** + +``` +message_def := 'message' IDENTIFIER [type_options] '{' message_body '}' +type_options := '[' type_option (',' type_option)* ']' +type_option := IDENTIFIER '=' option_value +message_body := (option_stmt | reserved_stmt | nested_type | field_def)* +nested_type := enum_def | message_def +``` + +## Nested Types + +Messages can contain nested message and enum definitions. This is useful for defining types that are closely related to their parent message. 
+ +### Nested Messages + +```fdl +message SearchResponse { + message Result { + string url = 1; + string title = 2; + repeated string snippets = 3; + } + repeated Result results = 1; +} +``` + +### Nested Enums + +```fdl +message Container { + enum Status { + STATUS_UNKNOWN = 0; + STATUS_ACTIVE = 1; + STATUS_INACTIVE = 2; + } + Status status = 1; +} +``` + +### Qualified Type Names + +Nested types can be referenced from other messages using qualified names (Parent.Child): + +```fdl +message SearchResponse { + message Result { + string url = 1; + string title = 2; + } +} + +message SearchResultCache { + // Reference nested type with qualified name + SearchResponse.Result cached_result = 1; + repeated SearchResponse.Result all_results = 2; +} +``` + +### Deeply Nested Types + +Nesting can be multiple levels deep: + +```fdl +message Outer { + message Middle { + message Inner { + string value = 1; + } + Inner inner = 1; + } + Middle middle = 1; +} + +message OtherMessage { + // Reference deeply nested type + Outer.Middle.Inner deep_ref = 1; +} +``` + +### Language-Specific Generation + +| Language | Nested Type Generation | +| -------- | ------------------------------------------------------ | +| Java | Static inner classes (`SearchResponse.Result`) | +| Python | Nested classes within dataclass | +| Go | Flat structs with underscore (`SearchResponse_Result`) | +| Rust | Flat structs with underscore (`SearchResponse_Result`) | +| C++ | Flat structs with underscore (`SearchResponse_Result`) | + +**Note:** For Go, Rust, and C++, nested types are flattened to top-level types with qualified names using underscores because these languages don't have true nested type support or it's not idiomatic. + +### Nested Type Rules + +- Nested type names must be unique within their parent message +- Nested types can have their own type IDs +- Type IDs must be globally unique (including nested types) +- Within a message, you can reference nested types by simple name +- From outside, use the qualified name (Parent.Child) + +## Field Definition + +Fields define the properties of a message. 
+ +### Basic Syntax + +```fdl +field_type field_name = field_number; +``` + +### With Modifiers + +```fdl +optional ref repeated field_type field_name = field_number; +``` + +**Grammar:** + +``` +field_def := [modifiers] field_type IDENTIFIER '=' INTEGER ';' +modifiers := ['optional'] ['ref'] ['repeated'] +field_type := primitive_type | named_type | map_type +``` + +### Field Modifiers + +#### `optional` + +Marks the field as nullable: + +```fdl +message User { + string name = 1; // Required, non-null + optional string email = 2; // Nullable +} +``` + +**Generated Code:** + +| Language | Non-optional | Optional | +| -------- | ------------------ | ----------------------------------------------- | +| Java | `String name` | `String email` with `@ForyField(nullable=true)` | +| Python | `name: str` | `name: Optional[str]` | +| Go | `Name string` | `Name *string` | +| Rust | `name: String` | `name: Option` | +| C++ | `std::string name` | `std::optional name` | + +#### `ref` + +Enables reference tracking for shared/circular references: + +```fdl +message Node { + string value = 1; + ref Node parent = 2; // Can point to shared object + repeated ref Node children = 3; +} +``` + +**Use Cases:** + +- Shared objects (same object referenced multiple times) +- Circular references (object graphs with cycles) +- Tree structures with parent pointers + +**Generated Code:** + +| Language | Without `ref` | With `ref` | +| -------- | -------------- | ------------------------------------------------- | +| Java | `Node parent` | `Node parent` with `@ForyField(trackingRef=true)` | +| Python | `parent: Node` | `parent: Node` (runtime tracking) | +| Go | `Parent Node` | `Parent *Node` with `fory:"trackRef"` | +| Rust | `parent: Node` | `parent: Rc` | +| C++ | `Node parent` | `std::shared_ptr parent` | + +#### `repeated` + +Marks the field as a list/array: + +```fdl +message Document { + repeated string tags = 1; + repeated User authors = 2; +} +``` + +**Generated Code:** + +| Language | Type | +| -------- | -------------------------- | +| Java | `List` | +| Python | `List[str]` | +| Go | `[]string` | +| Rust | `Vec` | +| C++ | `std::vector` | + +### Combining Modifiers + +Modifiers can be combined: + +```fdl +message Example { + optional repeated string tags = 1; // Nullable list + repeated ref Node nodes = 2; // List of tracked references + optional ref User owner = 3; // Nullable tracked reference +} +``` + +**Order:** `optional` must come before `ref`, which must come before `repeated`. + +## Type System + +### Primitive Types + +| Type | Description | Size | +| ----------- | --------------------------- | -------- | +| `bool` | Boolean value | 1 byte | +| `int8` | Signed 8-bit integer | 1 byte | +| `int16` | Signed 16-bit integer | 2 bytes | +| `int32` | Signed 32-bit integer | 4 bytes | +| `int64` | Signed 64-bit integer | 8 bytes | +| `float32` | 32-bit floating point | 4 bytes | +| `float64` | 64-bit floating point | 8 bytes | +| `string` | UTF-8 string | Variable | +| `bytes` | Binary data | Variable | +| `date` | Calendar date | Variable | +| `timestamp` | Date and time with timezone | Variable | + +See [Type System](type-system.md) for complete type mappings. + +### Named Types + +Reference other messages or enums by name: + +```fdl +enum Status { ... } +message User { ... 
} + +message Order { + User customer = 1; // Reference to User message + Status status = 2; // Reference to Status enum +} +``` + +### Map Types + +Maps with typed keys and values: + +```fdl +message Config { + map properties = 1; + map counts = 2; + map users = 3; +} +``` + +**Syntax:** `map` + +**Restrictions:** + +- Key type should be a primitive type (typically `string` or integer types) +- Value type can be any type including messages + +## Field Numbers + +Each field must have a unique positive integer identifier: + +```fdl +message Example { + string first = 1; + string second = 2; + string third = 3; +} +``` + +**Rules:** + +- Must be unique within a message +- Must be positive integers +- Used for field ordering and identification +- Gaps in numbering are allowed (useful for deprecating fields) + +**Best Practices:** + +- Use sequential numbers starting from 1 +- Reserve number ranges for different categories +- Never reuse numbers for different fields (even after deletion) + +## Type IDs + +Type IDs enable efficient cross-language serialization: + +```fdl +enum Color [id=100] { ... } +message User [id=101] { ... } +message Order [id=102] { ... } +``` + +### With Type ID (Recommended) + +```fdl +message User [id=101] { ... } +message User [id=101, deprecated=true] { ... } // Multiple options +``` + +- Serialized as compact integer +- Fast lookup during deserialization +- Must be globally unique across all types +- Recommended for production use + +### Without Type ID + +```fdl +message Config { ... } +``` + +- Registered using namespace + name +- More flexible for development +- Slightly larger serialized size +- Uses package as namespace: `"package.Config"` + +### ID Assignment Strategy + +```fdl +// Enums: 100-199 +enum Status [id=100] { ... } +enum Priority [id=101] { ... } + +// User domain: 200-299 +message User [id=200] { ... } +message UserProfile [id=201] { ... } + +// Order domain: 300-399 +message Order [id=300] { ... } +message OrderItem [id=301] { ... 
} +``` + +## Complete Example + +```fdl +// E-commerce domain model +package com.shop.models; + +// Enums with type IDs +enum OrderStatus [id=100] { + PENDING = 0; + CONFIRMED = 1; + SHIPPED = 2; + DELIVERED = 3; + CANCELLED = 4; +} + +enum PaymentMethod [id=101] { + CREDIT_CARD = 0; + DEBIT_CARD = 1; + PAYPAL = 2; + BANK_TRANSFER = 3; +} + +// Messages with type IDs +message Address [id=200] { + string street = 1; + string city = 2; + string state = 3; + string country = 4; + string postal_code = 5; +} + +message Customer [id=201] { + string id = 1; + string name = 2; + optional string email = 3; + optional string phone = 4; + optional Address billing_address = 5; + optional Address shipping_address = 6; +} + +message Product [id=202] { + string sku = 1; + string name = 2; + string description = 3; + float64 price = 4; + int32 stock = 5; + repeated string categories = 6; + map attributes = 7; +} + +message OrderItem [id=203] { + ref Product product = 1; // Track reference to avoid duplication + int32 quantity = 2; + float64 unit_price = 3; +} + +message Order [id=204] { + string id = 1; + ref Customer customer = 2; + repeated OrderItem items = 3; + OrderStatus status = 4; + PaymentMethod payment_method = 5; + float64 total = 6; + optional string notes = 7; + timestamp created_at = 8; + optional timestamp shipped_at = 9; +} + +// Config without type ID (uses namespace registration) +message ShopConfig { + string store_name = 1; + string currency = 2; + float64 tax_rate = 3; + repeated string supported_countries = 4; +} +``` + +## Fory Extension Options + +FDL supports protobuf-style extension options for Fory-specific configuration. These use the `(fory)` prefix to indicate they are Fory extensions. + +### File-Level Fory Options + +```fdl +option (fory).use_record_for_java_message = true; +option (fory).polymorphism = true; +``` + +| Option | Type | Description | +| ----------------------------- | ---- | ---------------------------------------- | +| `use_record_for_java_message` | bool | Generate Java records instead of classes | +| `polymorphism` | bool | Enable polymorphism for all types | + +### Message-Level Fory Options + +Options can be specified inside the message body: + +```fdl +message MyMessage { + option (fory).id = 100; + option (fory).evolving = false; + option (fory).use_record_for_java = true; + string name = 1; +} +``` + +| Option | Type | Description | +| --------------------- | ------ | ------------------------------------------ | +| `id` | int | Type ID for serialization (sets type_id) | +| `evolving` | bool | Schema evolution support (default: true). When false, schema is fixed like a struct | +| `use_record_for_java` | bool | Generate Java record for this message | +| `deprecated` | bool | Mark this message as deprecated | +| `namespace` | string | Custom namespace for type registration | + +**Note:** `option (fory).id = 100` is equivalent to the inline syntax `message MyMessage [id=100]`. 
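
When `use_record_for_java` (or the file-level `use_record_for_java_message`) is enabled, the Java generator emits a record instead of a getter/setter class. The exact shape is generator-defined; a hedged sketch for the `MyMessage` example above:

```java
// Hypothetical output for `option (fory).use_record_for_java = true`:
// record components follow the FDL field order, and accessors use the
// record convention (name()) instead of getName().
public record MyMessage(String name) {}
```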
+ +### Enum-Level Fory Options + +```fdl +enum Status { + option (fory).id = 101; + option (fory).deprecated = true; + UNKNOWN = 0; + ACTIVE = 1; +} +``` + +| Option | Type | Description | +| ------------ | ---- | --------------------------------------- | +| `id` | int | Type ID for serialization (sets type_id)| +| `deprecated` | bool | Mark this enum as deprecated | + +### Field-Level Fory Options + +Field options are specified in brackets after the field number: + +```fdl +message Example { + MyType friend = 1 [(fory).ref = true]; + string nickname = 2 [(fory).nullable = true]; + MyType data = 3 [(fory).ref = true, (fory).nullable = true]; +} +``` + +| Option | Type | Description | +| ------------- | ------ | ------------------------------------------ | +| `ref` | bool | Enable reference tracking (sets ref flag) | +| `nullable` | bool | Mark field as nullable (sets optional flag)| +| `deprecated` | bool | Mark this field as deprecated | + +**Note:** `[(fory).ref = true]` is equivalent to using the `ref` modifier: `ref MyType friend = 1;` + +### Combining Standard and Fory Options + +You can combine standard options with Fory extension options: + +```fdl +message User { + option deprecated = true; // Standard option + option (fory).evolving = false; // Fory extension option + + string name = 1; + MyType data = 2 [deprecated = true, (fory).ref = true]; +} +``` + +### Fory Options Proto File + +For reference, the Fory options are defined in `extension/fory_options.proto`: + +```proto +// File-level options +extend google.protobuf.FileOptions { + optional ForyFileOptions fory = 50001; +} + +message ForyFileOptions { + optional bool use_record_for_java_message = 1; + optional bool polymorphism = 2; +} + +// Message-level options +extend google.protobuf.MessageOptions { + optional ForyMessageOptions fory = 50001; +} + +message ForyMessageOptions { + optional int32 id = 1; + optional bool evolving = 2; + optional bool use_record_for_java = 3; + optional bool deprecated = 4; + optional string namespace = 5; +} + +// Field-level options +extend google.protobuf.FieldOptions { + optional ForyFieldOptions fory = 50001; +} + +message ForyFieldOptions { + optional bool ref = 1; + optional bool nullable = 2; + optional bool deprecated = 3; +} +``` + +## Grammar Summary + +``` +file := [package_decl] file_option* import_decl* type_def* + +package_decl := 'package' package_name ';' +package_name := IDENTIFIER ('.' IDENTIFIER)* + +file_option := 'option' option_name '=' option_value ';' +option_name := IDENTIFIER | extension_name +extension_name := '(' IDENTIFIER ')' '.' 
IDENTIFIER // e.g., (fory).polymorphism + +import_decl := 'import' STRING ';' + +type_def := enum_def | message_def + +enum_def := 'enum' IDENTIFIER [type_options] '{' enum_body '}' +enum_body := (option_stmt | reserved_stmt | enum_value)* +enum_value := IDENTIFIER '=' INTEGER ';' + +message_def := 'message' IDENTIFIER [type_options] '{' message_body '}' +message_body := (option_stmt | reserved_stmt | nested_type | field_def)* +nested_type := enum_def | message_def +field_def := [modifiers] field_type IDENTIFIER '=' INTEGER [field_options] ';' + +option_stmt := 'option' option_name '=' option_value ';' +option_value := 'true' | 'false' | IDENTIFIER | INTEGER | STRING + +reserved_stmt := 'reserved' reserved_items ';' +reserved_items := reserved_item (',' reserved_item)* +reserved_item := INTEGER | INTEGER 'to' INTEGER | INTEGER 'to' 'max' | STRING + +modifiers := ['optional'] ['ref'] ['repeated'] + +field_type := primitive_type | named_type | map_type +primitive_type := 'bool' | 'int8' | 'int16' | 'int32' | 'int64' + | 'float32' | 'float64' | 'string' | 'bytes' + | 'date' | 'timestamp' +named_type := qualified_name +qualified_name := IDENTIFIER ('.' IDENTIFIER)* // e.g., Parent.Child +map_type := 'map' '<' field_type ',' field_type '>' + +type_options := '[' type_option (',' type_option)* ']' +type_option := IDENTIFIER '=' option_value // e.g., id=100, deprecated=true +field_options := '[' field_option (',' field_option)* ']' +field_option := option_name '=' option_value // e.g., deprecated=true, (fory).ref=true + +STRING := '"' [^"\n]* '"' | "'" [^'\n]* "'" +IDENTIFIER := [a-zA-Z_][a-zA-Z0-9_]* +INTEGER := '-'? [0-9]+ +``` diff --git a/docs/schema/generated-code.md b/docs/schema/generated-code.md new file mode 100644 index 0000000000..1a484b6505 --- /dev/null +++ b/docs/schema/generated-code.md @@ -0,0 +1,818 @@ +--- +title: Generated Code Reference +sidebar_position: 5 +id: fdl_generated_code +license: | + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--- + +# Generated Code Reference + +This document explains the code generated by the FDL compiler for each target language. + +## Example Schema + +The examples in this document use this FDL schema: + +```fdl +package demo; + +enum Status [id=100] { + PENDING = 0; + ACTIVE = 1; + COMPLETED = 2; +} + +message User [id=101] { + string id = 1; + string name = 2; + optional string email = 3; + int32 age = 4; +} + +message Order [id=102] { + string id = 1; + ref User customer = 2; + repeated string items = 3; + map quantities = 4; + Status status = 5; +} +``` + +## Enum Prefix Stripping + +When enum values use a protobuf-style prefix (enum name in UPPER_SNAKE_CASE), the compiler automatically strips the prefix for languages with scoped enums. This produces cleaner, more idiomatic code. 
+ +**Input FDL:** + +```fdl +enum DeviceTier { + DEVICE_TIER_UNKNOWN = 0; + DEVICE_TIER_TIER1 = 1; + DEVICE_TIER_TIER2 = 2; +} +``` + +**Generated output by language:** + +| Language | Generated Values | Notes | +| -------- | ----------------------------------------- | ------------------------- | +| Java | `UNKNOWN, TIER1, TIER2` | Scoped enum | +| Rust | `Unknown, Tier1, Tier2` | PascalCase variants | +| C++ | `UNKNOWN, TIER1, TIER2` | Scoped `enum class` | +| Python | `UNKNOWN, TIER1, TIER2` | Scoped `IntEnum` | +| Go | `DeviceTierUnknown, DeviceTierTier1, ...` | Unscoped, prefix re-added | + +**Note:** Go uses unscoped constants, so the enum name prefix is added back to avoid naming collisions. + +## Nested Types + +When using nested message and enum definitions, the generated code varies by language. + +**Input FDL:** + +```fdl +message SearchResponse { + message Result { + string url = 1; + string title = 2; + } + repeated Result results = 1; +} +``` + +### Java - Inner Classes + +```java +public class SearchResponse { + public static class Result { + private String url; + private String title; + // getters, setters... + } + + private List results; + // getters, setters... +} +``` + +### Python - Nested Classes + +```python +@dataclass +class SearchResponse: + @dataclass + class Result: + url: str = "" + title: str = "" + + results: List[Result] = None +``` + +### Go - Flattened with Underscore + +```go +type SearchResponse_Result struct { + Url string + Title string +} + +type SearchResponse struct { + Results []SearchResponse_Result +} +``` + +### Rust - Flattened with Underscore + +```rust +#[derive(ForyObject)] +pub struct SearchResponse_Result { + pub url: String, + pub title: String, +} + +#[derive(ForyObject)] +pub struct SearchResponse { + pub results: Vec, +} +``` + +### C++ - Flattened with Underscore + +```cpp +struct SearchResponse_Result { + std::string url; + std::string title; +}; +FORY_STRUCT(SearchResponse_Result, url, title); + +struct SearchResponse { + std::vector results; +}; +FORY_STRUCT(SearchResponse, results); +``` + +**Summary:** + +| Language | Approach | Syntax Example | +| -------- | ------------------------- | ----------------------- | +| Java | Static inner classes | `SearchResponse.Result` | +| Python | Nested dataclasses | `SearchResponse.Result` | +| Go | Flattened with underscore | `SearchResponse_Result` | +| Rust | Flattened with underscore | `SearchResponse_Result` | +| C++ | Flattened with underscore | `SearchResponse_Result` | + +## Java + +### Enum Generation + +```java +package demo; + +public enum Status { + PENDING, + ACTIVE, + COMPLETED; +} +``` + +### Message Generation + +```java +package demo; + +import java.util.List; +import java.util.Map; +import org.apache.fory.annotation.ForyField; + +public class User { + private String id; + private String name; + + @ForyField(nullable = true) + private String email; + + private int age; + + public User() { + } + + public String getId() { + return id; + } + + public void setId(String id) { + this.id = id; + } + + public String getName() { + return name; + } + + public void setName(String name) { + this.name = name; + } + + public String getEmail() { + return email; + } + + public void setEmail(String email) { + this.email = email; + } + + public int getAge() { + return age; + } + + public void setAge(int age) { + this.age = age; + } +} +``` + +```java +package demo; + +import java.util.List; +import java.util.Map; +import org.apache.fory.annotation.ForyField; + +public class Order { + private 
String id; + + @ForyField(trackingRef = true) + private User customer; + + private List items; + private Map quantities; + private Status status; + + public Order() { + } + + // Getters and setters... +} +``` + +### Registration Helper + +```java +package demo; + +import org.apache.fory.Fory; + +public class DemoForyRegistration { + + public static void register(Fory fory) { + fory.register(Status.class, 100); + fory.register(User.class, 101); + fory.register(Order.class, 102); + } +} +``` + +### Usage + +```java +import demo.*; +import org.apache.fory.Fory; +import org.apache.fory.config.Language; + +public class Example { + public static void main(String[] args) { + Fory fory = Fory.builder() + .withLanguage(Language.XLANG) + .withRefTracking(true) + .build(); + + DemoForyRegistration.register(fory); + + User user = new User(); + user.setId("u123"); + user.setName("Alice"); + user.setAge(30); + + Order order = new Order(); + order.setId("o456"); + order.setCustomer(user); + order.setStatus(Status.ACTIVE); + + byte[] bytes = fory.serialize(order); + Order restored = (Order) fory.deserialize(bytes); + } +} +``` + +## Python + +### Module Generation + +```python +# Licensed to the Apache Software Foundation (ASF)... + +from dataclasses import dataclass +from enum import IntEnum +from typing import Dict, List, Optional +import pyfory + + +class Status(IntEnum): + PENDING = 0 + ACTIVE = 1 + COMPLETED = 2 + + +@dataclass +class User: + id: str = "" + name: str = "" + email: Optional[str] = None + age: pyfory.Int32Type = 0 + + +@dataclass +class Order: + id: str = "" + customer: Optional[User] = None + items: List[str] = None + quantities: Dict[str, pyfory.Int32Type] = None + status: Status = None + + +def register_demo_types(fory: pyfory.Fory): + fory.register_type(Status, type_id=100) + fory.register_type(User, type_id=101) + fory.register_type(Order, type_id=102) +``` + +### Usage + +```python +import pyfory +from demo import User, Order, Status, register_demo_types + +fory = pyfory.Fory(ref_tracking=True) +register_demo_types(fory) + +user = User(id="u123", name="Alice", age=30) +order = Order( + id="o456", + customer=user, + items=["item1", "item2"], + quantities={"item1": 2, "item2": 1}, + status=Status.ACTIVE +) + +data = fory.serialize(order) +restored = fory.deserialize(data) +``` + +## Go + +### File Generation + +```go +// Licensed to the Apache Software Foundation (ASF)... 
+ +package demo + +import ( + fory "github.com/apache/fory/go/fory" +) + +type Status int32 + +const ( + StatusPending Status = 0 + StatusActive Status = 1 + StatusCompleted Status = 2 +) + +type User struct { + Id string + Name string + Email *string `fory:"nullable"` + Age int32 +} + +type Order struct { + Id string + Customer *User `fory:"trackRef"` + Items []string + Quantities map[string]int32 + Status Status +} + +func RegisterTypes(f *fory.Fory) error { + if err := f.RegisterEnum(Status(0), 100); err != nil { + return err + } + if err := f.Register(User{}, 101); err != nil { + return err + } + if err := f.Register(Order{}, 102); err != nil { + return err + } + return nil +} +``` + +### Usage + +```go +package main + +import ( + "demo" + fory "github.com/apache/fory/go/fory" +) + +func main() { + f := fory.NewFory(true) // Enable ref tracking + + if err := demo.RegisterTypes(f); err != nil { + panic(err) + } + + email := "alice@example.com" + user := &demo.User{ + Id: "u123", + Name: "Alice", + Email: &email, + Age: 30, + } + + order := &demo.Order{ + Id: "o456", + Customer: user, + Items: []string{"item1", "item2"}, + Quantities: map[string]int32{ + "item1": 2, + "item2": 1, + }, + Status: demo.StatusActive, + } + + bytes, err := f.Marshal(order) + if err != nil { + panic(err) + } + + var restored demo.Order + if err := f.Unmarshal(bytes, &restored); err != nil { + panic(err) + } +} +``` + +## Rust + +### Module Generation + +```rust +// Licensed to the Apache Software Foundation (ASF)... + +use fory::{Fory, ForyObject}; +use std::collections::HashMap; +use std::rc::Rc; + +#[derive(ForyObject, Debug, Clone, PartialEq, Default)] +#[repr(i32)] +pub enum Status { + #[default] + Pending = 0, + Active = 1, + Completed = 2, +} + +#[derive(ForyObject, Debug, Clone, PartialEq, Default)] +pub struct User { + pub id: String, + pub name: String, + #[fory(nullable = true)] + pub email: Option, + pub age: i32, +} + +#[derive(ForyObject, Debug, Clone, PartialEq, Default)] +pub struct Order { + pub id: String, + pub customer: Rc, + pub items: Vec, + pub quantities: HashMap, + pub status: Status, +} + +pub fn register_types(fory: &mut Fory) -> Result<(), fory::Error> { + fory.register::(100)?; + fory.register::(101)?; + fory.register::(102)?; + Ok(()) +} +``` + +### Usage + +```rust +use demo::{User, Order, Status, register_types}; +use fory::Fory; +use std::rc::Rc; +use std::collections::HashMap; + +fn main() -> Result<(), fory::Error> { + let mut fory = Fory::default(); + register_types(&mut fory)?; + + let user = Rc::new(User { + id: "u123".to_string(), + name: "Alice".to_string(), + email: Some("alice@example.com".to_string()), + age: 30, + }); + + let mut quantities = HashMap::new(); + quantities.insert("item1".to_string(), 2); + quantities.insert("item2".to_string(), 1); + + let order = Order { + id: "o456".to_string(), + customer: user, + items: vec!["item1".to_string(), "item2".to_string()], + quantities, + status: Status::Active, + }; + + let bytes = fory.serialize(&order)?; + let restored: Order = fory.deserialize(&bytes)?; + + Ok(()) +} +``` + +## C++ + +### Header Generation + +```cpp +/* + * Licensed to the Apache Software Foundation (ASF)... 
+ */ + +#ifndef DEMO_H_ +#define DEMO_H_ + +#include +#include +#include +#include +#include +#include +#include "fory/serialization/fory.h" + +namespace demo { + +struct User; +struct Order; + +enum class Status : int32_t { + PENDING = 0, + ACTIVE = 1, + COMPLETED = 2, +}; +FORY_ENUM(Status, PENDING, ACTIVE, COMPLETED); + +struct User { + std::string id; + std::string name; + std::optional email; + int32_t age; + + bool operator==(const User& other) const { + return id == other.id && name == other.name && + email == other.email && age == other.age; + } +}; +FORY_STRUCT(User, id, name, email, age); + +struct Order { + std::string id; + std::shared_ptr customer; + std::vector items; + std::map quantities; + Status status; + + bool operator==(const Order& other) const { + return id == other.id && customer == other.customer && + items == other.items && quantities == other.quantities && + status == other.status; + } +}; +FORY_STRUCT(Order, id, customer, items, quantities, status); + +inline void RegisterTypes(fory::serialization::Fory& fory) { + fory.register_enum(100); + fory.register_struct(101); + fory.register_struct(102); +} + +} // namespace demo + +#endif // DEMO_H_ +``` + +### Usage + +```cpp +#include "demo.h" +#include + +int main() { + fory::serialization::Fory fory = fory::serialization::Fory::builder() + .xlang(true) + .ref_tracking(true) + .build(); + + demo::RegisterTypes(fory); + + auto user = std::make_shared(); + user->id = "u123"; + user->name = "Alice"; + user->email = "alice@example.com"; + user->age = 30; + + demo::Order order; + order.id = "o456"; + order.customer = user; + order.items = {"item1", "item2"}; + order.quantities = {{"item1", 2}, {"item2", 1}}; + order.status = demo::Status::ACTIVE; + + auto bytes = fory.serialize(order); + auto restored = fory.deserialize(bytes); + + return 0; +} +``` + +## Generated Annotations Summary + +### Java Annotations + +| Annotation | Purpose | +| -------------------------------- | -------------------------- | +| `@ForyField(nullable = true)` | Marks field as nullable | +| `@ForyField(trackingRef = true)` | Enables reference tracking | + +### Python Type Hints + +| Hint | Purpose | +| ------------------ | ------------------- | +| `Optional[T]` | Nullable field | +| `List[T]` | Repeated field | +| `Dict[K, V]` | Map field | +| `pyfory.Int32Type` | Fixed-width integer | + +### Go Struct Tags + +| Tag | Purpose | +| ----------------- | -------------------------- | +| `fory:"nullable"` | Marks field as nullable | +| `fory:"trackRef"` | Enables reference tracking | + +### Rust Attributes + +| Attribute | Purpose | +| -------------------------- | -------------------------- | +| `#[derive(ForyObject)]` | Enables Fory serialization | +| `#[fory(nullable = true)]` | Marks field as nullable | +| `#[tag("...")]` | Name-based registration | +| `#[repr(i32)]` | Enum representation | + +### C++ Macros + +| Macro | Purpose | +| -------------------------- | ----------------------- | +| `FORY_STRUCT(T, fields..)` | Registers struct fields | +| `FORY_ENUM(T, values..)` | Registers enum values | + +## Name-Based Registration + +When types don't have explicit type IDs, they use namespace-based registration: + +### FDL + +```fdl +package myapp.models; + +message Config { // No @id + string key = 1; + string value = 2; +} +``` + +### Generated Registration + +**Java:** + +```java +fory.register(Config.class, "myapp.models", "Config"); +``` + +**Python:** + +```python +fory.register_type(Config, namespace="myapp.models", typename="Config") +``` + 
+**Go:** + +```go +f.RegisterTagType("myapp.models.Config", Config{}) +``` + +**Rust:** + +```rust +#[derive(ForyObject)] +#[tag("myapp.models.Config")] +pub struct Config { ... } +``` + +**C++:** + +```cpp +fory.register_struct("myapp.models", "Config"); +``` + +## Customization + +### Extending Generated Code + +Generated code can be extended through language-specific mechanisms: + +**Java:** Use inheritance or composition: + +```java +public class ExtendedUser extends User { + public String getDisplayName() { + return getName() + " <" + getEmail() + ">"; + } +} +``` + +**Python:** Add methods after import: + +```python +from demo import User + +def get_display_name(self): + return f"{self.name} <{self.email}>" + +User.get_display_name = get_display_name +``` + +**Go:** Use separate file in same package: + +```go +package demo + +func (u *User) DisplayName() string { + return u.Name + " <" + *u.Email + ">" +} +``` + +**Rust:** Use trait extensions: + +```rust +trait UserExt { + fn display_name(&self) -> String; +} + +impl UserExt for User { + fn display_name(&self) -> String { + format!("{} <{}>", self.name, self.email.as_deref().unwrap_or("")) + } +} +``` + +**C++:** Use inheritance or free functions: + +```cpp +std::string display_name(const demo::User& user) { + return user.name + " <" + user.email.value_or("") + ">"; +} +``` diff --git a/docs/schema/index.md b/docs/schema/index.md new file mode 100644 index 0000000000..635fd89fc2 --- /dev/null +++ b/docs/schema/index.md @@ -0,0 +1,208 @@ +--- +title: FDL Schema Guide +sidebar_position: 1 +id: schema_index +license: | + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--- + +# Fory Definition Language (FDL) + +Fory Definition Language (FDL) is a schema definition language for Apache Fory that enables type-safe cross-language serialization. Define your data structures once and generate native code for Java, Python, Go, Rust, and C++. + +## Overview + +FDL provides a simple, intuitive syntax for defining cross-language data structures: + +```fdl +package example; + +enum Status [id=100] { + PENDING = 0; + ACTIVE = 1; + COMPLETED = 2; +} + +message User [id=101] { + string name = 1; + int32 age = 2; + optional string email = 3; + repeated string tags = 4; +} + +message Order [id=102] { + ref User customer = 1; + repeated Item items = 2; + Status status = 3; + map metadata = 4; +} +``` + +## Why FDL? + +### Schema-First Development + +Define your data model once in FDL and generate consistent, type-safe code across all languages. 
This ensures: + +- **Type Safety**: Catch type errors at compile time, not runtime +- **Consistency**: All languages use the same field names, types, and structures +- **Documentation**: Schema serves as living documentation +- **Evolution**: Managed schema changes across all implementations + +### Fory-Native Features + +Unlike generic IDLs, FDL is designed specifically for Fory serialization: + +- **Reference Tracking**: First-class support for shared and circular references via `ref` +- **Nullable Fields**: Explicit `optional` modifier for nullable types +- **Type Registration**: Built-in support for both numeric IDs and namespace-based registration +- **Native Code Generation**: Generates idiomatic code with Fory annotations/macros + +### Zero Runtime Overhead + +Generated code uses native language constructs: + +- Java: Plain POJOs with `@ForyField` annotations +- Python: Dataclasses with type hints +- Go: Structs with struct tags +- Rust: Structs with `#[derive(ForyObject)]` +- C++: Structs with `FORY_STRUCT` macros + +## Quick Start + +### 1. Install the Compiler + +```bash +cd compiler +pip install -e . +``` + +### 2. Write Your Schema + +Create `example.fdl`: + +```fdl +package example; + +message Person [id=100] { + string name = 1; + int32 age = 2; + optional string email = 3; +} +``` + +### 3. Generate Code + +```bash +# Generate for all languages +fory compile example.fdl --output ./generated + +# Generate for specific languages +fory compile example.fdl --lang java,python --output ./generated +``` + +### 4. Use Generated Code + +**Java:** + +```java +Fory fory = Fory.builder().withLanguage(Language.XLANG).build(); +ExampleForyRegistration.register(fory); + +Person person = new Person(); +person.setName("Alice"); +person.setAge(30); +byte[] data = fory.serialize(person); +``` + +**Python:** + +```python +import pyfory +from example import Person, register_example_types + +fory = pyfory.Fory() +register_example_types(fory) + +person = Person(name="Alice", age=30) +data = fory.serialize(person) +``` + +## Documentation + +| Document | Description | +| ------------------------------------------ | -------------------------------------------- | +| [FDL Syntax Reference](fdl-syntax.md) | Complete language syntax and grammar | +| [Type System](type-system.md) | Primitive types, collections, and type rules | +| [Compiler Guide](compiler-guide.md) | CLI options and build integration | +| [Generated Code](generated-code.md) | Output format for each target language | +| [Protocol Buffers vs FDL](proto-vs-fdl.md) | Comparison with protobuf and migration guide | + +## Key Concepts + +### Type Registration + +FDL supports two registration modes: + +**Numeric Type IDs** - Fast and compact: + +```fdl +message User [id=100] { ... } // Registered with ID 100 +``` + +**Namespace-based** - Flexible and readable: + +```fdl +message Config { ... 
} // Registered as "package.Config" +``` + +### Field Modifiers + +- **`optional`**: Field can be null/None +- **`ref`**: Enable reference tracking for shared/circular references +- **`repeated`**: Field is a list/array + +```fdl +message Example { + optional string nullable = 1; + ref Node parent = 2; + repeated int32 numbers = 3; +} +``` + +### Cross-Language Compatibility + +FDL types map to native types in each language: + +| FDL Type | Java | Python | Go | Rust | C++ | +| -------- | --------- | ------ | -------- | -------- | ------------- | +| `int32` | `int` | `int` | `int32` | `i32` | `int32_t` | +| `string` | `String` | `str` | `string` | `String` | `std::string` | +| `bool` | `boolean` | `bool` | `bool` | `bool` | `bool` | + +See [Type System](type-system.md) for complete mappings. + +## Best Practices + +1. **Use meaningful package names**: Group related types together +2. **Assign type IDs for performance**: Numeric IDs are faster than name-based registration +3. **Reserve ID ranges**: Leave gaps for future additions (e.g., 100-199 for users, 200-299 for orders) +4. **Use `optional` explicitly**: Make nullability clear in the schema +5. **Use `ref` for shared objects**: Enable reference tracking when objects are shared + +## Examples + +See the [examples](https://github.com/apache/fory/tree/main/compiler/examples) directory for complete working examples. diff --git a/docs/schema/proto-vs-fdl.md b/docs/schema/proto-vs-fdl.md new file mode 100644 index 0000000000..f528a2b07a --- /dev/null +++ b/docs/schema/proto-vs-fdl.md @@ -0,0 +1,508 @@ +--- +title: Protocol Buffers vs FDL +sidebar_position: 6 +id: proto_vs_fdl +license: | + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--- + +# Protocol Buffers vs FDL + +This document compares Google's Protocol Buffers (protobuf) with Fory Definition Language (FDL), helping you understand when to use each and how to migrate between them. 
+ +## Overview + +| Aspect | Protocol Buffers | FDL | +| ---------------------- | --------------------------------- | ----------------------------------- | +| **Primary Purpose** | RPC and message interchange | Cross-language object serialization | +| **Design Philosophy** | Schema evolution, backward compat | Performance, native integration | +| **Reference Tracking** | Not supported | First-class support (`ref`) | +| **Generated Code** | Custom message types | Native language constructs | +| **Serialization** | Tag-length-value encoding | Fory binary protocol | +| **Performance** | Good | Excellent (up to 170x faster) | + +## Syntax Comparison + +### Package Declaration + +**Protocol Buffers:** + +```protobuf +syntax = "proto3"; +package example.models; +option java_package = "com.example.models"; +option go_package = "example.com/models"; +``` + +**FDL:** + +```fdl +package example.models; +``` + +FDL uses a single package declaration that maps to all languages automatically. + +### Enum Definition + +**Protocol Buffers:** + +```protobuf +enum Status { + STATUS_UNSPECIFIED = 0; + STATUS_PENDING = 1; + STATUS_ACTIVE = 2; + STATUS_COMPLETED = 3; +} +``` + +**FDL:** + +```fdl +enum Status [id=100] { + PENDING = 0; + ACTIVE = 1; + COMPLETED = 2; +} +``` + +Key differences: + +- FDL supports optional type IDs (`[id=100]`) for efficient serialization +- Protobuf requires `_UNSPECIFIED = 0` by convention; FDL uses explicit values +- FDL enum values don't require prefixes + +### Message Definition + +**Protocol Buffers:** + +```protobuf +message User { + string id = 1; + string name = 2; + optional string email = 3; + int32 age = 4; + repeated string tags = 5; + map metadata = 6; +} +``` + +**FDL:** + +```fdl +message User [id=101] { + string id = 1; + string name = 2; + optional string email = 3; + int32 age = 4; + repeated string tags = 5; + map metadata = 6; +} +``` + +Syntax is nearly identical, but FDL adds: + +- Type IDs (`[id=101]`) for cross-language registration +- `ref` modifier for reference tracking + +### Nested Types + +**Protocol Buffers:** + +```protobuf +message Order { + message Item { + string product_id = 1; + int32 quantity = 2; + } + repeated Item items = 1; +} +``` + +**FDL:** + +```fdl +message OrderItem [id=200] { + string product_id = 1; + int32 quantity = 2; +} + +message Order [id=201] { + repeated OrderItem items = 1; +} +``` + +FDL uses flat type definitions rather than nested types for simplicity. + +### Imports + +**Protocol Buffers:** + +```protobuf +import "other.proto"; +import "google/protobuf/timestamp.proto"; +``` + +**FDL:** + +FDL currently requires all types in a single file or uses forward references within the same file. + +## Feature Comparison + +### Reference Tracking + +FDL's killer feature is first-class reference tracking: + +**FDL:** + +```fdl +message TreeNode [id=300] { + string value = 1; + ref TreeNode parent = 2; + repeated ref TreeNode children = 3; +} + +message Graph [id=301] { + repeated ref Node nodes = 1; // Shared references preserved +} +``` + +**Protocol Buffers:** + +Protobuf cannot represent circular or shared references. 
You must use workarounds: + +```protobuf +// Workaround: Use IDs instead of references +message TreeNode { + string id = 1; + string value = 2; + string parent_id = 3; // Manual ID reference + repeated string child_ids = 4; +} +``` + +### Type System + +| Type | Protocol Buffers | FDL | +| ---------- | ------------------------------------------------------------------------------------------------------ | --------------------------------- | +| Boolean | `bool` | `bool` | +| Integers | `int32`, `int64`, `sint32`, `sint64`, `uint32`, `uint64`, `fixed32`, `fixed64`, `sfixed32`, `sfixed64` | `int8`, `int16`, `int32`, `int64` | +| Floats | `float`, `double` | `float32`, `float64` | +| String | `string` | `string` | +| Binary | `bytes` | `bytes` | +| Timestamp | `google.protobuf.Timestamp` | `timestamp` | +| Date | Not built-in | `date` | +| Duration | `google.protobuf.Duration` | Not built-in | +| List | `repeated T` | `repeated T` | +| Map | `map` | `map` | +| Nullable | `optional T` (proto3) | `optional T` | +| Oneof | `oneof` | Not supported | +| Any | `google.protobuf.Any` | Not supported | +| Extensions | `extend` | Not supported | + +### Wire Format + +**Protocol Buffers:** + +- Tag-length-value encoding +- Variable-length integers (varints) +- Field numbers encoded in wire format +- Unknown fields preserved + +**FDL/Fory:** + +- Optimized binary format +- Schema-aware encoding +- Type IDs for fast lookup +- Reference tracking support +- Zero-copy deserialization where possible + +### Generated Code Style + +**Protocol Buffers** generates custom types with builders and accessors: + +```java +// Protobuf generated Java +User user = User.newBuilder() + .setId("u123") + .setName("Alice") + .setAge(30) + .build(); +``` + +**FDL** generates native POJOs: + +```java +// FDL generated Java +User user = new User(); +user.setId("u123"); +user.setName("Alice"); +user.setAge(30); +``` + +### Comparison Table + +| Feature | Protocol Buffers | FDL | +| -------------------------- | ----------------- | --------- | +| Schema evolution | Excellent | Good | +| Backward compatibility | Excellent | Good | +| Reference tracking | No | Yes | +| Circular references | No | Yes | +| Native code generation | No (custom types) | Yes | +| Unknown field preservation | Yes | No | +| Schema-less mode | No | Yes\* | +| RPC integration (gRPC) | Yes | No | +| Zero-copy deserialization | Limited | Yes | +| Human-readable format | JSON, TextFormat | No | +| Performance | Good | Excellent | + +\*Fory supports schema-less serialization without FDL + +## When to Use Each + +### Use Protocol Buffers When: + +1. **Building gRPC services**: Protobuf is the native format for gRPC +2. **Maximum backward compatibility**: Protobuf's unknown field handling is robust +3. **Schema evolution is critical**: Adding/removing fields across versions +4. **You need oneof/Any types**: Complex polymorphism requirements +5. **Human-readable debugging**: TextFormat and JSON transcoding available +6. **Ecosystem integration**: Wide tooling support (linting, documentation) + +### Use FDL/Fory When: + +1. **Performance is critical**: Up to 170x faster than protobuf +2. **Cross-language object graphs**: Serialize Java objects, deserialize in Python +3. **Circular/shared references**: Object graphs with cycles +4. **Native code preferred**: Standard POJOs, dataclasses, structs +5. **Memory efficiency**: Zero-copy deserialization +6. 
**Existing object models**: Minimal changes to existing code + +## Performance Comparison + +Benchmarks show Fory significantly outperforms Protocol Buffers: + +| Benchmark | Protocol Buffers | Fory | Improvement | +| ------------------------- | ---------------- | -------- | ----------- | +| Serialization (simple) | 1x | 10-20x | 10-20x | +| Deserialization (simple) | 1x | 10-20x | 10-20x | +| Serialization (complex) | 1x | 50-100x | 50-100x | +| Deserialization (complex) | 1x | 50-100x | 50-100x | +| Memory allocation | 1x | 0.1-0.5x | 2-10x less | + +_Benchmarks vary based on data structure and language. See [Fory benchmarks](../benchmarks/) for details._ + +## Migration Guide + +### From Protocol Buffers to FDL + +#### Step 1: Convert Syntax + +**Before (proto):** + +```protobuf +syntax = "proto3"; +package myapp; + +message Person { + string name = 1; + int32 age = 2; + repeated string emails = 3; + Address address = 4; +} + +message Address { + string street = 1; + string city = 2; +} +``` + +**After (FDL):** + +```fdl +package myapp; + +message Address [id=100] { + string street = 1; + string city = 2; +} + +message Person [id=101] { + string name = 1; + int32 age = 2; + repeated string emails = 3; + Address address = 4; +} +``` + +#### Step 2: Handle Special Cases + +**oneof fields:** + +```protobuf +// Proto +message Result { + oneof result { + Success success = 1; + Error error = 2; + } +} +``` + +```fdl +// FDL - Use separate optional fields +message Result [id=102] { + optional Success success = 1; + optional Error error = 2; +} +// Or model as sealed class hierarchy in generated code +``` + +**Well-known types:** + +```protobuf +// Proto +import "google/protobuf/timestamp.proto"; +message Event { + google.protobuf.Timestamp created_at = 1; +} +``` + +```fdl +// FDL +message Event [id=103] { + timestamp created_at = 1; +} +``` + +#### Step 3: Add Type IDs + +Assign unique type IDs for cross-language compatibility: + +```fdl +// Reserve ranges for different domains +// 100-199: Common types +// 200-299: User domain +// 300-399: Order domain + +message Address [id=100] { ... } +message Person [id=200] { ... } +message Order [id=300] { ... 
} +``` + +#### Step 4: Update Build Configuration + +**Before (Maven with protobuf):** + +```xml + + org.xolstice.maven.plugins + protobuf-maven-plugin + + +``` + +**After (Maven with FDL):** + +```xml + + org.codehaus.mojo + exec-maven-plugin + + + generate-fory-types + generate-sources + exec + + fory + + compile + ${project.basedir}/src/main/fdl/schema.fdl + --lang + java + --output + ${project.build.directory}/generated-sources/fdl + + + + + +``` + +#### Step 5: Update Application Code + +**Before (Protobuf Java):** + +```java +// Protobuf style +Person.Builder builder = Person.newBuilder(); +builder.setName("Alice"); +builder.setAge(30); +Person person = builder.build(); + +byte[] data = person.toByteArray(); +Person restored = Person.parseFrom(data); +``` + +**After (Fory Java):** + +```java +// Fory style +Person person = new Person(); +person.setName("Alice"); +person.setAge(30); + +Fory fory = Fory.builder().withLanguage(Language.XLANG).build(); +MyappForyRegistration.register(fory); + +byte[] data = fory.serialize(person); +Person restored = (Person) fory.deserialize(data); +``` + +### Coexistence Strategy + +For gradual migration, you can run both systems in parallel: + +```java +// Dual serialization during migration +public byte[] serialize(Object obj, Format format) { + if (format == Format.PROTOBUF) { + return ((MessageLite) obj).toByteArray(); + } else { + return fory.serialize(obj); + } +} + +// Convert between formats +public ForyPerson fromProto(ProtoPerson proto) { + ForyPerson person = new ForyPerson(); + person.setName(proto.getName()); + person.setAge(proto.getAge()); + return person; +} +``` + +## Summary + +| Aspect | Choose Protocol Buffers | Choose FDL/Fory | +| ---------------- | ----------------------- | ---------------------- | +| Use case | RPC, API contracts | Object serialization | +| Performance | Acceptable | Critical | +| References | Not needed | Circular/shared needed | +| Code style | Builder pattern OK | Native POJOs preferred | +| Schema evolution | Complex requirements | Simpler requirements | +| Ecosystem | Need gRPC, tooling | Need raw performance | + +Both tools excel in their domains. Protocol Buffers shines for RPC and API contracts with strong schema evolution guarantees. FDL/Fory excels at high-performance object serialization with native language integration and reference tracking support. diff --git a/docs/schema/type-system.md b/docs/schema/type-system.md new file mode 100644 index 0000000000..8985d41da0 --- /dev/null +++ b/docs/schema/type-system.md @@ -0,0 +1,438 @@ +--- +title: FDL Type System +sidebar_position: 3 +id: fdl_type_system +license: | + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--- + +# FDL Type System + +This document describes the FDL type system and how types map to each target language. 
+ +## Overview + +FDL provides a rich type system designed for cross-language compatibility: + +- **Primitive Types**: Basic scalar types (integers, floats, strings, etc.) +- **Enum Types**: Named integer constants +- **Message Types**: Structured compound types +- **Collection Types**: Lists and maps +- **Nullable Types**: Optional/nullable variants + +## Primitive Types + +### Boolean + +```fdl +bool is_active = 1; +``` + +| Language | Type | Notes | +| -------- | --------------------- | ------------------ | +| Java | `boolean` / `Boolean` | Primitive or boxed | +| Python | `bool` | | +| Go | `bool` | | +| Rust | `bool` | | +| C++ | `bool` | | + +### Integer Types + +FDL provides fixed-width signed integers: + +| FDL Type | Size | Range | +| -------- | ------ | ----------------- | +| `int8` | 8-bit | -128 to 127 | +| `int16` | 16-bit | -32,768 to 32,767 | +| `int32` | 32-bit | -2^31 to 2^31 - 1 | +| `int64` | 64-bit | -2^63 to 2^63 - 1 | + +**Language Mapping:** + +| FDL | Java | Python | Go | Rust | C++ | +| ------- | ------- | ------------------ | ------- | ----- | --------- | +| `int8` | `byte` | `pyfory.Int8Type` | `int8` | `i8` | `int8_t` | +| `int16` | `short` | `pyfory.Int16Type` | `int16` | `i16` | `int16_t` | +| `int32` | `int` | `pyfory.Int32Type` | `int32` | `i32` | `int32_t` | +| `int64` | `long` | `int` | `int64` | `i64` | `int64_t` | + +**Examples:** + +```fdl +message Counters { + int8 tiny = 1; + int16 small = 2; + int32 medium = 3; + int64 large = 4; +} +``` + +**Python Type Hints:** + +Python's native `int` is arbitrary precision, so FDL uses type wrappers for fixed-width integers: + +```python +from pyfory import Int8Type, Int16Type, Int32Type + +@dataclass +class Counters: + tiny: Int8Type + small: Int16Type + medium: Int32Type + large: int # int64 maps to native int +``` + +### Floating-Point Types + +| FDL Type | Size | Precision | +| --------- | ------ | ------------- | +| `float32` | 32-bit | ~7 digits | +| `float64` | 64-bit | ~15-16 digits | + +**Language Mapping:** + +| FDL | Java | Python | Go | Rust | C++ | +| --------- | -------- | -------------------- | --------- | ----- | -------- | +| `float32` | `float` | `pyfory.Float32Type` | `float32` | `f32` | `float` | +| `float64` | `double` | `float` | `float64` | `f64` | `double` | + +**Example:** + +```fdl +message Coordinates { + float64 latitude = 1; + float64 longitude = 2; + float32 altitude = 3; +} +``` + +### String Type + +UTF-8 encoded text: + +```fdl +string name = 1; +``` + +| Language | Type | Notes | +| -------- | ------------- | --------------------- | +| Java | `String` | Immutable | +| Python | `str` | | +| Go | `string` | Immutable | +| Rust | `String` | Owned, heap-allocated | +| C++ | `std::string` | | + +### Bytes Type + +Raw binary data: + +```fdl +bytes data = 1; +``` + +| Language | Type | Notes | +| -------- | ---------------------- | --------- | +| Java | `byte[]` | | +| Python | `bytes` | Immutable | +| Go | `[]byte` | | +| Rust | `Vec` | | +| C++ | `std::vector` | | + +### Temporal Types + +#### Date + +Calendar date without time: + +```fdl +date birth_date = 1; +``` + +| Language | Type | Notes | +| -------- | -------------------------------- | ----------------------- | +| Java | `java.time.LocalDate` | | +| Python | `datetime.date` | | +| Go | `time.Time` | Time portion ignored | +| Rust | `chrono::NaiveDate` | Requires `chrono` crate | +| C++ | `fory::serialization::LocalDate` | | + +#### Timestamp + +Date and time with nanosecond precision: + +```fdl +timestamp created_at = 1; +``` 
#### Timestamp

Date and time with nanosecond precision:

```fdl
timestamp created_at = 1;
```

| Language | Type                             | Notes                   |
| -------- | -------------------------------- | ----------------------- |
| Java     | `java.time.Instant`              | UTC-based               |
| Python   | `datetime.datetime`              |                         |
| Go       | `time.Time`                      |                         |
| Rust     | `chrono::NaiveDateTime`          | Requires `chrono` crate |
| C++      | `fory::serialization::Timestamp` |                         |

## Enum Types

Enums define named integer constants:

```fdl
enum Priority [id=100] {
  LOW = 0;
  MEDIUM = 1;
  HIGH = 2;
  CRITICAL = 3;
}
```

**Language Mapping:**

| Language | Implementation                           |
| -------- | ---------------------------------------- |
| Java     | `enum Priority { LOW, MEDIUM, ... }`     |
| Python   | `class Priority(IntEnum): LOW = 0, ...`  |
| Go       | `type Priority int32` with constants     |
| Rust     | `#[repr(i32)] enum Priority { ... }`     |
| C++      | `enum class Priority : int32_t { ... }`  |

**Java:**

```java
public enum Priority {
  LOW,
  MEDIUM,
  HIGH,
  CRITICAL;
}
```

**Python:**

```python
from enum import IntEnum

class Priority(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3
```

**Go:**

```go
type Priority int32

const (
	PriorityLow      Priority = 0
	PriorityMedium   Priority = 1
	PriorityHigh     Priority = 2
	PriorityCritical Priority = 3
)
```

**Rust:**

```rust
#[derive(ForyObject, Debug, Clone, PartialEq, Default)]
#[repr(i32)]
pub enum Priority {
    #[default]
    Low = 0,
    Medium = 1,
    High = 2,
    Critical = 3,
}
```

**C++:**

```cpp
enum class Priority : int32_t {
  LOW = 0,
  MEDIUM = 1,
  HIGH = 2,
  CRITICAL = 3,
};
FORY_ENUM(Priority, LOW, MEDIUM, HIGH, CRITICAL);
```

## Message Types

Messages are structured types composed of fields:

```fdl
message User [id=101] {
  string id = 1;
  string name = 2;
  int32 age = 3;
}
```

**Language Mapping:**

| Language | Implementation                      |
| -------- | ----------------------------------- |
| Java     | POJO class with getters/setters     |
| Python   | `@dataclass` class                  |
| Go       | Struct with exported fields         |
| Rust     | Struct with `#[derive(ForyObject)]` |
| C++      | Struct with `FORY_STRUCT` macro     |

## Collection Types

### List (repeated)

The `repeated` modifier creates a list:

```fdl
repeated string tags = 1;
repeated User users = 2;
```

**Language Mapping:**

| FDL               | Java            | Python       | Go         | Rust          | C++                        |
| ----------------- | --------------- | ------------ | ---------- | ------------- | -------------------------- |
| `repeated string` | `List<String>`  | `List[str]`  | `[]string` | `Vec<String>` | `std::vector<std::string>` |
| `repeated int32`  | `List<Integer>` | `List[int]`  | `[]int32`  | `Vec<i32>`    | `std::vector<int32_t>`     |
| `repeated User`   | `List<User>`    | `List[User]` | `[]User`   | `Vec<User>`   | `std::vector<User>`        |

### Map

Maps with typed keys and values:

```fdl
map<string, int32> counts = 1;
map<string, User> users = 2;
```

**Language Mapping:**

| FDL                  | Java                   | Python            | Go                 | Rust                    | C++                              |
| -------------------- | ---------------------- | ----------------- | ------------------ | ----------------------- | -------------------------------- |
| `map<string, int32>` | `Map<String, Integer>` | `Dict[str, int]`  | `map[string]int32` | `HashMap<String, i32>`  | `std::map<std::string, int32_t>` |
| `map<string, User>`  | `Map<String, User>`    | `Dict[str, User]` | `map[string]User`  | `HashMap<String, User>` | `std::map<std::string, User>`    |

**Key Type Restrictions:**

Map keys should be hashable types:

- `string` (most common)
- Integer types (`int8`, `int16`, `int32`, `int64`)
- `bool`

Avoid using messages or complex types as keys.
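As a concrete illustration of the collection mappings above, the following sketch shows the Python shape of a message combining a `repeated` field and a `map` field. The class and field names are illustrative, not actual generated output:

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Illustrative sketch of the Python shape generated for:
#   message Inventory {
#     repeated string tags = 1;
#     map<string, int32> counts = 2;
#   }
@dataclass
class Inventory:
    tags: List[str] = field(default_factory=list)
    counts: Dict[str, int] = field(default_factory=dict)

inv = Inventory(tags=["new", "sale"], counts={"widgets": 3, "gadgets": 7})
```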
## Nullable Types

The `optional` modifier makes a field nullable:

```fdl
message Profile {
  string name = 1;          // Required
  optional string bio = 2;  // Nullable
  optional int32 age = 3;   // Nullable integer
}
```

**Language Mapping:**

| FDL               | Java       | Python          | Go        | Rust             | C++                          |
| ----------------- | ---------- | --------------- | --------- | ---------------- | ---------------------------- |
| `optional string` | `String`\* | `Optional[str]` | `*string` | `Option<String>` | `std::optional<std::string>` |
| `optional int32`  | `Integer`  | `Optional[int]` | `*int32`  | `Option<i32>`    | `std::optional<int32_t>`     |

\*Java uses boxed types with `@ForyField(nullable = true)` annotation.

**Default Values:**

| Type               | Default Value       |
| ------------------ | ------------------- |
| Non-optional types | Language default    |
| Optional types     | `null`/`None`/`nil` |

## Reference Types

The `ref` modifier enables reference tracking:

```fdl
message TreeNode {
  string value = 1;
  ref TreeNode parent = 2;
  repeated ref TreeNode children = 3;
}
```

**Use Cases:**

1. **Shared References**: Same object referenced from multiple places
2. **Circular References**: Object graphs with cycles
3. **Large Objects**: Avoid duplicate serialization

**Language Mapping:**

| FDL        | Java     | Python | Go      | Rust       | C++                     |
| ---------- | -------- | ------ | ------- | ---------- | ----------------------- |
| `ref User` | `User`\* | `User` | `*User` | `Rc<User>` | `std::shared_ptr<User>` |

\*Java uses `@ForyField(trackingRef = true)` annotation.

## Type Compatibility Matrix

This matrix shows which type conversions are safe across languages:

| From → To   | bool | int8 | int16 | int32 | int64 | float32 | float64 | string |
| ----------- | ---- | ---- | ----- | ----- | ----- | ------- | ------- | ------ |
| **bool**    | ✓    | ✓    | ✓     | ✓     | ✓     | -       | -       | -      |
| **int8**    | -    | ✓    | ✓     | ✓     | ✓     | ✓       | ✓       | -      |
| **int16**   | -    | -    | ✓     | ✓     | ✓     | ✓       | ✓       | -      |
| **int32**   | -    | -    | -     | ✓     | ✓     | -       | ✓       | -      |
| **int64**   | -    | -    | -     | -     | ✓     | -       | -       | -      |
| **float32** | -    | -    | -     | -     | -     | ✓       | ✓       | -      |
| **float64** | -    | -    | -     | -     | -     | -       | ✓       | -      |
| **string**  | -    | -    | -     | -     | -     | -       | -       | ✓      |

✓ = Safe conversion, - = Not recommended

## Best Practices

### Choosing Integer Types

- Use `int32` as the default for most integers
- Use `int64` for large values (timestamps, IDs)
- Use `int8`/`int16` only when storage size matters

### String vs Bytes

- Use `string` for text data (UTF-8)
- Use `bytes` for binary data (images, files, encrypted data)

### Optional vs Required

- Use `optional` when the field may legitimately be absent
- Default to required fields for better type safety
- Document why a field is optional

### Reference Tracking

- Use `ref` only when needed (shared/circular references)
- Reference tracking adds overhead
- Test with realistic data to ensure correctness

### Collections

- Prefer `repeated` for ordered sequences
- Use `map` for key-value lookups
- Consider message types for complex map values
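To tie these practices back to generated code, the closing sketch below shows how an `optional` field (defaulting to `None`) and a shared `ref` object typically surface in Python. The class and field names are illustrative, not actual generated output:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Member:
    name: str

# Illustrative sketch of the Python shape for:
#   message Team {
#     string name = 1;
#     optional string motto = 2;        // nullable -> Optional[str]
#     repeated ref Member members = 3;  // shared references are tracked
#   }
@dataclass
class Team:
    name: str
    motto: Optional[str] = None
    members: List[Member] = field(default_factory=list)

# The same Member object is placed in two teams; with `ref` tracking the
# serializer writes it once and restores the shared structure on deserialization.
shared = Member(name="Ada")
alpha = Team(name="Alpha", members=[shared])
beta = Team(name="Beta", motto="Ship it", members=[shared])
```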