Commit b4b564d

Add FieldExtractor & SchemaExtractor, redesign PromptLibrary (#1034)
1 parent 11a5cb2 commit b4b564d

20 files changed, +731 -94 lines changed

docs/en/API Reference/data_process.md

Lines changed: 9 additions & 0 deletions
@@ -12,6 +12,15 @@
     members:
     exclude-members:
 
+### LLM JSON Operators
+
+::: lazyllm.tools.data.operators.llm_base_ops
+    members:
+    exclude-members:
+
+::: lazyllm.tools.data.operators.llm_json_ops
+    members:
+    exclude-members:
 
 ## Data Processing Pipeline

docs/en/API Reference/prompt_template.md

Lines changed: 13 additions & 1 deletion
@@ -16,4 +16,16 @@
     members:
     - validate_variables
     - format
-    - partial
+    - partial
+
+::: lazyllm.prompt_templates.LazyLLMPromptLibraryBase
+    members:
+    exclude-members:
+
+::: lazyllm.ActorPrompt
+    members:
+    exclude-members:
+
+::: lazyllm.DataPrompt
+    members:
+    exclude-members:

docs/zh/API Reference/data_process.md

Lines changed: 9 additions & 0 deletions
@@ -12,6 +12,15 @@
     members:
     exclude-members:
 
+### LLM JSON 算子
+
+::: lazyllm.tools.data.operators.llm_base_ops
+    members:
+    exclude-members:
+
+::: lazyllm.tools.data.operators.llm_json_ops
+    members:
+    exclude-members:
 
 ## 数据处理 Pipeline

docs/zh/API Reference/prompt_template.md

Lines changed: 12 additions & 0 deletions
@@ -17,3 +17,15 @@
     - validate_variables
     - format
     - partial
+
+::: lazyllm.prompt_templates.LazyLLMPromptLibraryBase
+    members:
+    exclude-members:
+
+::: lazyllm.ActorPrompt
+    members:
+    exclude-members:
+
+::: lazyllm.DataPrompt
+    members:
+    exclude-members:

lazyllm/__init__.py

Lines changed: 3 additions & 2 deletions
@@ -17,7 +17,7 @@
     ServerModule, TrialModule, register as module_register,
     OnlineModule, OnlineChatModule, OnlineEmbeddingModule, AutoModel, OnlineMultiModalModule)
 from .hook import LazyLLMHook, LazyLLMFuncHook
-from .prompt_templates import PromptLibrary
+from .prompt_templates import ActorPrompt, DataPrompt
 from typing import TYPE_CHECKING
 if TYPE_CHECKING:
     from .tools import (Document, Reranker, Retriever, WebModule, ToolManager, FunctionCall, SkillManager,
@@ -103,7 +103,8 @@ def __getattr__(name: str):
     'PlanAndSolveAgent',
     'ReWOOAgent',
     'SentenceSplitter',
-    'PromptLibrary',
+    'ActorPrompt',
+    'DataPrompt',
 
     # docs
     'add_doc',

lazyllm/docs/data_process.py

Lines changed: 89 additions & 0 deletions
@@ -372,6 +372,95 @@ def forward_batch_input(self, inputs):
 ```
 """)
 
+# LLM based JSON operators docs
+add_chinese_doc('data.operators.llm_base_ops.LLMDataJson', """\
+基于 LLM 的 JSON 数据处理算子基类。提供结构化输出的基础逻辑,包括自动配置 JsonFormatter、重试机制以及预处理/验证/后处理生命周期。
+
+构造函数参数:\n
+- model: LazyLLM 模型实例。
+- prompt: 可选,用于引导 LLM 的 Prompt(ChatPrompter 或字符串)。
+- max_retries: 最大重试次数,默认 3。
+- **kwargs: 其它传递给基类的并发或持久化参数。
+""")
+
+add_english_doc('data.operators.llm_base_ops.LLMDataJson', """\
+Base class for LLM-based JSON data processing operators. Provides foundational logic for structured output,
+including automatic JsonFormatter configuration, retry mechanisms, and a pre/verify/post-processing lifecycle.
+
+Constructor args:\n
+- model: a LazyLLM model instance.
+- prompt: optional, ChatPrompter or string to guide the LLM.
+- max_retries: maximum number of retries, default 3.
+- **kwargs: additional concurrency or persistence arguments for the base class.
+""")
+
+add_chinese_doc('data.operators.llm_json_ops.FieldExtractor', """\
+字段提取器。利用 LLM 根据提供的字段列表从输入文本中提取特定信息。
+
+Args:
+    model: LazyLLM 模型实例。
+    prompt: 可选,自定义提取 Prompt。
+    input_keys: 输入键列表,默认为 ['persona', 'text', 'fields']。
+    output_key: 结果存储在数据字典中的键名,默认 'structured_data'。
+""")
+
+add_english_doc('data.operators.llm_json_ops.FieldExtractor', """\
+Field extractor. Uses an LLM to extract specific information from input text based on a provided list of fields.
+
+Args:
+    model: a LazyLLM model instance.
+    prompt: optional custom extraction prompt.
+    input_keys: list of input keys, defaults to ['persona', 'text', 'fields'].
+    output_key: key name to store results in the data dict, default 'structured_data'.
+""")
+
+add_example('data.operators.llm_json_ops.FieldExtractor', """\
+```python
+from lazyllm import OnlineChatModule
+from lazyllm.tools.data.operators.llm_json_ops import FieldExtractor
+model = OnlineChatModule(source='sensenova')
+op = FieldExtractor(model=model)
+inputs = [{
+    'text': '张三,28岁,目前在上海',
+    'fields': ['name', 'age', 'location']
+}]
+res = op(inputs)
+print(res[0]['structured_data'])  # {'name': '张三', 'age': '28', 'location': '上海'}
+```
+""")
+
+add_chinese_doc('data.operators.llm_json_ops.SchemaExtractor', """\
+架构提取器。利用 LLM 根据指定的 Schema(字典或 Pydantic 模型)从文本中提取结构化数据。
+
+Args:
+    model: LazyLLM 模型实例。
+    prompt: 可选,自定义提取 Prompt。
+    input_key: 输入文本的键名,默认 'text'。
+    output_key: 结果存储在数据字典中的键名,默认 'structured_data'。
+""")
+
+add_english_doc('data.operators.llm_json_ops.SchemaExtractor', """\
+Schema extractor. Uses an LLM to extract structured data from text according to a specified schema (dict or Pydantic model).
+
+Args:
+    model: a LazyLLM model instance.
+    prompt: optional custom extraction prompt.
+    input_key: key name for input text, default 'text'.
+    output_key: key name to store results in the data dict, default 'structured_data'.
+""")
+
+add_example('data.operators.llm_json_ops.SchemaExtractor', """\
+```python
+from lazyllm import OnlineChatModule
+from lazyllm.tools.data.operators.llm_json_ops import SchemaExtractor
+model = OnlineChatModule(source='sensenova')
+op = SchemaExtractor(model=model)
+inputs = [{'text': 'Math score is 95', 'schema': {'subject': 'str', 'score': 'int'}}]
+res = op(inputs)
+print(res[0]['structured_data'])  # {'subject': 'Math', 'score': 95}
+```
+""")
+
 # pipelines module docs
 add_chinese_doc( 'data.pipelines.demo_pipelines.build_demo_pipeline', """\
 构建演示用数据处理流水线(Pipeline),包含若干示例算子并展示如何在 pipeline 上组合使用这些算子。
