|
9 | 9 |
|
10 | 10 | * `--filetype`: str, default `infer` |
11 | 11 | * the type of input. Depending on the value, different other parameters |
12 | | - are needed. If json_list is used, the line of the input file can contain |
| 12 | + are needed. If json_entries is used, the line of the input file can contain |
13 | 13 | any of those parameters as long as they are valid JSON. You can find
14 | | - an example of json_list file in `DocToolsLLM/docs/json_list_example.txt` |
| 14 | + an example of json_entries file in `DocToolsLLM/docs/json_entries_example.txt` |
15 | 15 |
|
16 | 16 | * Supported values: |
17 | 17 | * `infer`: will guess the appropriate filetype based on `--path`. |
|
28 | 28 | you must type or paste the string |
29 | 29 | * `local_audio`: must be set: `--whisper_prompt`, `--whisper_lang`. The model used will be `whisper-1` |
30 | 30 |
|
31 | | - * `json_list`: `--path` is path to a txt file that contains a json |
| 31 | + * `json_entries`: `--path` is path to a txt file that contains a json |
32 | 32 | for each line containing at least a filetype and a path key/value |
33 | 33 | but can contain any parameters described here |
34 | | - * `recursive`: `--path` is the starting path `--pattern` is the globbing |
| 34 | + * `recursive_paths`: `--path` is the starting path `--pattern` is the globbing |
35 | 35 | patterns to append. `--exclude` and `--include` can be lists of regexes
36 | 36 | applying to found paths (include is run first, then exclude; if the
37 | 37 | pattern is only lowercase it will be case insensitive). `--recursed_filetype`
|
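As a rough illustration of the `json_entries` format described in this hunk, the sketch below parses text with one JSON object per line and checks that each entry carries at least `filetype` and `path`. The helper `read_json_entries`, the sample paths, and the choice to skip blank lines are all assumptions made for illustration; this is not DocToolsLLM's actual loader. Extra keys are assumed to mirror the CLI flag names without the leading dashes.

```python
import json

# Keys the documentation says every entry must contain.
REQUIRED_KEYS = {"filetype", "path"}


def read_json_entries(text):
    """Parse one JSON object per line (hypothetical helper, not the real loader)."""
    entries = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        entry = json.loads(line)
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            raise ValueError(f"line {lineno}: missing keys {sorted(missing)}")
        entries.append(entry)
    return entries


# Two made-up entries; any other documented parameter may be added per line.
sample = "\n".join([
    json.dumps({"filetype": "infer", "path": "notes/report.pdf"}),
    json.dumps({"filetype": "local_audio", "path": "talk.mp3", "whisper_lang": "en"}),
])
entries = read_json_entries(sample)
```

Validating the required keys up front gives an error that names the offending line instead of failing later inside a loader.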
47 | 47 |
|
48 | 48 | * `--modelname`: str, default `"openai/gpt-4o"` |
49 | 49 | * Keep in mind that given that the default backend used is litellm |
50 | | - the part of modelname before the slash (/) is the server name. |
| 50 | + the part of modelname before the slash (/) is the backend name (also called provider). |
51 | 51 | If the backend is 'testing/' then a fake LLM will be used |
52 | 52 | for debugging purposes. |
53 | 53 | If the value is not part of the model list of litellm, will use |
|
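To make the `backend/model` convention above concrete, here is a minimal sketch that splits a `--modelname` value at the first slash. The function `split_modelname` is hypothetical, and rejecting names without a slash is an assumption; the real tool may handle that case differently.

```python
def split_modelname(modelname):
    """Split 'backend/model' at the first slash (hypothetical helper)."""
    backend, sep, model = modelname.partition("/")
    if not sep or not backend or not model:
        # Assumption: a value without both parts is treated as invalid here.
        raise ValueError(f"expected 'backend/model', got {modelname!r}")
    return backend, model


backend, model = split_modelname("openai/gpt-4o")

# Per the text above, a 'testing/' backend selects a fake LLM for debugging.
uses_fake_llm = split_modelname("testing/anything")[0] == "testing"
```

Using `partition` rather than `split` keeps any further slashes inside the model part.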
94 | 94 | --- |
95 | 95 |
|
96 | 96 | * `--query`: str, default `None` |
97 | | - * if str, will be directly used for the first query if task in `["query", "search"]` |
| 97 | + * if str, will be directly used for the first query if task in `["query", "search", "summarize_then_query"]` |
98 | 98 |
|
99 | 99 | * `--query_retrievers`: str, default `"default"` |
100 | 100 | * must be a string that specifies which retriever will be used for |
|
164 | 164 | * `--debug`: bool, default `False` |
165 | 165 | * if True will enable langchain tracing, increase verbosity, |
166 | 166 | disable multithreading for summaries and loading files, |
| 167 | + crash if an error is encountered when loading a file, |
167 | 168 | and automatically trigger the debugger on exceptions.
168 | 169 |
|
169 | 170 | * `--dollar_limit`: int, default `5` |
|
182 | 183 | * if True, will remember the messages across a given chat exchange. |
183 | 184 | Disabled if using a testing model. |
184 | 185 |
|
185 | | -* `--no_llm_cache`: bool, default `False` |
| 186 | +* `--disable_llm_cache`: bool, default `False` |
186 | 187 | * WARNING: The cache is temporarily ignored for non-OpenAI LLM
187 | 188 | generations because of an error with langchain's ChatLiteLLM.
188 | 189 | Basically if you don't use `--private` and use an LLM from OpenAI,
|
227 | 228 |
|
228 | 229 | # Loader specific arguments |
229 | 230 | Those arguments can be set at cli time but can also be used |
230 | | - when using recursive filetype combination to have arguments specific |
| 231 | + when using recursive_paths filetype combination to have arguments specific |
231 | 232 | to a loader. They apply depending on the value of `--filetype`. |
232 | 233 | An unexpected argument for a given filetype will result in a crash. |
233 | 234 |
|
|
293 | 294 | the audio from the youtube link, and deepgram will be used to turn the audio into text. `--deepgram_kwargs` will be used if set. |
294 | 295 |
|
295 | 296 | * `--include`: str |
296 | | - * Only active if `--filetype` is one of json_list, recursive, |
297 | | - link_file, youtube_playlist. |
| 297 | + * Only active if `--filetype` is one of 'json_entries', 'recursive_paths', |
| 298 | + 'link_file', 'youtube_playlist'. |
298 | 299 | `--include` can be a list of regex that must be present in the |
299 | 300 | document PATH (not content!) |
300 | 301 | `--exclude` can be a list of regex that if present in the PATH |
|
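The `--include` and `--exclude` behaviour described in this document (include runs first, then exclude; an all-lowercase pattern is case insensitive) can be sketched as follows. The function names are hypothetical, and two details are assumptions not confirmed by the text: that a path must match every include pattern, and that matching any exclude pattern drops it.

```python
import re


def compile_pattern(pattern):
    """Compile a regex, ignoring case when the pattern has no uppercase letters."""
    flags = re.IGNORECASE if pattern == pattern.lower() else 0
    return re.compile(pattern, flags)


def filter_paths(paths, include=(), exclude=()):
    """Apply include patterns first, then exclude patterns, to the PATH only."""
    inc = [compile_pattern(p) for p in include]
    exc = [compile_pattern(p) for p in exclude]
    # Assumption: every include pattern must be found in the path.
    kept = [p for p in paths if all(r.search(p) for r in inc)]
    # Assumption: a single exclude match is enough to drop the path.
    return [p for p in kept if not any(r.search(p) for r in exc)]


paths = ["docs/Intro.PDF", "docs/draft_notes.pdf", "src/main.py"]
kept = filter_paths(paths, include=[r"\.pdf$"], exclude=[r"draft"])
```

The patterns are matched against the path string, not the document content, as the description above stresses.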
387 | 388 | * `--loading_failure`: str, default `crash` |
388 | 389 | * either `crash` or `warn`. Determines what to do with |
389 | 390 | exceptions happening when loading a document. This can be set |
390 | | - per document if a recursive filetype is used. |
| 391 | + per document if a recursive_paths filetype is used. |
391 | 392 |
|
392 | 393 | # Runtime flags |
393 | 394 |
|
|