Load Date Range from directory #10372
Replies: 1 comment
-
🤖 Hi,

You're correct that the current implementation of DirectoryLoader in LangChain does not support filtering by date. The class loads files from a directory based on the provided glob pattern and has no other built-in filtering. The glob pattern can restrict which file types are loaded (for example, "*.txt" to load only text files), but this is a very basic form of file type filtering and does not support more complex criteria.

Here is the relevant code from the DirectoryLoader class:

    def __init__(
        self,
        path: str,
        glob: str = "**/[!.]*",
        ...
    ):
        self.glob = glob
        ...

The 'glob' parameter is used in the 'load' method to get a list of files in the directory:

    items = list(p.rglob(self.glob) if self.recursive else p.glob(self.glob))

No code in the DirectoryLoader class filters these files by date or by any criterion other than the glob pattern.

As for whether any other class or method in LangChain can filter documents by date range after DirectoryLoader has loaded them, I wasn't able to find any such functionality in the LangChain codebase.

If you need to load documents from a specific date range, you might need to implement this yourself. One possible approach is to subclass DirectoryLoader and override the 'load' method to add the date filtering logic. This would involve checking the last modified date of each file and loading only the files that fall within the desired date range.

I hope this helps! Let me know if you have any other questions.
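The date check described above can be sketched with the standard library alone, so the files are filtered before any document is loaded. This is a minimal sketch, not part of LangChain: the helper name `iter_files_in_date_range` and its `start`/`end` parameters are hypothetical, and it assumes the file's last-modified time is an acceptable proxy for the document date.

```python
from datetime import datetime
from pathlib import Path
from typing import Iterator, Optional


def iter_files_in_date_range(
    root: str,
    glob: str = "**/[!.]*",
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
) -> Iterator[Path]:
    """Yield files under `root` matching `glob` whose mtime lies in [start, end].

    Hypothetical helper for illustration; either bound may be None (open-ended).
    """
    for path in Path(root).glob(glob):
        if not path.is_file():
            continue  # skip directories matched by the pattern
        # Last-modified time as a naive local datetime, comparable to start/end.
        modified = datetime.fromtimestamp(path.stat().st_mtime)
        if start is not None and modified < start:
            continue
        if end is not None and modified > end:
            continue
        yield path
```

Each yielded path could then be passed to a per-file loader (for example LangChain's `TextLoader`, assuming plain-text files), or the same check could be placed inside an overridden `load` method of a DirectoryLoader subclass. Because only file metadata is read during filtering, files outside the range are never opened.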
-
Hi all,
I have a question regarding document loaders, specifically those that load from directories (file systems). It would be useful to load only the documents within a specific date range. Say you have a directory that contains hundreds of thousands of documents spanning the past 10 years, but you only want to load the documents from the past year. I'd prefer not to load them all and then filter them. I looked inside DirectoryLoader and found nothing about loading documents given a date range.
Please let me know if there's another loader for this.