Open
Description
Text extraction fails for some docx files. An example is FileResource 30381.
Sidekiq logs:
Faraday::TimeoutError: Net::ReadTimeout with #<TCPSocket:(closed)>
but the metadata-listener pod logs:
Error performing MetadataListener::Job (Job ID: ae115d50-7f77-4e5b-a100-d9a580507371) from Async(metadata) in 18486.85ms: Errno::ENOENT (No such file or directory @ rb_file_s_size - /app/20250404-9-46q9tv.txt):
/app/lib/metadata_listener/report/extracted_text.rb:43:in 'FileTest.size'
/app/lib/metadata_listener/report/extracted_text.rb:43:in 'Pathname#size'
/app/lib/metadata_listener/report/extracted_text.rb:43:in 'MetadataListener::Report::ExtractedText#params'
/app/lib/metadata_listener/report/extracted_text.rb:34:in 'block in MetadataListener::Report::ExtractedText#response'
/app/vendor/bundle/ruby/3.4.0/gems/faraday-2.10.0/lib/faraday/connection.rb:441:in 'block in Faraday::Connection#run_request'
/app/vendor/bundle/ruby/3.4.0/gems/faraday-2.10.0/lib/faraday/connection.rb:458:in 'block in Faraday::Connection#build_request'
/app/vendor/bundle/ruby/3.4.0/gems/faraday-2.10.0/lib/faraday/request.rb:41:in 'block in Faraday::Request.create'
/app/vendor/bundle/ruby/3.4.0/gems/faraday-2.10.0/lib/faraday/request.rb:40:in 'Faraday::Request.create'
/app/vendor/bundle/ruby/3.4.0/gems/faraday-2.10.0/lib/faraday/connection.rb:454:in 'Faraday::Connection#build_request'
/app/vendor/bundle/ruby/3.4.0/gems/faraday-2.10.0/lib/faraday/connection.rb:436:in 'Faraday::Connection#run_request'
/app/vendor/bundle/ruby/3.4.0/gems/faraday-2.10.0/lib/faraday/connection.rb:280:in 'Faraday::Connection#put'
/app/lib/metadata_listener/report/extracted_text.rb:33:in 'MetadataListener::Report::ExtractedText#response'
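This indicates a problem with the extracted text rather than the request itself: the Errno::ENOENT is raised by Pathname#size in ExtractedText#params (extracted_text.rb:43) while the PUT request is being built. Oddly, the missing file is a .txt temp file, not the docx. Perhaps it is the text that was extracted from the docx.
docx files are known to cause issues elsewhere (for example, thumbnail creation with MiniMagick). This failure only affects text extraction, so it is not critical. The virus check still works.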
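One possible mitigation, sketched below, is to treat a missing or empty extracted-text file as "nothing to report" instead of letting Pathname#size raise inside the request block. This is only a sketch: `extracted_text_params` and the shape of the hash it returns are hypothetical and not taken from the actual MetadataListener code, and it does not explain why the temp file is missing in the first place.

```ruby
require "pathname"
require "tempfile"

# Hypothetical helper (not the real ExtractedText#params): returns a small
# payload describing the extracted text file, or nil when the file is
# missing or empty so the caller can skip the upload.
def extracted_text_params(path)
  file = Pathname(path)
  size = file.size # raises Errno::ENOENT if the temp file is gone
  return nil if size.zero?

  { filename: file.basename.to_s, size: size }
rescue Errno::ENOENT
  # Same error as in the stack trace above: no extracted text to report.
  nil
end

# Usage: a real temp file yields a payload, a missing one yields nil.
Tempfile.create(["extracted", ".txt"]) do |tmp|
  tmp.write("hello")
  tmp.flush
  p extracted_text_params(tmp.path)
end
p extracted_text_params("/app/no-such-file.txt")
```

Rescuing the error rather than checking `exist?` first avoids a race where the file disappears between the check and the call to `size`. The caller would still need to decide whether a nil result should be logged, retried, or reported as a failed extraction.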