Binary document upload

SolrNet supports Solr's "extract" feature (a.k.a. Solr Cell), which indexes data from binary document formats such as Word and PDF.

Here's a simple example showing how to extract text from a PDF file, without indexing it:

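// solr is an already initialized ISolrOperations<Something> instance
ISolrOperations<Something> solr = ...
using (var file = File.OpenRead(@"test.pdf")) {
    var response = solr.Extract(new ExtractParameters(file, "some_document_id") {
        ExtractOnly = true,                  // extract text only, do not index
        ExtractFormat = ExtractFormat.Text,  // return plain text
    });
    // The extracted text is returned in the response
    Console.WriteLine(response.Content);
}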

ExtractOnly = true tells Solr to perform text extraction only, without indexing the uploaded document. With ExtractOnly = false the document is indexed, and you can add more fields to it through the Fields property. Other options can be set through the properties of the ExtractParameters class. It's usually recommended to provide the StreamType (the MIME type of the content, e.g. application/pdf), as auto-detection might fail.
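As a complement to the extract-only example, here is a minimal sketch of the indexing case: ExtractOnly = false, an explicit StreamType, and one extra field supplied through Fields. The helper class and method names (ExtractExample, IndexPdf), the field name "author" and its value are hypothetical and only serve as illustration; the field must exist in your Solr schema. The sketch also assumes that nothing is committed automatically, so it calls Commit() explicitly.

```csharp
using System.IO;
using SolrNet;

public static class ExtractExample {
    // Hypothetical helper: uploads a PDF to Solr and indexes it
    // together with one additional field.
    public static void IndexPdf<T>(ISolrOperations<T> solr, string path, string id) {
        using (var file = File.OpenRead(path)) {
            solr.Extract(new ExtractParameters(file, id) {
                ExtractOnly = false,              // index the document instead of only extracting text
                StreamType = "application/pdf",   // state the content type rather than relying on auto-detection
                Fields = new[] {
                    // "author" is an assumed field name; adjust it to your schema
                    new ExtractField("author", "John Doe"),
                },
            });
        }
        // Make the newly indexed document visible to searches
        solr.Commit();
    }
}
```

It could be called as ExtractExample.IndexPdf(solr, @"test.pdf", "some_document_id"), reusing the solr instance from the first example.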

For more details about each option in ExtractParameters see the Solr wiki and the Solr reference guide.
