camelot-sharp

A C# library to extract tabular data from PDFs, ported from the Python library camelot and built on PdfPig.

The original Python source code is available at camelot-dev/camelot.

NuGet packages are available on the releases page and on www.nuget.org.

Usage

Stream mode

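using UglyToad.PdfPig;   // PdfDocument, ParsingOptions
using Camelot.Parsers;   // Stream (namespace assumed)
using Xunit;             // Assert: the snippet is taken from the unit tests

using (PdfDocument doc = PdfDocument.Open(@"Files\foo.pdf", new ParsingOptions() { ClipPaths = true }))
{
	// Camelot's Stream parser, not System.IO.Stream.
	Stream stream = new Stream();
	var tables = stream.ExtractTables(doc.GetPage(1)); // PdfPig page numbers start at 1

	Assert.Single(tables);
	Assert.Equal((612, 792), stream.Dimensions); // page width and height in PDF points
	Assert.Equal(612, stream.PdfWidth);
	Assert.Equal(792, stream.PdfHeight);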

	var parsingReport = tables[0].ParsingReport();
	// Expected report: { "accuracy": 99.02, "whitespace": 12.24, "order": 1, "page": 1 }
	Assert.Equal(1, parsingReport["order"]);
	Assert.Equal(1, parsingReport["page"]);
}
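The "whitespace" value in the parsing report is, in Python camelot, the percentage of table cells whose text is empty after trimming, rounded to two decimals. The sketch below reproduces that calculation in plain C# so the number can be checked by hand. It does not call camelot-sharp: ParsingReportSketch and WhitespacePercent are hypothetical names, and it assumes camelot-sharp computes the metric the same way as the Python version.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of camelot-sharp. It mirrors the "whitespace"
// metric of Python camelot's parsing report: the percentage of cells whose
// text is empty once surrounding whitespace is ignored.
public static class ParsingReportSketch
{
    public static double WhitespacePercent(IReadOnlyList<IReadOnlyList<string>> data)
    {
        if (data.Count == 0 || data[0].Count == 0)
        {
            throw new ArgumentException("The table needs at least one row and one column.", nameof(data));
        }

        // Count the cells that are null, empty or whitespace only.
        int emptyCells = data.Sum(row => row.Count(cell => string.IsNullOrWhiteSpace(cell)));

        // Like the Python version, divide by rows x columns of the first row.
        double percent = 100.0 * emptyCells / (data.Count * data[0].Count);

        // The report displays the value rounded to two decimals.
        return Math.Round(percent, 2);
    }
}
```

A jagged string[][] such as { { "a", "" }, { " ", "b" } } gives 50, since two of its four cells are blank. With camelot-sharp, the rows returned by tables[0].Data() could be passed in after converting them to string arrays, assuming Data() yields rows of strings.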

Lattice mode

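using System.Linq;
using UglyToad.PdfPig;                                      // PdfDocument, ParsingOptions
using UglyToad.PdfPig.DocumentLayoutAnalysis;               // DlaOptions
using UglyToad.PdfPig.DocumentLayoutAnalysis.PageSegmenter; // DocstrumBoundingBoxes
using Camelot.Parsers;                                      // Lattice (namespace assumed)
using Camelot.ImageProcessing;                              // image and drawing processors (namespace assumed)
using Xunit;                                                // Assert: the snippet is taken from the unit tests

using (var doc = PdfDocument.Open(@"Files\column_span_2.pdf", new ParsingOptions() { ClipPaths = true }))
{
	var page = doc.GetPage(1);

	// OpenCV (OpenCvSharp4) image processor and System.Drawing-based drawing processor.
	// line_scale: the larger the value, the shorter the ruling lines that are detected.
	Lattice lattice = new Lattice(new OpenCvImageProcesser(), new BasicSystemDrawingProcessor(), line_scale: 40);

	// Document layout analysis (DLA) options, here for PdfPig's Docstrum segmenter.
	var tables = lattice.ExtractTables(page,
		layout_kwargs: new DlaOptions[]
		{
			new DocstrumBoundingBoxes.DocstrumBoundingBoxesOptions()
			{
				WithinLineMultiplier = 2
			}
		});
	Assert.Single(tables);

	// DataLatticeShiftTextLeftTop: expected cell text defined in the test suite (not shown here).
	Assert.Equal(DataLatticeShiftTextLeftTop.Length, tables[0].Cells.Count);
	Assert.Equal(DataLatticeShiftTextLeftTop, tables[0].Data().Select(r => r.Select(c => c).ToArray()).ToArray());
}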
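The line_scale argument controls how long a run of dark pixels must be before it counts as a ruling line. In Python camelot, the thresholded page image is eroded and then dilated (a morphological opening) with a 1 x k rectangle, where k is the image width divided by line_scale for horizontal lines (the height, for vertical lines). The sketch below shows the horizontal case on a boolean pixel mask. LineScaleSketch and HorizontalLines are hypothetical names, and the sketch assumes camelot-sharp's OpenCV back end follows the Python implementation. With line_scale: 40 and a page rendered 1000 pixels wide, k would be 25 pixels.

```csharp
using System;

// Hypothetical illustration, not camelot-sharp code. Python camelot finds
// horizontal ruling lines by opening (erode, then dilate) a binarised page
// image with a 1 x k rectangle, where k = width / line_scale.
public static class LineScaleSketch
{
    // Keeps only foreground pixels lying on a horizontal run of at least
    // k consecutive foreground pixels. Pixels outside the image count as
    // background, which differs from OpenCV's default border handling.
    public static bool[,] HorizontalLines(bool[,] image, int lineScale)
    {
        if (lineScale <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(lineScale));
        }

        int height = image.GetLength(0);
        int width = image.GetLength(1);
        int k = Math.Max(1, width / lineScale); // integer division, like Python's //

        var result = new bool[height, width];
        for (int y = 0; y < height; y++)
        {
            int x = 0;
            while (x < width)
            {
                if (!image[y, x])
                {
                    x++;
                    continue;
                }

                // Measure the run of consecutive foreground pixels starting at x.
                int start = x;
                while (x < width && image[y, x])
                {
                    x++;
                }

                // An opening with a 1 x k element keeps exactly the runs of length >= k.
                if (x - start >= k)
                {
                    for (int i = start; i < x; i++)
                    {
                        result[y, i] = true;
                    }
                }
            }
        }
        return result;
    }
}
```

Raising line_scale shrinks k, so shorter segments survive the opening; very large values start picking up text strokes as lines, which is why the value is tuned per document.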

Repository: BobLd/camelot-sharp
