An improved library similar to Pandas-Profiling  #8

@lettergram

Howdy!

I'm reaching out as a maintainer of the DataProfiler library, which I think could be useful to your project.

We effectively wrote a library to improve upon the objectives of pandas-profiling with some neat added functionality:

  • Auto-Detect & Load: CSV, AVRO, Parquet, JSON, Text, or URL data, via data = Data("your_filepath_or_url.csv")
  • Profile data: calculates statistics and does entity detection (for PII), via profile = Profiler(data)
  • Merge profiles: profile3 = profile1 + profile2, which enables distributed profile generation
  • Compare profiles: profile_diff = profile1.diff(profile2)
  • Generate reports: readable_report = profile.report(report_options={"output_format": "compact"})

import json
from dataprofiler import Data, Profiler

data = Data("your_file.csv") # Auto-Detect & Load: CSV, AVRO, Parquet, JSON, Text, URL

print(data.data.head(5)) # Access data directly via a compatible Pandas DataFrame

profile = Profiler(data) # Calculate Statistics, Entity Recognition, etc

readable_report = profile.report(report_options={"output_format": "compact"})

print(json.dumps(readable_report, indent=4))
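
To give a rough feel for why mergeable profiles make distributed profiling possible, here is a toy sketch. It is not DataProfiler's implementation: `ToyColumnProfile` and its fields are hypothetical names made up for illustration. The idea is only that statistics such as count, sum, min, and max can be combined from per-chunk results without going back to the raw rows.

```python
from dataclasses import dataclass


@dataclass
class ToyColumnProfile:
    """Hypothetical stand-in for a numeric column profile (not DataProfiler's class)."""
    count: int
    total: float
    minimum: float
    maximum: float

    @classmethod
    def from_values(cls, values):
        # Assumes a non-empty iterable of numbers.
        values = list(values)
        return cls(len(values), float(sum(values)),
                   float(min(values)), float(max(values)))

    @property
    def mean(self):
        return self.total / self.count

    def __add__(self, other):
        # Every field combines from the two summaries alone,
        # so each chunk can be profiled on a different worker.
        return ToyColumnProfile(
            self.count + other.count,
            self.total + other.total,
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
        )

    def diff(self, other):
        # Field-by-field difference between two profiles.
        return {
            "count": self.count - other.count,
            "mean": self.mean - other.mean,
            "min": self.minimum - other.minimum,
            "max": self.maximum - other.maximum,
        }


chunk_a = ToyColumnProfile.from_values([1, 2, 3])
chunk_b = ToyColumnProfile.from_values([4, 5, 6, 7])

merged = chunk_a + chunk_b
full = ToyColumnProfile.from_values([1, 2, 3, 4, 5, 6, 7])

print(merged == full)         # merged chunks match a single pass over all rows
print(chunk_a.diff(chunk_b))  # per-statistic differences between the chunks
```

The real library tracks far more than these four numbers; the sketch only shows the shape of the `+` and `diff` operations mentioned in the list above.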
