Howdy!
I'm a maintainer of the DataProfiler library, and I think it might be useful to your project.
We effectively wrote a library to improve upon the objectives of pandas-profiling with some neat added functionality:
- Auto-Detect & Load: CSV, AVRO, Parquet, JSON, Text, URL
  data = Data("your_filepath_or_url.csv")
- Profile data: calculating statistics and doing entity detection (for PII)
  profile = Profiler(data)
- Merge profiles, enabling distributed profile generation:
  profile3 = profile1 + profile2
- Compare profiles:
  profile_diff = profile1.diff(profile2)
- Generate reports:
  readable_report = profile.report(report_options={"output_format": "compact"})
import json
from dataprofiler import Data, Profiler
data = Data("your_file.csv") # Auto-Detect & Load: CSV, AVRO, Parquet, JSON, Text, URL
print(data.data.head(5)) # Access data directly via a compatible Pandas DataFrame
profile = Profiler(data) # Calculate Statistics, Entity Recognition, etc
readable_report = profile.report(report_options={"output_format": "compact"})
print(json.dumps(readable_report, indent=4))
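To give a feel for why merging profiles with `+` enables distributed profile generation, here is a minimal pure-Python sketch of mergeable column statistics. It is only an illustration of the general idea and not DataProfiler's actual implementation: `ColumnStats` and `from_values` are hypothetical names, and the merge uses the standard parallel formula for combining means and variances.

```python
import math
from dataclasses import dataclass


@dataclass
class ColumnStats:
    """Hypothetical mergeable summary of one numeric column (not a DataProfiler class)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean
    minimum: float = math.inf
    maximum: float = -math.inf

    @classmethod
    def from_values(cls, values):
        """Build a summary in one pass over a chunk of data (Welford's update)."""
        stats = cls()
        for v in values:
            stats.count += 1
            delta = v - stats.mean
            stats.mean += delta / stats.count
            stats.m2 += delta * (v - stats.mean)
            stats.minimum = min(stats.minimum, v)
            stats.maximum = max(stats.maximum, v)
        return stats

    def __add__(self, other):
        """Combine two summaries without revisiting the underlying rows."""
        n = self.count + other.count
        if n == 0:
            return ColumnStats()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / n
        m2 = self.m2 + other.m2 + delta * delta * self.count * other.count / n
        return ColumnStats(
            count=n,
            mean=mean,
            m2=m2,
            minimum=min(self.minimum, other.minimum),
            maximum=max(self.maximum, other.maximum),
        )

    @property
    def variance(self):
        """Sample variance; defined as 0.0 when fewer than two values were seen."""
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0


# Two chunks that could have been summarized on different machines.
chunk1 = [1.0, 2.0, 3.0]
chunk2 = [4.0, 5.0, 6.0, 7.0]

merged = ColumnStats.from_values(chunk1) + ColumnStats.from_values(chunk2)
full = ColumnStats.from_values(chunk1 + chunk2)
print(merged.count, merged.mean, merged.variance)
```

Because each chunk is reduced to a small summary, the merged result matches a single pass over all the rows, which is what makes per-partition profiling followed by a merge practical.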