Setup instructions for AutoCSV Profiler Suite multi-environment architecture.
- Prerequisites
- Installation Steps
- Verification Procedures
- Common Installation Issues
- Platform-Specific Instructions
- Offline Installation
- Environment Management
Operating System:
- Windows 10 or higher
- macOS 10.14 (Mojave) or higher
- Linux: Ubuntu 18.04+, CentOS 7+, Debian 10+, or equivalent
Hardware:
- 4GB RAM minimum, 8GB recommended for large files
- 3GB free disk space (2GB for conda environments, 1GB for data/outputs)
- Internet connection for initial setup
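The disk requirement above can be checked before starting. The following is a minimal sketch using only the Python standard library; the function names are illustrative and not part of the suite, and it checks the current directory, which may be a different disk from the one where conda stores its environments.

```python
import shutil

REQUIRED_FREE_GB = 3  # figure taken from the hardware list above


def free_disk_gb(path="."):
    """Return free space on the filesystem containing `path`, in GB."""
    usage = shutil.disk_usage(path)
    return usage.free / 1024**3


def has_enough_disk(path=".", required_gb=REQUIRED_FREE_GB):
    """True when at least `required_gb` GB are free at `path`."""
    return free_disk_gb(path) >= required_gb


if __name__ == "__main__":
    free = free_disk_gb(".")
    status = "OK" if free >= REQUIRED_FREE_GB else "LOW"
    print(f"Free disk space: {free:.1f} GB ({status}, need {REQUIRED_FREE_GB} GB)")
```

Pass the conda installation directory as `path` to check the disk that will actually hold the environments.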
Install Anaconda or Miniconda before proceeding:
Option A: Anaconda
- Download from: Anaconda Download
- Full data science environment (~3GB download)
- Includes Jupyter, Spyder, and common packages
Option B: Miniconda
- Download from: Miniconda Download
- Minimal conda installation (~400MB download)
- Faster installation, smaller footprint
Verify conda installation:
conda --version
# Expected output: conda 23.x.x or higher
Base environment requires Python 3.10 or higher:
- Python 3.10, 3.11, 3.12, or 3.13 supported
- The conda environments will use specific Python versions as needed
Check Python version:
python --version
# Expected: Python 3.10.x, 3.11.x, 3.12.x, or 3.13.x
For cloning the repository, install Git from git-scm.com or use the system package manager.
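The three prerequisite checks (conda, Python version, Git) could be combined into one script. This is a sketch under the assumption that the tools are expected on PATH; the helper names are hypothetical and not part of the suite.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum version stated above


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """True when the given interpreter version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def tool_on_path(name):
    """True when an executable called `name` is found on PATH."""
    return shutil.which(name) is not None


def check_prerequisites():
    """Return a dict mapping each prerequisite to True/False."""
    return {
        "python>=3.10": python_ok(),
        "conda": tool_on_path("conda"),
        "git": tool_on_path("git"),  # only needed for the Git clone option
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name:<14} {'found' if ok else 'MISSING'}")
```

A missing `git` entry is not fatal, since the ZIP download option does not need it.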
Option A: Clone with Git
git clone https://github.com/dhaneshbb/autocsv-profiler-suite.git
cd autocsv-profiler-suite
Option B: Download ZIP
- Go to GitHub repository
- Click "Code" → "Download ZIP"
- Extract to preferred location
- Navigate to the extracted folder
Install Python packages for the orchestration layer:
pip install -r requirements.txt
This installs:
- pandas==2.3.1 - Data processing
- pyyaml>=6.0 - Configuration parsing
- rich>=13.0.0 - Terminal interface
- psutil>=5.8.0 - System monitoring
- charset-normalizer>=3.0.0 - File encoding detection
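To see which versions of the base packages actually ended up installed, something like the following sketch may help. It relies on the standard-library `importlib.metadata`; the package list is copied from the list above, and `installed_versions` is an illustrative name, not a function provided by the suite.

```python
from importlib import metadata

# Distribution names as listed above (PyPI names, not import names).
BASE_PACKAGES = ["pandas", "pyyaml", "rich", "psutil", "charset-normalizer"]


def installed_versions(packages=BASE_PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:<20} {version or 'NOT INSTALLED'}")
```

The sketch only reports versions; comparing them against the specifiers in requirements.txt is left to the reader.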
Verify base installation:
python -c "import pandas, yaml, rich, psutil, charset_normalizer; print('Base requirements installed successfully')"
Next, create the isolated conda environments. This is the core setup step.
View all setup options:
python bin/setup_environments.py --help
Parallel Installation (increased processing speed):
python bin/setup_environments.py create --parallel
Sequential Installation (fallback for system limitations):
python bin/setup_environments.py create
Worker Configuration (hardware-specific optimization):
python bin/setup_environments.py create --parallel --workers 2
What this creates (3 specialized conda environments):
- csv-profiler-main: Python 3.11 with numpy 2.2.6, pandas 2.3.1, scipy 1.13.1
- csv-profiler-profiling: Python 3.10 with ydata-profiling 4.16.1, sweetviz 2.3.1
- csv-profiler-dataprep: Python 3.10 with dataprep 0.4.5, pandas 1.5.3
Note: The base environment (where these commands run) serves as the orchestrator and is not a conda environment.
Installation time: 3-8 minutes depending on internet speed and system performance.
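The "parallel first, sequential as fallback" approach described above could be scripted roughly as follows. This sketch assumes that bin/setup_environments.py exits with a non-zero status when creation fails, which the guide does not state; the wrapper functions are hypothetical. Only the `create`, `--parallel`, and `--workers` arguments shown above are used.

```python
import subprocess
import sys

SETUP_SCRIPT = "bin/setup_environments.py"  # path used in the commands above


def build_create_command(parallel=True, workers=None):
    """Build the argv list for the `create` subcommand shown above."""
    cmd = [sys.executable, SETUP_SCRIPT, "create"]
    if parallel:
        cmd.append("--parallel")
        if workers is not None:
            cmd += ["--workers", str(workers)]
    return cmd


def create_with_fallback(run=subprocess.call, workers=None):
    """Try a parallel install; on a non-zero exit code, retry sequentially.

    `run` is injectable so the logic can be exercised without conda.
    Returns the exit code of the last attempt.
    """
    code = run(build_create_command(parallel=True, workers=workers))
    if code != 0:  # assumption: non-zero means the parallel run failed
        print("Parallel install failed; retrying sequentially...")
        code = run(build_create_command(parallel=False))
    return code


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    print(" ".join(build_create_command(parallel=True, workers=2)))
    print(" ".join(build_create_command(parallel=False)))
```

Calling `create_with_fallback()` from the repository root would run the real commands. Whether a failed parallel run leaves partial environments that need removing first is not covered here.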
For contributing or advanced usage:
pip install -r requirements-dev.txt
This adds:
- pytest>=7.0.0 - Testing framework
- mypy>=1.0.0 - Type checking
- black>=23.0.0 - Code formatting
- flake8>=6.0.0 - Code linting
- pre-commit>=3.0.0 - Git hooks
1. Check conda environments exist:
conda env list | grep csv-profiler   # Windows: conda env list | findstr csv-profiler
Expected output:
csv-profiler-main /path/to/anaconda3/envs/csv-profiler-main
csv-profiler-profiling /path/to/anaconda3/envs/csv-profiler-profiling
csv-profiler-dataprep /path/to/anaconda3/envs/csv-profiler-dataprep
2. Test orchestrator:
python bin/run_analysis.py --help
Expected: Help message showing available options.
3. Test environment activation:
# Test main environment
conda activate csv-profiler-main
python -c "import numpy, pandas, scipy, matplotlib, seaborn; print('Main environment OK')"
conda deactivate
# Test profiling environment
conda activate csv-profiler-profiling
python -c "import ydata_profiling, sweetviz; print('Profiling environment OK')"
conda deactivate
# Test dataprep environment
conda activate csv-profiler-dataprep
python -c "import dataprep; print('DataPrep environment OK')"
conda deactivate
Test with a sample CSV file:
# Create test data
echo "name,age,city" > test_sample.csv
echo "Alice,25,New York" >> test_sample.csv
echo "Bob,30,San Francisco" >> test_sample.csv
echo "Carol,35,Chicago" >> test_sample.csv
# Run analysis
python bin/run_analysis.py test_sample.csv
# Clean up
rm test_sample.csv
Expected: Interactive prompts and successful analysis completion.
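The `echo` and `rm` commands above are written for a Unix-style shell; in the Windows Command Prompt, `echo` keeps the surrounding quotes in the file. A portable way to produce the same sample file is sketched below, using the standard `csv` module. The function name is illustrative only.

```python
import csv
from pathlib import Path

# Same rows as the echo commands above.
SAMPLE_ROWS = [
    ["name", "age", "city"],
    ["Alice", "25", "New York"],
    ["Bob", "30", "San Francisco"],
    ["Carol", "35", "Chicago"],
]


def write_sample_csv(path="test_sample.csv"):
    """Write the three-row sample file and return its Path."""
    path = Path(path)
    # newline="" plus an explicit lineterminator gives "\n" line endings
    # on every platform, matching what the shell commands produce.
    with path.open("w", newline="", encoding="utf-8") as handle:
        csv.writer(handle, lineterminator="\n").writerows(SAMPLE_ROWS)
    return path
```

After calling `write_sample_csv()`, run `python bin/run_analysis.py test_sample.csv` as above and delete the file afterwards.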
Error: conda: command not found
Solutions:
- Restart terminal after conda installation
- Add conda to PATH:
  - Windows: Add C:\Anaconda3\Scripts to PATH
  - macOS/Linux: Add ~/anaconda3/bin to PATH
- Initialize conda:
  ~/anaconda3/bin/conda init
  source ~/.bashrc  # or ~/.zshrc
Error: CondaHTTPError or package conflicts
Solutions:
- Update conda:
  conda update conda
- Clear conda cache:
  conda clean --all
- Use sequential installation:
  python bin/setup_environments.py create
- Check internet connectivity:
  ping conda-forge.org
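Because `ping` uses ICMP, which some networks block, an HTTPS request to a channel may be a closer match to what conda does. The sketch below is one way to do that with the standard library. The channel URL is an assumption chosen for illustration; substitute the channels your conda is configured to use (`conda config --show channels`).

```python
import urllib.error
import urllib.request

# Assumed channel URL for illustration; replace with your configured channel.
CHANNEL_URL = "https://conda.anaconda.org/conda-forge/noarch/repodata.json"


def channel_reachable(url=CHANNEL_URL, timeout=5, opener=urllib.request.urlopen):
    """True when an HTTP(S) response of any status comes back from `url`.

    `opener` is injectable so the logic can be tested without a network.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, so the network path works even if the
        # status code signals an error.
        return True
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, or proxy problem.
        return False
```

A `False` result points at the network, proxy, or firewall rather than at conda itself.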
Error: ImportError when testing environments
Solutions:
- Recreate specific environment:
  python bin/setup_environments.py recreate csv-profiler-main
- Manual package installation:
  conda activate csv-profiler-main
  conda install pandas=2.3.1 numpy=2.2.6
- Check environment packages:
  conda list -n csv-profiler-main
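To find out which environment needs recreating, the import checks from the verification steps can be run in each environment without activating it. This is a sketch that uses `conda run -n <env>`; the import statements are copied from the verification steps, and the helper names are hypothetical.

```python
import shutil
import subprocess

# Import checks copied from the verification steps.
ENV_IMPORTS = {
    "csv-profiler-main": "import numpy, pandas, scipy, matplotlib, seaborn",
    "csv-profiler-profiling": "import ydata_profiling, sweetviz",
    "csv-profiler-dataprep": "import dataprep",
}


def build_check_command(env_name, import_stmt):
    """argv that runs `import_stmt` inside `env_name` via `conda run`."""
    conda = shutil.which("conda") or "conda"  # resolve conda.bat/.exe on Windows
    return [conda, "run", "-n", env_name, "python", "-c", import_stmt]


def failing_environments(run=subprocess.call):
    """Return the names of environments whose import check exits non-zero."""
    failed = []
    for env_name, import_stmt in ENV_IMPORTS.items():
        try:
            code = run(build_check_command(env_name, import_stmt))
        except FileNotFoundError:  # conda itself is not on PATH
            code = 1
        if code != 0:
            failed.append(env_name)
    return failed
```

Each name returned by `failing_environments()` can then be passed to `python bin/setup_environments.py recreate <name>`.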
Error: Permission denied during installation
Solutions:
- Run with elevated privileges (if necessary):
  - Windows: Run as Administrator
  - macOS/Linux: Use sudo only for system-wide conda installations
- Use user-level conda installation (preferred):
  - Install Anaconda/Miniconda in user directory
  - No admin rights required
Error: No space left on device
Solutions:
- Check available space:
  df -h     # Linux/macOS
  dir C:\   # Windows
- Clean conda packages:
  conda clean --packages --tarballs
- Change conda environment location:
  conda config --add envs_dirs /path/to/larger/disk
Windows:
- UTF-8 Console Support: The application automatically configures UTF-8 encoding, but manual configuration may be needed:
  chcp 65001
- Long Path Support (Windows 10/11): Enable long paths in Group Policy or Registry if path length errors occur.
- PowerShell vs Command Prompt: Both work, but PowerShell is recommended for better Unicode support.
Follow the standard Installation Steps above, preferably in PowerShell.
macOS:
- Xcode Command Line Tools: Some packages require compilation tools:
  xcode-select --install
- Apple Silicon (M1/M2): Conda environments work natively on Apple Silicon. Use:
  conda config --set subdir osx-arm64
- Homebrew Compatibility: If Homebrew Python is installed, ensure conda environments take precedence.
# Install Xcode tools if needed
xcode-select --install
Execute standard Installation Steps.
Linux:
Python packages are installed via conda environments and do not require additional system dependencies beyond the standard conda installation.
Execute standard Installation Steps.
For environments without internet access, manual environment recreation is required using exported environment specifications.
On a machine with internet access and working environments:
# Export environment specifications
conda env export -n csv-profiler-main > env_main.yml
conda env export -n csv-profiler-profiling > env_profiling.yml
conda env export -n csv-profiler-dataprep > env_dataprep.yml
Copy the .yml files and project source code to the target machine, then run:
# Create environments from exported specifications
conda env create -f env_main.yml
conda env create -f env_profiling.yml
conda env create -f env_dataprep.yml
Note: This requires conda and the necessary packages to be available in the offline environment's conda channels.
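The export and create steps repeat the same command for each of the three environments, so they could be driven from one mapping. The sketch below uses only the `conda env export -n` and `conda env create -f` commands shown above; the function names are illustrative, and on Windows `conda` may need to be resolved with `shutil.which` first.

```python
import subprocess
from pathlib import Path

# Environment name -> specification file, as in the commands above.
ENVIRONMENTS = {
    "csv-profiler-main": "env_main.yml",
    "csv-profiler-profiling": "env_profiling.yml",
    "csv-profiler-dataprep": "env_dataprep.yml",
}


def export_environments(out_dir=".", run=subprocess.run):
    """On the online machine: write one .yml per environment into `out_dir`.

    `run` is injectable so the loop can be tested without conda.
    Returns the list of written paths.
    """
    written = []
    for env_name, filename in ENVIRONMENTS.items():
        target = Path(out_dir) / filename
        with target.open("w", encoding="utf-8") as handle:
            # Equivalent to: conda env export -n NAME > FILE
            run(["conda", "env", "export", "-n", env_name], stdout=handle, check=True)
        written.append(target)
    return written


def create_commands(spec_dir="."):
    """On the target machine: argv lists that recreate each environment."""
    return [
        ["conda", "env", "create", "-f", str(Path(spec_dir) / filename)]
        for filename in ENVIRONMENTS.values()
    ]
```

On the target machine, each list from `create_commands()` can be passed to `subprocess.run(cmd, check=True)`. The note above about package availability in the offline channels still applies.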
List environments:
conda env list
Remove environments:
python bin/setup_environments.py remove --parallel
Recreate a single environment:
python bin/setup_environments.py recreate csv-profiler-main
Update environments:
# Regenerate environment configs from master
python bin/setup_environments.py generate
# Recreate with updated configs
python bin/setup_environments.py remove --parallel
python bin/setup_environments.py create --parallel
To add custom packages to environments:
- Edit master configuration:
  # Edit config/master_config.yml
  # Add packages to appropriate environment sections
- Regenerate environments:
  python bin/setup_environments.py generate
  python bin/setup_environments.py recreate csv-profiler-main
Clean up conda caches (run periodically):
conda clean --packages --tarballs --index-cache
Update conda:
conda update conda
Check environment health:
conda list -n csv-profiler-main --explicit > main_env_check.txt
conda list -n csv-profiler-profiling --explicit > profiling_env_check.txt
conda list -n csv-profiler-dataprep --explicit > dataprep_env_check.txt
After successful installation:
- Read the User Guide for usage instructions
- Try the interactive mode:
  python bin/run_analysis.py
- Check Troubleshooting Guide if issues occur
For Developers:
- Development Setup: See Development Guide for complete development environment setup
- Engine Testing: See Engine Testing Guide for testing individual engines
If issues occur:
- Check Troubleshooting Guide
- Search GitHub Issues
- Create a new issue with system details and error messages

