Guide to diagnosing and resolving common issues with AutoCSV Profiler Suite.
- Environment Setup Issues
- Import and Dependency Errors
- Memory and Performance Problems
- File Processing Errors
- Engine-Specific Issues
- Debug Mode Usage
- Log Analysis Guide
Error Messages:
conda: command not found
'conda' is not recognized as an internal or external command
Causes:
- Conda not installed
- Conda not in system PATH
- Terminal not restarted after conda installation
Solutions:
- Verify conda installation:
# Check if conda is installed
which conda    # Linux/macOS
where conda    # Windows
- Add conda to PATH:
# Linux/macOS: add to ~/.bashrc or ~/.zshrc
export PATH="$HOME/anaconda3/bin:$PATH"
source ~/.bashrc
# Windows: add C:\Anaconda3\Scripts to your PATH environment variable
- Initialize conda:
~/anaconda3/bin/conda init
# Restart terminal
- Reinstall conda if necessary:
  - Download from Anaconda Download
  - Follow platform-specific installation instructions
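The PATH check above can also be run from Python, which behaves the same way on Linux, macOS, and Windows. This is a minimal sketch using only the standard library; the helper name `find_conda` is hypothetical and not part of the Suite.

```python
import shutil
from typing import Optional


def find_conda(executable: str = "conda") -> Optional[str]:
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_conda()
    if path is None:
        print("conda was not found on PATH; install it or add it to PATH")
    else:
        print(f"conda found at: {path}")
```

If this reports that conda is missing while conda is installed, the PATH entry is the likely culprit rather than the installation itself.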
Error Messages:
CondaHTTPError: HTTP 000 CONNECTION FAILED
PackagesNotFoundError: The following packages are not available
CondaValueError: The channel is not accessible
Diagnostic Commands:
# Check conda version
conda --version
# Check channels
conda config --show channels
# Test internet connectivity
ping conda-forge.org
Solutions:
Conda troubleshooting procedures are available in the Installation Guide, under Common Installation Issues.
Additional troubleshooting:
- Reset conda configuration:
conda config --remove-key channels
conda config --add channels conda-forge
conda config --add channels defaults
- Use sequential installation:
# View options first
python bin/setup_environments.py --help
# Try sequential installation
python bin/setup_environments.py create    # Instead of parallel: create --parallel
- Check firewall/proxy settings:
# If behind corporate firewall
conda config --set proxy_servers.http http://proxy.company.com:8080
conda config --set proxy_servers.https https://proxy.company.com:8080
Error Messages:
CondaValueError: prefix already exists
Solutions:
- Remove existing environments:
python bin/setup_environments.py remove --parallel
- Recreate specific environment:
python bin/setup_environments.py recreate csv-profiler-main
- Manual removal:
conda env remove -n csv-profiler-main
conda env remove -n csv-profiler-profiling
conda env remove -n csv-profiler-dataprep
Error Messages:
PermissionError: [Errno 13] Permission denied
OSError: [Errno 13] Permission denied: '/opt/anaconda3/envs/'
Solutions:
- Use user-level conda installation (preferred):
  - Install Anaconda/Miniconda in your home directory
  - No admin rights required
- Fix conda permissions (Linux/macOS):
# Only the owner is changed: macOS usually has no group named after the user
sudo chown -R "$USER" ~/anaconda3
- Change conda environment location:
# Create environments in user-writable location
conda config --add envs_dirs ~/conda-envs
Error Messages:
ImportError: No module named 'autocsv_profiler'
ImportError: cannot import name 'CleanInteractiveMethods'
ModuleNotFoundError: No module named 'ydata_profiling'
Diagnostic Commands:
# Check Python path
python -c "import sys; print('\n'.join(sys.path))"
# Test base requirements
python -c "import pandas, yaml, rich, psutil; print('Base imports OK')"
# Check environment packages
conda list -n csv-profiler-main | grep pandas
conda list -n csv-profiler-profiling | grep ydata-profiling
Solutions:
- Follow installation guide: Complete the Installation Guide steps
- Recreate environments:
python bin/setup_environments.py recreate csv-profiler-profiling
- Check environment activation:
conda info --envs
- Fix project path: Ensure you're in the project root directory
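The base-requirements one-liner above stops at the first failing import. A small sketch like the following reports every missing module at once; `missing_modules` is a hypothetical helper, and the module list is taken from the diagnostic command above. Run it with the interpreter of the environment you want to check.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Base requirements named in the diagnostic command above
    base = ["pandas", "yaml", "rich", "psutil"]
    missing = missing_modules(base)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("Base imports OK")
```

`find_spec` only locates the module without importing it, so this check is quick but will not catch a package that is installed and broken.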
Error Messages:
ImportError: cannot import name 'ProfileReport' from 'ydata_profiling'
TypeError: 'module' object has no attribute 'create_report'
Diagnostic Commands: See Engine Testing Guide for comprehensive diagnostic commands.
Solutions:
- Regenerate environment configurations:
python bin/setup_environments.py generate
python bin/setup_environments.py recreate csv-profiler-profiling
- Manual package installation:
conda activate csv-profiler-profiling
conda install ydata-profiling=4.16.1 sweetviz=2.3.1
- Check environment specification:
# Verify environment files match master config
cat config/environment_profiling.yml
Error Messages:
MemoryError: Unable to allocate array
pandas.errors.ParserError: Error tokenizing data
MemoryError: Memory limit exceeded
Diagnostic Commands:
# Check available memory
free -h # Linux
vm_stat # macOS
wmic OS get TotalVisibleMemorySize,FreePhysicalMemory # Windows
# Check file size
ls -lh data.csv # Linux/macOS
dir data.csv # Windows
Solutions:
- Increase memory limits: Edit config/master_config.yml to increase memory_limit_gb and decrease chunk_size. See User Guide Configuration for details.
- Use smaller chunks:
# Regenerate environments with new settings
python bin/setup_environments.py generate
- Free system memory:
# Close other applications
# Clear browser cache
# Restart system if necessary
- Process files in parts:
# Split large CSV files
split -l 10000 large_file.csv part_
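One caveat with `split -l` is that only the first part keeps the header row, so later parts lose their column names. The following sketch splits a CSV while repeating the header in every part. The function name `split_csv`, the `part_NNN.csv` naming, and the UTF-8 assumption are illustrative choices, not Suite behavior.

```python
import csv
from pathlib import Path


def split_csv(source, rows_per_part=10000, prefix="part_"):
    """Split a CSV into numbered parts next to the source file.

    The header row is repeated at the top of every part.
    Assumes the file is UTF-8 and comma-delimited.
    """
    source = Path(source)
    parts = []
    out = None
    writer = None
    with source.open(newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:
            return parts  # empty file: nothing to split
        for index, row in enumerate(reader):
            if index % rows_per_part == 0:
                # Start a new part: close the previous one and write the header
                if out is not None:
                    out.close()
                part_path = source.with_name(f"{prefix}{len(parts):03d}.csv")
                out = part_path.open("w", newline="", encoding="utf-8")
                writer = csv.writer(out)
                writer.writerow(header)
                parts.append(part_path)
            writer.writerow(row)
    if out is not None:
        out.close()
    return parts
```

Each resulting part can then be analyzed on its own, for example with `python bin/run_analysis.py part_000.csv`.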
Symptoms:
- Analysis takes hours for medium files
- System becomes unresponsive
- High CPU usage
Diagnostic Commands:
# Monitor resource usage during analysis
top # Linux/macOS
taskmgr # Windows
# Check file characteristics
wc -l data.csv # Line count
file data.csv # File type and encoding
Solutions:
- Optimize chunk size: Adjust chunk_size in config/master_config.yml based on your system and file size. See User Guide Performance Tuning for optimization strategies.
- Use selective engines:
# Run only fast engines
# In interactive mode, select: 1,3 (Main + SweetViz)
# Skip YData Profiling for large files
- Enable performance monitoring:
# config/master_config.yml
app:
  logging:
    app:
      performance_metrics: true
Error Messages:
OSError: [Errno 28] No space left on device
IOError: [Errno 28] No space left on device
Solutions:
- Check available space:
df -h     # Linux/macOS
dir C:\   # Windows
- Clean conda cache:
conda clean --packages --tarballs --index-cache
- Change output location:
# Move to drive with more space
python bin/run_analysis.py /path/to/data.csv
# When prompted, specify output directory with more space
- Remove old environments:
python bin/setup_environments.py remove --parallel
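The free-space check above can be scripted so it runs before a long analysis starts. This sketch uses the standard library's `shutil.disk_usage`; the helper names and the 2 GiB threshold in the usage lines are arbitrary illustrations, not Suite requirements.

```python
import shutil


def free_gb(path="."):
    """Return the free space, in GiB, on the filesystem that contains path."""
    return shutil.disk_usage(path).free / 1024**3


def has_room_for(path, required_gb):
    """Return True if at least required_gb GiB are free at path."""
    return free_gb(path) >= required_gb


if __name__ == "__main__":
    # Hypothetical threshold: adjust to the expected size of your reports
    if not has_room_for(".", 2):
        print(f"Only {free_gb('.'):.1f} GiB free; choose another output directory")
```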
Error Messages:
FileNotFoundError: [Errno 2] No such file or directory: 'data.csv'
FileProcessingError: File not found: /path/to/data.csv
Solutions:
- Check file path:
# Verify file exists
ls -la data.csv    # Linux/macOS
dir data.csv       # Windows
# Use absolute path
python bin/run_analysis.py /full/path/to/data.csv
- Check file permissions:
# Linux/macOS
chmod 644 data.csv
# Windows: right-click file -> Properties -> Security
- Handle special characters:
# Quote paths with spaces
python bin/run_analysis.py "/path/with spaces/data.csv"
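The path and permission checks above can be combined into one pre-flight check. This is a sketch only: `check_input_file` is a hypothetical helper and does not reproduce the Suite's own validation.

```python
import os
from pathlib import Path


def check_input_file(path):
    """Return a list of problems found; an empty list means the file looks usable."""
    p = Path(path).expanduser()
    problems = []
    if not p.exists():
        # Show the resolved absolute path so a wrong working directory is obvious
        problems.append(f"not found: {p.resolve()}")
    elif not p.is_file():
        problems.append(f"not a regular file: {p}")
    elif not os.access(p, os.R_OK):
        problems.append(f"not readable by the current user: {p}")
    elif p.stat().st_size == 0:
        problems.append(f"file is empty: {p}")
    return problems


if __name__ == "__main__":
    for problem in check_input_file("data.csv"):
        print("Problem:", problem)
```

Printing the resolved path helps with the most common cause of this error, which is running the command from a directory other than the one the relative path assumes.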
Error Messages:
UnicodeDecodeError: 'utf-8' codec can't decode byte
UnicodeDecodeError: 'charmap' codec can't decode byte
pandas.errors.ParserError: Error tokenizing data
Diagnostic Commands:
# Check file encoding
file -bi data.csv # Linux
file -bI data.csv # macOS (capital I prints the MIME type and charset)
python -c "import chardet; print(chardet.detect(open('data.csv', 'rb').read()))"
Solutions:
- Automatic encoding detection (built-in):
  - The system automatically detects encoding using charset-normalizer
  - Supports UTF-8, UTF-8-BOM, Latin1, ISO-8859-1, CP1252, ASCII
- Manual encoding conversion:
# Convert to UTF-8
iconv -f ISO-8859-1 -t UTF-8 data.csv > data_utf8.csv
# Using Python
python -c "
with open('data.csv', 'r', encoding='iso-8859-1') as f:
    content = f.read()
with open('data_utf8.csv', 'w', encoding='utf-8') as f:
    f.write(content)
"
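When the source encoding is unknown, a simple fallback is to try a short list of candidate encodings in order and convert with the first one that decodes the whole file. The sketch below is a trial-decoding approach, not the charset-normalizer detection the Suite uses; the function names and the candidate order are assumptions. Because `latin-1` accepts every byte sequence, it acts as a last resort and may mislabel text.

```python
def detect_encoding(path, candidates=("utf-8-sig", "cp1252", "latin-1")):
    """Return the first candidate that decodes the whole file, or None.

    "utf-8-sig" reads UTF-8 with or without a byte order mark.
    The file is streamed line by line, so large files are not loaded at once.
    """
    for encoding in candidates:
        try:
            with open(path, "r", encoding=encoding, newline="") as f:
                for _ in f:
                    pass
        except UnicodeDecodeError:
            continue
        else:
            return encoding
    return None


def convert_to_utf8(source, target, source_encoding):
    """Copy source to target, re-encoding it as UTF-8 and keeping line endings."""
    with open(source, "r", encoding=source_encoding, newline="") as src, \
            open(target, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)


if __name__ == "__main__":
    found = detect_encoding("data.csv")
    if found is not None:
        convert_to_utf8("data.csv", "data_utf8.csv", found)
```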
Error Messages:
DelimiterDetectionError: Could not determine delimiter
pandas.errors.ParserError: Error tokenizing data
Detected delimiter: None (confidence: 0.0)
Diagnostic Commands:
# Inspect first few lines
head -5 data.csv
Solutions:
- Manual delimiter specification:
# When prompted during interactive mode
# Enter delimiter manually: ; or \t or |
- Adjust detection settings:
# config/master_config.yml
app:
  delimiter_detection:
    confidence_threshold: 0.5  # Lower threshold
    sample_lines: 50           # More sample lines
- Use common delimiters:
  - Comma: ,
  - Semicolon: ;
  - Tab: \t or type tab
  - Pipe: |
  - Space: type space
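To find out which delimiter to enter manually, the standard library's `csv.Sniffer` can inspect a sample of the file. This is an independent check, not the Suite's detector, and it reports no confidence score; `sniff_delimiter` and its defaults are hypothetical.

```python
import csv


def sniff_delimiter(path, candidates=",;\t|", sample_chars=64 * 1024):
    """Guess the delimiter from the start of the file, or return None.

    Only the characters in candidates are considered.
    """
    with open(path, newline="", encoding="utf-8", errors="replace") as f:
        sample = f.read(sample_chars)
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # Raised when no candidate splits the sample consistently
        return None


if __name__ == "__main__":
    print(repr(sniff_delimiter("data.csv")))
```

A result of `None` usually means the file has a single column or inconsistent rows, in which case inspecting the first lines by hand is the better next step.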
Error Messages:
pandas.errors.ParserError: Error tokenizing data. C error: Expected X fields, got Y
pandas.errors.ParserError: EOF inside string starting at row X
Diagnostic Commands:
# Check line consistency
awk -F',' '{print NF}' data.csv | sort -nu # Count fields per line
Solutions:
- Clean data format:
# Python script to fix common issues
import pandas as pd
# Read with error handling: warn about each malformed row and skip it.
# on_bad_lines (pandas >= 1.3) replaces error_bad_lines/warn_bad_lines,
# which were removed in pandas 2.0.
df = pd.read_csv('data.csv', on_bad_lines='warn')
df.to_csv('data_cleaned.csv', index=False)
- Use robust parsing:
# For problematic files, pandas automatically handles many issues
# The system includes error handling for malformed data
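The `awk` field count above splits on every comma, including commas inside quoted values, so it can flag rows that are actually fine. A sketch using the `csv` module respects quoting and reports which records differ from the header. `field_count_report` is a hypothetical helper; row numbers count CSV records, which can differ from physical line numbers when quoted values contain newlines.

```python
import csv
from collections import Counter


def field_count_report(path, delimiter=",", encoding="utf-8"):
    """Return (field-count tally, record numbers whose width differs from the header)."""
    counts = Counter()
    bad_rows = []
    expected = None
    with open(path, newline="", encoding=encoding) as f:
        reader = csv.reader(f, delimiter=delimiter)
        for row_number, row in enumerate(reader, start=1):
            if expected is None:
                expected = len(row)  # the header defines the expected width
            counts[len(row)] += 1
            if len(row) != expected:
                bad_rows.append(row_number)
    return counts, bad_rows


if __name__ == "__main__":
    tally, bad = field_count_report("data.csv")
    print("Field counts:", dict(tally))
    print("Records with unexpected width:", bad[:20])
```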
Error Messages:
ImportError: cannot import name 'ProfileReport' from 'ydata_profiling'
AttributeError: module 'ydata_profiling' has no attribute 'ProfileReport'
Solutions:
- Check YData version:
conda activate csv-profiler-profiling
python -c "import ydata_profiling; print(ydata_profiling.__version__)"
# Should be 4.16.1
- Reinstall YData Profiling:
conda activate csv-profiler-profiling
conda remove ydata-profiling
conda install ydata-profiling=4.16.1
- Test YData manually: See Engine Testing Guide for testing commands.
Error Messages:
ImportError: No module named 'sweetviz'
AttributeError: module 'sweetviz' has no attribute 'analyze'
Solutions:
- Reinstall SweetViz:
conda activate csv-profiler-profiling
conda install sweetviz=2.3.1
- Test SweetViz manually: See Engine Testing Guide for testing commands.
Error Messages:
ImportError: No module named 'dataprep'
AttributeError: module 'dataprep' has no attribute 'eda'
Solutions:
- Check DataPrep environment:
conda activate csv-profiler-dataprep
python -c "import dataprep; print(dataprep.__version__)"
# Should be 0.4.5
- Reinstall DataPrep:
conda activate csv-profiler-dataprep
conda install dataprep=0.4.5
Error Messages:
ImportError: No module named 'scipy'
ImportError: No module named 'researchpy'
ImportError: No module named 'tableone'
Solutions:
- Check main environment packages:
conda activate csv-profiler-main
conda list | grep -E "scipy|numpy|pandas|researchpy|tableone"
- Reinstall statistical packages:
conda activate csv-profiler-main
conda install scipy=1.13.1 numpy=2.2.6 researchpy=0.3.6 tableone=0.9.5
Command-line debug mode:
python bin/run_analysis.py --debug
Environment variable:
export DEBUG=1
python bin/run_analysis.py
Engine-specific debugging: See Engine Testing Guide for detailed debug commands.
Debug messages format:
[DEBUG Main] Main analyzer started
[DEBUG Main] File path: /path/to/data.csv
[DEBUG Main] Memory limit: 1GB, Chunk size: 10000
[DEBUG YData] YData Profiling report generation started
[DEBUG YData] Loading data with delimiter: ,
Key debug information:
- File paths and validation results
- Memory usage and limits
- Chunk processing progress
- Engine-specific operations
- Error stack traces
Console output: Real-time debug messages
Log files:
# Check for log files in project directory
ls -la *.log
cat autocsv_profiler.log # If logging to file is enabled
Enable detailed logging:
# config/master_config.yml
app:
  logging:
    level: "DEBUG"
    file:
      enabled: true
      level: "DEBUG"
      filename: "autocsv_profiler.log"
    app:
      performance_metrics: true
      structured_debug: true
Successful analysis:
INFO - Analysis started for: data.csv
INFO - Delimiter detected: , (confidence: 0.95)
INFO - Engine main/analyzer completed successfully
INFO - Engine profiling/ydata_report completed successfully
INFO - Analysis completed in 45.2 seconds
Memory issues:
WARNING - Memory usage high: 85% of limit
ERROR - MemoryError: Memory limit exceeded
DEBUG - Chunk size reduced from 10000 to 5000
File processing issues:
ERROR - FileProcessingError: File not found: data.csv
WARNING - Encoding detection confidence low: 0.4
ERROR - DelimiterDetectionError: Could not determine delimiter
Environment issues:
ERROR - ImportError: No module named 'ydata_profiling'
WARNING - Engine profiling/ydata_report failed, skipping
INFO - Continuing with available engines
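The log patterns above can be tallied to get a quick overview of a long log file. The sketch below assumes each entry contains `LEVEL - message`, as in the samples above; the real log lines may carry timestamps or other prefixes, which the pattern tolerates. `summarize_log` is a hypothetical helper.

```python
import re
from collections import Counter

# Matches e.g. "ERROR - MemoryError: Memory limit exceeded"
LEVEL_PATTERN = re.compile(r"\b(DEBUG|INFO|WARNING|ERROR|CRITICAL)\b\s+-\s+(.*)")


def summarize_log(lines):
    """Return (count per level, list of ERROR/CRITICAL messages)."""
    counts = Counter()
    errors = []
    for line in lines:
        match = LEVEL_PATTERN.search(line)
        if not match:
            continue  # continuation lines such as stack traces
        level, message = match.groups()
        counts[level] += 1
        if level in ("ERROR", "CRITICAL"):
            errors.append(message.strip())
    return counts, errors


if __name__ == "__main__":
    with open("autocsv_profiler.log", encoding="utf-8", errors="replace") as f:
        level_counts, error_messages = summarize_log(f)
    print(dict(level_counts))
    for message in error_messages:
        print("ERROR:", message)
```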
Search for errors:
grep -i error autocsv_profiler.log
grep -i "memory\|performance" autocsv_profiler.log
grep -E "ImportError|ModuleNotFoundError" autocsv_profiler.log
Performance analysis:
grep -i "completed in\|processing time" autocsv_profiler.log
grep -i "memory usage\|chunk size" autocsv_profiler.log
Environment diagnostics:
grep -i "environment\|conda" autocsv_profiler.log
grep -E "main|profiling|dataprep" autocsv_profiler.log
If you've tried the solutions above and still have issues:
- Gather diagnostic information:
# System information
python --version
conda --version
conda env list
# Run with debug mode
python bin/run_analysis.py --debug > debug_output.txt 2>&1
- Create minimal test case:
# Create small test file
echo "name,age,city" > test.csv
echo "John,25,NYC" >> test.csv
# Test with debug
python bin/run_analysis.py test.csv --debug
- Submit issue with:
  - Operating system and version
  - Python and conda versions
  - Error messages and debug output
  - Steps to reproduce
  - Test file (if possible)
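The system details listed above can be collected in one step and pasted into the issue. This is a sketch using only the standard library; `run_quietly` and `collect_diagnostics` are hypothetical helpers, and the only external commands invoked are the `conda --version` and `conda env list` calls already shown above.

```python
import platform
import shutil
import subprocess
import sys


def run_quietly(command):
    """Run a command and return its output, or a short note if it cannot run."""
    executable = shutil.which(command[0])
    if executable is None:
        return f"{command[0]}: not found on PATH"
    try:
        result = subprocess.run(
            [executable, *command[1:]],
            capture_output=True, text=True, timeout=60,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"{command[0]}: failed to run ({exc})"
    return (result.stdout + result.stderr).strip()


def collect_diagnostics():
    """Gather OS, Python, and conda details for a bug report."""
    return {
        "os": platform.platform(),
        "python": sys.version.split()[0],
        "conda": run_quietly(["conda", "--version"]),
        "conda_envs": run_quietly(["conda", "env", "list"]),
    }


if __name__ == "__main__":
    for key, value in collect_diagnostics().items():
        print(f"{key}:\n{value}\n")
```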
# View all available options
python bin/setup_environments.py --help
# Restart from scratch - see INSTALLATION.md for complete options
python bin/setup_environments.py recreate --parallel # Recommended approach
# Test individual components
# Test environment health
# For detailed testing commands, see Engine Testing Guide
python -c "import pandas, yaml, rich; print('Base OK')"
# Debug mode
python bin/run_analysis.py --debug
# Check environment health
conda list -n csv-profiler-main --explicit
Complete system reset (use as last resort):
- Remove environments:
python bin/setup_environments.py remove --parallel
- Clean conda:
conda clean --all
- Reinstall: Follow Installation Guide from step 2
- Check Installation Guide for setup issues
- Review User Guide for usage questions
- Use Engine Testing Guide for engine-specific debugging
- Search GitHub Issues
- Create new issue with debug output and system information
- Join GitHub Discussions for community help