# NeuroLite

**NeuroLite** is an automated AI/ML library that intelligently detects data types and applies well-suited models with minimal code. It simplifies the machine learning workflow by providing comprehensive data analysis, quality assessment, and model recommendations.

## 🚀 Features

### 🔍 Intelligent Data Detection
- **File Format Detection**: Supports 20+ formats (CSV, JSON, XML, Excel, Parquet, HDF5, images, audio, video)
- **Data Structure Analysis**: Automatically identifies tabular, time series, image, text, and audio data
- **Column Type Classification**: Numerical, categorical, temporal, text, and binary data types
- **Domain-Specific Detection**: Computer vision, NLP, and time series task identification

### 📊 Comprehensive Quality Assessment
- **Missing Data Analysis**: MCAR, MAR, MNAR pattern detection with imputation recommendations
- **Data Consistency Validation**: Duplicate detection, format consistency, range validation
- **Statistical Properties**: Distribution analysis, correlation detection, outlier identification
- **Quality Metrics**: Completeness, consistency, validity, and uniqueness scoring

### 🤖 Automated Model Recommendations
- **Traditional ML Models**: Decision Trees, Random Forests, SVMs, Linear Models
- **Deep Learning**: CNNs, RNNs/LSTMs, Transformers, Autoencoders
- **Preprocessing Suggestions**: Normalization, encoding, feature scaling strategies
- **Performance Estimation**: Resource requirements, complexity assessment

## 📦 Installation

### Basic Installation
```bash
pip install neurolite
```

### Development Installation
```bash
git clone https://github.com/dot-css/neurolite
cd neurolite
pip install -e ".[dev]"
```

### Full Installation (with all optional dependencies)
```bash
pip install "neurolite[all]"
```
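
After installing, a quick way to confirm the package is importable (this assumes NeuroLite exposes a `__version__` attribute, which is a common packaging convention but not documented here):

```python
# Sanity check: the package imports and reports its version.
# `__version__` is assumed to exist per the usual Python packaging convention.
import neurolite

print(neurolite.__version__)
```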

## 🏃‍♂️ Quick Start

### Basic Usage (3 lines of code!)
```python
from neurolite import DataProfiler

# Analyze any dataset with a single function call
profiler = DataProfiler()
report = profiler.analyze('your_data.csv')

# Get comprehensive insights
print(f"Data type: {report.data_structure.structure_type}")
print(f"Quality score: {report.quality_metrics.completeness:.2f}")
print(f"Recommended models: {[r.model_name for r in report.model_recommendations[:3]]}")
```

### Advanced Usage
```python
from neurolite import DataProfiler
import pandas as pd

# Load your data
df = pd.read_csv('your_dataset.csv')

# Initialize profiler
profiler = DataProfiler()

# Perform comprehensive analysis
report = profiler.analyze(df)

# Access detailed results
print("=== File Information ===")
print(f"Format: {report.file_info.format_type}")
print(f"Structure: {report.data_structure.structure_type}")
print(f"Dimensions: {report.data_structure.dimensions}")

print("\n=== Quality Assessment ===")
print(f"Completeness: {report.quality_metrics.completeness:.2%}")
print(f"Consistency: {report.quality_metrics.consistency:.2%}")
print(f"Missing Pattern: {report.quality_metrics.missing_pattern}")

print("\n=== Column Analysis ===")
for col, analysis in report.column_analysis.items():
    print(f"{col}: {analysis.primary_type} ({analysis.subtype})")

print("\n=== Model Recommendations ===")
for rec in report.model_recommendations[:5]:
    print(f"- {rec.model_name} ({rec.confidence:.2%} confidence)")
    print(f"  Rationale: {rec.rationale}")
```

### Specific Detectors
```python
from neurolite.detectors import QualityDetector, DataTypeDetector, FileDetector
import pandas as pd

df = pd.read_csv('your_dataset.csv')

# Quality assessment
quality_detector = QualityDetector()
quality_report = quality_detector.analyze_quality(df)
missing_analysis = quality_detector.detect_missing_patterns(df)
duplicate_analysis = quality_detector.find_duplicates(df)

# Data type detection
type_detector = DataTypeDetector()
column_types = type_detector.classify_columns(df)

# File format detection
file_detector = FileDetector()
file_format = file_detector.detect_format('data.csv')
data_structure = file_detector.detect_structure(df)
```

## 📋 Supported Data Types

### File Formats
- **Tabular**: CSV, TSV, Excel (.xlsx, .xls), Parquet, HDF5
- **Structured**: JSON, XML, YAML
- **Images**: PNG, JPG, JPEG, TIFF, BMP, GIF
- **Audio**: WAV, MP3, FLAC, OGG
- **Video**: MP4, AVI, MOV, MKV
- **Text**: TXT, MD, PDF, DOC

### Data Structures
- **Tabular Data**: Structured datasets with rows and columns
- **Time Series**: Sequential data with temporal patterns
- **Image Data**: Computer vision datasets
- **Text Corpus**: Natural language processing datasets
- **Audio Signals**: Speech and audio analysis datasets

### Column Types
- **Numerical**: Integer, float, continuous, discrete
- **Categorical**: Nominal, ordinal, high/low cardinality
- **Temporal**: Dates, timestamps, time series
- **Text**: Natural language, categorical text, structured text
- **Binary**: Boolean, binary-encoded data
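
The Quick Start above only profiles a CSV file. A minimal sketch of pointing the same entry point at a few of the formats listed here, assuming `DataProfiler.analyze()` accepts a path to any supported format (only CSV and DataFrame inputs are demonstrated elsewhere in this README), with hypothetical file names:

```python
from neurolite import DataProfiler

profiler = DataProfiler()

# Hypothetical inputs -- replace with your own files; support for each
# format follows the tables above, but only CSV/DataFrame usage is shown
# elsewhere in this README.
for path in ['events.json', 'sensor_readings.parquet', 'interview.wav']:
    report = profiler.analyze(path)
    print(f"{path}: {report.data_structure.structure_type}")
```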

## 🔧 Configuration

### Environment Variables
```bash
export NEUROLITE_CACHE_DIR="/path/to/cache"
export NEUROLITE_LOG_LEVEL="INFO"
export NEUROLITE_MAX_MEMORY="8GB"
```
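
These variables can also be set from Python before the library is first used; a minimal sketch using only the standard library, assuming NeuroLite reads the variables at import or initialization time:

```python
import os

# Set NeuroLite's environment variables programmatically (e.g. in a notebook).
# Assumption: the library picks these up when it is imported/initialized.
os.environ["NEUROLITE_CACHE_DIR"] = "/tmp/neurolite-cache"
os.environ["NEUROLITE_LOG_LEVEL"] = "DEBUG"
os.environ["NEUROLITE_MAX_MEMORY"] = "4GB"

from neurolite import DataProfiler  # import after the variables are set

profiler = DataProfiler()
```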

### Configuration File
Create `~/.neurolite/config.yaml`:
```yaml
cache:
  enabled: true
  directory: "~/.neurolite/cache"
  max_size: "1GB"

analysis:
  max_file_size: "1GB"
  timeout: 300
  confidence_threshold: 0.8

models:
  enable_deep_learning: true
  enable_traditional_ml: true
  max_recommendations: 10
```
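
To generate the same file from code, a small sketch using PyYAML (the keys mirror the example above; PyYAML is an assumption here, not a documented NeuroLite dependency):

```python
from pathlib import Path

import yaml  # pip install pyyaml

config = {
    "cache": {"enabled": True, "directory": "~/.neurolite/cache", "max_size": "1GB"},
    "analysis": {"max_file_size": "1GB", "timeout": 300, "confidence_threshold": 0.8},
    "models": {"enable_deep_learning": True, "enable_traditional_ml": True, "max_recommendations": 10},
}

# Write the config to the default location, creating ~/.neurolite if needed.
config_path = Path.home() / ".neurolite" / "config.yaml"
config_path.parent.mkdir(parents=True, exist_ok=True)
config_path.write_text(yaml.safe_dump(config, sort_keys=False))
```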

## 🧪 Testing

Run the test suite:
```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=neurolite --cov-report=html

# Run specific test categories
pytest tests/test_quality_detector.py
pytest tests/test_data_type_detector.py
```

## 📚 Documentation

- **API Reference**: [https://neurolite.readthedocs.io/](https://neurolite.readthedocs.io/)
- **User Guide**: [docs/user_guide.md](docs/user_guide.md)
- **Examples**: [examples/](examples/)
- **Contributing**: [CONTRIBUTING.md](CONTRIBUTING.md)

## 🤝 Contributing

We welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details.

### Development Setup
```bash
git clone https://github.com/dot-css/neurolite
cd neurolite
pip install -e ".[dev]"
pre-commit install
```

### Running Tests and Linters
```bash
pytest tests/
black neurolite/ tests/
flake8 neurolite/ tests/
mypy neurolite/
```

## 📈 Performance

NeuroLite is designed for performance:
- **Fast Analysis**: < 5 seconds for datasets up to 1GB
- **Memory Efficient**: Streaming and lazy loading for large datasets
- **Parallel Processing**: Multi-core support for complex analyses
- **Caching**: Intelligent caching for repeated analyses (see the sketch below)
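
A rough way to observe the caching behavior on your own data; this is a sketch rather than a benchmark, and any speedup on the second call assumes caching is enabled as configured above:

```python
import time

from neurolite import DataProfiler

profiler = DataProfiler()

def timed_analysis(path):
    # Time a single profiling run; uses only the analyze() call shown in Quick Start.
    start = time.perf_counter()
    report = profiler.analyze(path)
    print(f"{path}: analyzed in {time.perf_counter() - start:.2f}s")
    return report

timed_analysis('your_data.csv')  # cold run
timed_analysis('your_data.csv')  # repeated run; faster if caching kicks in
```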

## 🛣️ Roadmap

- [ ] **v0.2.0**: Enhanced deep learning model recommendations
- [ ] **v0.3.0**: Real-time data stream analysis
- [ ] **v0.4.0**: AutoML pipeline integration
- [ ] **v0.5.0**: Distributed processing support
- [ ] **v1.0.0**: Production-ready release

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

- Built with ❤️ by the NeuroLite team
- Inspired by the need for accessible AI/ML tools
- Thanks to all contributors and the open-source community

## 📞 Support

- **Issues**: [GitHub Issues](https://github.com/dot-css/neurolite/issues)
- **Discussions**: [GitHub Discussions](https://github.com/dot-css/neurolite/discussions)
- **Documentation**: [https://neurolite.readthedocs.io/](https://neurolite.readthedocs.io/)

---

**Made with ❤️ for the AI/ML community**