feat: Add comprehensive Excel support to nf-schema
Implements full Excel file processing functionality for nf-schema, addressing the need for
direct Excel workbook support without manual CSV conversion.
## Key Features
- **Full Excel Format Support**: XLSX, XLSM, XLSB, and XLS files using Apache POI 5.4.1
- **Sheet Selection**: Select specific sheets by name or index via options parameter
- **Data Type Preservation**: Proper handling of strings, numbers, booleans, dates, and formulas
- **Schema Integration**: Full compatibility with existing JSON schema validation pipeline
- **Backward Compatibility**: Zero impact on existing CSV/TSV/JSON/YAML functionality
## Implementation Details
### Core Components
- **WorkbookConverter.groovy**: Main Excel processing class with comprehensive error handling
- **Integration**: Seamless integration with SamplesheetConverter for transparent Excel processing
- **File Type Detection**: Enhanced file type detection in Files utility class
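
The extension check behind the file type detection can be sketched in plain Java. This is a minimal illustration rather than the plugin's Groovy code: `FileTypeSketch` and `typeFromExtension` are hypothetical names, the extension list is taken from the diff in this commit, and the header-based CSV/TSV fallback in `Files` is not reproduced.

```java
import java.util.Set;

// Hypothetical helper mirroring the extension check in the Files utility class.
final class FileTypeSketch {
    // Extensions recognised after this change, as listed in the diff.
    private static final Set<String> KNOWN = Set.of(
        "csv", "tsv", "yml", "yaml", "json", "xlsx", "xlsm", "xlsb", "xls");

    // Returns the normalised file type, or null when the extension is not recognised.
    static String typeFromExtension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1);
        if (KNOWN.contains(ext)) {
            // "yml" is reported as "yaml", as in the original method.
            return ext.equals("yml") ? "yaml" : ext;
        }
        return null;
    }
}
```

Returning `null` here stands in for the point where the real method goes on to inspect the file header. Like the check in the diff, this sketch is case-sensitive.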
### Architecture
- **Clean Separation**: Excel processing handled in dedicated WorkbookConverter class
- **Configuration Integration**: Uses existing ValidationConfig for consistent error handling
- **Modular Design**: Separated header processing, row processing, and cell value extraction
### New Dependencies
- Apache POI 5.4.1 for Excel format support
- POI-OOXML for modern Excel formats (XLSX, XLSM)
- POI-Scratchpad for legacy Office formats (XLS itself is handled by HSSF in the core POI artifact)
## Usage Examples
```nextflow
// Basic Excel usage - works just like CSV
params.input = "samplesheet.xlsx"
params.schema = "assets/schema_input.json"
include { samplesheetToList } from 'plugin/nf-schema'
workflow {
    samplesheet = samplesheetToList(params.input, params.schema)
}
```
```nextflow
// Select specific sheet by name
samplesheet = samplesheetToList(params.input, params.schema, [sheet: "Sample_Data"])
// Select sheet by index (0-based)
samplesheet = samplesheetToList(params.input, params.schema, [sheet: 0])
```
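
The `sheet` option accepts either a name or a 0-based index. One way such an option could be resolved is sketched below in plain Java, with a list of sheet names standing in for the workbook. `SheetSelectorSketch` and `resolveSheetIndex` are hypothetical names, and falling back to the first sheet when no option is given is an assumption, not behaviour confirmed by this commit.

```java
import java.util.List;

// Hypothetical resolver for a `sheet` option given as a String name or Integer index.
final class SheetSelectorSketch {
    static int resolveSheetIndex(Object sheetOption, List<String> sheetNames) {
        if (sheetNames.isEmpty()) {
            throw new IllegalArgumentException("Workbook contains no sheets");
        }
        if (sheetOption == null) {
            return 0; // assumption: use the first sheet when none is requested
        }
        if (sheetOption instanceof Integer) {
            int index = (Integer) sheetOption;
            if (index < 0 || index >= sheetNames.size()) {
                throw new IllegalArgumentException("Sheet index out of range: " + index);
            }
            return index;
        }
        if (sheetOption instanceof String) {
            int index = sheetNames.indexOf(sheetOption);
            if (index < 0) {
                throw new IllegalArgumentException("Sheet not found: " + sheetOption);
            }
            return index;
        }
        throw new IllegalArgumentException("Sheet must be a name or a 0-based index");
    }
}
```

Rejecting unknown names and out-of-range indices up front keeps the failure close to the user's option rather than deep inside row processing.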
## Testing
- WorkbookConverter unit tests with comprehensive error handling scenarios
- File type detection tests for all Excel formats
- Integration tests planned for full workflow validation
## Impact
- **User Experience**: Users can work directly with Excel files from data analysts/collaborators
- **Workflow Simplification**: Eliminates manual CSV conversion step
- **Data Fidelity**: Preserves original data types and formatting
- **Enterprise Ready**: Supports common Excel formats used in research/industry
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
     // Function to detect if a file is a CSV, TSV, JSON, YAML or Excel file
     //
     public static String getFileType(Path file) {
         def String extension = file.getExtension()
-        if (extension in ["csv", "tsv", "yml", "yaml", "json"]) {
+        if (extension in ["csv", "tsv", "yml", "yaml", "json", "xlsx", "xlsm", "xlsb", "xls"]) {
             return extension == "yml" ? "yaml" : extension
         }
 
@@ -46,7 +56,7 @@ public class Files {
         def Integer tabCount = header.count("\t")
 
         if ( commaCount == tabCount ){
-            log.error("Could not derive file type from ${file}. Please specify the file extension (CSV, TSV, YML, YAML and JSON are supported).".toString())
+            log.error("Could not derive file type from ${file}. Please specify the file extension (CSV, TSV, YML, YAML, JSON, and Excel formats are supported).".toString())