Commit a05133a

Various stuff (#24)
1 parent 1add948 commit a05133a

14 files changed, +1242 -121 lines changed

CHANGELOG.md

Lines changed: 8 additions & 3 deletions
@@ -7,21 +7,26 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 This changelog was started for release 0.0.3.
 
-## [0.0.3] - Unreleased
+## [0.0.3] - 21/11/2022
 
 ### Added
 
-- empty_ok_if key for validator
-- empty_ok_unless key for validator
+- empty_ok_if key for validator & templates
+- empty_ok_unless key for validator & templates
 - readme key for validator
 - unique key for validator
 - expected_rows key for templates
 - logs parameters for templates
+- na_ok key for validators & templates
+- skip_generation key for validators & templates
+- skip_validation key for validators & templates
 
 ### Fixed
 
 - Bug for setValidator when using number values
+- Fixed regex for GPS
 
 ### Changed
 
 - Better validation for integers
+- Refactor validation in excel for most validators (to include unique & na_ok)

README.md

Lines changed: 26 additions & 18 deletions
@@ -1,8 +1,8 @@
 # Checkcel
 
 Checkcel is a generation & validation tool for CSV/ODS/XLSX/XLS files.
-Basic validations (sets, whole, decimals, unicity, emails, dates) are included, but also ontologies validation.
-(Using the [OLS API](https://www.ebi.ac.uk/ols/index))
+Basic validations (sets, whole, decimals, unicity, emails, dates, regex) are included, but also ontologies validation.
+(Using the [OLS API](https://www.ebi.ac.uk/ols/index), and the [INRAE thesaurus](https://consultation.vocabulaires-ouverts.inrae.fr))
 
 Checkcel works with either python templates or json/yml files for the generation and validation.
 Examples are available [here](https://github.com/mboudet/checkcel_templates) or in the [example folder](examples/).
@@ -98,6 +98,7 @@ Checkcel(
     sheet="0"
 ).load_from_json_file(your_json_template_file).validate()
 
+# You can access the logs from python with the 'logs' key of the Checkcel class
 ```
 
 # Templates
@@ -108,8 +109,12 @@ In all cases, you will need to at least include a list of validators and associa
 * *metadata*: A list of column names. This will create a metadata sheet with these columns, without validation on them
 * *expected_rows*: (Default 0): Number of *data* rows expected
 * *empty_ok* (Default False): Whether to accept empty values as valid
-* *ignore_space* (Default False): whether to trim the values for spaces before checking validity
-* *ignore_case* (Default False): whether to ignore the case
+* *na_ok* (Default False): whether to allow NA (or n/a) values as valid
+* *ignore_space* (Default False): whether to trim the values for spaces before checking validity in python
+* *ignore_case* (Default False): whether to ignore the case (when relevant)before checking validity in python
+* *skip_generation* (Default False): whether to skip the excel validation generation (for file generation) for all validators
+* *skip_validation* (Default False): whether to skip the python validation for all validators
+* *unique* (Default False): whether to require unicity for all validators
 
 The last 3 parameters will affect all the validators (when relevant), but can be overriden at the validator level (eg, you can set 'empty_ok' to True for all, but set it to False for a specific validator).
 
@@ -155,66 +160,69 @@ All validators (except NoValidator) have these options available. If relevant, t
     * The dict keys must be column names, and the values lists of 'rejected values'. The current column will accept empty values if the related column's value is **not** in the list of reject values
 * *ignore_space* (Default False): whether to trim the values for spaces before checking validity
 * *ignore_case* (Default False): whether to ignore the case
-* *unique* (Default False): whether to enforce unicity for this column. (Not enforced in excel yet, except if there are not other validation (ie TextValidator and RegexValidator in some cases))
+* *unique* (Default False): whether to enforce unicity for this column. (Not enforced in excel for 'Set-type' validators (set, linked-set, ontology, vocabulaireOuvert))
+* *na_ok* (Default False): whether to allow NA (or n/a) values as valid.
+* *skip_generation* (Default False): whether to skip the excel validation for this validator (for file generation)
+* *skip_validation* (Default False): whether to skip the python validation for this validator
 
 *As excel validation for non-empty values is unreliable, the non-emptiness cannot be properly enforced in excel files*
 
 ### Validator-specific options
 
 * NoValidator (always True)
     * **No in-file validation generated**
-* TextValidator(empty_ok=False)
+* TextValidator(**kwargs)
     * **No in-file validation generated** (unless *unique* is set)
-* IntValidator(min="", max="", empty_ok=False)
+* IntValidator(min="", max="", **kwargs)
     * Validate that a value is an integer
     * *min*: Minimal value allowed
     * *max*: Maximal value allowed
-* FloatValidator(min="", max="", empty_ok=False)
+* FloatValidator(min="", max="", **kwargs)
     * Validate that a value is an float
     * *min*: Minimal value allowed
     * *max*: Maximal value allowed
-* SetValidator(valid_values=[], empty_ok=False)
+* SetValidator(valid_values=[], **kwargs)
     * Validate that a value is part of a set of allowed values
     * *valid_values*: list of valid values
-* LinkedSetValidator(linked_column="", valid_values={}, empty_ok=False)
+* LinkedSetValidator(linked_column="", valid_values={}, **kwargs)
     * Validate that a value is part of a set of allowed values, in relation to another column value.
         * Eg: Valid values for column C will be '1' or '2' if column B value is 'Test', else '3' or '4'
     * *linked_column*: Linked column name
     * *valid_values*: Dict with the *linked_column* values as keys, and list of valid values as values
         * Ex: {"Test": ['1', '2'], "Test2": ['3', '4']}
-* EmailValidator(empty_ok=False)
-* DateValidator(day_first=True, empty_ok=False, before=None, after=None)
+* EmailValidator(**kwargs)
+* DateValidator(day_first=True, before=None, after=None, **kwargs)
     * Validate that a value is a date.
     * *day_first* (Default True): Whether to consider the day as the first part of the date for ambiguous values.
     * *before* Latest date allowed
     * *after*: Earliest date allowed
-* TimeValidator(empty_ok=False, before=None, after=None)
+* TimeValidator(before=None, after=None, **kwargs)
     * Validate that a value is a time of the day
     * *before* Latest value allowed
     * *after*: Earliest value allowed
-* UniqueValidator(unique_with=[], empty_ok=False)
+* UniqueValidator(unique_with=[], **kwargs)
     * Validate that a column has only unique values.
     * *unique_with*: List of column names if you need a tuple of column values to be unique.
         * Ex: *I want the tuple (value of column A, value of column B) to be unique*
-* OntologyValidator(ontology, root_term="", empty_ok=False)
+* OntologyValidator(ontology, root_term="", **kwargs)
     * Validate that a term is part of an ontology, using the [OLS API](https://www.ebi.ac.uk/ols/index) for validation
     * *ontology* needs to be a short-form ontology name (ex: ncbitaxon)
     * *root_term* can be used if you want to make sure your terms are *descendants* of a specific term
         * (Should be used when generating validated files using big ontologies)
-* VocabulaireOuvertValidator(root_term="", lang="en", labellang="en", vocab="thesaurus-inrae", empty_ok=False)
+* VocabulaireOuvertValidator(root_term="", lang="en", labellang="en", vocab="thesaurus-inrae", **kwargs)
     * Validate that a term is part of the INRAE(default) or IRSTEA thesaurus
     * **No in-file validation generated** *unless using root_term*
     * *root_term*: Same as OntologyValidator.
     * *lang*: Language for the queried terms *(en or fr)*
     * *labellang*: Language for the queries returns (ie, the generated validation in files). Default to *lang* values.
     * *vocab*: Vocabulary used. Either 'thesaurus-inrae' or 'thesaurus-irstea'.
-* GPSValidator(empty_ok=False, format="DD", only_long=False, only_lat=False)
+* GPSValidator(format="DD", only_long=False, only_lat=False, **kwargs)
     * Validate that a term is a valid GPS cordinate
     * **No in-file validation generated**
     * *format*: Expected GPS format. Valid values are *dd* (decimal degrees, default value) or *dms* (degree minutes seconds)
     * *only_long*: Expect only a longitude
     * *only_lat*: Expect only a latitude
-* RegexValidator(regex, excel_formulat="", empty_ok=False)
+* RegexValidator(regex, excel_formulat="", **kwargs)
     * Validate that a term match a specific regex
     * **No in-file validation generated** *unless using excel_formula*
     * *excel_formula*: Custom rules for in-file validation. [Examples here](http://www.contextures.com/xlDataVal07.html).
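
As an illustration of the keys documented in this README change, a JSON template might combine template-wide flags with per-validator overrides roughly as follows. This is only a sketch: the `validators`, `name` and `options` key names, the column names and the option values are assumptions made for the example, since the diff itself only shows that each validator entry carries a `type` and that the boolean flags are read from the top level of the template.

```python
import json

# Hypothetical template. The "validators", "name" and "options" keys and the
# column names are assumptions for illustration, not confirmed by this diff.
template = {
    "na_ok": True,             # template-wide default: accept NA / n/a values
    "unique": False,
    "skip_generation": False,
    "skip_validation": False,
    "validators": [
        # Per-validator options are meant to override the template-wide flags.
        {"type": "TextValidator", "name": "Sample ID", "options": {"unique": True}},
        {"type": "IntValidator", "name": "Count", "options": {"min": 0, "na_ok": False}},
        {"type": "SetValidator", "name": "Status",
         "options": {"valid_values": ["open", "closed"], "skip_generation": True}},
    ],
}

# Serialise to the JSON text that would be saved as the template file.
template_json = json.dumps(template, indent=2)
print(template_json)
```

Saved to a file, such a template would then be passed to `load_from_json_file(...)` as in the README usage snippet earlier in this diff.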

checkcel/checkerator.py

Lines changed: 8 additions & 5 deletions
@@ -42,7 +42,7 @@ def generate(self):
             if isinstance(validator, OntologyValidator) or isinstance(validator, VocabulaireOuvertValidator):
                 if not ontology_sheet:
                     ontology_sheet = wb.create_sheet(title="Ontologies")
-                data_validation = validator.generate(get_column_letter(current_data_column), get_column_letter(current_ontology_column), ontology_sheet)
+                data_validation = validator.generate(get_column_letter(current_data_column), column_name, get_column_letter(current_ontology_column), ontology_sheet)
                 current_ontology_column += 1
             elif isinstance(validator, SetValidator):
                 # Total size, including separators must be < 256
@@ -52,25 +52,28 @@ def generate(self):
                     data_validation = validator.generate(get_column_letter(current_data_column), column_name, get_column_letter(current_set_column), set_sheet)
                     current_set_column += 1
                 else:
-                    data_validation = validator.generate(get_column_letter(current_data_column))
+                    data_validation = validator.generate(get_column_letter(current_data_column), column_name)
                 set_columns[column_name] = get_column_letter(current_data_column)
             elif isinstance(validator, LinkedSetValidator):
                 if not set_sheet:
                     set_sheet = wb.create_sheet(title="Sets")
-                data_validation = validator.generate(get_column_letter(current_data_column), set_columns, column_name, get_column_letter(current_set_column), set_sheet, wb)
+                data_validation = validator.generate(get_column_letter(current_data_column), column_name, set_columns, get_column_letter(current_set_column), set_sheet, wb)
                 current_set_column += 1
                 set_columns[column_name] = get_column_letter(current_data_column)
             elif isinstance(validator, UniqueValidator):
-                data_validation = validator.generate(get_column_letter(current_data_column), column_dict)
+                data_validation = validator.generate(get_column_letter(current_data_column), column_name, column_dict)
             else:
-                data_validation = validator.generate(get_column_letter(current_data_column))
+                data_validation = validator.generate(get_column_letter(current_data_column), column_name)
             if data_validation:
                 data_sheet.add_data_validation(data_validation)
             current_data_column += 1
         for sheet in wb.worksheets:
             for column_cells in sheet.columns:
                 length = (max(len(self.as_text(cell.value)) for cell in column_cells) + 2) * 1.2
                 sheet.column_dimensions[get_column_letter(column_cells[0].column)].width = length
+
+        if self.freeze_header:
+            data_sheet.freeze_panes = "A2"
         wb.save(filename=self.output)
 
     def as_text(self, value):

checkcel/checkplate.py

Lines changed: 19 additions & 4 deletions
@@ -15,19 +15,24 @@
 
 class Checkplate(object):
     """ Base class for templates """
-    def __init__(self, validators={}, empty_ok=False, ignore_case=False, ignore_space=False, metadata=[], expected_rows=None):
+    def __init__(self, validators={}, empty_ok=False, ignore_case=False, ignore_space=False, metadata=[], expected_rows=None, na_ok=False, unique=False, skip_generation=False, skip_validation=False, freeze_header=False):
         self.metadata = metadata
         self.logger = logs.logger
         self.validators = validators or getattr(self, "validators", {})
         self.logs = []
         # Will be overriden by validators config
         self.empty_ok = empty_ok
+        self.na_ok = na_ok
+        self.unique = unique
+        self.skip_generation = skip_generation
+        self.skip_validation = skip_validation
         self.ignore_case = ignore_case
         self.ignore_space = ignore_space
         self.expected_rows = expected_rows
+        self.freeze_header = freeze_header
         # self.trim_values = False
         for validator in self.validators.values():
-            validator._set_attributes(self.empty_ok, self.ignore_case, self.ignore_space)
+            validator._set_attributes(self.empty_ok, self.ignore_case, self.ignore_space, self.na_ok, self.unique, self.skip_generation, self.skip_validation)
 
     def debug(self, message):
         self.logger.debug(message)
@@ -69,9 +74,14 @@ def load_from_python_file(self, file_path):
         self.metadata = getattr(custom_class, 'metadata', [])
         self.validators = deepcopy(custom_class.validators)
         self.empty_ok = getattr(custom_class, 'empty_ok', False)
+        self.na_ok = getattr(custom_class, 'na_ok', False)
+        self.unique = getattr(custom_class, 'unique', False)
+        self.skip_generation = getattr(custom_class, 'skip_generation', False)
+        self.skip_validation = getattr(custom_class, 'skip_validation', False)
         self.ignore_case = getattr(custom_class, 'ignore_case', False)
         self.ignore_space = getattr(custom_class, 'ignore_space', False)
         self.expected_rows = getattr(custom_class, 'expected_rows', 0)
+        self.freeze_header = getattr(custom_class, 'freeze_header', False)
         try:
             self.expected_rows = int(self.expected_rows)
         except ValueError:
@@ -80,7 +90,7 @@ def load_from_python_file(self, file_path):
             )
 
         for key, validator in self.validators.items():
-            validator._set_attributes(self.empty_ok, self.ignore_case, self.ignore_space)
+            validator._set_attributes(self.empty_ok, self.ignore_case, self.ignore_space, self.na_ok, self.unique, self.skip_generation, self.skip_validation)
         return self
 
     def load_from_json_file(self, file_path):
@@ -136,9 +146,14 @@ def _load_from_dict(self, data):
             return exits.UNAVAILABLE
 
         self.empty_ok = data.get("empty_ok", False)
+        self.na_ok = data.get("na_ok", False)
         self.ignore_case = data.get('ignore_case', False)
         self.ignore_space = data.get('ignore_space', False)
         self.expected_rows = data.get('expected_rows', 0)
+        self.unique = data.get('unique', False)
+        self.skip_generation = data.get('skip_generation', False)
+        self.skip_validation = data.get('skip_validation', False)
+        self.freeze_header = data.get('freeze_header', False)
         try:
             self.expected_rows = int(self.expected_rows)
         except ValueError:
@@ -161,7 +176,7 @@ def _load_from_dict(self, data):
             try:
                 validator_class = getattr(validators, validator['type'])
                 val = validator_class(**options)
-                val._set_attributes(self.empty_ok, self.ignore_case, self.ignore_space)
+                val._set_attributes(self.empty_ok, self.ignore_case, self.ignore_space, self.na_ok, self.unique, self.skip_generation, self.skip_validation)
             except AttributeError:
                 self.error(
                     "{} is not a valid Checkcel Validator".format(validator['type'])
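
The calls above hand seven template-level flags to every validator, and the README states that a validator-level setting takes precedence over the template-level one. The validator side of `_set_attributes` is not part of this excerpt, so the following is only a sketch of one way that precedence could work. `SketchValidator` is a hypothetical stand-in, and the use of `None` as an "unset" marker is an assumption rather than the library's actual mechanism; only the argument order is taken from the calls in this diff.

```python
class SketchValidator:
    """Hypothetical stand-in for a validator; not Checkcel's actual code."""

    FLAGS = ("empty_ok", "ignore_case", "ignore_space", "na_ok",
             "unique", "skip_generation", "skip_validation")

    def __init__(self, **kwargs):
        # None marks a flag that the validator did not set explicitly.
        for flag in self.FLAGS:
            setattr(self, flag, kwargs.get(flag))

    def _set_attributes(self, empty_ok=False, ignore_case=False, ignore_space=False,
                        na_ok=False, unique=False, skip_generation=False,
                        skip_validation=False):
        # Same argument order as the calls in the diff; the body is assumed.
        defaults = dict(zip(self.FLAGS, (empty_ok, ignore_case, ignore_space, na_ok,
                                         unique, skip_generation, skip_validation)))
        for flag, value in defaults.items():
            # Fill in a template default only where the validator left the flag unset.
            if getattr(self, flag) is None:
                setattr(self, flag, value)


inherits = SketchValidator()                            # takes every template default
overrides = SketchValidator(na_ok=False, unique=True)   # keeps its own two values
for validator in (inherits, overrides):
    # Template-level flags: only na_ok is switched on.
    validator._set_attributes(False, False, False, True, False, False, False)

print(inherits.na_ok, overrides.na_ok, overrides.unique)  # True False True
```

With this approach an explicit `False` on a validator still wins over a template-level `True`, which a plain boolean default could not distinguish from "not set".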
