Commit 1add948

Various (#23)
1 parent 89a51df commit 1add948

7 files changed

+380
-75
lines changed

CHANGELOG.md

Lines changed: 11 additions & 0 deletions
@@ -14,3 +14,14 @@ This changelog was started for release 0.0.3.
 - empty_ok_if key for validator
 - empty_ok_unless key for validator
 - readme key for validator
+- unique key for validator
+- expected_rows key for templates
+- logs parameters for templates
+
+### Fixed
+
+- Bug for setValidator when using number values
+
+### Changed
+
+- Better validation for integers

README.md

Lines changed: 22 additions & 11 deletions
@@ -64,6 +64,8 @@ Invalid fields: [''] in rows: [3]
 IntValidator failed 5 time(s) (100.0%) on field: 'Pierraille surface (25)'
 ```

+When calling validate() from Python, you can access the list of logs through the 'logs' attribute of the Checkcel/Checkxtractor/Checkerator class
+
 # Python library

 ```python
@@ -104,20 +106,13 @@ Validation templates can use three formats: json/yaml, and python files.
 In all cases, you will need to at least include a list of validators and associated column names. Several optional parameters are also available:

 * *metadata*: A list of column names. This will create a metadata sheet with these columns, without validation on them
+* *expected_rows* (Default 0): Number of *data* rows expected
 * *empty_ok* (Default False): Whether to accept empty values as valid
-* *empty_ok_if* (Default None): Accept empty value as valid if **another column** value is set
-    * Accept either a string (column name), a list (list of column names), or a dict
-    * The dict keys must be column names, and the values lists of 'accepted values'. The current column will accept empty values if the related column's value is in the list of accepted values
-* *empty_ok_unless* (Default None): Accept empty value as valid *unless* **another column** value is set
-    * Accept either a string (column name), a list (list of column names), or a dict
-    * The dict keys must be column names, and the values lists of 'rejected values'. The current column will accept empty values if the related column's value is **not** in the list of reject values
 * *ignore_space* (Default False): whether to trim the values for spaces before checking validity
 * *ignore_case* (Default False): whether to ignore the case
-* *readme* (Default None): Additional information to include on the readme page

 The last 3 parameters will affect all the validators (when relevant), but can be overridden at the validator level (e.g., you can set 'empty_ok' to True for all, but set it to False for a specific validator).

-
 ## Python format

 A template needs to contain a class inheriting the Checkplate class.
@@ -147,13 +142,29 @@ If needed, these dictionnaries can include an 'options' key, containing a dictio

 ## Validators

-All validators (except NoValidator) have the 'empty_ok' option, which will consider empty values as valid.
-*As in-file validation for non-empty values is unreliable, the non-emptyness is not checked in-file*
+### Global options
+
+All validators (except NoValidator) have these options available. If relevant, these options will override the ones set at the template level.
+
+* *empty_ok* (Default False): Whether to accept empty values as valid (Not enforced in Excel)
+* *empty_ok_if* (Default None): Accept empty value as valid if **another column** value is set
+    * Accept either a string (column name), a list (list of column names), or a dict (Not enforced in Excel)
+    * The dict keys must be column names, and the values lists of 'accepted values'. The current column will accept empty values if the related column's value is in the list of accepted values
+* *empty_ok_unless* (Default None): Accept empty value as valid *unless* **another column** value is set. (Not enforced in Excel)
+    * Accept either a string (column name), a list (list of column names), or a dict
+    * The dict keys must be column names, and the values lists of 'rejected values'. The current column will accept empty values if the related column's value is **not** in the list of rejected values
+* *ignore_space* (Default False): whether to trim the values for spaces before checking validity
+* *ignore_case* (Default False): whether to ignore the case
+* *unique* (Default False): whether to enforce uniqueness for this column. (Not enforced in Excel yet, unless there is no other validation, i.e. TextValidator, and RegexValidator in some cases)
+
+*As Excel validation for non-empty values is unreliable, non-emptiness cannot be properly enforced in Excel files*
+
+### Validator-specific options

 * NoValidator (always True)
     * **No in-file validation generated**
 * TextValidator(empty_ok=False)
-    * **No in-file validation generated**
+    * **No in-file validation generated** (unless *unique* is set)
 * IntValidator(min="", max="", empty_ok=False)
     * Validate that a value is an integer
     * *min*: Minimal value allowed
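
The dict form of *empty_ok_if* and *empty_ok_unless* listed under the global options can be illustrated with a minimal sketch. The helper names and the row layout below are hypothetical and are not taken from Checkcel. The sketch also assumes that, when several columns are given, a single matching column is enough to trigger the rule, which the README does not specify.

```python
def empty_ok_if(row, conditions):
    """Allow an empty value if any related column holds one of its accepted values."""
    return any(row.get(column) in accepted for column, accepted in conditions.items())


def empty_ok_unless(row, conditions):
    """Allow an empty value unless any related column holds one of its rejected values."""
    return not any(row.get(column) in rejected for column, rejected in conditions.items())


# Hypothetical row: the 'Depth' cell is empty and we ask whether that is acceptable
row = {"Sample type": "blank", "Depth": ""}

print(empty_ok_if(row, {"Sample type": ["blank", "control"]}))
print(empty_ok_unless(row, {"Sample type": ["core"]}))
print(empty_ok_unless(row, {"Sample type": ["blank"]}))
```

The first two calls accept the empty cell, while the third rejects it because 'blank' is in the list of rejected values.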

checkcel/checkcel.py

Lines changed: 19 additions & 14 deletions
@@ -41,16 +41,16 @@ def __init__(

     def _log_debug_failures(self):
         for field_name, field_failure in self.failures.items():
-            self.logger.debug('\nFailure on field: "{}":'.format(field_name))
+            self.debug('\nFailure on field: "{}":'.format(field_name))
             for i, (row, errors) in enumerate(field_failure.items()):
-                self.logger.debug(" {}:{}".format(self.source, row))
+                self.debug(" {}:{}".format(self.source, row))
                 for error in errors:
-                    self.logger.debug(" {}".format(error))
+                    self.debug(" {}".format(error))

     def _log_validator_failures(self):
         for field_name, validator in self.validators.items():
             if validator.bad:
-                self.logger.error(
+                self.error(
                     " {} failed {} time(s) ({:.1%}) on field: '{}'".format(
                         validator.__class__.__name__,
                         validator.fail_count,
@@ -64,22 +64,22 @@ def _log_validator_failures(self):
                     data = validator.bad
                     wrong_terms = ", ".join(["'{}'".format(val) for val in data["invalid_set"]])
                     wrong_rows = ", ".join([str(val) for val in data["invalid_rows"]])
-                    self.logger.error(
+                    self.error(
                         " Invalid fields: [{}] in rows: [{}]".format(wrong_terms, wrong_rows)
                     )
                 except TypeError as e:
                     raise e

     def _log_missing_validators(self):
-        self.logger.error(" Missing validators for:")
+        self.error(" Missing validators for:")
         self._log_missing(self.missing_validators)

     def _log_missing_fields(self):
-        self.logger.error(" Missing expected fields:")
+        self.error(" Missing expected fields:")
         self._log_missing(self.missing_fields)

     def _log_missing(self, missing_items):
-        self.logger.error(
+        self.error(
             "{}".format(
                 "\n".join(
                     [" '{}': [],".format(field) for field in sorted(missing_items)]
@@ -88,7 +88,7 @@ def _log_missing(self, missing_items):
         )

     def validate(self):
-        self.logger.info(
+        self.info(
             "\nValidating {}{}".format(self.__class__.__name__, "(source={})".format(self.source) if self.source else "")
         )

@@ -101,7 +101,7 @@ def validate(self):
             df = pandas.read_csv(self.source, sep=self.delimiter, skiprows=self.row)

         if len(df) == 0:
-            self.logger.info(
+            self.info(
                 "\033[1;33m" + "Source file has no data" + "\033[0m"
             )
             return False
@@ -115,28 +115,33 @@ def validate(self):
         validator_set = set(self.validators)
         self.missing_validators = self.column_set - validator_set
         if self.missing_validators:
-            self.logger.info("\033[1;33m" + "Missing..." + "\033[0m")
+            self.info("\033[1;33m" + "Missing..." + "\033[0m")
             self._log_missing_validators()

             if not self.ignore_missing_validators:
                 return False

         self.missing_fields = validator_set - self.column_set
         if self.missing_fields:
-            self.logger.info("\033[1;33m" + "Missing..." + "\033[0m")
+            self.info("\033[1;33m" + "Missing..." + "\033[0m")
             self._log_missing_fields()
             return False

+        if self.expected_rows:
+            if not self.expected_rows == len(df.index):
+                self.error("Length issue: Expecting {} row(s), found {}".format(self.expected_rows, len(df.index)))
+                return False
+
         # Might be a way to do it more efficiently..
         df.apply(lambda row: self._validate(row), axis=1)

         if self.failures:
-            self.logger.info("\033[0;31m" + "Failed" + "\033[0m")
+            self.info("\033[0;31m" + "Failed" + "\033[0m")
             self._log_debug_failures()
             self._log_validator_failures()
             return False
         else:
-            self.logger.info("\033[0;32m" + "Passed" + "\033[0m")
+            self.info("\033[0;32m" + "Passed" + "\033[0m")
             return True

     def _validate(self, row):
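
The new `expected_rows` check in `validate()` can be sketched outside of Checkcel as follows. This is only an illustration under stated assumptions: the function name `check_expected_rows` is hypothetical, and the standard `csv` module stands in for the pandas DataFrame used by the real code, so details such as blank-line handling may differ.

```python
import csv
import io


def check_expected_rows(csv_text, expected_rows, delimiter=","):
    """Return an error message if the data row count differs from expected_rows, else None."""
    # DictReader consumes the header line, so only data rows are counted
    data_rows = list(csv.DictReader(io.StringIO(csv_text), delimiter=delimiter))
    # A falsy expected_rows (0 or None) disables the check, as in validate()
    if expected_rows and expected_rows != len(data_rows):
        return "Length issue: Expecting {} row(s), found {}".format(expected_rows, len(data_rows))
    return None


sample = "Site,Depth\nA,10\nB,20\n"
print(check_expected_rows(sample, 2))
print(check_expected_rows(sample, 3))
print(check_expected_rows(sample, 0))
```

Only the second call reports a problem, since the sample holds two data rows and a value of 0 turns the check off.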

checkcel/checkplate.py

Lines changed: 42 additions & 8 deletions
@@ -15,18 +15,36 @@

 class Checkplate(object):
     """ Base class for templates """
-    def __init__(self, validators={}, empty_ok=False, ignore_case=False, ignore_space=False, metadata=[]):
+    def __init__(self, validators={}, empty_ok=False, ignore_case=False, ignore_space=False, metadata=[], expected_rows=None):
         self.metadata = metadata
         self.logger = logs.logger
         self.validators = validators or getattr(self, "validators", {})
+        self.logs = []
         # Will be overridden by validators config
         self.empty_ok = empty_ok
         self.ignore_case = ignore_case
         self.ignore_space = ignore_space
+        self.expected_rows = expected_rows
         # self.trim_values = False
         for validator in self.validators.values():
             validator._set_attributes(self.empty_ok, self.ignore_case, self.ignore_space)

+    def debug(self, message):
+        self.logger.debug(message)
+        self.logs.append("Debug: {}".format(message))
+
+    def info(self, message):
+        self.logger.info(message)
+        self.logs.append("Info: {}".format(message))
+
+    def warn(self, message):
+        self.logger.warning(message)
+        self.logs.append("Warning: {}".format(message))
+
+    def error(self, message):
+        self.logger.error(message)
+        self.logs.append("Error: {}".format(message))
+
     def load_from_python_file(self, file_path):
         # Limit conflicts in file name
         with tempfile.TemporaryDirectory() as dirpath:
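
The logging helpers added to `Checkplate` forward each message to the logger and also keep a prefixed copy in `self.logs`. The sketch below reproduces that pattern in a self-contained way. `LogRecorder` and the logger name are hypothetical stand-ins, not part of Checkcel; only the "Info: " and "Error: " prefixes come from the diff.

```python
import logging


class LogRecorder:
    """Hypothetical stand-in for the logging helpers added to Checkplate."""

    def __init__(self):
        self.logger = logging.getLogger("checkcel-sketch")
        self.logs = []

    def info(self, message):
        self.logger.info(message)
        self.logs.append("Info: {}".format(message))

    def error(self, message):
        self.logger.error(message)
        self.logs.append("Error: {}".format(message))


recorder = LogRecorder()
recorder.info("Validating")
recorder.error("Missing expected fields:")

# Keep only the error entries, relying on the "Error: " prefix
errors = [entry for entry in recorder.logs if entry.startswith("Error: ")]
print(errors)
```

Because each entry is stored as a plain prefixed string, a caller would have to filter on the prefix to separate severities, as done in the last lines.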
@@ -44,7 +62,7 @@ def load_from_python_file(self, file_path):
             custom_class = list(filtered_classes.values())[0]

         if not custom_class:
-            self.logger.error(
+            self.error(
                 "Could not find a subclass of Checkplate in the provided file."
             )
             return exits.UNAVAILABLE
@@ -53,13 +71,21 @@ def load_from_python_file(self, file_path):
         self.empty_ok = getattr(custom_class, 'empty_ok', False)
         self.ignore_case = getattr(custom_class, 'ignore_case', False)
         self.ignore_space = getattr(custom_class, 'ignore_space', False)
+        self.expected_rows = getattr(custom_class, 'expected_rows', 0)
+        try:
+            self.expected_rows = int(self.expected_rows)
+        except (ValueError, TypeError):
+            self.error(
+                "Malformed Checkcel template: expected_rows is not an integer"
+            )
+
         for key, validator in self.validators.items():
             validator._set_attributes(self.empty_ok, self.ignore_case, self.ignore_space)
         return self

     def load_from_json_file(self, file_path):
         if not os.path.isfile(file_path):
-            self.logger.error(
+            self.error(
                 "Could not find a file at path {}".format(file_path)
             )
             return exits.NOINPUT
@@ -71,7 +97,7 @@ def load_from_json_file(self, file_path):

     def load_from_yaml_file(self, file_path):
         if not os.path.isfile(file_path):
-            self.logger.error(
+            self.error(
                 "Could not find a file at path {}".format(file_path)
             )
             return exits.NOINPUT
@@ -80,7 +106,7 @@ def load_from_yaml_file(self, file_path):
             try:
                 data = yaml.safe_load(f)
             except yaml.YAMLError:
-                self.logger.error(
+                self.error(
                     "File {} is not a valid yaml file".format(file_path)
                 )
                 return exits.UNAVAILABLE
@@ -104,21 +130,29 @@ def _is_valid_template(self, tup):

     def _load_from_dict(self, data):
         if 'validators' not in data or not isinstance(data['validators'], list):
-            self.logger.error(
+            self.error(
                 "Could not find a list of validators in data"
             )
             return exits.UNAVAILABLE

         self.empty_ok = data.get("empty_ok", False)
         self.ignore_case = data.get('ignore_case', False)
         self.ignore_space = data.get('ignore_space', False)
+        self.expected_rows = data.get('expected_rows', 0)
+        try:
+            self.expected_rows = int(self.expected_rows)
+        except (ValueError, TypeError):
+            self.error(
+                "Malformed Checkcel template: expected_rows is not an integer"
+            )
+
         validators_list = []
         self.validators = {}
         self.metadata = data.get('metadata', [])

         for validator in data['validators']:
             if 'type' not in validator or 'name' not in validator:
-                self.logger.error(
+                self.error(
                     "Malformed Checkcel Validator. Require both 'type' and 'name' key"
                 )
                 return exits.UNAVAILABLE
@@ -129,7 +163,7 @@ def _load_from_dict(self, data):
                 val = validator_class(**options)
                 val._set_attributes(self.empty_ok, self.ignore_case, self.ignore_space)
             except AttributeError:
-                self.logger.error(
+                self.error(
                     "{} is not a valid Checkcel Validator".format(validator['type'])
                 )
                 return exits.UNAVAILABLE
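
The `expected_rows` coercion applied when loading a template can be sketched in isolation. The function `parse_expected_rows` is a hypothetical helper, not Checkcel code, and returning a fallback of 0 on error is an assumption made for the example. The sketch catches `TypeError` as well as `ValueError`, because `int(None)` (for instance from a `null` value in a YAML or JSON template) raises `TypeError`.

```python
def parse_expected_rows(data):
    """Return (expected_rows, error_message) from a template dictionary."""
    raw = data.get("expected_rows", 0)
    try:
        return int(raw), None
    except (ValueError, TypeError):
        # int("abc") raises ValueError; int(None) or int([]) raises TypeError
        return 0, "Malformed Checkcel template: expected_rows is not an integer"


print(parse_expected_rows({"expected_rows": "12"}))
print(parse_expected_rows({}))
print(parse_expected_rows({"expected_rows": None}))
print(parse_expected_rows({"expected_rows": "twelve"}))
```

The first two calls return a usable integer with no error, while the last two return the fallback value together with the error message.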

checkcel/checkxtractor.py

Lines changed: 0 additions & 2 deletions
@@ -1,4 +1,3 @@
-from checkcel import logs
 from openpyxl import load_workbook
 from openpyxl.worksheet.cell_range import CellRange
 from openpyxl.utils import get_column_letter
@@ -10,7 +9,6 @@
 class Checkxtractor(object):
     """ Extract validation value from xlsx file (only) """
     def __init__(self, source, output, sheet=0, row=0, template_type="python"):
-        self.logger = logs.logger
         self.source = source
         self.output = output
         self.sheet = int(sheet)
