Commit 3ebd81f

Fixed #480 - Ability to transform Excel formatted file
* Enhance functionality of transform to work with .xlsx
* Update docs and changelog

Signed-off-by: Chin Yeung Li <[email protected]>
1 parent 8dfccf4 commit 3ebd81f

9 files changed: +113 -36 lines changed

CHANGELOG.rst

Lines changed: 2 additions & 5 deletions
@@ -1,9 +1,5 @@
 2021-xx-xx
-<<<<<<< HEAD
 Release 7.0.0
-=======
-Release x.x.x
->>>>>>> refs/heads/337_enhance_check_command

 * Add '@' as a support character for filename #451
 * Add support to collect redistributable sources #22
@@ -15,7 +11,8 @@
 * Update configuration scripts
 * Use readthedocs for documentation
 * Add Dockerfile to run aboutcode with docker
-* Add new option to choose extract license from ScanCode LicenseDB or DJC License Library
+* Add new option to choose extract license from ScanCode LicenseDB or DJC License Library
+* Add ability to transform Excel formatted file

 2021-04-02
 Release 6.0.0

docs/source/general.rst

Lines changed: 4 additions & 4 deletions
@@ -19,7 +19,7 @@ AboutCode Toolkit is a tool for your software development team to document your

 - **inventory**: Generate a Software Inventory list (.csv or .json format) from your codebase based on ABOUT file(s). Note that this Software Inventory will only include components that have AboutCode Toolkit data. In another word, if you do not create AboutCode Toolkit files for your own original software components, these components will not show up in the generated inventory.

-- **transform**: A command to transform an input CSV/JSON by applying renaming and/or filtering and then output to a new CSV/JSON file.
+- **transform**: A command to transform an input CSV/JSON/Excel by applying renaming and/or filtering and then output to a new CSV/JSON/Excel file.

 Additional AboutCode Toolkit information is available at:

@@ -168,11 +168,11 @@ Fields Renaming and Optional Custom Fields

 Since your input's field name may not match with the AboutCode Toolkit standard field name, you can use the transform subcommand to do the transformation.

-A transform configuration file is used to describe which transformations and validations to apply to a source CSV/JSON file. This is a simple text file using YAML format, using the same format as an .ABOUT file.
+A transform configuration file is used to describe which transformations and validations to apply to a source CSV/JSON/Excel file. This is a simple text file using YAML format, using the same format as an .ABOUT file.

 The attributes that can be set in a configuration file are:

-- field_renamings: An optional map of source field name to target new field name that is used to rename CSV/JSON fields.
+- field_renamings: An optional map of source field name to target new field name that is used to rename CSV/JSON/Excel fields.

 .. code-block:: none

@@ -184,7 +184,7 @@ The attributes that can be set in a configuration file are:
 The renaming is always applied first before other transforms and checks. All other field names referenced below are AFTER the renaming have been applied.
 For instance with this configuration, the field "Directory/Location" will be renamed to "about_resource" and "foo" to "bar":

-- required_fields: An optional list of required field names that must have a value, beyond the standard field names. If a source CSV/JSON does not have such a field or an entry is missing a value for a required field, an error is reported.
+- required_fields: An optional list of required field names that must have a value, beyond the standard field names. If a source CSV/JSON/Excel does not have such a field or an entry is missing a value for a required field, an error is reported.

 For instance with this configuration, an error will be reported if the fields "name" and "version" are missing, or if any entry does not have a value set for these fields:

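The documentation text in this diff describes two steps: fields are renamed first, and required fields are then checked against the renamed names. The ordering can be illustrated with a minimal sketch. The helper names `apply_field_renamings` and `check_required_fields` are hypothetical and are not part of the toolkit's `Transformer`; the mapping direction (source name to target name) follows the wording of the docs above and is an assumption about the real configuration layout.

```python
# Hypothetical helpers for illustration only; not the toolkit's Transformer API.

def apply_field_renamings(rows, field_renamings):
    """Return new rows whose keys are renamed per a source -> target mapping."""
    return [
        {field_renamings.get(name, name): value for name, value in row.items()}
        for row in rows
    ]


def check_required_fields(rows, required_fields):
    """Return one message per row/field pair that is missing or empty."""
    problems = []
    for number, row in enumerate(rows, start=1):
        for field in required_fields:
            if not row.get(field):
                problems.append(
                    "Row %d: missing required field %r" % (number, field)
                )
    return problems


rows = [
    {"Directory/Location": "/src/lib", "foo": "x", "name": "lib", "version": ""},
]
# Renaming runs first, so later checks see the new field names.
renamed = apply_field_renamings(
    rows, {"Directory/Location": "about_resource", "foo": "bar"}
)
problems = check_required_fields(renamed, ["name", "version"])
```

Because the check runs on the renamed rows, a required field must be listed under its target name, which matches the statement that all later field references are "AFTER the renaming".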

docs/source/reference.rst

Lines changed: 4 additions & 4 deletions
@@ -34,7 +34,7 @@ Commands
 gen Generate .ABOUT files from an inventory as CSV or JSON.
 inventory Collect the inventory of .ABOUT files to a CSV or JSON
 file.
-transform Transform a CSV/JSON by applying renamings, filters and checks.
+transform Transform a CSV/JSON/Excel by applying renamings, filters and checks.

 attrib
 ======
@@ -446,8 +446,8 @@ Syntax

 about transform [OPTIONS] LOCATION OUTPUT

-LOCATION: Path to a CSV/JSON file.
-OUTPUT: Path to CSV/JSON inventory file to create.
+LOCATION: Path to a CSV/JSON/Excel file.
+OUTPUT: Path to CSV/JSON/Excel inventory file to create.

 Options
 -------
@@ -464,7 +464,7 @@ Options
 Purpose
 -------

-Transform the CSV/JSON file at LOCATION by applying renamings, filters and checks and then write a new CSV/JSON to OUTPUT (Format for input and output need to be the same).
+Transform the CSV/JSON/Excel file at LOCATION by applying renamings, filters and checks and then write a new CSV/JSON/Excel to OUTPUT (Format for input and output need to be the same).

 Details
 ^^^^^^^
docs/source/specification.rst

Lines changed: 1 addition & 1 deletion
@@ -116,7 +116,7 @@ Fields order and multiple occurrences

 The field order does not matter. Multiple occurrences of a field name is not supported.

-The tool processing an ABOUT file or CSV/JSON input will issue an error when a field name occurs more than once in the input file.
+The tool processing an ABOUT file or CSV/JSON/Excel input will issue an error when a field name occurs more than once in the input file.

 Field referencing a file
 ------------------------

src/attributecode/cmd.py

Lines changed: 13 additions & 11 deletions
@@ -37,6 +37,10 @@
 from attributecode.model import copy_redist_src
 from attributecode.model import pre_process_and_fetch_license_dict
 from attributecode.model import write_output
+from attributecode.transform import transform_csv_to_csv
+from attributecode.transform import transform_json_to_json
+from attributecode.transform import transform_excel_to_excel
+from attributecode.transform import Transformer
 from attributecode.util import extract_zip
 from attributecode.util import filter_errors
 from attributecode.util import get_temp_dir
@@ -527,17 +531,17 @@ def print_config_help(ctx, param, value):


 @about.command(cls=AboutCommand,
-    short_help='Transform a CSV/JSON by applying renamings, filters and checks.')
+    short_help='Transform a CSV/JSON/Excel by applying renamings, filters and checks.')

 @click.argument('location',
     required=True,
-    callback=partial(validate_extensions, extensions=('.csv', '.json',)),
+    callback=partial(validate_extensions, extensions=('.csv', '.json', '.xlsx',)),
     metavar='LOCATION',
     type=click.Path(exists=True, dir_okay=False, readable=True, resolve_path=True))

 @click.argument('output',
     required=True,
-    callback=partial(validate_extensions, extensions=('.csv', '.json',)),
+    callback=partial(validate_extensions, extensions=('.csv', '.json', '.xlsx',)),
     metavar='OUTPUT',
     type=click.Path(exists=False, dir_okay=False, writable=True, resolve_path=True))

@@ -563,18 +567,14 @@ def print_config_help(ctx, param, value):
 @click.help_option('-h', '--help')
 def transform(location, output, configuration, quiet, verbose): # NOQA
     """
-    Transform the CSV/JSON file at LOCATION by applying renamings, filters and checks
-    and then write a new CSV/JSON to OUTPUT (Format for input and output need to be
+    Transform the CSV/JSON/Excel file at LOCATION by applying renamings, filters and checks
+    and then write a new CSV/JSON/Excel to OUTPUT (Format for input and output need to be
     the same).

-    LOCATION: Path to a CSV/JSON file.
+    LOCATION: Path to a CSV/JSON/Excel file.

-    OUTPUT: Path to CSV/JSON inventory file to create.
+    OUTPUT: Path to CSV/JSON/Excel inventory file to create.
     """
-    from attributecode.transform import transform_csv_to_csv
-    from attributecode.transform import transform_json_to_json
-    from attributecode.transform import Transformer
-
     if not configuration:
         transformer = Transformer.default()
     else:
@@ -584,6 +584,8 @@ def transform(location, output, configuration, quiet, verbose): # NOQA
         errors = transform_csv_to_csv(location, output, transformer)
     elif location.endswith('.json') and output.endswith('.json'):
         errors = transform_json_to_json(location, output, transformer)
+    elif location.endswith('.xlsx') and output.endswith('.xlsx'):
+        errors = transform_excel_to_excel(location, output, transformer)
     else:
         msg = 'Extension for the input and output need to be the same.'
         click.echo(msg)
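The `if`/`elif` chain in `transform` grows by one branch per supported format. A table-driven lookup is one alternative, sketched here with stand-in handlers that have the same three-argument signature as the imported functions. The `HANDLERS` table and `pick_handler` are hypothetical names, and the `.lower()` call is a deliberate difference from the committed code, which compares extensions case-sensitively with `endswith`.

```python
import os

# Stand-ins for the functions imported from attributecode.transform.
# The real ones read and write files; these only identify the chosen branch.
def transform_csv_to_csv(location, output, transformer):
    return "csv"


def transform_json_to_json(location, output, transformer):
    return "json"


def transform_excel_to_excel(location, output, transformer):
    return "excel"


# Hypothetical dispatch table: one entry per supported extension.
HANDLERS = {
    ".csv": transform_csv_to_csv,
    ".json": transform_json_to_json,
    ".xlsx": transform_excel_to_excel,
}


def pick_handler(location, output):
    """Return the handler when both paths share a supported extension, else None."""
    in_ext = os.path.splitext(location)[1].lower()
    out_ext = os.path.splitext(output)[1].lower()
    if in_ext != out_ext:
        return None
    return HANDLERS.get(in_ext)
```

A caller would treat a `None` result the same way the final `else` branch does, by echoing the "Extension for the input and output need to be the same." message.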

src/attributecode/transform.py

Lines changed: 79 additions & 2 deletions
@@ -15,10 +15,12 @@

 import io
 import json
-from collections import Counter
+from collections import Counter, OrderedDict
 from itertools import zip_longest

 import attr
+import itertools
+import openpyxl

 from attributecode import CRITICAL
 from attributecode import Error
@@ -48,7 +50,7 @@ def transform_csv_to_csv(location, output, transformer):
         msg = u'Duplicated field name: %(name)s'
         for name in dupes:
             errors.append(Error(CRITICAL, msg % locals()))
-        return field_names, [], errors
+        return errors

     # Convert to dicts
     new_data = [dict(zip_longest(field_names, item)) for item in data]
@@ -84,6 +86,31 @@ def transform_json_to_json(location, output, transformer):
         return []


+def transform_excel_to_excel(location, output, transformer):
+    """
+    Read a Excel file at `location` and write a new Excel file at `output`. Apply
+    transformations using the `transformer` Transformer.
+    Return a list of Error objects.
+    """
+    if not transformer:
+        raise ValueError('Cannot transform without Transformer')
+
+    dupes, new_data = read_excel(location)
+    errors = []
+    if dupes:
+        msg = u'Duplicated field name: %(name)s'
+        for name in dupes:
+            errors.append(Error(CRITICAL, msg % locals()))
+        return errors
+
+    _field_names, updated_data, errors = transform_data(new_data, transformer)
+    if errors:
+        return errors
+    else:
+        write_excel(output, updated_data)
+        return []
+
+
 def strip_trailing_fields_csv(names):
     """
     Strip trailing spaces for field names #456
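As committed, `transform_excel_to_excel` unpacks the first return value of `read_excel` as `dupes` and formats each item as a field name, while `read_excel` fills that list with `Error` objects. One way to keep the two sides consistent is to have the reader report plain duplicate names and leave message building to the caller. The sketch below shows that idea on a list of header values; `find_duplicate_headers` and `duplicate_messages` are hypothetical helpers, not functions in the toolkit.

```python
from collections import Counter


def find_duplicate_headers(headers):
    """Return header names that occur more than once, in first-seen order."""
    counts = Counter(headers)
    # dict.fromkeys keeps the first occurrence of each name, in order.
    return [name for name in dict.fromkeys(headers) if counts[name] > 1]


def duplicate_messages(headers):
    """Build one message per duplicated name, mirroring the CSV code path wording."""
    return [
        "Duplicated field name: %s" % name
        for name in find_duplicate_headers(headers)
    ]
```

Unlike the early `return` inside the header loop of `read_excel`, this reports every duplicated name in one pass.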
@@ -385,3 +412,53 @@ def write_json(location, data):
     """
     with open(location, 'w') as jsonfile:
         json.dump(data, jsonfile, indent=3)
+
+def read_excel(location):
+    """
+    Read Excel at `location`, return a list of ordered dictionaries, one
+    for each row.
+    """
+    results = []
+    errors = []
+    sheet_obj = openpyxl.load_workbook(location).active
+    max_col = sheet_obj.max_column
+
+    index = 1
+    col_keys = []
+    mapping_dict = {}
+    while index <= max_col:
+        value = sheet_obj.cell(row=1, column=index).value
+        if value in col_keys:
+            msg = 'Duplicated column name, ' + str(value) + ', detected.'
+            errors.append(Error(CRITICAL, msg))
+            return errors, results
+        if value in mapping_dict:
+            value = mapping_dict[value]
+        col_keys.append(value)
+        index = index + 1
+
+    for row in sheet_obj.iter_rows(min_row=2, values_only=True):
+        row_dict = OrderedDict()
+        index = 0
+        while index < max_col:
+            value = row[index]
+            if value:
+                row_dict[col_keys[index]] = value
+            else:
+                row_dict[col_keys[index]] = ''
+            index = index + 1
+        results.append(row_dict)
+    return errors, results
+
+
+def write_excel(location, data):
+    wb = openpyxl.Workbook()
+    ws = wb.active
+
+    headers = list(set(itertools.chain.from_iterable(data)))
+    ws.append(headers)
+
+    for elements in data:
+        ws.append([elements.get(h) for h in headers])
+
+    wb.save(location)
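`write_excel` collects its header row with `list(set(...))`, and a `set` does not preserve insertion order, so the output column order can differ from the input and between runs. A minimal order-preserving alternative, assuming each row is a dict as produced by `read_excel`, relies on `dict` keeping insertion order (Python 3.7+). The function name `ordered_headers` is hypothetical.

```python
import itertools


def ordered_headers(rows):
    """Return every key that appears in any row, in first-seen order."""
    # Iterating a dict yields its keys; dict.fromkeys de-duplicates them
    # while keeping the order in which they were first encountered.
    return list(dict.fromkeys(itertools.chain.from_iterable(rows)))


rows = [
    {"about_resource": "/src/lib", "name": "lib", "version": "1.0"},
    {"about_resource": "/src/app", "name": "app", "license": "mit"},
]
headers = ordered_headers(rows)
# Rows that lack a key get None for that column, as in write_excel.
table = [[row.get(h) for h in headers] for row in rows]
```

The resulting `headers` list could be passed to `ws.append` in place of the `set`-based list without changing the rest of `write_excel`.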

tests/test_cmd.py

Lines changed: 2 additions & 1 deletion
@@ -347,7 +347,8 @@ def check_about_stdout(options, expected_loc, regen=False):
     expected_file = get_test_loc(expected_loc, must_exists=True)
     with open(expected_file, 'r') as ef:
         expected = ef.read()
-
+    print("!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!")
+    print(result.output)
     assert expected.splitlines(False) == result.output.splitlines(False)


tests/testdata/test_cmd/help/about_help.txt

Lines changed: 2 additions & 2 deletions
@@ -20,5 +20,5 @@ Commands:
 gen Generate .ABOUT files from an inventory as CSV or JSON.
 inventory Collect the inventory of .ABOUT files to a CSV or JSON
 file.
-transform Transform a CSV/JSON by applying renamings, filters and
-checks.
+transform Transform a CSV/JSON/Excel by applying renamings, filters
+and checks.
Lines changed: 6 additions & 6 deletions
@@ -1,17 +1,17 @@
 Usage: about transform [OPTIONS] LOCATION OUTPUT

-Transform the CSV/JSON file at LOCATION by applying renamings, filters and
-checks and then write a new CSV/JSON to OUTPUT (Format for input and output
-need to be the same).
+Transform the CSV/JSON/Excel file at LOCATION by applying renamings, filters
+and checks and then write a new CSV/JSON/Excel to OUTPUT (Format for input and
+output need to be the same).

-LOCATION: Path to a CSV/JSON file.
+LOCATION: Path to a CSV/JSON/Excel file.

-OUTPUT: Path to CSV/JSON inventory file to create.
+OUTPUT: Path to CSV/JSON/Excel inventory file to create.

 Options:
 -c, --configuration FILE Path to an optional YAML configuration file. See
 --help-format for format help.
 --help-format Show configuration file format help and exit.
 -q, --quiet Do not print error or warning messages.
 --verbose Show all error and warning messages.
--h, --help Show this message and exit.
+-h, --help Show this message and exit.
