Commit 4cd3708

Merge pull request #22 from roskakori/8-modernize-build-process
#8 Modernize build process
2 parents ccc3bb2 + d273a4f commit 4cd3708

File tree

11 files changed

+309
-320
lines changed

.gitignore

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,4 @@
1+
.DS_Store
12
.idea
23
*.class
34
*.ear
@@ -9,6 +10,4 @@
910
build
1011
dist
1112
/ebcdic/ebcdic/cp*.*
12-
!/ebcdic/ebcdic/cp*ms.py
1313
/ebcdic/ebcdic/temp/
14-
/venv/

.pre-commit-config.yaml

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,9 +1,9 @@
11
repos:
22
- repo: https://github.com/astral-sh/ruff-pre-commit
3-
rev: v0.14.6
3+
rev: v0.15.2
44
hooks:
55
- id: ruff-check
6-
args: [--fix]
6+
args: ["--fix"]
77
- id: ruff-format
88

99
- repo: https://github.com/pre-commit/mirrors-prettier
@@ -26,4 +26,4 @@ repos:
2626
- id: check-yaml
2727
- id: debug-statements
2828
- id: no-commit-to-branch
29-
args: [--branch, master]
29+
args: ["--branch", "master"]

CONTRIBUTING.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -6,10 +6,10 @@ Bump version number:
66

77
- `ebcdic/__init__.py`
88
- `LICENSE.txt` and
9-
- `README.rst`
9+
- `README.md`
1010

1111
2. Edit `ebcdic/_version.py:__version__`.
12-
3. Describe changes in `README.rst`.
12+
3. Describe changes in `README.md`.
1313

1414
Upload release to PyPI::
1515

ebcdic/LICENSE.txt

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
Copyright (c) 2013 - 2019, Thomas Aglassinger
1+
Copyright (c) 2013 - 2026, Thomas Aglassinger
22
All rights reserved.
33

44
Redistribution and use in source and binary forms, with or without

ebcdic/README.md

Lines changed: 183 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,183 @@
1+
# ebcdic
2+
3+
`ebcdic` is a Python package that provides additional EBCDIC codecs for data exchange with legacy systems. It supports Python 3.9 and later.
4+
5+
[EBCDIC](https://en.wikipedia.org/wiki/EBCDIC) is short for "Extended Binary Coded Decimal Interchange Code" and is a family of character encodings that is mainly used on mainframe computers. There is no real point in using it unless you have to exchange data with legacy systems that still only support EBCDIC as character encoding.
6+
7+
For compatibility with Python 2.7 to 3.3, use [version 1.1.1](https://pypi.org/project/ebcdic/1.1.1/).
8+
9+
For compatibility with Python 2.6 to 3.2, use [version 1.0.0](https://pypi.org/project/ebcdic/1.0.0/).
10+
11+
## Installation
12+
13+
The `ebcdic` package is available from <https://pypi.python.org/pypi/ebcdic> and can be installed using pip:
14+
15+
pip install ebcdic
16+
17+
## Example usage
18+
19+
To encode `'hello world'` on EBCDIC systems in German-speaking countries, use:
20+
21+
```pycon
22+
>>> import ebcdic
23+
>>> 'hello world'.encode('cp1141')
24+
b'\x88\x85\x93\x93\x96@\xa6\x96\x99\x93\x84'
25+
```
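
A minimal round-trip sketch of the same idea. To stay runnable without installing `ebcdic`, it assumes the `cp500` codec that ships with the Python standard library in place of `cp1141`; lowercase Latin letters and the space occupy the same code points in both.

```python
# Assumption: the stdlib codec cp500 stands in for ebcdic's cp1141 so
# that this sketch runs without third-party packages.
text = "hello world"

# Encode to EBCDIC bytes, then decode back to a Python string.
encoded = text.encode("cp500")
decoded = encoded.decode("cp500")

print(encoded)
print(decoded == text)
```

With `ebcdic` installed and imported, the same two calls should work with any codec name from the lists below.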
26+
27+
## Supported codecs
28+
29+
The `ebcdic` package includes EBCDIC codecs for the following regions:
30+
31+
- cp290 - Japan (Katakana)
32+
- cp420 - Arabic bilingual
33+
- cp424 - Israel (Hebrew)
34+
- cp833 - Korea Extended (single byte)
35+
- cp838 - Thailand
36+
- cp870 - Eastern Europe (Poland, Hungary, Czech, Slovakia, Slovenia, Croatia, Serbia, Bulgaria); represents Latin-2
37+
- cp1097 - Iran (Farsi)
38+
- cp1140 - Australia, Brazil, Canada, New Zealand, Portugal, South Africa, USA
39+
- cp1141 - Austria, Germany, Switzerland
40+
- cp1142 - Denmark, Norway
41+
- cp1143 - Finland, Sweden
42+
- cp1144 - Italy
43+
- cp1145 - Latin America, Spain
44+
- cp1146 - Great Britain, Ireland, Northern Ireland
45+
- cp1147 - France
46+
- cp1148 - International
47+
- cp1148ms - International, Microsoft interpretation; similar to cp1148 except that 0x15 is mapped to 0x85 ("next line") instead of 0x0a ("linefeed")
48+
- cp1149 - Iceland
49+
50+
It also includes legacy codecs:
51+
52+
- cp037 - Australia, Brazil, Canada, New Zealand, Portugal, South Africa; similar to cp1140 but without Euro sign
53+
- cp273 - Austria, Germany, Switzerland; similar to cp1141 but without Euro sign
54+
- cp277 - Denmark, Norway; similar to cp1142 but without Euro sign
55+
- cp278 - Finland, Sweden; similar to cp1143 but without Euro sign
56+
- cp280 - Italy; similar to cp1144 but without Euro sign
57+
- cp284 - Latin America, Spain; similar to cp1145 but without Euro sign
58+
- cp285 - Great Britain, Ireland, Northern Ireland; similar to cp1146 but without Euro sign
59+
- cp297 - France; similar to cp1147 but without Euro sign
60+
- cp500 - International; similar to cp1148 but without Euro sign
61+
- cp500ms - International, Microsoft interpretation; identical to `codecs.cp500`, and similar to `ebcdic.cp500` except that 0x15 is mapped to 0x85 ("next line") instead of 0x0a ("linefeed")
62+
- cp871 - Iceland; similar to cp1149 but without Euro sign
63+
- cp875 - Greece; similar to cp9067 but without Euro sign and a few other characters
64+
- cp1025 - Cyrillic
65+
- cp1047 - Open Systems (MVS C compiler)
66+
- cp1112 - Estonia, Latvia, Lithuania (Baltic)
67+
- cp1122 - Estonia; similar to cp1157 but without Euro sign
68+
- cp1123 - Ukraine; similar to cp1158 but without Euro sign
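
The "next line" mapping mentioned for the `ms` variants can be checked against the standard library, whose `cp500` follows the Microsoft interpretation according to the changelog further down. A small sketch, assuming only the stdlib `cp500` codec:

```python
# Byte 0x15 is the EBCDIC "next line" control character.
ebcdic_nl = bytes([0x15])

# The stdlib cp500 codec decodes it to U+0085 (NEXT LINE) rather than
# U+000A (LINE FEED).
decoded = ebcdic_nl.decode("cp500")
print(hex(ord(decoded)))

# Encoding U+0085 yields the original byte again.
reencoded = "\x85".encode("cp500")
print(reencoded)
```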
69+
70+
Codecs in the standard library overrule some of these codecs. At the time of this writing this concerns cp037, cp273 (since 3.4), cp500 and cp1140.
71+
72+
To get a list of EBCDIC codecs that are already provided by different sources, use `ebcdic.ignored_codec_names()`. For example, with Python 3.13 the result is:
73+
74+
```pycon
75+
>>> ebcdic.ignored_codec_names()
76+
['cp037', 'cp1140', 'cp273', 'cp424', 'cp500', 'cp875']
77+
```
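
To see why some names are ignored, one can ask the standard library which codec names it already resolves on its own. The sketch below uses only `codecs.lookup()` and a hand-picked subset of the names listed above. The helper name `stdlib_provides` is hypothetical and not part of `ebcdic`. The output depends on the Python version and on whether `ebcdic` has already been imported.

```python
import codecs


def stdlib_provides(codec_name):
    """Return True if codecs.lookup() resolves the name in this interpreter."""
    try:
        codecs.lookup(codec_name)
    except LookupError:
        return False
    return True


# Hand-picked subset of the names from the README, for illustration only.
candidates = ["cp037", "cp273", "cp500", "cp1140", "cp1141", "cp1148"]
already_available = [name for name in candidates if stdlib_provides(name)]
print(already_available)
```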
78+
79+
## Unsupported codecs
80+
81+
According to a [comprehensive list of code pages](https://www.aivosto.com/articles/charsets-codepages.html), there are additional codecs this package does not support yet. Possible reasons and solutions are:
82+
83+
1. It's a double byte codec e.g., cp834 (Korea). Technically `CodecMapper` can support them by increasing the mapping size from 256 to 65536. Due to lack of test data and access to Asian mainframes this was deemed too experimental for now.
84+
2. The codec contains combining characters e.g., cp1132 (Lao), which allows representing more than 256 characters by combining several characters.
85+
3. Java does not include a mapping for the respective code page e.g., cp410/880 (Cyrillic). You can add such a codec based on the information found at the link above and submit an enhancement request for the Java standard library. Once it is released, add the new codec to the `build.xml` as described below.
86+
4. I missed a codec. Open an issue on GitHub at <https://github.com/roskakori/CodecMapper/issues>, and it will be added with the next version.
87+
88+
## Source code
89+
90+
These codecs have been generated using CodecMapper, available from <https://github.com/roskakori/CodecMapper>. Read the README to build the ebcdic package from source.
91+
92+
To add another 8-bit EBCDIC codec, extend the ant target `ebcdic` in `build.xml` using a line like:
93+
94+
```xml
95+
<arg value="cpXXX" />
96+
```
97+
98+
Replace `XXX` by the number of the 8-bit code page you want to include.
99+
100+
Then run:
101+
102+
```bash
103+
ant test
104+
```
105+
106+
to build and test the distribution.
107+
108+
## Changes
109+
110+
Version 2.0.0, 2026-02-20
111+
112+
This is a purely technical release that does not change the functionality of the package.
113+
114+
It ensures that the package builds with a modern Python toolchain and continuous integration systems. For details, see [#8](https://github.com/roskakori/CodecMapper/issues/8), contributed by [Branch Vincent](https://github.com/branchv). In addition, it cleans up some source files missing from the distribution (see [#17](https://github.com/roskakori/CodecMapper/issues/17)) and several minor issues in the documentation.
115+
116+
Because of that, support for Python 2 and Python 3.8 or older had to be dropped. If you are stuck with such a version, use [ebcdic 1.1.1](https://pypi.org/project/ebcdic/1.1.1/), which currently has the same functionality.
117+
118+
Version 1.1.1, 2019-08-09
119+
120+
- Moved license information from README to LICENSE (#5). This required
121+
the distribution to change from sdist to wheel because apparently it
122+
is a major challenge to include a text file in a platform
123+
independent way (#11).
124+
125+
Sadly this breaks compatibility with Python 2.6, 3.1, 3.2 and 3.3.
126+
If you still need `ebcdic` with one of these Python versions, use
127+
`ebcdic-1.0.0`.
128+
129+
This took several attempts and intermediate releases that were
130+
broken in different ways on different platforms. To prevent people
131+
from accidentally installing one of these broken releases they have
132+
been removed from PyPI. If you still want to take a look at them,
133+
use the [respective
134+
tags](https://github.com/roskakori/CodecMapper/releases).
135+
136+
Version 1.0.0, 2019-06-06
137+
138+
- Changed development status to "Production/Stable".
139+
- Added international code pages cp500ms and cp1148ms which are the
140+
Microsoft interpretations of the respective IBM code pages. The only
141+
difference is that 0x15 is mapped to 0x85 ("next line") instead of
142+
0x0a ("new line"). Note that codecs.cp500 included with the Python
143+
standard library also uses the Microsoft interpretation (#4).
144+
- Added Arabian bilingual code page 420.
145+
- Added Baltic code page 1112.
146+
- Added Cyrillic code page 1025.
147+
- Added Eastern Europe code page 870.
148+
- Added Estonian code pages 1122 and 1157.
149+
- Added Greek code page 875.
150+
- Added Farsi Bilingual code page 1097.
151+
- Added Hebrew code page 424 and 803.
152+
- Added Korean code page 833.
153+
- Added Maghreb/French code page 425.
154+
- Added Japanese (Katakana) code page 290.
155+
- Added Thailand code page 838.
156+
- Added Turkish code page 322.
157+
- Added Ukraine code page 1123.
158+
- Added Python 3.5 to 3.8 as supported versions.
159+
- Improved PEP8 conformance of generated codecs.
160+
161+
Version 0.7, 2014-11-17
162+
163+
- Clarified which codecs are already part of the standard library and
164+
that these codecs overrule the `ebcdic` package. Also added a
165+
function `ebcdic.ignored_codec_names()` that returns the name of the
166+
EBCDIC codecs provided by other means. To obtain access to `ebcdic`
167+
codecs overruled by the standard library, use `ebcdic.lookup()`.
168+
- Cleaned up (PEP8, \_\_all\_\_, typos, ...).
169+
170+
Version 0.6, 2014-11-15
171+
172+
- Added support for Python 2.6+ and 3.1+ (#1).
173+
- Included a modified version of `gencodec.py` that still builds maps
174+
instead of tables so the generated codecs work with Python versions
175+
earlier than 3.3. It also does a `from
176+
__future__ import unicode_literals` so the codecs even
177+
work with Python 2.6+ using the same source code. As a side effect,
178+
this simplifies building the codecs because it removes the need
179+
for a local copy of the cpython source code.
180+
181+
Version 0.5, 2014-11-13
182+
183+
- Initial public release
