COBOL Programming Course #2 - Advanced Topics/COBOL Programming Course #2 - Advanced Topics.md
207 additions & 0 deletions
@@ -18,6 +18,213 @@ header-includes:
18
18
- \hypersetup{colorlinks=true,
19
19
linkcolor=blue}
20
20
---
21
+
\newpage
22
+
# Numerical Data Representation
23
+
24
+
In the first COBOL Programming Course, various types of data representation were discussed. This chapter seeks to expand upon the binary and hexadecimal numbering systems as well as the various numeric representations in COBOL.
25
+
26
+
- **Numbering Systems**
27
+
    - **Binary System**
28
+
    - **Hexadecimal System**
29
+
    - **EBCDIC Encoding**
30
+
    - **COBOL Picture Clause**
31
+
- **Numeric Representations in COBOL**
32
+
    - **Zoned Decimal Format**
33
+
    - **Packed Decimal Format**
34
+
    - **Binary Format**
35
+
    - **Single Precision Floating Point**
36
+
    - **Double Precision Floating Point**
37
+
38
+
39
+
## Numbering systems
40
+
41
+
A numbering system provides a means to represent numbers. We are most familiar with the base-10 number system, known as decimal. In most computers, including the mainframe computers used by enterprises, data such as numerical values and text is internally represented by zeros and ones. This base-2 number system is known as binary. Although data is encoded in binary on computers, it is much easier to work with the base-16 system, known as hexadecimal. Each sequence of four binary digits is represented by one hexadecimal digit.
42
+
43
+
### Binary System
44
+
45
+
Just as in our decimal system, a binary integer is a sequence of binary digits 0 and 1 arranged in such an order that the position of each bit implies its value in the integer. The binary representation of the number 21 is:
46
+
47
+

48
+
49
+
On the IBM Mainframe system, the two’s complement form is used for the representation of binary integers. In this form, the leftmost bit is used to represent the sign of the number: 0 for positive and 1 for negative. For a positive number, the two’s complement form is simply the binary form of the number with leading zero(s). For a negative number, the two’s complement is obtained by writing out the positive value of the number in binary, then complementing each bit and finally adding 1 to the result. Assuming one byte of storage and b<sub>0</sub>b<sub>1</sub>b<sub>2</sub>b<sub>3</sub>b<sub>4</sub>b<sub>5</sub>b<sub>6</sub>b<sub>7</sub> are the bits, let us look at some examples.
50
+
51
+

52
+
53
+
As we can see, the sign bit b<sub>0</sub> is 0 for a positive integer and 1 for a negative integer. This bit will participate in all arithmetic operations as though it represented the value (-b<sub>0</sub> * 2<sup>k-1</sup>) for a k bit number. In the above binary representation of -28:
54
+
55
+

56
+
57
+
The number of bits will clearly dictate the range of values that can be stored. With k bits, the maximum positive value that can be stored correctly is 2<sup>k-1</sup> - 1 and the minimum (negative) value will be -2<sup>k-1</sup>. The number zero is always represented with sign bit zero. With k=4:
58
+
59
+

60
+
61
+
For k=32 bits, the range is -2<sup>31</sup> to +2<sup>31</sup> - 1.
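
As a rough check of these limits, the following Python sketch computes the range for k bits and decodes a bit string by giving the sign bit the weight -2<sup>k-1</sup>. The helper names are hypothetical and chosen only for this illustration.

```python
def twos_complement_range(k: int) -> tuple:
    """Smallest and largest integers representable in k bits."""
    return -(2 ** (k - 1)), 2 ** (k - 1) - 1


def decode_twos_complement(bits: str) -> int:
    """Decode a bit string, giving the sign bit the weight -2**(k-1)."""
    k = len(bits)
    sign_part = -int(bits[0]) * 2 ** (k - 1)
    rest = int(bits[1:], 2) if k > 1 else 0
    return sign_part + rest


print(twos_complement_range(4))            # (-8, 7)
print(twos_complement_range(32))           # (-2147483648, 2147483647)
print(decode_twos_complement("11100100"))  # -28
```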
62
+
63
+
### Hexadecimal System
64
+
65
+
There are sixteen digits, represented by 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E and F. The first 10 symbols have their usual meaning; the remaining six, A through F, represent the values 10 through 15 when used as hexadecimal digits.
66
+
67
+

68
+
69
+
A423 is the hexadecimal equivalent of the decimal value 42019:
70
+
71
+

72
+
73
+
Although data is encoded in binary on computers, it is rather cumbersome to work with binary. Hexadecimal numerals provide a human-friendly representation of the binary coded values. An understanding of this system is invaluable to COBOL programmers as they design, develop and test code. Often, hex dumps of the data in memory are used to debug a program and understand what is going on. Converting between the binary and hexadecimal systems is easy because 2<sup>4</sup> = 16. Each hexadecimal digit represents 4 binary digits, also known as a nibble, which is half a byte.
74
+
75
+
- To convert from hexadecimal to binary, replace each hexadecimal digit with its equivalent 4-bit binary representation
76
+
- To convert from binary to hexadecimal, replace every four consecutive binary digits by their equivalent hexadecimal digit, starting from the rightmost digit and adding zeros on the left if necessary
77
+
78
+
79
+

80
+
81
+
The usual convention is to use X' ' to denote a hexadecimal value and B' ' to denote a binary value.
82
+
83
+
### EBCDIC Encoding
84
+
85
+
C' ' is used to represent a character value. It is helpful to familiarize yourself with the 8-bit EBCDIC encoding scheme that is used on z/OS and most IBM mainframes.
86
+
87
+

88
+
89
+
8-bit EBCDIC Encoding
90
+
91
+
For numerical representations, the last column is of particular interest here: it holds the character representations of the numerical digits 0-9 in the EBCDIC encoding. For example, C’5’ is encoded as X’F5’ and C’9’ is encoded as X’F9’.
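
The digit encodings can be inspected with a short Python sketch. It assumes code page 037 (`cp037`, one common EBCDIC variant shipped with Python); the code page used on a particular system may differ, and the helper name `ebcdic_hex` is hypothetical.

```python
def ebcdic_hex(text: str) -> str:
    """Hex dump of text encoded with EBCDIC code page 037 (an assumption)."""
    return text.encode("cp037").hex().upper()


# Each digit character 0-9 maps to X'F0' through X'F9'.
for ch in "0123456789":
    print(ch, ebcdic_hex(ch))

print(ebcdic_hex("59"))  # F5F9
```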
92
+
93
+
### COBOL Picture Clause
94
+
95
+
As a quick reminder, COBOL describes numeric data with a PIC clause that can contain a 9, V and/or S. These symbols keep the data purely numeric so that it can participate in arithmetic.
96
+
97
+
- 9 is used to indicate numeric data consisting of the digits from 0 to 9
98
+
- V indicates where the assumed decimal place is located
99
+
- S indicates that the sign is retained, which is necessary if the data can be negative
100
+
101
+
102
+

103
+
104
+
Since the number of decimal places is determined and fixed in place by the V, this representation is called fixed point. Let us illustrate with an example to show how the V determines the value stored.
105
+
106
+

107
+
108
+
## Numeric Representations in COBOL
109
+
110
+
In this section, we will investigate the numeric representations in COBOL:
111
+
- Zoned Decimal (Fixed Point)
112
+
- Packed Decimal (Fixed Point)
113
+
- Binary (Fixed Point)
114
+
- Single Precision Floating Point
115
+
- Double Precision Floating Point
116
+
117
+
### Zoned Decimal Format
118
+
119
+
In this format, each byte of storage contains one digit. The high-order 4 bits (or nibble) are called the Zone bits. The low-order 4 bits are called the Decimal or Numeric bits and contain the binary value for the digit. Considering a simple case, the number 25 is represented as X’F2F5’.
120
+
121
+

122
+
123
+
The zone portion is the ‘upper half byte’ and numeric portion is the ‘lower half byte’. This format is the default numeric encoding in COBOL. The coding syntax of USAGE IS DISPLAY can also be used. Let us look at a few valid zoned decimal declarations.
124
+
125
+

126
+
127
+
As discussed earlier, the first two declarations above are unsigned, indicated by the absence of an S. Such numbers are ‘implied positive’. The next two declarations are explicitly signed by the symbol S and are capable of representing positive and negative numbers. The sign is represented by the rightmost zone bits (in the above example the F above the 5) and is determined as follows:
128
+
129
+
- F indicates the number is unsigned
130
+
- C indicates the number is positive
131
+
- D indicates the number is negative
132
+
133
+

134
+
135
+
It is clear that some adjusting (or editing) needs to be done before printing a signed number. The V in the declarations above has no storage allocated for it and will also need to be edited for printing purposes. To illustrate the way it works, consider an input file containing the number 12345, read with the input PIC clause 999V99. This means that a decimal point is assumed between the 3 and the 4. When the number is later aligned with an edited field, say ‘999.99’, the result is printed as 123.45.
136
+
137
+
When we code arithmetic statements involving zoned decimal fields, under the covers, COBOL converts the data to packed decimal and/or binary representations in order to do the math and the result is converted back to zoned decimal, all seamlessly. This extra step and hence a loss in efficiency is the price to pay for the easy readability that this format provides.
138
+
139
+

140
+
141
+
### Packed Decimal Format
142
+
143
+
In the zoned decimal format, the rightmost zone bits determine the sign; the other F’s are redundant. When a number is ‘packed’, those extra zone bits are removed, and only the rightmost zone bits are retained. Hence, the move from an unpacked field changes every byte in the field (except the last) from X’Fn’ to X’n’. The nibbles in the last byte get flipped (X’C2’ becomes X’2C’).
144
+
145
+

146
+
147
+
As we can observe, when the number is packed into a field that is larger than necessary to hold that number, it is padded with zeroes on the left.
148
+
149
+
- Number 1 will be stored as X’1F’ in 1 byte
150
+
- Number +12 will be stored as X’012C’ in 2 bytes
151
+
- Number -123 will be stored as X’123D’ in 2 bytes
152
+
- Number 1234 (unsigned) will be stored as X’01234F’ in 3 bytes
153
+
- Number +12345 will be stored as X’12345C’ in 3 bytes
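
The packing rule and the examples in the list above can be reproduced with a short Python sketch. The function `packed_decimal_hex` is a hypothetical helper written for this illustration.

```python
def packed_decimal_hex(value: int, length: int, signed: bool) -> str:
    """Hex dump of a packed decimal field occupying `length` bytes."""
    if value < 0 and not signed:
        raise ValueError("an unsigned field cannot hold a negative value")
    # Sign nibble: F unsigned, C positive, D negative.
    sign = "F" if not signed else ("D" if value < 0 else "C")
    capacity = 2 * length - 1  # digit nibbles; the last nibble is the sign
    text = str(abs(value))
    if len(text) > capacity:
        raise ValueError("value has too many digits for the field")
    # Pad with zeros on the left, then append the sign nibble.
    return text.zfill(capacity) + sign


print(packed_decimal_hex(1, 1, signed=False))     # 1F
print(packed_decimal_hex(12, 2, signed=True))     # 012C
print(packed_decimal_hex(-123, 2, signed=True))   # 123D
print(packed_decimal_hex(1234, 3, signed=False))  # 01234F
print(packed_decimal_hex(12345, 3, signed=True))  # 12345C
```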
154
+
155
+
The COBOL syntax for this format is USAGE IS COMP-3 or just COMP-3.
156
+
157
+

158
+
159
+
Because the packed decimal representation stores two digits in one byte, the length of a field varies with the number of digits it holds. Also, as we can see, the digits are stored in decimal notation, and each digit is binary coded, so COMP-3 exactly represents values with decimal places. A COMP-3 value can have up to 31 decimal digits. This format is somewhat unique to, and native on, mainframe computers such as those built on the IBM z/Architecture. The hardware that z/OS runs on has specialized support for packed decimal arithmetic, so the system can perform calculations without having to convert the format. This is, by far, the most widely used numeric representation in COBOL programs. Storing information in this format may also save a significant amount of storage space.
160
+
161
+

162
+
163
+
164
+
It is usually the best choice for arithmetic involving decimal points or fractions. After numerical processing, a packed decimal field is moved (unpacked) into a zoned decimal field, which can then be edited for printing purposes.
165
+
166
+

167
+
168
+
### Binary Format
169
+
170
+
On IBM mainframe systems, the other main arithmetic type besides packed decimal is the binary format, which is built for efficiency in integer arithmetic operations. This encoding finds many uses in legacy applications. Many datasets are created with binary fields. Variable length records and table processing in COBOL use this representation. The binary format is largely implementation dependent and has many variations. On z/OS and IBM mainframes, the two’s complement encoding is used.
171
+
172
+
The COBOL clauses for this format are COMP, COMP-4, COMPUTATIONAL or BINARY, which can be used interchangeably. The COMP-5 clause also falls in this category. Let us look at some valid declarations.
173
+
174
+

175
+
176
+
The PIC Clause determines the storage space:
177
+
178
+
- PIC 9(1) through PIC 9(4) will reserve 2 bytes (Binary halfword)
179
+
- PIC 9(5) through PIC 9(9) will reserve 4 bytes (Binary fullword)
180
+
- PIC 9(10) through PIC 9(18) will reserve 8 bytes (Binary doubleword)
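
The mapping from PIC digits to storage size listed above can be written as a small lookup. This is a Python sketch with a hypothetical function name, covering only the three cases given in the list.

```python
def binary_field_bytes(digits: int) -> int:
    """Bytes reserved for a binary field declared with PIC 9(digits)."""
    if 1 <= digits <= 4:
        return 2  # halfword
    if 5 <= digits <= 9:
        return 4  # fullword
    if 10 <= digits <= 18:
        return 8  # doubleword
    raise ValueError("digits must be between 1 and 18")


for n in (1, 4, 5, 9, 10, 18):
    print(n, binary_field_bytes(n))
```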
181
+
182
+
Next, let’s look at what numbers can be stored. For COMP and COMP-4 fields, although the data is stored as binary numbers, the range is limited by the full value of the PIC clause used in the field definition. With COMP-5 (also known as ‘Native Binary’), the PIC clause still defines the size of the field, but the range of values that can be represented is much larger because every possible bit-value combination is valid.
183
+
184
+

185
+

186
+

187
+
188
+
Those numbers are more than sufficient for most business applications! To give a quick comparison, a two-byte packed decimal field can only range in value from -999 to +999. When faced with values larger than its capacity, COMP truncates to the decimal value of the PIC clause and COMP-5 truncates to the size of the field.
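
The difference between the two ranges can be sketched as follows. This Python sketch assumes the behaviour described in the text (COMP limited by the PIC digits, COMP-5 limited only by the field size) and uses hypothetical function names; compiler options on a real system may change the details.

```python
def comp_range(digits: int, signed: bool) -> tuple:
    """Range of a COMP/COMP-4 field: limited by the PIC digits."""
    top = 10 ** digits - 1
    return (-top if signed else 0), top


def comp5_range(digits: int, signed: bool) -> tuple:
    """Range of a COMP-5 field: limited only by the bits in the field."""
    nbytes = 2 if digits <= 4 else 4 if digits <= 9 else 8
    bits = 8 * nbytes
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


print(comp_range(4, signed=True))    # (-9999, 9999)
print(comp5_range(4, signed=True))   # (-32768, 32767)
print(comp5_range(4, signed=False))  # (0, 65535)
```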
189
+
190
+
Although very much suited for integer processing, the binary format is not a good choice for non-integer arithmetic. Many banking and insurance applications rely on accuracy for their business processing logic and packed decimal format is preferred in such cases. Let’s see why.
191
+
In decimal systems, fractions are represented in terms of negative powers of 10:
192
+
193
+

194
+
195
+
In the binary system, fractions are represented in terms of negative powers of 2:
196
+
197
+

198
+
199
+
There is a possible loss of accuracy when converting a decimal fraction to a binary fraction as there is not a one-to-one correspondence between the set of numbers expressible in a finite number of binary digits and the set of numbers expressible in a finite number of decimal digits. Let’s take the example of the fraction 1/10. In the decimal system:
200
+
201
+

202
+
203
+
However, in the binary system, this is a never-ending sequence of bits:
204
+
205
+

206
+
207
+
So, we get different values when, for example, we multiply 1/10 by 10 in the decimal and in the binary systems. In the decimal system, 10 × 0.1 = 1.0. In the binary system, we get
208
+
209
+

210
+
211
+
that is, not quite 1.0. In scenarios involving a large number of calculations, this type of discrepancy may lead to cumulative rounding errors that may not be acceptable in many business applications. The use of packed decimal works very well in such cases.
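
The repeating expansion of 1/10 can be generated with a short Python sketch. The helper `binary_fraction_bits` is hypothetical, and the choice of 12 bits is arbitrary; any finite number of bits leaves a value slightly below 1/10.

```python
from decimal import Decimal
from fractions import Fraction


def binary_fraction_bits(x: Fraction, n: int) -> str:
    """First n bits after the binary point of a fraction 0 <= x < 1."""
    bits = []
    for _ in range(n):
        x *= 2
        bit = int(x >= 1)
        bits.append(str(bit))
        x -= bit
    return "".join(bits)


tenth_bits = binary_fraction_bits(Fraction(1, 10), 12)
print("0." + tenth_bits)  # 0.000110011001

# Keeping only those 12 bits gives a value slightly below 1/10,
# so ten times that value falls short of 1.
truncated = Fraction(int(tenth_bits, 2), 2 ** 12)
print(float(truncated * 10))  # 0.99853515625

# Decimal arithmetic, by contrast, keeps 0.1 exact.
print(Decimal("0.1") * 10 == Decimal("1.0"))  # True
```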
212
+
213
+

214
+
215
+
216
+
### COMP-1: Single Precision Floating Point
217
+
218
+
Due to its floating-point nature, a COMP-1 value can be very small and close to zero, or it can be very large (about 10 to the power of 38). However, a COMP-1 value has limited precision. This means that even though a COMP-1 value can be up to 10 to the power of 38, it can only maintain about seven significant decimal digits. Any value that has more than seven significant digits is rounded. This means that a COMP-1 value cannot exactly represent a bank balance like $1,234,567.89 because this value has nine significant digits. Instead, the amount is rounded. The main application of COMP-1 is scientific numerical value storage and computation.
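
The loss of precision can be illustrated with a Python sketch. It rounds the balance to IEEE 754 single precision, used here only as a stand-in with roughly seven significant digits; the floating-point format actually used for COMP-1 on a given platform may differ, so the exact stored value is an assumption.

```python
import struct


def round_to_single(value: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">f", value))[0]


balance = 1234567.89
stored = round_to_single(balance)
print(stored)             # 1234567.875
print(stored == balance)  # False
```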
219
+
220
+
### COMP-2: Double Precision Floating Point
221
+
222
+
COMP-2 extends the range of values that can be represented compared to COMP-1. COMP-2 can represent values up to about 10 to the power of 307. Like COMP-1, COMP-2 values also have limited precision. Due to the expanded format, COMP-2 has more significant digits, approximately 15 decimal digits. This means that once a value reaches a certain number of quadrillions (with no decimal places), it can no longer be exactly represented in COMP-2.
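
A similar Python sketch shows where whole numbers stop being exact. Python floats are IEEE 754 double precision, used here as a stand-in for a format with about 15 significant digits; the threshold for COMP-2 on a particular platform may not be identical.

```python
# 2**53 is a little over nine quadrillion.
limit = 2 ** 53
print(limit)                             # 9007199254740992
# The next integer cannot be stored exactly and rounds back to limit.
print(float(limit) == float(limit + 1))  # True
print(int(float(limit + 1)))             # 9007199254740992
```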
223
+
224
+
COMP-2 supersedes COMP-1 for more precise scientific data storage as well as computation. Note that COMP-1 and COMP-2 have limited applications in financial data representation or computation.
225
+
226
+
**Note**: [This](https://www.ibm.com/support/pages/how-display-hexadecimal-using-cobol) COBOL program can display the hexadecimal contents (and hence the exact internal representation) of a field. You can declare binary, packed decimal or zoned decimal variables (or anything else, for that matter), do arithmetic with them and use the program to see how they are internally stored.
227
+
21
228
\newpage
22
229
# COBOL Application Programming Interface (API)
23
230
API is the acronym for Application Programming Interface. An API allows two applications to communicate. We use APIs every day: from our phones and personal computers, when using a credit card to make a payment at a point of sale, and so on.