Commit f0fb3f8
EC2 Default User
The chembl.activities table has four columns with the unbounded generic 'numeric' Postgres data type, which Spark's DataFrame chokes on because of its 38-digit decimal precision limit. None of the actual values in the table exceeds 20 significant digits, but those digits fall on either side of the decimal point, so the combined precision and scale needed to hold every value exceeds what Spark's 38-digit limit allows. It's understandable why ChEMBL chose to do this, as the columns in question hold values in generic 'units' that may be huge, or small and precise. We are converting these columns to string/varchar(100) to avoid Spark's precision/scale quirks. This does put the effort on the person eventually consuming these numerics-stored-as-strings to cast them back to an appropriate data type when they need to do math on the values (if ever).
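To illustrate the arithmetic behind the message above, here is a minimal sketch in plain Python (standard library only, no Spark). The sample values are invented for illustration and are not taken from ChEMBL, and `required_decimal_type` is a hypothetical helper, not part of this commit. It estimates the precision and scale that a single fixed decimal type would need to hold every value, then shows the cast back from string that a downstream consumer would do before any math.

```python
from decimal import Decimal

# Spark's DecimalType allows at most 38 digits of precision in total.
SPARK_MAX_PRECISION = 38


def required_decimal_type(values):
    """Return the (precision, scale) one fixed decimal type needs for all values.

    Assumes every entry is a finite number written as a string.
    """
    max_int_digits = 0
    max_scale = 0
    for text in values:
        _sign, digits, exponent = Decimal(text).as_tuple()
        scale = max(-exponent, 0)                    # digits right of the point
        int_digits = max(len(digits) + exponent, 0)  # digits left of the point
        max_scale = max(max_scale, scale)
        max_int_digits = max(max_int_digits, int_digits)
    return max_int_digits + max_scale, max_scale


# Invented sample values: each has at most 20 significant digits.
huge = "12345678901234567890"       # 20 digits left of the decimal point
tiny = "0." + "0" * 19 + "1234567"  # 26 digits right of the decimal point

precision, scale = required_decimal_type([huge, tiny])
print(precision, scale, precision > SPARK_MAX_PRECISION)

# Consumers of the string column cast back before doing math.
doubled = Decimal(tiny) * 2
print(doubled)
```

Neither sample value is unusual on its own; it is the need to cover both with one column type that pushes the required precision past the limit, which is why storing the text and casting on demand sidesteps the problem.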
1 parent 796aee3 commit f0fb3f8
1 file changed
+3
-1
lines changed

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 38 | 38 | |
| 39 | 39 | |
| 40 | 40 | |
| 41 | | - |
| | 41 | + |
| | 42 | + |
| | 43 | + |
| 42 | 44 | |
| 43 | 45 | |
| 44 | 46 | |