Commit f0fb3f8

Author: EC2 Default User
The chembl.activities table has four columns of the unbounded generic Postgres 'numeric' data type, which Spark's data frame chokes on because of its 38-digit decimal precision limit. None of the actual values in the table exceeds 20 significant digits, but those digits fall on either side of the decimal point, so the combined precision and scale needed to hold every value exceeds Spark's limit of 38. It's understandable why ChEMBL chose to do this, as the columns in question hold generic 'units' values that may be huge, or small and precise. We are converting these columns to string/varchar(100) to avoid Spark's precision/scale quirks. This does put the effort on whoever eventually consumes these numerics-as-strings to cast them back to an appropriate data type when they need to do math on the values (if ever).
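The precision argument above can be checked with a little arithmetic. The following is a minimal Python sketch, not part of the commit: the helper name `required_precision_and_scale` and the two sample values are hypothetical and are not taken from ChEMBL. It assumes that a single fixed decimal type must hold every value exactly, so it needs the largest integer-digit count plus the largest fractional-digit count seen in the column.

```python
from decimal import Decimal

SPARK_MAX_PRECISION = 38  # maximum precision of Spark's DecimalType


def required_precision_and_scale(values):
    """Return the (precision, scale) one fixed decimal type would need
    to hold every value in `values` exactly (hypothetical helper)."""
    max_int_digits = 0
    max_scale = 0
    for text in values:
        number = Decimal(text)
        if not number.is_finite():
            continue  # skip NaN/Infinity; they carry no digit count
        _, digits, exponent = number.as_tuple()
        # Digits to the right of the decimal point.
        max_scale = max(max_scale, max(-exponent, 0))
        # Digits to the left of the decimal point.
        max_int_digits = max(max_int_digits, max(len(digits) + exponent, 0))
    return max_int_digits + max_scale, max_scale


# Illustrative values only: each has at most 20 significant digits.
huge = "1234" + "0" * 16          # 20 integer digits, no fraction
tiny = "0." + "0" * 19 + "1234"   # 4 significant digits, 23 fractional digits

precision, scale = required_precision_and_scale([huge, tiny])
print(precision, scale)                  # 43 23
print(precision > SPARK_MAX_PRECISION)   # True
```

Neither sample value is long on its own, but a column type covering both needs 20 + 23 = 43 digits, which is over the limit of 38.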
1 parent 796aee3 commit f0fb3f8

1 file changed: 3 additions, 1 deletion

scripts/ssmdoc.importchembl25.json

Lines changed: 3 additions & 1 deletion
@@ -38,7 +38,9 @@
 "dropdb --if-exists -h $chembl25HostName --username=$dbUn chembl_25 >> /home/ssm-user/progressLog 2>/home/ssm-user/progressLog",
 "createdb -h $chembl25HostName --username=$dbUn chembl_25",
 "createuser -h $chembl25HostName --username=$dbUn user",
-"pg_restore -v -h $chembl25HostName -p 5432 -U $dbUn -d chembl_25 chembl/sourceExports/ChEMBLdb/releases/chembl_25/chembl_25/chembl_25_postgresql/chembl_25_postgresql.dmp >> /home/ssm-user/progressLog 2>/home/ssm-user/progressLog"
+"pg_restore -v -h $chembl25HostName -p 5432 -U $dbUn -d chembl_25 chembl/sourceExports/ChEMBLdb/releases/chembl_25/chembl_25/chembl_25_postgresql/chembl_25_postgresql.dmp >> /home/ssm-user/progressLog 2>/home/ssm-user/progressLog",
+"psql -h $chembl25HostName -p 5432 -U $dbUn -d chembl_25 -c 'alter table activities ALTER COLUMN value TYPE varchar(100), ALTER COLUMN standard_value TYPE varchar(100), ALTER COLUMN upper_value TYPE varchar(100), ALTER COLUMN standard_upper_value TYPE varchar(100);'"
+
 ]
 }
 }
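The commit message notes that anyone consuming the four converted columns has to cast the strings back before doing math on them. One possible way to do that on the consumer side is sketched below in Python. The helper name `parse_activity_value` is hypothetical and not part of this repository. The sketch assumes that a database NULL arrives as `None` and that blank strings should be treated the same way.

```python
from decimal import Decimal, InvalidOperation


def parse_activity_value(text):
    """Convert a varchar-encoded numeric back to an exact Decimal.

    Returns None for NULL (None) or blank input; raises ValueError
    for text that is not a number. Hypothetical helper.
    """
    if text is None or not text.strip():
        return None
    try:
        # Decimal construction from a string is exact, regardless of
        # the context precision that later arithmetic uses.
        return Decimal(text.strip())
    except InvalidOperation:
        raise ValueError(f"not a numeric string: {text!r}") from None


# Illustrative rows only, not real ChEMBL data.
raw_values = ["7.50", None, " 0.000000000000000000012 "]
parsed = [parse_activity_value(v) for v in raw_values]
```

`Decimal` keeps every digit of the stored text, whereas converting to `float` would round values with more than about 15 to 17 significant digits. Arithmetic on `Decimal` values is still rounded to the active context precision (28 digits by default), so widen it with `decimal.getcontext().prec` if exact results on very wide values matter.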
