The chembl.activities table has four columns of the unbounded generic 'numeric' Postgres data type, which Spark's DataFrame chokes on because its decimal type is limited to 38 digits of precision. None of the actual values in the table exceed 20 significant digits, but those digits fall on either side of the decimal point, so a single decimal type wide enough to hold every value would need more than Spark's 38 digits. It's understandable why ChEMBL chose to do this, as the columns in question hold generic 'units' values that may be huge, or small and precise. We are converting these columns to string (varchar(100)) to avoid Spark's precision/scale quirks. This does put the effort on whoever eventually consumes these numerics-stored-as-strings to cast them back to an appropriate data type when they need to do math on the values (if ever).
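
A minimal sketch of the arithmetic behind this, using only Python's standard `decimal` module rather than Spark itself. The sample values are hypothetical (not taken from ChEMBL) and the helper names are made up for illustration. It shows how values with only a few significant digits each can still need more than 38 digits once a single fixed precision/scale has to cover all of them, and that the string form casts back to an exact decimal.

```python
from decimal import Decimal

# The 38-digit limit described in the commit message above.
SPARK_MAX_PRECISION = 38


def digits_needed(text):
    """Return (integer_digits, fraction_digits) needed to store one finite decimal string exactly."""
    _sign, digits, exponent = Decimal(text).as_tuple()
    fraction_digits = max(0, -exponent)
    integer_digits = max(0, len(digits) + exponent)
    return integer_digits, fraction_digits


def column_precision_and_scale(values):
    """Smallest fixed (precision, scale) that stores every value without rounding."""
    needs = [digits_needed(v) for v in values]
    scale = max(fraction for _, fraction in needs)
    precision = max(integer for integer, _ in needs) + scale
    return precision, scale


# Hypothetical values: one very large, one very small, one ordinary.
sample = [
    "12345" + "0" * 18,         # 23 digits before the decimal point
    "0." + "0" * 19 + "12345",  # 24 digits after the decimal point
    "7.4",
]

precision, scale = column_precision_and_scale(sample)
fits_in_spark = precision <= SPARK_MAX_PRECISION

# Casting a stored string back to an exact number when math is needed.
doubled = Decimal(sample[2]) * 2
```

With these sample values the column would need 47 digits of precision at a scale of 24, which is over the 38-digit limit even though no single value is long. A consumer of the varchar columns can cast per value (as with `Decimal(...)` here) or to a narrower numeric type once the realistic range for a given unit is known.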

EC2 Default User · EC2 Default User · commit f0fb3f8a5eae · 2020-03-09T21:47:09.000Z
diff --git a/scripts/ssmdoc.importchembl25.json b/scripts/ssmdoc.importchembl25.json
@@ -38,7 +38,9 @@
                 "dropdb --if-exists -h $chembl25HostName --username=$dbUn chembl_25 >> /home/ssm-user/progressLog 2>/home/ssm-user/progressLog",
                 "createdb -h $chembl25HostName --username=$dbUn chembl_25",
                 "createuser -h $chembl25HostName --username=$dbUn user",
-                "pg_restore -v -h $chembl25HostName -p 5432 -U $dbUn -d chembl_25 chembl/sourceExports/ChEMBLdb/releases/chembl_25/chembl_25/chembl_25_postgresql/chembl_25_postgresql.dmp >> /home/ssm-user/progressLog 2>/home/ssm-user/progressLog"
+                "pg_restore -v -h $chembl25HostName -p 5432 -U $dbUn -d chembl_25 chembl/sourceExports/ChEMBLdb/releases/chembl_25/chembl_25/chembl_25_postgresql/chembl_25_postgresql.dmp >> /home/ssm-user/progressLog 2>/home/ssm-user/progressLog",
+                "psql -h $chembl25HostName -p 5432 -U $dbUn -d chembl_25 -c 'alter table activities ALTER COLUMN value TYPE varchar(100), ALTER COLUMN standard_value TYPE varchar(100), ALTER COLUMN upper_value TYPE varchar(100), ALTER COLUMN standard_upper_value TYPE varchar(100);'"
+                
             ]
          }
       }
