-
Notifications
You must be signed in to change notification settings - Fork 27
Open
Description
- I have read through the quick start and installation sections of the README.
Info
| Info | Value |
|---|---|
| Operating System | macOS Big Sur 11.2.1 |
| Geni Version | 0.0.38 |
| Spark Java lib Version | org.apache.spark/spark-XXX_2.12 "3.1.0" |
| JDK | openjdk version "1.8.0_282" |
| Spark Version | 3.0.2 |
Problem / Steps to reproduce
Standard lein new geni …, bitnami/spark 3.0.2 docker, then used code from geni cookbook chapter 4.
The following code from cookbook example 4 fails with ArityException:
(def null-counts
(-> raw-weather-mar-2012
(g/agg (->> (g/column-names raw-weather-mar-2012)
(map #(vector % (g/null-count %)))
(into {})))
(g/first)))
Exception is
Execution error (AnalysisException) at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt/failAnalysis (package.scala:42).
cannot resolve '`Precip. Amount (mm)`' given input columns: [Climate ID, Date/Time (LST), Day, Dew Point Temp (°C), Dew Point Temp Flag, Hmdx, Hmdx Flag, Latitude (y), Longitude (x), Month, Precip. Amount (mm), Precip. Amount Flag, Rel Hum (%), Rel Hum Flag, Station Name, Stn Press (kPa), Stn Press Flag, Temp (°C), Temp Flag, Time (LST), Visibility (km), Visibility Flag, Weather, Wind Chill, Wind Chill Flag, Wind Dir (10s deg), Wind Dir Flag, Wind Spd (km/h), Wind Spd Flag, Year];
'Aggregate [sum(cast(cast(isnull(Latitude (y)#13272) as int) as bigint)) AS Latitude (y)#16363L, sum(cast(cast(isnull(Stn Press Flag#13295) as int) as bigint)) AS Stn Press Flag#16364L, sum(cast(cast(isnull(Temp (°C)#13280) as int) as bigint)) AS Temp (°C)#16365L, sum(cast(cast(isnull(Wind Spd (km/h)#13290) as int) as bigint)) AS Wind Spd (km/h)#16366L, sum(cast(cast(isnull(Rel Hum Flag#13285) as int) as bigint)) AS Rel Hum Flag#16367L, sum(cast(cast(isnull(Date/Time (LST)#13275) as int) as bigint)) AS Date/Time (LST)#16368L, sum(cast(cast(isnull(Visibility Flag#13293) as int) as bigint)) AS Visibility Flag#16369L, sum(cast(cast(isnull(Visibility (km)#13292) as int) as bigint)) AS Visibility (km)#16370L, sum(cast(isnull('Precip. Amount (mm)) as int)) AS Precip. Amount (mm)#16371, sum(cast(cast(isnull(Dew Point Temp Flag#13283) as int) as bigint)) AS Dew Point Temp Flag#16372L, sum(cast(isnull('Precip. Amount Flag) as int)) AS Precip. Amount Flag#16373, sum(cast(cast(isnull(Station Name#13273) as int) as bigint)) AS Station Name#16374L, sum(cast(cast(isnull(Wind Chill Flag#13299) as int) as bigint)) AS Wind Chill Flag#16375L, sum(cast(cast(isnull(Longitude (x)#13271) as int) as bigint)) AS Longitude (x)#16376L, sum(cast(cast(isnull(Time (LST)#13279) as int) as bigint)) AS Time (LST)#16377L, sum(cast(cast(isnull(Dew Point Temp (°C)#13282) as int) as bigint)) AS Dew Point Temp (°C)#16378L, sum(cast(cast(isnull(Rel Hum (%)#13284) as int) as bigint)) AS Rel Hum (%)#16379L, sum(cast(cast(isnull(Wind Dir Flag#13289) as int) as bigint)) AS Wind Dir Flag#16380L, sum(cast(cast(isnull(Climate ID#13274) as int) as bigint)) AS Climate ID#16381L, sum(cast(cast(isnull(Wind Dir (10s deg)#13288) as int) as bigint)) AS Wind Dir (10s deg)#16382L, sum(cast(cast(isnull(Stn Press (kPa)#13294) as int) as bigint)) AS Stn Press (kPa)#16383L, sum(cast(cast(isnull(Year#13276) as int) as bigint)) AS Year#16384L, sum(cast(cast(isnull(Temp Flag#13281) as int) as bigint)) AS Temp Flag#16385L, sum(cast(cast(isnull(Hmdx#13296) as int) as bigint)) AS Hmdx#16386L, ... 6 more fields]
+- Project [Longitude (x)#13211 AS Longitude (x)#13271, Latitude (y)#13212 AS Latitude (y)#13272, Station Name#13213 AS Station Name#13273, Climate ID#13214 AS Climate ID#13274, Date/Time (LST)#13215 AS Date/Time (LST)#13275, Year#13216 AS Year#13276, Month#13217 AS Month#13277, Day#13218 AS Day#13278, Time (LST)#13219 AS Time (LST)#13279, Temp (°C)#13220 AS Temp (°C)#13280, Temp Flag#13221 AS Temp Flag#13281, Dew Point Temp (°C)#13222 AS Dew Point Temp (°C)#13282, Dew Point Temp Flag#13223 AS Dew Point Temp Flag#13283, Rel Hum (%)#13224 AS Rel Hum (%)#13284, Rel Hum Flag#13225 AS Rel Hum Flag#13285, Precip. Amount (mm)#13226 AS Precip. Amount (mm)#13286, Precip. Amount Flag#13227 AS Precip. Amount Flag#13287, Wind Dir (10s deg)#13228 AS Wind Dir (10s deg)#13288, Wind Dir Flag#13229 AS Wind Dir Flag#13289, Wind Spd (km/h)#13230 AS Wind Spd (km/h)#13290, Wind Spd Flag#13231 AS Wind Spd Flag#13291, Visibility (km)#13232 AS Visibility (km)#13292, Visibility Flag#13233 AS Visibility Flag#13293, Stn Press (kPa)#13234 AS Stn Press (kPa)#13294, ... 6 more fields]
+- Relation[Longitude (x)#13211,Latitude (y)#13212,Station Name#13213,Climate ID#13214,Date/Time (LST)#13215,Year#13216,Month#13217,Day#13218,Time (LST)#13219,Temp (°C)#13220,Temp Flag#13221,Dew Point Temp (°C)#13222,Dew Point Temp Flag#13223,Rel Hum (%)#13224,Rel Hum Flag#13225,Precip. Amount (mm)#13226,Precip. Amount Flag#13227,Wind Dir (10s deg)#13228,Wind Dir Flag#13229,Wind Spd (km/h)#13230,Wind Spd Flag#13231,Visibility (km)#13232,Visibility Flag#13233,Stn Press (kPa)#13234,... 6 more fields] csv
Also manual select with column named "Precip. Amount (mm)" does not work. It seems that they have backticks around them internally.
I tried to rename all columns with
(g/to-df weather-data
"Longitude (x)" "Latitude (y)" "Station Name" "Climate ID" "Date/Time (LST)" "Year" "Month" "Day"
"Time (LST)" "Temp (°C)" "Temp Flag" "Dew Point Temp (°C)" "Dew Point Temp Flag" "Rel Hum (%)"
"Rel Hum Flag" "Precip. Amount (mm)" "Precip. Amount Flag" "Wind Dir (10s deg)" "Wind Dir Flag"
"Wind Spd (km/h)" "Wind Spd Flag" "Visibility (km)" "Visibility Flag" "Stn Press (kPa)" "Stn Press Flag"
"Hmdx" "Hmdx Flag" "Wind Chill" "Wind Chill Flag" "Weather"))
but the problem persists, still backticks.
Crashes: (g/select raw-weather-mar-2012 "Precip. Amount (mm)")
Works: (g/select raw-weather-mar-2012 "`Precip. Amount (mm)`")
(g/column-names (g/select raw-weather-mar-2012 "`Precip. Amount (mm)`"))
yields "Precip. Amount (mm)" without backticks.
This lead me to believe that there is some issue in geni or spark with these column names.
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels