Description
When a DataFrame is built from dicts like the one below, DataFrames creates a column of type Vector{Union{Nothing, Int64}} for Horsepower. The expected type is Vector{Union{Missing, Int64}}, since the tools in DataFrames rely on Missing. There are workarounds, but the current behavior is very hard to handle for a developer without experience. A fix might break existing workarounds, but nothing else.
Dict{String, Any} with 9 entries:
"Miles_per_Gallon" => 25
"Cylinders" => 4
"Origin" => "USA"
"Weight_in_lbs" => 2046
"Displacement" => 98
"Acceleration" => 19
"Name" => "ford pinto"
"Year" => "1971-01-01"
"Horsepower" => nothing
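To make the symptom concrete, here is a minimal sketch that does not need the download. It assumes a hand-typed column that mirrors the Vector{Union{Nothing, Int64}} described above; the second row is hypothetical and not taken from the dataset.

```julia
using DataFrames

# Hypothetical two-row table. The Horsepower column is typed by hand to
# mirror the column type reported in this issue.
df = DataFrame(
    Name = ["ford pinto", "some other car"],
    Horsepower = Union{Nothing, Int64}[nothing, 90],
)

# The element type contains Nothing, not Missing.
println(eltype(df.Horsepower))

# dropmissing only looks for `missing`, so the `nothing` row survives.
kept = dropmissing(df, :Horsepower)
println(nrow(kept))
```

Because the column holds no `missing` values, `dropmissing` has nothing to remove and both rows come back.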
Full example, which was generated by Gemini in Colab (which works great, by the way):
```julia
# prompt: parse https://github.com/altair-viz/vega_datasets/blob/4f67bdaad10f45e3549984e17e1b3088c731503d/vega_datasets/_data/cars.json in Julia drop missing Horsepower Group by origin calculate mean of cylinders and median of cylinders and mean horsepower
using HTTP
using JSON
using DataFrames
using Statistics

# Download the JSON data
url = "https://raw.githubusercontent.com/altair-viz/vega_datasets/4f67bdaad10f45e3549984e17e1b3088c731503d/vega_datasets/_data/cars.json"
response = HTTP.get(url)

# Check for successful request
if response.status == 200
    # Parse the JSON data
    data = JSON.parse(String(response.body))

    # Convert JSON to DataFrame
    df = DataFrame(data)

    # Drop rows with missing "Horsepower"
    df = dropmissing(df, :Horsepower)

    # Group by origin
    grouped_df = groupby(df, :Origin)

    # Calculate mean and median of cylinders, and mean of horsepower for each group
    result = combine(grouped_df, :Cylinders => mean => :mean_cylinders,
                     :Cylinders => median => :median_cylinders,
                     :Horsepower => mean => :mean_horsepower)

    # Display the result
    println(result)
else
    println("Error downloading the file: ", response.status)
end
```
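One possible workaround, sketched under assumptions: convert `nothing` to `missing` with Base's `replace` before calling `dropmissing`. The table below is a hypothetical in-memory stand-in for the parsed cars.json records (the values are made up), so the sketch runs without a network request.

```julia
using DataFrames
using Statistics

# Hypothetical stand-in for the parsed records; values are illustrative only.
df = DataFrame(
    Origin = ["USA", "USA", "Japan", "Japan"],
    Cylinders = [4, 8, 4, 4],
    Horsepower = Union{Nothing, Int64}[nothing, 150, 88, 92],
)

# Swap `nothing` for `missing` so the DataFrames tooling can see the gaps.
df.Horsepower = replace(df.Horsepower, nothing => missing)

# Now the row without a horsepower value is actually dropped.
cleaned = dropmissing(df, :Horsepower)

# Same aggregation as in the full example.
result = combine(groupby(cleaned, :Origin),
                 :Cylinders => mean => :mean_cylinders,
                 :Cylinders => median => :median_cylinders,
                 :Horsepower => mean => :mean_horsepower)

println(result)
```

This only illustrates the kind of workaround mentioned in the description; it does not change what `DataFrame(data)` produces.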
Expected result: Working example
Actual result:
```
MethodError: no method matching +(::Float64, ::Nothing)
Closest candidates are:
  +(::Any, ::Any, ::Any, ::Any...)
   @ Base operators.jl:587
  +(::Real, ::Complex{Bool})
   @ Base complex.jl:319
  +(::AbstractFloat, ::Bool)
   @ Base bool.jl:176
  ...
```
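The error appears to come from `mean` summing over a column that still contains `nothing`. A minimal sketch of that, assuming a small hand-written vector rather than the real column:

```julia
using Statistics

# Hypothetical values; the `nothing` entry plays the role of the row
# that dropmissing did not remove.
hp = Union{Nothing, Int64}[130, nothing, 150]

# mean adds the elements up, and there is no `+` method for Nothing.
err = try
    mean(hp)
catch e
    e
end
println(err isa MethodError)

# Filtering out `nothing` explicitly avoids the failing addition.
println(mean(filter(!isnothing, hp)))
```

Filtering with `!isnothing` is just one way to sidestep the problem for a single vector; it does not make the column compatible with `dropmissing` or `skipmissing`.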