Creating DataFrames from Dicts does not work as Expected #3495

@stensmo

When constructing a DataFrame from dicts like the one below, DataFrames creates a column of type Vector{Union{Nothing, Int64}} for Horsepower. The expected type is Vector{Union{Missing, Int64}}, since all the tools in DataFrames rely on Missing. There are workarounds, but the current behavior is very hard to handle for a developer without experience. A fix might break existing workarounds, but nothing else.

```
Dict{String, Any} with 9 entries:
  "Miles_per_Gallon" => 25
  "Cylinders" => 4
  "Origin" => "USA"
  "Weight_in_lbs" => 2046
  "Displacement" => 98
  "Acceleration" => 19
  "Name" => "ford pinto"
  "Year" => "1971-01-01"
  "Horsepower" => nothing
```
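To illustrate why the element type matters, here is a minimal sketch using only Base Julia. The variable name `value` is a hypothetical stand-in for the `"Horsepower"` entry above. It shows that `nothing` is not recognized as a missing value and does not propagate through arithmetic the way `missing` does.

```julia
# Hypothetical stand-in for the "Horsepower" entry in the dict above.
value = nothing

# `nothing` and `missing` are distinct singletons of different types.
println(ismissing(value))   # `nothing` is not treated as missing
println(isnothing(value))

# `missing` propagates through arithmetic instead of throwing.
propagated = missing + 1
println(ismissing(propagated))

# Arithmetic on `nothing` has no method, so it throws a MethodError.
threw = try
    value + 1
    false
catch err
    err isa MethodError
end
println(threw)
```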

Full example, which was generated by Gemini in Colab (which works great, by the way):

```julia
# prompt: parse https://github.com/altair-viz/vega_datasets/blob/4f67bdaad10f45e3549984e17e1b3088c731503d/vega_datasets/_data/cars.json in Julia drop missing Horsepower Group by origin calculate mean of cylinders and median of cylinders and mean horsepower

using HTTP
using JSON
using DataFrames
using Statistics

# Download the JSON data
url = "https://raw.githubusercontent.com/altair-viz/vega_datasets/4f67bdaad10f45e3549984e17e1b3088c731503d/vega_datasets/_data/cars.json"
response = HTTP.get(url)

# Check for successful request
if response.status == 200
    # Parse the JSON data
    data = JSON.parse(String(response.body))

    # Convert JSON to DataFrame
    df = DataFrame(data)

    # Drop rows with missing "Horsepower"
    df = dropmissing(df, :Horsepower)

    # Group by origin
    grouped_df = groupby(df, :Origin)

    # Calculate mean and median of cylinders, and mean of horsepower for each group
    result = combine(grouped_df, :Cylinders => mean => :mean_cylinders,
                             :Cylinders => median => :median_cylinders,
                             :Horsepower => mean => :mean_horsepower)

    # Display the result
    println(result)
else
    println("Error downloading the file: ", response.status)
end
```



Expected result: Working example

Actual result:

```
MethodError: no method matching +(::Float64, ::Nothing)

Closest candidates are:
  +(::Any, ::Any, ::Any, ::Any...)
   @ Base operators.jl:587
  +(::Real, ::Complex{Bool})
   @ Base complex.jl:319
  +(::AbstractFloat, ::Bool)
   @ Base bool.jl:176
  ...
```
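One possible workaround is to convert `nothing` to `missing` before running the aggregation. The sketch below uses only Base and the Statistics standard library. The vector `horsepower` and its values are hypothetical stand-ins for the column DataFrames infers, not data from cars.json.

```julia
using Statistics

# Hypothetical stand-in for the inferred Horsepower column.
horsepower = Union{Nothing, Int}[130, nothing, 165]

# skipmissing only filters `missing`, so `nothing` still reaches the sum.
failed = try
    mean(skipmissing(horsepower))
    false
catch err
    err isa MethodError
end

# Swapping `nothing` for `missing` also narrows the element type.
converted = replace(horsepower, nothing => missing)
println(eltype(converted))

# With `missing` in place, the usual skipmissing idiom works.
avg = mean(skipmissing(converted))
println(avg)
```

Assuming the same idea carries over to a DataFrame column, something like `df.Horsepower = replace(df.Horsepower, nothing => missing)` placed before the `dropmissing` call should let the rest of the example run, though this is untested against the full dataset here.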
