Skip to content

Merge two DataFrames only to missing values #2243

@chedieck

Description

@chedieck

Suppose I have an dataframe called df_missing:

6×3 DataFrame
│ Row │ id    │ val     │ other |
│     │ Int64 │ Int64?  │ Int64 │
├─────┼───────┼─────────┼───────┤
│ 1   │ 5     │ 1       │ 4     │
│ 2   │ 2     │ missing │ 4     │
│ 3   │ 1     │ 3       │ 3     │
│ 4   │ 4     │ 8       │ 4     │
│ 5   │ 6     │ 2       │ 4     │
│ 6   │ 8     │ missing │ 3     │

and I also have another dataset, called df_completion:

2x2 DataFrame
│ Row │ id    │ val   |
│     │ Int64 │ Int64 │ 
├─────┼───────┼───────┤
│ 1   │ 2     │ 5     │ 
│ 2   │ 8     │ 13    │ 

So my suggestion is: couldn't there be a more straightforward way to replace the missing values using the common id of the two dataframes, thus creating df_full?

6×3 DataFrame
│ Row │ id    │ val     │ other |
│     │ Int64 │ Int64?  │ Int64 │
├─────┼───────┼─────────┼───────┤
│ 1   │ 5     │ 1       │ 4     │
│ 2   │ 2     │ 5       │ 4     │
│ 3   │ 1     │ 3       │ 3     │
│ 4   │ 4     │ 8       │ 4     │
│ 5   │ 6     │ 2       │ 4     │
│ 6   │ 8     │ 13      │ 3     │

The two current ways that seem to be the best ones are:
julia> df_missing[in(df_completion.id).(df_missing.id), :val] = df_completion.val
or

julia> df_full = leftjoin(df_missing, df_completion, on = :id, makeunique = true);

julia> df_full.val = map(df_full.val, df_full.val_1) do a, b
       ismissing(a) ? b : a
       end;

julia> select!(df_full, Not(:val_1))

besides, of course, creating a loop. Maybe there could be a method for merging two DataFrames in this way? I think it's a very common problem.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions