Suppose I have a DataFrame called df_missing:
6×3 DataFrame
│ Row │ id │ val │ other │
│ │ Int64 │ Int64? │ Int64 │
├─────┼───────┼─────────┼───────┤
│ 1 │ 5 │ 1 │ 4 │
│ 2 │ 2 │ missing │ 4 │
│ 3 │ 1 │ 3 │ 3 │
│ 4 │ 4 │ 8 │ 4 │
│ 5 │ 6 │ 2 │ 4 │
│ 6 │ 8 │ missing │ 3 │
and I also have another DataFrame, called df_completion:
2×2 DataFrame
│ Row │ id │ val │
│ │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1 │ 2 │ 5 │
│ 2 │ 8 │ 13 │
My suggestion: could there be a more straightforward way to replace the missing values by matching on the id column that the two DataFrames share, thus creating df_full?
6×3 DataFrame
│ Row │ id │ val │ other │
│ │ Int64 │ Int64? │ Int64 │
├─────┼───────┼─────────┼───────┤
│ 1 │ 5 │ 1 │ 4 │
│ 2 │ 2 │ 5 │ 4 │
│ 3 │ 1 │ 3 │ 3 │
│ 4 │ 4 │ 8 │ 4 │
│ 5 │ 6 │ 2 │ 4 │
│ 6 │ 8 │ 13 │ 3 │
The two best approaches currently available seem to be:
julia> df_missing[in(df_completion.id).(df_missing.id), :val] = df_completion.val
or
julia> df_full = leftjoin(df_missing, df_completion, on = :id, makeunique = true);
julia> df_full.val = map(df_full.val, df_full.val_1) do a, b
           ismissing(a) ? b : a
       end;
julia> select!(df_full, Not(:val_1))
A third option, of course, is to write an explicit loop. Maybe there could be a dedicated method for merging two DataFrames in this way? I think it's a very common problem.
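To make the request concrete, here is a minimal sketch of what such a method might look like. The name `fill_missing_by_key!` and its `on`/`col` keywords are hypothetical and are not part of DataFrames.jl; the sketch simply wraps the explicit loop around a `Dict` lookup, so it does not depend on the rows of df_completion being in the same order as the matching rows of df_missing, and it only touches entries that are actually missing.

```julia
using DataFrames

# Hypothetical helper (NOT an existing DataFrames.jl function): fill the
# missing entries of column `col` in `target` with values from `source`,
# matching rows on the key column `on`.
function fill_missing_by_key!(target::AbstractDataFrame, source::AbstractDataFrame;
                              on::Symbol, col::Symbol)
    # Build a key => value lookup from the completion table.
    lookup = Dict(zip(source[!, on], source[!, col]))
    keycol = target[!, on]
    vals = target[!, col]   # the column itself, so writes update `target`
    for i in eachindex(vals)
        # Only overwrite missing entries whose key has a replacement.
        if ismissing(vals[i]) && haskey(lookup, keycol[i])
            vals[i] = lookup[keycol[i]]
        end
    end
    return target
end

# The two tables from the example above.
df_missing = DataFrame(id = [5, 2, 1, 4, 6, 8],
                       val = [1, missing, 3, 8, 2, missing],
                       other = [4, 4, 3, 4, 4, 3])
df_completion = DataFrame(id = [2, 8], val = [5, 13])

# Work on a copy so that df_missing itself is left untouched.
df_full = fill_missing_by_key!(copy(df_missing), df_completion; on = :id, col = :val)
```

As a side note, the `map(...) do a, b ... end` step in the join-based approach can also be written as `coalesce.(df_full.val, df_full.val_1)`, since `coalesce` returns its first non-missing argument.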