Commit f958c58

Add DataFrame constructor for AnnData

Author: jdm204
1 parent 284597a, commit f958c58

5 files changed: +63 -1 lines changed

docs/Project.toml

Lines changed: 1 addition & 0 deletions
@@ -1,3 +1,4 @@
 [deps]
+DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
 Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
 Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"

docs/make.jl

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-using Documenter, Muon
+using Documenter, Muon, DataFrames
 
 makedocs(sitename="Muon Documentation", warnonly=:cross_references)
 

docs/src/objects.md

Lines changed: 15 additions & 0 deletions
@@ -76,6 +76,21 @@ import Muon: obs_names_make_unique! # hide
 obs_names_make_unique!(ad)
 ```
 
+The data matrices of `AnnData` objects can be converted to a `DataFrame`, annotated with `obs` and `var` names.
+
+```@example 1
+using DataFrames
+DataFrame(ad)
+```
+
+By default, the first column `obs` corresponds to the `obs_names` and the remaining columns are named according to the `var_names`. To obtain the transpose of this, pass `columns=:obs`.
+
+To use a different data matrix (the default is `ad.X`), pass the name of the layer:
+
+```julia
+DataFrame(ad, layer="raw")
+```
+
 ## MuData
 
 The basic idea behind a multimodal object is a _key_ ``\rightarrow`` _value_ relationship where _keys_ represent the unique names of individual modalities and _values_ are `AnnData` objects that contain the corresponding data. Similarly to `AnnData` objects, `MuData` objects can also contain rich multimodal annotations.
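
The two orientations described in the added documentation can be illustrated without an `AnnData` object. The sketch below is not Muon's implementation: it builds the same two layouts from a plain matrix with DataFrames.jl, and the names `X`, `obs_names`, `var_names`, `by_var` and `by_obs` are stand-ins chosen for illustration only.

```julia
using DataFrames

# Stand-in data: 2 observations (rows) x 3 variables (columns).
X = [1.0 2.0 3.0;
     4.0 5.0 6.0]
obs_names = ["cell1", "cell2"]
var_names = ["geneA", "geneB", "geneC"]

# Default orientation: one row per observation, one column per variable,
# with the observation names in a leading `obs` column.
by_var = DataFrame(X, var_names)
insertcols!(by_var, 1, :obs => obs_names)

# Transposed orientation (the layout the docs describe for `columns=:obs`):
# one row per variable, one column per observation, leading `var` column.
by_obs = DataFrame(permutedims(X), obs_names)
insertcols!(by_obs, 1, :var => var_names)

println(names(by_var))
println(names(by_obs))
```

Under these assumptions, `by_var` has one row per observation and `by_obs` has one row per variable; the leading column always holds the names of whichever axis ended up as rows.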

src/util.jl

Lines changed: 32 additions & 0 deletions
@@ -189,3 +189,35 @@ function duplicateindices(v::Muon.Index{T, I}) where {T <: AbstractString, I <:
     filter!(x -> length(last(x)) > 1, varnames)
     varnames
 end
+
+"""
+    DataFrame(A::AnnData; layer=nothing, columns=:var)
+
+Return a DataFrame containing the data matrix `A.X` (or `layer` by
+passing `layer="layername"`). By default, the first column contains
+`A.obs_names` and the remaining columns are named according to
+`A.var_names`; to obtain the transpose, pass `columns=:obs`.
+"""
+function DataFrames.DataFrame(A::AnnData; layer::Union{String, Nothing}=nothing, columns=:var)
+    if columns ∉ [:obs, :var]
+        throw(ArgumentError("columns must be :obs or :var (got: $columns)"))
+    end
+    rows = columns == :var ? :obs : :var
+    colnames = getproperty(A, Symbol(columns, :_names))
+    if !allunique(colnames)
+        throw(ArgumentError("duplicate column names ($(columns)_names); run $(columns)_names_make_unique!"))
+    end
+    rownames = getproperty(A, Symbol(rows, :_names))
+
+    M = if isnothing(layer)
+        A.X
+    elseif layer in keys(A.layers)
+        A.layers[layer]
+    else
+        throw(ArgumentError("no layer $layer in adata layers"))
+    end
+    df = DataFrame(columns == :var ? M : transpose(M), colnames)
+    setproperty!(df, rows, rownames)
+    select!(df, rows, All())
+    df
+end
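
The last lines of the new method attach the row names as an ordinary column and then move that column to the front. A minimal sketch of that idiom on a plain `DataFrame`, assuming made-up column names and values rather than real `AnnData` contents:

```julia
using DataFrames

# Hypothetical 2x2 data matrix with illustrative column names.
df = DataFrame([1 2; 3 4], ["geneA", "geneB"])

# Attach the row labels as a new column; `rows` is a Symbol, as in the method.
rows = :obs
setproperty!(df, rows, ["cell1", "cell2"])   # equivalent to `df.obs = [...]`

# The new column is appended last, so reorder in place:
# the named column first, followed by all remaining columns.
select!(df, rows, All())

println(names(df))
```

Because `rows` is only known at run time (`:obs` or `:var`), `setproperty!` is used here in place of the literal `df.obs = ...` syntax.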

test/anndata.jl

Lines changed: 14 additions & 0 deletions
@@ -61,3 +61,17 @@ end
     @test allunique(ad2.var_names)
     @test allunique(ad2.obs_names)
 end
+
+@testset "DataFrame conversion" begin
+    using DataFrames
+    df = DataFrame(ad)
+    @test names(df) == ["obs"; ad.var_names]
+    @test df.obs == ad.obs_names
+    ad.var_names[3] = "10"
+    @test_throws ArgumentError DataFrame(ad)
+    @test_throws ArgumentError DataFrame(ad, columns=:foo)
+    @test_throws ArgumentError DataFrame(ad, layer="doesn't exist")
+    df2 = DataFrame(ad, columns=:obs)
+    @test names(df2) == ["var"; ad.obs_names]
+    @test df2.var == ad.var_names
+end
