Read error on TabulaSapiens 10X data #22

@rasmushenningsson

Transferred issue from here:
rasmushenningsson/SingleCell10x.jl#5

Hi Rasmus,

Just for fun, I thought I'd try applying SingleCellProjections to a dataset I had on hand, the TabulaSapiens eye .h5ad file. If you want to grab this file yourself, you can find it here: https://figshare.com/articles/dataset/Tabula_Sapiens_release_1_0/14267219?file=34701970.

However, it seems that load_h5ad isn't particularly happy with this file:

(jl_NPMjKi) pkg> st # Julia 1.11.0-rc2
Status `/tmp/jl_NPMjKi/Project.toml`
  [03d38035] SingleCellProjections v0.4.0

julia> counts = load_counts("/data/gene_variants_data/extracted/tabula_sapiens/TS_Eye.h5ad/TS_Eye.h5ad")
ERROR: unexpected character '\x04' after quoted field at row 250 column 1
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] 
    @ DelimitedFiles ~/.julia/packages/DelimitedFiles/aGcsu/src/DelimitedFiles.jl:727
  [3] readdlm_string(sbuff::String, dlm::Char, T::Type, eol::Char, auto::Bool, optsd::Dict{Symbol, Union{…}})
    @ DelimitedFiles ~/.julia/packages/DelimitedFiles/aGcsu/src/DelimitedFiles.jl:461
  [4] readdlm_auto(input::IOStream, dlm::Char, T::Type, eol::Char, auto::Bool; opts::@Kwargs{})
    @ DelimitedFiles ~/.julia/packages/DelimitedFiles/aGcsu/src/DelimitedFiles.jl:231
  [5] readdlm_auto
    @ ~/.julia/packages/DelimitedFiles/aGcsu/src/DelimitedFiles.jl:231 [inlined]
  [6] readdlm
    @ ~/.julia/packages/DelimitedFiles/aGcsu/src/DelimitedFiles.jl:226 [inlined]
  [7] readdlm
    @ ~/.julia/packages/DelimitedFiles/aGcsu/src/DelimitedFiles.jl:86 [inlined]
  [8] _read10x_features(io::IOStream; delim::Char)
    @ SingleCell10x ~/.julia/packages/SingleCell10x/XLhW5/src/fileio.jl:306
  [9] _read10x_features
    @ ~/.julia/packages/SingleCell10x/XLhW5/src/fileio.jl:305 [inlined]
 [10] #19
    @ ~/.julia/packages/SingleCell10x/XLhW5/src/fileio.jl:315 [inlined]
 [11] #1
    @ ~/.julia/packages/SingleCell10x/XLhW5/src/fileio.jl:57 [inlined]
 [12] open(f::SingleCell10x.var"#1#2"{SingleCell10x.var"#19#20"{@Kwargs{delim::Char}}, Bool}, args::String; kwargs::@Kwargs{})
    @ Base ./io.jl:410
 [13] open
    @ ./io.jl:407 [inlined]
 [14] _open(f::SingleCell10x.var"#19#20"{@Kwargs{delim::Char}}, filename::String)
    @ SingleCell10x ~/.julia/packages/SingleCell10x/XLhW5/src/fileio.jl:55
 [15] #_read10x_features#18
    @ ~/.julia/packages/SingleCell10x/XLhW5/src/fileio.jl:314 [inlined]
 [16] _read10x_features_triplet(filename::String; guess::Function, kwargs::@Kwargs{})
    @ SingleCell10x ~/.julia/packages/SingleCell10x/XLhW5/src/fileio.jl:329
 [17] _read10x_features_triplet
    @ ~/.julia/packages/SingleCell10x/XLhW5/src/fileio.jl:327 [inlined]
 [18] #_read10x_features_autodetect#22
    @ ~/.julia/packages/SingleCell10x/XLhW5/src/fileio.jl:324 [inlined]
 [19] _read10x_features_autodetect
    @ ~/.julia/packages/SingleCell10x/XLhW5/src/fileio.jl:320 [inlined]
 [20] read10x_features(io::String, featuretype::Type; kwargs::@Kwargs{})
    @ SingleCell10x ~/.julia/packages/SingleCell10x/XLhW5/src/fileio.jl:357
 [21] read10x_features
    @ ~/.julia/packages/SingleCell10x/XLhW5/src/fileio.jl:356 [inlined]
 [22] _load10x_metadata
    @ ~/.julia/packages/SingleCellProjections/0yZXZ/src/load.jl:96 [inlined]
 [23] load10x(filename::String; lazy::Bool, var_id::Nothing, var_id_delim::Char, kwargs::@Kwargs{})
    @ SingleCellProjections ~/.julia/packages/SingleCellProjections/0yZXZ/src/load.jl:135
 [24] load10x
    @ ~/.julia/packages/SingleCellProjections/0yZXZ/src/load.jl:130 [inlined]
 [25] #29
    @ ./broadcast.jl:1306 [inlined]
 [26] _broadcast_getindex_evalf
    @ ./broadcast.jl:673 [inlined]
 [27] _broadcast_getindex
    @ ./broadcast.jl:646 [inlined]
 [28] getindex
    @ ./broadcast.jl:605 [inlined]
 [29] copy
    @ ./broadcast.jl:906 [inlined]
 [30] materialize
    @ ./broadcast.jl:867 [inlined]
 [31] load_counts(loadfun::typeof(load10x), filenames::String; sample_names::Nothing, sample_name_col::Nothing, lazy::Bool, lazy_merge::Bool, obs_id_col::String, obs_id_delim::Char, obs_id_prefixes::Nothing, extra_var_id_cols::Symbol, duplicate_var::Nothing, duplicate_obs::Nothing, callback::Nothing, kwargs::@Kwargs{})
    @ SingleCellProjections ~/.julia/packages/SingleCellProjections/0yZXZ/src/load.jl:221
 [32] load_counts
    @ ~/.julia/packages/SingleCellProjections/0yZXZ/src/load.jl:199 [inlined]
 [33] load_counts(filenames::String)
    @ SingleCellProjections ~/.julia/packages/SingleCellProjections/0yZXZ/src/load.jl:227
 [34] top-level scope
    @ REPL[8]:1
Some type information was truncated. Use `show(err)` to see complete types.

After locally modifying _read10x_features to print the first 200 bytes of the IO that readdlm was called on, this is what I find:

\x89HDF\r\n\x1a\n\0\0\0\0\0\b\b\0\x04\0\x10\0\0\0\0\0\0\0\0\0\0\0\0\0\xff\xff\xff\xff\xff\xff\xff\xff\$\x84\xd5F\0\0\0\0\xff\xff\xff\xff\xff\xff\xff\xff\0\0\0\0\0\0\0\0`\0\0\0\0\0\0\0\x01\0\0\0\0\0\0\0\x88\0\0\0\0\0\0\0\xa8\x02\0\0\0\0\0\0\x01\0\x01\0\x01\0\0\0\x18\0\0\0\0\0\0\0\x11\0\x10\0\0\0\0\0\x88\0\0\0\0\0\0\0\xa8\x02\0\0\0\0\0\0TREE\0\0\x01\0\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\0\0\0\0\0\0\0\0\xe0\x05\0\0\0\0\0\0 \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0...
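
The dump above starts with \x89HDF\r\n\x1a\n, which is the 8-byte HDF5 format signature, so readdlm is being handed the raw .h5ad file rather than a delimited features table. For anyone who wants to repeat this check without patching the package, here is a minimal sketch using only Base Julia. The names HDF5_SIGNATURE, peek_bytes and looks_like_hdf5 are hypothetical helpers made up for illustration, not part of SingleCell10x or SingleCellProjections, and the check assumes the signature sits at offset 0 (as it does in this file).

```julia
# The fixed 8-byte signature at the start of an HDF5 file: \x89 H D F \r \n \x1a \n
const HDF5_SIGNATURE = UInt8[0x89, 0x48, 0x44, 0x46, 0x0d, 0x0a, 0x1a, 0x0a]

# Hypothetical helper: return up to the first `n` bytes of `filename`
# (fewer if the file is shorter than `n` bytes).
function peek_bytes(filename::AbstractString, n::Integer=200)
    return open(io -> read(io, n), filename)
end

# Hypothetical helper: true if the file begins with the HDF5 signature.
# Assumption: the signature is at offset 0; this sketch does not look elsewhere.
function looks_like_hdf5(filename::AbstractString)
    head = peek_bytes(filename, length(HDF5_SIGNATURE))
    return head == HDF5_SIGNATURE
end
```

Calling looks_like_hdf5 on the path passed to load_counts, or printing String(peek_bytes(path)) at the REPL, should be enough to tell whether a file is HDF5-based before it reaches a text parser.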
