Cannot open file (failed to parse internal XML file) #243

@tbeason

I'm not quite sure what the problem is here, but I've encountered a file that XLSX.jl cannot open. Luckily the file is public, so maybe somebody can figure this out. Link: https://www.aqr.com/-/media/AQR/Documents/Insights/Data-Sets/Quality-Minus-Junk-Factors-Monthly.xlsx

With that file downloaded, I try this:

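using XLSX, DataFrames, Chain, Dates

df = @chain begin
    # Read columns A:AE of the "QMJ Factors" sheet; the table starts at row 19
    XLSX.readtable("Quality-Minus-Junk-Factors-Monthly.xlsx", "QMJ Factors", "A:AE"; first_row=19, infer_eltypes=true)
    DataFrame
    # Parse the DATE strings (month/day/year) into Date values
    transform("DATE" => ByRow(d -> Date(d, dateformat"m/d/Y")) => "DATE")
    select("DATE", "USA" => "QMJ")
end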

But I get this error:

┌ Error: Failed to parse internal XML file `_rels/.rels`
└ @ XLSX C:\Users\beasont\.julia\packages\XLSX\we7i6\src\read.jl:446
ERROR: EOFError: read end of file
Stacktrace:
  [1] read!
    @ ZipFile C:\Users\beasont\.julia\packages\ZipFile\evaHP\src\Zlib.jl:299 [inlined]
  [2] unsafe_read(f::ZipFile.ReadableFile, p::Ptr{UInt8}, n::UInt64)
    @ ZipFile C:\Users\beasont\.julia\packages\ZipFile\evaHP\src\ZipFile.jl:498
  [3] unsafe_read
    @ EzXML .\io.jl:774 [inlined]
  [4] (::EzXML.var"#7#8")(context::ZipFile.ReadableFile, buffer::Ptr{UInt8}, len::Int32)
    @ EzXML C:\Users\beasont\.julia\packages\EzXML\ZNwhK\src\document.jl:218
  [5] macro expansion
    @ XLSX C:\Users\beasont\.julia\packages\EzXML\ZNwhK\src\error.jl:50 [inlined]
  [6] readxml
    @ XLSX C:\Users\beasont\.julia\packages\EzXML\ZNwhK\src\document.jl:154 [inlined]
  [7] internal_xml_file_read(xf::XLSX.XLSXFile, filename::String)
    @ XLSX C:\Users\beasont\.julia\packages\XLSX\we7i6\src\read.jl:444
  [8] xmldocument
    @ XLSX C:\Users\beasont\.julia\packages\XLSX\we7i6\src\read.jl:480 [inlined]
  [9] xmlroot
    @ XLSX C:\Users\beasont\.julia\packages\XLSX\we7i6\src\read.jl:484 [inlined]
 [10] get_package_relationship_root(xf::XLSX.XLSXFile)
    @ XLSX C:\Users\beasont\.julia\packages\XLSX\we7i6\src\relationship.jl:51
 [11] parse_relationships!(xf::XLSX.XLSXFile)
    @ XLSX C:\Users\beasont\.julia\packages\XLSX\we7i6\src\read.jl:296
 [12] open_or_read_xlsx(source::String, read_files::Bool, enable_cache::Bool, read_as_template::Bool)
    @ XLSX C:\Users\beasont\.julia\packages\XLSX\we7i6\src\read.jl:235
 [13] openxlsx(f::XLSX.var"#32#33"{Int64, Nothing, Bool, Bool, Bool, Nothing, Bool, String, String}, source::String; mode::String, enable_cache::Bool)
    @ XLSX C:\Users\beasont\.julia\packages\XLSX\we7i6\src\read.jl:135
 [14] openxlsx
    @ XLSX C:\Users\beasont\.julia\packages\XLSX\we7i6\src\read.jl:128 [inlined]
 [15] #readtable#31
    @ XLSX C:\Users\beasont\.julia\packages\XLSX\we7i6\src\read.jl:611 [inlined]
 [16] top-level scope
    @ REPL[12]:2

Here is my package status (I am on Julia 1.10 beta 2):

  [336ed68f] CSV v0.10.11
  [13f3f980] CairoMakie v0.10.9
  [8be319e6] Chain v0.5.0
  [992eb4ea] CondaPkg v0.2.18
  [60f91f6f] CovarianceMatrices v0.10.4
  [a10d1c49] DBInterface v2.5.0
  [a93c6f00] DataFrames v1.6.1
  [d2f5444f] DuckDB v0.8.1
  [bd2a388e] FamaFrenchData v0.4.3
⌃ [38e38edf] GLM v1.8.3
  [5432bcbf] PaddedViews v0.5.12
  [6099a3de] PythonCall v0.9.14
  [cbe49d4c] RemoteFiles v0.5.0
⌅ [2913bbd2] StatsBase v0.33.21
⌅ [3eaba693] StatsModels v0.6.33
  [bd369af6] Tables v1.10.1
  [fdbf4ff8] XLSX v0.10.0
  [ade2ca70] Dates
  [f43a241f] Downloads v1.6.0
  [37e2e46d] LinearAlgebra
  [10745b16] Statistics v1.9.0
Info Packages marked with ⌃ and ⌅ have new versions available, but those with ⌅ are restricted by compatibility constraints from upgrading. To see why use `status --outdated`

If I open the file in Excel and use Save As to write a new copy, the new file reads fine. I don't know how to do that programmatically, though!
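
Since the stack trace fails inside ZipFile while reading `_rels/.rels`, one way to narrow this down might be to bypass XLSX.jl and try to decompress each archive entry directly. The following is only a rough diagnostic sketch, assuming ZipFile.jl is installed; `check_zip_entries` is a hypothetical helper name, not part of XLSX.jl or ZipFile.jl.

```julia
using ZipFile

# Hypothetical helper: try to decompress every entry of a ZIP-based file
# (an .xlsx file is a ZIP archive) and record which entries can be read.
function check_zip_entries(path::AbstractString)
    results = Dict{String,Bool}()
    reader = ZipFile.Reader(path)
    try
        for f in reader.files
            ok = try
                read(f)   # read the whole entry; throws if decompression fails
                true
            catch err
                @warn "Could not read entry" name = f.name exception = err
                false
            end
            results[f.name] = ok
        end
    finally
        close(reader)
    end
    return results
end
```

Calling `check_zip_entries("Quality-Minus-Junk-Factors-Monthly.xlsx")` should return a dictionary mapping each entry name to `true` or `false`. If `_rels/.rels` maps to `false`, that would suggest the problem lies in how the archive is read rather than in the XML parsing step.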
