Skip to content

Conversation

@DilumAluthge
Copy link
Member

@DilumAluthge DilumAluthge commented May 31, 2018

This is a better way of fixing #3 that isn't hacky.

@DilumAluthge DilumAluthge changed the title Better way of saving Dicts with non-Symbol keys Better way of saving Dicts with non-Symbol keys - fixes #3 May 31, 2018
@DilumAluthge
Copy link
Member Author

cc: @MikeInnes

@DilumAluthge
Copy link
Member Author

Bump @MikeInnes

@aviks
Copy link
Contributor

aviks commented Jun 21, 2018

I can confirm that this fixes the issue around writing Dicts.

@aviks
Copy link
Contributor

aviks commented Jun 21, 2018

Unfortunately, even with this, Dicts do not read back correctly. The keys are visible, but the indexing fails. Which probably means the hashes are incorrect.

@DilumAluthge
Copy link
Member Author

DilumAluthge commented Jun 21, 2018 via email

@aviks
Copy link
Contributor

aviks commented Jun 22, 2018

If you download the file at https://github.com/JuliaText/TextAnalysis.jl/releases/download/v0.3.0/sentiment.tar.gz (15.4MB) and unzip it, you'll find a file called model/sentiment-analysis-weights.bson

julia> w = BSON.load("model/sentiment-analysis-weights.bson")
Dict{Symbol,Any} with 4 entries:
  :dense_1     => Dict{String,Any}(Pair{String,Any}("dense_1", Dict{String,Any}(Pair{String,Any}("bias:0", Float32[-0.000456777, -…
  :dense_2     => Dict{String,Any}(Pair{String,Any}("dense_2", Dict{String,Any}(Pair{String,Any}("bias:0", Float32[-0.00026646]),P…
  :flatten_1   => Dict{String,Any}()
  :embedding_1 => Dict{String,Any}(Pair{String,Any}("embedding_1", Dict{String,Any}(Pair{String,Any}("embeddings:0", Float32[0.004…

julia> x=w[:dense_1]["dense_1"]
Dict{String,Any} with 2 entries:
  "bias:0"   => Float32[-0.000456777, -0.00394565, 0.0167285, 0.00663774, 0.00539125, -0.00755487, 0.00562309, 0.0169308, 0.000605…
  "kernel:0" => Float32[-0.0138779 0.0164853 … 0.0222927 -0.0899643; -0.01212 0.00812849 … 0.00999072 -0.0178676; … ; 0.00156742 0…

julia> x["bias:0"]
ERROR: KeyError: key "bias:0" not found
Stacktrace:
 [1] getindex(::Dict{String,Any}, ::String) at ./dict.jl:474

julia> collect(keys(x))[1] == "bias:0"
true

julia> x.slots
2-element Array{UInt8,1}:
 0x01
 0x01

julia> Base.rehash!(x);

julia> x["bias:0"]
250-element Array{Float32,1}:
 -0.000456777
  ⋮

julia> x.slots
16-element Array{UInt8,1}:
 0x00
 0x00
 0x00
 0x00
 0x00
 0x00
 0x00
 0x00
 0x00
 0x01
 0x01
 0x00
 0x00
 0x00
 0x00
 0x00

Hope this helps. This file was written and read using this branch of BSON.jl

@DilumAluthge
Copy link
Member Author

Thanks! I’ll work on this today.

@DilumAluthge
Copy link
Member Author

@aviks I believe I've fixed this. Basically I just rehash every dictionary at load-time. It's an easy fix, and now your example seems to work. Can you confirm that it works for you?

@aviks
Copy link
Contributor

aviks commented Jun 24, 2018

Yes, this does work for me, thanks! I'll let Mike chime in on whether rehashing on load is a valid strategy.

@MikeInnes
Copy link
Collaborator

Good stuff, but we can make this a bit simpler. Rather than trying to preserve the internal structure of the dict, we really just need to save a Vector{K} and Vector{V} for the keys and values respectively, and we can reconstruct the dict from scratch on read (avoiding the need for rehash, too). How does that sound?

@DilumAluthge
Copy link
Member Author

DilumAluthge commented Jun 25, 2018 via email

@MikeInnes
Copy link
Collaborator

I think you just need structdata, initstruct and newstruct!.

@DilumAluthge
Copy link
Member Author

DilumAluthge commented Jun 25, 2018 via email

@MikeInnes
Copy link
Collaborator

Sure, feel free to bump this thread.

@DilumAluthge
Copy link
Member Author

I worked on this for a bit, but I reached the conclusion that it will actually be better to get #8 to work and use that. That way we are covered for any structs with undefined values.

@DilumAluthge
Copy link
Member Author

Closing in favor of #8

@DilumAluthge DilumAluthge deleted the da/dicts branch July 1, 2018 20:08
@MikeInnes
Copy link
Collaborator

I disagree actually – we will end up having to do tricks like rehash anyway, and this just makes things a lot more robust to changes to Dict's internal representation. Ideally we'd have both fixes.

The undefined values patch will also be bigger / involve some design decisions, so it'll also be easier to get this in quickly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants