johnnychen94 commented Jul 6, 2020

This is a sub PR of #61

Changes:

  • rename _convert to maybe_encode
  • add tests for maybe_encode
  • more patches to ignore CRLF/LF differences

I'm unclear how maybe_encode(format"TXT", df::DataFrame) and maybe_encode(format"SHA256", df::DataFrame) should behave. The current behavior is maybe_encode(fmt, string(df)).

Converting it with string is prone to changes in how show prints a DataFrame. Hence I suggest we do something like

function maybe_encode(::Type{DataFormat{:TXT}}, df::DataFrame)
    save("tmp.csv", df)
    _ignore_crlf(read("tmp.csv", String))
end

function maybe_encode(::Type{DataFormat{:SHA256}}, df::DataFrame)
    save("tmp.csv", df)
    bytes2hex(sha256(read("tmp.csv")))
end

@oxinabox if you like this idea, I can make a PR for this change.
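
A minimal sketch of the same idea that avoids leaving tmp.csv in the working directory, assuming a temporary directory is acceptable here. The names encode_via_tempfile, writer, and as_sha256 are hypothetical; writer(path, data) stands in for save(path, df) so the example runs without DataFrames or FileIO, and none of this is code from the PR.

```julia
using SHA  # standard library; provides sha256

# Hypothetical helper, not part of the PR.
# `writer(path, data)` plays the role of `save(path, df)`.
function encode_via_tempfile(writer, data; as_sha256::Bool=false)
    mktempdir() do dir
        path = joinpath(dir, "tmp.csv")
        writer(path, data)
        if as_sha256
            # Hash the raw bytes of the saved file
            bytes2hex(sha256(read(path)))
        else
            # Read as text and drop carriage returns
            replace(read(path, String), "\r" => "")
        end
    end  # the temporary directory is removed here
end

# Usage with a stand-in writer that emits CRLF line endings
rows = ["a,b", "1,2"]
crlf_writer(path, rows) = write(path, join(rows, "\r\n"))

text = encode_via_tempfile(crlf_writer, rows)
digest = encode_via_tempfile(crlf_writer, rows; as_sha256=true)
```

As in the proposal above, the SHA256 branch hashes the file bytes as written, so only the text branch is insensitive to CRLF/LF differences.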

* add tests for maybe_encode
* ignore CRLF/LF differences
It has no practical usage except that it's conceptually right.
end

-function query_extended(filename)
+function query_extended(filename::AbstractString)
Member

Why?

Member Author

This function only expects string types (because of splitext), so I added the annotation as a safety net to make sure nothing strange is passed in.
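
To illustrate the effect of the annotation, here is a small sketch. The body of query_extended_sketch is hypothetical (only the signature comes from the diff), and it assumes the real function calls splitext on its argument.

```julia
# Hypothetical body: only the signature comes from the diff in this PR.
query_extended_sketch(filename::AbstractString) = splitext(filename)

query_extended_sketch("reference.txt")   # ("reference", ".txt")

# A non-string argument is rejected at the call boundary with a MethodError
rejected = try
    query_extended_sketch(42)
    false
catch err
    err isa MethodError
end
```

Without the annotation, splitext(42) would also throw a MethodError, just one level deeper, so the annotation mainly moves the failure closer to the caller.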

src/fileio.jl Outdated
# Helpers
_join(x::AbstractArray{<:AbstractString}) = mapreduce(_ignore_crlf, (x,y)->x*"\n"*y, x)
_sha256(x) = bytes2hex(sha256(x))
_ignore_crlf(x::AbstractString) = replace(x, "\r"=>"")
Member

Should we just say we are ignoring \r ?

Suggested change
-_ignore_crlf(x::AbstractString) = replace(x, "\r"=>"")
+_ignore_linefeed(x::AbstractString) = replace(x, "\r"=>"")

Member Author

johnnychen94 Jul 7, 2020

I assume you're suggesting Carriage Return (\r) instead of linefeed (\n)?

I renamed it to _ignore_CR and added a docstring.
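
For reference, a sketch of how the renamed helper behaves together with _join. The one-line definitions mirror the snippet under review; the docstring wording is an assumption, not the text that was merged.

```julia
"""
    _ignore_CR(x)

Remove every carriage return (`\\r`) from `x`, so CRLF and LF inputs compare equal.
(Docstring wording is illustrative only.)
"""
_ignore_CR(x::AbstractString) = replace(x, "\r" => "")

# Strip carriage returns from each element, then join with "\n"
_join(x::AbstractArray{<:AbstractString}) =
    mapreduce(_ignore_CR, (a, b) -> a * "\n" * b, x)

unix_text    = "a\nb\n"
windows_text = "a\r\nb\r\n"
same = _ignore_CR(unix_text) == _ignore_CR(windows_text)

joined = _join(["a\r", "b"])
```

Note that a lone "\r" is dropped rather than converted to "\n", which is why a name that mentions only CR describes the behavior more accurately than one that mentions CRLF.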

Member Author

johnnychen94 commented Jul 7, 2020

I'm merging this to avoid possible conflicts and to save myself the effort of rebasing.

If there are any further comments, I'll address them in future PRs.

johnnychen94 merged commit afc31d4 into jc/test Jul 7, 2020
johnnychen94 deleted the jc/fileio branch July 7, 2020 04:16