Commit fe36768

Merge pull request #78 from ssfrr/loadsavestreaming
Implement loadstreaming/savestreaming API
2 parents 50c4ef3 + 239af17 commit fe36768

4 files changed: +312 -44 lines changed

README.md

Lines changed: 87 additions & 2 deletions
@@ -37,6 +37,44 @@ s = query(io) # io is a stream
 will return a `File` or `Stream` object that also encodes the detected
 file format.

+Sometimes you want to read or write files that are larger than your available
+memory, or might be an unknown or infinite length (e.g. reading an audio or
+video stream from a socket). In these cases it might not make sense to process
+the whole file at once, but instead process it a chunk at a time. For these
+situations FileIO provides the `loadstreaming` and `savestreaming` functions,
+which return an object that you can `read` or `write`, rather than the file data
+itself.
+
+This would look something like:
+
+```jl
+using FileIO
+audio = loadstreaming("bigfile.wav")
+try
+    while !eof(audio)
+        chunk = read(audio, 4096) # read 4096 frames
+        # process the chunk
+    end
+finally
+    close(audio)
+end
+```
+
+or use `do` syntax to auto-close the stream:
+
+```jl
+using FileIO
+loadstreaming("bigfile.wav") do audio
+    while !eof(audio)
+        chunk = read(audio, 4096) # read 4096 frames
+        # process the chunk
+    end
+end
+```
+
+Note that in these cases you may want to use `read!` with a pre-allocated buffer
+for maximum efficiency.
+
 ## Adding new formats

 You register a new format by adding `add_format(fmt, magic,
@@ -130,15 +168,62 @@ end
 Note that these are `load` and `save`, **not** `FileIO.load` and `FileIO.save`.
 Because a given format might have multiple packages that are capable of reading it,
 FileIO will dispatch to these using module-scoping, e.g., `SomePkg.load(args...)`.
-Consequently, **packages should define "private" `load` and `save` methods, and
-not extend (import) FileIO's**.
+Consequently, **packages should define "private" `load` and `save` methods (also
+`loadstreaming` and `savestreaming` if you implement them), and not extend
+(import) FileIO's**.

 `load(::File)` and `save(::File)` should close any streams
 they open. (If you use the `do` syntax, this happens for you
 automatically even if the code inside the `do` scope throws an error.)
 Conversely, `load(::Stream)` and `save(::Stream)` should not close the
 input stream.

+`loadstreaming` and `savestreaming` use the same query mechanism, but return a
+decoded stream that users can `read` or `write`. You should also implement a
+`close` method on your reader or writer type. Just like with `load` and `save`,
+if the user provided a filename, your `close` method should be responsible for
+closing any streams you opened in order to read or write the file. If you are
+given a `Stream`, your `close` method should only do the clean up for your
+reader or writer type, not close the stream.
+
+```jl
+struct WAVReader
+    io::IO
+    ownstream::Bool
+end
+
+function Base.read(reader::WAVReader, frames::Int)
+    # read and decode audio samples from reader.io
+end
+
+function Base.close(reader::WAVReader)
+    # do whatever cleanup the reader needs
+    reader.ownstream && close(reader.io)
+end
+
+# FileIO has fallback functions that make these work using `do` syntax as well,
+# and will automatically call `close` on the returned object.
+loadstreaming(f::File{format"WAV"}) = WAVReader(open(f), true)
+loadstreaming(s::Stream{format"WAV"}) = WAVReader(s, false)
+```
+
+If you choose to implement `loadstreaming` and `savestreaming` in your package,
+you can easily add `save` and `load` methods in the form of:
+
+```jl
+function save(q::Formatted{format"WAV"}, data, args...; kwargs...)
+    savestreaming(q, args...; kwargs...) do stream
+        write(stream, data)
+    end
+end
+
+function load(q::Formatted{format"WAV"}, args...; kwargs...)
+    loadstreaming(q, args...; kwargs...) do stream
+        read(stream)
+    end
+end
+```
+
 ## Help

 You can get an API overview by typing `?FileIO` at the REPL prompt.
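
The README's note about using `read!` with a pre-allocated buffer can be illustrated without any FileIO-specific types. The following is only a minimal sketch, assuming Julia 1.x syntax (the commit itself targets an older Julia), with a plain `IOBuffer` standing in for a streaming reader. The helper name `sum_in_chunks` is hypothetical, and Base's `readbytes!` is used instead of `read!` so that a short final chunk does not raise `EOFError`.

```julia
# Hypothetical helper (not part of FileIO): sums every byte of `io`
# while reusing a single buffer for all chunks.
function sum_in_chunks(io::IO; chunksize::Int = 4096)
    buf = Vector{UInt8}(undef, chunksize)  # allocated once, reused each iteration
    total = 0
    while !eof(io)
        n = readbytes!(io, buf)            # fills `buf` in place, returns bytes read
        for i in 1:n                       # only the first `n` entries are valid
            total += buf[i]
        end
    end
    return total
end

# An IOBuffer stands in for the object returned by `loadstreaming`.
data = rand(UInt8, 10_000)
total = sum_in_chunks(IOBuffer(data))
```

A format-specific reader would presumably expose its own `read!` method taking a typed sample buffer; the point here is only that the buffer is allocated once outside the loop.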

src/FileIO.jl

Lines changed: 4 additions & 0 deletions
@@ -17,9 +17,11 @@ export DataFormat,
     file_extension,
     info,
     load,
+    loadstreaming,
     magic,
     query,
     save,
+    savestreaming,
     skipmagic,
     stream,
     unknown
@@ -40,7 +42,9 @@ include("registry.jl")

 - `load([filename|stream])`: read data in formatted file, inferring the format
 - `load(File(format"PNG",filename))`: specify the format manually
+- `loadstreaming([filename|stream])`: similar to `load`, except that it returns an object that can be read from
 - `save(filename, data...)` for similar operations involving saving data
+- `savestreaming([filename|stream])`: similar to `save`, except that it returns an object that can be written to

 - `io = open(f::File, args...)` opens a file
 - `io = stream(s::Stream)` returns the IOStream from the query object `s`
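
The README in this commit shows the stream-ownership rule for a reader only. A writer returned by `savestreaming` would presumably follow the same rule, which can be sketched as below. This is an illustration under assumptions, not FileIO code: `ChunkWriter` and its fields are hypothetical names, the syntax is Julia 1.x, and an `IOBuffer` stands in for the underlying file or socket.

```julia
# Hypothetical writer type mirroring the README's WAVReader example.
struct ChunkWriter
    io::IO
    ownstream::Bool   # true only when the writer opened `io` itself
end

# Forward raw bytes to the wrapped stream; a real writer would encode here.
Base.write(w::ChunkWriter, data::Vector{UInt8}) = write(w.io, data)

function Base.close(w::ChunkWriter)
    # format-specific cleanup (e.g. finalizing a header) would go here
    w.ownstream && close(w.io)   # close only streams this writer owns
    return nothing
end

# Caller-supplied stream: closing the writer leaves the stream open.
io = IOBuffer()
w = ChunkWriter(io, false)
write(w, UInt8[1, 2, 3])
close(w)
still_open = isopen(io)

# Writer-owned stream: closing the writer closes the stream too.
owned = IOBuffer()
close(ChunkWriter(owned, true))
owned_open = isopen(owned)
```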

src/loadsave.jl

Lines changed: 114 additions & 37 deletions
@@ -40,86 +40,163 @@ add_loader
 "`add_saver(fmt, :Package)` triggers `using Package` before saving format `fmt`"
 add_saver

-
 """
 - `load(filename)` loads the contents of a formatted file, trying to infer
   the format from `filename` and/or magic bytes in the file.
 - `load(strm)` loads from an `IOStream` or similar object. In this case,
-  the magic bytes are essential.
-- `load(File(format"PNG",filename))` specifies the format directly, and bypasses inference.
+  there is no filename extension, so we rely on the magic bytes for format
+  identification.
+- `load(File(format"PNG", filename))` specifies the format directly, and bypasses inference.
+- `load(Stream(format"PNG", io))` specifies the format directly, and bypasses inference.
 - `load(f; options...)` passes keyword arguments on to the loader.
 """
-load(s::Union{AbstractString,IO}, args...; options...) =
-    load(query(s), args...; options...)
+load
+
+"""
+Some packages may implement a streaming API, where the contents of the file can
+be read in chunks and processed, rather than all at once. Reading from these
+higher-level streams should return a formatted object, like an image or chunk of
+video or audio.
+
+- `loadstreaming(filename)` loads the contents of a formatted file, trying to infer
+  the format from `filename` and/or magic bytes in the file. It returns a streaming
+  type that can be read from in chunks, rather than loading the whole contents all
+  at once
+- `loadstreaming(strm)` loads the stream from an `IOStream` or similar object.
+  In this case, there is no filename extension, so we rely on the magic bytes
+  for format identification.
+- `loadstreaming(File(format"WAV",filename))` specifies the format directly, and
+  bypasses inference.
+- `loadstreaming(Stream(format"WAV", io))` specifies the format directly, and
+  bypasses inference.
+- `loadstreaming(f; options...)` passes keyword arguments on to the loader.
+"""
+loadstreaming

 """
 - `save(filename, data...)` saves the contents of a formatted file,
   trying to infer the format from `filename`.
 - `save(Stream(format"PNG",io), data...)` specifies the format directly, and bypasses inference.
+- `save(File(format"PNG",filename), data...)` specifies the format directly, and bypasses inference.
 - `save(f, data...; options...)` passes keyword arguments on to the saver.
 """
-save(s::Union{AbstractString,IO}, data...; options...) =
-    save(query(s), data...; options...)
+save
+
+"""
+Some packages may implement a streaming API, where the contents of the file can
+be written in chunks, rather than all at once. These higher-level streams should
+accept formatted objects, like an image or chunk of video or audio.
+
+- `savestreaming(filename, data...)` saves the contents of a formatted file,
+  trying to infer the format from `filename`.
+- `savestreaming(File(format"WAV",filename))` specifies the format directly, and
+  bypasses inference.
+- `savestreaming(Stream(format"WAV", io))` specifies the format directly, and
+  bypasses inference.
+- `savestreaming(f, data...; options...)` passes keyword arguments on to the saver.
+"""
+savestreaming
+
+# if a bare filename or IO stream are given, query for the format and dispatch
+# to the formatted handlers below
+for fn in (:load, :loadstreaming, :save, :savestreaming)
+    @eval $fn(s::Union{AbstractString,IO}, args...; options...) =
+        $fn(query(s), args...; options...)
+end

+# return a save function, so you can do `thing_to_save |> save("filename.ext")`
 function save(s::Union{AbstractString,IO}; options...)
     data -> save(s, data; options...)
 end

-# Forced format
+# Allow format to be overridden with first argument
 function save{sym}(df::Type{DataFormat{sym}}, f::AbstractString, data...; options...)
     libraries = applicable_savers(df)
     checked_import(libraries[1])
     eval(Main, :($save($File($(DataFormat{sym}), $f),
                        $data...; $options...)))
 end

+function savestreaming{sym}(df::Type{DataFormat{sym}}, s::IO, data...; options...)
+    libraries = applicable_savers(df)
+    checked_import(libraries[1])
+    eval(Main, :($savestreaming($Stream($(DataFormat{sym}), $s),
+                                $data...; $options...)))
+end
+
 function save{sym}(df::Type{DataFormat{sym}}, s::IO, data...; options...)
     libraries = applicable_savers(df)
     checked_import(libraries[1])
     eval(Main, :($save($Stream($(DataFormat{sym}), $s),
                        $data...; $options...)))
 end

+function savestreaming{sym}(df::Type{DataFormat{sym}}, f::AbstractString, data...; options...)
+    libraries = applicable_savers(df)
+    checked_import(libraries[1])
+    eval(Main, :($savestreaming($File($(DataFormat{sym}), $f),
+                                $data...; $options...)))
+end

-# Fallbacks
-function load{F}(q::Formatted{F}, args...; options...)
-    if unknown(q)
-        isfile(filename(q)) || open(filename(q)) # force systemerror
-        throw(UnknownFormat(q))
-    end
-    libraries = applicable_loaders(q)
-    failures = Any[]
-    for library in libraries
+# do-syntax for streaming IO
+for fn in (:loadstreaming, :savestreaming)
+    @eval function $fn(f::Function, args...; kwargs...)
+        str = $fn(args...; kwargs...)
         try
-            Library = checked_import(library)
-            if !has_method_from(methods(Library.load), Library)
-                throw(LoaderError(string(library), "load not defined"))
+            f(str)
+        finally
+            close(str)
+        end
+    end
+end
+
+# Handlers for formatted files/streams
+
+for fn in (:load, :loadstreaming)
+    @eval function $fn{F}(q::Formatted{F}, args...; options...)
+        if unknown(q)
+            isfile(filename(q)) || open(filename(q)) # force systemerror
+            throw(UnknownFormat(q))
+        end
+        libraries = applicable_loaders(q)
+        failures = Any[]
+        for library in libraries
+            try
+                Library = checked_import(library)
+                if !has_method_from(methods(Library.$fn), Library)
+                    throw(LoaderError(string(library), "$($fn) not defined"))
+                end
+                return eval(Main, :($(Library.$fn)($q, $args...; $options...)))
+            catch e
+                push!(failures, (e, q))
             end
-            return eval(Main, :($(Library.load)($q, $args...; $options...)))
-        catch e
-            push!(failures, (e, q))
         end
+        handle_exceptions(failures, "loading \"$(filename(q))\"")
     end
-    handle_exceptions(failures, "loading \"$(filename(q))\"")
 end
-function save{F}(q::Formatted{F}, data...; options...)
-    unknown(q) && throw(UnknownFormat(q))
-    libraries = applicable_savers(q)
-    failures = Any[]
-    for library in libraries
-        try
-            Library = checked_import(library)
-            if !has_method_from(methods(Library.save), Library)
-                throw(WriterError(string(library), "save not defined"))
+
+for fn in (:save, :savestreaming)
+    @eval function $fn{F}(q::Formatted{F}, data...; options...)
+        unknown(q) && throw(UnknownFormat(q))
+        libraries = applicable_savers(q)
+        failures = Any[]
+        for library in libraries
+            try
+                Library = checked_import(library)
+                if !has_method_from(methods(Library.$fn), Library)
+                    throw(WriterError(string(library), "$($fn) not defined"))
+                end
+                return eval(Main, :($(Library.$fn)($q, $data...; $options...)))
+            catch e
+                push!(failures, (e, q))
             end
-            return eval(Main, :($(Library.save)($q, $data...; $options...)))
-        catch e
-            push!(failures, (e, q))
         end
+        handle_exceptions(failures, "saving \"$(filename(q))\"")
     end
-    handle_exceptions(failures, "saving \"$(filename(q))\"")
 end

+# returns true if the given method table includes a method defined by the given
+# module, false otherwise
 function has_method_from(mt, Library)
     for m in mt
         if getmodule(m) == Library
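
The `do`-syntax fallback in this file is generated with an `@eval` loop over function names. The pattern can be tried in isolation with the sketch below, which assumes Julia 1.x and uses hypothetical stand-ins (`DemoStream`, `openreader`, `openwriter`) in place of FileIO's `loadstreaming` and `savestreaming`. It is meant to show the shape of the technique, not to reproduce FileIO's dispatch.

```julia
# Hypothetical stand-in for a package's streaming reader or writer.
mutable struct DemoStream
    closed::Bool
end
Base.close(s::DemoStream) = (s.closed = true; nothing)

# Hypothetical openers; real ones would return format-specific objects.
openreader(name::AbstractString) = DemoStream(false)
openwriter(name::AbstractString) = DemoStream(false)

# Generate one do-syntax method per function name, in the same way the
# diff does for `loadstreaming` and `savestreaming`.
for fn in (:openreader, :openwriter)
    @eval function $fn(f::Function, args...; kwargs...)
        str = $fn(args...; kwargs...)   # dispatches to the non-Function method
        try
            f(str)
        finally
            close(str)                  # runs even if `f` throws
        end
    end
end

# The do block receives the stream; it is closed once the block returns.
captured = Ref{Any}(nothing)
result = openreader("demo.bin") do s
    captured[] = s
    42
end
```

Because the `do` block is passed as the first positional argument, the generated method is selected by its `f::Function` signature, and the inner call falls through to the ordinary opener.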
