
Commit e976c92

Merge pull request #9 from visr/table
almost complete rewrite / support Tables interface
2 parents 93f7385 + 32fec11 commit e976c92

6 files changed (+448, -218 lines)

.travis.yml

Lines changed: 20 additions & 10 deletions
```diff
@@ -1,20 +1,30 @@
 # Documentation: http://docs.travis-ci.com/user/languages/julia/
 language: julia
-os:
-  - linux
-  - osx
+
 julia:
-  - 0.7
   - 1.0
+  - 1
   - nightly
+
+os:
+  - linux
+  - osx
+  - windows
+
+arch:
+  - x64
+  - x86
+
 matrix:
+  exclude:
+    - os: osx
+      arch: x86
+  fast_finish: true
   allow_failures:
     - julia: nightly
+
 notifications:
   email: false
-#script: # the default script is equivalent to the following
-# - if [[ -a .git/shallow ]]; then git fetch --unshallow; fi
-# - julia -e 'Pkg.clone(pwd()); Pkg.build("DBFTables"); Pkg.test("DBFTables"; coverage=true)';
-after_success:
-  - julia -e 'using Pkg, DBFTables; cd(joinpath(dirname(pathof(DBFTables)),"..")); Pkg.add("Coverage"); using Coverage; Coveralls.submit(Coveralls.process_folder())';
-  - julia -e 'using Pkg, DBFTables; cd(joinpath(dirname(pathof(DBFTables)),"..")); Pkg.add("Coverage"); using Coverage; Codecov.submit(Codecov.process_folder())';
+
+codecov: true
+coveralls: true
```
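The new `.travis.yml` replaces the single OS list with a `julia` × `os` × `arch` matrix. As a rough sanity check of how many CI jobs that implies, here is a small Julia sketch. It assumes Travis expands every combination of the three lists and then drops the pairs listed under `matrix: exclude:`; the variable names are illustrative only and nothing here is read from Travis itself.

```julia
# Hypothetical sketch: enumerate the job matrix implied by the new .travis.yml,
# assuming every (julia, os, arch) combination becomes one job.
julia_versions = ["1.0", "1", "nightly"]
oses = ["linux", "osx", "windows"]
arches = ["x64", "x86"]

# `matrix: exclude:` removes 32-bit builds on macOS
excluded(os, arch) = os == "osx" && arch == "x86"

all_combos = [(v, os, arch) for v in julia_versions for os in oses for arch in arches]
jobs = filter(j -> !excluded(j[2], j[3]), all_combos)

# `allow_failures: - julia: nightly` makes these jobs non-blocking
optional = filter(j -> j[1] == "nightly", jobs)

println(length(jobs), " jobs, ", length(optional), " allowed to fail")
# prints "15 jobs, 5 allowed to fail"
```

With `fast_finish: true`, the overall build can be reported as soon as the 10 required jobs finish, without waiting for the 5 nightly ones.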

Project.toml

Lines changed: 8 additions & 5 deletions
```diff
@@ -1,17 +1,20 @@
 name = "DBFTables"
 uuid = "75c7ada1-017a-5fb6-b8c7-2125ff2d6c93"
-version = "0.1.1"
+version = "0.2.0"
 
 [deps]
-DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
 Printf = "de0858da-6303-5e67-8744-51eddeeeb8d7"
+Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"
+WeakRefStrings = "ea10d353-3f73-51f8-a26c-33c1cb351aa5"
 
 [compat]
-julia = "0.7, 1.0"
+Tables = "0.2"
+WeakRefStrings = "0.6"
+julia = "1.0"
 
 [extras]
-Missings = "e1d29d7a-bbdc-5cf2-9ac0-f12de2c33e28"
+DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
 Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
 
 [targets]
-test = ["Missings", "Test"]
+test = ["Test", "DataFrames"]
```

README.md

Lines changed: 39 additions & 7 deletions
````diff
@@ -1,17 +1,49 @@
 # DBFTables
 
 [![Build Status](https://travis-ci.org/JuliaData/DBFTables.jl.svg?branch=master)](https://travis-ci.org/JuliaData/DBFTables.jl)
-
 [![Coverage Status](https://coveralls.io/repos/JuliaData/DBFTables.jl/badge.svg?branch=master&service=github)](https://coveralls.io/github/JuliaData/DBFTables.jl?branch=master)
-
 [![codecov.io](http://codecov.io/github/JuliaData/DBFTables.jl/coverage.svg?branch=master)](http://codecov.io/github/JuliaData/DBFTables.jl?branch=master)
 
-For reading [.dbf](https://en.wikipedia.org/wiki/.dbf) files in Julia.
+Read xBase / dBASE III+ [.dbf](https://en.wikipedia.org/wiki/.dbf) files in Julia. Supports the [Tables.jl](https://github.com/JuliaData/Tables.jl) interface.
+
+[Shapefile.jl](https://github.com/JuliaGeo/Shapefile.jl) uses this package to read the information associated with the geometries of the `.shp` file.
 
-#### Usage
+## Usage
 
 ```julia
 using DBFTables
-io = open("test.dbf")
-df = DBFTables.read_dbf(io)
-```
+dbf = DBFTables.Table("test.dbf")
+
+# whole columns can be retrieved by their name
+# note that this creates a copy, so instead of repeated `dbf.field` calls,
+# it is faster to do `field = dbf.field` once and then use `field` instead
+dbf.INTEGER # => Union{Missing, Int64}[100, 101, 102, 0, 2222222222, 4444444444, missing]
+
+# example function that iterates over the rows and uses two columns
+function sumif(dbf)
+    total = 0.0
+    for row in dbf
+        if row.BOOLEAN && !ismissing(row.NUMERIC)
+            total += row.NUMERIC
+        end
+    end
+    return total
+end
+
+# for other functionality, convert to other Tables such as DataFrame
+using DataFrames
+df = DataFrame(dbf)
+```
+
+## Format description resources
+- https://en.wikipedia.org/wiki/.dbf
+- https://www.clicketyclick.dk/databases/xbase/format/dbf.html
+- http://www.independent-software.com/dbase-dbf-dbt-file-format.html
+
+## Implementation details
+
+The DBF header contains the number of rows, which columns are present, their types, and how many bytes each entry occupies. From this we can create a `Tables.Schema`. Each row occupies a fixed number of bytes. All data is stored as text, with different conventions depending on the field type. There are no delimiters between entries; since the field sizes are known from the header, none are needed.
+
+The `DBFTables.Table` struct holds both the header and the data. All data is read into memory in one go as a `Vector{UInt8}`. To provide efficient access to the individual entries, we use [WeakRefStrings](https://github.com/JuliaData/WeakRefStrings.jl/). WeakRefStrings' `StringArray` holds only the offsets and lengths into the `Vector{UInt8}` that contains all the data. Each string still has to be converted to the corresponding Julia type; this is done on demand with `dbf_value`.
+
+Note that the format also contains a "record deleted" flag, which is represented by a `'*'` at the start of the row. When this flag is set, the record should be treated as if it does not exist. Since writers normally strip these records when writing, they are rarely encountered, so this package currently ignores the flags by default. To check for the flags yourself, use the `isdeleted` function. A sample file with deleted record flags is available [here](https://issues.qgis.org/issues/11007#note-30).
````
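The implementation notes above describe fixed-width records whose fields are located purely from header information and decoded only when requested. The following is a minimal, self-contained sketch of that idea in plain Julia; it is not the package's actual code. `FieldSpec`, `field_ranges`, `record_length`, `record_deleted`, `parse_field`, `cell` and `make_record` are hypothetical names, the header is assumed to be already decoded into `FieldSpec`s, and only a simplified subset of the type rules (`C`, `N`/`F`, `L`) is handled. In the same spirit as WeakRefStrings' `StringArray`, it keeps only byte ranges into the raw buffer and builds a `String` for a single cell when that cell is accessed.

```julia
# Hypothetical field descriptor, as it might be decoded from the DBF header.
struct FieldSpec
    name::Symbol
    kind::Char   # dBASE type code, e.g. 'C', 'N', 'F', 'L'
    width::Int   # field width in bytes
end

# Each record is 1 deletion-flag byte followed by the fields back to back,
# so every field occupies a fixed byte range within its record.
function field_ranges(fields::Vector{FieldSpec})
    ranges = UnitRange{Int}[]
    offset = 1  # byte 1 of each record is the deletion flag
    for f in fields
        push!(ranges, (offset + 1):(offset + f.width))
        offset += f.width
    end
    return ranges
end

record_length(fields) = 1 + sum(f.width for f in fields)

# '*' in the first byte marks a deleted record; ' ' marks an active one.
record_deleted(data, reclen, row) = data[(row - 1) * reclen + 1] == UInt8('*')

# Convert the text of one field to a Julia value (simplified rules).
function parse_field(kind::Char, raw::AbstractString)
    kind == 'C' && return String(rstrip(raw))  # text is right-padded with spaces
    s = strip(raw)
    if kind == 'N' || kind == 'F'
        isempty(s) && return missing
        i = tryparse(Int, s)
        return i === nothing ? parse(Float64, s) : i
    elseif kind == 'L'
        s in ("T", "t", "Y", "y") && return true
        s in ("F", "f", "N", "n") && return false
        return missing  # '?' or blank: value not set
    else
        error("field type '$kind' is not handled in this sketch")
    end
end

# Decode a single cell on demand: only that cell's bytes become a String.
function cell(data, fields, ranges, reclen, row, col)
    start = (row - 1) * reclen
    raw = String(data[start .+ ranges[col]])
    return parse_field(fields[col].kind, raw)
end

# Hypothetical test data: three records, the second flagged as deleted.
make_record(flag, name, num, bool) = string(flag, rpad(name, 5), lpad(num, 6), bool)
fields = [FieldSpec(:NAME, 'C', 5), FieldSpec(:NUMERIC, 'N', 6), FieldSpec(:BOOLEAN, 'L', 1)]
data = collect(codeunits(string(make_record(' ', "Alice", "12.5", 'T'),
                                make_record('*', "Bob", "", '?'),
                                make_record(' ', "Carol", "100", 'F'))))
reclen = record_length(fields)   # 13
ranges = field_ranges(fields)    # [2:6, 7:12, 13:13]
cell(data, fields, ranges, reclen, 1, 2)  # 12.5
```

Following the README, the sketch does not drop records flagged as deleted; `record_deleted` only reports the flag. Deciding between `Int` and `Float64` with `tryparse` is a simplification: the real format records the number of decimals for numeric fields in the header, which a full implementation would use to pick one column type.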

appveyor.yml

Lines changed: 0 additions & 34 deletions
This file was deleted.
