Commit 155334c

port jbrae/MLJOpenML.jl#arfffiles to this new package OpenML
1 parent 776f559 commit 155334c

10 files changed: +132 -91 lines changed
Project.toml

Lines changed: 5 additions & 5 deletions
@@ -1,7 +1,7 @@
-name = "MLJOpenML"
-uuid = "cbea4545-8c96-4583-ad3a-44078d60d369"
-authors = ["Anthony D. Blaom <[email protected]>"]
-version = "1.2.0"
+name = "OpenML"
+uuid = "8b6db2d4-7670-4922-a472-f9537c81ab66"
+authors = ["Diego Arenas <[email protected]>", "Anthony D. Blaom <[email protected]>"]
+version = "0.1.0"
 
 [deps]
 ARFFFiles = "da404889-ca92-49ff-9e8b-0aa6b4d38dc8"
@@ -11,10 +11,10 @@ Markdown = "d6f4376e-aef5-505a-96c1-9c027394607a"
 ScientificTypes = "321657f4-b219-11e9-178b-2701a2544e81"
 
 [compat]
+ARFFFiles = "1.3"
 HTTP = "0.8, 0.9"
 JSON = "0.21"
 ScientificTypes = "2"
-ARFFFiles = "1.3"
 julia = "1"
 
 [extras]

README.md

Lines changed: 15 additions & 51 deletions
@@ -1,57 +1,21 @@
-# MLJOpenML.jl
+# OpenML.jl
 
-| Linux | Coverage |
-| :-----------: | :------: |
-| [![Build status](https://github.com/JuliaAI/MLJOpenML.jl/workflows/CI/badge.svg)](https://github.com/JuliaAI/MLJOpenML.jl/actions)| [![codecov.io](http://codecov.io/github/JuliaAI/MLJOpenML.jl/coverage.svg?branch=master)](http://codecov.io/github/JuliaAI/MLJOpenML.jl?branch=master) |
+| Linux | Coverage | Documentation |
+| :-----------: | :------: | :-------: |
+| [![Build status](https://github.com/JuliaAI/OpenML.jl/workflows/CI/badge.svg)](https://github.com/JuliaAI/OpenML.jl/actions)| [![codecov.io](http://codecov.io/github/JuliaAI/OpenML.jl/coverage.svg?branch=master)](http://codecov.io/github/JuliaAI/OpenML.jl?branch=master) | [![](https://img.shields.io/badge/docs-dev-blue.svg)](https://JuliaAI.github.io/OpenML.jl/dev) |
 
-A package providing integration of [OpenML](https://www.openml.org) with the
-[MLJ](https://alan-turing-institute.github.io/MLJ.jl/dev/) machine
-learning framework.
+Partial implementation of the [OpenML](https://www.openml.org) API for
+Julia. At present this package allows querying and
+downloading of OpenML datasets.
 
-Based entirely on Diego Arenas' original code contribution to MLJBase.jl.
+For further integration with the
+[MLJ](https://JuliaAI.github.io/MLJ.jl/dev/) machine
+learning framework (such as uploading MLJ runs) see
+[MLJOpenML.jl](https://github.com/JuliaAI/MLJOpenML.jl).
 
 
-## Installation
-
-```julia
-using Pkg
-Pkg.add("MLJOpenML")
-```
-
-## Sample usage
-
-Load the iris data set from OpenML:
-
-```julia
-using MLJOpenML
-table = MLJOpenML.load(61) # a Tables.DictColumnTable
-```
-
-Convert to a `DataFrame`:
-
-```julia
-Pkg.add("DataFrames")
-using DataFrames
-df = DataFrame(table)
-```
-
-Browsing and filtering datasets:
-
-```julia
-using DataFrames
-ds = MLJOpenML.list_datasets(output_format = DataFrame)
-MLJOpenML.describe_dataset(6)
-MLJOpenML.list_tags() # lists valid tags
-ds = MLJOpenML.list_datasets(tag = "OpenML100",
-filter = "number_instances/100..1000/number_features/1..10",
-output_format = DataFrame)
-```
-
-## Documentation
-
-Documentation is provided in the [OpenML
-Integration](https://alan-turing-institute.github.io/MLJ.jl/dev/openml_integration/)
-section of the
-[MLJManual](https://alan-turing-institute.github.io/MLJ.jl/dev/)
-
+The code in this repository is based on contributions of Diego Arenas
+to [MLJBase.jl](https://github.com/JuliaAI/MLJBase.jl) which do not
+appear in the commit history of this repository.
 
+Package documentation is [here](https://JuliaAI.github.io/OpenML.jl/dev).

docs/Project.toml

Lines changed: 3 additions & 0 deletions
@@ -1,5 +1,8 @@
 [deps]
+DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
 Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
+OpenML = "8b6db2d4-7670-4922-a472-f9537c81ab66"
+ScientificTypes = "321657f4-b219-11e9-178b-2701a2544e81"
 
 [compat]
 Documenter = "~0.26"

docs/make.jl

Lines changed: 4 additions & 4 deletions
@@ -1,10 +1,10 @@
-using Documenter, MLJOpenML
+using Documenter, OpenML, DataFrames
 
 makedocs(
-modules = [MLJOpenML],
-sitename = "MLJOpenML.jl",
+modules = [OpenML,],
+sitename = "OpenML.jl",
 )
 
 deploydocs(
-repo = "github.com/alan-turing-institute/MLJOpenML.jl.git",
+repo = "github.com/JuliaAI/OpenML.jl.git",
 )

docs/src/index.md

Lines changed: 79 additions & 4 deletions
@@ -1,8 +1,83 @@
-# MLJOpenML.jl Documentation
+# OpenML.jl Documentation
 
 This is the reference documentation of
-[`MLJOpenML.jl`](https://github.com/alan-turing-institute/MLJOpenML.jl).
+[`OpenML.jl`](https://github.com/JuliaAI/OpenML.jl).
 
-```@autodocs
-Modules = [MLJOpenML]
+The [OpenML platform](https://www.openml.org) provides an integration
+platform for carrying out and comparing machine learning solutions
+across a broad collection of public datasets and software platforms.
+
+Summary of OpenML.jl functionality:
+
+- [`OpenML.list_tags`](@ref)`()`: for listing all dataset tags
+
+- [`OpenML.list_datasets`](@ref)`(; tag=nothing, filter=nothing, output_format=...)`: for listing available datasets
+
+- [`OpenML.describe_dataset`](@ref)`(id)`: to describe a particular dataset
+
+- [`OpenML.load`](@ref)`(id; parser=:arff)`: to download a dataset
+
+
+## Installation
+
+```julia
+using Pkg
+Pkg.add("OpenML")
+```
+
+If running the demonstration below:
+
+```julia
+Pkg.add("DataFrames")
+Pkg.add("ScientificTypes")
+```
+
+## Sample usage
+
+```@repl new
+using OpenML # or using MLJ
+using DataFrames
+
+OpenML.list_tags()
 ```
+
+Listing all datasets with the "OpenML100" tag which also have `n`
+instances and `p` features, where `100 < n < 1000` and `1 < p < 10`:
+
+```@repl new
+ds = OpenML.list_datasets(
+tag = "OpenML100",
+filter = "number_instances/100..1000/number_features/1..10",
+output_format = DataFrame)
+```
+
+Describing and loading one of these datasets:
+
+```@repl new
+OpenML.describe_dataset(15)
+table = OpenML.load(15)
+```
+
+Converting to a data frame:
+
+```@repl new
+df = DataFrame(table)
+```
+
+Inspecting its schema:
+
+```@repl new
+using ScientificTypes
+schema(table)
+```
+
+## Public API
+
+```@docs
+OpenML.list_tags
+OpenML.list_datasets
+OpenML.describe_dataset
+OpenML.load
+```
+
+
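
Note on the `filter` keyword: the string passed to `OpenML.list_datasets` in the sample usage above chains `key/value` segments with `/`, and `test/data.jl` builds one by interpolation (`"limit/$limit/offset/$offset"`). A minimal sketch of assembling such a string from pairs might look like the following. `build_filter` is a hypothetical helper and not part of OpenML.jl; which keys are accepted is decided by the OpenML REST API, not by this code.

```julia
# Hypothetical helper (NOT part of OpenML.jl): join key => value pairs into the
# slash-separated form used by the `filter` keyword in the docs above.
function build_filter(kvs::Pair...)
    segments = String[]
    for (key, value) in kvs
        # A range is rendered as `lo..hi`, as in "number_instances/100..1000".
        rendered = value isa AbstractRange ? "$(first(value))..$(last(value))" : string(value)
        push!(segments, "$(key)/$(rendered)")
    end
    return join(segments, "/")
end

filter_string = build_filter("number_instances" => 100:1000,
                             "number_features" => 1:10)
println(filter_string)  # number_instances/100..1000/number_features/1..10
```

Under these assumptions the result could be passed as `filter = filter_string`; the helper only formats text and performs no validation of keys or values.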

src/MLJOpenML.jl

Lines changed: 0 additions & 8 deletions
This file was deleted.

src/OpenML.jl

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+module OpenML
+
+using HTTP
+using JSON
+import ARFFFiles
+import ScientificTypes: Continuous, Count, Textual, Multiclass, coerce, autotype
+using Markdown
+
+export OpenML
+
+include("data.jl")
+
+end # module
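
Note on qualified names: the module above exports only its own name, which is consistent with the documentation calling functions as `OpenML.load` and `OpenML.list_tags` rather than bare `load`. A self-contained sketch of that behaviour, using a hypothetical stand-in module `DemoPackage` (not part of this commit):

```julia
# Hypothetical stand-in for a package module that exports only its own name.
module DemoPackage

export DemoPackage

# Not exported, so callers must qualify it as `DemoPackage.load`.
load(id::Int) = "dataset $id"

end # module

using .DemoPackage

println(DemoPackage.load(61))              # qualified call works
println(isdefined(@__MODULE__, :load))     # false: `load` is not in scope unqualified
```

Keeping short names such as `load` unexported avoids clashes with identically named functions from other packages loaded in the same session.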

src/openml.jl renamed to src/data.jl

Lines changed: 5 additions & 11 deletions
@@ -1,9 +1,3 @@
-using HTTP
-using JSON
-import ARFFFiles
-import ScientificTypes: Continuous, Count, Textual, Multiclass, coerce, autotype
-using Markdown
-
 const API_URL = "https://www.openml.org/api/v1/json"
 
 # Data API
@@ -45,7 +39,7 @@ function load_Dataset_Description(id::Int; api_key::String="")
 end
 
 """
-MLJOpenML.load(id; parser = :arff)
+OpenML.load(id; parser = :arff)
 
 Load the OpenML dataset with specified `id`, from those listed by
 [`list_datasets`](@ref) or on the [OpenML site](https://www.openml.org/search?type=data).
@@ -59,7 +53,7 @@ Returns a table.
 
 ```julia
 using DataFrames
-table = MLJOpenML.load(61);
+table = OpenML.load(61);
 df = DataFrame(table);
 ```
 """
@@ -225,7 +219,7 @@ API](https://www.openml.org/api_docs#!/data/get_data_list_filters).
 ```
 julia> using DataFrames
 
-julia> ds = MLJOpenML.list_datasets(
+julia> ds = OpenML.list_datasets(
 tag = "OpenML100",
 filter = "number_instances/100..1000/number_features/1..10",
 output_format = DataFrame
@@ -244,7 +238,7 @@ function list_datasets(; tag = nothing, filter = "", filters=filter,
 return
 end
 end
-data = MLJOpenML.load_List_And_Filter(filters; api_key = api_key)
+data = OpenML.load_List_And_Filter(filters; api_key = api_key)
 datasets = data["data"]["dataset"]
 qualities = Symbol.(union(vcat([vcat(qualitynames.(entry["quality"])...) for entry in datasets]...)))
 result = merge((id = Int[], name = String[], status = String[]),
@@ -292,7 +286,7 @@ Use [`list_datasets`](@ref) to browse available data sets.
 
 # Examples
 ```
-julia> MLJOpenML.describe_dataset(6)
+julia> OpenML.describe_dataset(6)
 Author: David J. Slate Source: UCI
 (https://archive.ics.uci.edu/ml/datasets/Letter+Recognition) - 01-01-1991 Please cite: P.
 W. Frey and D. J. Slate. "Letter Recognition Using Holland-style Adaptive Classifiers".
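
Note on the `qualities` line in the `list_datasets` hunk above: it gathers the quality names from every dataset entry and deduplicates them into a vector of symbols. The sketch below runs that same expression on mock data. The definition of `qualitynames` is not visible in this diff, so the one-liner here is an assumed stand-in, and the mock entries are illustrative rather than real API output.

```julia
# Mock of the `data["data"]["dataset"]` entries; values are illustrative only.
datasets = [
    Dict("did" => 1, "quality" => [
        Dict("name" => "NumberOfInstances", "value" => "150"),
        Dict("name" => "NumberOfFeatures", "value" => "5"),
    ]),
    Dict("did" => 2, "quality" => [
        Dict("name" => "NumberOfFeatures", "value" => "17"),
        Dict("name" => "NumberOfClasses", "value" => "26"),
    ]),
]

# Assumed stand-in for the package's `qualitynames` (not shown in the diff):
# return the quality's name wrapped in a vector, or nothing if it has no name.
qualitynames(q) = haskey(q, "name") ? [q["name"]] : String[]

# Same expression as in the hunk: flatten per-dataset names, deduplicate, symbolize.
qualities = Symbol.(union(vcat([vcat(qualitynames.(entry["quality"])...) for entry in datasets]...)))
println(qualities)

# An equivalent, arguably more readable formulation under the same assumptions.
alternative = unique(Symbol(q["name"]) for entry in datasets
                     for q in entry["quality"] if haskey(q, "name"))
println(alternative == qualities)
```

Both forms keep first-seen order, so a quality shared by several datasets (here `NumberOfFeatures`) appears once.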

test/openml.jl renamed to test/data.jl

Lines changed: 7 additions & 7 deletions
@@ -2,18 +2,18 @@ module TestOpenml
 
 using Test
 using HTTP
-using MLJOpenML
+using OpenML
 import Tables.istable
 
-response_test = MLJOpenML.load_Dataset_Description(61)
-ntp_test = MLJOpenML.load(61)
+response_test = OpenML.load_Dataset_Description(61)
+ntp_test = OpenML.load(61)
 @test istable(ntp_test)
-dqlist_test = MLJOpenML.load_Data_Qualities_List()
-data_features_test = MLJOpenML.load_Data_Features(61)
-data_qualities_test = MLJOpenML.load_Data_Qualities(61)
+dqlist_test = OpenML.load_Data_Qualities_List()
+data_features_test = OpenML.load_Data_Features(61)
+data_qualities_test = OpenML.load_Data_Qualities(61)
 limit = 5
 offset = 8
-filters_test = MLJOpenML.load_List_And_Filter("limit/$limit/offset/$offset")
+filters_test = OpenML.load_List_And_Filter("limit/$limit/offset/$offset")
 
 @testset "HTTP connection" begin
 @test typeof(response_test) <: Dict

test/runtests.jl

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-include("openml.jl")
+include("data.jl")
