@@ -37,7 +37,7 @@ For an idea of what JSONoid does, you can view [example schemas with their corre
 </details>
 <!-- END doctoc generated TOC please keep comment here to allow auto update -->
 
-## Input/Output Format
+## Input/Output Format:clipboard:
 
 JSONoid accepts [newline-delimited JSON](http://ndjson.org/) either from standard input or a file.
 This means there should be exactly one JSON value per line in the input.
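
Reviewer note on the input format in the hunk above: the "exactly one JSON value per line" rule can be illustrated with a small sketch. This is not JSONoid's own parser (JSONoid is a Scala project); `parse_ndjson` is a hypothetical helper name, and skipping blank lines is an assumption of this sketch rather than documented behavior.

```python
import json


def parse_ndjson(text):
    """Parse newline-delimited JSON: one JSON value per non-empty line."""
    values = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption of this sketch: blank lines are ignored
        try:
            # json.loads rejects a line holding more than one JSON value
            values.append(json.loads(line))
        except json.JSONDecodeError as err:
            raise ValueError(
                f"line {line_number} is not a single JSON value: {err}"
            ) from err
    return values


# Three lines, three values; values need not be objects.
sample = '{"name": "a", "size": 1}\n{"name": "b"}\n[1, 2, 3]\n'
docs = parse_ndjson(sample)
```

Each element of `docs` corresponds to one input line, which is the unit a tool reading this format would treat as a single document.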
@@ -48,7 +48,7 @@ The generated schema will be printed [JSON Schema](https://json-schema.org/) as
 Note that depending on the configuration, JSONoid will add additional properties which are not part of the JSON Schema standard.
 The format is described in the [JSON Schema Profile](https://github.com/dataunitylab/json-schema-profile) draft and is subject to change..
 
-## Running
+## Running:running:
 
 To quickly run jsonoid, you can use the Docker image which is built from the latest commit on the `main` branch.
 Note that by default, jsonoid accepts [newline-delimited JSON](http://ndjson.org/) on standard input, so it will hang waiting for input.
@@ -61,14 +61,14 @@ To simplify, you may wish to add a shell alias so `jsonoid` can be run directly
 alias jsonoid='docker run -i --rm michaelmior/jsonoid-discovery'
 jsonoid --help
 
-## Compiling
+## Compiling:construction_worker:
 
 To produce a JAR file which is suitable for running either locally or via Spark, run `sbt assembly`.
 This requires an installation of [sbt](https://www.scala-sbt.org/).
 Alternatively, you can use `./sbtx assembly` to attempt to automatically download install the appropriate sbt and Scala versions using [sbt-extras](https://github.com/dwijnand/sbt-extras).
 This will produce a JAR file under `target/scala-2.13/` which can either be run directly or passed to `spark-submit` to run via Spark.
 
-## Schema monoids
+## Schema monoids:heavy_multiplication_x:
 
 In JSONoid, the primary way information is collected from a schema is using [monoids](https://en.wikipedia.org/wiki/Monoid).
 A monoid simply stores a piece of information extracted from a JSON document along with information on how to combine together information from all documents in a collection in a scalable way.
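
Reviewer note on the monoid description in the hunk above: the idea of "a piece of information plus a way to combine it" can be sketched as follows. This is a minimal illustration, not JSONoid's Scala implementation; `LengthRange`, `of`, and `merge` are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Optional


@dataclass(frozen=True)
class LengthRange:
    """Hypothetical monoid tracking the min and max string length seen."""

    min_length: Optional[int] = None  # None means "no strings seen yet"
    max_length: Optional[int] = None

    @staticmethod
    def of(value: str) -> "LengthRange":
        # Summary extracted from a single value.
        return LengthRange(len(value), len(value))

    def merge(self, other: "LengthRange") -> "LengthRange":
        # Associative combine; the empty LengthRange() is the identity.
        mins = [m for m in (self.min_length, other.min_length) if m is not None]
        maxs = [m for m in (self.max_length, other.max_length) if m is not None]
        return LengthRange(
            min(mins) if mins else None,
            max(maxs) if maxs else None,
        )


EMPTY = LengthRange()
strings = ["a", "abcd", "xyz"]

# Sequential fold over all values.
summary = reduce(LengthRange.merge, map(LengthRange.of, strings), EMPTY)

# Because merge is associative, partial summaries computed separately
# (for example on different partitions) combine to the same result.
left = reduce(LengthRange.merge, map(LengthRange.of, strings[:1]), EMPTY)
right = reduce(LengthRange.merge, map(LengthRange.of, strings[1:]), EMPTY)
combined = left.merge(right)
```

Associativity with an identity element is what lets partial results be merged in any grouping, which is the property the "scalable" wording above relies on.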
@@ -108,7 +108,7 @@ For each primitive type, the following monoids are defined.
 - `LengthHistogram`, `MaxLength`, `MinLength` - Both the minimum and maximum length of strings as well as a histogram of all string lengths will be included.
 - `Format` - This attempts to infer a value for the [`pattern`](https://json-schema.org/understanding-json-schema/reference/string.html#regular-expressions) keyword. A pattern is a regular expression which all string values must match. Currently this property simply finds common prefixes and suffixes of strings in the schema.
 
-## Equivalence relations
+## Equivalence relations:left_right_arrow:
 
 The concept of equivalence relations was first introduced by Baazizi et al. in [Parametric schema inference for massive JSON datasets](https://link.springer.com/article/10.1007/s00778-018-0532-7.)
 The idea is that some JSON Schemas may contain some level of variation such as optional values and multiple possible types for a given key.
@@ -185,20 +185,20 @@ The result of the reduction will be a `JsonSchema` object.
 Tests can be run via [ScalaTest](https://www.scalatest.org/) via `sbt test`.
 It is also possible to run fuzz tests via [Jazzer](https://github.com/CodeIntelligenceTesting/jazzer) with `./run-fuzzer.sh`.
 
-## Reporting issues
+## Reporting issues:triangular_flag_on_post:
 
 If you encounter any issues, please open an issue on the [GitHub repository](https://github.com/dataunitylab/jsonoid-discovery).
 Any potential security vulnerabilities should be [reported privately](https://github.com/dataunitylab/jsonoid-discovery/security/advisories/new).