README.md: 1 addition & 1 deletion
@@ -46,7 +46,7 @@ Contributions are very welcome, as are feature requests and suggestions. Please
There are several other packages for reading CSV files in Julia, which may suit your needs better:
* The standard library contains [DelimitedFiles.jl](https://docs.julialang.org/en/v1/stdlib/DelimitedFiles/), at least until Julia 1.8.
- This returns a `Matrix` rather than a [Tables.jl](https://github.com/JuliaData/Tables.jl)-style container, thus works best for files of homogenous element type.
+ This returns a `Matrix` rather than a [Tables.jl](https://github.com/JuliaData/Tables.jl)-style container, thus works best for files of homogeneous element type.
On large files, CSV.jl will be much faster. (A short comparison sketch follows this list.)
* [CSVFiles.jl](https://github.com/queryverse/CSVFiles.jl) uses [FileIO.jl](https://github.com/JuliaIO/FileIO.jl)'s `load` / `save` API,
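To make the trade-off concrete, here is a minimal sketch contrasting the two readers. It assumes a hypothetical `data.csv` with a header row; the file name and element type are illustrative, not prescribed by either package.

```julia
using DelimitedFiles, CSV

# DelimitedFiles: returns a plain Matrix (plus the header row, if requested),
# so it fits best when every cell shares one element type.
mat, header = readdlm("data.csv", ',', Float64; header=true)

# CSV.jl: returns a Tables.jl-compatible CSV.File with a type per column,
# so heterogeneous columns (strings, dates, missings) are handled naturally.
file = CSV.File("data.csv")
```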
docs/src/index.md: 2 additions & 2 deletions
@@ -22,8 +22,8 @@ to load the package.
To start out, let's discuss the high-level functionality provided by the package, which hopefully will help direct you to more specific documentation for your use case (short usage sketches follow the list):
* [`CSV.File`](@ref): the most commonly used function for ingesting delimited data; will read an entire data input or vector of data inputs, detecting the number of columns and rows, along with the type of data for each column. Returns a `CSV.File` object, which is like a lightweight table/DataFrame. Assuming `file` is a `CSV.File` object, individual columns can be accessed like `file.col1`, `file[:col1]`, or `file["col1"]`. You can see parsed column names via `file.names`. A `CSV.File` can also be iterated, where a `CSV.Row` is produced on each iteration, which allows access to each value in the row via `row.col1`, `row[:col1]`, or `row[1]`. You can also index a `CSV.File` directly, like `file[1]`, to return the entire `CSV.Row` at the provided index/row number. If the input is large enough, multiple threads will be used while parsing, and full column buffers will be allocated to hold the parsed data. `CSV.File` satisfies the [Tables.jl](https://github.com/JuliaData/Tables.jl) "source" interface, and so can be passed to valid sink functions like `DataFrame`, `SQLite.load!`, `Arrow.write`, etc. Supports a number of keyword arguments to control parsing, column types, and other file metadata options.
- *[`CSV.read`](@ref): a convenience function identical to `CSV.File`, but used when a `CSV.File` will be passed direclty to a sink function, like a `DataFrame`. In some cases, sinks may make copies of incoming data for their own safety; by calling `CSV.read(file, DataFrame)`, no copies of the parsed `CSV.File` will be made, and the `DataFrame` will take direct ownership of the `CSV.File`'s columns, which is more efficient than doing `CSV.File(file) |> DataFrame` which will result in an extra copy of each column being made. Keyword arguments are identical to `CSV.File`. Any valid Tables.jl sink function/table type can be passed as the 2nd argument. Like `CSV.File`, a vector of data inputs can be passed as the 1st argument, which will result in a single "long" table of all the inputs vertically concatenanted. Each input must have identical schemas (column names and types).
- * [`CSV.Rows`](@ref): an alternative approach for consuming delimited data, where the input is only consumed one row at a time, which allows "streaming" the data with a lower memory footrpint than `CSV.File`. Supports many of the same options as `CSV.File`, except column type handling is a little different. By default, every column type will be essentially `Union{Missing, String}`, i.e. no automatic type detection is done, but column types can be provided manually. Multithreading is not used while parsing. After constructing a `CSV.Rows` object, rows can be "streamed" by iterating, where each iteration produces a `CSV.Row2` object, which operates similar to `CSV.File`'s `CSV.Row` type where individual row values can be accessed via `row.col1`, `row[:col1]`, or `row[1]`. If each row is processed individually, additional memory can be saved by passing `reusebuffer=true`, which means a single buffer will be allocated to hold the values of only the currently iterated row. `CSV.Rows` also supports the Tables.jl interface and can also be passed to valid sink functions.
+ * [`CSV.read`](@ref): a convenience function identical to `CSV.File`, but used when a `CSV.File` will be passed directly to a sink function, like a `DataFrame`. In some cases, sinks may make copies of incoming data for their own safety; by calling `CSV.read(file, DataFrame)`, no copies of the parsed `CSV.File` will be made, and the `DataFrame` will take direct ownership of the `CSV.File`'s columns, which is more efficient than `CSV.File(file) |> DataFrame`, which results in an extra copy of each column being made. Keyword arguments are identical to `CSV.File`. Any valid Tables.jl sink function/table type can be passed as the 2nd argument. Like `CSV.File`, a vector of data inputs can be passed as the 1st argument, which will result in a single "long" table of all the inputs vertically concatenated. Each input must have identical schemas (column names and types).
+ * [`CSV.Rows`](@ref): an alternative approach for consuming delimited data, where the input is only consumed one row at a time, which allows "streaming" the data with a lower memory footprint than `CSV.File`. Supports many of the same options as `CSV.File`, except column type handling is a little different. By default, every column type will be essentially `Union{Missing, String}`, i.e. no automatic type detection is done, but column types can be provided manually. Multithreading is not used while parsing. After constructing a `CSV.Rows` object, rows can be "streamed" by iterating, where each iteration produces a `CSV.Row2` object, which operates similarly to `CSV.File`'s `CSV.Row` type, where individual row values can be accessed via `row.col1`, `row[:col1]`, or `row[1]`. If each row is processed individually, additional memory can be saved by passing `reusebuffer=true`, which means a single buffer will be allocated to hold the values of only the currently iterated row. `CSV.Rows` also supports the Tables.jl interface and can also be passed to valid sink functions.
* [`CSV.Chunks`](@ref): similar to `CSV.File`, but allows passing an `ntasks::Integer` keyword argument, which will cause the input file to be "chunked" up into `ntasks` number of chunks. After constructing a `CSV.Chunks` object, each iteration of the object will return a `CSV.File` of the next parsed chunk. Useful for processing extremely large files in "chunks". Because each iterated element is a valid Tables.jl "source", `CSV.Chunks` satisfies the `Tables.partitions` interface, so sinks that can process input partitions can operate by passing `CSV.Chunks` as the "source".
* [`CSV.write`](@ref): a valid Tables.jl "sink" function for writing any valid input table out in a delimited text format. Supports many options for controlling the output, like delimiter, quote characters, etc. Writes data to an internal buffer, which is flushed out when full; the buffer size is configurable. Also supports writing out partitioned inputs as separate output files, one file per input partition. To write out a `DataFrame`, for example, it's simply `CSV.write("data.csv", df)`; to write out a matrix, it's `using Tables; CSV.write("data.csv", Tables.table(mat))`.
* [`CSV.RowWriter`](@ref): an alternative way to produce csv output; takes any valid Tables.jl input, and on each iteration produces a single csv-formatted string for the next row of the input table.
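A minimal sketch of the `CSV.File` / `CSV.read` access patterns described above, using a hypothetical in-memory input (a filename string works the same way):

```julia
using CSV, DataFrames

data = "col1,col2\n1,a\n2,b\n"

file = CSV.File(IOBuffer(data))
file.names          # parsed column names: [:col1, :col2]
file.col1           # a whole column via property access
row = file[1]       # index directly to get a CSV.Row
row.col2            # value access within the row

# CSV.read parses and hands column ownership straight to the sink,
# avoiding the extra per-column copy of CSV.File(...) |> DataFrame:
df = CSV.read(IOBuffer(data), DataFrame)
```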
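Likewise for the streaming paths, a sketch under the same assumed in-memory input; note that with `reusebuffer=true`, row values must be consumed (or copied) before the next iteration:

```julia
using CSV

data = "a,b\n1,2\n3,4\n"

# CSV.Rows: one row at a time; values are essentially Union{Missing, String}
for row in CSV.Rows(IOBuffer(data); reusebuffer=true)
    @show row.a row.b
end

# CSV.Chunks: each iteration yields a CSV.File for the next parsed chunk
# (a tiny input like this may produce fewer than ntasks chunks)
for chunk in CSV.Chunks(IOBuffer(data); ntasks=2)
    # chunk is a Tables.jl source; process it, then let it be collected
end
```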
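And the output side, assuming a NamedTuple-of-vectors table (any Tables.jl source works) and illustrative output file names:

```julia
using CSV, Tables

tbl = (x = [1, 2, 3], y = ["a", "b", "c"])

CSV.write("data.csv", tbl)                      # any Tables.jl source
CSV.write("mat.csv", Tables.table(rand(3, 3)))  # a Matrix, wrapped as a table

# CSV.RowWriter: one csv-formatted string per iteration, header row first
for line in CSV.RowWriter(tbl)
    print(line)
end
```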
src/README.md: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ So the general strategy is to get the overall `CSV.Context` for a delimited file
CSV.jl provides a native integration with the [PooledArrays.jl](https://github.com/JuliaData/PooledArrays.jl/) package, which provides an array storage optimization by having a (hopefully) small pool of "expensive" (big or heap-allocated, or whatever) values, along with a memory-efficient integer array of "refs" where each ref maps to one of the values in the pool. This is sometimes referred to as a "dictionary encoding" in various data formats. As an example, if you have a column with 1,000,000 elements, but only 10 unique string values, you can have a `Vector{String}` pool to store the 10 unique strings and give each a unique `UInt32` value, and a `Vector{UInt32}` "ref array" for the million elements, where each element just indexes into the pool to get the actual string value.
- By providing the `pool` keyword argument, users can control how this optimization will be applied to individual columns, or to all columns of the delimted text being read.
+ By providing the `pool` keyword argument, users can control how this optimization will be applied to individual columns, or to all columns of the delimited text being read. (A short sketch follows the list below.)
Valid inputs for `pool` include:
* A `Bool`, `true` or `false`, which will apply to all string columns parsed; string columns either will _all_ be pooled, or _all_ not pooled
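A small sketch of the encoding described above, first building a pooled array by hand with PooledArrays, then asking CSV.jl to pool all string columns; the single-column input here is hypothetical:

```julia
using CSV, PooledArrays

# Dictionary encoding by hand: 10 unique strings repeated 1_000 times each
vals = repeat(["value$i" for i in 1:10], 1_000)
pa = PooledArray(vals)
eltype(pa.refs)   # compact integer ref type (UInt32 by default)
length(pa.pool)   # 10 unique values in the pool

# pool=true: every string column is pooled; pool=false: none are
file = CSV.File(IOBuffer("name\n" * join(vals, "\n")); pool=true)
```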