Commit 333f2fb

committed
Updated Readme.md with example use
Add an example use run through for the program to the Readme.md
1 parent 728e31a commit 333f2fb

File tree

1 file changed

+89
-35
lines changed


README.md

Lines changed: 89 additions & 35 deletions
## Application Summary

`csv2sql` is a small, simple program designed to quickly convert a [comma separated value (CSV)](http://en.wikipedia.org/wiki/Comma-separated_values) file into simple [structured query language (SQL)](http://en.wikipedia.org/wiki/SQL) statements, which can then be used as an import source for a [SQLite](http://www.sqlite.org/) database.

## About CSV2SQL

The program was originally created to speed up the process of checking and then importing large (often greater than 1GB) CSV files into SQLite databases. The data would vary quite a bit, often being sourced from multiple financial, audit, billing and business support corporate computer systems—so there was no consistency in the CSV file formats provided from project to project. The data was used for ad-hoc revenue assurance investigations and analysis, to aid recovery processes, and often in reporting for the associated wider projects.

The different data sources (starting as CSV files, as the data was extracted from the business systems by a different team) would be loaded into an ad-hoc SQLite database as tables, and then analysed with the benefit of SQL—and sometimes, in later stages, through the use of scripts, mainly on the more complex projects. The work often required a quick turnaround, so any tools that could provide increased efficiency, but still maintain integrity (or even increase the integrity checking), became key. Having simple tools to improve the workflow and produce consistent, repeatable results was very important!

The `csv2sql` tool was created to quickly integrity check the source CSV file, report on its size (simple stats), and convert it into a text file that contains simple SQL statements. These SQL statements both create a new database table to hold the CSV file data, and then insert the data directly into that new database table. These steps can be done by SQLite too, as it can directly import CSV files—but there was a wish to separately prepare and manage the CSV data files prior to involving the SQL database. This added the benefit of a simple additional integrity step, and put all the source CSV file data into a known state and file format before it was used with SQLite.
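For illustration only (this is an assumed sketch of the output format, not the program's verbatim output), the generated text file contains plain SQL along these lines: a `CREATE TABLE` statement built from the CSV header, followed by one `INSERT` per data row. The table and column names here are borrowed from the example usage later in this README:

```
CREATE TABLE MyTestTable ("Trans_ID","Price","Trans_Date");
INSERT INTO MyTestTable VALUES("{0C7ADEF5-878D-4066-B785-0000003ED74A}","163000","2003-02-21 00:00");
```

Every value is kept as quoted text, in line with the 'treat all data as text' approach described below.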

So key requirements were:

- be very fast—the source CSV files are often large, and therefore the speed to process them was important;
- check the CSV file contents (integrity check)—if any discrepancies are found, they are reported in a helpful way, so the CSV file can be fixed quickly;
- the output (converted data), now in SQL format, should be consistent and basic;
- the data should remain in text format to allow future access, or use with other text manipulation tools if needed;
- the SQL statement format should be as simple as possible to reduce complexity—and as SQLite treats all data as text by default, this approach was adopted. Casting using SQL can then be used if needed at a later stage—or it was handled by high-level scripting languages;
- create consistently formatted SQL table column names (ie without spaces or 'strange' characters)—to allow easy future reference to the columns when constructing new queries. Different source computer systems (and their databases) had some very varied approaches to the characters and formatting used;
- should be cross platform if possible—so it can be used on any computer system, and one tool works everywhere;
- should be command line based to reduce development time—and keep it simple to use. SQLite was used via the command line anyway (either directly or via scripts)—so continuing this approach was chosen.


A few different approaches were tried over time (using [tcl](http://en.wikipedia.org/wiki/Tcl), [Python](http://en.wikipedia.org/wiki/Python_(programming_language)), and [c](http://en.wikipedia.org/wiki/C_(programming_language))), none of which were bad at the job. However, the application was ported over to [Go](http://en.wikipedia.org/wiki/Go_(programming_language)) (golang)—and it immediately benefited, as Go has great built-in CSV file handling (as well as for other formats), and the speed to process the file was impressive too. It might not be as fast as c, or as simple to understand as tcl or Python code at first, but overall it suited my requirements best. Go also supports UTF8 characters without extra work, and is cross platform too!

2929
Key features of `csv2sql` include:

…throughout the rest of the file too.

…database table.

- Any spaces, or the following characters `| - + @ # / \ : ( ) '`,
found in the header line of your CSV file, will be replaced when they are
used as the subsequent column names for your new SQLite table. These
characters will be replaced with the underscore character (ie '_'). These
changes only apply to the header line, and are carried out to avoid SQL
syntax import issues, and to make any future SQL statements referencing these
column names easier to construct. This default feature can be disabled by
using the command line parameter `-k=true` if you wish.

- You choose and specify the table name the CSV file contents will be imported into in your SQLite database. This is done with the `-t tablename` parameter when you run the program.

- The output file is a plain text file. It just contains the SQL commands
that are used by SQLite to create and then insert all your data into your
new database table.

…

Further details of each of these command line options are below:

This enables debug output when the program is run—so it prints additional information to the screen while it is running. This additional output might be useful to better understand what the application is doing—or to pinpoint where in the program a problem is occurring. For normal use it is not needed—so it is turned off by default (ie `-d=false`)

- **CSV INPUT FILENAME:** `-f filename.csv` [MANDATORY]
This command line parameter is required to allow the program to run properly, so is mandatory for successful use. It specifies the name of the input CSV file that will be used as the source data by the program to check and convert into SQL, ready for import into an SQLite database table. The `filename.csv` shown in the example should be replaced with the name of your actual source CSV file. If you need to include the path to the CSV file, and it contains any spaces (or special characters)—you should wrap the filename and path in quotes: `"/data-disk/datastore one/my_csv-data.csv"` or `"c:\Users\Fred Jones\My Documents\my csv-file.csv"`. There is no default value for this command line parameter, so the user must provide a valid source CSV file to allow the program to run.

- **ADDITIONAL HELP:** `-h` or `-h=true`
This will output additional information about the program, its purpose, and an explanation of its usage. It may be useful to someone who did not originally install the program, so needs to know a bit more about it. If the program is run with this option, it will exit after displaying the help output. The default is not to show the additional help screen (ie `-h=false`)

- **CSV HEADER CHANGES:** `-k=false` [default]
By default the program will change certain characters (ie space and `| - + @ # / \ : ( ) '`) to an underscore (ie `_`) when it uses the header of the CSV file to create the new SQL database table column names. If you want to maintain your column names as they are in your source CSV file, then use this command line parameter to disable this behaviour. By default it will make the changes, so on the command line specify `-k=true` to override. The `-k` stands for 'keep'.

- **TABLE NAME:** `-t tablename` [MANDATORY]
This command line parameter is required to allow the program to run properly, so is mandatory for successful use. It specifies the name of the table to be created in the SQLite database when your data is imported. Change the example `tablename` to a name of your choice that can be used within an SQLite database. If you need to use a table name that contains any spaces (or special characters that SQLite allows, of course)—you should wrap the tablename in quotes. Examples are: `mytablename` or `"my table name"` or `my_table_name`.

## Compiling the Program

Assuming you already have Go installed and set up on your computer—you only need to download the single source file '[csv2sql.go](https://github.com/wiremoons/csv2sql/blob/master/csv2sql.go)'. This can then be built using the command below, assuming the `csv2sql.go` file is in your current directory:
```
go build ./csv2sql.go
```
There is also a `Makefile` that I use on a computer running Linux to cross compile the program for Linux (64 bit version), Windows (32 bit and 64 bit versions) and Mac OS X (64 bit version). This can be done (assuming you have your computer and Go set up correctly) by also downloading the `Makefile`, and then entering:
```
make all
```
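The `Makefile` itself is not reproduced here. As a hypothetical sketch only (the output names match the binary downloads below, but the repository's actual `Makefile` may well differ), its cross-compile targets could look something like:

```
# Hypothetical sketch - the repository's actual Makefile may differ.
all:
	GOOS=linux   GOARCH=amd64 go build -o csv2sql-linx64 csv2sql.go
	GOOS=darwin  GOARCH=amd64 go build -o csv2sql-macx64 csv2sql.go
	GOOS=windows GOARCH=amd64 go build -o csv2sql-x64.exe csv2sql.go
	GOOS=windows GOARCH=386   go build -o csv2sql-x386.exe csv2sql.go
```

Setting the `GOOS` and `GOARCH` environment variables is the standard Go mechanism for producing binaries for other platforms.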

The following binary versions are available for download. Just download the file to your computer (either the .exe or .zip version), copy the file so it is in your current path, and then run it. You may want to rename the downloaded file to just `csv2sql` (Linux & Mac OS X) or `csv2sql.exe` (Windows) as well.

102-
Linux (64 bit)
103-
[Binary in zip file](https://github.com/wiremoons/csv2sql/blob/master/binaries/csv2sql-linx64.zip)
104-
[Binary file](https://github.com/wiremoons/csv2sql/blob/master/binaries/csv2sql-linx64)
101+
**Linux (64 bit)**
102+
- [Binary in zip file](https://github.com/wiremoons/csv2sql/blob/master/binaries/csv2sql-linx64.zip)
103+
- [Binary file](https://github.com/wiremoons/csv2sql/blob/master/binaries/csv2sql-linx64)
105104

**Mac OS X (64 bit)**
- [Binary in zip file](https://github.com/wiremoons/csv2sql/blob/master/binaries/csv2sql-macx64.zip)
- [Binary file](https://github.com/wiremoons/csv2sql/blob/master/binaries/csv2sql-macx64)

**Windows (64 bit)**
- [Binary in zip file](https://github.com/wiremoons/csv2sql/blob/master/binaries/csv2sql-x64.zip)
- [Binary file](https://github.com/wiremoons/csv2sql/blob/master/binaries/csv2sql-x64.exe)

**Windows (32 bit)**
- [Binary in zip file](https://github.com/wiremoons/csv2sql/blob/master/binaries/csv2sql-x386.zip)
- [Binary file](https://github.com/wiremoons/csv2sql/blob/master/binaries/csv2sql-x386.exe)

## Example Usage

Below is an example of the `csv2sql` program being used to convert a CSV file called `Test.csv`, and then importing the output into a SQLite database called `test.db`, in a Windows PowerShell session. The same commands would be used on Linux or Mac OS X:

```
C:\Users\Simon $PS> csv2sql.exe -f .\test-data\Test.csv -t MyTestTable

DONE
CSV file processing complete, and the new SQL file format was written to: SQL-Test.sql

STATS
CSV file .\test-data\Test.csv has 1000 lines with 15 CSV fields per record
The conversion took 17.001ms to run.

All is well.

C:\Users\Simon $PS> sqlite3 test.db

SQLite version 3.8.4.2 2014-03-26 18:51:19
Enter ".help" for usage hints.

sqlite> .read SQL-Test.sql

sqlite> .tab
MyTestTable

sqlite> select count(*) from MyTestTable;
999

sqlite> select * from MyTestTable limit 1;
{0C7ADEF5-878D-4066-B785-0000003ED74A}|163000|2003-02-21 00:00|UB5 4PJ|T|N|F|106||READING ROAD|NORTHOLT|NORTHOLT|EALING|GREATER LONDON|A
sqlite> .mode line
sqlite> select * from MyTestTable limit 1;
Trans_ID = {0C7ADEF5-878D-4066-B785-0000003ED74A}
Price = 163000
Trans_Date = 2003-02-21 00:00
PostCode = UB5 4PJ
Prop_Type = T
New_Build = N
Hold_Type = F
PAON = 106
SAON =
Street = READING ROAD
Locality = NORTHOLT
Town = NORTHOLT
District = EALING
County = GREATER LONDON
Flag = A
sqlite> .quit

C:\Users\Simon $PS>
```

## License

The program is licensed under the "New BSD License" or "BSD 3-Clause License". A…

## OTHER INFORMATION

- Latest version is kept on GitHub here: [Wiremoons GitHub Pages](https://github.com/wiremoons)
- The program is written in Go - more information here: [Go](http://www.golang.org/)
- More information on SQLite can be found here: [SQLite](http://www.sqlite.org/)
- The program was written by Simon Rowe, licensed under [New BSD License](http://opensource.org/licenses/BSD-3-Clause)
