@@ -12,7 +12,8 @@ CSV Lint plug-in documentation

 Use the **CSV Lint** plug-in to quickly and easily inspect csv data files,
 apply syntax highlighting to columns, detect technical errors and fix datetime
-and decimal formatting. It's not meant as a replacement for a spreadsheet
+and decimal formatting. It's not meant as a replacement for a
+[spreadsheet](https://www.reddit.com/r/datascience/comments/1dsnbww/youre_not_helping_excel_please_stop_helping/)
 program, but rather it's a quality control tool to examine, verify or polish up
 a dataset before further processing.

@@ -138,6 +139,8 @@ The plug-in expects the column character positions, but if you instead enter
 the individual column widths, so `10, 6, 5` in this example, that will also
 work in most cases. Leave "Fixed positions" empty and the plug-in will
 try to detect fixed width columns same as auto-detect.
+Click the `[..]` button to paste the current column widths into the textbox
+as absolute column positions, or right-click it to paste the individual column widths.

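To show how column widths and column positions relate, here is a minimal Python sketch. It assumes that each value in "Fixed positions" marks the character position where a column ends, so the positions are the running total of the widths. That convention and the helper names `widths_to_positions`, `positions_to_widths` and `split_fixed_width` are assumptions for illustration and are not the plug-in's own code.

```python
from itertools import accumulate


def widths_to_positions(widths):
    """Turn column widths into end positions (running total), e.g. [10, 6, 5] -> [10, 16, 21]."""
    return list(accumulate(widths))


def positions_to_widths(positions):
    """Inverse: each width is the distance to the previous end position (starting at 0)."""
    starts = [0] + positions[:-1]
    return [end - start for start, end in zip(starts, positions)]


def split_fixed_width(line, positions):
    """Cut one fixed-width line at the end positions and trim the padding spaces."""
    starts = [0] + positions[:-1]
    return [line[start:end].strip() for start, end in zip(starts, positions)]


widths = [10, 6, 5]
positions = widths_to_positions(widths)
print(positions)                       # [10, 16, 21]
print(positions_to_widths(positions))  # [10, 6, 5]

# Made-up example line: three fields padded to 10, 6 and 5 characters
line = "Alice".ljust(10) + "3.14".ljust(6) + "yes".ljust(5)
print(split_fixed_width(line, positions))  # ['Alice', '3.14', 'yes']
```

Under this assumption, widths `10, 6, 5` and positions `10, 16, 21` describe the same three columns.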
 You can select the "Skip lines" option and the column detection process will skip the first X lines of the file.
 For example when the first line of data (including header names) starts on
@@ -523,49 +526,58 @@ In this example the seventh column has a header name `TestStage`,
 and it only contains the values `Recovery`, `Training` and `Warmup` and empty values.
 The amount of each of these 3 values is listed under "Unique values".

-Count unique values
--------------------
-Count unique values, this will list all unique values in a column, or
+Select Columns
+--------------
+Select columns and/or put them in a different order.
+You can select one or more columns.
+
+![CSV Lint select columns dialog](/docs/csvlint_select_columns.png?raw=true "CSV Lint plug-in select columns dialog")
+
+Check the `Select distinct values` checkbox to list all unique values in a column, or
 combination of columns, and count how often that unique value or combination
 of values was found. This can be useful to check if the dataset contains the
 expected amount of unique names, patients, product codes, barcodes etc.

-![CSV Lint unique values dialog](/docs/csvlint_unique_values.png?raw=true "CSV Lint plug-in unique values dialog")
-
 As an example, if you have a data file where each line is one blood pressure
 measurement of a participant, and you want to verify that each participant in
 the data file has exactly 3 measurements. In that case you can select just the
 column participantId and select sort by `count`, to sort the result by the new
-`count_unique` column.
+`count_distinct` column.

-If the data is correct, it should list all participantId with a `count_unique`
-value of 3. And, because it's sorted by `count_unique`, you can check the
+If the data is correct, it should list all participantId with a `count_distinct`
+value of 3. And, because it's sorted by `count_distinct`, you can check the
 beginning and end of the list to see if there are any participants with fewer
 or more than 3 measurements.

-When you disable sorting, the resulting list of values will be in the order as
-the values were first found in the dataset.
+Right-click Ascending or Descending to disable sorting; the resulting list
+of values will then be in the order in which they were first found in the dataset.

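The distinct count is essentially a group-by on the selected columns. Below is a minimal Python sketch that mimics the behaviour described above: results stay in first-found order when unsorted, or are sorted on the count otherwise. The function `count_distinct`, its `sort` parameter and the sample data are hypothetical. The sketch only illustrates the idea and is not how the plug-in is implemented.

```python
import csv
import io
from collections import Counter


def count_distinct(rows, columns, sort=None):
    """Count each distinct combination of values in `columns`.

    sort=None keeps the order in which combinations were first found
    (Counter keeps insertion order); "asc" or "desc" sorts on the count.
    """
    counts = Counter(tuple(row[col] for col in columns) for row in rows)
    items = list(counts.items())
    if sort in ("asc", "desc"):
        # list.sort is stable, so equal counts keep their first-found order
        items.sort(key=lambda item: item[1], reverse=(sort == "desc"))
    return items


# Made-up sample: one blood pressure measurement per line
DATA = """participantId,systolic
P01,120
P02,118
P01,125
P01,122
P02,119
P03,130
"""
rows = list(csv.DictReader(io.StringIO(DATA)))

for values, count in count_distinct(rows, ["participantId"], sort="asc"):
    print(values, count)
# ('P03',) 1
# ('P02',) 2
# ('P01',) 3
```

Sorted ascending, the participants with fewer than 3 measurements (`P03`, `P02`) end up at the beginning of the list. This is the same check as in the blood pressure example.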
 Convert data
 ------------
 Convert the currently selected CSV file to SQL, XML or JSON format.

 ![CSV Lint Convert data dialog](/docs/csvlint_convert_data.png?raw=true "CSV Lint plug-in Convert data dialog")

+Select XML or JSON to convert the data to an XML or JSON dataset.
+The plug-in will automatically apply formatting based on the metadata,
+as well as applying character escaping where needed for these formats.
+For XML, enter a `Table/tag name` to use as tag name for each record,
+or leave it empty to use the current filename.
+
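To illustrate the character escaping mentioned above, here is a small Python sketch that uses only the standard library: `xml.sax.saxutils.escape` for XML text and `json.dumps` for JSON. The `types` mapping stands in for the column metadata and is a hypothetical simplification. The column names `patid` and `remark` are made up, and the plug-in's actual output layout may differ.

```python
import json
import os
from xml.sax.saxutils import escape


def default_tag_name(filename):
    """File name without folder and extension, e.g. 'cardio.txt' -> 'cardio'."""
    return os.path.splitext(os.path.basename(filename))[0]


def record_to_xml(record, tag_name):
    """One record as an XML element; escape() turns &, < and > into entities."""
    fields = "".join(f"<{col}>{escape(val)}</{col}>" for col, val in record.items())
    return f"<{tag_name}>{fields}</{tag_name}>"


def record_to_json(record, types=None):
    """One record as a JSON object; json.dumps escapes quotes, backslashes etc."""
    types = types or {}
    typed = {}
    for col, val in record.items():
        convert = types.get(col)
        # empty -> null, typed columns converted (e.g. int, float), rest stays text
        typed[col] = None if val == "" else (convert(val) if convert else val)
    return json.dumps(typed)


record = {"patid": "1001", "labpth": "10.8", "remark": 'BP < 90 & "dizzy"'}
print(record_to_xml(record, default_tag_name("cardio.txt")))
# <cardio><patid>1001</patid><labpth>10.8</labpth><remark>BP &lt; 90 &amp; "dizzy"</remark></cardio>
print(record_to_json(record, {"patid": int, "labpth": float}))
# {"patid": 1001, "labpth": 10.8, "remark": "BP < 90 & \"dizzy\""}
```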
 Select SQL to convert the data to an SQL script to create a database
 table and inserts all records from the csv datafile into that table.
 The insert statement will be grouped in batches of X lines of csv data,
 as set by the Batch size number in the plug-in Settings.

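Here is a minimal Python sketch of the batching idea. Rows are grouped into chunks of at most `batch_size`, and each chunk becomes one multi-row `INSERT`. Several details are assumptions for illustration: the quoting rules (NULL for empty values, single quotes doubled inside text), the helper names and the first column name `patid`. This is not the plug-in's SQL generator, which also writes the create table part and the `_record_number` field.

```python
def sql_literal(value):
    """Write one value as an SQL literal (assumed rules, see lead-in)."""
    if value is None or value == "":
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    # text: wrap in single quotes and double any embedded single quote
    return "'" + str(value).replace("'", "''") + "'"


def insert_batches(table, columns, rows, batch_size):
    """Yield one multi-row INSERT statement per batch of at most batch_size rows."""
    column_list = ", ".join(columns)
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        values = ",\n".join(
            "(" + ", ".join(sql_literal(v) for v in row) + ")" for row in batch
        )
        yield f"INSERT INTO {table} ({column_list}) VALUES\n{values};"


rows = [
    (1001, "2025-08-21 00:00:00", 10.8),
    (2002, "2025-09-05 00:00:00", 143.5),
    (3003, "2025-09-24 00:00:00", 76.4),
]
# batch_size=2 prints two statements: one with two rows, one with the last row
for statement in insert_batches("cardio", ["patid", "visitdat", "labpth"], rows, batch_size=2):
    print(statement)
```

With `batch_size=1`, every record gets its own single-row `INSERT` statement.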
-Depending on which database type you select, MySQL, MS-SQL or PostgreSQL,
-the create table part and the autonumber field `_record_number` will be
-slightly different. Enter a table name to use, or leave it empty to use the
-current filename as table name.
+Depending on which database type you select, MySQL/MariaDB, MS-SQL or
+PostgreSQL, the create table part and the autonumber field `_record_number`
+will be slightly different. Enter a table name to use, or leave it empty to
+use the current filename as table name.

 See below for an example of an SQL insert script the plugin will generate:

     -- -------------------------------------
-    -- CSV Lint plug-in: v0.4.6.8
+    -- CSV Lint plug-in: v0.4.7
     -- File: cardio.txt
     -- SQL type: MySQL
     -- -------------------------------------
@@ -585,16 +597,26 @@ See below for an example of an SQL insert script the plugin will generate:
         visitdat,
         labpth
     ) VALUES
-        (1001, '2025-08-21', 10.8),
-        (2002, '2025-09-05', 143.5),
-        (3003, '2025-09-24', 76.4),
+        (1001, '2025-08-21 00:00:00', 10.8),
+        (2002, '2025-09-05 00:00:00', 143.5),
+        (3003, '2025-09-24 00:00:00', 76.4),
     -- etc.

-Select XML or JSON to convert the data to an XML or JSON dataset.
-The plug-in will automatically apply formatting based on the metadata,
-as well as applying character escaping where needed for these formats.
-For XML, enter a `Table/tag name` to use as tag name for each record,
-or leave it empty to use the current filename.
+Note: Oracle is not supported as a Database type, but you can work around this by
+using Database type `MySQL / MariaDB` and setting the Batch size to `1`.
+If there are any Date or DateTime values, add the following line
+at the top of the script:
+
+    ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD HH24:MI:SS';
+
+Alternatively, use the Notepad++ Search and Replace to wrap any datetime
+values in `TO_DATE()`. In the resulting INSERT script press `Ctrl + H`
+and use these settings:
+
+    Find what: '(\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2})'
+    Replace with: TO_DATE\('\1', 'YYYY-MM-DD HH24:MI:SS'\)
+    Search mode: Regular expression
+    -> [Replace all]

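The same replacement can also be scripted, for example when converting many files. Here is a minimal Python sketch using `re.sub` with the pattern from the Find what field above. The helper name `wrap_to_date` is hypothetical.

```python
import re

# Same pattern as "Find what" above: a quoted 'YYYY-MM-DD H:MM:SS' datetime value
DATETIME_LITERAL = re.compile(r"'(\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2})'")


def wrap_to_date(sql_script):
    """Wrap every quoted datetime value in TO_DATE(..., 'YYYY-MM-DD HH24:MI:SS')."""
    # Parentheses are literal in a Python replacement string, so the backslashes
    # used in the Notepad++ "Replace with" field are not needed; \1 is the value
    return DATETIME_LITERAL.sub(r"TO_DATE('\1', 'YYYY-MM-DD HH24:MI:SS')", sql_script)


line = "(1001, '2025-08-21 00:00:00', 10.8),"
print(wrap_to_date(line))
# (1001, TO_DATE('2025-08-21 00:00:00', 'YYYY-MM-DD HH24:MI:SS'), 10.8),
```

Run it only once on a script. The date string inside `TO_DATE('...')` still matches the pattern, so a second pass would wrap it again.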
 Generate metadata
 -----------------
@@ -717,5 +739,6 @@ History
 16-dec-2023 - v0.4.6.6 PowerShell support and various updates
 25-jun-2024 - v0.4.6.7 Reformat bugfix, improved enumeration, sort on length
 28-feb-2025 - v0.4.6.8 Improved sorting for enumeration columns, minor updates
+08-aug-2025 - v0.4.7 Select columns, large files warning, Fixed Width detection improved, minor updates

 BdR©2019-2025 Free to use - send questions or comments: Bas de Reuver - bdr1976@gmail.com