Commit b36024f

Enhanced delimiter sniffer
1 parent 51e3af6 commit b36024f

3 files changed: +30 -5 lines changed

src/CSVSniffer.cls

Lines changed: 9 additions & 5 deletions
@@ -496,18 +496,18 @@ Private Function RecordScore(ByRef strArray As Variant) As Double
         Case FieldDataType.Known
             tmpSUM = tmpSUM + 100
         Case Else
-            tmpSUM = tmpSUM + 0.1 '20
+            tmpSUM = tmpSUM + 0.1
     End Select
 Next L0
-RecordScore = (tmpSUM / FieldsCount)
+RecordScore = (tmpSUM ^ 2) / (100 * FieldsCount ^ 2)
 End Function
 ''' <summary>
 ''' Calculates a factor for table scoring based in the standard
 ''' deviation of the number of fields contained in the specified
 ''' array list.
 ''' </summary>
 ''' <param name="ArrayList">CSV array list.</param>
-Private Function RecordsConsistencyFactor(ArrayList As CSVArrayList) As Double
+Private Function RecordsConsistencyFactor(ArrayList As CSVArrayList, cScore As Double) As Double
     Dim AvgFields As Double
     Dim CumulativeDiff As Double
     Dim L0 As Long
@@ -523,7 +523,7 @@ Private Function RecordsConsistencyFactor(ArrayList As CSVArrayList) As Double
 Else
     tmpResult = (CumulativeDiff / ArrayList.count)
 End If
-RecordsConsistencyFactor = 1 / (1 + tmpResult ^ 0.5)
+RecordsConsistencyFactor = (1 / (1 + tmpResult ^ 0.5)) * cScore / (100 * ArrayList.count)
 End Function
 ''' <summary>
 ''' Calculates a score for the imported data based on the congruence
@@ -541,7 +541,11 @@ Public Function TableScore(ByRef ArrayList As CSVArrayList) As Double
 For L0 = 0 To ArrayList.count - 1
     SumRecScores = SumRecScores + RecordScore(ArrayList(L0))
 Next L0
-TableScore = RecordsConsistencyFactor(ArrayList) * SumRecScores / ArrayList.count
+If ArrayList.count > 1 Then
+    TableScore = RecordsConsistencyFactor(ArrayList, SumRecScores) * SumRecScores / ArrayList.count
+Else
+    TableScore = RecordsConsistencyFactor(ArrayList, SumRecScores) * SumRecScores / 2
+End If
 End If
 End If
 End Function
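
Reviewer note: to see what the two formula changes do numerically, here is a minimal Python sketch of the old and new expressions from the hunks above. It is not the project's code. The function names are hypothetical, and tmp_sum, tmp_result and c_score are taken as plain inputs because the diff does not show how they are accumulated.

```python
import math


def record_score_old(tmp_sum: float, fields_count: int) -> float:
    """Old expression: tmpSUM / FieldsCount."""
    return tmp_sum / fields_count


def record_score_new(tmp_sum: float, fields_count: int) -> float:
    """New expression: (tmpSUM ^ 2) / (100 * FieldsCount ^ 2)."""
    return tmp_sum ** 2 / (100 * fields_count ** 2)


def consistency_factor_old(tmp_result: float) -> float:
    """Old expression: 1 / (1 + tmpResult ^ 0.5)."""
    return 1 / (1 + tmp_result ** 0.5)


def consistency_factor_new(tmp_result: float, c_score: float, record_count: int) -> float:
    """New expression: the old factor scaled by cScore / (100 * ArrayList.count)."""
    return consistency_factor_old(tmp_result) * c_score / (100 * record_count)


if __name__ == "__main__":
    # A record whose 3 fields all add 100 keeps the same score under both formulas.
    print(record_score_old(300, 3), record_score_new(300, 3))
    # A record with 2 fields adding 100 and 2 adding 0.1 is penalized more by the new formula.
    print(record_score_old(200.2, 4), record_score_new(200.2, 4))
    # The new record score is always the old one squared and divided by 100.
    assert math.isclose(record_score_new(200.2, 4), record_score_old(200.2, 4) ** 2 / 100)
```

Algebraically, the new record score equals the old score squared and divided by 100, so a record that scored 100 still scores 100 while weaker records drop quadratically. Likewise, the new consistency factor equals the old one whenever cScore is exactly 100 per record, and shrinks as the summed record scores fall below that.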
107 KB
Binary file not shown.
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+=== Delimiters guessing test ===
++ Mixed comma and semicolon
++ File with multi-line field
++ Optional quoted fields
++ Mixed comma and semicolon - file B
++ Geometric CSV
++ Table embedded in the last record
++ Table embedded in the second record
++ Multiple commas in fields
++ Uncommon char as field delimiter
++ Wrong delimiters have been added to guessing operation
++ FEC data - [clevercsv issue #15]
++ Mixed comma and colon - [clevercsv issue #35]
++ Json data type - [clevercsv issue #37]
++ Undefined field delimiter
++ Rainbow CSV [issue #92]
++ Pipe character is more frequent than the comma
++ Pipe character is more frequent than the semicolon
++ Short pipe separated table embedded
+= PASS (18 of 18 passed) = 19/1/2024 7:06:07 p. m. =
+
