Commit 8b828fd

autoencoding with rchardet
1 parent 03b9131 commit 8b828fd

4 files changed: +92 -60 lines

README.md

Lines changed: 34 additions & 31 deletions
@@ -1,6 +1,6 @@
 # ActiveAdminImport
-The most fastest and efficient CSV import for Active Admin (based on activerecord-import gem)
-with support of validations and bulk inserts
+The most fastest and efficient CSV import for Active Admin
+with support of validations, bulk inserts and encodings handling
 
 
 
@@ -35,32 +35,16 @@ And then execute:
 $ bundle
 
 
-
-#Why yet another import for ActiveAdmin ? Now with activerecord-import ....
-
-<p>Because plain-vanilla, out-of-the-box ActiveRecord doesn’t provide support for inserting large amounts of data efficiently</p>
-
-Features of activerecord-import
-
-<ol>
-<li>activerecord-import can perform validations (fast)</li>
-<li>activerecord-import can perform on duplicate key updates (requires mysql)</li>
-</ol>
-
-
-
-
-
 # active_admin_import features
 <ol>
-<li>Encoding handling</li>
-<li>Preview before importing (Example 2)</li>
+<li> Replacements (Ex 2)</li>
+<li> Encoding handling (Ex 4, 5)</li>
 <li> CSV options</li>
 <li> Ability to prepend CSV headers automatically</li>
-<li>Bulk import (activerecord-import)</li>
-<li>Callbacks</li>
-<li>Zip files</li>
-<li>more...</li>
+<li> Bulk import (activerecord-import)</li>
+<li> Callbacks</li>
+<li> Zip files</li>
+<li> more...</li>
 </ol>
 
 
@@ -151,22 +135,33 @@ Features of activerecord-import
 end
 ```
 
-#### Example4 Importing without forcing to UTF-8 and disallow archives
+#### Example4 Importing ISO-8859-1 encoded file and disallow archives
 
 
 ```ruby
 ActiveAdmin.register Post do
 active_admin_import validate: true,
 template_object: ActiveAdminImport::Model.new(
-hint: "file will be encoded to ISO-8859-1",
+hint: "file encoded in ISO-8859-1",
 force_encoding: "ISO-8859-1",
 allow_archive: false
 )
 end
 ```
 
+#### Example5 Importing file with unknown encoding and autodetect it
+
 
-#### Example5 Callbacks for each bulk insert iteration
+```ruby
+ActiveAdmin.register Post do
+active_admin_import validate: true,
+template_object: ActiveAdminImport::Model.new(
+force_encoding: :auto
+)
+end
+```
+
+#### Example6 Callbacks for each bulk insert iteration
 
 
 ```ruby
@@ -187,7 +182,7 @@ Features of activerecord-import
 end
 ```
 
-#### Example6 dynamic CSV options, template overriding
+#### Example7 dynamic CSV options, template overriding
 
 - put overrided template to ```app/views/import.html.erb```
 
@@ -228,10 +223,18 @@ Features of activerecord-import
 end
 ```
 
-#Links
-https://github.com/gregbell/active_admin
+## Dependencies
+
+Tool | Description
+--------------------- | -----------
+[rchardet] | Character encoding auto-detection in Ruby. As smart as your browser. Open source.
+[activerecord-import] | Powerful library for bulk inserting data using ActiveRecord.
+
+[rchardet]: https://github.com/jmhodges/rchardet
+[activerecord-import]: https://github.com/zdennis/activerecord-import
+
+
 
-https://github.com/zdennis/activerecord-import
 
 
 
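Example4 in the README pairs `force_encoding: "ISO-8859-1"` with `Model#encode`, which first relabels the incoming bytes and then transcodes them to UTF-8 with `invalid: :replace, undef: :replace`. The plain-Ruby sketch below uses no gem code: `to_utf8` is a hypothetical helper, and the upload is assumed to arrive as binary bytes. It illustrates why declaring the source encoding matters. Without it, the non-ASCII byte has no mapping and is replaced.

```ruby
# Hypothetical helper mirroring the order used in Model#encode:
# relabel the raw bytes (force_encoding), then transcode to UTF-8.
def to_utf8(raw, declared_encoding = nil)
  data = raw.dup
  data.force_encoding(declared_encoding) if declared_encoding
  data.encode('UTF-8', invalid: :replace, undef: :replace)
end

# "café" as ISO-8859-1 bytes, assumed to be read from the upload as binary.
raw = "caf\xE9".b

latin1 = to_utf8(raw, 'ISO-8859-1') # "café": 0xE9 is "é" in ISO-8859-1
binary = to_utf8(raw)               # "caf\uFFFD": 0xE9 has no binary -> UTF-8 mapping
```
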
active_admin_import.gemspec

Lines changed: 1 addition & 0 deletions
@@ -17,6 +17,7 @@ Gem::Specification.new do |gem|
 
 
 gem.add_runtime_dependency 'activerecord-import', '~> 0.7.0'
+gem.add_runtime_dependency 'rchardet', '~> 1.5.0'
 
 gem.add_runtime_dependency 'rubyzip', '~> 1.0', '>= 1.0.0'
 gem.add_dependency "rails", ">= 4.0"
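The new runtime dependency uses RubyGems' pessimistic operator. `'~> 1.5.0'` admits any 1.5.x release (>= 1.5.0 and < 1.6), while the existing rubyzip pair `'~> 1.0', '>= 1.0.0'` admits any 1.x release. The check below illustrates this with RubyGems' own `Gem::Requirement`. The version numbers are arbitrary samples, and `allows?` is a hypothetical helper.

```ruby
require 'rubygems' # already loaded in a normal Ruby process; explicit for clarity

rchardet_req = Gem::Requirement.new('~> 1.5.0')
rubyzip_req  = Gem::Requirement.new('~> 1.0', '>= 1.0.0')

# Hypothetical helper: does the requirement accept this version string?
def allows?(req, version)
  req.satisfied_by?(Gem::Version.new(version))
end

allows?(rchardet_req, '1.5.9') # true
allows?(rchardet_req, '1.6.0') # false: ~> 1.5.0 stops below 1.6
allows?(rubyzip_req,  '1.9.0') # true
allows?(rubyzip_req,  '2.0.0') # false: ~> 1.0 stops below 2.0
```
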

lib/active_admin_import/model.rb

Lines changed: 20 additions & 1 deletion
@@ -1,5 +1,7 @@
 # encoding: utf-8
 
+require 'rchardet'
+
 module ActiveAdminImport
 class Model
 
@@ -132,7 +134,7 @@ def define_methods_for(attr_name)
 end
 
 def encode(data)
-data = data.force_encoding(force_encoding) if force_encoding?
+data = content_encode(data) if force_encoding?
 data = data.encode('UTF-8',
 invalid: :replace, undef: :replace)
 begin
@@ -142,6 +144,23 @@ def encode(data)
 end
 end
 
+def detect_encoding?
+force_encoding == :auto
+end
+
+def dynamic_encoding(data)
+CharDet.detect(data)['encoding']
+end
+
+def content_encode(data)
+encoding_name = if detect_encoding?
+dynamic_encoding(data)
+else
+force_encoding.to_s
+end
+data.force_encoding(encoding_name)
+end
+
 class <<self
 def define_set_method(attr_name)
 define_method(attr_name) { self.attributes[attr_name] } unless method_defined? attr_name
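With `force_encoding: :auto`, `content_encode` asks rchardet's `CharDet.detect` for a guess and relabels the bytes with it before `encode` transcodes them to UTF-8. The sketch below reproduces that branch without the gem. The detector is injected as a lambda standing in for `CharDet.detect`, and the detector results are made up. The `fallback` parameter is an addition that is not present in this commit. It guards against a detection hash whose `'encoding'` is nil (whether and when rchardet returns that is an assumption worth verifying), since `force_encoding(nil)` would raise.

```ruby
# Hypothetical, gem-free sketch of the :auto branch of content_encode.
# `detector` stands in for CharDet.detect; `fallback` is NOT in the commit.
def content_encode(data, force_encoding, detector:, fallback: 'UTF-8')
  encoding_name =
    if force_encoding == :auto
      # Use the detector's guess; fall back if it reports no encoding.
      detector.call(data)['encoding'] || fallback
    else
      force_encoding.to_s
    end
  # dup so the caller's string keeps its original encoding label
  data.dup.force_encoding(encoding_name)
end

# Made-up detector results, shaped like a CharDet.detect hash.
confident = ->(_data) { { 'encoding' => 'windows-1251', 'confidence' => 0.9 } }
unsure    = ->(_data) { { 'encoding' => nil, 'confidence' => 0.0 } }

raw = "\xC8\xE2\xE0\xED".b # "Иван" encoded as Windows-1251

auto_text   = content_encode(raw, :auto, detector: confident).encode('UTF-8')
forced_text = content_encode(raw, 'windows-1251', detector: confident).encode('UTF-8')
unsure_text = content_encode(raw, :auto, detector: unsure)
```

Relabelling a copy rather than the argument itself is a choice made for this sketch. The committed `content_encode` calls `force_encoding` on `data` directly.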

spec/import_spec.rb

Lines changed: 37 additions & 28 deletions
@@ -2,6 +2,28 @@
 
 describe 'import', type: :feature do
 
+shared_examples 'successful inserts' do |encoding, csv_file_name|
+let(:options) do
+attributes = { force_encoding: encoding }
+{ template_object: ActiveAdminImport::Model.new(attributes) }
+end
+
+before do
+upload_file!(csv_file_name)
+end
+
+it "should import file with many records" do
+expect(page).to have_content "Successfully imported 2 authors"
+expect(Author.count).to eq(2)
+Author.all.each do |author|
+expect(author).to be_valid
+expect(author.name).to be_present
+expect(author.last_name).to be_present
+end
+end
+
+end
+
 def with_zipped_csv(name, &block)
 
 zip_file = File.expand_path("./spec/fixtures/files/#{name}.zip")
@@ -25,7 +47,6 @@ def upload_file!(name, ext='csv')
 context "authors index" do
 before do
 add_author_resource
-
 end
 
 it "should navigate to import page" do
@@ -81,16 +102,14 @@ def upload_file!(name, ext='csv')
 end
 
 context "with hint defined" do
-let(:options) {
-{template_object: ActiveAdminImport::Model.new(hint: "hint")}
-}
+let(:options) do
+{ template_object: ActiveAdminImport::Model.new(hint: "hint") }
+end
 it "renders hint at upload page" do
 expect(page).to have_content options[:template_object].hint
 end
-
 end
 
-
 context "when importing file" do
 
 [:empty, :only_headers].each do |file|
@@ -111,26 +130,16 @@ def upload_file!(name, ext='csv')
 end
 end
 
-context "Win1251" do
-let(:options) do
-attributes = { force_encoding: "Windows-1251" }
-{ template_object: ActiveAdminImport::Model.new(attributes) }
-end
-
-before do
-upload_file!(:authors_win1251_win_endline)
-end
-
-it "should import file with many records" do
-expect(page).to have_content "Successfully imported 2 authors"
-expect(Author.count).to eq(2)
-Author.all.each do |author|
-expect(author).to be_valid
-expect(author.name).to be_present
-expect(author.last_name).to be_present
-end
-end
+context "auto detect encoding" do
+include_examples 'successful inserts',
+:auto,
+:authors_win1251_win_endline
+end
 
+context "Win1251" do
+include_examples 'successful inserts',
+'windows-1251',
+:authors_win1251_win_endline
 end
 
 context "BOM" do
@@ -205,7 +214,7 @@ def upload_file!(name, ext='csv')
 
 
 context "without validation" do
-let(:options) { {validate: false} }
+let(:options) { { validate: false } }
 it "should render error" do
 upload_file!(:author_invalid)
 expect(page).to have_content "Successfully imported 1 author"
@@ -248,7 +257,7 @@ def upload_file!(name, ext='csv')
 context "with different header attribute names" do
 
 let(:options) do
-{ headers_rewrites: { :'Second name' => :last_name } }
+{ headers_rewrites: {:'Second name' => :last_name} }
 end
 
 it "should import file" do
@@ -299,7 +308,7 @@ def upload_file!(name, ext='csv')
 end
 
 context "with invalid options" do
-let(:options) { {invalid_option: :invalid_value} }
+let(:options) { { invalid_option: :invalid_value } }
 
 it "should raise TypeError" do
 expect { add_author_resource(options) }.to raise_error(ArgumentError)
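The refactored spec now passes the encoding as `'windows-1251'` instead of `"Windows-1251"`. That change should be harmless because Ruby resolves encoding names case-insensitively. The standard-library check below illustrates this. It does not depend on the gem or on RSpec.

```ruby
# Both spellings resolve to the very same Encoding object.
lower = Encoding.find('windows-1251')
upper = Encoding.find('Windows-1251')

same_object = lower.equal?(upper) # true
canonical   = lower.name          # "Windows-1251"

# force_encoding accepts the lowercase name too, like the spec's option value.
label = 'abc'.b.force_encoding('windows-1251').encoding # Encoding::Windows_1251
```
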
