Not making use of multiple cores

I've got a file with 8M records and I'm trying to split it into words and do a word count. My code is below. When I run it, I see 4 new Ruby processes start up on my machine, but only one of them goes to 100% CPU; the others just sit idle. I don't think it's parallelizing properly. Am I missing a configuration setting somewhere?

``` ruby
require 'ruby-spark'
Spark.config do
  set_app_name 'RubySpark'
  set_master 'local[*]'
  set 'spark.ruby.serializer', 'oj'
  set 'spark.ruby.serializer.batch_size', 2048
end
Spark.start
sc = Spark.sc

tfile = sc.text_file('work/Contact.csv')
words = tfile.flat_map('lambda { |x| x.downcase.gsub(/[^a-z]/, " ").split(" ")}')
words.count
```
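For reference, the per-record block can be checked on its own in plain Ruby, without Spark. This is only a sketch: the sample line is made up (not taken from `Contact.csv`), and the names `split_words` and `sample` are hypothetical.

``` ruby
# Same block as the one passed to flat_map above, as a plain Ruby lambda.
# No Spark is involved here; this only exercises the per-line logic.
split_words = lambda { |x| x.downcase.gsub(/[^a-z]/, " ").split(" ") }

# Hypothetical sample record, not real data from Contact.csv.
sample = "Smith,John,john.smith@example.com"

# Every non-letter becomes a space, so commas, dots and "@" all act as separators.
words = split_words.call(sample)
puts words.inspect
puts words.size
```

The lambda behaves as intended on a single line, so the per-record logic itself doesn't look like the reason only one process is busy.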

