
HOWTO Run FlowCaptureProcessor

  * Extract the zip archive
  * Execute one of the scripts in /bin, passing a single argument that points to a flow_capture file
  
      chadmichael@heraclitus:$ ./bin/netflowprocessor /home/chadmichael/netflow/flow_capture
       
  * The processor will read the flow capture and ingest all of its packets
  * The processor will print each packet and its records to the console, followed by a summary at the end

-------------
  
Design / Scale Considerations

I tried to make efficient use of memory and IO on a single thread, storing values in the smallest JVM type that can hold the unsigned values and
using buffered input.  Aggregating the packets was a concern, but Vectors are supposed to be very fast for appending (whereas Lists are slow at
appending in Scala).  I think I was fairly successful in writing all of the core logic in a pure functional style.
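
As a rough illustration of that approach, here is a minimal Scala sketch. The field names and widths (two unsigned 16-bit ports and one unsigned
32-bit counter) are assumptions for illustration, not the actual flow record layout: unsigned 16-bit values go into an Int, unsigned 32-bit values
into a Long, the input is buffered, and records are accumulated by appending to a Vector.

    import java.io.{BufferedInputStream, DataInputStream, FileInputStream}

    // Minimal sketch only; the record layout is assumed, not the real capture format.
    final case class FlowRecord(srcPort: Int, dstPort: Int, octets: Long)

    object UnsignedReads {
      // An unsigned 16-bit value needs an Int; a Short tops out at 32767.
      def readUInt16(in: DataInputStream): Int = in.readUnsignedShort()

      // An unsigned 32-bit value needs a Long; an Int tops out at 2^31 - 1.
      def readUInt32(in: DataInputStream): Long = in.readInt().toLong & 0xFFFFFFFFL

      def readRecords(path: String, count: Int): Vector[FlowRecord] = {
        val in = new DataInputStream(new BufferedInputStream(new FileInputStream(path)))
        try {
          // Vector appends are effectively constant time, unlike List's O(n) :+
          (1 to count).foldLeft(Vector.empty[FlowRecord]) { (acc, _) =>
            acc :+ FlowRecord(readUInt16(in), readUInt16(in), readUInt32(in))
          }
        } finally in.close()
      }
    }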

Things I would consider moving forward with this project:

1) Parallelism for scaling up.  The question is what the unit of work to parallelize on would be.  In this case, you might want to
   parallelize the processing of individual packets, but since you have to read each header to know where the next packet begins,
   it would be non-trivial to chunk that work out; it seems like it would require stateful coordination between workers.  Maybe a
   pre-processor could locate the packet boundaries first, but that doesn't seem right either.

   The capture file itself, the flow_capture, might be a better unit to parallelize on.  If there were more than one capture file,
   they could easily be processed in parallel (see the first sketch after this list).
   
2) Some asynchronous IO might also help, but it would require some thought about how to design for it.  If the flow capture files were
   quite large, there could be a layer that asynchronously reads chunks of some size from the capture files, with the Future wrapping the
   IO operation triggering, on completion, the processing logic I have created, applied to each chunk in parallel (see the second sketch
   after this list).
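
A minimal sketch of the multi-file case, assuming a hypothetical processCapture function and Summary type standing in for the existing
single-threaded per-file logic and its result:

    import scala.concurrent.{Await, ExecutionContext, Future}
    import scala.concurrent.duration.Duration
    import ExecutionContext.Implicits.global

    // Sketch only: `processCapture` and `Summary` are hypothetical stand-ins
    // for the existing per-file processing logic and its result.
    object ParallelCaptures {
      final case class Summary(packetCount: Long)

      def processCapture(path: String): Summary = ???

      // Each capture file becomes its own unit of work; Future.traverse runs
      // them on the execution context's thread pool and collects the summaries.
      def processAll(paths: Seq[String]): Seq[Summary] = {
        val work = Future.traverse(paths)(p => Future(processCapture(p)))
        Await.result(work, Duration.Inf)
      }
    }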

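And a minimal sketch of the asynchronous layering, again with an assumed chunk size and a hypothetical processChunk standing in for the real
processing logic; as noted above, real packet boundaries would still need to be handled, so fixed-size chunks are only a simplification.

    import java.io.{BufferedInputStream, FileInputStream}
    import scala.concurrent.{ExecutionContext, Future}
    import ExecutionContext.Implicits.global

    // Sketch only: chunk size and `processChunk` are illustrative assumptions.
    object AsyncChunks {
      val ChunkSize = 64 * 1024

      def processChunk(bytes: Array[Byte]): Long = ???

      // A reader Future pulls fixed-size chunks off the capture file; once the
      // read completes, each chunk is processed in its own Future in parallel.
      def process(path: String): Future[Seq[Long]] = Future {
        val in = new BufferedInputStream(new FileInputStream(path))
        try Iterator
          .continually {
            val buf = new Array[Byte](ChunkSize)
            val n   = in.read(buf)
            if (n <= 0) None else Some(buf.take(n))
          }
          .takeWhile(_.isDefined)
          .flatten
          .toVector
        finally in.close()
      }.flatMap { chunks =>
        Future.traverse(chunks)(c => Future(processChunk(c)))
      }
    }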

---------------

TODOS

If I were going to spend more time on this ...

1) Flesh out the test coverage
2) Test performance against larger files, adjust the buffer size, and generally research IO efficiency further

   
