Skip to content

ohadfarkash/phone-finder

Repository files navigation

Phone Finder

A NodeJS application designed to automate the collection of phone numbers for contact enrichment. This application utilizes web scraping methods to collect data from usphonebook.com.

Installation

Prerequisites

NodeJS is required to run this applications. It can be downloaded here.

Downloading the applicaiton

Start by downloading the application, which can be done by either downloading the zip file or cloning the repository.

Once the application files are loaded to a directory, the user must double click on the setup.bat file to complete the installation process.

Usage

Making sure your workfile is correct

It is important to only process files which have the specific headers defined for use with this application. If the headers differ in any way (even in case, or having extra space before or after), the application may behave in unpredictable ways. Because of this, a template file has been included in the documentation for this application. It can be found at /docs/template.xlsx. If needed, copy the headers from this template to be certain your headers are correct.

The following formats can be processed by this application:

  • XLSX (MS Excel 2010+)
  • CSV (Comma Separated Values)
  • TSV (Tab Separated Values)

How to process the file

Click and drag files into the start.bat file. This will launch the application with that file's path set as the workfile.

If you find it difficult to have both windows open, you can optionally launch start.bat without a file, by double clicking on it. Then, you can drag the file into the console and hit the Enter.

Once the file is found, the application will open a browser window and begin the scraping process. There will be an initial Cloudflare verification. You may need to manually verify this by checking the box, if it does not go through automatically. After this, the program will sense access and take over. Make some tea, sit back, relax, and peek from time to time to make sure all is well.

Suggested practices for usage

This application relies on the use of a chromium browser instance. In short, this is an internet browser application, open in such a way that we can interact with it through code. To guarantee the highest level of clearance through security measures, we must keep the browser open. Likewise, the browser application should remain in the forefront, and be given priority on the desktop environment.

To make things simple, here is a checklist:

  • The device (laptop) is plugged into a charger.
  • The browser window is on top of all the other windows.
  • Device is set to stay awake for the duration of the application's running time.

Where are the output files?

There is a folder in the root directory called output which contains all the output files. You are free to move, modify, rename, and delete them as you see fit.

Configuration

The configuration file app.config.json can be found in the root directory of the application (same folder as start.bat). It can be modified using any basic text editor application. Changes only apply the next time you run the application.

Property Default Description
output_type xlsx Output file type (by file extension)
Commonly: xlsx, tsv, csv
min_cooldown_time 20000 The minimum time to wait in milliseconds
max_cooldown_time 50000 The maximum time to wait in milliseconds

Cool Down Time

Cool down time is important, as it decreases your request rate to the site being scraped, and thus deceases your footprint on that site's security system. The program will choose, at random, some number between the min and max values in the configuration. This cool down time will be applied every time the program requests search results for a single permutation.

Further Reading

Development Docs

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors