pontouamringab68/author-finder

Author Finder Scraper

Author Finder Scraper helps you discover and verify author information from web pages, including professional details and verified contact data. It streamlines author identification for research, outreach, and content attribution with reliable, structured results.

Created by Bitbash, built to showcase our approach to Scraping and Automation!
If you are looking for author-finder, you've just found your team. Let's chat.

Introduction

This project analyzes web pages to identify authors and extract their professional and contact information. It replaces the manual work of tracking down accurate author details across articles and blogs. It is built for marketers, researchers, journalists, and content teams who need trustworthy author data at scale.

Intelligent Author Discovery

  • Identifies authors from articles, blogs, and content pages
  • Verifies email addresses with confidence scoring
  • Enriches profiles with professional and social data
  • Supports bulk URL processing with controlled request rates
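
The controlled request rate mentioned above can be pictured as a minimum-interval limiter wrapped around a loop over URLs. This is a minimal sketch and not the repository's own `rate_limiter.py`, whose contents are not shown here. The names `MinIntervalLimiter` and `process_urls`, the `fetch` callable, and the interval value are all assumptions made for illustration.

```python
import time
from typing import Callable, Iterable, List, Optional


class MinIntervalLimiter:
    """Hypothetical helper: enforces a minimum gap between consecutive calls."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        """Block until at least min_interval seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def process_urls(
    urls: Iterable[str],
    fetch: Callable[[str], object],
    limiter: MinIntervalLimiter,
) -> List[object]:
    """Call fetch(url) for each URL, pausing between calls as the limiter requires."""
    results = []
    for url in urls:
        limiter.wait()
        results.append(fetch(url))
    return results
```

Passing the clock and sleep functions in as parameters keeps the limiter easy to test without real delays. With the defaults, `MinIntervalLimiter(0.5)` would allow at most two requests per second.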

Features

Feature | Description
Author Identification | Detects authors from content pages and bylines.
Email Verification | Returns validated email addresses with confidence scores.
Professional Enrichment | Extracts company, position, and website details.
Social Profiles | Collects linked Twitter and LinkedIn profiles.
Bulk Processing | Handles multiple URLs efficiently with rate control.

What Data This Scraper Extracts

Field Name | Field Description
email | Verified author email address.
first_name | Author first name.
last_name | Author last name.
full_name | Combined full name of the author.
company | Company or organization affiliation.
position | Professional role or title.
website_url | Associated personal or company website.
country | Country information when available.
twitter | Twitter profile URL.
linkedin | LinkedIn profile URL.
score | Confidence score indicating data reliability.
verification | Email validation status and timestamp.
sources | Pages where the author data was found.
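
A consumer of this data might map each record onto a small typed structure. The sketch below is one possible model, not code from this repository. `AuthorRecord` and `from_dict` are hypothetical names, and every field is treated as optional on the assumption that results can be partial.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, List, Optional


@dataclass
class AuthorRecord:
    """Hypothetical container mirroring the documented output fields."""

    email: Optional[str] = None
    first_name: Optional[str] = None
    last_name: Optional[str] = None
    full_name: Optional[str] = None
    company: Optional[str] = None
    position: Optional[str] = None
    website_url: Optional[str] = None
    country: Optional[str] = None
    twitter: Optional[str] = None
    linkedin: Optional[str] = None
    score: Optional[int] = None
    verification: Dict[str, Any] = field(default_factory=dict)
    sources: List[Dict[str, Any]] = field(default_factory=list)

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "AuthorRecord":
        """Build a record from a raw dict, ignoring keys that are not documented fields."""
        known = {f.name for f in fields(cls)}
        return cls(**{key: value for key, value in raw.items() if key in known})
```

Ignoring unknown keys is a design choice here: it keeps the loader from failing if the output ever carries additional fields.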

Example Output

[
  {
    "email": "author@example.com",
    "first_name": "John",
    "last_name": "Doe",
    "full_name": "John Doe",
    "website_url": "example.com",
    "company": "Example Corp",
    "position": "Content Writer",
    "country": "US",
    "twitter": "https://twitter.com/johndoe",
    "linkedin": "https://linkedin.com/in/johndoe",
    "score": 95,
    "verification": {
      "date": "2025-10-17T00:00:00+02:00",
      "status": "valid"
    },
    "sources": [
      {
        "uri": "https://example.com/blog/article",
        "website_url": "example.com",
        "extracted_on": "2024-09-17T11:26:56+02:00",
        "last_seen_on": "2025-09-06T04:51:06+02:00",
        "still_on_page": true
      }
    ]
  }
]
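
Output in this shape can be filtered before it is used for outreach. The following sketch keeps only records whose verification status is "valid" and whose score meets a threshold. The function name `select_reliable` and the default threshold of 90 are assumptions, and the second sample record (including its "unknown" status) is invented purely to show a rejection.

```python
import json
from typing import Any, Dict, List


def select_reliable(records: List[Dict[str, Any]], min_score: int = 90) -> List[Dict[str, Any]]:
    """Keep records with a "valid" verification status and a score >= min_score."""
    selected = []
    for record in records:
        # Partial results may omit either key, so fall back to safe defaults.
        status = (record.get("verification") or {}).get("status")
        score = record.get("score") or 0
        if status == "valid" and score >= min_score:
            selected.append(record)
    return selected


# Illustrative data only; the second record is invented for this example.
sample = json.loads("""
[
  {"email": "author@example.com", "score": 95,
   "verification": {"status": "valid"}},
  {"email": "unknown@example.com", "score": 40,
   "verification": {"status": "unknown"}}
]
""")

print([r["email"] for r in select_reliable(sample)])  # ['author@example.com']
```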

Directory Structure Tree

Author Finder/
├── src/
│   ├── main.py
│   ├── services/
│   │   ├── author_finder.py
│   │   └── email_verifier.py
│   ├── utils/
│   │   ├── rate_limiter.py
│   │   └── validators.py
│   └── config/
│       └── settings.example.json
├── data/
│   ├── input_urls.sample.json
│   └── output.sample.json
├── requirements.txt
└── README.md

Use Cases

  • Content marketers use it to identify article authors, so they can build targeted outreach campaigns.
  • Journalists use it to verify author identities, so they can ensure accurate attribution.
  • Researchers use it to map authorship across publications, so they can analyze content trends.
  • Agencies use it to discover influencers, so they can create partnership opportunities.

FAQs

Q: What types of pages work best with this tool?
A: Blog posts, articles, and pages with clear author bylines provide the most complete results.

Q: Can it process many URLs at once?
A: Yes, bulk URL processing is supported with built-in rate control for stable performance.

Q: What happens if no author is found on a page?
A: The result may be empty or partial if the page lacks clear authorship information.
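
Because results may be empty or partial, downstream code may want to check which fields are actually present before relying on a record. This is a minimal sketch under that assumption. `missing_fields` and `completeness` are hypothetical helpers, and the field list is copied from the field table in this README.

```python
from typing import Any, Dict, List

# Field names taken from the field table in this README.
EXPECTED_FIELDS = (
    "email", "first_name", "last_name", "full_name", "company", "position",
    "website_url", "country", "twitter", "linkedin", "score",
    "verification", "sources",
)


def _is_blank(value: Any) -> bool:
    """Treat None and empty strings, lists, and dicts as missing; 0 and False are kept."""
    return value is None or (isinstance(value, (str, list, dict)) and len(value) == 0)


def missing_fields(record: Dict[str, Any]) -> List[str]:
    """Return the documented fields that are absent or blank in the record."""
    return [name for name in EXPECTED_FIELDS if _is_blank(record.get(name))]


def completeness(record: Dict[str, Any]) -> str:
    """Classify a record as "empty", "partial", or "complete"."""
    missing = missing_fields(record)
    if len(missing) == len(EXPECTED_FIELDS):
        return "empty"
    return "partial" if missing else "complete"
```

A caller could, for example, skip records classified as "empty" and log the output of `missing_fields` for partial ones.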


Performance Benchmarks and Results

Primary Metric: Processes an average of 120–140 pages per minute under standard rate limits.

Reliability Metric: Maintains a success rate above 97% for reachable and valid URLs.

Efficiency Metric: Optimized batching minimizes idle time and reduces unnecessary requests.

Quality Metric: Delivers high data completeness with confidence scores to assess accuracy.
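
The throughput figure above gives a rough way to estimate how long a batch will take. The helper below is plain arithmetic on the stated 120–140 pages per minute. `estimate_minutes` is a hypothetical name, and real run times will depend on network conditions and the rate limits in use.

```python
from typing import Tuple


def estimate_minutes(url_count: int, pages_per_minute: Tuple[int, int] = (120, 140)) -> Tuple[float, float]:
    """Return (best_case, worst_case) minutes for a batch at the stated throughput range."""
    if url_count < 0:
        raise ValueError("url_count must be non-negative")
    slow, fast = pages_per_minute
    return url_count / fast, url_count / slow


best, worst = estimate_minutes(8400)
print(best, worst)  # 60.0 70.0
```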

Review 1

"Bitbash is a top-tier automation partner, innovative, reliable, and dedicated to delivering real results every time."

Nathan Pennington
Marketer
★★★★★

Review 2

"Bitbash delivers outstanding quality, speed, and professionalism, truly a team you can rely on."

Eliza
SEO Affiliate Expert
★★★★★

Review 3

"Exceptional results, clear communication, and flawless delivery. Bitbash nailed it."

Syed
Digital Strategist
★★★★★
