Show our technical findings #93

AseelOmer · 2025-07-23T13:32:20Z

AseelOmer
Jul 23, 2025
Maintainer

What did you notice?
How did your analysis support the research question?
Any observations?

AseelOmer · 2025-07-24T20:27:21Z

AseelOmer
Jul 24, 2025
Maintainer Author

From the readability analysis, real job posts had a balanced readability level—professional but still understandable, with moderate Flesch scores and grade levels. Human-written fake jobs, however, tended to be simpler and more emotionally persuasive, often using more accessible language. In contrast, AI-refined fake posts appeared smoother and sometimes even too polished, resulting in higher grade levels and slightly lower readability, which could be a sign of synthetic or overly optimized language.
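For reference, the two readability measures mentioned above are usually defined as Flesch Reading Ease = 206.835 − 1.015 × (words/sentences) − 84.6 × (syllables/words) and Flesch-Kincaid Grade Level = 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59. The sketch below computes both from scratch. The vowel-group syllable counter is a rough heuristic assumed for illustration (libraries such as textstat use more careful rules), so its scores will differ somewhat from whatever tool produced the numbers in this analysis.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, minus a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a final 'e' as silent ("home"), but not in "-le" endings ("table")
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict:
    """Flesch Reading Ease and Flesch-Kincaid Grade Level for a block of text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    n_sentences = max(len(sentences), 1)  # avoid division by zero on empty input
    n_words = max(len(words), 1)
    n_syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = n_words / n_sentences
    syllables_per_word = n_syllables / n_words
    return {
        "flesch_reading_ease": 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word,
        "fk_grade_level": 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59,
    }


simple = readability("Apply now. Great pay. Work from home.")
polished = readability(
    "The candidate will be responsible for ensuring comprehensive organizational "
    "alignment across transformative strategic initiatives."
)
print(simple, polished)
```

The short, punchy text scores as much easier (higher reading ease, lower grade level) than the corporate-style sentence, which is the simple-versus-polished contrast described above.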

From the n-gram analysis, real job posts frequently used concrete and role-specific phrases (e.g., “project management”, “customer service”), while fake posts—especially human-written ones—focused more on general and appealing phrases (like “great opportunity”, “flexible hours”). AI-generated fake posts often used more structured corporate terms and filler language (like “responsible for ensuring”, “strong communication skills”).
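A minimal sketch of the kind of n-gram counting described above, assuming each post is a plain string. The function names and the simple lowercase-letters tokenizer are illustrative choices, not the notebook's code. Stop words are deliberately kept, since phrases like "responsible for ensuring" depend on them.

```python
import re
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-token phrases, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(posts, n=2, k=10):
    """Most common n-grams across a list of post texts."""
    counts = Counter()
    for post in posts:
        tokens = re.findall(r"[a-z]+", post.lower())  # simple lowercase word tokenizer
        counts.update(ngrams(tokens, n))
    return counts.most_common(k)


fake_posts = ["Great opportunity with flexible hours!", "A great opportunity, apply now."]
print(top_ngrams(fake_posts, n=2, k=3))
```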

This dual analysis helps support the research question on detecting scams or synthetic job posts. Readability metrics reveal that AI and scam posts may deviate from natural, professional human writing—either too simple (scammy) or too smooth (AI). Meanwhile, n-gram patterns help highlight linguistic differences, especially overused buzzwords or templated phrasing, which are useful indicators of inauthentic posts.


AseelOmer Jul 26, 2025
Maintainer Author

Fake job posts often use complex or broken language, especially in the requirements.
Real job posts are more readable and maintain clarity across sections, indicating professional authorship.

Elocodes · 2025-07-24T22:05:29Z

Elocodes
Jul 24, 2025
Maintainer

Analysis Summary

Overview

This post summarizes my analysis of detecting AI-generated fraudulent job postings. I'll cover our research question, my data strategy and model choices, key findings, and future directions, presented in an easy-to-digest "slide" format.

Slide 1: Our Core Research Question

"To what extent can humans and classifier models detect fraudulent job postings when the scam text is written by AI?"

This project aims to assess the capabilities of both humans (implicitly, for future work) and, primarily, machine learning models in identifying job scams where the text content is generated by Artificial Intelligence.

Slide 2: Data Strategy - Building the Dataset

To answer our question, I curated a balanced dataset:

Fraudulent Job Postings:
- Source: Generated by a Large Language Model (LLM), designed to mimic authentic job descriptions.
- Quantity: 500 unique fake job descriptions.
Real Job Postings:
- Source: Scraped from a legitimate online job board (Indeed.com).
- Quantity: 500 unique real job descriptions.
Preprocessing: Both sets underwent comprehensive cleaning, tokenization, stemming/lemmatization, and stop-word removal to prepare them for feature extraction.
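The preprocessing steps listed above could look roughly like this. It is a sketch only: the tiny stop-word list and the `crude_stem` suffix stripper are hypothetical stand-ins, and a real pipeline would more likely use a full stop-word list and NLTK's `PorterStemmer` or `WordNetLemmatizer`.

```python
import re

# Hypothetical, deliberately tiny stop-word list (a real pipeline would use a full one)
STOP_WORDS = {"a", "an", "the", "and", "or", "to", "of", "for", "in", "on",
              "with", "is", "are", "be", "we", "you", "our", "will"}
# Crude suffix stripping, standing in for a real stemmer or lemmatizer
SUFFIXES = ("ing", "ed", "es", "s")


def clean(text):
    """Remove HTML tags and URLs, lowercase, and keep letters only."""
    text = re.sub(r"<[^>]+>", " ", text)
    text = re.sub(r"https?://\S+", " ", text)
    return re.sub(r"[^a-z\s]", " ", text.lower())


def crude_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Clean, tokenize on whitespace, drop stop words, then stem."""
    tokens = clean(text).split()
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


print(preprocess("<p>We are seeking Managers for our team!</p> https://example.com"))
```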

Slide 3: Feature Engineering - What my Models "Saw"

I transformed the raw text into numerical features suitable for machine learning:

TF-IDF (Term Frequency-Inverse Document Frequency):
- Captures the importance of words in a document relative to the entire corpus.
- Forms the primary representation of the job description's content.
NER (Named Entity Recognition) Counts:
- Counts of specific entity types (e.g., ORG, LOC, PERSON, MONEY, DATE) within each description.
- Provides additional structured information about the content.
Combination: TF-IDF and NER features were horizontally stacked (X_train_final, X_test_final) to create a rich feature set for the classifiers.
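To make the TF-IDF + NER combination concrete, here is a dependency-free sketch. It assumes documents are already tokenized and that entity labels have already been extracted (for example from spaCy's `doc.ents`, where each span has a `label_`). The IDF formula mirrors scikit-learn's `TfidfVectorizer` defaults (smoothed IDF, L2 normalisation); in practice one would use that class together with `scipy.sparse.hstack` to build `X_train_final`. The fixed `NER_LABELS` column order is an assumption.

```python
import math
from collections import Counter

# Assumed fixed order for the NER count columns
NER_LABELS = ["ORG", "LOC", "PERSON", "MONEY", "DATE"]


def fit_idf(docs):
    """Smoothed IDF, same form as scikit-learn's default: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    return vocab, idf


def tfidf_vector(doc, vocab, idf):
    """Raw term counts times IDF, L2-normalised; out-of-vocabulary terms are ignored."""
    tf = Counter(doc)
    vec = [tf[t] * idf[t] for t in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def ner_counts(entity_labels):
    """Count entities per label, in NER_LABELS order."""
    counts = Counter(entity_labels)
    return [counts[label] for label in NER_LABELS]


def stack_features(doc, entity_labels, vocab, idf):
    """Horizontal stack: TF-IDF columns followed by NER count columns."""
    return tfidf_vector(doc, vocab, idf) + ner_counts(entity_labels)


train_docs = [["python", "sql"], ["sales", "python"]]
vocab, idf = fit_idf(train_docs)
print(stack_features(["python"], ["ORG", "MONEY", "ORG"], vocab, idf))
```

The IDF is fit on training documents only and then reused for test documents, which keeps test-set statistics out of the features.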

Slide 4: Model Selection - Stress-Testing Classifiers

To thoroughly evaluate detection capabilities, I selected a diverse range of traditional machine learning classification models, commonly used and recognized in text classification literature:

Logistic Regression: A robust and interpretable linear baseline, excellent for high-dimensional, sparse data.
Linear Support Vector Classifier (LinearSVC): A powerful linear model, highly effective for text classification tasks.
Gradient Boosting Classifier: An ensemble method known for strong performance and ability to capture complex patterns.
XGBoost Classifier: A highly optimized and popular gradient boosting library, frequently top-performing in various ML competitions.

The goal was to "stress-test" these different algorithmic approaches on the engineered features.
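A sketch of how such a comparison might be set up with scikit-learn, assuming raw texts and 0/1 labels (1 = fake). XGBoost is left out to avoid an extra dependency; `xgboost.XGBClassifier` could be added to the `models` dict in the same way. The vectorizer is fit on the training split only, so test-set vocabulary statistics do not leak into training.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC


def compare_models(texts, labels, seed=42):
    """Train several classifiers on TF-IDF features and return test accuracy per model."""
    X_train_txt, X_test_txt, y_train, y_test = train_test_split(
        texts, labels, test_size=0.25, stratify=labels, random_state=seed
    )
    vectorizer = TfidfVectorizer()
    X_train = vectorizer.fit_transform(X_train_txt)  # fit on training text only
    X_test = vectorizer.transform(X_test_txt)
    models = {
        "LogisticRegression": LogisticRegression(max_iter=1000),
        "LinearSVC": LinearSVC(),
        "GradientBoosting": GradientBoostingClassifier(random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores
```

On a toy corpus where every fake post shares the same buzzwords, all three models will typically score perfectly, which is the same source confound discussed in Slide 6.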

Slide 5: Model Results - The Uncanny Accuracy

The performance of all trained models on the held-out test set was exceptionally high:

Held-out test-set metrics (all models):

Accuracy: 100.00%
Precision (Fake Jobs): 100.00%
Recall (Fake Jobs): 100.00%
F1-Score (Fake Jobs): 100.00%

This level of perfect performance is highly unusual in real-world machine learning tasks and prompted further investigation.
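To see what the 100% figures imply, it helps to compute the metrics directly from predictions. In this minimal sketch, fake = 1 is treated as the positive class (an assumption matching the "(Fake Jobs)" labels above); perfect precision and recall mean the test set contained no false positives and no false negatives at all.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision, "recall": recall, "f1": f1}


print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 1]))
```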

Slide 6: Interpreting the 100% Accuracy - The Role of Data Leakage

Further investigation revealed that the perfect accuracy is attributable to data leakage: every fake post came from one source (our LLM) and every real post from another (Indeed.com), so the label was perfectly confounded with the distinct stylistic and linguistic patterns of each source.

LLM-Generated Fake Jobs: These descriptions displayed a clear "linguistic fingerprint" characterized by:
- High usage of generic, aspirational corporate buzzwords (e.g., 'unparalleled', 'strategic', 'highly skilled', 'transformative', 'comprehensive').
- Vague positive language aimed at sounding legitimate without specific detail.
Indeed-Scraped Real Jobs: These descriptions exhibited a different, more grounded linguistic signature with:
- More specific, concrete terms related to actual job duties, benefits, and workplace specifics (e.g., 'insurance', 'payroll', 'customer', 'employee', 'skill', 'required').

The classifier models became exceptionally proficient at distinguishing between "text generated by our specific LLM" and "text scraped from Indeed." This systematic difference acted as a perfect, albeit dataset-specific, signal for classification.
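One way to surface such a source fingerprint is to rank words by how much more often they occur in one corpus than in the other. The sketch below uses an add-alpha smoothed log ratio of relative word frequencies; the function name, the choice of alpha = 1, and the toy token lists are assumptions for illustration.

```python
import math
from collections import Counter


def log_ratio(tokens_a, tokens_b, alpha=1.0):
    """Smoothed log ratio of each word's relative frequency in corpus A vs corpus B.

    Positive scores mark words typical of A, negative scores words typical of B.
    """
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    vocab = set(counts_a) | set(counts_b)
    total_a = sum(counts_a.values()) + alpha * len(vocab)
    total_b = sum(counts_b.values()) + alpha * len(vocab)
    return {
        w: math.log((counts_a[w] + alpha) / total_a) - math.log((counts_b[w] + alpha) / total_b)
        for w in vocab
    }


llm_tokens = ["unparalleled", "strategic", "unparalleled", "team"]
indeed_tokens = ["payroll", "insurance", "team", "customer"]
scores = log_ratio(llm_tokens, indeed_tokens)
print(sorted(scores, key=scores.get, reverse=True)[:2])
```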

Slide 7: Key Insights & Answering the Research Question

These findings provide a significant answer to the "classifier model" aspect of our research question:

Classifier Models' Effectiveness: For the specific type of AI-generated scam text used in this study, classifier models (Logistic Regression, LinearSVC, Gradient Boosting, XGBoost) can achieve 100% accuracy in detection. This demonstrates their powerful capability to discern subtle stylistic cues.
Job Description is Paramount: This perfect classification, achieved solely by analyzing features derived from the job description text, strongly supports my hypothesis that the job description is the most critical component for fraud detection. Its inherent textual characteristics provide highly discriminative signals.
Implications for Human Detection: While we didn't directly test humans, these results raise a crucial question: If machines can perfectly identify these AI-generated linguistic signatures, to what extent are human applicants equally discerning, or might they be more susceptible to such AI-crafted deception?

Slide 8: Limitations and Future Work

Our current analysis, while conclusive for its specific scope, has limitations that inform future research:

Dataset Specificity: The 100% accuracy is tied to the unique stylistic differences between our specific LLM-generated data and our single source of real data. The model is highly effective at distinguishing these two sources, rather than universally recognizing "fakeness."
Limited Generalizability: The current model's performance might not generalize well to:
- Fraudulent job postings crafted by humans.
- AI-generated job postings from different LLMs or using different prompting strategies.
- Real job postings from diverse sources with varying writing styles.

Future Research Directions:

To enhance the model's robustness and generalizability:

Diversify Data Sources:
- Gather real job postings from a wider variety of platforms (e.g., LinkedIn, Glassdoor, company career pages).
- Incorporate AI-generated scam texts from different LLMs and with varied prompting strategies.
- Include a dataset of known human-crafted fraudulent job postings to broaden the definition of "fake."
Advanced Text Analysis: Explore more sophisticated linguistic features or deep learning models (e.g., Transformer-based architectures like BERT) if necessary, after dataset diversification.
Comparative Human Studies: Conduct experiments to directly assess human detection capabilities against AI-generated scam text, comparing their performance to the models.
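Once data comes from several platforms and LLMs, one way to test whether a model has learned "fakeness" rather than a source's style is to hold out one whole source at a time. A minimal sketch, assuming each record is a dict with a hypothetical `source` key:

```python
def leave_one_source_out(records):
    """Yield (held_out_source, train, test) so each source is tested only when unseen."""
    sources = sorted({r["source"] for r in records})
    for held_out in sources:
        train = [r for r in records if r["source"] != held_out]
        test = [r for r in records if r["source"] == held_out]
        yield held_out, train, test


records = [
    {"source": "indeed", "label": 0},
    {"source": "linkedin", "label": 0},
    {"source": "llm_a", "label": 1},
    {"source": "llm_b", "label": 1},
]
for held_out, train, test in leave_one_source_out(records):
    print(held_out, len(train), len(test))
```

This check only becomes informative when each label is represented by several sources, which is what the diversification above would provide.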

Further Details

For a complete breakdown of the data acquisition, preprocessing steps, feature engineering, model implementation, and full results, please refer to the main analysis notebook:

Read the full analysis notebook here


GeehanAli · 2025-07-25T02:57:00Z

GeehanAli
Jul 25, 2025
Maintainer

POS_Tagging_Analysis_Summary.pdf


Rouaa93 · 2025-07-26T14:16:03Z

Rouaa93
Jul 26, 2025
Maintainer

What did you notice?
In my analysis of the emotional tone and urgency of the job postings, I noticed that human-written fake job posts often use emotionally charged or persuasive language, especially phrases that create urgency (e.g., “Apply now!”, “Don’t miss out!”, “Limited time opportunity”). This kind of emotional pressure is much less common in real job posts, which maintain a more neutral and professional tone. Meanwhile, AI-generated fake posts sometimes include subtle persuasive elements, but their tone is often overly polished or formal.
How did your analysis support the research question?
The emotional tone analysis helped support our research question by highlighting linguistic patterns and emotional cues that can distinguish between real and fake job posts. These features can be used as indicators of deception, especially when fake posts try to manipulate the reader emotionally. Emotional urgency is a known red flag in scam detection, and its presence in our fake samples—especially human-written ones—strengthens the idea that emotional tone can be a feature for automatic classification.
Any observations?
Yes, one key observation is that AI-generated posts tend to avoid overly emotional language, possibly due to their polished nature, while human-written fake posts often exaggerate benefits or urgency to grab attention. This makes emotional tone a more reliable signal for detecting human-crafted scams than AI-generated ones. However, the AI posts still occasionally show signs of over-optimization, which can also be flagged as suspicious.
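A very small sketch of how urgency cues could be turned into features for a classifier. The phrase list is taken from the examples in this post and is only a placeholder for a fuller lexicon; counting exclamation marks is an added, assumed heuristic.

```python
import re

# Phrases from the examples above; a real lexicon would be much larger
URGENCY_PATTERNS = [
    r"\bapply now\b",
    r"\bdon[’']?t miss out\b",
    r"\blimited[- ]time\b",
]


def urgency_features(text):
    """Count urgency-phrase matches and exclamation marks in one post."""
    lowered = text.lower()
    hits = sum(len(re.findall(p, lowered)) for p in URGENCY_PATTERNS)
    return {"urgency_hits": hits, "exclamations": text.count("!")}


print(urgency_features("Apply now! Don't miss out on this limited time opportunity!"))
```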


Alaa-Elgozouli · 2025-07-26T17:14:38Z

Alaa-Elgozouli
Jul 26, 2025
Maintainer

Most Frequent Words Analysis

Datasets used

17014 real jobs extracted from the Aegean dataset.
866 fake jobs extracted from the Aegean dataset.
The same 866 fake jobs, but rewritten (refined) by an LLM.

For analysis purposes, we assume that all posts in the Aegean dataset were written by humans, since it was collected between 2012 and 2014, before today's LLMs were available.

The Initial Hypotheses

Before running the notebook, the hypothesis was that real and fake job posts use different words, or the same words with significantly different counts, and that the same holds for human-written versus LLM-refined posts.

The main purpose of this analysis is to check whether some words are used more frequently in one group of posts than in the others. most_frequent_words.ipynb examines the columns usually included in an overall job description: description, requirements, and benefits.

Points Noticed

There isn't really a big difference between the words used in human-written real and fake posts! They tend to use very similar words with comparable counts. (Raw counts are naturally larger for real posts, since the data is highly imbalanced: 17,014 of the 17,880 posts, about 95.16%, are real.)
LLM-refined posts tend to use vaguer words that add little role-specific information compared with human-written real and fake posts, such as "dynamic", "visionary", "unparalleled", "exceptional", and "strategic".
From the graph all_sections_words_count.png, we can see that LLM-refined job posts tend to resemble real job posts more than they resemble the human-written fake posts!
The original hypothesis was that LLM-refined fake posts would mimic real posts. While this is true to an extent, the LLM-refined posts also drifted away from both real and fake posts, using words that neither real nor fake posts use very often!
I used a cosine similarity matrix to quantify the similarities and differences for each column separately.

For description:

Fake and real job posts have 0.8962 similarity, fake and LLM-refined fake posts have 0.2668 similarity, and real and LLM-refined posts have 0.2824 similarity.

For requirements:

Fake and real job posts have 0.9532 similarity, fake and LLM-refined fake posts have 0.3899 similarity, and real and LLM-refined posts have 0.3993 similarity.

For benefits:

Fake and real job posts have 0.8807 similarity, fake and LLM-refined fake posts have 0.9894 similarity, and real and LLM-refined posts have 0.9026 similarity.

For the first two columns, real and fake posts are already very similar, and the LLM-refined version shifts markedly away from both. The LLM-refined version is only slightly closer to real posts than to the original fake posts (0.2824 vs 0.2668 for description, 0.3993 vs 0.3899 for requirements).

The benefits column behaves differently: the LLM-refined version is highly similar to both real (0.9026) and fake (0.9894) posts, mainly because benefits sections tend to use the same words for all types of jobs regardless of context.
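A minimal sketch of cosine similarity between two groups of posts, using raw word-count vectors. This is an assumption about the setup: the notebook may have used a different vectorization (for example TF-IDF, or only the top-N words), so this will not reproduce the exact values above.

```python
import math
from collections import Counter


def word_counts(texts):
    """Pool lowercase whitespace-separated words from a list of texts."""
    return Counter(w for t in texts for w in t.lower().split())


def cosine_similarity(counts_a, counts_b):
    """Cosine of the angle between two sparse count vectors (Counters)."""
    vocab = set(counts_a) | set(counts_b)
    dot = sum(counts_a[w] * counts_b[w] for w in vocab)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


real_benefits = word_counts(["health insurance", "paid time off"])
fake_benefits = word_counts(["health insurance", "paid leave"])
print(round(cosine_similarity(real_benefits, fake_benefits), 4))
```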

Possible Errors

I'd say word frequency alone does not do the hypothesis full justice: it supports the hypothesis, but only slightly. I'd suggest shedding more light on n-grams instead, since they could tell much more!

Departments and Salaries Comparison

I used the same datasets as above, except for the LLM-refined version, which isn't needed for this analysis.

The Initial Hypotheses

The question was whether fake jobs focus more on certain departments and industries than real jobs do, and what salaries they offer in those departments, to test the claim that "fake jobs promise the dream salary".

The main purpose of departments_salaries_comparison.ipynb is to check which features the employers behind real and fake posts tend to feed to the LLM in the first place, and whether these actually differ. The columns used are titles, department, industry, and function, since they all describe the same thing, plus salary range to check the salary corresponding to each cluster.

I clustered all four columns together because many of them contain NaN values and they all refer to the same concept, so it is better to group them into a set of categories, using an optimal number of clusters.
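Combining the four role-related columns into a single string before clustering might look like this. The column names are assumed to follow the Aegean (EMSCAD) dataset schema (`title`, `department`, `industry`, `function`), and the `" | "` separator is an arbitrary choice; missing values (None or NaN) are simply skipped.

```python
import math


def combine_role_fields(row, fields=("title", "department", "industry", "function")):
    """Join the non-missing role columns of one row into a single string."""
    parts = []
    for field in fields:
        value = row.get(field)
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue  # skip missing values instead of clustering on "nan"
        text = str(value).strip()
        if text:
            parts.append(text)
    return " | ".join(parts)


print(combine_role_fields({"title": "Sales Rep", "department": float("nan"), "industry": "Retail"}))
```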

Points Noticed

The data is highly imbalanced, which would distort the clustering, so I randomly sampled 866 of the 17,014 real jobs to match the number of fake jobs.
I used hdbscan for clustering and found that fake employers do not target specific industries; they appear across all types of jobs.
As for the salary comparison, 643 of the 866 fake jobs have NaN salary ranges (only 223 have one), so the comparison is not conclusive for either real or fake posts. However, based on the comparison graph, in almost all clusters that had salary ranges from both real and fake jobs, fake jobs promised a higher salary range than real jobs!
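The balancing and salary steps can be sketched as follows, assuming rows are dicts with a hypothetical `cluster` key (for example an HDBSCAN label) and a `salary_range` string in a "min-max" form such as "40000-60000". Rows with missing or unparseable ranges are skipped, which is why so many missing fake-job salaries weaken the comparison.

```python
import random
from collections import defaultdict
from statistics import median


def downsample(rows, n, seed=0):
    """Randomly keep n rows (e.g. 866 real posts to match 866 fake ones)."""
    return random.Random(seed).sample(rows, n)


def salary_midpoint(salary_range):
    """Parse a 'min-max' string; return None for missing or malformed values."""
    if not isinstance(salary_range, str):
        return None
    low, sep, high = salary_range.partition("-")
    if not sep or not low.isdigit() or not high.isdigit():
        return None
    return (int(low) + int(high)) / 2


def median_salary_by_cluster(rows):
    """Median salary midpoint per cluster, ignoring rows without a usable range."""
    by_cluster = defaultdict(list)
    for row in rows:
        mid = salary_midpoint(row.get("salary_range"))
        if mid is not None:
            by_cluster[row["cluster"]].append(mid)
    return {cluster: median(values) for cluster, values in by_cluster.items()}
```

Running `median_salary_by_cluster` separately on the real and fake rows gives one number per cluster for each group, which can then be plotted side by side.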

Possible Errors

Many fake posts are missing salary range values. Given the time, I'd suggest using synthetic data to better mimic fake jobs and make the comparison fairer.

Final Notes

Fake jobs do not concentrate on certain departments compared with real jobs; they're everywhere. However, they do promise higher salaries than real jobs for the same roles/clusters.

