
# Master’s Thesis – Alex Schied

This repository contains all code and materials used for the empirical component of my master’s thesis, as well as the LaTeX files used to compile the final submission PDF.

The general workflow follows these steps:
raw data → data cleaning → feature generation → feature analysis → result visualization

For any questions regarding the code, feel free to contact me via email at: A.Schied@campus.lmu.de


## Replication

To replicate the sentiment indexing, place the shared data into the `data` folder and run `plots_were_made_here.R`.

To replicate the empirical analysis, please set up a Python environment with the required dependencies.
You can do this using Conda as follows:

**Mac/Linux**

```sh
conda env create --name "schied_replication" --file "environment.yml"
conda activate schied_replication
python3 analysis_stepname.py
```

**Windows**

```sh
conda env create --name "schied_replication" --file "environment.yml"
conda activate schied_replication
python analysis_stepname.py
```

For steps involving an LLM, Ollama needs to be available locally.


## Pipeline Overview

The pipeline processes input data and manages the execution of analysis steps.

- **Input:** separate CSV files located in the input directory.
- The pipeline automatically scans the input folder, loads each CSV into memory using Polars (a faster alternative to pandas), and tracks already-processed files by checking the output directory.
- **Output:** results are saved as `inputfile_intermediatestep_finalstep.parquet`.
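The scan-and-track logic above can be sketched roughly as follows. This is an illustrative simplification, not the actual pipeline code: the directory arguments, the `pending_inputs`/`process` names, and the placeholder step names in the output file are all hypothetical.

```python
from pathlib import Path

def pending_inputs(input_dir: Path, output_dir: Path) -> list[Path]:
    """Return input CSVs that have no corresponding result parquet yet."""
    done = {p.name for p in output_dir.glob("*.parquet")}
    def is_done(stem: str) -> bool:
        # output files follow inputfile_intermediatestep_finalstep.parquet
        return any(name.startswith(stem + "_") for name in done)
    return [p for p in sorted(input_dir.glob("*.csv")) if not is_done(p.stem)]

def process(csv_path: Path, output_dir: Path) -> Path:
    import polars as pl  # Polars loads the CSV into memory
    df = pl.read_csv(csv_path)
    # ... hand df to a worker's .run(), then persist the result:
    out = output_dir / f"{csv_path.stem}_intermediatestep_finalstep.parquet"
    df.write_parquet(out)
    return out
```

Because the processed-file check only looks at the output directory, an interrupted run can simply be restarted and will skip everything already written.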

Each pipeline run requires:

- A worker class (defining the analysis logic)
- A configuration dataclass (defining directories and parameters)

## Worker Classes

All analysis steps are implemented as classes in `workers.py`.
Each worker class must contain a `.run()` method, which serves as the main entry point for the pipeline.

Requirements for worker classes:

- `.run()` must accept only a Polars DataFrame (and `self`) as input.
- Additional helper methods can be defined within the same class and accessed via `self`.
- The `.run()` method name must remain unchanged so the pipeline manager can invoke it.

## Configuration Classes

Configuration classes are dataclasses that store all hardcoded parameters required by both the pipeline and the worker classes, such as:

- Input and output directory paths
- File naming conventions
- Analysis-specific constants and thresholds
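As a minimal sketch, such a dataclass could look like the following. Every field name and value here is illustrative; the actual parameters live in `src/config.py` and `src/config_bt.py`:

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass(frozen=True)
class SentimentConfig:
    """Illustrative config bundling the hardcoded pipeline parameters."""
    input_dir: Path = Path("data/input")            # where the raw CSVs sit
    output_dir: Path = Path("data/results")         # where parquet results go
    output_suffix: str = "_cleaned_scored.parquet"  # file naming convention
    keywords: tuple[str, ...] = ("inflation",)      # analysis-specific constants
    min_tokens: int = 20                            # example threshold
```

Keeping the config frozen makes a run reproducible: the same config instance always describes the same directories and parameters.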

## Project Structure

```
masterthesis/
│
├── scripts/
│   ├── notebooks/          ← Jupyter notebooks
│   ├── R/                  ← R scripts for the fixed effects model
│   ├── streamlit/          ← browser app used for data labeling
│   │   ├── app.py          ← main structure of the app (cd into streamlit/, then "streamlit run app.py")
│   │   └── pages/          ← the web pages of the app
│   │       └── ...
│   ├── bt_analysis_ri.py       ← runs the analysis pipeline (text cleaning, keyword matching) on the raw input data (Bundestag speeches, SpeakGer)
│   ├── bt_analysis_if.py       ← runs the analysis (LLM-, BERT-, and dictionary-based sentiment scoring, embeddings, named entity recognition) on the preprocessed data (relevant cleaned speeches)
│   │
│   ├── analysis_raw_inter.py   ← runs the analysis pipeline (text cleaning, keyword matching) on the raw input data (Genios articles)
│   ├── analysis_inter_final.py ← runs the analysis (LLM-, BERT-, and dictionary-based sentiment scoring, embeddings, named entity recognition) on the preprocessed data (relevant cleaned articles)
│   ├── data/                   ← public data
│   ├── .../                    ← all raw inputs (Genios, Bundestag speeches)
│   │   └── results/            ← analysis-ready dataframes and analysis results
│   ├── src/                    ← main Python code for the analysis
│   │   ├── config.py           ← parameters for the classes in workers.py for Genios Wiso data (prompts, BERT models, ...)
│   │   ├── config_bt.py        ← parameters for the classes in workers.py for Bundestag speeches
│   │   ├── workers.py          ← all NLP classes for feature generation (LLM scoring, embeddings, ...)
│   │   ├── analysis.py         ← all steps analyzing the created features (index aggregation, event studies, semantic deduplication, ...)
│   │   └── plots.py            ← classes creating the final visualizations of the findings
│   ├── pipeline.py         ← the pipeline reading/writing data to/from the classes (built to fit the VM-specific requirements, e.g. raw data in CSV form)
│   └── setup/              ← shell scripts for model setup
│
├── tex/
│   ├── main.tex            ← structure of the main PDF
│   ├── beamer.tex          ← the final presentation (unfinished)
│   ├── chapters/           ← individual chapters, one per section
│   ├── figures/            ← figures
│   ├── tables/             ← tables
│   └── static/             ← bibliography, pictures, preamble, beamer class file
│
├── environment.yml         ← packages required for replication
│
└── README.md
```
