This repository contains all code and materials used for the empirical component of my master’s thesis, as well as the LaTeX files used to compile the final submission PDF.
The general workflow follows these steps:
raw data → data cleaning → feature generation → feature analysis → result visualization
For any questions regarding the code, feel free to contact me via email at: A.Schied@campus.lmu.de
To replicate the sentiment indexing, place the shared data in the data folder and run plots_were_made_here.R.
To replicate the empirical analysis, please set up a Python environment with the required dependencies.
You can do this using Conda as follows:
conda env create --name "schied_replication" --file "environment.yml"
conda activate schied_replication
python analysis_stepname.py

For steps involving an LLM, Ollama needs to be available locally.
The pipeline processes input data and manages the execution of analysis steps.
- Input: Separate CSV files located in the input directory.
- The pipeline automatically scans the input folder, loads all CSVs into memory using Polars (a faster alternative to Pandas), and tracks processed files by checking the output directory.
- Output: Results are saved in the format
inputfile_intermediatestep_finalstep.parquet
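The scan-and-skip behaviour described above can be sketched as follows (function and suffix names are hypothetical; the actual logic lives in pipeline.py):

```python
from pathlib import Path

def pending_inputs(input_dir: Path, output_dir: Path, step_suffix: str) -> list[Path]:
    """Return input CSVs that do not yet have a matching
    <inputfile>_<step_suffix>.parquet in the output directory.

    Illustrative sketch only; names do not come from the repository.
    """
    # Stems of all result files already written, e.g. "myfile_clean_final"
    done = {p.stem for p in output_dir.glob("*.parquet")}
    return [
        csv for csv in sorted(input_dir.glob("*.csv"))
        if f"{csv.stem}_{step_suffix}" not in done
    ]
```

Re-running the pipeline after an interruption then only touches the files returned by this check.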
Each pipeline run requires:
- A worker class (defining the analysis logic)
- A configuration dataclass (defining directories and parameters)
All analysis steps are implemented as classes in workers.py.
Each worker class must contain a .run() method, which serves as the main entry point for the pipeline.
Requirements for worker classes:
- .run() must accept only a Polars DataFrame (and self) as input.
- Additional helper methods can be defined within the same class and accessed via self.
- The .run() method name must remain unchanged for the pipeline manager to execute correctly.
Configuration classes are dataclasses that store all hardcoded parameters required by both the pipeline and worker classes — such as:
- Input and output directory paths
- File naming conventions
- Analysis-specific constants or thresholds
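A sketch of such a configuration dataclass (all field names and values are illustrative, not the actual contents of config.py or config_bt.py):

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass(frozen=True)
class AnalysisConfig:
    """Hypothetical example; real configs live in config.py / config_bt.py."""
    input_dir: Path = Path("data/raw")          # where input CSVs are read
    output_dir: Path = Path("data/results")     # where parquet results go
    output_suffix: str = "clean_final"          # file naming convention
    sentiment_threshold: float = 0.5            # analysis-specific constant
```

Keeping all hardcoded parameters in one frozen dataclass means a worker only ever receives the config object, never loose constants.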
masterthesis/
│
├── scripts/
│ ├── notebooks/ ← jupyter notebooks
│ ├── R/ ← R scripts for fixed effects model
│ ├── streamlit/ ← contains the browser app used for data labeling
│ │ ├── app.py ← main structure of the app (cd into the streamlit folder first, then start with "streamlit run app.py")
│ │ └── pages/ ← contains the webpages of the app
│ │ └── ...
│ ├── bt_analysis_ri.py ← runs the analysis pipeline (text cleaning, keyword matching) on the raw input data (Bundestag speeches, SpeakGer)
│ ├── bt_analysis_if.py ← runs the analysis (LLM-, BERT-, and dictionary-based sentiment scoring, embeddings, named entity recognition) on the preprocessed data (relevant cleaned speeches)
│ │
│ ├── analysis_raw_inter.py ← runs the analysis pipeline (text cleaning, keyword matching) on the raw input data (Genios articles)
│ ├── analysis_inter_final.py ← runs the analysis (LLM-, BERT-, and dictionary-based sentiment scoring, embeddings, named entity recognition) on the preprocessed data (relevant cleaned articles)
│ ├── data/ ← public data
│ ├── .../ ← all raw inputs (Genios, Bundestag speeches)
│ │ └── results/ ← contains analysis-ready dataframes and analysis results
│ ├── src/ ← main python code for analysis
│ │ ├── config.py ← parameters for the classes in workers.py for the Genios WISO data (prompts, BERT models, ...)
│ │ ├── config_bt.py ← parameters for the classes in workers.py for the Bundestag speeches
│ │ ├── workers.py ← contains all NLP classes for feature generation (LLM scoring, Embeddings, ...)
│ │ ├── analysis.py ← contains all steps analyzing the created features (Index aggregation, Event Studies, Semantic Deduplication, ...)
│ │ ├── plots.py ← contains classes to create the final visualizations of the findings
│ │ └── pipeline.py ← pipeline reading/writing data from/to the worker classes (built to fit VM-specific requirements, such as raw data in CSV form)
│ └── setup/ ← shell scripts for model setup
│
├── tex/
│ ├── main.tex ← the structure of the main pdf
│ ├── beamer.tex ← the final presentation (unfinished)
│ ├── chapters/ ← individual chapters per section
│ ├── figures/ ← figures
│ ├── tables/ ← tables
│ └── static/ ← bibliography, pictures, preamble, beamer class file
│
├── environment.yml ← relevant packages for replication
│
└── README.md