An R-based exploration of Pokémon names (English & Japanese) and their physical attributes. We use string-pattern queries to characterize name structure, test for relationships between phonetic features and weight, and build regression models predicting weight from height and Attack power.
- Project Overview
- Features & Analyses
- Prerequisites
- Installation
- Usage
- Script Breakdown
- Key Findings
- Extending & Customizing
- Data Source & Citations
- License
We analyze the pokemon-advanced.csv dataset (1,008 Pokémon) to:
- Exercise 1: Load and inspect English and Japanese name fields.
- Exercise 2: Identify name patterns:
  - Exactly two identical vowels in a row (e.g., "Pikachuu").
  - At least three consecutive consonants.
  - Consonant-vowel (or vowel-consonant) pairs repeated at least four times in a row.
  - Names starting with a vowel and ending with a consonant.
- Exercise 3: Hypothesize phonetic "heaviness" associations.
- Exercise 4: Divide Pokémon into two groups by whether their English name contains the "heavy" consonants b, d, f, v, z, compare average weights with a t-test, and repeat for Japanese names.
- Exercises 6–8: Fit a simple linear regression of weight on height, then extend to a multiple regression adding Attack as a predictor; compare model fits.
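Exercise 1 depends on how R names the columns at import. As a small sketch, assuming the CSV headers read "Name (English)" and "Name (Japanese)" (an assumption; the real header text may differ), `read.csv` with its default `check.names = TRUE` rewrites them into syntactic names, which would explain identifiers such as `Name..English.`:

```r
# Hypothetical stand-in for pokemon-advanced.csv: headers and rows are assumed,
# not taken from the real file.
csv_text <- "Name (English),Name (Japanese),Height,Weight
Examplemon,Reimon,1.0,10.0
Samplemon,Mihonmon,2.0,20.0"

# Default import: spaces and parentheses in headers become dots.
toy <- read.csv(text = csv_text)
print(names(toy))

# Inspect the two name fields.
str(toy[, c("Name..English.", "Name..Japanese.")])

# With check.names = FALSE the headers are kept verbatim.
raw_names <- names(read.csv(text = csv_text, check.names = FALSE))
print(raw_names)
```

If the real headers differ, `names(pokemon)` right after loading shows the identifiers to use.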
- Name-pattern extraction via stringr::str_detect:
  - Two identical vowels: "(aa|ee|ii|oo|uu)"
  - Three+ consonants: "[^aeiouAEIOU]{3,}"
  - Four+ alternating C-V or V-C: "(?:[aeiou][^aeiou]){4,}|(?:[^aeiou][aeiou]){4,}"
  - Starts vowel, ends consonant: "^[aeiou].*[^aeiou]$"
- Group comparisons:
  - With vs. without "heavy" consonants in English names → t-test of weight.
  - Same grouping for Japanese names.
- Regression modeling:
  - Model 1: Weight ~ Height
  - Model 2: Weight ~ Height + Attack
  - Compare R² and predictor significance.
All results (console output, plots if any) are produced by pokemon_name_physique_analysis.R.
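The name patterns listed above can be tried on a few names before running the full script. This is a minimal sketch and not the script's actual code: `sample_names` is a hand-picked illustration, and lowercasing the names first is an assumption added here. Three of the four patterns list only lowercase vowels, so a capitalized name such as "Abra" would not match "^[aeiou]" as written. Note also that "[^aeiou]" matches any non-vowel character (spaces, digits, and punctuation included), and that "(aa|ee|ii|oo|uu)" matches any run of two or more identical vowels, not only runs of exactly two.

```r
library(stringr)

# Hand-picked sample; the real script reads names from pokemon-advanced.csv.
sample_names <- c("Pikachu", "Abra", "Ekans", "Kangaskhan", "Seel", "Alakazam")

# Assumption of this sketch: lowercase first so the vowel classes see every letter.
lower_names <- str_to_lower(sample_names)

patterns <- c(
  double_vowel         = "(aa|ee|ii|oo|uu)",
  three_consonants     = "[^aeiouAEIOU]{3,}",
  alternating_4        = "(?:[aeiou][^aeiou]){4,}|(?:[^aeiou][aeiou]){4,}",
  vowel_start_cons_end = "^[aeiou].*[^aeiou]$"
)

# For each pattern, keep the original-case names whose lowercase form matches.
matches <- lapply(patterns, function(p) sample_names[str_detect(lower_names, p)])
print(matches)

# Without lowercasing, the anchored pattern misses every capitalized name.
case_sensitive_hits <- str_detect(sample_names, patterns[["vowel_start_cons_end"]])
print(any(case_sensitive_hits))
```

A stricter consonant class such as "[b-df-hj-np-tv-z]" would exclude non-letters, at the cost of treating "y" as a consonant in every position.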
- R (≥ 4.0)
- RStudio (optional)
- Internet access (to install any missing packages)
- R packages: stringr, ggplot2 (for any future plots)
The script auto-installs missing packages.
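One common way to auto-install missing packages is sketched below. The helper name `ensure_packages` is hypothetical, and the script may do this differently.

```r
# Packages named in this README.
required_pkgs <- c("stringr", "ggplot2")

# Hypothetical helper: install whichever of `pkgs` cannot be loaded,
# and invisibly return the names that had to be installed.
ensure_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  to_install <- pkgs[!available]
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}
```

Calling `ensure_packages(required_pkgs)` before `library(stringr)` triggers an install only when a package is missing, so repeated runs stay fast.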
- Clone this repository:
    git clone https://github.com/yourusername/pokemon-name-physique.git
    cd pokemon-name-physique
- Ensure pokemon-advanced.csv is placed in the project root.
Run the analysis script in R:
# From R or RStudio:
setwd("path/to/pokemon-name-physique")
source("pokemon_name_physique_analysis.R")

The script will print:
- Lists of names matching each string-pattern criterion.
- t-test results for English and Japanese name-based weight comparisons.
- Regression summaries for Model 1 and Model 2.
library(stringr)
pokemon <- read.csv("pokemon-advanced.csv")
names <- pokemon$Name..English.
jpn <- pokemon$Name..Japanese.
# 2a–2d: str_detect subsets for various patterns
# 4: group by heavy consonants → t.test(weights_with, weights_without)
# 6–8: lm(Weight ~ Height), lm(Weight ~ Height + Attack)

Each step prints results to the console.
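Step 4 of the outline can be sketched as follows. The names and weights are invented for illustration and do not come from pokemon-advanced.csv; `toy`, `heavy`, and the lowercasing step are assumptions of this sketch, not the script's code.

```r
library(stringr)

# Invented names and weights, for illustration only.
toy <- data.frame(
  name   = c("Bazdov", "Fizvab", "Dozbud", "Vobrid",
             "Kilmon", "Senra", "Tolkin", "Mirelu"),
  weight = c(120.5, 88.0, 210.3, 95.1, 12.4, 30.0, 45.7, 8.9)
)

# TRUE when the name contains any of b, d, f, v, z (lowercased so capitals count).
toy$heavy <- str_detect(str_to_lower(toy$name), "[bdfvz]")

weights_with    <- toy$weight[toy$heavy]
weights_without <- toy$weight[!toy$heavy]

# Welch two-sample t-test (R's default: var.equal = FALSE).
res <- t.test(weights_with, weights_without)
print(res)
```

The same grouping only transfers to the Japanese names if that column is romanized. If it holds kana, a Latin-letter class like "[bdfvz]" would match nothing and one group would be empty.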
- Name patterns: Only a handful of Pokémon have exactly two identical vowels in a row; several names exhibit long consonant clusters or vowel-consonant alternations.
- Weight comparisons:
  - English-name heavy-consonant group vs. others: p = ___
  - Japanese-name grouping: p = ___
  (Insert the actual p-values printed by your script.)
- Height-Weight regression: Height is a significant predictor (β ≈ 60 kg/m, R² ≈ 0.39).
- Height+Attack model: Attack adds explanatory power (ΔR² ≈ 0.03), making the multiple regression a better fit.
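Because R² can never decrease when a predictor is added to a nested least-squares model, a small ΔR² is best read alongside adjusted R² or an F-test. The sketch below shows that comparison on simulated data. The coefficients, noise level, and sample size are arbitrary choices for illustration and are not estimates from the dataset.

```r
set.seed(1)

# Simulated stand-in data; every number here is an arbitrary assumption.
n <- 200
toy <- data.frame(
  Height = runif(n, 0.2, 3),
  Attack = runif(n, 20, 150)
)
toy$Weight <- 60 * toy$Height + 0.3 * toy$Attack + rnorm(n, sd = 40)

m1 <- lm(Weight ~ Height, data = toy)
m2 <- lm(Weight ~ Height + Attack, data = toy)

# Plain and adjusted R-squared for both models.
r2 <- c(model1 = summary(m1)$r.squared, model2 = summary(m2)$r.squared)
adj_r2 <- c(model1 = summary(m1)$adj.r.squared, model2 = summary(m2)$adj.r.squared)
delta_r2 <- r2[["model2"]] - r2[["model1"]]
print(r2)
print(adj_r2)
print(delta_r2)

# F-test for whether Attack improves on the height-only model.
cmp <- anova(m1, m2)
print(cmp)
```

On the real data, the same calls applied to the script's two `lm` fits would show whether the reported ΔR² is backed by a significant F-test.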
- Additional name patterns: test for hyphens, numbers, or foreign-script characters.
- Other physical predictors: include Defense, Speed, or captured date.
- Visualization: plot model diagnostics or name-pattern frequency bar charts using ggplot2.
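As a starting point for the additional name patterns, the sketch below flags hyphens, digits, and non-ASCII characters. The sample names are a small hand-picked set, and the column names in `flags` are hypothetical.

```r
library(stringr)

# Hand-picked sample; "\u00e9" is an escaped e-acute so the file stays ASCII.
sample_names <- c("Ho-Oh", "Porygon2", "Flab\u00e9b\u00e9", "Pikachu")

flags <- data.frame(
  name          = sample_names,
  has_hyphen    = str_detect(sample_names, fixed("-")),
  has_digit     = str_detect(sample_names, "[0-9]"),
  # Any character outside the 7-bit ASCII range.
  has_non_ascii = str_detect(sample_names, "[^\\x00-\\x7F]")
)
print(flags)

# Per-pattern counts, ready to feed into a bar chart.
pattern_counts <- colSums(flags[, -1])
print(pattern_counts)
```

The counts in `pattern_counts` could then be passed to ggplot2 for the frequency bar chart mentioned in the list.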
- pokemon-advanced.csv: extended Pokémon dataset with English & Japanese names, heights, weights, and stats.
- R Packages:
  - Wickham H. (2023). stringr: Simple, Consistent Wrappers for Common String Operations.
  - Wickham H. (2016). ggplot2: Elegant Graphics for Data Analysis.
This project is released under the MIT License. See LICENSE for details.