Skip to content

The `post_slug` function modules for Python, Bash, Javascript, and PHP, converts any given line of text into a URL- or filename- friendly ASCII slug. Used mainly to generate consistent cross-language slugs for headlines, article titles, book titles, and for URLs and filenames generally.

License

Notifications You must be signed in to change notification settings

Open-Technology-Foundation/post_slug

Repository files navigation

post_slug

License: GPL v3 Python 3.10+ PHP 8.0+ Bash 5.1+ Node 12.2+

A consistent, cross-language slug generator for creating URL-safe and filename-safe strings from any text input.

🎯 Overview

post_slug converts any text into clean, readable slugs that are safe for URLs, filenames, and other contexts where only ASCII alphanumeric characters are allowed. With identical implementations in Python, JavaScript, PHP, and Bash, it ensures consistent output across your entire stack.

Key Features

  • 🌐 Cross-language consistency - Identical output across Python, JavaScript, PHP, and Bash
  • 🛡️ Security-focused - Input sanitization with 255-character limit to prevent DoS attacks
  • 🔧 Flexible configuration - Customizable separator, case preservation, and length limits
  • 📦 Zero dependencies - Uses only built-in language features
  • Fast and lightweight - Optimized for performance
  • 🧪 Thoroughly tested - Comprehensive test suite with cross-language validation

Quick Example

from post_slug import post_slug

# Convert a complex title to a clean slug
title = "The Ŝtřãņġę (Inner) Life! of the \"Outsider\""
slug = post_slug(title)
# Output: "the-strange-inner-life-of-the-outsider"

📦 Installation

Python

# Install from PyPI (coming soon)
pip install post-slug

# Or install from source
python -m pip install .

JavaScript/Node.js

# Install from npm (coming soon)
npm install post-slug

# Or use directly
const { post_slug } = require('./post_slug.js');

PHP

# Install via Composer (coming soon)
composer require open-technology-foundation/post-slug

# Or include directly
require_once 'post_slug.php';

Bash

# Source the function
source post_slug.bash

# Or add to your .bashrc
echo 'source /path/to/post_slug.bash' >> ~/.bashrc

🚀 Usage

Basic Usage

All implementations share the same API:

post_slug(input_str, [sep_char], [preserve_case], [max_len])

Parameters

Parameter Type Default Description
input_str string required The text to convert into a slug
sep_char string '-' Character to replace non-alphanumeric characters
preserve_case bool/int false/0 Whether to preserve the original case
max_len int 0 Maximum length (0 = no limit beyond 255 chars)

Language-Specific Examples

Python
from post_slug import post_slug

# Basic usage
slug = post_slug("Hello, World!")
print(slug)  # "hello-world"

# With underscore separator
slug = post_slug("Hello, World!", "_")
print(slug)  # "hello_world"

# Preserve case
slug = post_slug("Hello, World!", "-", True)
print(slug)  # "Hello-World"

# With max length
slug = post_slug("This is a very long title that needs truncation", "-", False, 20)
print(slug)  # "this-is-a-very-long"

# HTML entities are replaced
slug = post_slug("Barnes & Noble")
print(slug)  # "barnes-noble"

# Special characters and Unicode
slug = post_slug("Über die Universitäts-Philosophie — Schopenhauer, 1851")
print(slug)  # "uber-die-universitats-philosophie-schopenhauer-1851"
JavaScript
const { post_slug } = require('./post_slug.js');

// Basic usage
let slug = post_slug("Hello, World!");
console.log(slug);  // "hello-world"

// With underscore separator
slug = post_slug("Hello, World!", "_");
console.log(slug);  // "hello_world"

// Preserve case
slug = post_slug("Hello, World!", "-", true);
console.log(slug);  // "Hello-World"

// With max length
slug = post_slug("This is a very long title that needs truncation", "-", false, 20);
console.log(slug);  // "this-is-a-very-long"

// Works with modern JavaScript
const titles = [
    "The Great Gatsby",
    "Pride & Prejudice",
    "1984"
];
const slugs = titles.map(title => post_slug(title));
// ["the-great-gatsby", "pride-prejudice", "1984"]
PHP
<?php
require_once 'post_slug.php';

// Basic usage
$slug = post_slug("Hello, World!");
echo $slug;  // "hello-world"

// With underscore separator
$slug = post_slug("Hello, World!", "_");
echo $slug;  // "hello_world"

// Preserve case
$slug = post_slug("Hello, World!", "-", true);
echo $slug;  // "Hello-World"

// With max length
$slug = post_slug("This is a very long title that needs truncation", "-", false, 20);
echo $slug;  // "this-is-a-very-long"

// In a WordPress context
function my_custom_slug($title) {
    return post_slug($title, '-', false, 200);
}
add_filter('sanitize_title', 'my_custom_slug');
Bash
# Source the function
source post_slug.bash

# Basic usage
slug=$(post_slug "Hello, World!")
echo "$slug"  # "hello-world"

# With underscore separator
slug=$(post_slug "Hello, World!" "_")
echo "$slug"  # "hello_world"

# Preserve case
slug=$(post_slug "Hello, World!" "-" 1)
echo "$slug"  # "Hello-World"

# With max length
slug=$(post_slug "This is a very long title that needs truncation" "-" 0 20)
echo "$slug"  # "this-is-a-very-long"

# Batch processing files
for file in *.txt; do
    new_name="$(post_slug "${file%.txt}").txt"
    mv "$file" "$new_name"
done

🛠️ Advanced Features

Batch File Renaming

The included slug-files utility allows batch renaming of files:

# Rename all .txt files in a directory
./slug-files *.txt

# With custom separator and preserved case
./slug-files -s _ -p /path/to/files/*

# Dry run (no actual renaming)
./slug-files -n *.pdf

Command-Line Usage

Create convenient command-line aliases:

# Add to your shell configuration
ln -s /path/to/post_slug.bash /usr/local/bin/post_slug
ln -s /path/to/slug-files /usr/local/bin/slug-files

# Now use directly
post_slug "My Document Title!"
# Output: my-document-title

🔧 How It Works

The slug generation process follows these steps:

  1. Input validation - Truncates input to 255 characters for filesystem safety
  2. Character normalization - Applies language-specific transliteration fixes
  3. HTML entity removal - Replaces entities like &amp; with the separator
  4. ASCII transliteration - Converts Unicode to closest ASCII equivalents
  5. Quote removal - Strips quotes, apostrophes, and backticks
  6. Case conversion - Optionally converts to lowercase
  7. Character replacement - Replaces non-alphanumeric chars with separator
  8. Cleanup - Removes duplicate/leading/trailing separators
  9. Length truncation - Optionally truncates to specified length

Transliteration Details

Different languages use different transliteration methods:

  • Python/JavaScript: unicodedata.normalize('NFKD')
  • PHP/Bash: iconv('UTF-8', 'ASCII//TRANSLIT')

To ensure consistency, manual transliteration tables ("kludges") handle edge cases:

# Example kludges
'€''EUR'  # Euro symbol
'©''C'    # Copyright
'®''R'    # Registered trademark
'™''-TM'  # Trademark

🧪 Testing

Running Tests

# Run cross-language validation
cd unittests
./validate_slug_scripts datasets/headlines.txt

# Test with specific parameters
./validate_slug_scripts datasets/booktitles.txt 0 '-,_' '0,1'

# Quiet mode (errors only)
./validate_slug_scripts -q datasets/products.txt

Test Datasets

The package includes extensive test datasets:

  • headlines.txt - News headlines with special characters
  • booktitles.txt - Book titles with Unicode and punctuation
  • products.txt - Product names with symbols and numbers
  • edge_cases.txt - Boundary conditions and special cases

Unit Tests

# Python unit tests
python -m pytest unittests/test_post_slug.py

# Run all language tests
python unittests/test_post_slug.py

⚠️ Important Notes

Character Set Limitations

Some character sets cannot be transliterated to ASCII and will result in empty strings:

# Cyrillic text returns empty string
post_slug("Привет мир")  # ""

# Use with Latin-based alphabets for best results
post_slug("Café résumé")  # "cafe-resume"

Security Considerations

  • Input is automatically limited to 255 characters
  • All implementations include error handling
  • Safe for user-generated content
  • No external command execution (except Bash iconv)

Version Compatibility

Language Minimum Version Tested Version
Python 3.10 3.12
PHP 8.0 8.3
Bash 5.1 5.2
Node.js 12.2 20.x

🤝 Contributing

We welcome contributions! Please follow these guidelines:

  1. Consistency is key - Changes must be applied to all language implementations
  2. Test thoroughly - Run validate_slug_scripts to ensure cross-language compatibility
  3. Update kludge tables - Submit PRs for new transliteration cases
  4. Follow conventions - Match the coding style of each language

Development Setup

# Clone the repository
git clone https://github.com/Open-Technology-Foundation/post_slug.git
cd post_slug

# Run tests
cd unittests
./validate_slug_scripts datasets/headlines.txt

# Make changes and verify
# ... edit files ...
./validate_slug_scripts -q datasets/booktitles.txt

📄 License

This project is licensed under the GNU General Public License v3.0 or later - see the LICENSE file for details.

🙏 Acknowledgments

  • Inspired by various slug generation libraries across different languages
  • Test datasets compiled from real-world content
  • Special thanks to all contributors

📚 See Also


Repository: https://github.com/Open-Technology-Foundation/post_slug
Author: Gary Dean [email protected]
Version: 1.0.1

About

The `post_slug` function modules for Python, Bash, Javascript, and PHP, converts any given line of text into a URL- or filename- friendly ASCII slug. Used mainly to generate consistent cross-language slugs for headlines, article titles, book titles, and for URLs and filenames generally.

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Packages

No packages published