This script analyzes markdown files in your Astro blog and extracts the most interesting content using Claude Code CLI. It creates individual output files for each markdown page, containing the best extracts along with full URLs.
Uses Claude Code CLI for AI analysis with intelligent fallback to heuristics if Claude Code is unavailable.
Features:
- Uses Claude Code CLI for high-quality content analysis
- Fallback analysis using smart heuristics
- Creates individual files for each post
- Processes all markdown/MDX posts automatically
- Generates proper URLs for each extract
- Supports both `.md` and `.mdx` files
Prerequisites:
- Node.js installed
- Claude Code CLI installed and configured (optional - script has fallback)
Usage (run from repository root):
# Run the script on all files
node scripts/extract-content.js
# Test mode - process only the first markdown file for faster iteration
node scripts/extract-content.js --test-mode

Note: All scripts must be run from the repository root directory, not from within the scripts/ directory.
The script creates:
extracted-content/
├── 2025-01-15--reliability-all-stick-no-carrot.txt
├── 2021-05-11--unusual-tips-to-keep-slack-from-becoming-a-nightmare.txt
├── 2022-07-19--shit-shield.txt
├── ...
└── all-extracts.txt # Master file with all extracts
Each file contains 2-4 extracts in this format:
People don't generally value reliability that much unless the site is down, or things are really bad.
https://www.rubick.com/reliability-all-stick-no-carrot/
Make the incident review meeting and post-mortem writeup so interesting and spicy, that everyone wants to see it.
https://www.rubick.com/reliability-all-stick-no-carrot/
We were lucky to be bailed out by Alice, but we acknowledge that it's a dangerous dependency.
https://www.rubick.com/reliability-all-stick-no-carrot/
- File Discovery: Recursively finds all `index.md` and `index.mdx` files in `src/content/posts/`
- Content Parsing: Extracts title and content from markdown/MDX files, removing front matter
- URL Generation: Creates URLs by removing date prefixes (`YYYY-MM-DD--`) from directory names
- AI Analysis: Uses AI to identify the most interesting 2-4 passages per post
- Output Generation: Creates individual text files in the `extracted-content/` directory
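The URL-generation step above can be sketched in a few lines. The `postUrl` helper name is illustrative, not necessarily the script's own:

```javascript
// Sketch of URL generation: strip the YYYY-MM-DD-- date prefix from a
// post's directory name and build the public URL.
// (postUrl is an illustrative name, not the script's actual function.)
const BASE_URL = 'https://www.rubick.com';

function postUrl(dirName) {
  // Remove a leading date prefix like "2025-01-15--" if present.
  const slug = dirName.replace(/^\d{4}-\d{2}-\d{2}--/, '');
  return `${BASE_URL}/${slug}/`;
}
```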
The AI analysis prioritizes:
- Actionable advice and practical tips
- Surprising insights and non-obvious observations
- Memorable quotes and principles
- Practical frameworks and models
- Key takeaways that provide real value
Edit the configuration variables at the top of scripts/extract-content.js:
const BASE_URL = 'https://www.rubick.com'; // Change base URL
const CONTENT_DIR = './src/content/posts'; // Astro content directory
const OUTPUT_DIR = './extracted-content'; // Change output directory

Edit the AI prompt in the analyzeChunkWithClaudeCode function to change what type of content gets extracted. Look for the prompt variable and modify the "Focus on:" section.
- If Claude Code CLI is not available, the script automatically falls back to heuristic analysis
- Make sure you have Claude Code installed: visit claude.ai/code for installation instructions
- Check that Claude Code is properly configured and authenticated
- The script will fall back to heuristic analysis if Claude Code fails
- Check that your markdown files have substantial content (>100 characters)
- Ensure front matter is properly formatted with `---` delimiters
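Front matter removal between the `---` delimiters can be sketched like this. The `stripFrontMatter` helper is illustrative; the actual script may parse front matter differently:

```javascript
// Sketch of front matter handling: drop everything between the opening
// and closing --- delimiters before analyzing the body.
// (Illustrative; the script's own parsing may differ.)
function stripFrontMatter(markdown) {
  const match = markdown.match(/^---\r?\n[\s\S]*?\r?\n---\r?\n?/);
  return match ? markdown.slice(match[0].length) : markdown;
}
```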
Make sure you're running the script from the repository root directory, not from within scripts/.
From the "reliability-all-stick-no-carrot" post:
Reliability work is like that - you're making a class of problems disappear, and they only seem important if they don't disappear.
https://www.rubick.com/reliability-all-stick-no-carrot/
Her approach is to make the incident review meeting and post-mortem writeup so interesting and spicy, that everyone wants to see it.
https://www.rubick.com/reliability-all-stick-no-carrot/
We were lucky that we have Alice, thank you Alice! What if Alice had been in the hospital?
https://www.rubick.com/reliability-all-stick-no-carrot/
This creates a curated collection of the most valuable insights from your entire blog, perfect for sharing, referencing, or creating social media content.
This script posts extracted content to LinkedIn as article shares, creating professional posts with URL preview cards. Designed for daily automated posting.
- LinkedIn account
- LinkedIn access token with `openid`, `profile`, and `w_member_social` scopes
- LinkedIn Developer App (for generating tokens)
- Single Daily Post: Always posts one item (perfect for cron jobs)
- Article Share Format: Creates rich URL previews automatically
- Token Expiration Handling: Clear guidance when 60-day token expires
- Dry Run Mode: Test without posting
- Duplicate Prevention: Tracks sent posts in `linkedin-sent.json`
- Go to LinkedIn Developers
- Create a new app
- Add required products:
  - Share on LinkedIn (provides `w_member_social` scope)
  - Sign In with LinkedIn using OpenID Connect (provides `openid` and `profile` scopes)
- Generate access token using the token generator script (see Token Management below)
export LINKEDIN_ACCESS_TOKEN="your_access_token_here"

# Dry run - test without posting
node scripts/extract-to-linkedin.js --dry-run
# Post next item in queue (daily usage)
node scripts/extract-to-linkedin.js

- Random Selection: Picks a random unsent item from extracted content
- LinkedIn API: Creates article share with your text as commentary
- URL Preview: LinkedIn automatically generates rich preview card
- Tracking: Marks item as sent in `linkedin-sent.json`
- Rate Limiting: Single post per run (ideal for daily scheduling)
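The article-share step can be sketched as a payload builder for LinkedIn's v2 UGC Posts endpoint. The field names follow LinkedIn's public API documentation, but treat the exact shape, and the `buildArticleShare` helper, as an illustrative assumption rather than the script's literal code:

```javascript
// Sketch of the request body for a LinkedIn article share via the
// v2 UGC Posts API (POST https://api.linkedin.com/v2/ugcPosts).
// Payload shape per LinkedIn's public docs; treat as an assumption,
// not the script's exact implementation.
function buildArticleShare(personUrn, commentary, articleUrl) {
  return {
    author: personUrn, // e.g. "urn:li:person:abc123"
    lifecycleState: 'PUBLISHED',
    specificContent: {
      'com.linkedin.ugc.ShareContent': {
        shareCommentary: { text: commentary }, // your extracted text
        shareMediaCategory: 'ARTICLE',         // triggers URL preview card
        media: [{ status: 'READY', originalUrl: articleUrl }],
      },
    },
    visibility: { 'com.linkedin.ugc.MemberNetworkVisibility': 'PUBLIC' },
  };
}
```

LinkedIn renders the preview card itself from `originalUrl`, so only the commentary text and the article URL need to be supplied.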
Each LinkedIn post includes:
- Your extracted text as the main commentary
- Automatic URL preview card with title, description, and image
- Public visibility (connects with your professional network)
Example output:
People don't generally value reliability that much unless the site is down, or things are really bad.
[LinkedIn automatically shows preview card for: https://www.rubick.com/reliability-all-stick-no-carrot/]
# Add to crontab: post daily at 9 AM
0 9 * * * cd /path/to/blog && node scripts/extract-to-linkedin.js >> linkedin-posts.log 2>&1

# Check what would be posted next (random selection)
node scripts/extract-to-linkedin.js --dry-run
# Post it
node scripts/extract-to-linkedin.js

extracted-content/
├── post1.txt # Source content files
├── post2.txt
└── all-extracts.txt # Ignored by script
linkedin-sent.json # Tracking file (auto-created)
linkedin-posts.log # Optional log file for cron
$ node scripts/extract-to-linkedin.js --dry-run
Starting LinkedIn posting in DRY RUN mode...
Found 45 extracted posts
12 already sent, 33 remaining
Processing: "People don't generally value reliability that much..." from 2025-01-15--reliability-all-stick-no-carrot.txt
[DRY RUN] Posting: "People don't generally value reliability that much..."
Would post to LinkedIn with URL: https://www.rubick.com/reliability-all-stick-no-carrot/
=== Summary ===
Dry run: Posted 1 update successfully
Tracking file: ./linkedin-sent.json
Total posts sent to date: 12
Remaining posts in queue: 32

- LinkedIn access tokens expire after ~60 days
- Script provides clear error messages when tokens expire
- No automatic refresh available (LinkedIn limitation)
Use the included linkedin-get-token.js script to easily generate a new access token:
- Get your app credentials:
  - Go to LinkedIn Developer Apps
  - Select your app → Auth tab
  - Copy your Client ID and Primary Client Secret
  - In a separate Terminal tab: export LINKEDIN_CLIENT_ID="your_client_id"
  - Then: export LINKEDIN_CLIENT_SECRET="your_client_secret"
- Add redirect URL (first time only):
  - In the Auth tab, under "OAuth 2.0 Settings" → "Redirect URLs"
  - Add: http://localhost:3000/callback
- Run the token generator:
  - In Terminal: node scripts/linkedin-get-token.js
- Authorize in browser:
  - The script will open your browser automatically
  - Log in to LinkedIn and click "Allow"
  - The script will display your new access token
- Save the token:
  - export LINKEDIN_ACCESS_TOKEN="your_new_token"
  - You may want to store this in a file or shell profile where it will be picked up.
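Under the hood, the token generator follows the standard OAuth 2.0 authorization-code flow. Here is a sketch of building the authorization URL it opens in the browser; parameter names follow LinkedIn's OAuth documentation, and `buildAuthUrl` is an illustrative helper, not the script's own:

```javascript
// Sketch of the OAuth 2.0 authorization URL the token generator opens.
// Parameter names per LinkedIn's OAuth docs; exact usage in the script
// is an assumption.
function buildAuthUrl(clientId, redirectUri) {
  const params = new URLSearchParams({
    response_type: 'code',              // authorization-code flow
    client_id: clientId,
    redirect_uri: redirectUri,          // must match the app's Redirect URLs
    scope: 'openid profile w_member_social',
  });
  return `https://www.linkedin.com/oauth/v2/authorization?${params}`;
}
```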
- Add Postman redirect URL to your app: https://oauth.pstmn.io/v1/callback
- Use Postman's OAuth 2.0 Authorization Code flow
- Configure with your Client ID, Client Secret, and required scopes
- Generate and copy the access token
# Store token securely
echo 'export LINKEDIN_ACCESS_TOKEN="your_token"' >> ~/.bashrc
# Or use environment file (don't commit to git)
echo 'LINKEDIN_ACCESS_TOKEN=your_token' > .env

LinkedIn API authentication failed. Token may be expired.
Solution: Generate new access token and update environment variable.
All posts have already been sent!
Solution: All content has been posted. Run content extraction to generate new posts.
LinkedIn API error 403: Insufficient permissions
Solution: Ensure your LinkedIn app has w_member_social scope enabled.
LinkedIn allows 150 requests/day per member. Single daily posts stay well within limits.
- Script posts your extracted content exactly as written
- Ensure extracted content is professionally appropriate
- LinkedIn article shares work best with insightful commentary
- Run daily at consistent times for best engagement
- Monitor LinkedIn analytics to optimize posting times
- Consider LinkedIn's professional audience when extracting content
- Never commit `linkedin-sent.json` or tokens to version control
- Rotate access tokens periodically
- Use environment variables, not hardcoded credentials
LinkedIn's API provides several advantages for professional content sharing:
- Professional Audience: Reach your professional network with relevant insights
- Rich Preview Cards: Automatic URL previews with title, description, and image
- Article Share Format: Perfect for blog content with commentary
- Daily Posting: Single post approach works well for professional content
- Native Integration: Posts appear natural in LinkedIn feed
This script posts extracted content to Bluesky, creating posts with automatic website card previews. Designed for automated social media posting with intelligent content selection.
- Bluesky account
- Bluesky App Password (recommended) or Access Token
- Node.js installed
- Random Content Selection: Posts a random unsent item (perfect for variety)
- Website Card Generation: Automatically fetches website metadata for rich previews
- Multiple Authentication Methods: App Password, Access Token, or Refresh Token
- Dry Run Mode: Test without posting
- Duplicate Prevention: Tracks sent posts in `bluesky-sent.json`
- Session Management: Handles token refresh automatically
- Create App Password:
  - Go to Bluesky Settings → Privacy and Security → App Passwords
  - Generate a new app password (format: xxxx-xxxx-xxxx-xxxx)
- Set Environment Variables:
  - App Password method:
    export BLUESKY_IDENTIFIER="your-handle.bsky.social"
    export BLUESKY_APP_PASSWORD="xxxx-xxxx-xxxx-xxxx"
  - Access Token method:
    export BLUESKY_ACCESS_TOKEN="your_access_token"
    export BLUESKY_DID="your_did"
    export BLUESKY_REFRESH_TOKEN="your_refresh_token" # Optional
  - Refresh Token method:
    export BLUESKY_REFRESH_TOKEN="your_refresh_token"

# Dry run - test without posting
node scripts/extract-to-bluesky.js --dry-run
# Post random item from queue
node scripts/extract-to-bluesky.js

- Random Selection: Picks a random unsent item from extracted content
- Website Card Generation: Fetches metadata (title, description, image) from URLs
- Bluesky API: Creates post with embedded website card
- Tracking: Marks item as sent in `bluesky-sent.json`
- Session Management: Automatically refreshes tokens when needed
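The post itself is an AT Protocol record with an external-link embed. A sketch of building that record follows; the field names track the public `app.bsky` lexicons, but treat the details, and the `buildBlueskyPost` helper, as illustrative assumptions:

```javascript
// Sketch of the record sent to Bluesky via com.atproto.repo.createRecord.
// Record/embed shape per the public app.bsky lexicons; treat details as
// an assumption, not the script's literal code.
function buildBlueskyPost(text, card) {
  return {
    $type: 'app.bsky.feed.post',
    text,                                  // your extracted quote
    createdAt: new Date().toISOString(),
    embed: {
      $type: 'app.bsky.embed.external',    // website card embed
      external: {
        uri: card.url,
        title: card.title,                 // fetched page metadata
        description: card.description,
      },
    },
  };
}
```

A preview image, when available, would be uploaded as a blob first and attached as `external.thumb`.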
Each Bluesky post includes:
- Your extracted text as the main content
- Automatic website card with title, description, and preview image
- Proper URL handling and metadata extraction
Example output:
People don't generally value reliability that much unless the site is down, or things are really bad.
[Bluesky shows website card with title, description, and image from: https://www.rubick.com/reliability-all-stick-no-carrot/]
# Random post 3 times daily at 9 AM, 2 PM, and 7 PM
0 9,14,19 * * * cd /path/to/blog && node scripts/extract-to-bluesky.js >> bluesky-posts.log 2>&1

# Check what would be posted next (random selection)
node scripts/extract-to-bluesky.js --dry-run
# Post it
node scripts/extract-to-bluesky.js

extracted-content/
├── post1.txt # Source content files
├── post2.txt
└── all-extracts.txt # Ignored by script
bluesky-sent.json # Tracking file (auto-created)
bluesky-posts.log # Optional log file for cron
- Simplicity: Easy to generate and use
- Security: Limited scope, can be revoked independently
- Persistence: Doesn't expire like access tokens
- Recommended: Official Bluesky recommended method
- Advanced Use: For complex API integrations
- Token Management: Requires handling refresh cycles
- Session Control: More granular control over sessions
- Script automatically refreshes expired tokens when refresh token is available
- Handles authentication errors gracefully with helpful messages
$ node scripts/extract-to-bluesky.js --dry-run
Starting Bluesky posting in DRY RUN mode...
Authenticating with Bluesky...
Authenticated as @yourusername.bsky.social
Found 45 extracted posts
12 already sent, 33 remaining
Processing: "People don't generally value reliability that much..." from 2025-01-15--reliability-all-stick-no-carrot.txt
[DRY RUN] Would post to Bluesky:
Text: "People don't generally value reliability that much unless the site is down, or things are really bad."
URL: https://www.rubick.com/reliability-all-stick-no-carrot/
Website card: "Reliability: All Stick, No Carrot - How to get recognition for preventing problems"
=== Summary ===
Dry run: Posted 1 update successfully
Tracking file: ./bluesky-sent.json
Total posts sent to date: 12
Remaining posts in queue: 32

The script automatically generates rich website cards by:
- Metadata Extraction: Fetches title, description, and image from URLs
- Image Upload: Uploads website preview images to Bluesky
- Fallback Handling: Graceful degradation when metadata unavailable
- Caching: Efficient handling of repeated URL metadata requests
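The metadata-extraction step can be sketched with a simple Open Graph tag lookup. This is illustrative only; a real implementation might use a proper HTML parser and fall back to `<title>` or other tags:

```javascript
// Sketch of metadata extraction: pull an Open Graph tag out of fetched
// HTML with a regex. (Illustrative; a real implementation may use a
// proper HTML parser and additional fallbacks.)
function extractOgTag(html, property) {
  const re = new RegExp(
    `<meta[^>]+property=["']og:${property}["'][^>]+content=["']([^"']*)["']`,
    'i'
  );
  const match = html.match(re);
  return match ? match[1] : null; // null supports graceful degradation
}
```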
Bluesky API authentication failed. Token may be expired.
Solutions:
- Verify app password format: xxxx-xxxx-xxxx-xxxx
- Check that your Bluesky account is active
- Regenerate app password if needed
- Ensure environment variables are set correctly
All posts have already been sent!
Solution: All content has been posted. Run content extraction to generate new posts.
- Script continues posting even if website metadata fails
- Check internet connectivity for URL fetching
- Some websites may block automated metadata requests
Make sure you're running the script from the repository root directory, not from within scripts/.
- Random Selection: Creates more natural, varied posting pattern
- Quality Control: Ensure extracted content is appropriate for social media
- Frequency: Multiple daily posts work well due to random selection
- Environment Variables: Never commit credentials to version control
- App Password Rotation: Periodically regenerate app passwords
- Tracking File: Add `bluesky-sent.json` to `.gitignore`
- Peak Hours: Schedule posts during high engagement times
- Content Mix: Random selection ensures variety in posted content
- Analytics: Monitor Bluesky analytics to optimize posting strategy
| Feature | Bluesky | LinkedIn |
|---|---|---|
| Selection Method | Random | Random |
| Content Style | Casual, varied | Professional |
| Posting Frequency | Multiple daily | Single daily |
| Authentication | App Password | OAuth Token |
| Website Cards | Auto-generated | Auto-generated |
| Audience | General social | Professional network |
Both scripts can run simultaneously, posting different content to different audiences with appropriate timing and style.